“Cost-effectiveness analysis can skirt life valuation by relying instead on the premise that we want our limited resources to achieve maximal benefits (which may be set in units that we prefer not to value monetarily).”

[Thompson and Fortress, 1980, p. 555[1]]

“No definitive rules determine when the efficiency of a program is sufficient to justify its adoption.”

[Kaplan and Bush, 1982, p. 74[2]]

Cost-effectiveness analysis first emerged in the US in the mid-20th century, as a tool to directly inform Congress on efficient procurement for public works.[3] Although some similar methods emerged during the 1960s,[4] cost-effectiveness analysis did not establish itself in healthcare until the mid-1970s.[5–7] Applications of cost-effectiveness analysis within the US healthcare system grew over the 1990s and remain strong.[8–10] In the US, most cost-effectiveness analyses in health are not conducted for government because, unlike in other countries, such analyses are not imposed by fiat,[11] and the perspective of cost-effectiveness analysis reflects the decentralized and highly privatized structure of the US healthcare system.[12–14] Despite calls to reconsider the role of cost and cost effectiveness in the major governmental healthcare programmes,[15] policy makers remain reluctant to adopt cost-effectiveness analysis because of a perennial dislike of rationing among a broad range of constituencies.[12,16]

Even if cost-effectiveness analysis remains an academic enterprise in the US, the validity of the approach is greatly undermined by continued reference to a threshold that is now quite dated. The initial purpose of this paper was to reconsider the use of the $US50 000 cost-effectiveness analysis threshold, with a view to either abandoning or replacing it. We quickly realized there was a wealth of academic arguments against the $US50 000 threshold, and that a literature was already emerging on the possible updating of the threshold. Based on our review of this literature, we were drawn to a more fundamental question: should we have a fixed cost-effectiveness threshold at all, given the many different types of health insurance formats in the US and given that many of the cost-effectiveness analyses conducted in the US are for academic purposes?

1. Returning to the Fundamentals of Evaluation

Early proponents of cost-effectiveness analysis claimed that it circumvented the complexities of placing a value on a life,[17] arguing that it focused attention on the question “How may we most effectively spend money to extend lives?”[1] To achieve this, a complete assessment of the healthcare budget, or what we can refer to as resource allocation, was required. Solving the resource allocation problem involves considering all interventions: the maximum aggregate outcome (previously measured in life-years, but now more commonly in QALYs) is determined by considering all possible allocations of a fixed budget. Applying this method, the threshold (also known in this context as the shadow price or lambda) is identified endogenously.[18]
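The resource allocation logic described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the cited literature: it assumes that programmes are perfectly divisible with constant returns to scale, so that funding them in order of cost per QALY maximizes total QALYs. The function name `allocate_budget` and all programme names and figures are hypothetical.

```python
def allocate_budget(programmes, budget):
    """Fund programmes in ascending order of cost per QALY until the budget is spent.

    `programmes` maps a name to a (cost, qalys) pair for full implementation.
    Assumes perfect divisibility and constant returns to scale (an assumption
    of this sketch). Returns (funded shares, total QALYs, shadow price).
    """
    ranked = sorted(programmes.items(), key=lambda item: item[1][0] / item[1][1])
    remaining = budget
    shares, total_qalys, shadow_price = {}, 0.0, None
    for name, (cost, qalys) in ranked:
        if remaining <= 0:
            break
        share = min(1.0, remaining / cost)  # fund fully, or partially at the margin
        shares[name] = share
        total_qalys += share * qalys
        remaining -= share * cost
        # The threshold is the cost per QALY of the last programme funded.
        # (If the budget covers every programme, it is not binding and this
        # value is only the ratio of the least efficient programme.)
        shadow_price = cost / qalys
    return shares, total_qalys, shadow_price


# Hypothetical programmes: name -> (cost in $US, QALYs gained).
EXAMPLE = {
    "vaccine": (100_000.0, 20.0),    # $5 000 per QALY
    "screening": (200_000.0, 20.0),  # $10 000 per QALY
    "surgery": (400_000.0, 10.0),    # $40 000 per QALY
}

shares, total_qalys, shadow_price = allocate_budget(EXAMPLE, budget=400_000.0)
```

In this hypothetical case the threshold is not chosen in advance: it falls out of the allocation as the cost per QALY of the marginal programme, and it would change if the budget or the set of programmes changed.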

The practical reality is that most applications of cost-effectiveness analysis have failed to assess the allocation of a budget across all interventions, preferring to take a piecemeal, programme evaluation approach. The result is a series of comparative evaluations of a small number of interventions (usually only one or two) that are viewed as alternatives for a specific medical condition.[19] There are a number of political benefits to the programme evaluation approach: it is far simpler, allows evaluators to focus only on new (costly) interventions and avoids the politically problematic issue of disinvestment (that is, the cancelling of a programme that has become obsolete).[20] The programme evaluation approach does have one significant drawback: it requires the valuation of things that we might “prefer not to value monetarily”[1] because the researcher or decision maker must define a threshold of acceptability. In the absence of any real science to consider what an appropriate threshold might be in the US, a $US50 000 per QALY rule of thumb emerged.
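For contrast with full resource allocation, the programme evaluation approach can be sketched as a comparison of one new intervention with one alternative, followed by a test against an externally supplied threshold. The text above gives no formula, so the incremental ratio used here is the conventional one and should be read as an assumption of this sketch. The function names and all figures are hypothetical.

```python
def incremental_ratio(new, comparator):
    """Incremental cost per QALY of `new` relative to `comparator`.

    Each argument is a (cost, qalys) pair for the same patient group.
    """
    delta_cost = new[0] - comparator[0]
    delta_qalys = new[1] - comparator[1]
    if delta_qalys <= 0:
        # No health gain: a ratio is not meaningful in this simple sketch.
        raise ValueError("new intervention yields no additional QALYs")
    return delta_cost / delta_qalys


def is_acceptable(ratio, threshold):
    """Decision rule: accept when the ratio does not exceed the threshold."""
    return ratio <= threshold


# Hypothetical figures: the new treatment costs $20 000 more and adds 0.5 QALYs.
ratio = incremental_ratio(new=(30_000.0, 5.5), comparator=(10_000.0, 5.0))
```

The point of the sketch is that `threshold` is an input here, whereas under full resource allocation it is an output. The same hypothetical ratio is accepted at a threshold of $50 000 and rejected at $20 000, so the conclusion depends entirely on a number that the analysis itself does not supply.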

The $US50 000 per QALY threshold is still commonly applied in the US and, remarkably, it is justified in any number of ways in the literature. A quick review of applications of cost-effectiveness analysis in the US identified researchers “assuming a $50 000/QALY threshold”[21] so an intervention can be “considered cost-effective.”[22] Other researchers argue that an intervention is cost effective when it is “comparable to that of haemodialysis (∼$50 000 per QALY gained), an oft-cited benchmark for cost-effectiveness,”[23] while others claim “a willingness-to-pay threshold of $50 000/QALY,”[24] often as part of a range, e.g. “willingness to pay (WTP) of $50 000 and $100 000 per quality-adjusted life-year (QALY) gained.”[25] Surprisingly, the threshold has been cited irrespective of the outcome measure, e.g. “50 000 US dollars/life year or quality-adjusted life year.”[26] Most often, clinical applications use the threshold on the basis of it being “commonly cited”[27] or “commonly used.”[28]

2. The $US50 000 Threshold — Where did it Come from?

While the foundations of the $US50 000 threshold “may never be known” (D. Neuhauser, personal communication), Laufer[29] identifies two possible histories: the dialysis standard (which suggests that if society or payers are willing to pay $US50 000 per QALY for dialysis, they should be willing to pay $US50 000 per QALY for other interventions) and the guideline approach, where explicit benchmarks and decision rules are set.[30,31] While alternative histories of the threshold have been proposed,[32] Laufer[29] correctly points out that neither of these traditions actually constitutes a formal justification of the $US50 000 threshold.

We find that both of these traditions can be traced to a single source, a pioneering paper by Kaplan and Bush,[2] but the story does not end there. While Kaplan and Bush[2] do indeed refer to a dialysis standard of $US50 000, they fail to reference their source for the calculation. More recently, Hirth et al.[30] present an interesting discussion of the dialysis standard (and are often incorrectly cited as its source) but they, too, fail to reveal an accurate source for the calculation. A recent review by Grosse[32] confirms that the dialysis standard foundation of the threshold is a myth, concluding that the “appeal of the $50 000 figure appears to lie in the convenience of a round number rather than in the value of renal dialysis.”

The foundation of the guideline approach to the threshold, often incorrectly attributed to Laupacis,[33] has an equally obscure origin. In the original source of the guidelines, Kaplan and Bush[2] argue for a range of acceptable cost-effectiveness ratios, from $US20 000 to $US100 000, and not a single threshold of acceptance. Acceptability within this range (see table I) would depend on other factors, including available alternatives, yet a ratio of less than $US20 000 would almost always be acceptable and a ratio of greater than $US100 000 would rarely be acceptable.

Table I. Kaplan and Bush’s guidelines[2] in $US, year 2008 values

In table I we also present the Kaplan and Bush[2] guidelines in present-day prices, indicating that if these were to be applied today, there would be a large gray zone for decision making, between $US52 142 and $US260 708. This is within the ballpark of others who have attempted to update the threshold for inflation,[34] to estimate a new threshold from implied decision making[35] or to recalculate the so-called dialysis standard.[36] This range approach is still very much embraced in the literature, with more recent scholars arguing for a “soft threshold with a reasonably well-defined lower and upper boundary, allowing for considerations of uncertainty, equity, or context of treatment.”[31]
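The arithmetic behind the updated range can be illustrated with a short sketch. The price index used for the update is not stated in the text, so the multiplier below is simply backed out from the reported upper bound and should be treated as an assumption. The zone labels paraphrase the guidelines described above, and the function names are hypothetical.

```python
# Original guideline bounds ($US) and the 2008 values reported in the text.
BASE_LOWER, BASE_UPPER = 20_000, 100_000
REPORTED_LOWER_2008, REPORTED_UPPER_2008 = 52_142, 260_708

# Assumption: a single multiplier was applied to both bounds. It is inferred
# from the reported upper bound, not taken from a published price index.
implied_factor = REPORTED_UPPER_2008 / BASE_UPPER  # roughly 2.607


def updated_range(price_factor):
    """Scale both guideline bounds by one price factor, rounded to whole dollars."""
    return round(BASE_LOWER * price_factor), round(BASE_UPPER * price_factor)


def classify(ratio, price_factor=1.0):
    """Place a cost-effectiveness ratio in one of the three guideline zones."""
    lower, upper = BASE_LOWER * price_factor, BASE_UPPER * price_factor
    if ratio < lower:
        return "almost always acceptable"
    if ratio > upper:
        return "rarely acceptable"
    return "gray zone: depends on other factors"
```

Under this inferred multiplier, `updated_range(implied_factor)` reproduces both reported bounds. A ratio of $US50 000 sits inside the gray zone at the original prices but falls below it at 2008 prices.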

3. The $US50 000 Threshold — Time for It to Go

In considering the case for either keeping or abandoning the $US50 000 threshold (figure 1), it is clear that no debate is needed, as the arguments are one-sided: all support abandoning the iconic threshold. Even if one debates the foundations of the threshold, the practical application of a $US50 000 limit dates back to at least 1992,[32] and as such is out of date. Furthermore, it is now clear that the creation of the threshold is not related to dialysis, nor is there any study suggesting that society has a consistent valuation of a QALY in the US, and hence, $US50 000 is not a willingness-to-pay measure. Indeed, based on our review of the literature, we argue that a threshold of $US50 000 is not scientifically justifiable in any way, and its use must be abandoned.

Fig. 1. The case for abandoning the $US50 000 threshold.

What does abandoning the $US50 000 threshold mean in reality? It is questionable whether healthcare payers explicitly use a threshold (or even cost-effectiveness analysis data at all) in policy making, so why is it important to abandon it at all? While one could take the academic high road and argue that a baseless figure has no role in drawing scientific conclusions, the use (or rather abuse) of the $US50 000 threshold has more pragmatic consequences. First, it affects clinical practice directly (as clinicians read such findings in their clinical journals) and via systematic reviews and/or clinical guidelines. Second, while payers might not use an explicit threshold, the $US50 000 threshold and clinical guidelines/reviews that are based on the threshold could lead to rationing by proxy.

If payers do want to address issues of cost effectiveness (or are forced to by some future healthcare reform in the US), then we argue that there are two paths forward: search for an alternative, unique threshold for the US, or accept that establishment of unique thresholds is inconsistent with the theory and practice of cost-effectiveness analysis, and fully embrace a world where thresholds are variable and, as such, need to be customized for each setting.

4. The Argument for Finding a New Unique Threshold

We have identified several arguments in the literature in support of having a unique threshold (figure 2). These revolve around the intuitive appeal of a single threshold, its ability to correct equity concerns in healthcare financing, there being no feasible alternative and the need for transparency.

Fig. 2. The case for a unique vs non-unique threshold.

4.1 Intuitive Appeal

As has been well noted in the literature, the best argument for a unique threshold is that it is intuitively appealing to policy makers.[37] It allows for a definitive decision for societal decision making, i.e. which technologies should be adopted and which must be rejected.[11] While this might hold for single payer systems, it might be unrealistic to expect that every payer in the US would agree on a single threshold, and imposing one would certainly undermine the competitive nature of the healthcare system, in which third-party payers need to offer competitive coverage to maintain members. Furthermore, if a unique threshold is chosen based on its intuitive appeal, researchers should be explicit about this assumption, presenting their results as normative and not objectively scientific.

4.2 Equity Concerns

Given that different payers have different capacities to pay, a decision-making policy using a single threshold might promote equity and fairness in the system. In the UK, it is claimed that the application of cost effectiveness might lead to more uniform provision of technologies[38] and as a consequence avoid ‘postcode prescribing’.[11] However, there is little evidence that cost-effectiveness analysis prevents variation in practice, especially when central agencies fail to support decisions based on cost-effectiveness analysis with the funding required for national adoption. Again, while it might be plausible that a uniform threshold might promote equity elsewhere, it might not work in the US where there are so many uninsured people. Establishing treatment guidelines based on the results of cost-effectiveness analysis in order to promote equity in coverage would not benefit the uninsured. The creation of such guidelines could also act as a barrier to coverage for the uninsured in the US if it led to decreased price competition among providers or impeded the creation of low-cost insurance alternatives.

4.3 No Feasible Alternative

Given that estimating the true threshold (λ) would require a full resource allocation approach in evaluating all interventions simultaneously, and this is not feasible, any estimate of the threshold is going to be imprecise and surrounded by great uncertainty. As such, a fixed threshold might be as good as (or no worse than) any other method to estimate the acceptability of the cost-effectiveness ratio. However, if it is true that our threshold is highly uncertain, then this uncertainty must be incorporated into the interpretation of cost-effectiveness analysis, especially in any hypothesis testing. Acceptability curves might provide a vehicle for this, but not necessarily in a completely transparent and intuitive manner. If a threshold is chosen in a sea of uncertainty, then again this must be stated in each and every analysis to aid the interpretation of the results.
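One way to picture how threshold uncertainty might enter the interpretation is the sketch below. It builds an acceptability curve from a set of simulated draws of incremental cost and incremental QALYs, then averages the curve over a discrete set of beliefs about the threshold. The net-benefit formulation, the function names, the draws and the weights are all assumptions made for illustration, not a procedure described in the text.

```python
def acceptability_curve(draws, thresholds):
    """Share of draws with positive net monetary benefit at each threshold.

    `draws` is a list of (incremental_cost, incremental_qalys) pairs, e.g.
    from a probabilistic sensitivity analysis.
    """
    curve = {}
    for lam in thresholds:
        favourable = sum(
            1 for d_cost, d_qalys in draws if lam * d_qalys - d_cost > 0
        )
        curve[lam] = favourable / len(draws)
    return curve


def expected_acceptability(curve, threshold_weights):
    """Average the curve over a discrete distribution on the threshold itself."""
    return sum(weight * curve[lam] for lam, weight in threshold_weights.items())


# Hypothetical draws of (incremental cost in $US, incremental QALYs).
DRAWS = [(1_000.0, 0.05), (2_000.0, 0.02), (-500.0, 0.01), (3_000.0, 0.01)]

curve = acceptability_curve(DRAWS, thresholds=[0, 50_000, 200_000])
# Hypothetical belief: the threshold is equally likely to be $50 000 or $200 000.
blended = expected_acceptability(curve, {50_000: 0.5, 200_000: 0.5})
```

With these hypothetical draws the probability of acceptability is 0.5 at $50 000 and 0.75 at $200 000, so an analyst unsure between the two would report 0.625 rather than either value alone. This makes the uncertainty explicit, although, as noted above, not necessarily in a way that is intuitive to decision makers.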

4.4 The Need for Transparency

Some have argued that the application of a fixed threshold within the application of cost effectiveness is beneficial to society, in that it leads to a consistent and transparent application of cost-effectiveness analyses, and it clarifies the ‘rules of the game’ so that manufacturers are aware of the yardstick against which any new technology will be measured, and can factor this information into their decision making.[11] As such, industry can be viewed as having a love-hate relationship with standardized rules of the game; on the one hand, pharmaceutical manufacturers enjoy the knowledge of what conditions must be met but, on the other hand, they dislike the fact that the excessive rigidity of the approach does not handle exceptions well. That said, this argument suggests that it is beneficial to find a single threshold or an alternative decision rule, and it is not beneficial to select an arbitrary one.[39] The blind acceptance of the $US50 000 threshold, especially among researchers, has led to a paucity of research on determining appropriate thresholds. Even among those researchers who have studied the value of the QALY, there has been needless baggage associated with the $US50 000 threshold, rather than a de novo enquiry into the matter.[36]

5. The Argument Against a Unique Threshold

While there has been a great deal of discussion in recent years about the need to update the threshold, less has been focused on the validity of having a fixed threshold at all (figure 2). Here we identify several arguments against having a unique threshold and for having non-unique thresholds. We find that arguments fall into four categories: the imperfections of the piecemeal evaluation approach, the need to correct impractical assumptions made in the application of cost-effectiveness analysis, demand-side variations and supply-side variations.

5.1 Piecemeal Evaluation

As discussed above, the notion of a threshold is grounded in resource allocation, where a fixed budget is distributed across all possible interventions and where a threshold (λ or the shadow price) is solved for as part of (or, more accurately, is a by-product of) the maximization process. It is unfortunate that in practice this approach is never really used; rather, interventions are evaluated in a piecemeal way. In the terms of economic analysis, the piecemeal approach is a partial equilibrium and the resource allocation problem is a general equilibrium, albeit one that ignores all non-health sectors of the economy.[40] It has been well established that in the presence of market failures, partial equilibrium presents a corrupt vehicle for welfare analysis[41] unless results are adjusted in order to be consistent with the principles of general equilibrium. There are two paths forward to avoid these problems: return to the formal resource allocation problem or correct our thresholds for imperfections in other markets. We argue that within the context of the multitude of different decision makers in the US, only the latter is a possible alternative.

5.2 Impractical Assumptions

Many of the assumptions of cost effectiveness are unrealistic, and when one loosens these assumptions for practical purposes, a single threshold becomes rather impractical. For example, the attainment of efficiency in cost effectiveness requires perfect divisibility of programmes and constant returns to scale on all programmes, whereby a programme may be partially implemented with the same cost effectiveness associated with full programme implementation, but clearly this is not the case. As such, the inability to partially adopt a programme leads to variation in the threshold.[5,42] Another assumption inherent in cost-effectiveness analysis is that the benefits estimate, usually QALYs, adequately captures all the necessary benefits. Many of the objectives in the healthcare sector are not adequately captured in such measures,[43] nor do such measures account for legal, historical or environmental factors. These imbalances in benefits estimation have already produced ad hoc reconsiderations of thresholds. For example, many so-called lifestyle drugs are compared with a much lower threshold, while orphan drugs are often compared with a much higher threshold. Many other corrections are needed, but often ignored, in interpreting the results of cost-effectiveness analysis, and currently the only perceivable way to make them is to vary the threshold of acceptance.
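The effect of indivisibility can be seen in a small hypothetical example. The sketch below searches all combinations of whole programmes for the one that maximizes QALYs within a fixed budget; the brute-force search, the function names and all figures are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical indivisible programmes: name -> (cost, QALYs gained).
PROGRAMMES = {"A": (60.0, 6.0), "B": (50.0, 4.5), "C": (50.0, 4.0)}


def best_indivisible_allocation(programmes, budget):
    """Brute-force the QALY-maximizing set of whole programmes within budget."""
    best_set, best_qalys = frozenset(), 0.0
    names = sorted(programmes)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(programmes[name][0] for name in subset)
            qalys = sum(programmes[name][1] for name in subset)
            if cost <= budget and qalys > best_qalys:
                best_set, best_qalys = frozenset(subset), qalys
    return best_set, best_qalys


def cost_per_qaly(programmes, name):
    """Average cost per QALY of one programme when fully implemented."""
    cost, qalys = programmes[name]
    return cost / qalys


funded, total_qalys = best_indivisible_allocation(PROGRAMMES, budget=100.0)
```

With a budget of 100, the best feasible choice is B and C together, even though A has the lowest cost per QALY of the three. No single cut-off on cost per QALY can accept B and C while rejecting A, which is one concrete sense in which lumpy programmes produce variation in the threshold.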

5.3 Demand-Side Variation

One of the most basic premises in economics is that preferences vary (if everyone’s opportunity cost for goods were identical, then there would be no benefits from trade). This needs to be accounted for in the assessment of benefits, at both an individual and a population subgroup level,[44–48] and when societal preferences are used, such variation needs to be accounted for by variations in the threshold. Characteristics of the target population also matter for societal valuations of benefits. For example, under the ‘fair-innings’ principle,[49] all individuals are entitled to equal life expectancy, implying the explicit favouring of programmes aimed at the disadvantaged (e.g. due to age, disease characteristics, or socio-economic status), i.e. differentiated thresholds. Demand-side variations are especially important given the size and complexity of the US healthcare system, and variations in demand have to be accommodated through variations in the threshold.

5.4 Supply-Side Variation

The large number and diversity of payers in the US healthcare marketplace suggest that there are also many supply-side characteristics that need to be accounted for in the assessment of cost effectiveness and the selection of a threshold for an acceptable cost-effectiveness ratio. Already, we have spoken of the different types of healthcare plans and the need for competition (in terms of premiums, selective contracting and coverage). It is inappropriate to think that all forms of health insurance in the US should make the same coverage decision, which implicitly means that each insurer needs to select its own threshold (or even different thresholds for different types of care). Even if one ignored the variations across insurers, there are major variations in the US in terms of the costs of care. For example, some regions need to pay very high costs for labour, but reap major benefits from economies of scale. In rural areas, the opposite might be true. This of course would distort the cost of services (as reflected by regional variation in the Federal prospective payments system), and subsequently the cost effectiveness of services. Unless one moves to conducting a cost-effectiveness analysis for each state or region in the US, one needs to interpret the results of a cost-effectiveness analysis differentially.

6. Weighing the Case For and Against a Unique Cost-per-QALY Threshold in the US

The US has a unique and complex healthcare system with many types of providers operating in many different legal jurisdictions. While we are likely to see a movement towards a more European-styled healthcare system, the US will remain very different from the single provider (e.g. the UK) or the single payer (e.g. pharmaceuticals in Australia) systems that rely most on strict policy decisions based on cost-effectiveness analysis. Obviously there is benefit to the US system from having some data on the cost effectiveness of programmes, but more care needs to be taken regarding the interpretation of these data. In weighing the arguments for and against a unique threshold, we accept that a fixed threshold might be beneficial in some countries, just not in the US.

7. Categorizing Sources of Variation in the Threshold

If we reject the notion of a universal threshold, but accept that cost-effectiveness ratios must be compared with something, then methods for customizing thresholds need to be found. In this section, we offer a conceptualization of potential sources of threshold variation (illustrated in figure 3). We argue that thresholds should vary for four key reasons: (i) variations across payers; (ii) variations across time; (iii) budgetary impact; (iv) effectiveness measurement.

Fig. 3. Sources of variation in the cost-per-QALY threshold. QOL = quality of life.

7.1 Thresholds Vary Across Payers

The significant variation in the types and nature of third-party payers in the US must translate into variations in acceptable thresholds. Payers face differences in costs, differences in the services that they cover and/or that are available to them, major differences in patient costs, and even differences in the premiums they charge (and consequently the budget that is available to them).

7.2 Thresholds Vary Across Time

While we have rejected the notion of merely updating the $US50 000 threshold for inflation, it is important to note that time does matter. In addition to inflation, thresholds need to change because of technology and quality improvement, changes in demographics and other changes in clinical need.

7.3 Budgetary Impact Varies

While cost-effectiveness analysis attempts to compare costs and benefits, the included costs may not completely reflect the actual costs faced by a payer. As such, a payer might need to adjust the threshold to account for bias in the cost-effectiveness estimates. This is because projects are lumpy (so there might be local economies or diseconomies of scale), firms can negotiate their prices, plan costs differ from societal costs and there is a difference between average and marginal costs.

7.4 Effectiveness Measurement Varies

These differences in both the process of evaluating the threshold and the point estimates for values across regions translate into different valuations for human health and incremental changes in human health status. In addition to differences in the cost of available goods and services, there are regional differences in demographics and income that lead to variation in preferences for health-related purchases. Individuals’ ranking of perceived needs, including the need for incremental improvements in health status, will also vary across regions.

8. Conclusions

For some, arguing against a threshold such as $US50 000 per QALY is like arguing against the notion that the world is flat — it should be an argument that is easily won. Why then do many, especially in the US, still reference it as an accepted truth? If thresholds for cost-effectiveness analysis are to be utilized, it is critical that adequate attention is given to justify the number chosen and, more importantly, the assumptions and limitations of that number. Internationally, only a limited number of government agencies have been explicit with their thresholds or acceptable ranges of cost effectiveness[50] and even fewer offer a scientific justification for why their threshold was chosen.

The benefits of a unique threshold include its intuitive appeal and transparency, which provide the developer and manufacturer of a new technology with an understanding of what is required for achieving reimbursement. However, a unique threshold imposes impractical assumptions and does not account for supply- and demand-side variations. Based on our review of the arguments for and against having a single threshold, we argue that it is necessary for US researchers to adopt a variable threshold approach to cost-effectiveness analysis. This would require the development of a series of empirically driven thresholds, recognizing that regional, dynamic, budgeting and methodological issues will impact the willingness to pay for a new technology, the value of a QALY, and, therefore, the optimal cost-per-QALY threshold.

Medical decision making does and should recognize that each formulary decision is unique and involves unique treatment alternatives, health outcomes, patient populations and preferences. This heterogeneity should influence not only the ‘value for money’ estimates of novel treatments but also the inherent acceptability of the cost-per-QALY that is paid. Until an improved approach to the analysis of healthcare decision making can be found, the imperfections of cost-effectiveness analysis will remain and with them the need for a variable cost-per-QALY threshold. That threshold should reflect not only the specific decision maker but also the context in which the specific decision is being made.