Introduction

The EQ-5D is one of the most widely used preference-based instruments. In 2009, the EuroQol Group released a new version (EQ-5D-5L) of the instrument that included five levels of severity in each dimension, as opposed to three in the original version [1]. A value set for the new instrument therefore has to provide societal values for all 3,125 health states defined by five levels of severity in each of five dimensions.

Previous valuation studies had predominantly used time trade-off (TTO) to obtain social preferences from which value sets for EQ-5D health states could be modeled [2–5]. However, increasing the number of health states from 243 to 3,125 made it considerably more costly and complicated to conduct valuation studies based on an interview method such as TTO. Conventional TTO also has problems with health states worse than the state ‘dead’ [6]. These issues led the EuroQol Group to explore new approaches to obtain social values for health states, notably discrete choice (DC) methodology.

In a typical DC task, respondents compare two different options (paired comparison) and indicate which one they prefer. Discrete choice experiments (DCE) have been used extensively in areas such as marketing and transport but less so in health economics. The use of DCE for health-state valuation is a relatively recent development. Potential advantages include the relative ease of comprehension and administration of ordinal tasks and their greater reliability. DC models may also avoid some of the biases associated with traditional valuation methods [7]. Stolk et al. [8] demonstrated that DC modeling with the classic EQ-5D (three-level) instrument produces values that are congruent with values obtained by other valuation techniques, TTO in particular. That result confirmed previously published findings [9–12].

A question that arises about the use of DC for health-state valuation concerns how to anchor the values produced by the choice model onto the dead (0) to full health (1) scale that is required to compute quality-adjusted life years. One strategy is to use DC data in combination with TTO data: values are derived from the DC data and then rescaled using values from TTO. The need to collect TTO data alongside a DC study, however, might make the valuation study more complex than necessary. Alternatively, the DC task could be designed in such a way that a value for ‘dead’ can be extracted from the DC responses and then used to anchor the values. One way to do this is by explicitly comparing the health state ‘dead’ with the EQ-5D-5L health states being judged in the DC task. An objection on theoretical grounds is that responses obtained from choices comparing health states with dead may violate the random utility theory underlying the DC model. This happens when a subset of respondents consider all health states to be better than dead, for example because of their religious beliefs. The size and effect of this bias are as yet unknown; in practice, the bias may be small. Indeed, when this approach was adopted for the valuation of EQ-5D-3L health states [8], the results were promising. This paper examines whether the same holds for EQ-5D-5L valuation.

The primary objective of the study reported here was to examine the results of two different approaches to rescale DC models incorporating ‘dead’ into the utility scale as an anchor point and to compare the results with those obtained anchoring on lead-time TTO. A secondary objective was to evaluate the effect of excluding DC responses elicited from those who did not consider any health state to be worse than the health state dead.

Methods

This pilot study used both a DC and a lead-time trade-off (lead-time TTO) approach to produce values for the set of 3,125 (5⁵) health states defined by the EQ-5D-5L instrument. As detailed descriptions of each approach in the context of health-state valuation can be found elsewhere [8, 13], only a brief summary is given here. The study design followed recommendations from the EuroQol Group Valuation Task Force and was part of a multi-country initiative to explore methodological uncertainties about the valuation protocol for a new EQ-5D-5L value set.

Valuation of EQ-5D-5L health states

DC method

In the DC method, the respondents were asked to state their preference between two health states, A and B. The resulting choice data were subsequently analyzed to produce values on a latent scale. The health-state profiles specified neither a duration nor what happens after the state. The DC design was generated using a Bayesian efficient approach [14] and consisted of 50 pairs of health states allocated to five blocks. These numbers were chosen to provide sufficient power to estimate health-state values from the proportions of choices between the pairs of states. To allow anchoring of the values on the dead to full health scale, we extended the DC task by asking whether state A was worse than dead (WTD) and whether state B was WTD.
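To make the extended task concrete, the sketch below shows one way, under our own assumptions, to turn a single response (the preferred state plus the two WTD answers) into a full ranking of A, B and dead, which is the kind of input a rank-ordered logit needs. The function name `implied_ranking` is hypothetical and this is not the EQ-VT's internal coding. A response in which the preferred state is rated WTD but the rejected one is not has no consistent order and is returned as `None`.

```python
from typing import Optional, Tuple

def implied_ranking(choice: str, a_wtd: bool, b_wtd: bool) -> Optional[Tuple[str, ...]]:
    """Ranking (best first) of A, B and 'dead' implied by one extended DC task.

    choice: 'A' or 'B', the preferred health state.
    a_wtd, b_wtd: whether the respondent said A or B is worse than dead.
    Returns None when the three answers cannot be put in a consistent order.
    """
    if choice not in ("A", "B"):
        raise ValueError("choice must be 'A' or 'B'")
    best, worst = ("A", "B") if choice == "A" else ("B", "A")
    best_wtd = a_wtd if best == "A" else b_wtd
    worst_wtd = b_wtd if best == "A" else a_wtd
    if best_wtd and not worst_wtd:
        # Preferred state below dead but rejected state above dead: contradictory.
        return None
    if not best_wtd and not worst_wtd:
        return (best, worst, "dead")       # both better than dead
    if not best_wtd and worst_wtd:
        return (best, "dead", worst)       # dead lies between the two states
    return ("dead", best, worst)           # both worse than dead

print(implied_ranking("A", False, True))   # ('A', 'dead', 'B')
```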

Lead-time TTO

The lead-time TTO method is an extension of the traditional TTO [13]. In a classic TTO, participants complete one task for health states considered better than dead and another task for those considered WTD. Lead-time TTO consists of a single task: to choose between Life A (T years in full health) and Life B [10 years in full health (lead time) plus 5 years in a target health state (disease time)]. All respondents start with T = 15 years in full health (11111) in Life A; depending on whether they choose A or B, the value of T is raised or lowered until the participant considers A and B to be equivalent. The lead-time TTO design was constructed with a Fedorov algorithm that allowed model parameters to be estimated without bias and with minimal variance [15]. The final lead-time TTO design contained 100 states in ten blocks.
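The iterative search for the indifference point can be illustrated with a minimal Python sketch. It assumes a hypothetical respondent who values Life B as 10 + 5u healthy-year equivalents (u being their value for the target state, with no discounting) and uses plain bisection on T between 0 and 15 years; the actual step sequence used by the EQ-VT is not reproduced here and will differ. The conversion to a value uses the formula (T − T_lead)/(T_total − T_lead) from the Statistical analysis section. Function names are hypothetical.

```python
from typing import Callable

def lead_time_tto_search(prefers_life_a: Callable[[float], bool],
                         t_lead: float = 10.0, t_disease: float = 5.0,
                         tol: float = 0.01) -> float:
    """Return the indifference point T (years in full health in Life A).

    Illustrative bisection: starts at T = t_lead + t_disease (15 years, as in
    the text) and narrows the bracket [lo, hi] around the switch point.
    """
    lo, hi = 0.0, t_lead + t_disease
    t = hi
    while hi - lo > tol:
        if prefers_life_a(t):
            hi = t   # Life A still preferred: offer fewer years in full health
        else:
            lo = t   # Life B preferred: offer more years in full health
        t = (lo + hi) / 2
    return t

def lt_tto_value(t: float, t_lead: float = 10.0, t_total: float = 15.0) -> float:
    """(T - T_lead) / (T_total - T_lead): maps T in [0, 15] onto [-2, 1]."""
    return (t - t_lead) / (t_total - t_lead)

# Hypothetical respondent whose true value for the target state is 0.5,
# so Life B is worth 10 + 5 * 0.5 = 12.5 healthy-year equivalents.
true_value = 0.5
t_star = lead_time_tto_search(lambda t: t > 10.0 + 5.0 * true_value)
print(round(lt_tto_value(t_star), 2))  # prints 0.5
```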

Data collection

Four hundred persons, who were representative of the Spanish population in terms of age, gender, and education, took part in this study. An online survey administered via the EuroQol Valuation Technology (EQ-VT) software was used to collect DC and lead-time TTO responses. The final survey included the EQ-5D-5L questionnaire, ten DC tasks, and five lead-time TTO tasks as well as demographic questions. Participants were also asked how difficult they found the DC and lead-time TTO tasks and how well they had understood them. The EQ-VT randomly assigned each participant to a DC block and a lead-time TTO block. In both types of block, the tasks were presented in random order. Given the number of participants, the study yielded an average of 80 observations for each DC pair (400 participants × 10 pairs/50 pairs) and 20 observations for each lead-time TTO state (400 participants × 5 states/100 states).
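The averages quoted above follow from simple arithmetic; the snippet below reproduces them, assuming every participant completed all assigned tasks (the actual number of observations per pair or state varies with the random block assignment).

```python
# Average observations per DC pair and per lead-time TTO state,
# assuming each participant completes every task in the assigned blocks.
n_participants = 400
dc_tasks_per_person, n_dc_pairs, n_dc_blocks = 10, 50, 5
tto_tasks_per_person, n_tto_states = 5, 100

dc_pairs_per_block = n_dc_pairs // n_dc_blocks   # 10, matching the ten DC tasks
obs_per_dc_pair = n_participants * dc_tasks_per_person / n_dc_pairs
obs_per_tto_state = n_participants * tto_tasks_per_person / n_tto_states
print(obs_per_dc_pair, obs_per_tto_state)  # 80.0 20.0
```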

A survey company administered the study in Barcelona (June 2011). The researchers JMRG, ME, MH, and JC supervised the data collection with assistance from the EuroQol Group. Participants were recruited using telephone directories for the metropolitan area of Barcelona, personal contacts, a database of panelists, or ‘snowballing’ from contacts of participants included in this study.

Eight groups per day, each with an average of ten respondents, were recruited over 6 days, yielding the target of 400 participants. Each participant was assigned a computer and given an ID number and a password. Two computer rooms were available for each session. Interviews were conducted by two trained interviewers and four members of the Spanish Valuation Team (JMRG, ME, MH, and JC).

Statistical analysis

The sample, as well as the DC and lead-time TTO responses, was described with descriptive statistics. Four statistical models were used to estimate EQ-5D-5L value sets: (1) a conditional logistic model, which produced health-state values based only on choices between health states and thus ignored responses to the dead questions (N = 397; henceforth DCTTO); (2) a rank-ordered logistic model applied to the full DC dataset, including responses to the dead questions (N = 397; henceforth DCdead); (3) a rank-ordered logistic model using data only from participants who chose at least one state as worse than dead (N = 195; henceforth DCWTD); and (4) a linear regression model using the lead-time TTO responses (N = 373; henceforth lead-time TTO). The three models estimated with DC responses had to be rescaled so that 0 corresponds to dead and 1, the upper bound of the scale, to full health. For DCdead and DCWTD, this was achieved using the additional ‘dead’ questions in the DC tasks. For the DCTTO model, the value of the worst health state (55555) predicted by the lead-time TTO model was used as an anchor point to rescale the arbitrary scale of the conditional logistic model. Details of each model are given below.

DCTTO model

In the case of DC, the values are not directly observable and have to be inferred from the responses to the choice tasks. We assume that participants choose the health state that gives them the higher utility, which can be modeled with a conditional logistic model. The dependent variable \( Y_{i} \) represents the choice of participant i between A and B. The model decomposes the utility into two parts: a systematic component \( V_{iA} \) and a random error \( \varepsilon_{i} \). If the errors are assumed to be independently and identically distributed with a type I extreme value distribution, a conditional logistic model emerges [8, 16, 17]. Let us assume that the systematic component V can be explained with an additive model:

$$ V_{\text{iA}} = \mathop \sum \limits_{j = 1}^{J} X_{\text{iAj}} \cdot \beta_{j} $$
(1)

where \( X_{iAj} \) are 20 dummy variables {0, 1} for participant i, representing the severity levels for each dimension of EQ-5D-5L for state A, and \( \beta_{j} \) is the coefficient for each independent variable j.
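As an illustration of Equation (1), the sketch below encodes a five-digit EQ-5D-5L profile into the 20 main-effect dummies (level 1 as reference) and evaluates the conditional logit probability of choosing A over B, P = exp(V_A)/(exp(V_A) + exp(V_B)). The labels `ua` and `pd`, the function names, and the coefficient values are hypothetical; they are not the estimates reported in Table 5.

```python
import math
from typing import Dict

# Dimension order in a profile string such as "21345". "mo", "sc" and "ad"
# follow Equation (5); "ua" (usual activities) and "pd" (pain/discomfort)
# are our own labels.
DIMENSIONS = ("mo", "sc", "ua", "pd", "ad")

def dummies(profile: str) -> Dict[str, int]:
    """Return the 20 main-effect dummies X_j for a profile; level 1 is the reference."""
    if len(profile) != 5 or any(c not in "12345" for c in profile):
        raise ValueError(f"not an EQ-5D-5L profile: {profile!r}")
    x: Dict[str, int] = {}
    for dim, level in zip(DIMENSIONS, profile):
        for lv in range(2, 6):
            x[f"{dim}{lv}"] = int(int(level) == lv)
    return x

def systematic_value(profile: str, beta: Dict[str, float]) -> float:
    """V = sum_j X_j * beta_j, as in Equation (1)."""
    return sum(x * beta[name] for name, x in dummies(profile).items())

def prob_choose_a(a: str, b: str, beta: Dict[str, float]) -> float:
    """Conditional logit probability that A is chosen over B."""
    diff = systematic_value(a, beta) - systematic_value(b, beta)
    return 1.0 / (1.0 + math.exp(-diff))

# Hypothetical coefficients: each level step lowers utility by 0.5 in every dimension.
beta = {f"{d}{lv}": -0.5 * (lv - 1) for d in DIMENSIONS for lv in range(2, 6)}
print(round(prob_choose_a("55534", "33355", beta), 3))  # 0.182
```

With these equal-step coefficients the model favors 33355 over 55534, the opposite of the observed proportion of 0.852 reported in the Results; coefficients that differ across dimensions are what allow a fitted main-effects model to reproduce such choices.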

Accordingly, once the coefficients of the model have been estimated, values can be predicted for health states that were not observed in the design by using the linear part of the DCTTO model. The values obtained from the linear part of the model shown above are on an arbitrary scale. To rescale them, the DCTTO value of the 55555 health state was anchored to the most negative value estimated in the lead-time TTO model (that of 55555), so that both models produce the same index value for 55555. To obtain the full set of utility decrements, every coefficient of the DC model is divided by the scalar \( (55555_{\text{DCTTO}} - 1)/(55555_{\text{lead-time TTO}} - 1) \). The outcome of this transformation for each coefficient yields the utility decrements for the DCTTO model.
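The anchoring arithmetic can be checked with a short sketch. It assumes, as the text implies, that a DC index value is 1 minus the sum of the applicable (positive) decrements, and it uses the 55555 values reported in the Results. The split of the 55555 decrement across dimensions is invented purely to show that the rescaled model reproduces the lead-time TTO value for 55555; the function names are hypothetical.

```python
from typing import Dict

def dctto_scale_factor(v55555_dc: float, v55555_tto: float) -> float:
    """abs[(V_DC(55555) - 1) / (V_TTO(55555) - 1)]: ratio of the two total decrements."""
    return abs((v55555_dc - 1.0) / (v55555_tto - 1.0))

def rescale(decrements: Dict[str, float], factor: float) -> Dict[str, float]:
    """Divide every DC coefficient by the scale factor."""
    return {name: b / factor for name, b in decrements.items()}

factor = dctto_scale_factor(-5.491, -0.535)   # values reported in the Results
print(round(factor, 4))                       # 4.2287 (reported as 4.228)

# Hypothetical split of the DC decrement for 55555 (1 - (-5.491) = 6.491)
# over the five level-5 dummies; only the total matters for this check.
raw = {"mo5": 1.5, "sc5": 1.2, "ua5": 1.3, "pd5": 1.4, "ad5": 1.091}
scaled = rescale(raw, factor)
value_55555 = 1.0 - sum(scaled.values())      # equals the lead-time TTO value
```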

DCdead model

A rank-ordered logistic analysis was performed for the DCdead model [8]. As in the conditional logistic model, the utility is decomposed into a systematic component and an error; the systematic component \( V_{iA} \) now also includes a dummy \( X_{i\text{dead}} \) for the state dead and can be written as follows:

$$ V_{\text{iA}} = \mathop \sum \limits_{j = 1}^{20} X_{\text{iAj}} \cdot\beta_{j} + X_{{i {\text{dead}}}} \cdot\beta_{\text{dead}} $$
(2)

As in the DCTTO model, the values obtained from the linear part of this model are on an arbitrary scale. For the DCdead model, the anchor point is the health state dead. Since the value for dead has to be 0, each coefficient is divided by the absolute value of \( \beta_{\text{dead}} \), which ensures that \( \beta_{\text{dead}}^{\prime} = -1 \). The final function to estimate index values is given by:

$$ V_{\text{iA}} = 1 - \mathop \sum \limits_{j = 1}^{20} X_{\text{iAj}} \cdot\beta^{'}_{j} + X_{{i {\text{dead}}}} \cdot\beta^{'}_{\text{dead}} $$
(3)

where \( \beta_{j}^{\prime} = \beta_{j} / \left| \beta_{\text{dead}} \right| \).
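A minimal sketch of the dead-anchored rescaling follows. It assumes the raw rank-ordered logit coefficients are on the latent utility scale (negative for worse levels and for dead), so that the index value is 1 + V/|β_dead|; under this sign convention, which is our assumption, full health maps to 1 and dead to 0. β_dead = −6.494 is the DCdead estimate reported in the Results; the other coefficients and the function names are hypothetical.

```python
from typing import Dict

def rescale_on_dead(beta: Dict[str, float], beta_dead: float) -> Dict[str, float]:
    """beta'_j = beta_j / |beta_dead|, which makes beta'_dead = -1."""
    if beta_dead >= 0:
        raise ValueError("expected a negative latent-scale coefficient for 'dead'")
    return {name: b / abs(beta_dead) for name, b in beta.items()}

def index_value(x: Dict[str, int], beta_rescaled: Dict[str, float],
                is_dead: bool = False) -> float:
    """1 + sum_j X_j * beta'_j + X_dead * beta'_dead, with beta'_dead = -1.

    Assumes latent-scale coefficients (negative = worse), so full health
    (all dummies 0) maps to 1 and dead maps to 0.
    """
    v = sum(x.get(name, 0) * b for name, b in beta_rescaled.items())
    return 1.0 + v + (-1.0 if is_dead else 0.0)

beta_dead = -6.494                     # DCdead estimate reported in the Results
beta = {"mo5": -1.2, "ad5": -0.9}      # hypothetical latent-scale coefficients
scaled = rescale_on_dead(beta, beta_dead)
print(index_value({}, scaled), index_value({}, scaled, is_dead=True))  # 1.0 0.0
```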

DCWTD model

The DCWTD model was estimated as a rank-order logistic model similar to the DCdead model. For this case, the data were restricted to responses from participants who chose at least one state worse than dead. This model was used to evaluate whether including participants who did not choose any state worse than dead would bias the coefficient estimates.
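The rank-ordered logit treats a ranking of A, B and dead as a sequence of logit choices: first the best item from all three, then the better of the remaining two. The sketch below computes that log-probability for given systematic values V (as in Equation (2)) and restricts a dataset to respondents with at least one WTD answer, as for DCWTD. The record layout (`id`, `a_wtd`, `b_wtd`) and the function names are hypothetical; in Stata, a model of this kind can be fitted with `rologit`.

```python
import math
from typing import Dict, Iterable, List, Sequence

def rank_ordered_logit_loglik(ranking: Sequence[str], v: Dict[str, float]) -> float:
    """Log-probability of a complete ranking (best first) under a rank-ordered
    ("exploded") logit: at each position, the top remaining item is chosen with
    logit probability from the items not yet ranked."""
    ll = 0.0
    remaining = list(ranking)
    while len(remaining) > 1:
        m = max(v[alt] for alt in remaining)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(v[alt] - m) for alt in remaining))
        ll += v[remaining[0]] - log_denom
        remaining.pop(0)
    return ll

def wtd_subset(responses: Iterable[dict]) -> List[dict]:
    """Keep all responses of respondents who rated at least one state as WTD."""
    responses = list(responses)
    wtd_ids = {r["id"] for r in responses if r["a_wtd"] or r["b_wtd"]}
    return [r for r in responses if r["id"] in wtd_ids]
```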

Lead-time TTO model

For lead-time TTO responses, a linear model was estimated. The specification of the model in its general form is:

$$ Y_{i} = \mathop \sum \limits_{j = 1}^{n} x_{ij} \cdot \beta_{j} + \varepsilon_{i} $$
(4)

where \( Y_{i} \) represents the observed lead-time TTO value for participant i. The indifference points T from the survey were transformed into a continuous variable on a scale from −2 to 1 using the formula \( (T - T_{\text{lead}})/(T_{\text{total}} - T_{\text{lead}}) \). In our design, \( T_{\text{lead}} = 10 \) is the lead time (the years in full health at the beginning of Life B), and \( T_{\text{total}} = 15 \) is the sum of \( T_{\text{lead}} \) and the disease time (5 years). The independent variables \( x_{ij} \) are 20 dummies {0, 1} for each participant i, representing the severity levels for each dimension of EQ-5D-5L; \( \beta_{j} \) represents the coefficient for each independent variable j, and \( \varepsilon_{i} \) the error for participant i. Different specifications used in previously published studies were explored in order to find the best-fitting model [2–5]. However, none of them improved goodness of fit as measured by the log-likelihood, nor did they correct any inconsistencies in the models’ coefficients. Therefore, the lead-time TTO model presented in this study was estimated by ordinary least squares. Finally, a function to estimate values for each health state was created using the regression model specified in the following equation:

$$ Y_{i} = 1 - (\beta_{0} + \beta_{1} \cdot {\text{mo}}2_{i} + \beta_{2} \cdot {\text{mo}}3_{i} + \beta_{3} \cdot {\text{mo}}4_{i} + \beta_{4} \cdot {\text{mo}}5_{i} + \cdots + \beta_{20} \cdot {\text{ad}}5_{i} + \varepsilon_{i} ) $$
(5)

with mo2, mo3, mo4, mo5, sc2, sc3…, ad4, and ad5 indicating the corresponding dummy for the EQ-5D-5L severity level.
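Equation (5) is an ordinary least squares regression of 1 − Y on an intercept and the 20 dummies. The pure-Python sketch below solves the normal equations directly so that it has no dependencies; it is an illustration only (the study used Stata, where `regress` would do this, and a numerical library would be preferable in practice). The `ua`/`pd` labels, the function names, and the synthetic coefficients are hypothetical.

```python
import random
from typing import List, Sequence

DIMS = ("mo", "sc", "ua", "pd", "ad")  # "ua"/"pd" are our labels; mo/sc/ad follow Eq. (5)
NAMES = ["beta0"] + [f"{d}{lv}" for d in DIMS for lv in range(2, 6)]

def design_row(profile: str) -> List[float]:
    """Intercept plus 20 dummies (levels 2-5 of each dimension; level 1 = reference)."""
    row = [1.0]
    for level in profile:
        row += [1.0 if int(level) == lv else 0.0 for lv in range(2, 6)]
    return row

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations (some level never observed?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_disutility(profiles: Sequence[str], values: Sequence[float]) -> List[float]:
    """OLS of 1 - Y on an intercept and the 20 dummies, via the normal equations."""
    x = [design_row(p) for p in profiles]
    y = [1.0 - v for v in values]
    k = len(NAMES)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic, noise-free check with made-up coefficients (not the study's data).
rng = random.Random(0)
true_beta = [0.3] + [0.05 * lv for _ in DIMS for lv in range(2, 6)]
profiles = ["".join(rng.choice("12345") for _ in range(5)) for _ in range(300)]
values = [1.0 - sum(c * b for c, b in zip(design_row(p), true_beta)) for p in profiles]
est = ols_disutility(profiles, values)
print(max(abs(e - t) for e, t in zip(est, true_beta)) < 1e-8)
```

The first estimate is the intercept β0; the predicted value of a state is then 1 minus the sum of β0 and its applicable decrements, as in Equation (5).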

To compare the four models, we used descriptive statistics and quantile–quantile (Q-Q) plots of the value sets obtained from the different models. A Q-Q plot plots the quantiles of one distribution against the corresponding quantiles of another, and the pattern of points it displays is used to compare the two distributions of values. In addition, the value sets produced by the models were compared using the mean square difference (MSD) and the concordance correlation coefficient (CCC) [18]. Each model was used to estimate values for all 3,125 health states. For each pairwise comparison (model 1 vs. model 2), the MSD is calculated as follows:

$$ \text{MSD}_{\text{model 1 vs. model 2}} = \frac{\sum_{i = 1}^{3{,}125} \left( \text{indexvalue}_{\text{model 1}, i} - \text{indexvalue}_{\text{model 2}, i} \right)^{2}}{3{,}125} $$
(6)
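The comparison statistics can be sketched as follows. MSD follows Equation (6); the CCC uses Lin's definition with population (1/n) moments, which is our assumption about the exact variant used; and a Q-Q plot for two equally sized value sets reduces to pairing the sorted values by rank. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def msd(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square difference between two value sets, Equation (6)."""
    if len(a) != len(b) or not a:
        raise ValueError("value sets must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ccc(a: Sequence[float], b: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with 1/n (population) moments:
    2*cov(a, b) / (var(a) + var(b) + (mean(a) - mean(b))**2)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return 2 * cov / (va + vb + (ma - mb) ** 2)

def qq_points(a: Sequence[float], b: Sequence[float]) -> List[Tuple[float, float]]:
    """Points of a Q-Q plot for two equally sized value sets: sorted values paired by rank."""
    if len(a) != len(b):
        raise ValueError("this simple version needs equal sample sizes")
    return list(zip(sorted(a), sorted(b)))

# Toy example: a constant shift keeps Pearson's r at 1 but lowers the CCC.
model_1 = [1.0, 2.0, 3.0]
model_2 = [2.0, 3.0, 4.0]
print(msd(model_1, model_2), round(ccc(model_1, model_2), 4))  # 1.0 0.5714
```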

All statistical analyses were performed on STATA 11 MP (StataCorp LP, College Station, TX).

Results

Sample characteristics

The study cohort comprised 400 persons with a mean age (standard deviation, SD) of 44.1 (16.9) years, and 59.7 % (239) were male (Table 1). More than half were employed or freelance, and 15 % were retired. Fewer than half (43.75 %; 175) were in full health (11111). Few reported severe or extreme problems in any dimension of the EQ-5D-5L (three was the maximum number of respondents reporting extreme problems in the ‘usual activities’ dimension; see Table 2).

Table 1 Descriptive statistics of study sample (N = 400)
Table 2 Distribution of EQ-5D-5L responses across participants

Descriptive statistics

Overall, 61.7 % of DC choices were for state A and 38.3 % for state B. Reflecting differences in the impact of dimensions and levels on health status, not all choices followed the order of the misery index (the sum of the levels across dimensions). For example, the observed probability of choosing state 55534 (misery index 22) over state 33355 (misery index 19) was 0.852. Only 2.4 % of respondents thought that state 55534 was WTD, whereas 14.81 % thought that 33355 was WTD (Table 3). Some inconsistencies were observed in the estimated lead-time TTO valuations. For example, health state 55253 had a lower mean value (−0.4) than health state 55255 (−0.147) (P = 0.0004), even though 55253 logically dominates 55255: the two states are identical except in the anxiety/depression dimension, where 55253 has the less severe level (Table 4). A total of 195 (48.75 %) participants in the DC tasks and 216 (54 %) in the lead-time TTO tasks rated at least one state as WTD.
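The two notions used above, the misery index and logical dominance between profiles, can be written down in a few lines; the sketch reproduces the two examples from the text. Function names are our own.

```python
def misery_index(profile: str) -> int:
    """Sum of the five level digits of an EQ-5D-5L profile (5 for 11111, 25 for 55555)."""
    return sum(int(c) for c in profile)

def dominates(a: str, b: str) -> bool:
    """True if profile a is no worse than b in every dimension and better in at least one."""
    pairs = [(int(x), int(y)) for x, y in zip(a, b)]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

def logical_violation(a: str, b: str, mean_value: dict) -> bool:
    """Flag a pair where a dominates b but received a lower mean value."""
    return dominates(a, b) and mean_value[a] < mean_value[b]

print(misery_index("55534"), misery_index("33355"))   # 22 19
print(logical_violation("55253", "55255", {"55253": -0.4, "55255": -0.147}))  # True
```

Note that neither 55534 nor 33355 dominates the other, so choosing 55534 despite its higher misery index is not a logical violation, whereas the 55253/55255 result is.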

Table 3 Discrete choice responses for the 50 paired scenarios included in the valuation exercise
Table 4 Mean lead-time trade-off values and percentage of values WTD for the health states included in the valuation exercise

Models

For the estimation of the three DC models, we omitted two respondents from the analysis because their DC choices were always A or always B; the 328 responses without a logical order among state A, state B, and dead were also omitted. For the lead-time TTO model, it was necessary to clean the dataset for inconsistencies. In this case 24 respondents with the same value for all TTO tasks were excluded from the analysis, as were two respondents for whom data were missing due to technical problems.
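The constant-response exclusion rules above can be expressed as simple filters. The sketch assumes a hypothetical layout in which each respondent ID maps to that respondent's sequence of DC choices or lead-time TTO values; it does not cover the exclusions due to technical problems, and the function names are our own.

```python
from typing import Dict, List, Sequence

def constant_dc_responders(choices: Dict[str, Sequence[str]]) -> List[str]:
    """IDs of respondents whose DC choices were always 'A' or always 'B'."""
    return sorted(rid for rid, cs in choices.items() if len(cs) > 1 and len(set(cs)) == 1)

def flat_tto_responders(values: Dict[str, Sequence[float]]) -> List[str]:
    """IDs of respondents who gave the same lead-time TTO value in every task."""
    return sorted(rid for rid, vs in values.items() if len(vs) > 1 and len(set(vs)) == 1)

choices = {"r1": "AAAAAAAAAA", "r2": "ABABBAABAB", "r3": "BBBBBBBBBB"}
tto = {"r1": [0.2, 0.2, 0.2, 0.2, 0.2], "r2": [0.6, -0.2, 1.0, 0.4, 0.0]}
print(constant_dc_responders(choices), flat_tto_responders(tto))  # ['r1', 'r3'] ['r1']
```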

Several model specifications were explored, but only main-effects models are presented here; the others did not perform better in terms of having fewer inconsistencies or a higher likelihood. To allow comparison among the models’ coefficients, we present the rescaled coefficients for the three final DC models. The DCWTD model has the highest log-likelihood (−1,401.549), and DCTTO performs better than DCdead (−1,791.37 vs. −2,700.25, respectively) (Table 5).

Table 5 Parameter estimates for the models^a based on data derived by discrete choice and lead-time trade-off values

Regarding the rescaling method for DC models, the value of 55555 estimated with the lead-time TTO model was −0.535. This value was used to anchor the DCTTO model, which had previously given state 55555 a value of −5.491. The ratio used to rescale the coefficients was abs[(−5.491 − 1)/(−0.535 − 1)] = 4.228, so the final rescaled coefficients for DCTTO are \( \beta_{j}^{\prime} = \beta_{j}/4.228 \). In the DCdead models, the dead state has a value of 0. The coefficient for the dead state, \( \beta_{\text{dead}} \), in the DCdead model is −6.494; since the rescaled coefficient must be −1 (meaning that the dead state has a value of 0), the rescaled coefficients are \( \beta_{j}^{\prime} = \beta_{j}/6.494 \). Likewise, \( \beta_{\text{dead}} \) in the DCWTD model is −5.346, so the rescaled coefficients are \( \beta_{j}^{\prime} = \beta_{j}/5.346 \).

In general, values in the lead-time TTO model were lower than in any of the rescaled DC models because of the estimated intercept of 0.452. However, several estimated coefficients are inconsistent. In all of the estimated models, for example, the coefficient for moderate problems (level 3) in the pain/discomfort dimension is positive, although not statistically significant. Other inconsistencies are statistically significant: the lower coefficients for slight (level 2) compared to moderate problems (level 3) in the self-care dimension for the three DC models and in the mobility and usual-activities dimensions for DC. The value of the 55555 state in the DCdead model (0.100) was higher than the corresponding value in the DCWTD model (−0.004); however, in both dead-anchored models (DCdead and DCWTD), these values were much higher than that in the lead-time TTO model (−0.535).

The two dead-anchored DC models are in concordance (DCdead vs. DCWTD: CCC = 0.848), as are DCTTO and lead-time TTO (CCC = 0.725). However, the concordance among the remaining pairs of models is lower: (1) DCWTD vs. DCTTO: CCC = 0.677; (2) DCdead vs. DCTTO: CCC = 0.478; (3) DCdead vs. lead-time TTO: CCC = 0.239; (4) DCWTD vs. lead-time TTO: CCC = 0.349. Compared to the DC models, lead-time TTO produced lower values for practically every health state (Fig. 1c, e, f). The DCdead and DCWTD models estimated very similar values (Fig. 1a).

Fig. 1

Quantile-quantile plots for comparison of values obtained from DCdead, DCWTD, DCTTO, and lead-time trade-off (TTO) models. For a full description of each model, see section "Statistical analysis"

The MSD between the DCdead and DCWTD values across the 3,125 states is 0.009. However, the MSDs relative to the lead-time TTO model are 0.217, 0.142, and 0.045 for the DCdead, DCWTD, and DCTTO models, respectively. The MSDs relative to DCTTO are 0.091 and 0.044 for DCdead and DCWTD, respectively.

Discussion and conclusions

In the study reported here, we compared two approaches for rescaling DC values onto the dead (0) to full health (1) scale to obtain an EQ-5D-5L value set that can be used in economic evaluation. The two approaches were: (1) DC incorporating an additional judgmental task in which the health state ‘dead’ is assessed against other health states; and (2) a DC model anchored on lead-time TTO values.

None of the estimated models was completely consistent in terms of regression coefficients. All models had some positive coefficients. Also, to be consistent, a model must satisfy the condition that, within each dimension, the absolute values of the coefficients increase with the level of severity. Each of the models satisfied this condition for some dimensions, but not for all. The DCTTO model did not satisfy the condition more often than the DCdead models, and its rescaled results produced higher utility decrements than both rescaled dead-anchored models. The rescaled DCWTD model differs less from the rescaled DCTTO than the rescaled DCdead model does. However, we have to take into account that the intercept of the lead-time TTO model was extremely high, which leads to health-state values that lack face validity. For example, a person with slight mobility problems has a value below 0.55, which is implausibly low compared with previous EQ-5D value sets [2–5].

The reason for the inconsistencies in the logistic regression results is not clear. On the one hand, these inconsistencies could be explained by the fact that the DC design included only 50 pairs of health states, which may be too few to yield sufficient information (and thus power) to estimate the logistic models (some coefficients were not statistically significant). On the other hand, more power (and thus a larger sample size) may be needed for each pair of health states when the number of pairs is fixed. When DC modeling was applied to the Spanish arm of the multi-country study, the inconsistencies in the DC model disappeared [19]; however, that study had both more pairs (200) and more observations per pair. The questions about dead, which are necessary for the DCdead models, were asked only in the Spanish pilot study, so the analysis of DCdead models could not be extended to all countries for comparison. In that light, it would make sense to increase both the number of pairs in the DC design that involve dead and the power per pair, so that future studies using a DC model incorporating dead yield consistent results across the whole multi-country dataset.

On comparing the results of the modeling exercise for all participants with those for participants who rated at least one state as WTD, we found that the DCdead and DCWTD models produced similar results, the only difference being the position of ‘dead’. In particular, we found higher utility decrements, and thus lower health-state values for EQ-5D-5L states, when the participants who did not rate any state as WTD were removed from the analysis. However, this may not amount to bias and may simply reflect the preferences of the population. Whatever the reason, the impact on actual results was not large. It should be kept in mind that this was not a direct comparison, as the two models were estimated on different sets of participants. From a theoretical point of view, under random utility theory (RUT), estimation may fail when many participants do not choose any WTD option. Nevertheless, the DCdead model could be estimated and did not perform much worse than the DCWTD model in terms of likelihood.

There is some concern about the feasibility of some elements of the DC and lead-time TTO tasks as conducted in this survey. In general, the participants understood the hypothetical nature of the health states and lives they were presented with. They knew they had to choose the health state or life that they preferred rather than the one with which they identified most. However, some problems arose in the course of both exercises, especially during the lead-time TTO task. Many individuals were confused when making choices and did not realize that the health conditions changed when they answered that ‘both lives are almost equal’. Although this consequence had been explained, the administrators had to complete the first lead-time TTO exercise together with the participants so that they could do the rest of the exercises as required. The general impression was that many respondents did not answer the TTO part of the exercises appropriately. Some individuals reported that they could not reach a point of indifference between the two lives because they always preferred Life B. This indecisiveness could explain the illogical results obtained with the lead-time TTO model. In general, the respondents needed less assistance with the DC part of the survey, but many commented on the difficulty of choosing between health states. The difficulties they encountered in the survey tasks emphasize the important role of the face-to-face interviews that are also part of the study design. In this study, both elicitation techniques required respondents to compare health states with ‘dead’: directly in each DC task and indirectly in each lead-time TTO task.
From the results we can deduce that a state was more frequently considered WTD in indirect (lead-time TTO) than in direct questions (DC + dead), possibly because in lead-time TTO the distinction between negative and positive values was not made explicit. This could explain the lower values observed with the lead-time TTO method and, hence, with the DCTTO model anchored on it.

Previous studies have investigated the incorporation of the health state dead in the DC task [8, 16, 17]. However, none of these used the EQ-5D-5L to allow a direct comparison. Stolk et al. [8] used the classic three-level version of EQ-5D. Our results do not confirm those obtained by Stolk et al., probably because their comparison was made with classic instead of lead-time TTO. Also, the five-level version makes the DC task more complicated for the respondents, and this complexity might have led some participants to make random choices when they could not decide between health states A and B.

The two dead-anchored DC models (DCdead and DCWTD) produce highly concordant results with only slight differences, suggesting little bias from including respondents who rated no state as WTD. Incorporating the health state dead into the general DC technique produces results in concordance with the DCTTO. DC modeling warrants further research to optimize the design if it is to be used to estimate EQ-5D-5L value sets. The lead-time TTO produces very high utility decrements, and the consistency of its responses is lower than that of the DC models.