Summary

The EQ-5D-5L DCE data offer a well-suited source for studying variation in health preferences across Asia. By comparing the modeling results and the relative importance of dimensions and levels, we found that health preferences differ across Asian regions.

Introduction

EQ-5D is a generic preference-based health-related quality of life (HRQoL) questionnaire that is widely used around the world [1, 2]. When value sets are available, EQ-5D data can be converted to health utility [2]. Many countries have established their own EQ-5D value sets on the basis that health preferences differ among countries/populations [3, 4]. Indeed, studies have found differences between value sets [3, 5, 6]. In developing value sets for the three-level version of EQ-5D (EQ-5D-3L), published studies differed in terms of design, data collection protocol and choice of model. By comparing these value sets, Norman et al. concluded that these variations in methods could obscure true differences in values [6]. For the latest five-level version of EQ-5D (EQ-5D-5L), the EuroQol Group developed a standardized protocol for data collection in valuation studies, named the EuroQol valuation technology protocol (EQ-VT) [7,8,9].

With application of the EQ-VT, EQ-5D-5L valuation data can be exploited to study whether important differences in health preferences across populations exist, as the method variations observed in the 3L studies are minimized. The EQ-VT data collection protocol uses both time trade-off (TTO) and discrete choice experiment (DCE) as preference elicitation methods [7]. Currently, all comparison studies of EQ-5D value sets only used the TTO data from the valuation studies [5, 10, 11]. This is partially because the TTO data is considered as the primary preference source in the EQ-VT protocol and some studies estimated their value set using TTO data only, for example, China and South Korea [12, 13]. So far, the DCE data collected using the EQ-VT protocol has not been utilized for the purpose of identifying preference differences across studies. While the TTO valuation data could be subject to interviewer effects as the task relies on the good performance of the interviewers [14], there is minimal interviewer effect for the DCE data.

As a preference elicitation method, DCE has been increasingly used in health preference studies [15]. Based on random utility theory, DCE is designed to ask respondents to choose a preferred multi-dimensional health state from two or more alternatives. The ordinal preference data can be modeled to predict health utility on a latent scale [16]. This means that the coefficients of DCE are not directly comparable across studies and most studies assessed their difference by calculating and comparing the relative importance of five health dimensions [17, 18].

Further, differences in health preferences among Asian populations are not well understood. By comparing the multiplicative model coefficients of the EQ-5D-5L TTO valuation data from seven Asian studies, Wang et al. noticed that there was no consensus about the rank ordering of the five dimensions [10]. Additionally, statistical tests suggested that most coefficients differed among Asian studies. Roudijk et al. found that cultural variables (i.e. traditional/rational-secular, survival/self-expression) did not explain the variation in value differences (defined as utility differences between the mild and severe states) among EQ-5D valuation studies, including 10 Asian studies [11]. As stated before, these studies only explored the TTO valuation data.

Following the EQ-VT protocol, 11 studies (China, Indonesia, Japan, South Korea, Malaysia, Singapore, Thailand, the Philippines, Vietnam, Hong Kong, Taiwan) have been completed in Asia. Of those studies, China and South Korea did not use the DCE data to model the value sets, and Singapore and the Philippines have not yet published their value sets. The rest of the studies modeled the DCE data and TTO data jointly. Notably, no study has compared DCE-derived preference data among Asian populations. Given that all studies used the standardized EQ-5D-5L instrument, DCE experimental design, and data collection protocol, it is possible to explore the variations of health preferences in Asia. We hypothesized that health preferences differ among Asian populations. If this is true, the results of this study could further support the establishment of national/regional value sets for better guidance of health care decision-making and resource allocation, rather than the use of a unified value set designed merely for the continent. In this study, we aim to understand the similarities and differences in Asians’ preferences for EQ-5D-5L health states in 11 Asian DCE datasets collected as part of EQ-5D-5L valuation studies.

Methods

EQ-5D-5L questionnaire and 11 Asian valuation studies

The EQ-5D-5L measures HRQoL using five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), and under each dimension, there are five levels of severity (no problems, slight problems, moderate problems, severe problems, extreme problems/unable to). In total, the EQ-5D-5L defines 5^5 = 3,125 health states [19]. We obtained the DCE data from the principal investigators of 11 Asian EQ-5D-5L valuation studies, namely China [12], Indonesia [20], Japan [21], South Korea [13], Malaysia [22], Singapore, Thailand [23], the Philippines, Vietnam [24], Hong Kong [25], and Taiwan [26]. The language(s) used in each study were the official language(s) and/or the dialects most widely spoken by the population (see Table 1).

Table 1 Basic information of the 11 studies

DCE design and tasks

All Asian studies included in this study used the standard EQ-VT protocol for data collection [9, 27]. In general, the DCE design of the EQ-VT protocol consisted of a total of 196 pairs of health states, including 186 pairs generated from a Bayesian efficient design algorithm and 10 pairs of mild states [27]. The priors for the Bayesian efficient design algorithm were extracted from a main effects model of an EQ-5D-3L DCE study [28]. The detailed experimental design development process and considerations were described in Oppe et al. [27]. The 196 pairs of EQ-5D-5L health states were distributed over 28 blocks, each consisting of 7 pairs of health states with similar severity. No dominant pairs were included [27]. In each study, each respondent was assigned one block of DCE tasks to complete. The 7 pairs were presented in random order, and the right-left presentation of the two health states was also randomized [8]. Figure 1 shows a screenshot of one DCE task in the EQ-VT software.
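The task assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the pair IDs are hypothetical, blocks are modeled as contiguous ID ranges (the actual design groups pairs of similar severity), and the block is drawn uniformly at random, which the protocol description does not specify.

```python
import random

N_PAIRS = 196                      # total pairs in the EQ-VT DCE design
BLOCK_SIZE = 7                     # pairs per block
N_BLOCKS = N_PAIRS // BLOCK_SIZE   # 28 blocks


def assign_tasks(rng: random.Random) -> list[tuple[int, bool]]:
    """Pick one block, shuffle its 7 pairs, and randomize the sides.

    Returns (pair_id, swapped) tuples; `swapped` means the two health
    states of that pair are shown in reversed left-right positions.
    Pair IDs and the uniform block draw are assumptions of this sketch.
    """
    block = rng.randrange(N_BLOCKS)
    pair_ids = list(range(block * BLOCK_SIZE, (block + 1) * BLOCK_SIZE))
    rng.shuffle(pair_ids)          # random presentation order
    return [(pid, rng.random() < 0.5) for pid in pair_ids]


tasks = assign_tasks(random.Random(0))
```

Each respondent therefore sees exactly 7 of the 196 pairs, all drawn from a single block.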

Fig. 1 An example of Discrete Choice Experiments (DCE) in English

Data collection

Following the EQ-VT protocol, all respondents were interviewed face-to-face by a trained interviewer using the EQ-VT software. The data collection included four sections: The first section was for respondents to report their own health using the EQ-5D-5L descriptive system and the EQ-VAS. In the second section, respondents valued 10 different EQ-5D-5L health states using the composite time trade-off (cTTO) [8]. In the third section, respondents completed 7 pairs of EQ-5D-5L discrete choice tasks [27]. Finally, respondents reported their socio-economic and other background characteristics. We used the DCE data obtained from the third section for the analysis.

Analysis

To understand how health preferences differ from or resemble one another, we examined: (1) the statistical differences between the coefficients; (2) the relative importance of the five EQ-5D dimensions; and (3) the utility decrements between the response levels.

For modeling, a 20-parameter main-effects mixed logit model was fitted for each study. In this model (Formula 1), utility was explained by 20 dummy variables and was on a latent scale (referred to as latent utility). For each dimension (MO for mobility, SC for self-care, UA for usual activities, PD for pain/discomfort, AD for anxiety/depression), 4 dummy variables were used to represent the departure from level 1 to the other 4 levels, e.g. MO3 was 1 if the health state being valued had “moderate problems with mobility” and 0 for any other level of mobility [29]. In addition, a heteroscedastic conditional logit model was also fitted for each study [30]. The major difference between the two models is that the heteroscedastic conditional logit model accounts for heterogeneity in error variance, whereas the mixed logit model accounts for preference heterogeneity among respondents.

$$ Latent\;utility = \beta_{1} MO_{2} + \beta_{2} MO_{3} + \beta_{3} MO_{4} + \beta_{4} MO_{5} + \beta_{5} SC_{2} + \ldots + \beta_{20} AD_{5} + \varepsilon $$
(1)
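To make the dummy coding in Formula 1 concrete, the sketch below maps a five-digit EQ-5D-5L state to the 20 dummies and evaluates the deterministic part of the latent utility. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from any of the 11 studies.

```python
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
LEVELS = [2, 3, 4, 5]  # level 1 is the reference level


def dummy_code(state: str) -> dict[str, int]:
    """Map a five-digit state such as '13111' to the 20 dummies."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return {
        f"{dim}{lvl}": int(int(digit) == lvl)
        for dim, digit in zip(DIMENSIONS, state)
        for lvl in LEVELS
    }


def latent_utility(state: str, beta: dict[str, float]) -> float:
    """Deterministic part of Formula 1 (the error term is omitted)."""
    x = dummy_code(state)
    return sum(beta[name] * value for name, value in x.items())


# Hypothetical coefficients: each level-k dummy gets -0.1 * (k - 1).
beta = {f"{d}{l}": -0.1 * (l - 1) for d in DIMENSIONS for l in LEVELS}
x = dummy_code("13111")  # only SC3 equals 1
```

With these made-up coefficients, state 11111 has a latent utility of 0 because every dummy is 0, and any other state is penalized by the sum of the coefficients of its non-reference levels.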

Next, the statistical differences between two studies’ coefficients were explored using pairwise comparisons. For each pair, a dummy variable was generated as 0 for one study’s data and as 1 for the other. Then, a 20-parameter main-effects model plus 20 interaction terms was fitted for all two-by-two study combinations (see Formula 2). In this model, a significant interaction term suggests that the coefficient is statistically different between the two studies. The number of statistically different coefficients was summarized for each study pair. Notably, the coefficient of a significant interaction term may not exceed the minimal important difference (MID) on the utility scale [31].

$$ Latent\;utility = \beta_{1} MO_{2} + \beta_{2} MO_{3} + \beta_{3} MO_{4} + \beta_{4} MO_{5} + \beta_{5} SC_{2} + \ldots + \beta_{20} AD_{5} + \beta_{21} MO_{2} \times study\;dummy + \beta_{22} MO_{3} \times study\;dummy + \beta_{23} MO_{4} \times study\;dummy + \ldots + \beta_{40} AD_{5} \times study\;dummy + \varepsilon $$
(2)
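The structure of the pooled model in Formula 2 can be illustrated by building its 40 regressors for one health state and by enumerating the study pairs. This is a sketch of the design only, not of the estimation: the helper name `pooled_row` is hypothetical, and in an actual fit each alternative of each choice task would contribute such a row.

```python
from itertools import combinations

LEVELS = [2, 3, 4, 5]  # level 1 is the reference level
STUDIES = [
    "China", "Indonesia", "Japan", "South Korea", "Malaysia", "Singapore",
    "Thailand", "Philippines", "Vietnam", "Hong Kong", "Taiwan",
]


def pooled_row(state: str, study_dummy: int) -> list[int]:
    """Return 20 main-effect dummies followed by 20 interaction terms.

    `state` is a five-digit EQ-5D-5L state; `study_dummy` is 0 for one
    study of the pair and 1 for the other.
    """
    main = [int(int(digit) == lvl) for digit in state for lvl in LEVELS]
    interactions = [m * study_dummy for m in main]
    return main + interactions


# Every two-by-two combination of the 11 studies.
pairs = list(combinations(STUDIES, 2))
```

For the study coded 0 all interaction columns vanish, so the first 20 coefficients describe that study and the last 20 capture how the other study deviates from it. With 11 studies there are 55 such pairwise models.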

Using the mixed logit model results (Formula 1), the relative importance of dimensions and levels was estimated for each study [17, 18, 32]. The relative importance of the five dimensions was calculated in two steps. First, each dimension-level coefficient was divided by the sum of the coefficients of the same level across all dimensions. For example, the adjusted coefficient for mobility level 3 was obtained by dividing the MO3 coefficient by the sum of the level 3 coefficients of all dimensions: aMO3 = MO3/(MO3 + SC3 + UA3 + PD3 + AD3). This step resulted in adjusted coefficients for the last four levels (level 1 is the reference level) of every dimension. Second, the mean of the adjusted coefficients was calculated for each dimension. Continuing the mobility example, the relative dimension importance of mobility for a study would be estimated as (aMO2 + aMO3 + aMO4 + aMO5)/4.
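The two steps can be written out as a short function. The sketch follows the worked mobility example, dividing each coefficient by the sum of the same-level coefficients; the coefficient values are hypothetical and chosen so that mobility carries twice the weight of every other dimension.

```python
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
LEVELS = [2, 3, 4, 5]  # level 1 is the reference level


def dimension_importance(beta: dict[str, float]) -> dict[str, float]:
    """Relative importance of each dimension from 20 coefficients."""
    # Step 1: sum of the coefficients of each level across dimensions.
    level_sums = {l: sum(beta[f"{d}{l}"] for d in DIMENSIONS) for l in LEVELS}
    # Step 2: mean of the four adjusted coefficients per dimension.
    return {
        d: sum(beta[f"{d}{l}"] / level_sums[l] for l in LEVELS) / len(LEVELS)
        for d in DIMENSIONS
    }


# Hypothetical coefficients: mobility is weighted twice as heavily.
beta = {
    f"{d}{l}": -(l - 1) * (2.0 if d == "MO" else 1.0)
    for d in DIMENSIONS
    for l in LEVELS
}
importance = dimension_importance(beta)
```

By construction the five values sum to 1, which is why equal weighting corresponds to 0.20 per dimension. In this made-up example mobility receives 1/3 and each of the other dimensions 1/6.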

The relative importance of levels was also obtained in two steps. First, the coefficients of each level were summed across all dimensions. Second, each level sum was divided by the level 5 sum. For example, the relative importance of level 2 for a study would be calculated as (MO2 + SC2 + UA2 + PD2 + AD2)/(MO5 + SC5 + UA5 + PD5 + AD5). The relative importance results were summarized across the 11 studies and two figures were plotted, one for the relative importance of the dimensions and one for the relative importance of levels (see Online Appendix 1 for the calculation of the relative importance). If the five dimensions are equally weighted by a population, all five dimensions should have a relative importance of 0.20 (i.e. 1 divided by 5). The relative importance of a level is interpreted as a percentage of the weight attached to level 5 problems. The 95% confidence intervals of relative importance were calculated using the Delta method (see Online Appendix 2 for example Stata code). Analyses were performed using Stata 14 (Stata Corp LLC) [33].
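The level ratio and a Delta-method interval for it can be sketched as follows. This is not the authors' Stata code from Online Appendix 2: it is a minimal pure-Python illustration that assumes a normal approximation (z = 1.96), and the coefficients and the diagonal covariance matrix are hypothetical.

```python
import math

DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
LEVELS = [2, 3, 4, 5]
NAMES = [f"{d}{l}" for d in DIMENSIONS for l in LEVELS]  # covariance order


def level_importance(beta: dict[str, float], level: int) -> float:
    """Sum of level-k coefficients divided by the sum of level-5 ones."""
    s_k = sum(beta[f"{d}{level}"] for d in DIMENSIONS)
    s_5 = sum(beta[f"{d}5"] for d in DIMENSIONS)
    return s_k / s_5


def delta_ci(beta, cov, level, z=1.96):
    """95% interval for the ratio; cov[i][j] follows the NAMES order."""
    if level not in (2, 3, 4):
        raise ValueError("level must be 2, 3 or 4")
    s_k = sum(beta[f"{d}{level}"] for d in DIMENSIONS)
    s_5 = sum(beta[f"{d}5"] for d in DIMENSIONS)
    # Gradient of s_k / s_5 with respect to each coefficient.
    grad = []
    for name in NAMES:
        lvl = int(name[-1])
        if lvl == level:
            grad.append(1.0 / s_5)
        elif lvl == 5:
            grad.append(-s_k / s_5**2)
        else:
            grad.append(0.0)
    n = len(NAMES)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    half = z * math.sqrt(var)
    return s_k / s_5 - half, s_k / s_5 + half


# Hypothetical inputs: level-k coefficients of -0.1 * (k - 1) and
# independent estimates with variance 0.01 each.
beta = {f"{d}{l}": -0.1 * (l - 1) for d in DIMENSIONS for l in LEVELS}
cov = [[0.01 if i == j else 0.0 for j in range(20)] for i in range(20)]
low, high = delta_ci(beta, cov, level=2)
```

In practice the covariance matrix would come from the fitted model, and off-diagonal terms would generally not be zero.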

Results

Data descriptions

Table 1 summarizes the key information from the 11 valuation studies. Based on the EQ-VT protocol, all studies recruited at least 1000 respondents. Quota sampling was the most used sampling strategy, but the quotas differed across studies. All studies were conducted between 2012 and 2017.

Modeling results

Table 2 shows the mixed logit modeling results. All coefficients for all studies were significant at the 0.05 level except for the second level of usual activities in Taiwan. Vietnam and the Philippines had 1 and 2 inconsistent coefficients, respectively. The three inconsistencies occurred on the third level of self-care, mobility, and usual activities, respectively. Within each study, the standard errors of the coefficients generally increased with severity level. Table 3 shows the number of coefficients that differed statistically between two studies. On average, 9.3 out of 20 coefficients differed between two studies. Almost all study pairs had at least 5 statistically different coefficients; the exceptions were Taiwan versus Hong Kong and Taiwan versus Malaysia. Malaysia and Singapore differed the most, with 16 statistically different coefficients. An example of this comparison between China and Indonesia can be found in Online Appendix 3.

Table 2 The results of the mixed logit model for 11 studies
Table 3 Number of coefficients differed statistically between two studies

Compared with the mixed logit model results, the heteroscedastic conditional logit model resolved the non-significant coefficient for Taiwan but did not resolve the coefficient inconsistencies for the Philippines and Vietnam. Furthermore, this model resulted in one non-significant coefficient for South Korea and one inconsistency for Thailand. The heteroscedastic conditional logit modeling results can be found in Online Appendix 4.

Relative weight results

Table 4 shows the relative importance and the 95% confidence intervals for the 11 studies. Figure 2 shows that a universal rank order does not exist across the 11 Asian populations. Mobility was the most important dimension for every study except Vietnam. The least important dimension was either usual activities or self-care, except for the Philippines and Indonesia. Notably, these two functional dimensions had similar weights in China, Indonesia, Japan and Vietnam, and only South Korea had a larger relative weight for usual activities. Pain/discomfort was the second most important dimension for 6 studies, and it was valued higher than or equal to anxiety/depression in all studies except Thailand. Singapore, Japan, the Philippines, and Indonesia placed similar weights on pain/discomfort and anxiety/depression. The sum of the three functional dimensions was larger than the sum of the two symptom dimensions across all studies.

Table 4 Relative importance of 11 studies, mean (95% confidence intervals)
Fig. 2 Relative importance of five dimensions

Some individual characteristics can be seen in Fig. 2. South Korea showed the largest difference between the dimensions of mobility and self-care. Japan had similar weights for dimensions other than mobility. Hong Kong, Malaysia and Taiwan showed the same rank order, i.e. Mobility > Pain/discomfort > Anxiety/depression > Self-care > Usual activities. China differed from these three studies by placing usual activities above self-care. Indonesia showed a different pattern by weighting usual activities and self-care more heavily than pain/discomfort and anxiety/depression. Both Vietnam and Singapore had similar weights for three dimensions. Thailand and Vietnam were unique in that Thailand valued anxiety/depression as the second most important dimension and Vietnam valued pain/discomfort as the most important dimension.

Compared with the large variations in the relative importance of health dimensions, the relative importance of levels was more comparable across studies (Fig. 3). The weights of mild (L2) and moderate (L3) problems were more similar across regions than the weights of severe problems (L4). L2 ranged from 0.156 for Taiwan to 0.322 for the Philippines; L3 ranged from 0.211 for Thailand to 0.367 for Indonesia; L4 ranged from 0.600 for South Korea to 0.837 for the Philippines. In the Philippines and Thailand, the difference between level 2 and level 3 was minimal. On average, level 2 accounted for 20% of the weight of level 5, level 3 for approximately 30%, and level 4 for 70%. The smallest relative importance was 0.156 for L2 in Taiwan, meaning that having a mild problem carried about 15.6% of the weight of having an extreme problem. The smallest L3 was from Thailand (0.211), and this value was smaller than the L2 of some studies.

Fig. 3 Relative importance of the severity levels

Discussion

The present study compared the DCE-based modeling results and the relative importance of EQ-5D-5L dimensions and levels across 11 Asian valuation studies. The strength of this study is that all 11 studies followed the standardized EQ-VT protocol, which minimized possible noise in identifying true differences. Based on our results, it is fair to say that there is no single preference pattern for Asian populations. This is in line with a previous study comparing TTO preference data [10]. A clear distinction between our DCE results and the TTO results is that the relative weights for level 3 and level 4 are larger in the TTO study.

Our study first tested the differences between modeling coefficients and then compared the relative importance attached to the dimensions and levels of EQ-5D-5L. Both analyses suggest large health preference heterogeneity among Asians. First, the number of statistically different coefficients ranged between 2 (Malaysia vs. Taiwan) and 16 (Singapore vs. Malaysia) with an average of 9.3, suggesting that about half of the coefficients differed when two studies’ data were pooled in a joint model. Second, both the relative importance of dimensions and that of levels differed among studies. Only Hong Kong, Taiwan and Malaysia showed the same order of the five dimensions. We identified some common patterns, although each comes with exceptions. First, among the five dimensions, mobility is the most important dimension for every population except Vietnam. This is similar to the results from a comparison of TTO-only preference data from 7 Asian regions [10]. However, western countries do not value mobility as highly; the Dutch, German, and US populations view mobility as the third, fourth, and second most important dimension, respectively [34,35,36]. Purba et al. argued that in the western developed countries, problems with mobility had less influence due to better infrastructure provision and less emphasis on manual labor [20]. However, in high-income and developed regions such as Singapore and Japan, mobility is still the most valued dimension. Second, the sum of the three functional dimensions (mobility, self-care and usual activities) was higher than the sum of the two symptom dimensions (pain/discomfort and anxiety/depression). Also, either usual activities or self-care is the least important dimension, with Indonesia and the Philippines as the exceptions. This result agrees with the previous study comparing TTO data among 7 Asian populations. In that study, Indonesia was the only population that valued pain/discomfort and anxiety/depression the lowest. Third, pain/discomfort was valued as more important than anxiety/depression and was the second most important dimension for 6 studies. These characteristics mark notable differences from the preference patterns of most European, American, and African populations [5, 37,38,39].

Despite these similarities, it is clear that a single preference pattern does not exist for all Asian populations. For example, there is no agreement on the least important dimension in our comparison: 3 studies valued self-care, 2 studies valued anxiety/depression, and 6 studies valued usual activities as the least important. This contrasts with a previous study comparing health preference patterns for Canada, England, the Netherlands, and Spain. In that study, Olsen et al. found that a clear pattern existed for these four western countries and named it the western preference pattern (WePP) [5]. In the WePP, four general characteristics were noticed in terms of the relative importance: 1) (PD + AD) ≈ (MO + SC + UA); 2) PD ≈ AD; 3) MO ≈ SC; 4) UA < SC. However, no Asian preferences fit well with these four characteristics. In fact, the sum of pain/discomfort and anxiety/depression was less than the weight of the other three dimensions in all Asian studies: (PD + AD) < (MO + SC + UA), suggesting that, compared with the western countries, Asian populations placed more weight on the functional dimensions. The second characteristic of ‘PD ≈ AD’ was only observed in the results from Indonesia, Singapore, and Malaysia. The third characteristic was clearly invalid in Asia, as mobility was valued as the most important dimension while self-care had less relative importance in 4 studies. For the last characteristic, four Asian populations placed similar or higher values on usual activities than on self-care.

The differences in health preferences can be attributed to several reasons. First, the 11 populations in our sample come from diverse cultural, economic, political and social environments. Although no study has examined how these factors relate to health preferences, country-specific value sets have been established on the notion that these factors shape people’s preferences. Second, even though each study followed the same study protocol, their sampling methods differed. Quota sampling was the most used sampling strategy, but the quotas varied across studies. For example, ethnicity was used in some studies such as Malaysia and Singapore, but not in China and South Korea. Similarly, some studies only recruited participants from urban areas, who may not be representative of the whole target population. Studies have shown that the demographics of respondents could influence health preferences [40, 41]. Hence, the different respondents recruited for each study may contribute to the observed differences. Last but not least, the EQ-5D-5L descriptive system was translated from English into different official languages. Though a standardized translation process was conducted to maintain equivalence between the translated questionnaire and its source version, different languages have different ways of expression, which may be inadequately captured [42].

This study has some limitations. First, the point estimates of the relative weights were used to identify the preference pattern. Considering that the 95% confidence intervals overlapped for some dimensions, the relative weight difference between dimensions may not be statistically significant. Assuming a scale length of 1.5 (i.e. 55555 has a value of -0.5, 11111 has a value of 1) and using a MID of 0.05, any relative importance difference over 0.03 should be meaningful. Nevertheless, since we do not know the actual scale length of each study, we did not use this criterion. Second, even though a standardized protocol was used, the demographic questions used for each study were customized by each local study team. Due to these sampling variations, we did not further test how these variations affect preferences. Only the heteroscedastic model shown in Online Appendix 4 indicates that the error variance was constant across respondents of different ages and genders.
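The 0.03 threshold mentioned under the first limitation can be reproduced with a short calculation. This is one reading of the argument, assuming that a difference in relative importance translates into a utility difference in proportion to the scale length; the symbol \Delta_{\min} is introduced here for illustration only.

$$ \Delta_{\min} = \frac{MID}{scale\;length} = \frac{0.05}{1 - ( - 0.5)} = \frac{0.05}{1.5} \approx 0.033 $$

Under this reading, a study with a longer scale would need a smaller difference in relative importance to reach the same MID, which is why the criterion depends on each study's actual scale length.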

Norman et al. pointed out that differences in methods obscured the true differences in health preferences across countries after comparing published EQ-5D-3L value sets [6]. Our study has shown that, even with a standardized data collection protocol, study design and modeling choice, differences remained in the EQ-5D-5L modeling results and in the relative importance of dimensions and levels among Asian populations. Therefore, the effort of estimating a combined continental value set, as was carried out for European and Western countries [5, 43], should be discouraged for Asia.

Conclusion

By comparing the DCE data modeling results, we found that the rank order of EQ-5D-5L dimensions and the relative weight of levels differed among Asian populations. These findings confirm the health preference heterogeneity among Asian populations that was observed in previous studies using TTO data. Together, this evidence supports the use of local value sets for estimating health utility.