Introduction
When healthcare technologies undergo economic evaluation, the incremental cost-effectiveness ratio (ICER) is a standard calculation. Various outcomes can serve as the denominator of the ICER, but the quality-adjusted life year (QALY) is widely used across many areas of cost-effectiveness analysis. One reason is that quality of life (QOL) is one of the most important outcomes not only for medical interventions but also for healthcare policies. To calculate QALYs, the QOL score must be expressed on a scale anchored at 0 (death) and 1 (full health). Preference-based measures, such as the EuroQol 5-dimension (EQ-5D) [1, 2], the Health Utilities Index (HUI) [3–5], and the Short Form 6-dimension (SF-6D) [6–8], have been developed to calculate QOL scores. These measures were originally developed in English but have been translated into many languages. Japanese value sets for the EQ-5D (3L [9] and 5L [10]) and the SF-6D [11] have also been developed.
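As a minimal illustration of how QOL scores feed into these calculations, the sketch below computes QALYs as the sum of utility × years spent in each health state (without discounting) and the ICER as incremental cost divided by incremental QALYs. The function names, costs, and utilities are hypothetical and are not taken from this study.

```python
def qalys(utilities, years):
    """QALYs as the sum of utility x years over successive health states (undiscounted)."""
    if len(utilities) != len(years):
        raise ValueError("utilities and years must have the same length")
    return sum(u * t for u, t in zip(utilities, years))


def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect (here, QALY)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ZeroDivisionError("no incremental effect, so the ICER is undefined")
    return (cost_new - cost_old) / delta_effect


# Hypothetical comparison: utilities anchored at 0 (death) and 1 (full health),
# two health states lasting 2 and 3 years, and made-up costs.
q_new = qalys([0.9, 0.8], [2.0, 3.0])  # 0.9*2 + 0.8*3 = 4.2 QALYs
q_old = qalys([0.8, 0.7], [2.0, 3.0])  # 0.8*2 + 0.7*3 = 3.7 QALYs
ratio = icer(5_000_000, 4_000_000, q_new, q_old)  # 1,000,000 / 0.5 QALYs
print(f"{q_new:.2f} vs {q_old:.2f} QALYs; ICER = {ratio:,.0f} per QALY gained")
```

In practice, future years would usually be discounted before summing; that step is omitted here to keep the arithmetic transparent.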
The mean QOL score in the general population is normally below 1 because some people report less than full health. Many people with diseases or symptoms continue to live in their local communities, and others may not rate their health as full health even if they have no disease. Such reductions in QOL should be reflected in QALY calculations for economic evaluations. In addition, QOL scores obtained through a survey are easier to interpret when they can be compared with the score for the general population as a reference value. Therefore, population norms, previously defined as “population reference data… for a specific country or international region” [12], for preference-based measures are essential for both researchers and policymakers. Norms for these measures, especially for the EQ-5D-3L, have already been reported in many countries, including the UK [13], USA [14, 15], six European countries (Belgium, France, Germany, Italy, Netherlands and Spain) [16, 17], Spain (Catalonia) [18], Switzerland (French-speaking population) [19], Finland [20], Denmark [21], Portugal [22], Poland [23], Canada (Alberta) [24], Australia (Queensland) [25], China [26], Taiwan [27], Singapore [28, 29], Sri Lanka [30], and Brazil [31]. Population norms for the SF-6D have also been investigated in some countries, including the UK [32], USA [15], Australia [33], Portugal [34], and Brazil [35]. However, Japanese population norms for QOL scores do not currently exist, except for surveys performed in three areas [12] that were originally conducted to obtain a value set [9]. Moreover, few population norms have been reported anywhere in the world for the EQ-5D-5L, a recently developed measure from the EuroQol Group.
The population of Japan was about 127 million in 2015, and almost all of the population speaks Japanese. Therefore, the Japanese versions of the EQ-5D-3L, EQ-5D-5L, and SF-6D are widely used for calculating QOL scores in Japan, and Japan’s economic evaluation guideline [36] recommends using measures with value sets developed in Japan. The Ministry of Health, Labour and Welfare (MHLW) of Japan collected data on these measures in a survey based on a concept proposed by our group. It also collected responses to a questionnaire from the National Livelihood Survey, which the MHLW performs annually; this questionnaire includes questions on disease types and subjective symptoms.
Therefore, the primary objective of this study was to analyze these data to obtain population norms for the Japanese versions of three preference-based measures: the EQ-5D-3L, EQ-5D-5L, and SF-6D. The second objective was to examine the characteristics of each measure and the relations among the measures. We also aimed to describe the relation between the QOL score in the general population and characteristics such as sex, age, diseases, symptoms, and other socio-demographic factors.
Methods
Sampling
The data in this study came from the MHLW survey, which used a representative sample. The survey targeted a total of 1000 adults (aged ≥20 years), randomly sampled from 100 sites (municipalities). The 100 sites were selected as follows. First, the number of sites in each of the 8 regions was determined in proportion to the population of the region. Then, within each region, the number of sites belonging to each stratum (prefecture × municipality size) was calculated based on the population of the stratum. The surveyed districts (Cho-me, in Japanese) were randomly selected according to the number of sites allocated to each stratum. Respondents were then randomly sampled from each selected district, stratified by sex and age. People living in hospitals or nursing homes were not included.
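The first stage, allocating sites to regions in proportion to population, can be sketched as below. The paper does not say how fractional allocations were rounded; the largest-remainder (Hamilton) method used here is an assumption, and the region labels and population figures are hypothetical placeholders. The same function could then be applied within each region to allocate its sites across strata (prefecture × municipality size).

```python
import math


def allocate_sites(populations, total_sites):
    """Allocate sites to strata in proportion to population.

    Assumed rounding rule (largest remainder / Hamilton): each stratum first gets
    the integer part of its exact quota, and the leftover sites go to the strata
    with the largest fractional parts.
    """
    total_pop = sum(populations.values())
    quotas = {k: total_sites * p / total_pop for k, p in populations.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total_sites - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical populations (in thousands) for 8 regions labelled R1..R8.
regions = {"R1": 5300, "R2": 8900, "R3": 43000, "R4": 21000,
           "R5": 22300, "R6": 7300, "R7": 3800, "R8": 14300}
sites = allocate_sites(regions, 100)
print(sites, sum(sites.values()))
```

Largest-remainder rounding guarantees that the allocations sum exactly to the requested total and that no stratum deviates from its exact quota by a full site.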
The Basic Resident Register can be used to select respondents living on each street at random. In Japan, each municipality maintains its own Basic Resident Register, which records the name, sex, address, and date of birth of all residents, and each municipality has permitted the use of these data for public surveys. A door-to-door survey was performed from January to March 2013. Investigators visited the registered addresses and distributed the questionnaire, then collected the completed questionnaires a few days later and checked them for any apparent errors (placement method). These visits continued until the planned number of responses had been collected for each district. The investigators obtained informed consent from all respondents.
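Drawing respondents from a district's register records within sex and age strata can be sketched as follows. The record fields, the per-stratum quotas, and the use of the study's 10-year analysis bands as sampling strata are all assumptions for illustration; the paper does not report the exact strata or quotas, and the synthetic register below is randomly generated.

```python
import random
from collections import defaultdict


def age_group(age):
    """Map an age in years to 10-year bands (20-29, ..., 60-69, 70+); None if under 20."""
    if age < 20:
        return None  # the survey targeted adults aged 20 years or older
    if age >= 70:
        return "70+"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def stratified_sample(register, quotas, seed=None):
    """Randomly draw the requested number of residents within each (sex, age band) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in register:
        band = age_group(person["age"])
        if band is not None:
            strata[(person["sex"], band)].append(person)
    sample = []
    for stratum, n in quotas.items():
        pool = strata.get(stratum, [])
        sample.extend(rng.sample(pool, min(n, len(pool))))  # without replacement
    return sample


# Synthetic register for one district (hypothetical records, not real data).
gen = random.Random(0)
register = [{"id": i, "sex": gen.choice("MF"), "age": gen.randint(0, 95)} for i in range(2000)]
quotas = {("F", "20-29"): 2, ("M", "70+"): 3}  # hypothetical quotas
chosen = stratified_sample(register, quotas, seed=1)
print([(p["sex"], p["age"]) for p in chosen])
```

In the field procedure described above, the investigators kept visiting addresses until each district's quota was met; a fuller sketch would therefore draw replacement addresses from the same stratum when a selected resident does not respond.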
Measures
Health status was measured using the EQ-5D-3L, EQ-5D-5L, and SF-6D. The instruments were presented in a fixed order: the EQ-5D-5L, then the EQ-5D-3L, and then the SF-36, from which the SF-6D is derived. Socio-demographic data, such as sex, age, education, marital status, employment, and household income, were also collected.
The EQ-5D was developed by the EuroQol Group. The original version of the EQ-5D (now called the EQ-5D-3L) comprises five dimensions, “mobility,” “self-care,” “usual activities,” “pain/discomfort,” and “anxiety/depression,” each described at three levels. To address the limited sensitivity and the ceiling effect of the EQ-5D-3L, the newly developed EQ-5D-5L [37] increases the number of levels for each dimension from three to five.
The SF-6D is a measure for converting responses to the SF-36 (or SF-12 [38]) into a preference-based QOL score for economic evaluation. The SF-36 [39–41] is the most widely used measure for assessing health states in the world. Responses to selected items of the SF-36 can be classified according to the descriptions of the SF-6D system, which consists of six dimensions [physical functioning (PF), role limitation (RL), social functioning (SF), bodily pain (BP), mental health (MH), and vitality (VT)] with five or six levels (defining a total of 22,500 health states). As direct use of an SF-6D questionnaire is not recommended, we used the Japanese SF-36, version 2 [42].
The questionnaire also included part of the National Livelihood Survey, which the MHLW performs annually. This part asks respondents whether they have any diseases for which they consult a doctor and whether they have any subjective symptoms. Respondents who answer “yes” then select their most important diseases and symptoms from a list of forty symptoms (having a fever, feeling sluggish, sleeplessness, etc.) and diseases (diabetes, obesity, hyperlipidemia, etc.).
Statistical analysis
Responses to the EQ-5D-3L, EQ-5D-5L, and SF-6D were first converted to QOL scores using the Japanese value sets. Summary statistics for the QOL scores were calculated by sex and age category (20–29, 30–39, 40–49, 50–59, 60–69, and 70 years and older). The percentage of people reporting any problem in each dimension was calculated after stratifying respondents by sex and age category. Chi-square tests (or Fisher’s exact test when expected frequencies were low) were used to test the association between reporting any problem and sex or age. The McNemar test was used to compare the paired frequencies of respondents reporting any problem on the EQ-5D-3L and the EQ-5D-5L. Agreement among the three measures was assessed using the intraclass correlation coefficient (ICC) and the Bland–Altman plot [43]. In the Bland–Altman plot, the average of the two measures is plotted on the x-axis and their difference on the y-axis, to check for systematic error.
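The paired comparisons described above can be sketched with the Python standard library. Three choices are assumptions rather than details reported in the paper: the McNemar statistic uses a continuity correction by default, the ICC is the two-way, absolute-agreement, single-measure form (ICC(A,1) in McGraw and Wong's notation), and the Bland–Altman limits of agreement are the conventional bias ± 1.96 SD of the differences.

```python
import math
import statistics


def mcnemar(pairs, continuity=True):
    """McNemar test for paired yes/no outcomes, e.g. (any problem on 3L, any problem on 5L).

    Returns the chi-square statistic (1 df) and its p-value.
    """
    pairs = list(pairs)
    b = sum(1 for x, y in pairs if x and not y)  # problem reported on the first measure only
    c = sum(1 for x, y in pairs if y and not x)  # problem reported on the second measure only
    if b + c == 0:
        return 0.0, 1.0
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    stat = diff ** 2 / (b + c)
    # Chi-square (1 df) upper tail: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def icc_a1(*measures):
    """Two-way, absolute-agreement, single-measure ICC from per-measure score lists."""
    k, n = len(measures), len(measures[0])
    rows = list(zip(*measures))  # one row of k scores per respondent
    grand = sum(map(sum, rows)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)
    ss_cols = n * sum((sum(m) / n - grand) ** 2 for m in measures)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(x, y):
    """Per-respondent means and differences, the bias, and 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

With hypothetical, equal-length score lists `eq3l`, `eq5l`, and `sf6d`, `icc_a1(eq3l, sf6d)` gives a pairwise ICC, and plotting the `means` against the `diffs` returned by `bland_altman(eq3l, sf6d)` reproduces the layout of a Bland–Altman plot. Because ICC(A,1) penalizes a constant offset between measures, two scores that differ by a fixed amount can still yield an ICC well below 1.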
To assess the influence of socio-demographic factors and diseases/symptoms on the QOL scores, these variables were entered, together with sex and age, into an analysis of variance (ANOVA) model. Diseases and symptoms reported by more than 10 respondents, or that had a significant influence on the QOL score, were included in this model. The influence of each disease and symptom was estimated using an ANOVA that included all the pertinent variables. The significance level was set at 0.05. Statistical analyses were performed using SAS 9.4.
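An ANOVA with categorical factors is equivalent to a linear model with dummy-coded indicators, so the adjusted effect of a disease can be read off as the coefficient of its indicator. The sketch below fits such a model by ordinary least squares using only the standard library. The variable coding (female, aged 60 or older, one disease indicator) and the noise-free synthetic data are hypothetical, chosen so that the true coefficients are recovered exactly; this is not the SAS model used in the study.

```python
import itertools


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear or empty indicators)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free data. Columns: intercept, female, aged 60+, has the disease.
rows, y = [], []
for female, older, disease in itertools.product([0, 1], repeat=3):
    for _ in range(5):
        rows.append([1.0, female, older, disease])
        y.append(0.95 - 0.02 * female - 0.05 * older - 0.04 * disease)

intercept, b_female, b_older, b_disease = ols(rows, y)
print(f"adjusted effect of the disease: {b_disease:+.3f}")
```

The normal equations are adequate for a small, well-conditioned design like this one; for real survey data with many correlated indicators, a QR-based solver or a statistics package is more robust.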
We compared the QOL scores of respondents with and without any subjective diseases/symptoms using an ANOVA model and interpreted the difference as the between-group minimal important difference (MID) of each preference-based measure in the general population. The MID, which corresponds to the smallest improvement considered worthwhile by a patient, is normally estimated using a distribution-based or an anchor-based method. As previously noted, “anchor-based differences can be determined either cross-sectionally at a single time point or longitudinally across multiple time points” [44]. We applied the cross-sectional anchor-based method to our data, treating diseases and subjective symptoms as the anchors for the between-group MID.
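In its simplest, unadjusted form, the cross-sectional anchor-based MID is the difference in mean QOL score between respondents without and with the anchor (any disease/symptom). The study itself used an ANOVA model, which would additionally adjust for covariates such as sex and age. For contrast, the sketch also computes half the standard deviation, a commonly used distribution-based benchmark that the paper does not report using. All data shown are hypothetical.

```python
import statistics


def between_group_mid(scores, has_condition):
    """Unadjusted anchor-based difference: mean score without the anchor minus mean score with it."""
    without = [s for s, h in zip(scores, has_condition) if not h]
    with_anchor = [s for s, h in zip(scores, has_condition) if h]
    return statistics.mean(without) - statistics.mean(with_anchor)


def half_sd(scores):
    """Distribution-based benchmark: half the sample standard deviation of the scores."""
    return 0.5 * statistics.stdev(scores)


# Hypothetical QOL scores and anchors (True = reports any disease/symptom).
scores = [1.0, 1.0, 0.9, 0.8, 0.7, 0.75]
anchor = [False, False, False, True, True, True]
print(round(between_group_mid(scores, anchor), 3), round(half_sd(scores), 3))
```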
This analysis was approved by the Ethics Committee of the National Institute of Public Health.
Discussion
To our knowledge, this is the first study to examine Japanese population norms for three preference-based QOL measures: the EQ-5D-3L, EQ-5D-5L, and SF-6D. Sampling was based on each municipality’s Basic Resident Register data, an approach regarded as one of the most rigorous and reliable in Japan. The reason for the differences between our QOL scores and the population norms in other countries is unclear; however, the differences may be influenced by (a) differences in actual health states, (b) differences in the value sets used in each country, and/or (c) differences in the degree of the ceiling effect or other characteristics. The ceiling effect of the EQ-5D-3L (especially for pain/discomfort among younger respondents) may be larger in the present study than in Western countries [12]. Of note, a difference in population norms does not necessarily indicate a difference in the respondents’ health states.
The results are presented stratified by sex and age category. QOL scores were significantly lower among respondents who were older than 60 years, female, had a lower income, or had a shorter period of education. In particular, higher income was associated with a higher QOL score. The direction of causality (whether poverty causes poor health or poor health causes poverty) is unclear, but this finding may be useful for public health policies. This relation has also been observed in other countries. For example, in the USA [14], the QOL score measured using the EQ-5D-3L was 0.81 for the poorest category (≤USD 10,000), compared with 0.92 for the richest (≥USD 75,000).
The percentage of respondents reporting any health problem on the EQ-5D-5L was higher than that on the EQ-5D-3L in almost all sex and age categories. Some authors have pointed out that the EQ-5D-3L has a ceiling effect, defined as “the proportion of respondents scoring ‘no problems’ on any of the five dimensions” [45], because the instrument lacks sensitivity. With only three levels, respondents whose health is slightly impaired may still report full health. Our finding illustrates how revising the EQ-5D-3L into the EQ-5D-5L has reduced the ceiling effect. According to Table 2, the standard deviation of the QOL score measured using the EQ-5D-5L tended to be smaller than that measured using the EQ-5D-3L. This may also result from the increased number of levels, which allows respondents to choose intermediate levels.
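The two summaries compared here can be computed directly from EQ-5D response profiles written as five-digit strings (for example, "11121" means no problems except level 2 on pain/discomfort). The sketch below computes, for each dimension, the share of respondents reporting any problem (level above 1) and the share reporting profile 11111, one common way of quantifying the ceiling effect. The string encoding and the example responses are hypothetical.

```python
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")


def any_problem_rates(profiles):
    """Share of respondents reporting any problem (level > 1) on each EQ-5D dimension."""
    n = len(profiles)
    return {dim: sum(int(p[i]) > 1 for p in profiles) / n
            for i, dim in enumerate(DIMENSIONS)}


def full_health_share(profiles):
    """Share of respondents with 'no problems' on all five dimensions (profile 11111)."""
    return sum(p == "11111" for p in profiles) / len(profiles)


# Hypothetical answers from the same four respondents on each version.
eq5d_3l = ["11111", "11111", "11121", "21111"]
eq5d_5l = ["11111", "11211", "11121", "21111"]
for name, profiles in (("3L", eq5d_3l), ("5L", eq5d_5l)):
    print(name, full_health_share(profiles), any_problem_rates(profiles)["usual activities"])
```

In this made-up example, the second respondent reports a slight problem with usual activities only on the five-level version, which lowers the full-health share and raises the any-problem rate, mirroring the pattern described above.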
Compared with the EQ-5D measures, the SF-6D yielded lower QOL scores in the general population. Agreement between the EQ-5D and SF-6D scores was poor, with low ICCs of 0.249 (EQ-5D-3L) and 0.234 (EQ-5D-5L). One likely cause is evident from the percentages of respondents reporting full health, as shown in Table 4: the percentage of people who reported no problems on the SF-6D was much lower than that on either EQ-5D measure. This result may be characteristic of the SF-6D rather than specific to the Japanese population. In Australia [33], the proportions of respondents aged 18–30 years who reported any problem in each dimension were 32 % for PF, 23 % for RL, 39 % for SF, 60 % for BP, 49 % for MH, and 94 % for VT. At the other end of the scale, the Bland–Altman plot indicated that most outliers (SF-6D scores higher than the EQ-5D scores) occurred at lower QOL scores, similar to the tendencies reported by Kontodimopoulos et al. [46] in Greece. Thus, the SF-6D may have a floor effect [47]: the lowest QOL score on the SF-6D (0.292) is higher than that on the EQ-5D-5L (−0.025).
The Japanese population norms for the SF-6D appear to be lower than those in other countries, whereas those for the EQ-5D-3L are similar to those in other countries (except Thailand). It is unclear whether the lower SF-6D scores result from Japanese response patterns or from the Japanese SF-6D tariff. Given these results, when QOL scores are used for economic evaluations, the interchangeability of the measures should be considered carefully [48–52], because the baseline scores of the general population differ between the Japanese EQ-5D and SF-6D.
We analyzed the differences in QOL scores between respondents with and without diseases/symptoms and interpreted them as the cross-sectional between-group MID of each measure. The anchor-based MID is more commonly measured longitudinally across multiple time points, which is closer to the definition of the MID, but repeated surveys are more difficult to perform in the general population than in clinical trials. Of note, our estimates may not equal the within-respondent MID. However, the between-group MID may be more useful for interpreting between-group differences. In a review of studies, Walters et al. [53] reported a mean MID of 0.041 for the SF-6D and 0.074 for the EQ-5D-3L. In cancer patients, the MID of the EQ-5D was estimated to be 0.08 (UK score) and 0.06 (US score), anchored to performance status and the FACT-G score [54]. In a study of post-traumatic stress disorder (PTSD), the MID was calculated as 0.05–0.08 (anchor-based method) and 0.04–0.10 (distribution-based method) [55]. Our MID estimates are consistent with these previous studies.
A limitation of this study was its relatively small sample size compared with other population-norm studies. We consider the sample size sufficient to estimate population norms by sex and age category, given that the results were interpretable and consistent with previous studies in other countries. However, a larger sample might reveal the relation between QOL scores and diseases/symptoms more clearly. Furthermore, the effects of diseases with low prevalence could not be analyzed. Another limitation is the order in which the three instruments were presented. Because the order was fixed rather than randomized, an order effect on the results cannot be excluded based on our data alone.
In conclusion, we demonstrated the following for three preference-based measures: (a) the Japanese population norms by sex and age category, (b) the relation between QOL scores and socio-demographic factors, (c) the reliability of the three measures in the general Japanese population, (d) the percentage of respondents reporting any problem, (e) the influence of diseases/symptoms on QOL scores, and (f) the between-group MID. The respondents were randomly sampled from all eight regions of Japan in a door-to-door survey, and the representativeness of the sample was considered good. The resulting information may be useful for calculating QALYs in economic evaluations and for research examining QOL scores.