Key Points for Decision Makers

Generally, the content validity of the Chinese version of EQ-5D-5L is satisfactory in rural Chinese.

The ‘pain/discomfort’ and ‘anxiety/depression’ domains may be subject to poor comprehensibility and there is a slight lack of clarity regarding the response levels.

It might be sensible to discuss how to improve the current Chinese version to make it more understandable for rural residents.

1 Introduction

The EQ-5D has been the most commonly used instrument for valuing health since its development in the 1980s [1]. The health utilities generated from the EQ-5D provide a way to estimate quality-adjusted life-years (QALYs) for use in cost-effectiveness analysis (CEA) to inform resource allocation decisions in health care. The EQ-5D has also been used in clinical trials, observational studies, population surveys and routine data collection in health care systems as a patient-reported outcome measure [1]. It includes five domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each domain has multiple response levels, and respondents self-rate their health by choosing the appropriate level [2]. Currently, two versions are available: the earlier EQ-5D-3L, with three levels for each domain (i.e., no, some and extreme problems), and the newer EQ-5D-5L, with five levels for each domain (i.e., no, slight, moderate, severe and extreme problems). The EQ-5D-5L was developed to address the ceiling effects and poor sensitivity associated with the EQ-5D-3L [3,4,5,6], and it has been increasingly used in the literature [7, 8].
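As an illustrative aside, the descriptive system described above can be sketched in code. The sketch is not part of the study: the names `DOMAINS`, `LEVELS_5L`, `LEVELS_3L`, `profile_code` and `count_states` are hypothetical, and the digit-string shorthand (one digit per domain, 1 for the mildest level) is assumed here as a convenient convention.

```python
from itertools import product

# Domain order and level labels as listed in the text above.
DOMAINS = ("mobility", "self-care", "usual activities",
           "pain/discomfort", "anxiety/depression")
LEVELS_5L = ("no", "slight", "moderate", "severe", "extreme")
LEVELS_3L = ("no", "some", "extreme")


def profile_code(responses, levels=LEVELS_5L):
    """Turn {domain: level label} into a digit string, one digit per domain."""
    return "".join(str(levels.index(responses[d]) + 1) for d in DOMAINS)


def count_states(levels):
    """Number of distinct health states the descriptive system can express."""
    return sum(1 for _ in product(levels, repeat=len(DOMAINS)))


# Hypothetical respondent: no problems except slight pain/discomfort.
example = {d: "no" for d in DOMAINS}
example["pain/discomfort"] = "slight"

print(profile_code(example))    # 11121
print(count_states(LEVELS_3L))  # 243
print(count_states(LEVELS_5L))  # 3125
```

Moving from three to five levels raises the number of describable health states from 243 to 3125, which gives a sense of the finer granularity that the five-level version asks respondents to handle.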

In China, the EQ-5D-3L has been widely used to measure the health status of the general population [9,10,11] and patients [12,13,14]. The Chinese version of EQ-5D-5L became available in 2012 [15] and has been used in several studies [12, 16, 17]. In light of its improved measurement properties [6], EQ-5D-5L is expected to be used more often in future studies. Both EQ-5D versions have been recommended by the China Guidelines for Pharmacoeconomic Evaluations as tools for conducting health technology assessment [18].

However, the Chinese version of EQ-5D-5L was developed through a response scaling exercise among 50 participants recruited from downtown Beijing [15]. As the authors commented, the small sample included only urban residents [15], so its validity in rural areas may be limited. Rural areas account for 90% of mainland China and rural residents comprise 40% of the Chinese population [19]. In China, rural and urban residents differ significantly in many respects. For example, according to the 2010 population census, only 10% of rural residents had received senior secondary or higher education, compared with 46% of urban residents [20]. The proportion of people over 65 years of age was also higher among rural residents than among urban residents (rural: 10.1%; urban: 7.7%). Rural residents also have remarkably lower incomes [21] and inferior medical and health services [22, 23] compared with their urban counterparts. In view of these large differences, particular attention should be paid to rural residents. To the best of our knowledge, the content validity of neither the EQ-5D-5L nor the earlier EQ-5D-3L has previously been evaluated among rural residents in China. Therefore, we conducted this qualitative study to assess the content validity of the EQ-5D-5L among rural Chinese.

2 Methods

2.1 Study Participants

Four regions of China (North, South, East and West) were selected on the basis of geographic location, represented by the cities of Tianjin (municipality), Guangzhou (Guangdong province), Nanjing (Jiangsu province) and Guiyang (Guizhou province). One county (rural area) from each region was then selected for data collection. Participants from each county were recruited through convenience sampling. Rural residents who had lived in the county for the last 3 years and made a living from agricultural work were eligible.

To obtain a representative sample, we used the national figures on the population distribution [20] and sought advice from an expert panel. Healthcare researchers from four universities in the four cities (Tianjin, Guangzhou, Nanjing and Guiyang) with sufficient survey experience in rural China were invited by the lead authors via personal contact, and a total of 15 members were included in the expert panel. Teleconferences were held to achieve consensus among members. It was agreed that age, sex and education level should be taken into account, but not socioeconomic status, because the survey was self-reported and responses on socioeconomic status may not be reliable. Additionally, we did not include disease status for two reasons: there is no clear evidence that disease status affects respondents’ understanding of the EQ-5D-5L, and chronic disease is usually under-diagnosed among rural Chinese, so respondents may not be able to report accurate information. Disease status was also not considered in the quota sampling of previous EQ-5D-5L valuation studies, such as those in the US [24] and in China [25].

In line with the recommendation that sample sizes of between 5 and 15 are typical for qualitative research [26], the target sample size for each county was 15. A total of eight interviewers were involved in the data collection, with two in each county. They were junior members of the expert panel, all postgraduate students with experience in conducting surveys. Before the data collection, the interviewers received training from the team leaders. Training covered a briefing on the questionnaire, answers to the interviewers’ queries and pilot interviews conducted by the interviewers with others. Feedback on the interviewing process was collected and used to improve the interviews in the main data collection. Data were collected from September to October 2018 through semi-structured, one-on-one, face-to-face interviews administered by the trained interviewers using a paper-and-pencil questionnaire. The interviews were conducted at the participant’s home or at other locations that were quiet and private. Participants provided written informed consent prior to the interview. During data collection, each participant was allocated a study ID and their responses were recorded under this study ID only. To protect privacy, no personal information was collected and participants’ age was recorded as an age group (18–35 years/36–55 years/56 years and above). Ethical approval was obtained from the Safety and Ethics Committee of the School of Pharmaceutical Science and Technology (SPST2018-03) at Tianjin University.

2.2 Survey Instrument

The survey instrument consisted of two tasks. The first task presented a cognitive debriefing procedure [27] to elicit participants’ feedback on the content of the EQ-5D-5L. In cognitive debriefing, each domain was assessed in terms of comprehensibility, relevance, clarity and comprehensiveness.

Comprehensibility was evaluated by examining whether participants understood the questions as intended. Participants were asked a series of questions about their interpretations, e.g., “could you give an example of usual activities”. For the two composite domains (pain/discomfort and anxiety/depression), one previous study reported that splitting the anxiety/depression domain had a significant effect on self-reported health and health state valuation [28]. Therefore, in this study, participants were asked to explain their interpretation of each term in a composite domain and whether there were differences between the two terms.

Relevance was assessed by examining whether the domains of the EQ-5D were health-related and whether participants’ responses reflected their health condition, using the questions “is the domain relevant to your health” and “do you choose the response level based on your health”.

Clarity was assessed for the response levels. Participants were asked whether they could distinguish the five response levels and whether they could give examples of each level; for example, “could you give an example of moderate problems with mobility”. We also explored the potential benefits or challenges of the increased number of levels in the EQ-5D-5L relative to the EQ-5D-3L. Participants were asked to recall all the response levels after reading the question. We hypothesised that being unable to recall all five levels might indicate that the extra levels posed additional cognitive challenges for participants in distinguishing between the options, which may in turn bias their responses.

Comprehensiveness was assessed with an open-ended question asking whether there were important aspects of health not covered by the five domains of the EQ-5D-5L. The five domains of the EQ-5D were developed based on European populations, so they may not fully capture what matters to the Chinese population, especially rural residents.

In the first task, participants were asked to choose a response level for each domain of the EQ-5D-5L according to their own health condition, both before and after debriefing, to check whether the debriefing procedure helped them to understand and answer the questions. The second survey task required participants to select the most and least important domains related to their health among the five EQ-5D-5L domains. This task could also reflect their understanding of the concepts. Details of the instrument are available in the Appendix (see electronic supplementary material).

2.3 Analysis

All interviews were audio-recorded and transcribed, and then the transcripts were analysed using QSR NVivo 12 software following the content analysis framework [29].

3 Results

3.1 Sample Description

A total of 62 participants were recruited, with 27.4% aged 18–35 years, 51.6% aged 36–55 years and 21.0% aged 56 years and above. Among all participants, 51.6% were female; 25.8% of participants had primary education or lower, 50.0% had junior secondary education and 24.2% had senior secondary education or higher (Table 1). Participants from different regions differed slightly in terms of age, gender and education level (Table 1).
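For readers who want to check the sample description, the reported percentages can be converted back into whole numbers of participants. The sketch below is illustrative only: the counts it produces are implied by the percentages and the total of 62 stated in the text, not taken from Table 1, and the names `reported_pct` and `implied_count` are hypothetical.

```python
N = 62  # total participants reported in the text

# Percentages as reported in the sample description.
reported_pct = {
    "age 18-35": 27.4,
    "age 36-55": 51.6,
    "age 56+": 21.0,
    "female": 51.6,
    "primary or lower": 25.8,
    "junior secondary": 50.0,
    "senior secondary or higher": 24.2,
}


def implied_count(pct, n=N):
    """Nearest whole number of participants for a reported percentage."""
    return round(pct * n / 100)


counts = {label: implied_count(p) for label, p in reported_pct.items()}

# Each implied count should reproduce the reported percentage to one decimal.
for label, k in counts.items():
    assert round(100 * k / N, 1) == reported_pct[label], label

age_total = counts["age 18-35"] + counts["age 36-55"] + counts["age 56+"]
edu_total = (counts["primary or lower"] + counts["junior secondary"]
             + counts["senior secondary or higher"])
print(counts)
print(age_total, edu_total)  # both should equal N if the categories are exhaustive
```

Under these assumptions, the age and education categories each sum to the full sample, which is consistent with no missing data on those characteristics.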

Table 1 Participants’ characteristics

3.2 Debriefing

Before the debriefing task, the mean (standard deviation) EQ-5D-5L score was 0.939 (0.088) using the Chinese EQ-5D-5L value set [25] (Table 1 and Fig. 1a). Pre-debriefing EQ-5D-5L scores differed slightly between regions, with participants from the North reporting the highest score and those from the West the lowest (North: 0.981; South: 0.948; East: 0.925; West: 0.897) (Table 1 and Fig. 1c).

Fig. 1 EQ-5D-5L scores pre- and post-debriefing. a EQ-5D-5L overall score (pre-debriefing). b EQ-5D-5L overall score (post-debriefing). c EQ-5D-5L score by region (pre-debriefing). d EQ-5D-5L score by region (post-debriefing)

The debriefing task took 38.8 min to complete on average, and participants from the West spent the longest time (North: 32.6 min; South: 32.4 min; East: 41.0 min; West: 49.9 min) (Table 1). After debriefing, five (8%) participants (four from the West and one from the East) changed their responses, and the changes were in both directions. Two changed their response for ‘anxiety/depression’ from ‘no’ to ‘slight’ (Fig. 2a); one changed the response for ‘self-care’ from ‘no’ to ‘slight’ and the response for ‘pain/discomfort’ from ‘moderate’ to ‘slight’ (Fig. 2b); one changed the response for ‘anxiety/depression’ from ‘severe’ to ‘no’ (Fig. 2c); and one changed the response for ‘mobility’ from ‘no’ to ‘slight’ and the response for ‘anxiety/depression’ from ‘no’ to ‘slight’ (Fig. 2d). After debriefing, the overall mean EQ-5D-5L score was unchanged, but the score of participants from the East dropped from 0.925 to 0.918 while that of participants from the West increased from 0.897 to 0.907 (Table 1).

Fig. 2 Changes in responses to the EQ-5D-5L pre- and post-debriefing
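The response changes reported in Sect. 3.2 can be tallied by direction. The following is a minimal sketch using only the changes described in the text; the tuple layout and the names `changes`, `direction` and `tally` are hypothetical and chosen for illustration.

```python
# Response levels ordered from best to worst, as in the EQ-5D-5L.
LEVELS = ("no", "slight", "moderate", "severe", "extreme")

# (Fig. 2 panel, number of participants, domain, level before, level after)
changes = [
    ("a", 2, "anxiety/depression", "no", "slight"),
    ("b", 1, "self-care", "no", "slight"),
    ("b", 1, "pain/discomfort", "moderate", "slight"),
    ("c", 1, "anxiety/depression", "severe", "no"),
    ("d", 1, "mobility", "no", "slight"),
    ("d", 1, "anxiety/depression", "no", "slight"),
]


def direction(before, after):
    """Classify a change as a worse, better or unchanged reported level."""
    step = LEVELS.index(after) - LEVELS.index(before)
    return "worse" if step > 0 else "better" if step < 0 else "same"


# Count domain-level changes, weighting each row by its number of participants.
tally = {"worse": 0, "better": 0, "same": 0}
for _panel, n, _domain, before, after in changes:
    tally[direction(before, after)] += n

print(tally)  # {'worse': 5, 'better': 2, 'same': 0}
```

Counted this way, five domain-level responses moved to a worse level and two to a better level, which is consistent with the statement that changes occurred in both directions.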

3.3 Comprehensibility

For the ‘mobility’, ‘self-care’ and ‘usual activities’ domains, all participants reported that they could understand the domains well by giving their interpretations.

“Mobility is the physical activity, such as do house work and go to the farmland.” (18–35 years, male).

“Self-care means that as an adult, I could handle my own life well, not like a child.” (18–35 years, female).

“Usual activities include going to community centres to meet friends, grocery shopping in markets, etc.” (36–55 years, female).

For the ‘pain/discomfort’ domain, all participants could give their interpretations. However, when asked about the differences between the two terms, forty-two (67.7%) participants indicated that although the two terms shared a similar meaning, there were differences between ‘pain’ (Chinese: 疼痛) and ‘discomfort’ (Chinese: 不舒服). These differences mainly related to severity and to the range of experiences that each term covers.

“They are different. If you have pain, you also have discomfort, but discomfort does not necessarily relate to pain, for example, it can be sore waist or bloating.” (36–55 years, female).

“Discomfort can also be emotional, for example, unhappy, but without pain.” (18–35 years, male).

“Pain relates to physical feelings, but discomfort covers a wider range, including emotions.” (36–55 years, male).

For the ‘anxiety/depression’ domain, six (9.7%) participants could not understand ‘anxiety’ (Chinese: 焦虑) and nine (14.5%) participants mentioned they did not understand ‘depression’ (Chinese: 沮丧). This was mainly because the wording was too formal for them to understand. Two (3.2%) participants suggested some revisions to make the question more straightforward, such as using the terms ‘unhappiness’ or ‘sadness’ (Chinese: 不开心/难过).

“Just ask it simply, like whether you are happy or not.” (36–55 years, female).

“I like the question asking whether I am in a good mood rather than anxious or depressed.” (18–35 years, male).

Similar to the ‘pain/discomfort’ domain, thirty-five (56.5%) participants reported differences between ‘anxiety’ and ‘depression’. These differences may confuse them when trying to select the appropriate response level.

“They (anxiety and depression) are different. Anxiety means there is something that puzzles you and you could not stop thinking about it. Depression is in a bad mood, similar to sadness.” (18–35 years, male).

“Anxiety is worry, mainly emotionally, but depression means lack of vitality.” (18–35 years, female).

3.4 Relevance

All participants reported that the five domains were relevant to their health and they chose the response level based on their own health, supporting the use of EQ-5D-5L to measure health status.

“Yes, the domains are related to my health and I can understand them. I chose my answer based on my health.” (18–35 years, male).

3.5 Clarity

When asked whether they could distinguish the five response levels, all participants gave positive answers. When asked to recall the levels, thirty-eight (61.3%) participants could not recall all levels and the majority of them could only recall four levels.

“I only remember some of the items, not all.” (18–35 years, female).

“Eh, no problems at all, moderate problems, severe problems, like these.” (36–55 years, female).

“No problems, severe problems, moderate problems. Could not recall others.” (56 years and above, male).

Five (8.1%) participants suggested reducing the response levels to three or four.

“Three should represent almost all the conditions, that is, no, some and severe.” (18–35 years, male).

“I like four categories, no, some, moderate and severe.” (36–55 years, female).

Furthermore, three (4.8%) participants suggested that the Chinese wording of ‘some’ (一些) should be used instead of ‘only a little’ (一点) because ‘only a little’ in Chinese could not accurately describe the health state between ‘no problem’ and ‘moderate’.

“I always have only slight problems, such as joint pain, but I report no problems. The ‘only a little’ seems useless to me. If my pain gets worse, I think it would be ‘some’ problems, but not as severe as ‘moderate’.” (36–55 years, male).

“I don’t understand ‘only a little’ and we don’t use it here (the region), ‘some’ may be better.” (36–55 years, female).

3.6 Comprehensiveness

When asked about aspects of health not covered by the five domains of EQ-5D, fatigue and appetite were raised by some participants (n = 4, 6.5%).

“Fatigue should be included. If I feel fatigue, I could not do the daily activities as usual.” (56 years and above, male).

“Appetite is important for my health.” (56 years and above, female).

3.7 Importance

In the task of relative importance assessment, all participants provided their answers to the question of the most important domain but two participants did not answer the question of the least important one. The item ‘mobility’ was selected as the most important by 24 (38.7%) participants while ‘anxiety/depression’ was selected as the least important by 37 (59.7%) participants (Table 2).

Table 2 Most and least important domains among EQ-5D-5L

4 Discussion

Through a qualitative study, we assessed the content validity of the Chinese version of EQ-5D-5L among rural Chinese. Some shortcomings were observed, which may affect its appropriateness in this population.

Good comprehensibility was observed for the ‘mobility’, ‘self-care’ and ‘usual activities’ domains, but the ‘pain/discomfort’ and ‘anxiety/depression’ domains may be subject to poor comprehensibility among rural Chinese. First, the majority of participants reported differences between the two terms in each of these composite domains. As respondents commented, for the ‘pain/discomfort’ domain, the Chinese wording for ‘discomfort’ can include emotions, whereas ‘pain’ in Chinese usually refers to physical feelings. These differences may confuse respondents when they are asked to select the appropriate level. For example, a respondent who is sad or angry may choose the ‘slight’ level when considering ‘discomfort’ but ‘no problems’ when considering ‘pain’. Perceived differences were also reported for the ‘anxiety/depression’ domain and may cause similar problems. Furthermore, one previous study conducted in the UK has shown that splitting the ‘anxiety/depression’ terms would significantly affect the responses [28]. Thus, the comprehensibility of the composite terms might be worth further investigation.

Second, culture can affect people’s perceptions of health [30, 31] and there are cultural differences in understanding the same domain [32]. Taking ‘pain’ as an example, culture plays a significant role in pain perceptions, behaviours and expressions [33]. In Chinese culture, people believe that pain is an essential element of life, a ‘trial’ or a ‘sacrifice’ [34]. Thus, a person experiencing pain tends to endure it until it becomes unbearable [34]. Because of these differences, an instrument developed in European populations may not fully capture its intended health concepts when applied among Chinese. Cultural differences should be considered when adapting existing instruments to other cultural and language settings. To our knowledge, no study has investigated differences between countries in how populations understand the same EQ-5D-5L questions. However, previous studies comparing EQ-5D-5L value sets across countries have reported remarkable differences, indicating that people from different cultures may value the same hypothetical health state differently [35, 36]. One possible reason for these differences might be that they understand the same EQ-5D-5L domain differently, as illustrated in this study. Nevertheless, this hypothesis needs to be explored in future studies.

Third, the formal wording used in the ‘anxiety/depression’ domain may also explain the poor comprehensibility. Some respondents reported difficulties in understanding the two terms ‘anxiety’ (Chinese: 焦虑) and ‘depression’ (Chinese: 沮丧). These two Chinese words are seldom used in daily conversations of rural residents, which would make them difficult to understand. Therefore, it might be worth considering using alternative words, which are more commonly used in daily life, such as ‘难过’ (‘sadness’ in Chinese) or ‘不开心’ (‘unhappiness’ in Chinese), as suggested by some participants.

As for the response levels in EQ-5D-5L, all participants reported that they could distinguish between the levels, but most could not recall all of them. Social desirability bias, which refers to the tendency of respondents to give socially desirable responses instead of responses that reflect their true feelings [37], may explain why they all reported no difficulty in differentiating the levels. Being unable to recall all levels might indicate that it is challenging for them to choose one appropriate level from five, which may further decrease the validity of EQ-5D-5L among them. Some preliminary research also found that EQ-5D-3L seems to perform better than EQ-5D-5L for rural residents [38], but which version is more suitable for rural residents needs further investigation. Future studies assessing health-related quality of life in China using the EQ-5D-5L should therefore keep this in mind. Revisions to the level wording were suggested by some participants. Considering that the original response scaling exercise was conducted in Northern China and that spoken language differs between regions, revising the wording of the levels might be worth considering.

We also identified two health-related aspects beyond the EQ-5D-5L domains, ‘fatigue/energy’ and ‘appetite’. Fatigue has been found to be associated with health-related quality of life [39, 40], and a ‘fatigue/energy’ item is included in the most widely used quality-of-life instrument, the Short Form 36 (SF-36) [41]. More importantly, most rural residents in China rely on physical labour to make a living, so energy level is important not only to their health but also to their livelihood. Regarding ‘appetite’, a common Chinese saying holds that ‘man is iron and food is steel’, so appetite is regarded as an indicator of health and this finding is not surprising. These results could inform future research on ‘bolt-on’ dimensions to the EQ-5D. The bolt-on approach aims to address the fact that the current five domains cover only a limited range of health-related quality of life, and it is important to prioritise the domains that matter to the target population.

We found that rural residents in China placed the highest importance on ‘mobility’ and the least importance on ‘anxiety/depression’. As discussed earlier, physical labour is their main source of income, so it is in line with our expectation that the domain most related to physical health would be considered important. Regarding the least important domain, research has shown that rural residents in China lack mental health knowledge [42] and, as a result, may not be aware of the importance of mental health. The formal words used in this domain add further barriers to understanding, so it is not surprising that it was considered the least important.

Compared with the published qualitative studies of EQ-5D [43, 44], this study is the first to evaluate the content validity of EQ-5D-5L among rural Chinese. The results provide some insights for the large-scale application of EQ-5D-5L in China. This study also indicates that urban–rural differences should be noted when measuring health-related quality of life using patient-reported outcome measures. There are several limitations to this study. First, the respondents were recruited from only four regions across China and therefore cannot fully represent rural Chinese. There are large differences in culture, spoken language and health knowledge between regions. As the results showed, participants from different regions reported different pre- and post-debriefing EQ-5D-5L scores. The four counties were near the capital of the province or municipality, so people living in more remote rural areas may have more problems in understanding the EQ-5D-5L. Second, data saturation was not adequately considered when designing this study, so other shortcomings of the EQ-5D-5L may have been missed. Third, the response rate was not recorded, so there might be sampling bias that affects the quality of the survey. Given these limitations, the results should be seen as indicative. Future studies should include participants from more regions to form a more representative sample.

5 Conclusions

The content validity of the Chinese version of the EQ-5D-5L may not be satisfactory among rural residents, as the ‘pain/discomfort’ and ‘anxiety/depression’ domains are subject to poor comprehensibility. There are potential translation inaccuracies in domains and levels. Future EQ-5D-5L-based studies implemented in rural China should keep these problems in mind. It might be sensible to discuss how to improve the current Chinese version and make it more understandable for rural residents.