Plain English summary
Anxiety and depression disorders are ‘common’ mental health disorders, so called because of the high proportion of people they affect. Condition-specific measures reported by those with these disorders exist to reflect condition-specific symptoms and severity. Alternatively, more ‘generic’ measures aim to capture broader aspects of physical and/or mental health in a single measure. Additionally, for allocating finite budgets between alternative care interventions, measure scores that reflect ‘preference’ between alternative health states are recommended. Our study explored to what extent two preference-based measures captured condition-specific aspects of anxiety and depression: the EQ-5D-5L is a commonly used generic measure which focusses more on physical than mental health, whereas the ReQoL-UI is a newer ‘recovery-focussed quality-of-life’ measure which focusses more on mental than physical health. Our findings suggest the commonly used EQ-5D-5L has benefits for capturing anxiety severity and was responsive as condition severity changed over time, but the ReQoL-UI could be recommended over the EQ-5D-5L to better capture depression severity and if ‘recovery-focussed quality of life’ was of interest relative to condition-specific symptoms and severity. Overall, our findings suggest each measure has its role in capturing aspects of anxiety and depression severity, but neither on its own captured the whole broad nature of anxiety and depressive disorders.
Introduction
The 2010 Global Burden of Disease study estimates that depression and anxiety disorders contribute a large portion of total disability amongst all mental health and substance use disorders, with increased societal costs through higher healthcare utilisation and absenteeism from work [1–4]. Mental health disorders have been estimated to represent 23% of the total cause of disability, higher than cancer and coronary heart disease [5]. In England, approximately one in six adults has a common mental disorder [6]. In the UK, the prevalence of depressive and anxiety symptoms is significantly higher relative to pre-COVID-19 pandemic levels [7]. Therefore, prioritising mental health alongside other care interventions is an important consideration for decision-makers.
Economic evaluation evidence helps inform resource allocation between alternative care interventions within a finite budget [8, 9]. Estimating the cost-effectiveness of mental health interventions has become an area of debate [8–10]. One aspect is the empirically demonstrated insensitivity of generic health measures, compared to condition-specific measures, to mental health aspects of health-related quality of life (HRQoL) [9–12]. This includes EuroQol’s internationally used, preference-based EQ-5D three-level version (EQ-5D-3L) used for cost-utility analysis (CUA) and recommended by reimbursement agencies including the National Institute for Health and Care Excellence (NICE) for England and Wales [13, 14]. In CUA, HRQoL measured on a preference-based scale anchored at 1 (full health) and 0 (a state equivalent to dead) is combined with length of life to generate quality-adjusted life years (QALYs), allowing comparisons between interventions that affect quantity and/or quality of life. Evidence on the appropriateness of generic measures in patients with mental health conditions is mixed, with better support in common (e.g. anxiety and depression) relative to severe (e.g. schizophrenia and bipolar disorder) mental health populations [11, 15–17]. Therefore, it has been argued that preference-based measures focussed on the impact of the mental disorder should be considered over generic measures, which often focus more on physical than mental health [17, 18].
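To make the QALY calculation described above concrete, the following is a minimal Python sketch. It assumes the common area-under-the-curve (trapezium) convention for combining preference-based values with time; that convention, the function name `qalys`, and the example values are illustrative assumptions and are not taken from this study.

```python
def qalys(utilities, times_years):
    """Approximate QALYs as the area under the utility curve.

    utilities   -- preference-based values (1 = full health, 0 = dead)
    times_years -- assessment times in years, in ascending order
    Assumption: utility changes linearly between assessments (trapezium rule).
    """
    if len(utilities) != len(times_years) or len(utilities) < 2:
        raise ValueError("need at least two matched utility/time points")
    total = 0.0
    for i in range(1, len(utilities)):
        interval = times_years[i] - times_years[i - 1]
        if interval < 0:
            raise ValueError("times must be in ascending order")
        # Mean utility over the interval multiplied by its duration.
        total += interval * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical example: utility rises from 0.70 to 0.80 over one year.
example = qalys([0.70, 0.80], [0.0, 1.0])
```

Under this convention, a person whose value rises linearly from 0.70 to 0.80 over one year accrues 0.75 QALYs, which is why differences between value sets can feed directly into cost-effectiveness estimates.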
In response to the insensitivities of the EQ-5D-3L, which represents 243 (3^5; three levels for each of five items) possible health states, the EQ-5D five-level version (EQ-5D-5L) has been developed, representing 3125 (5^5) possible health states, with improved sensitivity and reduced ceiling effects [19–27]. Country-specific EQ-5D-5L preference-based value sets are available (https://euroqol.org/), with the current value set for England (VSE) based on a hybrid model combining composite Time Trade-Off (cTTO) and Discrete Choice Experiment (DCE) data for eliciting preferences [28–34]. However, an independent quality assurance study raised concerns about the VSE, with NICE’s interim position being to instead use the cross-walk algorithm by van Hout et al. [35–40]. Therefore, EQ-5D-5L preference-based values can be calculated using ‘cross-walked’/‘mapped’ EQ-5D-3L United Kingdom (UK) value set scores based on the conventional TTO method [41]; however, cross-walk algorithms also have inherent concerns (e.g. predictive errors) [42–45]. EuroQol’s Blog provides updates for the new UK EQ-5D-5L valuation study [46].
The Recovering Quality-of-Life 20-item (ReQoL-20) and 10-item (ReQoL-10) versions have been developed as ‘recovery-focussed quality-of-life’ measures for mental health service users [18]. A UK preference-based value set using the cTTO method can be assigned to seven ReQoL-10 items: the ReQoL Utility Index (ReQoL-UI), representing 78,125 (5^7) possible health states, as an alternative to the EQ-5D-5L for calculating QALYs in mental health service users [47]. The ReQoL-UI’s developers suggest that, compared to the generic EQ-5D measures, it is a generic preference-based measure focussed more on mental than physical health [47]. Initial ReQoL-10 and ReQoL-20 psychometric analyses in a general and patient population (≈ 35%, common mental health problem) supported their internal consistency, test–retest reliability, construct validity, and responsiveness, concluding they performed “markedly better than the EQ-5D[-3L]” [18]; however, such ReQoL-UI evidence does not currently exist.
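The health state counts quoted for the three descriptive systems follow from raising the number of response levels to the power of the number of items. The short Python sketch below reproduces that arithmetic; the function name `n_health_states` is purely illustrative.

```python
def n_health_states(levels: int, items: int) -> int:
    """Number of distinct health states: one of `levels` responses on each of `items` items."""
    return levels ** items


# Levels and item counts as stated in the text above.
eq5d_3l = n_health_states(levels=3, items=5)   # EQ-5D-3L
eq5d_5l = n_health_states(levels=5, items=5)   # EQ-5D-5L
reqol_ui = n_health_states(levels=5, items=7)  # ReQoL-UI (seven ReQoL-10 items)
```

This gives 243, 3125 and 78,125 states respectively, illustrating how much finer-grained the ReQoL-UI descriptive system is than either EQ-5D version.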
Our aim is to assess the psychometric properties (construct validity and responsiveness) of the preference-based EQ-5D-5L (VSE and cross-walk) and ReQoL-UI, compared to clinical measures for depression and anxiety: the Patient Health Questionnaire-9 (PHQ-9) and Generalised Anxiety Disorder-7 (GAD-7), respectively. Secondary psychometric analyses include: (1) the ReQoL-10, to compare its psychometric properties relative to the preference-based measures; and (2) the EQ-5D-5L’s single mental health ‘anxiety/depression’ item, to assess its psychometric properties for depression relative to anxiety severity given limited current evidence [48, 49].
Results
Descriptive statistics
Overall, 361 people were randomised (241 intervention-arm: 120 control-arm). The majority of participants were female (71.5%), ‘White/White British and Irish’ ethnicity (84.2%), employed full-time (74.5%), not prescribed psychiatric medication (51.5%), and not receiving statutory sick pay (93.4%). The M.I.N.I. classified 80.3% as having major depressive (52.4%) or anxiety disorder (64.0%), with 36.0% having both.
Table 2 presents the baseline number of responders and PROM scores across the whole cohort; Table 3 presents these by trial-arm across all time-points. At baseline, the PHQ-9 and GAD-7 had no missing data, with the EQ-5D-5L and ReQoL-10 being completed by 355 (98.3%) and 353 (97.8%) participants, respectively. At follow-up time-points, the number of completed PROMs declined because participants were ‘lost to follow-up’ or ‘excluded’ from the study, or for ‘unknown’ reasons (mainly for the EQ-5D-5L and ReQoL-10). As part of the trial-based analyses, data missing at follow-up were classified as missing at random [56].
Table 2
Outcome measure scores, floor and ceiling effects at baseline across trial-arms
PHQ-9 | 361 (100) | 14.332 | 14 | 4.991 | 27 | 0 | 27 | 2 | 1 (0.3) | 0 (0) |
GAD-7 | 361 (100) | 12.623 | 13 | 4.521 | 21 | 0 | 21 | 0 | 10 (2.8) | 1 (0.3) |
EQ-5D-5L VSE | 355 (98.3) | 0.730 | 0.783 | 0.163 | − 0.285 | 1 | − 0.010 | 1 | 0 (0) | 3 (0.8) |
EQ-5D-5L cross-walk | 355 (98.3) | 0.652 | 0.721 | 0.202 | − 0.594 | 1 | 0.076 | 1 | 0 (0) | 3 (0.8) |
ReQoL-UI | 353 (97.8) | 0.778 | 0.807 | 0.141 | − 0.195 | 1 | 0.115 | 0.995 | 0 (0) | 0 (0) |
ReQoL-10 | 353 (97.8) | 18.598 | 18 | 6.401 | 0 | 40 | 3 | 37 | 0 (0) | 0 (0) |
EQ-5D-5L depression/anxiety | 355 (98.3) | 3.259 | 3 | 0.827 | 5 | 1 | 5 | 1 | 26 (7.3) | 3 (0.8) |
Table 3
Observed PROM scores, number of responders, and standardised response means by trial-arm and time-points
PHQ-9 | t0 | 241 (100) | 14.41 (4.94) | – | – | – | – | – | – | 120 (100) | 14.18 (5.12) | – | – | – |
| t1 | 198 (82) | 9.28 (5.94) | 198 (82) | − 5.21 (5.23) | − 0.997 | 198 (82) | − 5.21 (5.23) | − 0.997 | 91 (76) | 11.58 (5.68) | 91 (76) | − 2.28 (5.13) | − 0.444 |
| t2 | 186 (77) | 8.17 (5.70) | 186 (77) | − 6.34 (5.43) | − 1.168 | 177 (73) | − 0.92 (4.26) | − 0.216 | – | – | – | – | – |
| t3 | 182 (76) | 7.17 (5.83) | 182 (76) | − 7.12 (5.90) | − 1.206 | 169 (70) | − 0.82 (5.20) | − 0.157 | – | – | – | – | – |
| t4 | 177 (73) | 6.81 (5.71) | 177 (73) | − 7.49 (5.94) | − 1.261 | 167 (69) | − 0.42 (4.88) | − 0.086 | – | – | – | – | – |
| t5 | 173 (72) | 6.79 (5.54) | 173 (72) | − 7.56 (6.38) | − 1.184 | 161 (67) | − 0.10 (4.72) | − 0.021 | – | – | – | – | – |
GAD-7 | t0 | 241 (100) | 12.66 (4.69) | – | – | – | – | – | – | 120 (100) | 12.54 (4.18) | – | – | – |
| t1 | 198 (82) | 8.20 (5.31) | 198 (82) | − 4.50 (5.17) | − 0.870 | 198 (82) | − 4.50 (5.17) | − 0.870 | 91 (76) | 10.79 (5.12) | 91 (76) | − 1.63 (4.71) | − 0.345 |
| t2 | 186 (77) | 7.38 (5.32) | 186 (77) | − 5.20 (5.37) | − 0.968 | 177 (73) | − 0.58 (4.04) | − 0.144 | – | – | – | – | – |
| t3 | 182 (76) | 6.93 (5.52) | 182 (76) | − 5.48 (5.98) | − 0.916 | 169 (70) | − 0.28 (4.85) | − 0.057 | – | – | – | – | – |
| t4 | 176 (73) | 6.48 (5.14) | 176 (73) | − 5.95 (5.87) | − 1.013 | 166 (69) | − 0.35 (4.60) | − 0.076 | – | – | – | – | – |
| t5 | 173 (72) | 6.08 (4.81) | 173 (72) | − 6.56 (5.87) | − 1.116 | 160 (66) | − 0.65 (4.35) | − 0.150 | – | – | – | – | – |
EQ-5D-5L VSE | t0 | 238 (99) | 0.735 (0.152) | – | – | – | – | – | – | 117 (98) | 0.722 (0.182) | – | – | – |
| t1 | 198 (82) | 0.794 (0.147) | 196 (81) | 0.058 (0.133) | 0.435 | 196 (81) | 0.058 (0.133) | 0.435 | 91 (76) | 0.756 (0.181) | 89 (74) | 0.029 (0.138) | 0.212 |
| t2 | 186 (77) | 0.816 (0.149) | 184 (76) | 0.078 (0.143) | 0.547 | 177 (73) | 0.020 (0.106) | 0.187 | – | – | – | – | – |
| t3 | 182 (76) | 0.830 (0.171) | 180 (75) | 0.092 (0.168) | 0.544 | 169 (70) | 0.013 (0.141) | 0.095 | – | – | – | – | – |
| t4 | 176 (73) | 0.837 (0.170) | 174 (72) | 0.096 (0.151) | 0.631 | 166 (69) | 0.002 (0.132) | 0.019 | – | – | – | – | – |
| t5 | 172 (71) | 0.814 (0.172) | 170 (71) | 0.075 (0.167) | 0.451 | 159 (66) | − 0.007 (0.159) | − 0.044 | – | – | – | – | – |
EQ-5D-5L cross-walk | t0 | 238 (99) | 0.656 (0.193) | – | – | – | – | – | – | 117 (98) | 0.645 (0.218) | – | – | – |
| t1 | 198 (82) | 0.723 (0.182) | 196 (81) | 0.065 (0.178) | 0.366 | 196 (81) | 0.065 (0.178) | 0.366 | 91 (76) | 0.676 (0.231) | 89 (74) | 0.020 (0.172) | 0.118 |
| t2 | 186 (77) | 0.753 (0.180) | 184 (76) | 0.090 (0.182) | 0.496 | 177 (73) | 0.023 (0.136) | 0.171 | – | – | – | – | – |
| t3 | 182 (76) | 0.767 (0.212) | 180 (75) | 0.105 (0.217) | 0.483 | 169 (70) | 0.019 (0.178) | 0.104 | – | – | – | – | – |
| t4 | 176 (73) | 0.779 (0.204) | 174 (72) | 0.112 (0.196) | 0.573 | 166 (69) | 0.007 (0.165) | 0.040 | – | – | – | – | – |
| t5 | 172 (71) | 0.751 (0.201) | 170 (71) | 0.092 (0.215) | 0.430 | 159 (66) | − 0.009 (0.193) | − 0.046 | – | – | – | – | – |
ReQoL-UI | t0 | 237 (98) | 0.788 (0.123) | – | – | – | – | – | – | 116 (97) | 0.757 (0.171) | – | – | – |
| t1 | 198 (82) | 0.810 (0.140) | 195 (81) | 0.020 (0.141) | 0.138 | 195 (81) | 0.020 (0.141) | 0.138 | 91 (76) | 0.793 (0.163) | 88 (73) | 0.025 (0.139) | 0.181 |
| t2 | 186 (77) | 0.836 (0.151) | 183 (76) | 0.045 (0.152) | 0.295 | 177 (73) | 0.022 (0.139) | 0.162 | – | – | – | – | – |
| t3 | 182 (76) | 0.840 (0.144) | 179 (74) | 0.050 (0.166) | 0.303 | 169 (70) | 0.002 (0.145) | 0.015 | – | – | – | – | – |
| t4 | 176 (73) | 0.863 (0.124) | 173 (72) | 0.070 (0.117) | 0.599 | 166 (69) | 0.019 (0.142) | 0.131 | – | – | – | – | – |
| t5 | 172 (71) | 0.850 (0.135) | 169 (70) | 0.065 (0.138) | 0.471 | 159 (66) | − 0.006 (0.127) | − 0.049 | – | – | – | – | – |
ReQoL-10 | t0 | 237 (98) | 18.52 (6.24) | – | – | – | – | – | – | 116 (97) | 18.76 (6.75) | – | – | – |
| t1 | 198 (82) | 21.00 (6.85) | 195 (81) | 2.20 (6.17) | 0.357 | 195 (81) | 2.20 (6.17) | 0.357 | 91 (76) | 20.25 (6.40) | 88 (73) | 0.93 (6.17) | 0.151 |
| t2 | 186 (77) | 24.10 (7.36) | 183 (76) | 5.37 (7.43) | 0.723 | 177 (73) | 3.04 (6.46) | 0.471 | – | – | – | – | – |
| t3 | 182 (76) | 24.59 (7.90) | 179 (74) | 5.79 (7.95) | 0.728 | 169 (70) | 0.27 (7.52) | 0.036 | – | – | – | – | – |
| t4 | 176 (73) | 25.49 (8.20) | 173 (72) | 6.66 (7.71) | 0.863 | 166 (69) | 0.92 (6.88) | 0.134 | – | – | – | – | – |
| t5 | 172 (71) | 24.54 (8.31) | 169 (70) | 6.00 (7.50) | 0.801 | 159 (66) | − 0.66 (7.35) | − 0.090 | – | – | – | – | – |
EQ-5D-5L depression/anxiety item | t0 | 238 (99) | 3.27 (0.83) | – | – | – | – | – | – | 117 (98) | 3.23 (0.82) | – | – | – |
| t1 | 198 (82) | 2.64 (0.94) | 196 (81) | − 0.62 (0.95) | − 0.649 | 196 (81) | − 0.62 (0.95) | − 0.649 | 91 (76) | 2.91 (1.01) | 89 (74) | − 0.29 (0.87) | − 0.336 |
| t2 | 186 (77) | 2.44 (0.87) | 184 (76) | − 0.78 (0.99) | − 0.784 | 177 (73) | − 0.16 (0.84) | − 0.195 | – | – | – | – | – |
| t3 | 182 (76) | 2.30 (0.96) | 180 (75) | − 0.96 (1.08) | − 0.883 | 169 (70) | − 0.15 (0.97) | − 0.159 | – | – | – | – | – |
| t4 | 176 (73) | 2.23 (0.99) | 174 (72) | − 1.02 (1.06) | − 0.958 | 166 (69) | − 0.08 (0.91) | − 0.086 | – | – | – | – | – |
| t5 | 172 (71) | 2.31 (0.98) | 170 (71) | − 0.95 (1.06) | − 0.897 | 159 (66) | 0.04 (0.90) | 0.049 | – | – | – | – | – |
A CONSORT diagram, further demographic details, histograms and additional score statistics are provided in Appendices S1–4.
Construct validity
Table 4 ACS results and LOWESS graphs (Appendix S5) suggest that the ReQoL-UI/-10 have stronger convergent validity with depression than with anxiety severity. Their convergent validity is stronger than that of the EQ-5D-5L scores for depression severity, but not for anxiety severity. The EQ-5D-5L scores have similar convergent validity with depression and anxiety severity.
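As an illustration of how convergent validity between two measures can be quantified, the sketch below computes a rank-based (Spearman) correlation in plain Python. The choice of Spearman's coefficient is an assumption made for this example, since this section does not state which coefficient was used; the function names and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: higher symptom scores paired with lower utility values.
symptom_scores = [4, 9, 14, 22]
utility_values = [0.90, 0.80, 0.60, 0.20]
rho = spearman(symptom_scores, utility_values)
```

Because higher PHQ-9 and GAD-7 scores indicate worse symptoms while higher utility and ReQoL scores indicate better health, correlations between them are expected to be negative, as in Table 4; it is the magnitude that is compared across measures.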
Table 4
Correlation coefficient matrix between measure scores at baseline
Condition-specific | | | | | |
PHQ-9 | − 0.391 (< 0.001) | − 0.382 (< 0.001) | − 0.529 (< 0.001) | − 0.576 (< 0.001) | 0.346 (< 0.001) |
GAD-7 | − 0.408 (< 0.001) | − 0.411 (< 0.001) | − 0.339 (< 0.001) | − 0.331 (< 0.001) | 0.514 (< 0.001) |
Recovery-focussed | | | | | |
ReQoL-UI | 0.601 (< 0.001) | 0.597 (< 0.001) | – | – | − 0.394 (< 0.001) |
ReQoL-10 | 0.435 (< 0.001) | 0.434 (< 0.001) | 0.818 (< 0.001) | – | − 0.431 (< 0.001) |
Table 5 AES results for caseness cut-offs suggest that the ReQoL-UI/-10 were better at quantifying a difference in depression caseness than in anxiety caseness. The EQ-5D-5L scores performed better than the ReQoL-UI/-10 for anxiety caseness, but still with a small AES. The results for the EQ-5D-5L depression/anxiety item suggest the item is better at quantifying a difference in anxiety caseness than in depression caseness.
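A known-group effect size of this kind can be sketched from group summary statistics alone. The Python example below assumes a standardised mean difference with a pooled standard deviation (Cohen's d); this formula is an assumption for illustration, as the effect size definition is not given in this section. The group sizes are taken from the PHQ-9 caseness rows of Table 5, although the number of ReQoL-10 responders in each group may be slightly smaller.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference (group 1 minus group 2) using the pooled SD."""
    return (mean1 - mean2) / pooled_sd(n1, sd1, n2, sd2)


# ReQoL-10 mean (SD) by PHQ-9 caseness, as reported in Table 5.
d_reqol10 = effect_size(
    mean1=24.219, sd1=6.632, n1=65,    # no caseness, PHQ-9 < 10
    mean2=17.353, sd2=5.646, n2=296,   # caseness, PHQ-9 >= 10
)
```

With these inputs the sketch gives a value of roughly 1.18, close to the 1.177 reported for the ReQoL-10 in Table 5, which suggests (but does not confirm) that a pooled-SD standardised difference underlies the reported values.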
Table 5
Testing known-group validity based on condition-specific cut-off groups at baseline
Condition-specific | | | | | | | | | | |
PHQ-9 | No Caseness, < 10 | 65 (18.0) | 0.785 (0.164) | | 0.712 (0.202) | | 0.854 (0.099) | | 24.219 (6.632) | | 3.092 (0.843) | |
| Caseness, ≥ 10 | 296 (82.0) | 0.718 (0.160) | 0.413 | 0.639 (0.200) | 0.364 | 0.761 (0.143) | 0.688 | 17.353 (5.646) | 1.177 | 3.297 (0.820) | − 0.248 |
| | | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (0.052) |
| Minimal, < 5 | 9 (2.5) | 0.847 (0.095) | | 0.791 (0.107) | | 0.935 (0.036) | | 31.625 (4.274) | | 3.000 (0.707) | |
| Mild, 5–9 | 56 (15.5) | 0.775 (0.171) | 0.440 | 0.699 (0.211) | 0.455 | 0.843 (0.100) | 0.970 | 23.161 (6.240) | 1.399 | 3.107 (0.867) | − 0.126 |
| Moderate, 10–14 | 117 (32.4) | 0.786 (0.115) | − 0.082 | 0.723 (0.140) | − 0.146 | 0.821 (0.089) | 0.230 | 20.104 (4.903) | 0.569 | 2.966 (0.658) | 0.193 |
| Mod. Sev., 15–19 | 120 (33.2) | 0.710 (0.164) | 0.535 | 0.629 (0.203) | 0.541 | 0.751 (0.138) | 0.608 | 16.739 (5.198) | 0.666 | 3.328 (0.832) | − 0.483 |
| Severe, ≥ 20 | 59 (16.3) | 0.599 (0.159) | 0.684 | 0.490 (0.205) | 0.685 | 0.661 (0.177) | 0.594 | 13.186 (4.950) | 0.694 | 3.897 (0.742) | − 0.708 |
| | | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) |
GAD-7 | No Caseness, < 8 | 51 (14.1) | 0.787 (0.126) | | 0.726 (0.157) | | 0.823 (0.077) | | 20.160 (5.811) | | 2.706 (0.807) | |
| Caseness, ≥ 8 | 310 (85.9) | 0.721 (0.166) | 0.409 | 0.640 (0.206) | 0.431 | 0.770 (0.148) | 0.377 | 18.340 (6.465) | 0.285 | 3.352 (0.795) | − 0.811 |
| | | | (0.020) | | (0.016) | | (0.035) | | (0.044) | | (< 0.001) |
| Minimal, < 5 | 11 (3.0) | 0.815 (0.124) | | 0.764 (0.147) | | 0.860 (0.064) | | 21.182 (5.269) | | 2.455 (0.688) | |
| Mild, 5–9 | 81 (22.4) | 0.794 (0.136) | 0.156 | 0.728 (0.179) | 0.207 | 0.826 (0.093) | 0.378 | 21.438 (6.566) | − 0.040 | 2.815 (0.792) | − 0.461 |
| Moderate, 10–14 | 146 (40.4) | 0.752 (0.162) | 0.275 | 0.685 (0.184) | 0.232 | 0.794 (0.124) | 0.283 | 18.851 (5.914) | 0.420 | 3.182 (0.698) | − 0.500 |
| Severe, ≥ 15 | 123 (34.1) | 0.654 (0.155) | 0.615 | 0.552 (0.201) | 0.693 | 0.719 (0.169) | 0.509 | 16.190 (6.064) | 0.445 | 3.725 (0.756) | − 0.749 |
| | | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) |
Recovery-focussed quality of life | | | | | | | | | |
ReQoL-10 | Clin. range, < 24 | 281 (79.6) | 0.707 (0.167) | | 0.624 (0.208) | | 0.746 (0.141) | | 16.224 (4.480) | | 3.361 (0.826) | |
| Gen. Pop., ≥ 24 | 72 (20.4) | 0.823 (0.101) | − 0.743 | 0.766 (0.119) | − 0.733 | 0.899 (0.044) | − 1.199 | 27.861 (3.825) | − 2.672 | 2.861 (0.718) | 0.620 |
| | | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) | | (< 0.001) |
Table 5 AES results for severity cut-offs suggest the EQ-5D-5L scores were better at quantifying a difference between ‘severe’ and the next adjacent severity state than between any other adjacent severity states on the PHQ-9 and GAD-7, and this AES tended to be larger than for the ReQoL-UI/-10 (apart from on the PHQ-9 for the ReQoL-10); however, the ReQoL-UI/-10 had higher AES than the EQ-5D-5L scores between the adjacent less severe states (i.e. PHQ-9, ‘moderately severe’ relative to ‘moderate’; GAD-7, ‘moderate’ relative to ‘mild’). Additionally, on the PHQ-9, the EQ-5D-5L mean scores were greater for those in the ‘moderate’ than the ‘mild’ state, which seems illogical and is not in line with the other measures’ scores.
Due to the small number of people classified as ‘minimal’ (PHQ-9, N = 9; GAD-7, N = 11), these results are not described but are a limitation of the analysis. Complementary construct validity analyses at the item-level and using the IAPT-PS and WSAS measures are presented in Appendices S6 and 7.
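The caseness and severity groupings used in Table 5 can be expressed as simple threshold rules. The Python sketch below encodes the cut-offs shown in the table rows (PHQ-9 caseness at 10 or more, GAD-7 caseness at 8 or more, plus the severity bands); the constant and function names are illustrative only.

```python
# Lower bounds of each severity band, highest first, as listed in Table 5.
PHQ9_BANDS = [(20, "severe"), (15, "moderately severe"), (10, "moderate"),
              (5, "mild"), (0, "minimal")]
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]

# Caseness cut-offs as listed in Table 5.
PHQ9_CASENESS = 10
GAD7_CASENESS = 8


def severity_band(score, bands):
    """Return the label of the first band whose lower bound the score meets."""
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    raise ValueError("score must be non-negative")


def is_case(score, cutoff):
    """True when the score is at or above the caseness cut-off."""
    return score >= cutoff


# Hypothetical participant with PHQ-9 = 14 and GAD-7 = 8.
phq9_band = severity_band(14, PHQ9_BANDS)
gad7_band = severity_band(8, GAD7_BANDS)
both_cases = is_case(14, PHQ9_CASENESS) and is_case(8, GAD7_CASENESS)
```

Note that the caseness cut-off and the severity bands do not coincide for the GAD-7: a score of 8 counts as caseness but falls within the ‘mild’ band.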
Responsiveness
Ceiling effects at baseline (Table 2) and at all time-points by trial-arm (Appendix S4) occurred in a lower proportion of responders for the ReQoL-UI/-10 than EQ-5D-5L.
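Floor and ceiling effects are typically summarised as the percentage of responders scoring at the worst and best possible values of a measure. The following Python sketch assumes that convention (floor at the worst possible score, ceiling at the best); the function name and the example scores are hypothetical.

```python
import math


def floor_ceiling(scores, worst, best):
    """Return (floor %, ceiling %): share of responders at the worst and best possible scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    # isclose guards against floating-point noise in utility values.
    at_worst = sum(1 for s in scores if math.isclose(s, worst))
    at_best = sum(1 for s in scores if math.isclose(s, best))
    return 100 * at_worst / n, 100 * at_best / n


# Hypothetical PHQ-9 scores (range 0-27, higher = more severe).
floor_pct, ceiling_pct = floor_ceiling([27, 14, 0, 9], worst=27, best=0)
```

A measure with a sizeable ceiling effect cannot register further improvement in those already at the best score, which is why a lower ceiling proportion is relevant to responsiveness.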
Tables 3 and 6 SRM results suggest responsiveness differed depending on the time-points being compared with baseline (e.g. largest SRMs at 9 months across all measures). PHQ-9 and GAD-7 responsiveness was generally large. EQ-5D-5L scores were relatively more responsive than the ReQoL-UI across all time-points assessed, but the ReQoL-10 tended to be more responsive than its preference-based counterparts.
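The standardised response mean is conventionally the mean change score divided by the standard deviation of the change scores. The Python sketch below assumes that conventional definition, which is consistent with the mean change (SD) and SRM columns of Table 3; the function names and the paired example data are illustrative.

```python
import statistics


def srm_from_summary(mean_change, sd_change):
    """SRM from reported summary statistics: mean change / SD of change."""
    return mean_change / sd_change


def srm_from_pairs(baseline, follow_up):
    """SRM from paired scores, using only participants observed at both time-points."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)


# PHQ-9 change from t0 to t1 in the intervention arm, as reported in Table 3.
srm_phq9_t1 = srm_from_summary(mean_change=-5.21, sd_change=5.23)

# Hypothetical paired scores for three participants.
srm_example = srm_from_pairs(baseline=[10, 12, 8], follow_up=[6, 7, 5])
```

The first call gives approximately −0.996, matching the reported −0.997 up to rounding of the published inputs. Negative SRMs indicate improvement for the PHQ-9, GAD-7 and EQ-5D-5L item (where lower is better), whereas positive SRMs indicate improvement for the utility and ReQoL-10 scores.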
Table 6
Standardised response means (SRM)—intervention-arm participants grouped dependent on reliable change in PHQ-9 or GAD-7 score since baseline
EQ-5D-5L VSE | t1–t0 | 196 (81) | 85 (43) | 0.102 (0.138) | 0.734 | 109 (56) | 0.029 (0.116) | 0.252 | 109 (56) | 0.088 (0.136) | 0.647 | 79 (40) | 0.021 (0.122) | 0.175 |
| t2–t0 | 184 (76) | 103 (56) | 0.115 (0.134) | 0.856 | 79 (43) | 0.032 (0.141) | 0.231 | 115 (63) | 0.112 (0.131) | 0.855 | 61 (33) | 0.037 (0.141) | 0.259 |
| t3–t0 | 180 (75) | 109 (61) | 0.136 (0.157) | 0.865 | 68 (38) | 0.035 (0.137) | 0.258 | 114 (63) | 0.129 (0.159) | 0.815 | 54 (30) | 0.064 (0.118) | 0.538 |
| t4–t0 | 174 (72) | 114 (66) | 0.129 (0.136) | 0.943 | 58 (33) | 0.044 (0.151) | 0.291 | 118 (68) | 0.123 (0.132) | 0.930 | 49 (28) | 0.054 (0.174) | 0.310 |
| t5–t0 | 170 (71) | 107 (63) | 0.112 (0.141) | 0.794 | 58 (34) | 0.027 (0.185) | 0.145 | 122 (72) | 0.110 (0.139) | 0.792 | 39 (23) | 0.012 (0.200) | 0.061 |
EQ-5D-5L cross-walk | t1–t0 | 196 (81) | 85 (43) | 0.118 (0.182) | 0.648 | 109 (56) | 0.031 (0.156) | 0.201 | 109 (56) | 0.104 (0.182) | 0.573 | 79 (40) | 0.017 (0.165) | 0.105 |
| t2–t0 | 184 (76) | 103 (56) | 0.135 (0.164) | 0.823 | 79 (43) | 0.035 (0.191) | 0.183 | 115 (63) | 0.132 (0.163) | 0.808 | 61 (33) | 0.039 (0.190) | 0.205 |
| t3–t0 | 180 (75) | 109 (61) | 0.153 (0.208) | 0.735 | 68 (38) | 0.052 (0.174) | 0.302 | 114 (63) | 0.149 (0.203) | 0.731 | 54 (30) | 0.080 (0.174) | 0.462 |
| t4–t0 | 174 (72) | 114 (66) | 0.150 (0.170) | 0.884 | 58 (33) | 0.052 (0.215) | 0.243 | 118 (68) | 0.144 (0.178) | 0.812 | 49 (28) | 0.070 (0.205) | 0.340 |
| t5–t0 | 170 (71) | 107 (63) | 0.132 (0.189) | 0.699 | 58 (34) | 0.048 (0.234) | 0.207 | 122 (72) | 0.135 (0.186) | 0.727 | 39 (23) | 0.011 (0.247) | 0.043 |
ReQoL-UI | t1–t0 | 195 (81) | 85 (44) | 0.048 (0.144) | 0.334 | 108 (55) | − 0.001 (0.137) | − 0.009 | 109 (56) | 0.037 (0.143) | 0.262 | 78 (40) | − 0.006 (0.131) | − 0.049 |
| t2–t0 | 183 (76) | 103 (56) | 0.087 (0.134) | 0.646 | 78 (43) | − 0.008 (0.159) | − 0.052 | 115 (63) | 0.074 (0.145) | 0.510 | 60 (33) | 0.005 (0.145) | 0.036 |
| t3–t0 | 179 (74) | 109 (61) | 0.085 (0.159) | 0.533 | 67 (37) | 0.006 (0.158) | 0.037 | 113 (63) | 0.087 (0.155) | 0.563 | 54 (30) | − 0.001 (0.159) | − 0.006 |
| t4–t0 | 173 (72) | 113 (65) | 0.102 (0.096) | 1.059 | 58 (34) | 0.012 (0.129) | 0.096 | 116 (67) | 0.094 (0.103) | 0.911 | 50 (29) | 0.038 (0.125) | 0.303 |
| t5–t0 | 169 (70) | 108 (64) | 0.097 (0.134) | 0.729 | 56 (33) | 0.013 (0.129) | 0.103 | 122 (72) | 0.091 (0.121) | 0.748 | 38 (22) | 0.000 (0.168) | − 0.003 |
ReQoL-10 | t1–t0 | 195 (81) | 85 (44) | 4.506 (7.243) | 0.622 | 108 (55) | 0.500 (4.423) | 0.113 | 109 (56) | 3.385 (6.409) | 0.528 | 78 (40) | 0.590 (5.740) | 0.103 |
| t2–t0 | 183 (76) | 103 (56) | 8.359 (7.097) | 1.178 | 78 (43) | 1.679 (5.839) | 0.288 | 115 (63) | 7.174 (7.430) | 0.966 | 60 (33) | 2.367 (6.460) | 0.366 |
| t3–t0 | 179 (74) | 109 (61) | 8.817 (7.961) | 1.107 | 67 (37) | 1.403 (4.939) | 0.284 | 113 (63) | 8.310 (7.500) | 1.108 | 54 (30) | 2.352 (6.519) | 0.361 |
| t4–t0 | 173 (72) | 113 (65) | 9.124 (7.558) | 1.207 | 58 (34) | 2.155 (5.448) | 0.396 | 116 (67) | 8.586 (7.752) | 1.108 | 50 (29) | 3.160 (6.089) | 0.519 |
| t5–t0 | 169 (70) | 108 (64) | 8.361 (7.245) | 1.154 | 56 (33) | 2.214 (6.050) | 0.366 | 122 (72) | 7.705 (7.159) | 1.076 | 38 (22) | 1.868 (7.018) | 0.266 |
EQ-5D-5L depression/anxiety item | t1–t0 | 196 (81) | 85 (43) | − 1.000 (1.000) | − 1.000 | 109 (56) | − 0.349 (0.786) | − 0.443 | 109 (56) | − 0.881 (1.007) | − 0.875 | 79 (40) | − 0.304 (0.774) | − 0.393 |
| t2–t0 | 184 (76) | 103 (56) | − 1.010 (0.923) | − 1.093 | 79 (43) | − 0.494 (0.998) | − 0.494 | 115 (63) | − 1.035 (0.936) | − 1.106 | 61 (33) | − 0.443 (0.922) | − 0.480 |
| t3–t0 | 180 (75) | 109 (61) | − 1.312 (0.978) | − 1.341 | 68 (38) | − 0.471 (0.969) | − 0.486 | 114 (63) | − 1.272 (0.989) | − 1.286 | 54 (30) | − 0.648 (0.935) | − 0.693 |
| t4–t0 | 174 (72) | 114 (66) | − 1.333 (0.899) | − 1.483 | 58 (33) | − 0.483 (1.047) | − 0.461 | 118 (68) | − 1.280 (0.995) | − 1.286 | 49 (28) | − 0.592 (0.956) | − 0.619 |
| t5–t0 | 170 (71) | 107 (63) | − 1.215 (0.911) | − 1.333 | 58 (34) | − 0.621 (1.089) | − 0.570 | 122 (72) | − 1.230 (0.907) | − 1.355 | 39 (23) | − 0.359 (1.088) | − 0.330 |
Discussion
In terms of preference-based measures and scores used for economic evaluations of interventions for anxiety and depression as common comorbid conditions [73], recommending either the EQ-5D-5L (VSE or cross-walk) or ReQoL-UI to cover the severity range of both conditions does not seem clear-cut based on these results.
For capturing anxiety severity, the recommendation given our findings is to use the EQ-5D-5L rather than the ReQoL-UI. The psychometric properties of the EQ-5D-3L in those with anxiety and depression have previously been assessed; the general results suggest construct validity and responsiveness for depression, but the results for anxiety severity are less convincing [10, 16, 17, 74, 75]. When the EQ-5D-5L has been psychometrically compared against the EQ-5D-3L in study samples including those with depression and/or anxiety, the results have generally suggested the EQ-5D-5L improves on the EQ-5D-3L based on reduced ceiling effects, improved discriminatory power (known-group validity), and improved convergent validity [26, 27]; however, because these studies included people with multiple conditions, used no anxiety- or depression-specific measure, and applied different statistical methods, direct comparison with them is difficult.
When comparing the EQ-5D-5L VSE and cross-walk in our study, they have similar psychometric results. Within the UK, Mulhern et al. [76] compared the UK EQ-5D-3L value set, EQ-5D-5L VSE and cross-walk, concluding that there are important differences: the distributions of the value sets differed systematically (e.g. Appendix S3) and the EQ-5D-5L values were higher than EQ-5D-3L/cross-walk values (e.g. Tables 2 and 3). Despite these identified differences, our psychometric results based on the VSE and cross-walk were similar in terms of construct validity, with better responsiveness for the VSE relative to cross-walked scores. This suggests the preference-based scores may play more of a part in the measures’ responsiveness than in their construct validity, which logically makes sense given the ‘construct’ should stem from the descriptive system but ‘responsiveness’ will be related to the scoring algorithm used.
Compared to the EQ-5D-5L, the ReQoL-UI/-10 clearly have better construct validity with depression than anxiety severity. One explanation could be that the GAD-7 is focussed on anxiety symptomology, whereas the ReQoL-UI/-10, as recovery-focussed quality-of-life measures, depart from symptomology such that by design they would not be capturing similar aspects of anxiety; although, more symptomatic items are included in the ReQoL-20. However, responsiveness was generally small for the ReQoL-UI (medium when the change in GAD-7/PHQ-9 score exceeded the reliable change threshold); smaller than for the EQ-5D-5L scores and ReQoL-10. Direct comparisons with the psychometric assessment which suggested the ReQoL-10 performed “markedly better than the EQ-5D[-3L]” are difficult, but our results suggest the ReQoL-10 generally had better responsiveness and construct validity with depression than the EQ-5D-5L, but not with anxiety severity [18]. The ReQoL-UI’s UK value set and fewer mental health items compared to the ReQoL-10 seem to be reducing its relative responsiveness; however, as this is the first study assessing the ReQoL-UI’s psychometric properties, there are currently no comparable empirical results in the literature.
The EQ-5D-5L’s single mental health ‘anxiety/depression’ item captured anxiety and depression constructs differently, generally having better construct validity with anxiety than depression severity. Previous studies have also suggested that the item does not capture the two constructs equally, albeit suggesting that it captures aspects/changes associated with depression better than anxiety [48, 49]. When assessing convergent validity at the item-level, particularly for the preference-based measures, specific items are potentially driving the convergent validity results before accounting for the influence of the preference-based scores (e.g. EQ-5D-5L’s ‘anxiety/depression’ item with GAD-7 items and score); see Appendix S6.
Reimbursement and policy implications
The results of this study highlight a range of considerations when using and interpreting scores from the EQ-5D-5L and ReQoL-UI (-10), and their subsequent effect on economic evaluation (or clinical assessment) evidence. We shall focus on two implications from a reimbursement and policy perspective, particularly associated with NICE given our focus on England/UK value sets and cross-walk.
First, our results suggest the VSE has marginally better psychometric properties than the NICE-recommended cross-walk for capturing the impact of anxiety and/or depression severity [37]. However, these results are perhaps not sufficient to make NICE change its interim position at this time, given the ongoing debate around the VSE for which there is a new valuation study [35, 36, 39, 40, 46]. Further work is required to understand how the EQ-5D-5L (VSE and cross-walk) and ReQoL-UI impact on QALY and subsequent cost-effectiveness estimates provided to decision-makers (the current authors are assessing this aspect for a future publication).
Secondly, different preference-based measures, value sets and cross-walk algorithms produce different QALYs [77, 78], which is partly behind NICE’s EQ-5D-3L reference case to produce directly comparable results [13, 37]. However, agencies like NICE state alternative preference-based measures can be used if supported by empirical evidence, such as comparative psychometric results [13]. Here we suggest the EQ-5D-5L (VSE and cross-walk) better captures anxiety severity with better responsiveness than the ReQoL-UI. As the EQ-5D is the NICE-preferred measure, there is no suggestion to choose the ReQoL-UI over the EQ-5D-5L if only one can be chosen to capture anxiety severity (whether to use both will be for researchers, patients and public representatives to deliberate, given the extra cognitive burden of additional questions on the patient group of interest). For depression severity, the ReQoL-UI’s better construct validity, despite its poorer responsiveness, may be enough to rationalise its use over the EQ-5D-5L, noting that depression severity will be notably better represented than anxiety severity by the ReQoL-UI and that responsiveness is particularly important for economic evaluation. However, the ReQoL-10 offers both a clinical and preference-based measure, which could capture additional information important to patients, clinicians, and decision-makers.
Additionally, the ReQoL measures are designed to depart from symptomology to broader recovery-focussed quality of life. Although PHQ-9 and GAD-7 measures are used to capture symptomology in IAPT service users with ‘[symptom] recovery’ representing a change from ‘caseness’ to ‘no caseness’, a shift in paradigm to these broader ‘personal recovery’ aspects could change the interpretation of our results if the symptoms and severity aspects captured by the GAD-7 and PHQ-9 were no longer the outcomes of interest for mental health services and users.
International generalisability
The generalisability of our England/UK-based results to other countries requires reflecting on existing between-country considerations of value sets, cross-walk algorithms, and descriptive systems. For example, Gerlinger et al. [79] compared EQ-5D-5L value sets across six different countries (Canada, England, Japan, Korea, Netherlands, and Uruguay) and 10 different cross-walk algorithms: “There were substantial differences in the [value set] utility index between countries in the values attributed to each health state”; a suggestion also made in relation to cross-walked scores. It is difficult to hypothesise exactly how psychometric performance might change between country-specific value sets and cross-walk algorithms, noting that part of the psychometric properties comes from the underlying descriptive system which will remain (hopefully) intact across countries. Although translation and subsequent interpretation of the descriptive system could impact on results, it seems reasonable to suggest the construct validity results, which stem more from the measures’ descriptive system, may be generalisable, but the responsiveness results are more country-specific, depending on the value set and cross-walk algorithm used, while noting the limitation of our indirect comparison [80].
Limitations
The trial-eligible participants represent a specific mental health population referred to IAPT Step 2 care in England; therefore, they do not represent the full range of anxiety and/or depression severity (i.e. few had ‘minimal’ severity, restricting analysis at this level). The sample size is greater than the ‘rules of thumb’ for the analyses conducted [81]; however, larger and more representative samples of patients with depression and anxiety, using alternative measures alongside the GAD-7 and PHQ-9 (e.g. HADS; BDI-II) for comparison and in diverse settings (e.g. secondary mental health care), should be sought. Without a gold standard, indirect methods are used to support the psychometric results. If there is a shift from ‘symptom recovery’ to broader ‘personal recovery’ within mental health services like IAPT, the ReQoL measures purport to measure this construct; condition-specific, more symptom-based measures like the GAD-7 and PHQ-9, which form the basis of this psychometric analysis, may therefore not represent the construct of interest.
Conclusion
ReQoL-UI/-10 had better construct validity with depression severity (PHQ-9) than the EQ-5D-5L (VSE and cross-walk scores), which had relatively better construct validity with anxiety severity (GAD-7) than the ReQoL-UI/-10. EQ-5D-5L score responsiveness (VSE particularly) was better than ReQoL-UI, but worse than ReQoL-10. EQ-5D-5L anxiety/depression item had better construct validity with anxiety than depression severity. There is insufficient evidence to suggest using the ReQoL-UI over EQ-5D-5L for economic evaluations to capture anxiety severity. However, there may be rationale to use the ReQoL-UI to capture depression severity given its better construct validity, albeit poorer responsiveness, and if ‘personal recovery’ relative to change in symptomology is the construct/outcome of interest for mental health services and users.
Acknowledgements
We would like to thank the R&D and clinical team members at Berkshire NHS Foundation Trust service for assisting trial execution: Judith Chapman, Gabriella Clark, and Emma Cole. We thank our colleagues at SilverCloud for providing administrative support and assisting data collection and analysis. We thank Anju Keetharuth, Donna Rowen, and John Brazier at ScHARR, University of Sheffield, for answering our questions regarding the ReQoL measures. We also thank the many patients who volunteered their time and efforts to participate in the trial.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.