Introduction
Quality-adjusted life years (QALYs) are a popular metric for evaluating the cost-effectiveness of care interventions [1–4]. However, a common evidence gap exists between available clinical measures of effect and the detailed preference-based information (e.g. utility scores) needed to estimate QALYs [5]. Within mental health trials, patient-reported outcome measures (PROMs) like the Patient Health Questionnaire-9 (PHQ-9) and Generalised Anxiety Disorder-7 (GAD-7) are commonly used (often together) to capture depression and anxiety severity, respectively [6–8]. These measures are also routinely collected by mental health services such as Improving Access to Psychological Therapies (IAPT) services (now called NHS Talking Therapies) in England as part of their patient-based performance metrics [6, 8–10]. However, such PROMs do not have preference-based value sets, which are needed to produce cost-per-QALY estimates that can be interpreted relative to thresholds to infer cost-effectiveness, e.g. in England and Wales, the National Institute for Health and Care Excellence’s (NICE’s) £20,000 to £30,000 per QALY threshold [4, 11, 12].
Preference-based PROMs like the EQ-5D three-level (EQ-5D-3L) and five-level (EQ-5D-5L) versions have country-specific preference-based value sets for the estimation of QALYs and are favoured by health technology assessment organisations internationally, including NICE [1–4]. However, existing empirical evidence indicates limitations of the EQ-5D measures in mental health populations and has led to recommendations for a more mental health focussed preference-based measure for mental health service users [13–20]. The Recovering Quality-of-Life 20-item (ReQoL-20) and 10-item (ReQoL-10) measures are two such PROMs, capturing ‘recovery-focussed quality-of-life’ for mental health service users [21]. A UK preference-based value set has been developed to calculate QALYs from seven ReQoL-10 items: the ReQoL Utility Index (ReQoL-UI) [22]. Key differences in ReQoL-UI and EQ-5D-5L design, utility score distributions, psychometric properties, and subsequently estimated QALYs have been assessed and discussed [23, 24].
Preference-based measures like the EQ-5D-5L or ReQoL-UI are frequently absent from clinical studies or routine service data collection, which prevents direct QALY calculation. The term ‘mapping’ is used to describe the process of estimating a statistical relationship between observed clinical outcome measures and preference-based measures using an estimation dataset containing both types of information. The estimated ‘mapping’ model can predict missing preference-based scores for clinical studies or care services based on observed clinical outcome measures. However, the distribution of preference-based scores tends to exhibit characteristics that make standard regression-based models such as linear and Tobit regressions inappropriate for mapping, and their use should be discouraged despite traditionally being common practice [25–27]. Specifically for mapping, adjusted limited dependent variable mixture models (ALDVMMs) were first proposed by Hernández Alava et al. [28] to deal with the distributional features presented by the EQ-5D-3L, with supportive evidence when modelling other preference-based scores such as EQ-5D-5L [26, 29]. Alternative mixture models, such as mixture beta regression models (Betamix), might also have benefits relative to ALDVMMs, depending on the utility scores’ underlying distribution [30–32].
Our overall aim is to map from the GAD-7 and PHQ-9 to the ReQoL-UI or EQ-5D-5L based on ‘best practice’ mapping methods using an estimation dataset obtained from an IAPT-based trial population [24, 33, 34]. To accomplish this aim, we first use ALDVMMs to map from the GAD-7 and PHQ-9 to the ReQoL-UI to enable QALY estimation. Second, the availability of the EQ-5D-5L in the estimation dataset provides an opportunity to investigate previously raised issues around the appropriateness of mapping from PHQ-9 and GAD-7 to generic measures such as the EQ-5D-5L [16]. This second objective is complicated by the fact that EQ-5D-5L responses can be assigned utility scores using country-specific value sets, such as the current EQ-5D-5L value set for England (VSE) or United States value set (USVS), or predicted EQ-5D-3L utility scores using an existing mapping function [35–37]. In England and Wales, NICE does not recommend the VSE and previously recommended the ‘cross-walk’ by van Hout et al. [36]; however, since January 2022, NICE has recommended the mapping function developed by the NICE Decision Support Unit (DSU) instead of the cross-walk [4, 38–40]. Work is ongoing to recommend the most appropriate way to map to the DSU mapping function, which is therefore not included in our analysis. Instead, mapping to three EQ-5D-5L utility scores (i.e. VSE, USVS, and cross-walked) provides additional insights into the suitability of mapping to generic preference-based measures given the marked differences across their distributions [23, 41–43].
Methods
Pre-mapping considerations: conceptual overlap and existing mapping studies
An important pre-mapping consideration suggested by ISPOR guidance is the extent of overlap between the clinical outcome measures and the target preference-based measure/score; if there is little overlap, mapping success is unlikely [34]. Measures’ conceptual and practical overlap can be examined using psychometric methods (for example, assessing correlations and effect sizes) and additional learnings derived from previous mapping studies.
In terms of psychometrics, the evidence supports the EQ-5D measures more strongly in common mental health disorders such as anxiety and depression than in severe disorders like schizophrenia and bipolar disorder [16–19, 51]. Relatedly, the ReQoL-UI’s and EQ-5D-5L’s relative psychometric properties have been assessed in general and mental health populations [24, 52]. Against the PHQ-9 and GAD-7 in IAPT patients, Franklin et al. [24] concluded that the ReQoL-UI has relatively better construct validity with the PHQ-9, whereas the EQ-5D-5L has relatively better construct validity with the GAD-7.
The mapping literature is sparse in this area, limiting the insights that can be obtained. A 2019 systematic review of mapping studies by Mukuria et al. [25] identified a single study focussed on mapping from mental health measures (e.g. PHQ-9 and GAD-7) to preference-based measures (EQ-5D-3L and SF-6D) [16]: Brazier et al. [16] questioned the appropriateness of mapping from mental health measures to generic preference-based measures based on their mapping performance statistics. However, Brazier et al.’s [16] analyses did not include mixture models; rather, they focussed on more traditional OLS, Tobit, and response-level mapping models. One other study ‘mapped’ from the PHQ-9 to the EQ-5D-3L using a non-regression-based approach (i.e. equipercentile linking); however, limited reported results restricted performance assessment of this approach [53–55]. A non-peer-reviewed study mapped from the Health of the Nation Outcome Scales (HoNOS) to the ReQoL-UI, and is the only previous study we identified that mapped to the ReQoL-UI; however, this study only used an OLS model and the HoNOS is clinician-reported rather than patient-reported, which may have contributed to the authors suggesting caution when using their mapping functions.
Estimation data source
The estimation dataset was obtained from a parallel-groups, randomised waitlist-controlled trial examining the effectiveness and cost-effectiveness of internet-delivered Cognitive Behavioural Therapy (iCBT) for patients presenting with depression and anxiety, conducted at an established IAPT service with eligibility criteria described in Appendix S1 [33, 56]. The trial collected PROM data at baseline and 8 weeks across both trial arms; additional data collection time-points for the intervention arm only were at 3, 6, 9, and 12 months. NHS England Research Ethics Committee provided trial ethics approval (REC Reference: 17/NW/0311). The trial was prospectively registered: Current Controlled Trials ISRCTN91967124. The trial is complete, and the protocol and main results have been published [23, 33, 56].
Mapping models
Our mapping of interest is fitting ALDVMMs to the ReQoL-UI and EQ-5D-5L (VSE, USVS, or cross-walk); all utility scores are UK/England specific, apart from the USVS. When the predictions from ALDVMMs were deemed not to fit the observed data sufficiently well, Betamix models were used instead. We used the aldvmm or betamix command within the statistical software package Stata Version 17 [57]. The aldvmm command estimates the variant of the model presented in Hernández Alava et al. [27, 58]. Full instructions on how to use the aldvmm command are described by Hernández Alava and Wailoo [29]. The betamix command is described by Gray and Hernández Alava [31].
ALDVMMs are flexible models that can approximate many distributional forms by combining (mixing) multiple component distributions; each component’s distribution is allowed to have different parameters for the same set of variables (i.e. xvars). Additional probability variables (i.e. pvars) predict the probability of each observation belonging to each component. Betamix models are similar to ALDVMMs in being mixture models; key differences are that they are designed for dependent variables bounded in an interval (i.e. beta distributions are bounded between 0 and 1) and that they offer additional modelling options, such as specifying a probability mass (i.e. pmass) at the lower and upper scores, and at a defined truncation point, of the dependent variable.
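As a simplified illustration of the mixing idea described above, the following Python sketch computes component membership probabilities from the pvars, assuming a multinomial logit form, and then averages the component linear predictors from the xvars. It is not the aldvmm estimator: it omits the adjustments that ALDVMMs make for the bounded and gapped distribution of utility scores, and the function names and all coefficient values are hypothetical placeholders, not estimates from this study.

```python
import math


def membership_probabilities(pvars, gammas):
    """Multinomial-logit probabilities of belonging to each component.

    pvars  : covariate values for one person (intercept term included).
    gammas : one coefficient list per component; by convention the
             reference component has all coefficients set to zero.
    """
    scores = [sum(g * z for g, z in zip(gamma, pvars)) for gamma in gammas]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_mean(xvars, betas, pvars, gammas):
    """Probability-weighted average of the component linear predictors.

    Simplification: no censoring at full health and no gap adjustment,
    so this is NOT the expected value an ALDVMM would report.
    """
    probs = membership_probabilities(pvars, gammas)
    means = [sum(b * x for b, x in zip(beta, xvars)) for beta in betas]
    return sum(p * m for p, m in zip(probs, means))


# Hypothetical two-component example: intercept, PHQ-9 = 10, GAD-7 = 8.
person = [1.0, 10.0, 8.0]
betas = [[0.95, -0.010, -0.005], [0.60, -0.020, -0.010]]   # placeholders
gammas = [[2.0, -0.10, -0.05], [0.0, 0.0, 0.0]]            # placeholders
predicted = mixture_mean(person, betas, person, gammas)
```

In this sketch the same covariates drive both the component means and the membership probabilities, but the two sets may differ, as they do in some of the specifications evaluated here.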
We estimated ALDVMMs (and Betamix when required) with 2–4 components; although it is possible to estimate 1-component models, fitting more than one component tends to improve model fit, so we do not present the 1-component model results. We describe how we moved from 2- to 4-component models in Appendix S1. For all ALDVMMs, we included PHQ-9 summary score (continuous variable), GAD-7 summary score (continuous variable), age (continuous variable), and sex (binary variable) to predict the utility scores within the components; however, we also evaluated models with different variables and specifications. When a Betamix was chosen as preferable, only the PHQ-9 and GAD-7 summary scores were included as the core covariates of interest, given the additional computational time and complications of trying to assess more modelling specifications using Betamix relative to ALDVMMs.
Model fit statistics and graphs
To compare results across models, we considered standard model fit measures/criteria such as absolute mean error (AE), mean absolute error (MAE), root mean square error (RMSE), log likelihood (LL), Akaike information criterion (AIC), Bayesian information criterion (BIC), and graphical methods for model selection in mapping [59]. An AE closer to zero, higher LL, and lower MAE, RMSE, AIC, and BIC indicated a better fit. Graphical methods have been shown to be essential for mapping model selection, as described in Appendix S1 [59]. Because the number of models in this mapping study produced a large number of graphs, we only compare graphs between two models for any given target utility score, after assessing their model fit statistics. Specifically, we plotted the mean of the predicted utility scores against the mean observed values by PHQ-9 and GAD-7 scores. We also simulated data from the models and plotted the cumulative distribution functions (CDFs) comparing simulated with observed data across the severity range.
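The fit measures listed above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for the analysis: it assumes AE is the absolute value of the mean prediction error, uses the standard AIC and BIC formulas, and takes the log likelihood and number of parameters as inputs from a fitted model. The function names are hypothetical.

```python
import math


def fit_statistics(observed, predicted, log_likelihood, n_params):
    """Summary fit measures for one mapping model.

    Assumption: AE is taken as |mean(predicted - observed)|.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "AE": abs(sum(errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "AIC": 2 * n_params - 2 * log_likelihood,
        "BIC": n_params * math.log(n) - 2 * log_likelihood,
    }


def mean_by_score(scores, values):
    """Mean of `values` within each distinct summary score (e.g. PHQ-9).

    Calling this once on observed and once on predicted utilities gives
    the two series compared in the graphical checks.
    """
    groups = {}
    for score, value in zip(scores, values):
        groups.setdefault(score, []).append(value)
    return {s: sum(v) / len(v) for s, v in sorted(groups.items())}
```

A small AE can hide large offsetting errors, which is one reason the MAE, RMSE, and graphical comparisons are considered alongside it.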
Throughout, we followed ISPOR good practice mapping guidance [34]. This guidance does not wholly support the use of internal validation approaches (i.e. splitting the dataset into an estimation and a validation dataset), in part because sample splitting reduces the sample size for estimation and because the extra value of the information these validation analyses provide is uncertain. We therefore opted not to split the dataset for such an internal validation approach [34].
Discussion
Across all mapping models to UK/England utility scores, we selected 4-component models where utility within each component was a function of PHQ-9, GAD-7, age, and sex. For mapping to the ReQoL-UI we selected R6, where the probability of component membership was a function of PHQ-9, GAD-7, and sex. For mapping to the EQ-5D-5L VSE or cross-walk we selected V3 or C3, respectively, where the probability of component membership was a function of PHQ-9, GAD-7, sex, and age. Results pertaining to alternative model specifications are presented in Appendix S2.
For the USVS, the mapping process and results were more complicated. The ALDVMMs did not fit well for higher utility values, such that the proportion of perfect health values (1) implied by the estimated model was too high. Even though moving from 2 to 3 components reduced the proportion of ones, ALDVMMs were unable to match the observed proportion. The problem stemmed from the large probability mass present in the USVS sample distribution just below the gap (see Fig. 1), which would require a degenerate distribution. This is difficult to achieve with the ALDVMM, so we decided to use Betamix, which can generate a separate probability mass at the truncation point.
Predictions from our recommended mapping functions are available in an Excel-based lookup table in the online Supplementary Materials.
Mapping to the USVS relative to the UK/England utility scores
The USVS in our estimation sample caused complications for our identified ALDVMMs that did not occur when mapping to the EQ-5D-5L VSE or cross-walk, nor ReQoL-UI. It should be noted that ALDVMMs are quicker and easier to fit than Betamix; however, Betamix has been developed to have more modelling options and therefore some additional flexibility for mapping than ALDVMMs when required. In this case, it was the ability of Betamix to specify probability mass at the upper (i.e. 1) and truncation (i.e. 0.943) values of the USVS which enabled us to overcome the problems when using ALDVMMs at the upper end of the utility scale, despite the additional computational time and considerations required to fit Betamix relative to ALDVMMs.
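To illustrate why separate probability masses help at the top of the scale, the Python sketch below computes an expected utility when some probability sits exactly at the upper value (1) and at the truncation value (0.943), with the remainder spread over a continuous part. It is a deliberately simplified assumption about how such a prediction decomposes, not the betamix likelihood or its prediction routine; the function name, arguments, and example probabilities are hypothetical.

```python
def expected_utility(p_upper, p_trunc, continuous_mean,
                     upper=1.0, trunc=0.943):
    """Expected utility with point masses at `upper` and `trunc`.

    p_upper         : probability of scoring exactly `upper`.
    p_trunc         : probability of scoring exactly `trunc`.
    continuous_mean : mean of the remaining, continuous part of the
                      distribution (assumed to be supplied by the model).
    """
    p_cont = 1.0 - p_upper - p_trunc
    if min(p_upper, p_trunc, p_cont) < 0.0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return p_upper * upper + p_trunc * trunc + p_cont * continuous_mean


# Hypothetical probabilities, for illustration only.
example = expected_utility(p_upper=0.2, p_trunc=0.1, continuous_mean=0.5)
```

Because the mass at the truncation value is modelled separately from the mass at 1, the model does not have to inflate the proportion of ones to account for observations clustered just below the gap.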
Comparisons with previous mapping studies
We identified three previous mapping studies relevant for comparison with our mapping study from the GAD-7 and/or PHQ-9 to the ReQoL-UI and/or EQ-5D (five- or three-level versions) as part of our pre-mapping considerations to inform our mapping plans.
Brazier et al. [16] included the GAD-7 and PHQ-9 (among other mental health measures) with intentions to map to the EQ-5D-3L and SF-6D. This study used more traditional mapping models (OLS, Tobit, and response-level) rather than more modern and currently recommended mixture models; however, Brazier et al. [16] was published in 2014, before mapping using mixture models gained widespread attention. It is important to note that Brazier et al. [16] never mapped from the GAD-7 and PHQ-9 to the EQ-5D(-3L); rather, they mapped from the GAD-7 and PHQ-9 only to the SF-6D, with an alternative mental health measure (the Hospital Anxiety and Depression Scale, HADS) being used to map to the EQ-5D-3L. This was because the IAPT estimation dataset (one of four datasets) they had available with the PHQ-9 and GAD-7 only included the SF-6D, not the EQ-5D-3L. However, through inference from all the mapping they conducted, their overall conclusion was that “mapping from mental health condition-specific measures, such as the widely used PHQ-9, GAD and HADS, may not be an appropriate approach to generating EQ-5D and SF-6D scores as these measures focus on specific symptoms and not on the wider impact of mental health conditions”. Our current mapping study and associated previous psychometric analysis do not concur with Brazier et al.’s [16] conclusion [24], although our mapping studies are not completely alike (e.g. due to using a different target measure). Possible reasons for this disagreement are that we used more suitable mixture regression models for mapping rather than traditional mapping models (e.g. OLS), which have known limitations; that we used the newer EQ-5D-5L rather than the EQ-5D-3L, which has known shortcomings in mental health populations; and that we mapped from the PHQ-9 and GAD-7 to the EQ-5D-5L (and ReQoL-UI), which this previous study did not [13–20, 25–27].
Furukawa et al. [55] ‘mapped’ from the PHQ-9 to the EQ-5D-3L using a non-regression-based approach (i.e. equipercentile linking); however, Furukawa et al. [55] does not describe itself as a mapping study and thus does not follow any current mapping guidance. The current first author published a correspondence about the study by Furukawa et al. [55], which outlines concerns about the study and the ‘mapping function’ it produced, to which a response was also published [53, 54]. Overall, the study by Furukawa et al. [55] reports few, if any, model performance statistics; thus, comparisons cannot be made with our current mapping study.
Keetharuth and Rowen [60], a non-peer-reviewed article, mapped from the HoNOS to the ReQoL-UI. Although Keetharuth and Rowen [60] follows mapping guidance and is appropriately reported, it has two key limitations: first, only OLS models are used; second, the HoNOS is clinician-reported, so the completer’s perspective differs from that of the ReQoL-UI (i.e. patient-reported), which limits the conceptual overlap between the two measures. Keetharuth and Rowen [60] recognise these limitations and thus recommend caution when using their mapping functions.
Overall, previous mapping studies have not produced mapping functions between our source and target measures, and those mapping studies that are somewhat comparable to ours used more traditional regression-based (e.g. OLS) or non-regression-based (i.e. equipercentile linking) methods rather than the more modern and currently recommended mixture regression models we have used. Our study further emphasises the benefits of using mixture models, with ALDVMMs being a good starting point as they work well for mapping when used appropriately [25–27]. Alternatively, Betamix can overcome the shortcomings of ALDVMMs (e.g. for the USVS in our study); however, Betamix is computationally more complicated and time consuming despite its relative benefits, so ALDVMMs are the preferred starting model, as was the case for this study. Overall, our mapping functions represent a needed tool for predicting utility values from the commonly used PHQ-9 and GAD-7 mental health measures.
Using the alternative predictions: aspects for consideration
Although all our predicted utility scores can be used to estimate QALYs, the source of these utility scores requires careful consideration. Firstly, each of our target utility scores has been shown to produce different QALYs [23]; therefore, it is logical to assume these predictions will produce different QALYs. The EQ-5D-5L is the more commonly used and known preference-based measure, relative to the newer ReQoL-UI. The constructs of these measures are different; although both are suggested to be ‘generic health measures’, the descriptive system of the EQ-5D-5L is more physical health focussed relative to the ReQoL-UI’s more mental health focus. The measures and associated utility scores have also been shown to have different relationships with anxiety and depression as measured by the GAD-7 and PHQ-9, respectively, which will have influenced the mapping models [24].
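One common convention for turning utility scores observed or predicted at several time points into QALYs is the area under the curve with linear interpolation between assessments. The Python sketch below illustrates that convention only; it is an assumption for illustration, not a description of how QALYs were calculated in the comparison cited above, and the function name and example values are hypothetical.

```python
def qalys_area_under_curve(times_in_years, utilities):
    """QALYs as the area under the utility curve (trapezoidal rule).

    times_in_years : assessment times in years, in increasing order.
    utilities      : utility score at each assessment time.
    """
    if len(times_in_years) != len(utilities):
        raise ValueError("times and utilities must have the same length")
    total = 0.0
    for i in range(1, len(times_in_years)):
        width = times_in_years[i] - times_in_years[i - 1]
        if width < 0:
            raise ValueError("times must be in increasing order")
        # Average of the two adjacent utilities times the interval length.
        total += width * (utilities[i] + utilities[i - 1]) / 2.0
    return total


# Hypothetical predicted utilities at baseline, 6 months, and 12 months.
example_qalys = qalys_area_under_curve([0.0, 0.5, 1.0], [0.6, 0.7, 0.8])
```

Applying the same calculation to predictions for each target utility score makes the differences in resulting QALYs across the ReQoL-UI and EQ-5D-5L scores directly visible.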
Use of predicted utility scores: strengths and limitations
The mapping predictions have been estimated from a specific patient population involved in an IAPT-based trial: new IAPT Step 2 service referrals who met the trial eligibility criteria. IAPT Step 2 focusses on specific mental health populations and interventions; i.e. common mental health conditions that could benefit from low-intensity therapies as brief psychological interventions (e.g. digital mental health interventions, bibliotherapy) offered with support from clinicians [61]. Additionally, our data collection time-period covers a 12-month care pathway when the patient is on a waiting list or in treatment, and a post-discharge period. As such, we have less data covering the ‘severe’ spectrum of anxiety and depression (mainly from baseline assessment), and this could explain our mapping models’ poorer performance at the severe end of the scale. Therefore, in mental health populations where ‘severe’ depression and anxiety are more prevalent (e.g. inpatient settings), our mapping functions are prone to higher predictive errors; alternative mapping predictions should be sought in such severe patient populations. For mental health trials wanting to use the predictions, consideration should be given to how representative an IAPT Step 2 population is of their trial population; for example, through comparative assessment against our PROM score distributions in Fig. 1, with additional estimation sample descriptive statistics in Appendix S1.
Conclusion
Our mapping functions can be used to predict either the ReQoL-UI, EQ-5D-5L VSE, USVS or cross-walked utility scores from the PHQ-9 and GAD-7 summary scores. Our analyses found that including more than one component improved model fit, with the preferred ALDVMMs based on 4-component models, and that Betamix was preferred to ALDVMMs when mapping to the USVS only. Our mapping functions can be used in economic evaluations to predict utility as a function of the commonly collected PHQ-9 and/or GAD-7 summary scores.
Acknowledgements
We thank our colleagues at SilverCloud Health for permitting the anonymized trial data to be used for the purposes of analysis; in particular, we thank Derek Richards, Angel Enrique, and Jorge Palacios. We also thank the many patients who volunteered their time and efforts to participate in the trial.