Introduction
Evidence of the cost-effectiveness of new and emerging interventions is commonly used across health systems to assist policy makers in the allocation of scarce health care resources. To conduct cost–utility analysis, the most prevalent form of economic evaluation, a preference-based measure of health-related quality of life (HRQOL) is widely used to facilitate the calculation of quality-adjusted life years (QALYs), a generic measure of effectiveness [1]. Preference-based HRQOL measures provide a single summary score, which also makes them useful outcome measures across multiple settings.
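To illustrate how a single summary utility score feeds into QALYs, the following is a minimal sketch of one common approach, the area under the utility curve computed with the trapezoidal rule. The function name, the measurement times, and the utility values are hypothetical and serve only as an illustration; they are not data or methods from this study.

```python
def qalys_from_utilities(times_in_years, utilities):
    """Approximate QALYs as the area under the utility curve (trapezoidal rule).

    times_in_years: measurement times in years, in ascending order.
    utilities: utility score observed at each measurement time.
    """
    if len(times_in_years) != len(utilities):
        raise ValueError("times and utilities must have the same length")
    if len(times_in_years) < 2:
        raise ValueError("at least two measurement points are needed")

    total = 0.0
    for i in range(1, len(times_in_years)):
        interval = times_in_years[i] - times_in_years[i - 1]
        if interval < 0:
            raise ValueError("times must be in ascending order")
        # Average utility over the interval, weighted by the interval length.
        total += interval * (utilities[i - 1] + utilities[i]) / 2.0
    return total


# Hypothetical child measured at baseline, 6 months, and 12 months.
example_qalys = qalys_from_utilities([0.0, 0.5, 1.0], [0.70, 0.80, 0.80])
print(round(example_qalys, 3))
```

Because the utility at each time point depends on the scoring algorithm applied to the questionnaire responses, the same responses can yield different QALY estimates under different value sets.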
During the last decade, there has been an increased focus on preference-based HRQOL instruments that aim to measure children’s utility. The guidelines for estimating QALYs in youth populations are, however, still unclear [2]. A recent review therefore calls for further empirical evidence on the valuation of youth-specific preference-based measures [3]. A separate review identifies nine preference-based HRQOL instruments that have been used in paediatric populations [4]. Of these, the Child Health Utility 9D (CHU9D) is the only one designed exclusively from its inception for application with young people [5]. Some of the remaining instruments were not originally developed for children and adolescents, while others are adaptations of measures designed for adults.
CHU9D has been demonstrated in several studies to have good content, face, and construct validity for young people aged 7–17 years [6–13]. Furber and Segal [14] examined the face validity, practicality, internal consistency, and convergent validity of CHU9D in a population of clients in South Australian child and adolescent mental health services. The authors concluded that their initial validation of CHU9D showed promising results, but that further validation was needed, including validation of responsiveness, which their cross-sectional design was unable to capture. Responsiveness is critical for a preference-based HRQOL instrument, as its suitability for application in economic evaluation depends on its capacity to reliably detect changes in HRQOL due to the introduction of new interventions. To the best of our knowledge, the responsiveness of CHU9D has not yet been examined.
There are two main scoring algorithms for deriving utilities from CHU9D, one based on an adult population [15] and one based on an adolescent population [16]. The choice of whose values to use in an economic evaluation can have important policy implications because of its potential impact on the QALY estimates and thus on the final incremental cost-effectiveness ratio. The choice could be especially important for interventions aiming to improve mental health, as Ratcliffe and colleagues found that adolescents placed more weight than adults on impairments in the CHU9D dimensions related to mental health (sad, worried, annoyed) [8].
The main objective of this study is to examine the construct validity and responsiveness of the proxy-reported (parent) CHU9D in a mental health setting. This will be the first study to examine the validity of CHU9D using a longitudinal design, and the first study to examine the responsiveness of CHU9D in a mental health context. Furthermore, the examination of construct validity will add to the evidence from Furber and Segal [14] on the appropriateness of using CHU9D in a mental health setting by examining a larger population and by comparing CHU9D with both a mental health-specific measure (the Strengths and Difficulties Questionnaire, SDQ) and a generic HRQOL measure (KIDSCREEN-27). A second objective is to examine whether the utility weights derived from the adult population and those derived from the adolescent population differ in validity and responsiveness in this context.
Discussion
In examining the construct validity of CHU9D in a mental health setting, this study has demonstrated that CHU9D is capable of discriminating between groups with different severity of mental health problems. In all cases, the mean difference between the groups was higher than the MID of 0.03. The utilities derived using the adolescent scoring algorithm did, however, result in substantially larger mean differences between groups. The average mean difference between the low, medium, and high groups was 0.115 across the three measures using adolescent weights, compared with 0.063 using the adult weights. There are several possible explanations for this difference. There are substantial methodological differences between the two sets of preference weights, including the country, the sample sizes, and the elicitation techniques. The difference found is, however, likely to reflect the relatively stronger weight attached to mental health impairments in the adolescent scoring algorithm compared with the adult scoring algorithm [10, 16]. A difference has also been found for CHU9D when comparing adolescent and adult preferences in the same country using the same elicitation methods [8]. When used in a cost–utility analysis (CUA), the choice of preference weights is likely to have a substantial impact on the incremental cost-effectiveness ratio (ICER). Future CUAs of interventions for children with mental health problems could examine the impact of the choice of preference weights on their results by conducting the analyses using both value sets.
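The sensitivity of the ICER to the value set can be sketched as follows. The ICER is the incremental cost divided by the incremental QALY gain, so a larger QALY gain under one set of weights lowers the ICER proportionally. All names and numbers below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qalys == 0:
        raise ValueError("ICER is undefined when the incremental QALY gain is zero")
    return delta_cost / delta_qalys


def icers_by_value_set(delta_cost, qaly_gain_by_value_set):
    """Compute one ICER per value set for the same incremental cost."""
    return {
        name: icer(delta_cost, gain)
        for name, gain in qaly_gain_by_value_set.items()
    }


# Hypothetical incremental QALY gains for the same trial data,
# scored once with adult weights and once with adolescent weights.
hypothetical_gains = {"adult_weights": 0.02, "adolescent_weights": 0.04}

# Hypothetical incremental cost of the intervention (any currency unit).
results = icers_by_value_set(500.0, hypothetical_gains)
for value_set, ratio in results.items():
    print(value_set, round(ratio))
```

In this illustration, doubling the QALY gain halves the ICER, which could move an intervention across a willingness-to-pay threshold without any change in the underlying questionnaire responses.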
For convergent validity, similar results were evident regardless of the weights used to derive CHU9D utilities. CHU9D showed the hypothesized correlations with all measures except the SDQ-TD. Here, a correlation just below 0.3 was found and was therefore categorized as weak. The weak correlation could be due to the difference in scope between CHU9D and SDQ-TD: CHU9D aims to capture the impact of the child's mental and other health-related problems, whereas the SDQ-TD aims to measure the symptoms of the mental health problems. To analyze the correlation further, we compared our item correlations with those found by Furber and Segal [14]. In their study, they highlighted five correlations at the dimension/item level which they argued have a clear conceptual overlap. They found moderate correlations for three of them and weak correlations for two. In comparison, this study found moderate correlations for four of them and a weak correlation for one (correlations are marked in Appendix Table B). Furber and Segal [14] furthermore found correlations above 0.2 between CHU9D utility (adult weights) and 11 of the 20 SDQ items that make up the Total difficulties score, whereas this study found correlations of 0.1 or lower for the same items (correlations are marked in Appendix Table B). In examining these 11 items in our cohort, on average only 12% of respondents indicated that their child was in the worst category. For six of the 11 items (well behaved, one good friend, often fights and bullies, often lies/cheats, picked on/bullied, and steals), fewer than 10% responded in the worst category. These findings suggest lower levels of social and behavioral problems in this specific population and, therefore, less convergence on these domains with a generic measure of HRQOL. Combined with the convergent validity shown against the other measures, these findings make the weak correlation between CHU9D and SDQ-TD less of a concern in relation to convergent validity.
There were neither floor nor ceiling effects for CHU9D. Although 16% of the children reported full health at follow-up, this is likely to reflect the fact that their HRQOL improved after the intervention. The findings from SDQ-I showed that an even higher percentage reported no impact of the mental health problems on daily life at follow-up.
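Floor and ceiling effects are typically assessed from the share of respondents at the lowest and highest attainable score. The sketch below computes these two shares; the function name, the example scores, and the lower bound of 0.33 are hypothetical, and the cutoff above which a share would count as a floor or ceiling effect is deliberately left to the analyst because it is not specified in this section.

```python
import math


def floor_ceiling_shares(scores, min_score, max_score, tol=1e-9):
    """Return (share at floor, share at ceiling) for a list of utility scores.

    min_score and max_score are the lowest and highest attainable utilities
    under the chosen value set; tol guards against floating-point noise.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(
        1 for s in scores if math.isclose(s, min_score, rel_tol=0.0, abs_tol=tol)
    )
    at_ceiling = sum(
        1 for s in scores if math.isclose(s, max_score, rel_tol=0.0, abs_tol=tol)
    )
    return at_floor / n, at_ceiling / n


# Hypothetical follow-up utilities; 1.0 represents full health.
hypothetical_scores = [1.0, 0.9, 0.5, 1.0, 0.33]
floor_share, ceiling_share = floor_ceiling_shares(hypothetical_scores, 0.33, 1.0)
print(floor_share, ceiling_share)
```

Comparing the ceiling share at baseline and at follow-up helps separate a genuine measurement ceiling from improvement following an intervention.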
The analyses of SRM showed acceptable responsiveness of CHU9D regardless of the weights used to derive utilities. In the analyses of change in mean utility, CHU9D was capable of distinguishing between the group of children whose mental health improved and those whose mental health did not. Using the CHU9D adolescent scoring algorithm, we found that the mean differences were considerably larger than when using the adult weights. These results again suggest that the choice of utility weights is likely to have a large impact on the ICER in a cost–utility analysis.
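For reference, the standardized response mean (SRM) is conventionally defined as the mean change score divided by the standard deviation of the change scores. The following minimal sketch uses that standard definition with the sample standard deviation; the function name and the paired utilities are hypothetical, and the thresholds used to judge responsiveness as acceptable are not reproduced here.

```python
import statistics


def standardized_response_mean(baseline, follow_up):
    """SRM = mean of paired change scores / sample SD of the change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    if len(changes) < 2:
        raise ValueError("at least two paired observations are needed")
    sd_change = statistics.stdev(changes)  # sample SD (n - 1 denominator)
    if sd_change == 0:
        raise ValueError("SRM is undefined when all change scores are identical")
    return statistics.mean(changes) / sd_change


# Hypothetical paired utilities for three children.
hypothetical_baseline = [0.6, 0.7, 0.8]
hypothetical_follow_up = [0.7, 0.9, 0.8]
srm = standardized_response_mean(hypothetical_baseline, hypothetical_follow_up)
print(round(srm, 2))
```

Because the SRM is a ratio of mean change to its variability, a value set that scales up both the mean change and its spread can leave the SRM similar even when the absolute utility differences are much larger.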
The different recall periods used by the questionnaires may influence the validation analysis of CHU9D. However, all questionnaires were completed online at the same time, which may have reduced the impact of these differences.
It is beyond the scope of the present study to describe the group differences in change scores. A cost–utility analysis of the intervention using CHU9D will be conducted later and published in a separate article.
This study provides a broad validation for the use of CHU9D in mental health settings, as the participants were children with a broad range of mental health problems, ranging from internalizing to externalizing problems and combinations of the two. The results are, however, limited in their generalizability due to the lack of participants with severe mental disorders. For example, commonly used preference-based HRQOL instruments in adult populations have been shown to be less appropriate in trials with schizophrenia patients [28]. The cross-sectional findings by Furber and Segal [14], in a population that included severe mental disorders, do, however, indicate that CHU9D is also appropriate for use in such populations. A previous study examined the validity of other preference-based HRQOL instruments in a youth population suffering from depressive conditions. A number of the instruments, including non-pediatric ones, showed good construct validity and responsiveness in that study [34]. Future studies should examine whether other non-pediatric preference-based HRQOL instruments show similarly good construct validity and responsiveness in other mental disorders and in younger populations.
This study examined the proxy-reported (parent) version of CHU9D; future studies should also examine the longitudinal validity of the self-reported version of CHU9D.
Conclusions
The findings from this study demonstrate that the proxy-reported (parent) CHU9D is an appropriate preference-based HRQOL measure for use in mental health trials. The inclusion of CHU9D will enable a cost–utility analysis of interventions aiming to improve child and adolescent mental health, and thereby provide valuable evidence for health care resource allocation and decision-making.
The results showed that the preference weights generated from an adolescent population resulted in larger mean differences between groups with different severity of mental health problems, and between the children whose mental health improved, as measured with SDQ and KIDSCREEN, and those whose mental health did not. This finding suggests that the choice of preference weights could have a substantial impact on the results when used in a cost–utility analysis in a mental health setting.