Introduction

Health-related quality of life (HRQoL) is an important patient-reported outcome after critical illness. In the intensive care context, several generic profile instruments have been used, such as the SF-36 [1], the Sickness Impact Profile [2], and the Nottingham Health Profile [3]. The EQ-5D has also been used [4–8]; it is a generic utility instrument that enables the calculation of quality-adjusted life years (QALYs) and thus allows the cost-utility of interventions to be compared across medical specialities. Although no HRQoL instrument can claim to be the gold standard, the 2002 Brussels Roundtable Consensus meeting recommended the SF-36 and EQ-5D as the preferred HRQoL instruments in the critical care setting [9], while encouraging further methodological research and instrument design.

Comparisons between utility instruments in the critical care setting are still lacking. The RAND-36 (SF-36) and the EQ-5D have been compared, and the former turned out to have slightly more discriminatory power [10]. However, the EQ-5D is currently the most widely used utility instrument worldwide for the calculation of QALYs [11]. It therefore seemed appropriate to choose it in this study as the comparator to the 15D, the most frequently used utility instrument in Finland, even though a method for deriving a utility score (SF-6D) from SF-36 data has also been introduced [12]. The 15D and EQ-5D have previously been compared in a number of other populations and patient groups [13–15], but not in critical care.

Materials and methods

Patients

The data were collected prospectively at Helsinki University Hospital between 1 January 2003 and 31 December 2004. The study was approved by the local Ethics Committee. Informed consent was obtained from the patients. The study population consisted of 3,600 patients treated in two intensive care units (ICUs) and three high-dependency units (HDUs).

HRQoL instruments

The EQ-5D and 15D were mailed, together with an accompanying letter and an informed consent form, to patients who were alive and had a known address 6 and 12 months after admission to the ICU or HDU. The patients were asked to return the questionnaires and the consent form in a prepaid envelope. In case of non-response, one reminder was sent. Readmissions after the index admission did not start a new follow-up.

The EQ-5D consists of five dimensions: mobility, self-care, usual activities, pain or discomfort, and anxiety or depression. Each dimension is divided into three levels: no problems, some problems and severe problems. Instead of the Finnish VAS-based valuation algorithm used in some earlier Finnish critical care studies [5, 16], we used the UK time-trade-off (TTO) “tariff,” which is the most commonly used valuation system for the EQ-5D. Under this tariff, utility scores range from −0.59 to 1, where 1 means full health and 0 stands for death. No health state can obtain a score between 0.88 and 0.99, and negative scores indicate health states worse than death [17]. The minimal clinically important difference (MID) for the EQ-5D TTO is about 0.08 [18].
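As an illustration of how a tariff of this kind turns a five-digit health state into a utility, the following minimal sketch applies an additive decrement model. The decrement values are those commonly reproduced for the UK TTO value set and are an assumption here: they are not taken from this article and should be verified against the primary source [17] before any real use. The function name `eq5d_uk_tto` and the dimension keys are hypothetical.

```python
# Sketch of an additive EQ-5D (three-level) tariff calculation.
# ASSUMPTION: decrements below are the commonly reproduced UK TTO values;
# check them against the primary source before relying on them.
CONSTANT = 0.081  # subtracted for any state other than full health (11111)
N3 = 0.269        # subtracted once if at least one dimension is at level 3

# Decrement per dimension and level, in questionnaire order.
DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care":          {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities":   {1: 0.0, 2: 0.036, 3: 0.094},
    "pain_discomfort":    {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety_depression": {1: 0.0, 2: 0.071, 3: 0.236},
}
DIMENSIONS = tuple(DECREMENTS)


def eq5d_uk_tto(state: str) -> float:
    """Return the utility of an EQ-5D state given as five digits, e.g. '11211'."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    levels = [int(ch) for ch in state]
    if all(level == 1 for level in levels):
        return 1.0  # full health carries no decrement at all
    score = 1.0 - CONSTANT
    for dimension, level in zip(DIMENSIONS, levels):
        score -= DECREMENTS[dimension][level]
    if 3 in levels:
        score -= N3
    return round(score, 3)
```

Under these assumed values, the worst state (33333) and the mildest non-perfect state (11211) reproduce the lower bound and the gap below full health described above, which is also why the score distribution cannot be continuous near 1.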

The 15D consists of 15 dimensions: breathing, mental function, speech, vision, mobility, usual activity, vitality, hearing, eating, elimination, sleeping, distress, discomfort and symptoms, depression and sexual activity. Each dimension is divided into five levels from no problems to extreme problems. The utility scores of the 15D range from 0 to 1, with 1 being equivalent to full health and 0 to death [19]. The MID of the 15D has been estimated as 0.03 [20].

Statistical analysis

The mean and median scores of the two instruments at 6 months were calculated. Visual inspection of the distribution of the EQ-5D scores suggested that classical parametric tests might not be appropriate for comparing the two sets of scores. Therefore, the Wilcoxon signed-rank test was used to test the difference in medians and distributions. The Spearman rank correlation coefficient was used to measure the association between the two sets of scores at 6 months, and a Bland-Altman plot to describe their agreement. The discriminatory power of the instruments was explored by comparing the proportion of patients obtaining the ceiling score of 1 (ceiling effect). To analyse the agreement in the direction of change of the HRQoL scores between 6 and 12 months, a 3 × 3 matrix was constructed. Changes were classified according to the MID: as negative if the change was ≤−0.08 for the EQ-5D or ≤−0.03 for the 15D, and as positive if the change was ≥0.08 for the EQ-5D or ≥0.03 for the 15D. Other values were classified as unchanged. The McNemar-Bowker test was used to test whether the instruments give a similar picture of the changes (that is, whether the matrix is symmetric), and Cohen’s kappa to quantify the agreement between the instruments. P values <0.05 were considered statistically significant.
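The change analysis described above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the software actually used in the study: the function names are hypothetical, the MID thresholds are those quoted in the text, and the code returns only the Bowker chi-square statistic with its degrees of freedom (a P value would have to come from a chi-square distribution, for example via a statistics package).

```python
# Sketch: classify 6-to-12-month changes by MID, cross-tabulate the two
# instruments, then compute the Bowker symmetry statistic and Cohen's kappa.
MID = {"EQ-5D": 0.08, "15D": 0.03}  # thresholds quoted in the text
CATEGORIES = ("negative", "unchanged", "positive")


def classify_change(change, mid):
    """Label a score change relative to the minimal important difference."""
    if change <= -mid:
        return "negative"
    if change >= mid:
        return "positive"
    return "unchanged"


def cross_tabulate(eq5d_changes, d15_changes):
    """Return a 3 x 3 table: rows are EQ-5D categories, columns 15D categories."""
    index = {name: i for i, name in enumerate(CATEGORIES)}
    table = [[0] * 3 for _ in range(3)]
    for eq_change, d15_change in zip(eq5d_changes, d15_changes):
        row = index[classify_change(eq_change, MID["EQ-5D"])]
        col = index[classify_change(d15_change, MID["15D"])]
        table[row][col] += 1
    return table


def bowker_statistic(table):
    """Return (chi-square statistic, degrees of freedom) of the symmetry test."""
    statistic, df = 0.0, 0
    size = len(table)
    for i in range(size):
        for j in range(i + 1, size):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # empty off-diagonal pairs carry no information
                statistic += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return statistic, df


def cohen_kappa(table):
    """Return unweighted Cohen's kappa for a square agreement table."""
    size = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(table[i]) for i in range(size)]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)
```

A call such as `cohen_kappa(cross_tabulate(eq5d_changes, d15_changes))` takes two equally long lists of per-patient score changes. Changes that sit exactly on a threshold are sensitive to floating-point rounding, so real data may need rounding to the instruments' reported precision first.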

Results

Both the 6- and 12-month questionnaires were returned by 998 patients (38% of the 2,600 patients alive). However, 69 of the EQ-5D questionnaires were not filled in completely, leaving 929 patients for the final analysis. Of them, 31% had been treated in an ICU and 69% in an HDU. For characteristics of the study population, see the electronic supplementary material.

The distributions of the EQ-5D and 15D scores at 6 months are presented in Fig. 1. The distribution of the EQ-5D scores was wide, three-peaked and discontinuous, whereas that of the 15D was one-peaked and continuous. The rank correlation between the two sets of scores was 0.811 (p < 0.001). The mean (median) utilities at 6 months were 0.832 (0.859) and 0.731 (0.760) for the 15D and EQ-5D, respectively. The medians and distributions were different (p < 0.001), and the agreement between the sets was poor (Bland-Altman plot in the electronic supplementary material). The EQ-5D detected fewer health states than the 15D (at 6 months 79 vs. 767, at 12 months 70 vs. 745). Using the EQ-5D, 26% of patients had a ceiling score of 1 compared to 6% for the 15D. Regarding the clinically important change in the HRQoL scores between 6 and 12 months, the instruments had the same direction of change in 53% of the patients. The EQ-5D showed no change in 61% of patients, the 15D in 46%. Overall, the instruments gave a different picture of the changes (Bowker 53.9, p < 0.001), and the agreement between them was only fair (kappa 0.24, 95% CI 0.19–0.29; Table 1).
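The two agreement-related quantities reported above, the Bland-Altman comparison and the ceiling effect, can be sketched with the standard library alone. The function names are hypothetical, and the conventional 1.96 standard deviation limits of agreement are an assumption: they presuppose roughly normally distributed differences, which may not hold for a discontinuous score distribution such as that of the EQ-5D.

```python
# Sketch: Bland-Altman bias with limits of agreement, and ceiling-effect share.
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired scores.

    Bias is the mean of the paired differences a - b; the limits are the
    conventional bias +/- 1.96 sample standard deviations of the differences.
    """
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


def ceiling_share(scores):
    """Return the proportion of respondents with the maximum score of 1."""
    return sum(1 for score in scores if score == 1.0) / len(scores)
```

Here `scores_a` and `scores_b` would be the per-patient utilities from the two instruments at the same time point. Wide limits relative to the MID values indicate that the two scores cannot be used interchangeably for an individual patient.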

Fig. 1 The distributions of the EQ-5D and 15D scores at 6 months

Table 1 The direction of the minimal clinically important change (MID) in the HRQoL scores between 6 and 12 months

Discussion

Patient-reported outcomes such as HRQoL have gained increased importance as measures of effectiveness of health care. Of the HRQoL instruments used in the assessment of critical care, only a few produce a utility score necessary for the calculation of QALYs. These include the EQ-5D and 15D, which we compared in this large cohort study. Regardless of the statistical test used, the agreement between the instruments was only moderate.

The main reason for this may be the wide, three-peaked and discontinuous distribution of the TTO-based EQ-5D scores with a high ceiling effect. Because of these features of the EQ-5D scores, classical methodology used for comparisons and evaluation of agreement between scores may not be applicable. The existence of negative utilities implies that HRQoL could be improved by dying, which is not logically congruent with the objectives of care, although from an ethical and economic point of view, a permanent non-independent health state may be worse than death. The distribution of the 15D scores (all positive values) was one-peaked and continuous and, importantly, only 6% of the patients evaluated their health state as perfect at 6 months after critical illness, indicating better discriminatory power of the 15D in minor health problems. The EQ-5D detected fewer health states and fewer clinically important changes in HRQoL than the 15D. In this light, the 15D appeared more sensitive than the EQ-5D in terms of discriminatory power and responsiveness to clinically important change, although in the absence of a gold standard, it is not possible to say which instrument is “right.” In these respects our results agree with those obtained previously in a number of populations and patient groups [13–15].

Strengths and limitations

One strength is that HRQoL was measured simultaneously with both instruments. The comparison therefore directly reveals where the instruments differ and where they agree. In addition, the patient population was large (despite the response rate of 38%). In general, the lack of baseline HRQoL scores may be seen as a limitation, but in a comparison of two instruments, as in our study, it does not pose a problem. ICU and HDU patients were analysed together, but as only 34% of the sample belonged to the ICU subset, our results do not necessarily reveal how the instruments compare when applied exclusively to ICU patients.

Conclusion

In conclusion, the 15D may be more sensitive than the EQ-5D in terms of discriminatory power and responsiveness to clinically important change. The agreement between the EQ-5D and 15D utilities was only moderate. As utilities differ depending on the HRQoL instrument used, the results of cost-utility studies in the critically ill are difficult to compare. Large cohort studies are warranted to produce sufficient comparative evidence on utility measures, including the SF-6D and the 15D, before a gold standard for utility measurement can be named.