Introduction
An important outcome in trauma care is the health-related quality of life (HRQL) of patients. HRQL reflects a patient’s physical, psychological, and social well-being [1]. This subjective measurement is increasingly used in estimating the impact of an injury, in evaluating the quality of care provided, and in providing patient information on particular injuries [2, 3]. Measuring changes in HRQL over time may additionally help to understand patterns of recovery and the role of rehabilitative care [4, 5].
It is, however, a challenge to establish reliable and valid outcomes for changes of HRQL over time. The best time frame to measure relevant changes may be difficult to define ex ante, data may be incomplete due to censoring (death, withdrawal) or randomly missing values, and the event itself may be unpredictable, which makes prospectively collecting HRQL data difficult or impossible [4]. Retrospective assessment can be used to reconstruct the HRQL at an earlier time point. It is easier to implement and involves less patient burden, but may be confounded by recall bias [6], and response shift may occur [7–9]. Recall bias is defined as a systematic measurement error due to memory decay, which is the fading of memory with time. As a result, patients may remember their HRQL as being better or worse than it actually was [10]. Response shift, on the other hand, is a change in the meaning of a person’s evaluation of a specific construct. This can be caused by a change in internal standards, a change in values, and/or a redefinition of the construct [11, 12]. Among trauma patients, response shift may occur between multiple post-injury HRQL measurements due to patients adapting to their ill health.
Conventionally measured change in HRQL (post-level minus pre-level) may not be identical to the change in HRQL as reported by the patient looking back at the time point of interest (retrospective change). If we take post-level minus pre-level as the gold standard, recall bias will depend on the time interval between the measurement and the recall moment, as bias likely increases with longer intervals between measurements [6]. The presence of recall effects may also depend on the scale used: a visual analogue scale (VAS) with a wide range of response options may be more easily distorted than a classification-like scale with a limited number of response options, like the descriptive system of the EQ-5D-3L [13]. Finally, adequate props and instructions may support retrospective measurement and help avoid the tendency to create emotionally fitting stories (cognitive dissonance reduction) [14].
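The distinction between conventional and retrospective change can be made concrete with a minimal sketch. The function name and all scores below are hypothetical illustrations, not data or code from this study; the sketch only assumes that each patient has a directly reported pre-level, a recalled pre-level, and a post-level.

```python
from statistics import mean


def change_scores(pre_direct, pre_recalled, post):
    """Return (mean conventional change, mean retrospective change, mean recall bias).

    conventional change  = post - directly reported pre-level
    retrospective change = post - recalled pre-level
    recall bias          = recalled pre-level - directly reported pre-level
    """
    if not (len(pre_direct) == len(pre_recalled) == len(post)) or not post:
        raise ValueError("all three score lists must be non-empty and equally long")
    conventional = mean(b - a for a, b in zip(pre_direct, post))
    retrospective = mean(b - r for r, b in zip(pre_recalled, post))
    bias = mean(r - a for a, r in zip(pre_direct, pre_recalled))
    return conventional, retrospective, bias


# Hypothetical EQ-5D summary scores for two patients.
conv, retro, bias = change_scores([0.5, 0.6], [0.4, 0.5], [0.8, 0.8])
print(conv, retro, bias)
```

In this sketch, a negative recall bias (the pre-level is remembered as worse than it was reported at the time) makes the retrospective change larger than the conventional change by exactly the size of the bias.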
Only a few studies, with varying results, have evaluated the correspondence between patient recall of HRQL and directly reported HRQL. Correspondence was poor [intraclass correlation coefficient (ICC) 0.34–0.40] among a sample of elderly hospitalized patients (3-day vs. 38-day assessment). A large proportion of this poor correspondence was attributed to recall bias; the correspondence after adjustment for recall bias was excellent (ICC 0.90–0.98) [15]. Two other studies in patients with prostate cancer found moderate correspondence (ICC 0.39–0.57) between pre-surgery HRQL and recalled pre-surgery HRQL (pre-surgery and 6–37 months post-surgery assessment) [16, 17], and a study in patients with hip arthroplasty found good correspondence between pre-surgery HRQL and HRQL retrospectively assessed at various time points post-surgery (3 days, 6 weeks, and 3 months; ICC 0.70–0.95) [18].
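To illustrate how an ICC summarizes correspondence between directly reported and recalled scores, the following is a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)); this section does not state which ICC form the cited studies used, so that choice, the function name, and the example scores are all assumptions made for illustration.

```python
def icc_absolute_agreement(direct, recalled):
    """ICC(2,1) for two measurements per patient (direct vs. recalled).

    Two-way random effects, absolute agreement, single measurement.
    Undefined (division by zero) if all scores are identical.
    """
    if len(direct) != len(recalled) or len(direct) < 2:
        raise ValueError("need two equally long lists with at least 2 patients")
    rows = list(zip(direct, recalled))
    n, k = len(rows), 2  # n patients, k = 2 measurements each

    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(direct) / n, sum(recalled) / n]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between patients
    msc = ss_cols / (k - 1)             # between measurements (systematic shift)
    mse = ss_err / ((n - 1) * (k - 1))  # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: recall is perfectly consistent but 0.2 lower throughout.
direct = [0.5, 0.6, 0.7, 0.8]
recalled = [s - 0.2 for s in direct]
print(round(icc_absolute_agreement(direct, recalled), 2))  # about 0.45
```

The example shows why a purely systematic shift matters: the recalled scores rank patients identically, yet the absolute-agreement ICC drops well below 1. This is consistent with the observation above that correspondence can improve markedly once a systematic recall bias is adjusted for.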
This study is the first to evaluate the correspondence of directly reported HRQL and recalled HRQL in a heterogeneous sample of trauma patients, with specific attention to predefined subgroups. It compares EQ-5D summary and EQ-VAS scores directly reported at 1 week and 3 months post-injury with recalled scores for 1 week, collected at 3 months and 12 months post-injury, and recalled scores for 3 months, collected at 12 months.
Discussion
This study explored the recall effects of HRQL assessment in a large heterogeneous sample of trauma patients. The results showed that recalled HRQL measured by the EQ-5D-3L and EQ-VAS is systematically lower than the directly reported HRQL of trauma patients, with a general decrease over time. The relative size of measurement error and bias was larger in EQ-5D-3L summary scores than in EQ-VAS scores. Most distortion in recalled HRQL was present in the dimensions anxiety/depression and pain/discomfort. The correspondence between directly reported and recalled scores decreased with the time between measurements, and it was influenced by the post-injury phase being recalled: correspondence was better when T2 (3 months post-injury; recovery phase) was recalled than when T1 (1 week post-injury; acute phase) was recalled. Patients with a major injury and those with a middle level of education had the most difficulty recalling their prior HRQL, whereas patients with a high educational level were in general best at recalling their prior HRQL.
Overall, our study showed fair correspondence between directly reported and recalled HRQL. This is in line with earlier studies on HRQL recall, which showed that the association between recalled HRQL and prospective reports of HRQL was moderate [13]. This was the case in patients with prostate cancer [16, 17] as well as in older hospital patients [15]. Two studies on recall of pre-surgery HRQL in prostate cancer found correlations between 0.39 and 0.57 for scores collected before and 6 to 37 months after surgery [16, 17]. In the study of McPhail et al., elderly hospitalized patients reported their HRQL within 3 days of admission and immediately prior to discharge (median hospital stay of 38 days). This study found poor recall correspondence (ICC of 0.34 for EQ-5D summary score and 0.40 for EQ-VAS) [15]. In contrast to these studies, however, a study in patients with hip arthroplasty found good to excellent correspondence between pre-surgery HRQL scores obtained before surgery and those recalled at 3 days (ICC 0.8–0.9), 6 weeks (ICC 0.7–0.9), and 3 months (ICC 0.85–0.95) post-surgery [18]. Results on recall correspondence are thus scarce and seem to depend on the condition that is being recalled as well as on the time frame between the assessments. Earlier studies investigated the test–retest reliability of the EQ-5D-3L. These studies showed that the reliability of the EQ-5D-3L differed depending on the time frame, on whether the EQ-5D-3L utility or VAS was used, and on the study population, and ranged from 0.70 to 0.85 [28–31]. The correspondence between directly reported and recalled HRQL based on the EQ-5D-3L found in our study is much lower. We expected this, since correspondence between directly reported and recalled HRQL cannot exceed the reliability of the instrument. However, it should be noted that the test–retest reliability of the EQ-5D-3L has not yet been studied in trauma patients, and we were therefore not able to compare the correspondence found against the reliability of the instrument in trauma patients.
Contrary to our hypothesis that a scale with a wider range of response options, like the EQ-VAS, is more easily distorted than a classification-like scale with a limited number of response options, like the EQ-5D-3L [13], our findings showed lower ICC scores for the EQ-5D-3L than for the EQ-VAS. This was also seen in the study of McPhail et al., where the ICC score of the EQ-VAS was higher than that of the EQ-5D summary score (0.40 vs. 0.34) [15]. In view of these results, we reject our hypothesis, as the EQ-VAS seems to be less distorted than the EQ-5D-3L.
The time interval between the initial measurement and the recall moment also influenced the correspondence of recall; however, results were partly in contrast with our hypothesis. As expected, recalled scores of 1 week post-trauma differed more from the directly reported scores when recalled at 12 months post-injury than when recalled at 3 months post-injury. This is in line with earlier studies showing that the correspondence of recall decreases with the time between the initial measurement and the recall moment [10]. However, despite the longer interval of 9 months between the initial assessment at 3 months and the recall assessment at 12 months, the correspondence between T2 and T3 was higher (highest ICC values) than that between T1 and T2. This seems to indicate that, apart from the follow-up time, the post-injury phase also influences the correspondence between directly reported and recalled scores. In the acute phase (1 week post-injury), there are rapid changes in health, which may impede recall, whereas the health state in the recovery phase (3 to 12 months post-injury) may be more comparable to the current health state and therefore easier to remember. These findings merit further study, for example, to see how a 2-year time period affects the recalled outcomes.
Subgroups of patients differed in the degree of correspondence between directly measured and recalled HRQL. As hypothesized, patients with a major trauma (ISS ≥ 16) had lower correspondence. This may be due to the severity of the trauma and possibly also due to the neurologic complications many of them suffered from. The type and severity of injury thus also seem to influence the correspondence of recall. In addition, patients with a middle level of education were among the groups with the lowest correspondence between directly measured and recalled HRQL, whereas correspondence was high among patients with a high level of education. To the best of our knowledge, no other studies have investigated whether the correspondence between directly measured and recalled HRQL differs among subgroups based on level of education.
Our finding that recalled EQ-5D-3L and EQ-VAS scores are systematically lower than the directly reported HRQL of trauma patients may have implications for the application of recalled EQ-5D in cost-effectiveness studies. The EQ-5D-3L is a widely applied HRQL instrument for QALY estimations and in cost-effectiveness analyses. However, systematic bias in retrospective assessment results in larger differences in EQ-5D summary scores between two assessments than directly reported EQ-5D does. This can influence cost-effectiveness analyses, and the use of recalled HRQL assessment can therefore potentially lead to inefficiencies in resource allocation.
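A minimal sketch of how such bias could propagate into a QALY estimate is given below. It assumes QALYs are approximated as the area under the utility curve using the trapezoidal rule, a common approach but not one described in this study; the function name and all utility values are hypothetical, and only the assessment times (1 week, 3 months, 12 months) mirror the study design.

```python
def qaly_trapezoid(times_weeks, utilities):
    """Approximate QALYs as the area under the utility curve (trapezoidal rule).

    times_weeks: assessment times in weeks, in increasing order.
    utilities:   EQ-5D summary scores at those times.
    """
    if len(times_weeks) != len(utilities) or len(times_weeks) < 2:
        raise ValueError("need at least two matching time/utility pairs")
    area_weeks = 0.0
    for i in range(1, len(times_weeks)):
        width = times_weeks[i] - times_weeks[i - 1]
        area_weeks += width * (utilities[i] + utilities[i - 1]) / 2
    return area_weeks / 52.0  # convert utility-weeks to utility-years


# Assessment times: 1 week, 3 months (13 weeks), 12 months (52 weeks).
times = [1, 13, 52]
direct_utilities = [0.40, 0.70, 0.80]    # hypothetical directly reported scores
recalled_utilities = [0.30, 0.65, 0.80]  # hypothetical: earlier scores recalled lower

qaly_direct = qaly_trapezoid(times, direct_utilities)
qaly_recalled = qaly_trapezoid(times, recalled_utilities)
print(round(qaly_direct, 3), round(qaly_recalled, 3))
```

With these hypothetical values, replacing the directly reported 1-week and 3-month scores with lower recalled scores reduces the estimated QALYs for the first year and enlarges the apparent improvement between the first and last assessments.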
Strengths and limitations
This study had several strengths and limitations. Strengths include the sample size, which was large enough to test for differences between subgroups of trauma patients, and the assessment of directly reported and recalled HRQL at several time points and with different time frames between assessments, which allowed us to evaluate both assessment points and follow-up times. Another strength is the inclusion of both the EQ-5D dimensions and the EQ-VAS, which allowed us to compare a classification-like scale with a more subjective scale. Limitations include potential selection and participation bias and the use of the EQ-5D-3L instrument instead of the 5L version. A low proportion (< 10%) of all invited trauma patients participated in the study and filled in the EQ-5D surveys at all assessment points. Therefore, our results may not fully reflect the Dutch trauma population. The EQ-5D-3L, with three answer options per dimension, is less sensitive than the more comprehensive EQ-5D-5L version (five answer options). Recall correspondence is expected to be lower when more answer options are present. It might be valuable to test the recall correspondence of the EQ-5D-5L in future research.
Conclusion
Our study showed that recalled HRQL measured by the EQ-5D-3L and EQ-VAS is systematically lower than the directly reported HRQL of trauma patients, with a general decrease over time. This indicates that recalled HRQL cannot be used as a replacement for prospectively assessed HRQL. If it is difficult or impossible to collect HRQL data prospectively, retrospective assessment is an option; however, when applying retrospective assessment, researchers should be aware that systematic bias may occur. Our study showed better correspondence for the EQ-VAS than for the EQ-5D summary score, indicating that the EQ-5D descriptive system is more prone to systematic bias than the EQ-VAS. In addition, patient characteristics, injury severity, subjectivity of the dimension, and time interval influence the correspondence between directly reported and recalled HRQL.