Key Points for Decision Makers

Including caregiving costs in economic evaluations is necessary in order to avoid the risk of labelling an intervention as cost effective when it actually does not maximize social welfare.

The iCARE tool (https://www.imta.nl/tools/) can be used to include informal care in economic evaluations based on patients’ EQ-5D data in situations where it is not feasible to collect caregiver data.

The iCARE tool enables the inclusion of informal care on the cost side of economic evaluations and should be used for samples that fall within the following age ranges: men (47–104.2 years), women (55–103 years).

1 Introduction

Economic evaluations in healthcare provide decision makers with insights about the relative efficiency of different interventions for producing health. These evaluations compare incremental costs and effects, often expressed in terms of quality-adjusted life-years, to some threshold representing either societal willingness to pay or opportunity costs within the healthcare system. An important consideration in these evaluations concerns the types of costs and effects that are to be included in the analysis [1, 2]. Advocates of a ‘societal perspective’ propose that all costs and effects occurring in society at large need to be considered [3, 4], including those associated with informal caregiving [5,6,7,8,9,10]. Moreover, excluding informal care in economic evaluations can result in non-optimal healthcare decision making [6, 11].

Family and friends often provide a substantial number of hours of care per week to address patients’ care needs [12, 13]. Therefore, healthcare interventions can affect not only patients but also the individuals in their social network [14,15,16,17,18]. A patient’s health can directly influence the health and well-being of the patient’s family or friends, regardless of whether they are involved in caregiving [19,20,21,22]. In addition to this family effect of ill health, family members and friends can experience positive and negative consequences from providing care, known as the caregiving effect [22]. For example, many caregivers derive fulfilment from caregiving, but caregiving is also associated with depressive feelings and physical health problems, and it can be difficult to combine caregiving with other activities, such as paid work or family life [23,24,25,26].

Outcomes for caregivers can be included on either the effect side or the cost side in economic evaluations. On the effect side, caregiver outcomes can be included as health-related quality of life next to patients’ utilities in cost-utility analysis or as a separate outcome in terms of care-related quality of life or carer experiences in multi-criteria analysis [27, 28]. Alternatively, informal care can be addressed as one of the cost components in economic evaluations in the form of the opportunity costs of the time spent on caregiving, typically by multiplying caregivers’ time investment by a value for time [6, 11]. However, due to practical considerations and methodological difficulties in identifying, measuring, and valuing informal care, these costs are usually not included in economic evaluations [6, 11, 29], mainly because most clinical and observational studies only collect data about patients and not about their caregivers [17]. Because effects of interventions on carers are not usually measured, they remain unknown [6], potentially resulting in suboptimal resource allocation recommendations.

Recent research has attempted to predict caregiver outcomes using patient health data—for example, researchers have used data collected for measuring the family impact of meningitis to estimate changes in carers’ health based on changes in patients’ health [30], and have also focused on predicting parents’ health-related quality of life based on health information about their ill children [31]. Although these studies provide compelling examples of how to include caregivers’ health-related quality of life data in economic evaluations, they are rather restrictive in the type of data they use, and they also report poor predictive performance for their statistical models. In addition to these findings, we also investigated the relationship between care-related quality-of-life data of caregivers using the CarerQol instrument (CarerQol-7D) [27] and patient-level data, but we found this relationship to be very weak. To overcome some of these challenges, studies have proposed accounting for informal care on the cost side. For example, some studies [3, 32] have developed statistical models for predicting the number of days of care using patients’ health data. However, using the number of days of care can only provide a general indication of the real informal care costs, since the actual time commitment on each day remains unknown.

This study aimed to develop a statistical model for predicting caregivers’ time investment using patients’ health data, to serve as an input for cost calculations in economic evaluations. The study builds on previous research in several ways. First, we combined several datasets that contain health data for a large variety of patient groups in the Netherlands. This allowed more generalizable results and made it possible to develop a platform that can be used for accounting for informal care in various economic evaluations. Second, we measured informal care costs based on the hours of care during a week rather than days per week, as in previous research, which provides a more accurate indication of informal care costs. Moreover, we used caregiver-reported time investment, while previous research used patients’ reports of their caregivers’ time investment [3, 32]. Hence, we aimed to develop a prediction model for the number of hours of care received by the patient based on information that is routinely available in clinical studies: the health status as measured by the EQ-5D and the gender of the patient. A unique feature of our study is that, based on our prediction model, we have constructed a Microsoft Excel tool, the informal CARE effect (iCARE) tool (https://www.imta.nl/tools/), which researchers can use to estimate hours of care in the absence of data on caregivers. The prediction model and the iCARE tool enable the inclusion of informal care on the cost side of economic evaluations that only have access to patient health data as measured by the EQ-5D.

2 Methods

2.1 Data

This study used four survey datasets with a total of 8012 observations. Here we briefly describe the datasets; the appendix provides a more thorough description (see electronic supplementary material [ESM]). Dataset 1 (n = 6482) is the Older Persons and Informal Caregivers Survey Minimum Data Set (TOPICS-MDS, http://www.topics-mds.eu), a database of 26 research projects on care for older persons in the Netherlands that is part of the National Care for the Elderly Programme (NPO) [33, 34]. Dataset 2 (n = 1244) concerns a heterogeneous group of informal caregivers selected from a representative sample in terms of age and gender of the Netherlands’ general population aged 18 years and over [35]. Dataset 3 (n = 175) was collected with written questionnaires from a heterogeneous group of informal caregivers identified through respite centres in the Netherlands in 2003 [28]. Dataset 4 (n = 111) comes from the Brabant Injury Outcome Surveillance (BIOS) study, a cohort study of hip fracture patients in the Netherlands [36].

In all four datasets, health status was measured with the three-level version of the EuroQoL-5D measure (EQ-5D-3L) developed by the EuroQol Group (http://www.euroqol.org [37]), the most commonly used measure of patient health status in economic evaluations and the instrument of choice of the National Institute for Health and Care Excellence (NICE) for supporting reimbursement decisions [38]. Responses to the EQ-5D-3L questionnaire were provided by either the patients themselves (datasets 1 and 4) or their caregivers (datasets 2 and 3). Depending on the dataset, different questions were used to gather information on the number of informal care hours per week. For datasets 1, 2 and 3, information on the hours of informal caregiving was collected through questions on care activities from the iMTA Valuation of Informal Care Questionnaire (iVICQ [39]). For datasets 1 and 2, three questions were posed regarding the number of informal care hours during the previous week that were spent on household activities (HDL), personal care activities (ADL) and practical support (IADL). In dataset 3, these three different types of activities were split into 16 different care tasks (e.g. preparing food, personal hygiene). In dataset 4, information was collected on six care activities covering HDL, ADL and IADL activities. In all datasets, the total number of informal caregiving hours is the sum of the hours spent on HDL, ADL, and IADL activities during the preceding week. The maximum number of hours of care was set to 126 h/week [27] because, although some caregivers report needing to be on standby 24/7 for care needs, caregivers also need time for other activities, such as their own personal care or sleep.
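The outcome variable described above can be sketched in a few lines of Python. This is only an illustration of the summation and the 126 h/week cap (equivalent to 18 h per day); the function and constant names are hypothetical and are not taken from the study's code.

```python
MAX_HOURS_PER_WEEK = 126.0  # cap stated in the text (126 / 7 = 18 h per day)


def total_care_hours(hdl: float, adl: float, iadl: float) -> float:
    """Sum the weekly hours for the three activity types and apply the cap."""
    for hours in (hdl, adl, iadl):
        if hours < 0:
            raise ValueError("reported hours must be non-negative")
    return min(hdl + adl + iadl, MAX_HOURS_PER_WEEK)


# Example: a caregiver reporting more than 126 h in total is capped.
print(total_care_hours(10, 5, 3))     # 18
print(total_care_hours(80, 40, 30))   # 126.0
```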

Table 1 describes the four datasets and shows that they differ in terms of sample size, age of the care receivers, health of the care receivers as indicated by the EQ-5D-3L, and mean number of hours of care provided by the caregiver. Figure 1 presents histograms of the number of hours of informal care reported for the previous week in each dataset. The distribution is positively (right) skewed in all datasets. All patients in the analysis reported that they received informal care during the previous week. Nonetheless, in some instances, their caregivers reported having provided zero hours of care during the preceding week.

Table 1 Summary statistics for the four datasets used in this study

Fig. 1 Histograms of the four datasets

2.2 Analyses

We developed our model, which predicts the number of hours of informal care based on the EQ-5D-3L items, in four steps. In step 1, we selected the best regression method for our model. In step 2, we pooled the four datasets and selected the best model specification for the data (including testing for interaction terms). In step 3, we estimated the model developed in steps 1 and 2 using a Bayesian approach. Finally, in step 4, we used the results from the Bayesian estimation to develop the iCARE tool, which predicts hours of care (i.e., point estimates for the mean and the entire distribution of 10,000 samples obtained from the Markov chain Monte Carlo [MCMC] simulation). We used the gamlss package in R [40, 41] in steps 1 and 2, and WinBUGS [42] in step 3. Here we briefly describe each step; details appear in the appendix (see ESM).

In step 1, we compared the linear and gamma models using training and validation sets obtained from the four datasets. At this stage, we modelled the number of hours of care as a function of gender and the EQ-5D-3L items. To do this, we trained the model on a pooled dataset formed by combining three of the datasets and then validated it on the remaining dataset (the four datasets thus provided four such training and validation sets). We also developed a training and validation set by randomly splitting in half the pooled dataset obtained from merging the four datasets. To assess the quality of the prediction, we used the mean absolute error (MAE), the mean squared error (MSE), and the root mean squared error (RMSE) [43], and we used the Akaike information criterion (AIC) [44] for model comparison. We developed regression models both with and without intercepts. Although models without an intercept are generally not recommended, as they can introduce bias [45], they are appealing for our setting since it seems reasonable to assume that an individual in full or perfect health as indicated by the best EQ-5D-3L state may require 0 h of informal care. However, we found that the gamma model with an intercept produced the smallest prediction error (see Appendix Table 2 in the ESM).
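The validation scheme and error measures in step 1 can be illustrated with a short Python sketch. It shows only how the four leave-one-dataset-out splits are formed and how MAE, MSE and RMSE are computed from observed and predicted hours; the function names and the example numbers are hypothetical, and the model fitting itself (done with gamlss in R in the study) is not reproduced.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_dataset_out(names: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training dataset names, validation dataset name) pairs."""
    for held_out in names:
        yield [name for name in names if name != held_out], held_out


def prediction_errors(observed: Sequence[float],
                      predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE and RMSE for paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Four datasets give four training/validation splits.
for train, valid in leave_one_dataset_out(["d1", "d2", "d3", "d4"]):
    print(train, "->", valid)

# Made-up hours per week, for illustration only.
print(prediction_errors([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```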

In step 2, we pooled the four datasets and used the gamma model to select the best model specification by comparing models, including interaction terms, for both the mean and the scale parameters of the gamma models. We minimized the generalized AIC (GAIC) for different penalties: \({\text{GAIC}} = -2\,l(\theta) + \# \cdot df\), where \(-2\,l(\theta)\) is the fitted deviance, \(df\) denotes the total degrees of freedom, and \(\#\) is the penalty for each degree of freedom used in the model [41, 46]. We selected the model specification based on the GAIC with penalties \(\# = 2,\,5,\,9\); \(\# = 2\) produces the original AIC [44], \(\# = 9\) approximates the Schwarz Bayesian information criterion (SBC), whose penalty is \(\log n\) (about 9 for our 8012 observations), and we also considered the GAIC with a penalty of 5, a value falling between the AIC and the SBC. We found that the best model specification as indicated by the GAIC with various penalties is one in which the mean parameter is modelled, with a log link, as a function of the EQ-5D-3L dummy variables and the gender of the patient (see Appendix Table 3 in the ESM). In fact, because gender as a main effect as well as in interaction with other variables produced large benefits in terms of model fit, we opted to fit separate models for men and women. We found that no other interactions were significant and that adding model specifications for the scale parameter did not significantly improve the model fit.
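To make the role of the penalty concrete, the following Python sketch computes the GAIC for two candidate specifications under the three penalties. The log-likelihoods and degrees of freedom are invented for illustration and are not the study's values; the helper names are likewise hypothetical.

```python
import math
from typing import Dict, Tuple


def gaic(log_likelihood: float, df: float, penalty: float) -> float:
    """Generalized AIC: fitted deviance (-2 * log-likelihood) plus penalty * df."""
    return -2.0 * log_likelihood + penalty * df


def best_model(fits: Dict[str, Tuple[float, float]], penalty: float) -> str:
    """Return the name of the candidate with the smallest GAIC."""
    return min(fits, key=lambda name: gaic(fits[name][0], fits[name][1], penalty))


# Hypothetical (log-likelihood, degrees of freedom) pairs, for illustration only.
candidates = {
    "main_effects": (-30500.0, 12),
    "with_interactions": (-30480.0, 22),
}

# Penalties: AIC (2), an intermediate value (5), and SBC (log n, roughly 8.99 for n = 8012).
for penalty in (2.0, 5.0, math.log(8012)):
    print(f"penalty {penalty:.2f}: {best_model(candidates, penalty)}")
```

In this made-up example the richer specification wins only under the AIC penalty, which shows why comparing several penalties guards against overfitting.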

To develop a model that provides predictions while offering a suitable measure of uncertainty associated with unknown future events, we used a Bayesian approach in step 3 to estimate probability distributions over future events. We used the MCMC algorithm, which yielded a Monte Carlo sample from the posterior predictive distribution over future quantities. We used the Bayesian estimates in step 4 to develop the iCARE tool, which predicts hours of informal care for a caregiver using the patient’s EQ-5D-3L data. The tool includes estimations of hours of informal care based on both EQ-5D-3L and EQ-5D-5L data, where the 5L data are transformed to 3L data using the probability matrix from the 5L to 3L cross-walk [47]. Note that the predictions of caregivers’ hours of informal care are based on a sample of individuals who all receive care. Since, in many populations, not all individuals receive care, the estimate produced by the toolkit must be corrected for the proportion of patients that receive care before using the estimate in cost-effectiveness analyses. To further exemplify its use, below we apply the toolkit to a previously published cost-effectiveness model for multiple sclerosis and demonstrate the impact of including caregiver burden on the incremental cost-effectiveness ratio.
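The correction for the proportion of patients receiving care can be sketched as follows. The sketch assumes that patients who receive no informal care contribute zero hours, so the population mean is the predicted hours multiplied by that proportion; the function name and the example values are hypothetical and not part of the iCARE tool.

```python
def population_mean_hours(hours_if_receiving_care: float,
                          proportion_receiving_care: float) -> float:
    """Scale predicted weekly hours (conditional on receiving care) to the
    whole patient population, assuming zero hours for those without care."""
    if not 0.0 <= proportion_receiving_care <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    if hours_if_receiving_care < 0:
        raise ValueError("hours must be non-negative")
    return hours_if_receiving_care * proportion_receiving_care


# Illustrative values only: 20 predicted h/week, 25% of patients receiving care.
print(population_mean_hours(20.0, 0.25))  # 5.0
```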

3 Results

Figure 2 shows posterior distributions of the 10,000 MCMC simulation samples for the regression coefficients of the mean hours of care (µ), modelled separately for male and female patients. The regression coefficients consist of an intercept and EQ-5D-3L responses at levels 2 (some problems) and 3 (a lot of problems) relative to the baseline level 1 (no problems) for each EQ-5D-3L question. The distributions of the regression coefficients estimated for level 3 relative to level 1 were wider (illustrating more variation or higher uncertainty) and shifted to the right compared with those estimated for level 2 relative to level 1. For some EQ-5D-3L items, the two distributions largely overlapped. In the model for men, the EQ-5D-3L estimates for levels 2 and 3 overlapped for anxiety, mobility, and pain, while they overlapped for anxiety and self-care in the model for women. In general, the relationship between the hours of informal care and the EQ-5D-3L score was stronger for men than for women.

Fig. 2 Posterior distributions (10,000 MCMC samples) of the regression coefficients of the mean µ for the models for male patients (top) and for female patients (bottom). Coefficients for the EQ-5D item levels 2 (grey lines) and 3 (black lines) are relative to the baseline category level 1

Table 2 presents summary estimates of the regression coefficients (mean with 95% credible intervals, median, MC error) based on the 10,000 MCMC samples. Table 2 shows that, for both men and women, the coefficients at levels 2 and 3 for most EQ-5D-3L questions had a positive sign relative to the item responses indicated at level 1, suggesting that the number of informal care hours increased as the health state indicated by the EQ-5D-3L score worsened. There are exceptions for a few coefficients at level 2, which had a negative sign relative to level 1 (usual activities for men and pain for women); however, these coefficients were not statistically significant and, as indicated by Fig. 2, their posterior distributions overlapped with the coefficient distributions at level 3. Moreover, in both models (men and women), the coefficient for mobility at level 2 was negative, which means that our models cannot distinguish between mobility levels 1 and 2. For these situations, we considered levels 1 and 2 as interchangeable in the iCARE tool and included only changes of level 1 or level 2 relative to level 3.

Table 2 Posterior summary estimates of the regression coefficients for the mean µ on log-link scale
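Because the coefficients are on the log-link scale, a predicted mean is obtained by exponentiating the sum of the intercept and the coefficients for each item reported at level 2 or 3. The Python sketch below shows this for a single coefficient set and for a set of posterior draws, summarised by the mean and a 95% interval. All coefficient values and names are hypothetical; they are not the Table 2 estimates, and the interval here is a simple percentile summary rather than the tool's own calculation.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

DIMENSIONS = ("mobility", "self_care", "usual_activities", "pain", "anxiety")


def linear_predictor(coef: Dict[str, float], state: Sequence[int]) -> float:
    """Intercept plus the coefficient of every item at level 2 or 3.

    Missing coefficients are treated as 0 (level 1 is the baseline)."""
    eta = coef["intercept"]
    for dim, level in zip(DIMENSIONS, state):
        if level not in (1, 2, 3):
            raise ValueError("EQ-5D-3L levels must be 1, 2 or 3")
        if level > 1:
            eta += coef.get(f"{dim}_{level}", 0.0)
    return eta


def posterior_summary(draws: List[Dict[str, float]],
                      state: Sequence[int]) -> Tuple[float, float, float]:
    """Mean and 2.5th/97.5th percentiles of predicted hours over the draws."""
    hours = [math.exp(linear_predictor(d, state)) for d in draws]
    cuts = statistics.quantiles(hours, n=40, method="inclusive")  # 2.5% steps
    return statistics.fmean(hours), cuts[0], cuts[-1]


# Hypothetical coefficients: baseline of 10 h/week, doubled at mobility level 3.
coef = {"intercept": math.log(10.0), "mobility_3": math.log(2.0)}
print(math.exp(linear_predictor(coef, (1, 1, 1, 1, 1))))
print(math.exp(linear_predictor(coef, (3, 1, 1, 1, 1))))
```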

Figure 3 compares posterior hours-of-care predictions (distribution and mean) for individuals in the best health state (11111) and in the worst health state (33333), as indicated by the EQ-5D-3L instrument. For male patients, the predictions of the mean hours of informal care per week varied between 16.0 (95% CI 14.7–17.4) and 52.8 (95% CI 39.4–66.3) for the best (11111) and the worst (33333) health state. For female patients, the variation was between 13.6 (95% CI 12.7–14.5) for the best (11111) and 32.0 (95% CI 26.2–37.8) for the worst (33333) health state. Hence, in line with the raw dataset, the numbers of hours of informal care predicted for men were considerably greater than for women.

Fig. 3 Posterior predictions of mean hours of informal care for the best and the worst health states as indicated by the EQ-5D. The large dots represent the posterior mean estimates

We imported the probability distributions estimated with the Bayesian approach from WinBUGS into Microsoft Excel and developed the iCARE tool, which provides users with the entire predicted probability distributions and the corresponding point estimates of the mean hours of informal care based on the EQ-5D-3L items, with 95% credible intervals.

We used the iCARE tool in a case study to estimate caregiver burden for multiple sclerosis (MS) patients, using data from a previously published cost-effectiveness model [48]. The model had four health states based on the Expanded Disability Status Scale (EDSS): EDSS1 (EDSS 0–2.5), EDSS2 (EDSS 3–5.5), EDSS3 (EDSS 6–7.5), and EDSS4 (EDSS 8–9.5), and two relapse states. Observed EQ-5D-3L data for patients in each of the four EDSS-based health states (n = 1295) were available, and we used these to calculate caregiver hours per week for each health state. Since not all of the MS patients received informal care (especially those in the better EDSS states), and the iCARE algorithm is based on a sample of patients who all receive care, the estimated values had to be corrected for the proportion of patients who received care. For this, we used values for the proportion of patients in each EDSS class who receive care as provided in [49]. We translated the resulting estimate into costs per month using a cost price of €14 per hour of caregiver time, as recommended in the Dutch costing manual [50]. The reference price of 1 h of informal care here is the market value of household activities as stated by the Central Administration Office (CAK) in the Netherlands. Costs and effects were discounted by 3% annually. Table 3 shows that including caregiver burden in this example reduces the lifetime incremental costs of glatiramer acetate versus symptom management from €22,369 to €20,774; consequently, the incremental cost-effectiveness ratio (ICER) was reduced by approximately 10% (from €161,319 to €145,265).

Table 3 Example application incorporating estimated caregiver burden (monthly caregiver costs)
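The arithmetic behind the case study can be sketched in Python. The €14 hourly value and the two ICER figures come from the text; the weeks-per-month conversion (52/12) is an assumption, since the conversion used in the original model is not stated, and the weekly hours in the example are made up.

```python
HOURLY_VALUE_EUR = 14.0          # value of caregiver time stated in the text
WEEKS_PER_MONTH = 52.0 / 12.0    # assumption: conversion not given in the source


def monthly_caregiver_cost(corrected_hours_per_week: float,
                           hourly_value: float = HOURLY_VALUE_EUR,
                           weeks_per_month: float = WEEKS_PER_MONTH) -> float:
    """Monthly cost for weekly hours already corrected for the proportion
    of patients receiving care."""
    return corrected_hours_per_week * hourly_value * weeks_per_month


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# Illustrative input: 6 corrected hours per week in some health state.
print(round(monthly_caregiver_cost(6.0), 2))

# ICER figures reported in the text: a reduction of roughly 10%.
print(f"{relative_change(161_319, 145_265):.1%}")
```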

4 Discussion

This study shows that the relationship between the number of hours of care indicated by a caregiver and the EQ-5D-3L score of the care receiver can successfully be used to develop a prediction model that estimates informal care effects for economic evaluations in clinical studies when these data are unavailable. We developed the iCARE tool to derive predictions (mean, 95% credible intervals and entire distributions based on 10,000 MCMC samples) of hours of informal care using patient-level EQ-5D-3L data.

Our results show that informal care effects are higher for patients with lower health scores, in line with findings of previous studies [3, 32]. Moreover, our results indicate that men required more hours of informal care than women. The predicted mean hours of informal care per week for male patients varied between 16.0 (95% CI 14.7–17.4) for the best (11111) and 52.8 (95% CI 39.4–66.3) for the worst (33333) EQ-5D-3L health state. For female patients, the variation was between 13.6 (95% CI 12.7–14.5) for the best (11111) and 32.0 (95% CI 26.2–37.8) for the worst (33333) health state.

Our study has several noteworthy strengths. First, it contributes to the economic evaluation field by enabling researchers to estimate the effects of interventions on informal carers. Considering the increasing appeal to the public in many countries to contribute to the care of their loved ones, it is important to account for the effects thereof, irrespective of the perspective adopted in the evaluation of healthcare interventions. It is evident that these effects should be included in evaluations from the societal perspective. But considering the mounting evidence of the effects of providing care on the health and wellbeing of informal caregivers, evaluations from a healthcare perspective should also consider examining and presenting these effects alongside the health effects of interventions on patients. After obtaining information on the hours of informal care using the iCARE tool and correcting this estimate for the proportion of the sample that is receiving care, the next step for researchers is to multiply the hours of informal care by a value for time spent on informal care. This value can be derived from health economic manuals or the literature; for example, by using the value of a close market substitute who could be hired to perform the specific caregiving tasks [51] or values from stated preference methods, such as contingent valuation studies or discrete choice experiments [27, 51,52,53]. Second, from a statistical perspective, we have addressed some important challenges associated with the development of the prediction model. We used a variety of datasets in this study that include information on hours of care and the EQ-5D-3L items, allowing us to test the robustness of the prediction models for various populations. Furthermore, we adopted a Bayesian estimation procedure that enabled us to obtain suitable measures of uncertainty through use of the MCMC algorithms. This is important, especially for using the prediction model results in economic evaluations, as it will allow proper propagation of uncertainty in these models.

Our study also has several limitations. First, some of the datasets did not provide self-completed information on the care receiver’s EQ-5D-3L: for two of the four datasets, accounting for about 18% of the pooled dataset used to estimate the prediction model, the EQ-5D-3L questionnaire was completed by the caregiver instead of the care receiver, which may have introduced bias in our analyses. Second, we only had data on the hours provided by the primary caregiver, while patients could have had more than one caregiver and hence may have received more hours of care in total. The model we developed is not conditional on the number of caregivers that a patient might have had but only on patient gender and EQ-5D-3L items. This means that the estimated number of hours of care from the model is independent of the number of caregivers. Third, the four datasets are rather unbalanced in terms of sample size, as they represented 81%, 15.5%, 2.1%, and 1.4% of the pooled dataset. The pooled estimate was therefore dominated by the largest dataset (TOPICS). This might have cancelled some of the diversity introduced by the different datasets, although the largest dataset comprised 26 different studies in a variety of patient populations. However, the model based on the pooled datasets, while reflecting more heterogeneity, has errors similar to those of the model using TOPICS data only and a wider range of predictions, which is potentially beneficial for developing a toolkit that could be used in diverse health technology assessments. Fourth, the model is based on data from the Netherlands only. The validity of the model’s predictions for other countries with different healthcare systems and perhaps different informal caregiving needs and traditions is therefore unclear. Hence, we recommend investigating country-specific tools using local data.

Fifth, we purposefully restricted the prediction model to information that is routinely available in clinical studies. Including only patient EQ-5D and gender was an a priori decision made considering that the main purpose of the tool was to be used in studies in which no information about the caregiving is available. More often than not, information about the caregiver is not available in clinical trials and observational studies; hence, we focused on developing a tool that would be useful in studies where only patient data are available. Besides, we expect that in studies where information about the caregiver is collected, this will likely include information about hours of care and there would therefore be no use for this tool. We also developed models including the relationship between the number of hours of care and the age of the patient; however, we found that this relationship was difficult to interpret and did not improve model fit. In addition, we developed models including other explanatory variables, such as the relationship of the care receiver with the caregiver (partner, son/daughter, other), the age of the caregiver, whether the care receiver is institutionalized, and co-habitation of the care receiver with the caregiver. Of these, only the relationship between the caregiver and the care receiver was significantly associated with the number of hours of care. However, as mentioned, we decided not to include more variables in the model, as such specific information is unlikely to be generally available in clinical trials and this would highly restrict the usability of the tool. Lastly, since the iCARE tool predicts only the mean hours of care through the health status and gender of the patient, the tool is not an appropriate instrument for interventions where other aspects beyond the scope of the health of patients can influence care needs; for example, interventions that target the location of services. For instance, an intervention that reduces the length of a hospital stay will have a major impact on the hours of informal care but may not directly affect the health of the patient, and hence the effect would not be captured by our model.

Without disregarding these important considerations, the iCARE tool can be used to include informal care in economic evaluations based on patients’ EQ-5D data in research situations where it is not feasible to collect caregiver data. Furthermore, although the estimates are based on relatively weak associations, it is important to note that this may pose a problem for predictions at an individual level but much less so for estimating the mean hours of care within a population of carers, as we do here. We recommend using the tool for samples that fall within the age ranges of our samples: men (age range: 47–104.2), women (age range: 55–103).

It is worth noting that there are equity implications from using different models for estimating caregiver burden for male and female patients. However, these are not a consequence of the model developed here, but a consequence of including caregiving costs in health economic models in general, as these will differ between men and women. For example, the Dutch costing manual for productivity losses specifies a different hourly wage for men and women, resulting in equity implications upon inclusion of productivity losses. However, not all countries follow such recommendations. Contrary to the Dutch guidelines, the US guidelines do not recommend the use of gender-specific wage rates because of potential bias. As such, we acknowledge that the model developed here is best suited to countries with recommendations similar to those of the Netherlands. We recommend taking this aspect into consideration when using the tool.

5 Conclusions

Informal caregivers make an important contribution to societal welfare. Including caregiving costs in economic evaluations is necessary in order to avoid the risk of labelling an intervention as cost effective when it actually does not maximize social welfare. This is supported by many national guidelines for cost-effectiveness studies advising researchers to adopt a societal perspective when identifying the relevant costs and effects of an intervention. This study is important because it facilitates the inclusion of informal caregiver effects in economic evaluations that lack this information. However, the algorithm we present is only a ‘second-best’ alternative, as actual data on caregivers will obviously be more accurate than predictions, and therefore preferable. Finally, although one may question the validity of the predictions of a model based on data from the Netherlands when used for other countries, the ranges of caregiver time seem reasonable, and we would argue that it is better to have a reasonable estimate than to completely neglect the effects of interventions on informal carers.