## Introduction

Economic evaluations that use Quality Adjusted Life Years (QALYs) as the measure of effectiveness, are often used to inform decisions regarding the efficient allocation of limited healthcare resources [1]. QALYs combine the years of life gained due to an intervention with the health-related quality of life (HRQoL) of those years gained, into a single index [2]. QALYs are calculated using the formula – years of life (survival) x utility value (quality of life) [3]. Preference-based instruments are used to calculate the utility component of QALY, and contain two main components: (a) a multi-attribute descriptive system that includes a set of ‘question and response’ categories that attempt to describe an individual’s health and (b) the utility formula or algorithm that converts the responses into an index of utility on a 0.00 (death) and 1.00 (perfect health state) scale [4].

Disease-specific measures of quality of life are more sensitive to the symptoms of a particular disease [5]. For example, a disease-specific questionnaire for heart disease will measure the severity of symptoms related to heart disease more accurately than a generic measure [6, 7]. However, most disease-specific quality-of-life instruments are non-preference-based and cannot directly generate utility estimates. Non-preference-based instruments are characterized by their ability to measure, but not value, health states, which limits their use in economic evaluations. Until recently, there were no heart disease-specific preference-based instruments; thus, generic instruments such as the EQ-5D were widely used [8]. To address the need for a heart disease-specific preference-based measure, we developed a new heart disease-specific classification system, MacNew-7D, in 2022 [9, 10].

Heart diseases are considered to be one of the leading causes of morbidity and mortality worldwide. Heart diseases are responsible for an estimated 17.9 million deaths each year, accounting for approximately 31% of all deaths globally, underscoring their status as a leading cause of mortality [11]. Advancements in science and technology have led to a surge in heart disease related interventions. Most of these interventions, however, come with a high cost, posing a significant challenge as resources are limited. Consequently, individual interventions need to be evaluated for their value for money in an effort to improve health system efficiency. CUA related to heart disease can be strengthened by using disease-specific preference-based instruments, such as the MacNew-7D. Utilising such instruments significantly increases the accuracy of the evaluations. Interventions that have used cardiac disease-specific, non-preference-based instruments need to convert non-preference-based scores to preference-based utility scores, which can be used in CUA. To achieve this, mapping algorithms can be deployed.

Mapping is a way of allocating utility values from preference-based instruments to disease-specific non-preference based instruments [12]. Many disease-specific non-preference-based quality of life instruments, including those related to heart disease, have been successfully mapped using mapping algorithms [13, 14]. The aim of this study was to create mapping algorithms that would allow scores of Kansas City Cardiomyopathy Questionnaire (KCCQ), Seattle Angina Questionnaire (SAQ), and Minnesota Living with Heart Failure Questionnaire (MLHF) to be translated into utility values that could be applied to cost utility studies of people with heart disease. The study will provide utility values for these quality-of-life instruments that can be used to calculate QALYs and facilitate cost utility analysis.

## Methods

### Study population

The analysis included data of 493 participants (180 with heart failure and 313 with angina) with documented evidence of heart failure and stable or unstable angina, recruited from cardiology out-patient clinics at Royal Brisbane and Women’s Hospital (RBWH) hospital between January 2018 and March 2018.

After obtaining written informed consent, study participants completed a three-sectioned questionnaire. The first section included the participant’s socio-demographic information such as age, sex, and diagnosis. The second section included their responses for MLHF, KCCQ, and SAQ. The third section included their response for MacNew Heart Disease health-related quality of life instrument (MacNew). Institutional ethics committee approval was obtained from the Griffith University Human Research Ethics Committee (reference no. 2117/069).

### Instruments

The source instrument for mapping KCCQ, MLHF, and SAQ and the target instrument was MacNew-7D.

#### Kansas City cardiomyopathy questionnaire

The KCCQ is a self-administered questionnaire developed to create a valid and disease-specific health status measure for patients suffering from heart failure [15]. The KCCQ consists of 7 domains with 12 items measured on Likert scales with 5–7 response options. The five domains quantified are physical limitation, symptom frequency, symptom severity, symptom burden, social limitations, self-efficacy, and quality of life. Scores are scaled from 0 to 100 with higher scores indicating better HRQoL [16].

#### Minnesota living with heart failure questionnaire

The MLHF Questionnaire is most commonly used for patients with heart failure, to quantify HRQoL [17] It includes two domains, physical and emotional, consisting of 21 items which are rated on a six point Likert scale [17]. The total score (or sum score) ranges from 0 to 105, with higher scores indicating worse HRQoL [17].

#### Seattle angina questionnaire

The SAQ is one of the most commonly used measures of disease-specific health status in patients with angina [18]. The questionnaire includes 5 domains with 19 items, quantifying physical limitation scale, angina stability scale, angina frequency scale, treatment satisfaction scale, and quality of life scale. Each domain can be scored between 0 and 100, with higher scores indicating better quality of life [18].

#### MacNew-7D questionnaire

Kularatna et al. used the original 27 itemed MacNew-7D questionnaire to develop the heart disease specific preference-based instrument, MacNew-7D [10]. The MacNew-7D classification system has seven dimensions with four levels in each: physical restriction; excluded from doing things with other people; worn out or low in energy; frustrated, impatient or angry; unsure and lacking in self-confidence; shortness of breath; and chest pain. These seven dimensions with four levels allow 16,284 (4 [7]) possible health states to be defined and the utility value set range from − 0.4456 (minus value indicating worse than death) to 1.000 (perfect health) for health states defined by the classification system [10].

### Statistical analysis

Socio-demographic characteristics were summarised using mean (standard deviation [SD]) and median (interquartile range [IQR]) for continuous variables and frequency (percentage) was used for categorical variables. Spearman’s correlation was used to describe the correlation between the KCCQ, MLHF, and SAQ scores and the MacNew-7D utility scores. Guildford’s criteria was used to interpret the magnitude of correlation coefficient [19]. As per the criteria, the correlation coefficient is divided into five categories on the bases of their strength of association. These categories are as follows; very low (r: 0.00–0.20), low (r: 0.21–0.40), moderate (r: 0.41–0.60), high (r: 0.61–0.80) and very high (r: 0.81–1.00).

Direct response mapping was used in this study. Currently, there is not one specific regression method that is considered to be the best predictive model that fits all data sets [19]. In order to compensate for this uncertainty, four regression techniques were used on the same data set during direct mapping and the best method was decided on the basis of validation parameters. The four methods used were Gamma GLM, Bayesian GLM, Linear regression with stepwise selection and Random Forest. Each of the four techniques has the capacity to cope with ceiling effect, heteroscedasticity, skewness and/or the potential presence of outlier [19].

Regression techniques were used to develop the direct mapping algorithm. Six sets of independent variables were considered to predict the MacNew-7D utility score: prediction using KCCQ, MLHFQ and SAQ domain scores (n = 3) and total scores (n = 3).

On the basis of previous literature, squared terms of item scores and domain scores were added as independent variable in order to account for the non-linear relationship between MacNew-7D utility scores and KCCQ, MLHFQ, and SAQ score [19]. Age was included in order to improve the predictive performance.

Model1

$$\eqalign{MacNew\,7D\,utility\,score &= {\beta}_{0}+ \sum _{j=1}^{m}{\beta }_{j}* \left(\begin{array}{c}KCCQ\\ MLHFQ\\ SAQ\end{array}\right)\text{i}\text{t}\text{e}\text{m} \text{s}\text{c}\text{o}\text{r}\text{e}\cr &\quad+\sum _{j=1}^{m}{\beta }_{j}* \left(\begin{array}{c}KCCQ\\ MLHFQ\\ SAQ\end{array}\right){\text{i}\text{t}\text{e}\text{m} \text{s}\text{c}\text{o}\text{r}\text{e}}^{2}\cr &\quad+ {\beta }3\text{*}\text{A}\text{g}\text{e}}$$

Model2

$$\eqalign{MacNew\,7D\,utility\,score &= {\beta}_{0}+ \sum _{j=1}^{m}{\beta }_{j}* \left(\begin{array}{c}KCCQ\\ MLHFQ\\ SAQ\end{array}\right)\text{d}\text{o}\text{m}\text{a}\text{i}\text{n} \text{s}\text{c}\text{o}\text{r}\text{e}\cr &\quad+\sum _{j=1}^{m}{\beta }_{j}* \left(\begin{array}{c}KCCQ\\ MLHFQ\\ SAQ\end{array}\right){\text{d}\text{o}\text{m}\text{a}\text{i}\text{n} \text{s}\text{c}\text{o}\text{r}\text{e}}^{2}\cr &\quad+ {\beta }3\text{*}\text{A}\text{g}\text{e} }$$

### Assessing regression model performance

Goodness of fit of the models was assessed using root mean square (RMSD), R-squared value, and mean absolute error (MAE). RMSD is calculated as the root square value of the mean squared differences between the actual and predicted MacNew-7D [19]. It is the sum of variance and squared bias, where bias is representative of the difference between the population’s true value and the predictive value [20]. The mean absolute error is calculated as the mean of the absolute differences between the actual and predicted MacNew-7D scores [19]. Greater preference was put on MAE performance as it is less sensitive to outliers and easy to interpret [19]. Furthermore, AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) were also used for model selection, where a lower value suggests a better-fitting model.

Due to the absence of an external validation dataset, we employed a three-fold cross-validation strategy to evaluate the performance of our regression models, using the ‘trainControl’ function from the ‘caret’ package in R. In three-fold cross-validation, the entire dataset is divided into three equal subsets or “folds”. The model is then trained on two of these folds (or two-thirds of the data) and validated, or tested, on the remaining third, and this process is repeated three times so that each fold serves as the validation set once. The performance indicators of the model (RMSD, R-squared value, and MAE) were then averaged over the three folds to obtain a more robust and generalized measure of the model’s predictive performance. The “Mapping onto Preference-based measures reporting Standards (MAPS) checklist was followed in this study.

To comprehensively evaluate the performance of our mapping algorithms across various health states, we employed a simulation technique [21]. This method is particularly advantageous in situations where analysts need to use the models to simulate individual level MacNew-7D data, rather than generate expected (e.g., mean cohort) MacNew-7D values. To assess the model’s ability to accurately capture uncertainty, we compared the simulated MacNew-7D scores from our best-performing models with the original observed data. Such validation is essential in cost-effectiveness analysis, which often relies on long-term projections and simulations involving numerous hypothetical patients. To conduct the simulations, we incorporated both the patient-specific explanatory variables and the random error terms inherent to the statistical model. This approach ensures that our model’s predictions encompass both the systematic and random variations observed in real-world data. In order to demonstrate the model’s predictive accuracy and address uncertainty, we generated 1000 simulated data points for each observation in the three datasets (KCCQ, MLHFQ, and SAQ). Subsequently, we depicted the results by plotting the cumulative distribution functions (CDFs) for each dataset.

All statistical analyses were performed using R Software version 4.1.0.

## Results

### Sample characteristics

A total of 493 participants took part in the study and were divided into two samples. Sample one included patients diagnosed with heart failure (n = 180) and sample two included patients diagnosed with angina (n = 313) (Table 1). Sample one completed the KCCQ & MLHFQ and sample two completed the SAQ. The mean age of the study participants for sample one was 60.8 (SD 14.2) and more than half (68%) were males. The mean MacNew-7D utility was 0.726 (SD 0.196) and the median was 0.720 (IQR 0.622–0.899). For sample two, the mean age of the study participants was 64.5 (SD 10.5) and more than half (74.7%) were males. The mean MacNew-7D utility was 0.735 (SD 0.202) and median was 0.769 (IQR 0.594–0.921).

Table 1

Patient characteristics

Characteristic | Sample 1 (n = 180) (KCCQ & MLHFQ) | Sample 2 (n = 313) (SAQ) |
---|---|---|

Age (years) | ||

Mean (SD) | 60.8 (14.2) | 64.5 (10.5) |

Median (IQR) | 62.0 (52.0–70.2) | 64.8 (58.6–73.0) |

Sex | ||

Male (%) | 123 (68.3) | 234 (74.7) |

Female (%) | 57 (31.7) | 79 (25.3) |

Mac New 7D utility | ||

Mean (SD) | 0.726 (0.196) | 0.735 (0.202) |

Median (IQR) | 0.720 (0.622–0.899) | 0.769 (0.594– 0.921) |

Kansas City Cardiomyopathy Questionnaire (Max score 100) | ||

Physical Limitation - Mean (SD) | 67.7 (27.2) | |

Symptom Stability - Mean (SD) | 53.4 (17.8) | |

Symptom Frequency - Mean (SD) | 49.6 (26.3) | |

Symptom Burden - Mean (SD) | 78.7 (22.4) | |

Self-Efficacy - Mean (SD) | 80.6 (22.3) | |

Quality of Life - Mean (SD) | 72.1 (25.7) | |

Social Limitation - Mean (SD) | 67.3 (29.2) | |

Minnesota Living with Heart Failure | ||

Physical dimension score (max 40) - Mean (SD) | 27.4 (10.2) | |

Emotional dimension score (max 25) - Mean (SD) | 19.0 (6.7) | |

Seattle angina questionnaire | ||

Physical Limitation scale | 70.2 (25.5) | |

Angina stability scale | 52.3 (19.7) | |

Angina frequency scale | 86.7 (20.4) | |

Treatment satisfaction scale | 86.8 (17.6) | |

Quality of life scale | 70.2 (25.5) |

The study showed a strong correlation (p < 0.001) between the MacNew-7D and several domains across the KCCQ, MLHFQ, and SAQ. For KCCQ, this included physical limitation (r = 0.62), symptom frequency (r = 0.58), symptom burden (r = 0.67), quality of life (r = 0.68), and social limitation (r = 0.65). A moderate correlation was found with self-efficacy (r = 0.34), while a weak correlation was found with symptom stability (r = 0.09). A highly strong correlation was observed between MacNew-7D and the two domain scores of MLHFQ (r = 0.82 and r = 0.78). A strong correlation (p < 0.001) was observed between MacNew-7D scores and the physical limitation scale (r = 0.66), angina frequency scale (r = 0.54), and quality of life scale (r = 0.59) domains of the SAQ. Treatment satisfaction scale (r = 0.39) showed moderate correlation (Table 2).

Table 2

Correlation coefficients of MacNew-7D and the domains of the three heart disease-specific quality of life instruments

Instrument and domains | Spearman correlation | P-value |
---|---|---|

Kansas City Cardiomyopathy Questionnaire | ||

Physical Limitation | 0.62 | < 0.001 |

Symptom Stability | 0.09 | 0.2122 |

Symptom Frequency | 0.58 | < 0.001 |

Symptom Burden | 0.67 | < 0.001 |

Self-Efficacy | 0.34 | < 0.001 |

Quality of Life | 0.68 | < 0.001 |

Social Limitation | 0.65 | < 0.001 |

Minnesota Living with Heart Failure | ||

Physical dimension score | 0.82 | < 0.001 |

Emotional dimension score | 0.78 | < 0.001 |

Seattle angina questionnaire | ||

Physical Limitation scale | 0.66 | < 0.001 |

Angina stability scale | 0.06 | 0.2364 |

Angina frequency scale | 0.54 | < 0.001 |

Treatment satisfaction scale | 0.39 | < 0.001 |

Quality of life scale | 0.59 | < 0.001 |

### Validation

In the absence of an external validation dataset, predictive performance of the models was assessed using a three-fold cross validation method. All models were assessed for goodness of fit using the RMSD, R-squared and MAE. The best models to predict the MacNew-7D utility scores using the KCCQ, MLFHQ, and SAQ item scores and domain scores were selected based on their performance in the cross-validation step, with more weight put on the MAE following evidence in the literature.

Adding squared terms of both the items and the domain scores to the GLM and Bayesian GLM, led to no improvement in the MAE. Therefore, these squared terms were excluded from the final model of these two regression models. However, the squared terms were retained in the other two models (Linear Regression and Random Forest) as the inclusion led to an improvement in the MAE. In the case of the Gamma GLM, Bayesian GLM, and Linear Regression, only those variables that were statistically significant were included in the final regression model. This strategy applied to both item and domain level predictions. In the Random Forest models, adding squared terms of the items and domains did result in an improvement in the MAE across all models, with the exception of the SAQ item level prediction model, where squared terms were excluded.

### Best performing models

For KCCQ prediction using item scores, the Random Forest model was observed to be the best method with the lowest MAE (0.0929), highest R-squared (0.5961) and the lowest RMSD (0.1272). Gamma GLM most accurately predicted minimum and maximum utility values (difference of 0.0020 & 0.0491 to observed values). In MLHFQ, Random Forest had the lowest MAE (0.0818), while Bayesian GLM had the highest R-squared (0.7001) and lowest RMSD (0.1090). Gamma GLM predicted most accurately minimum and maximum utility values (difference of 0.0025 & 0.0657 to observed values). In the SAQ, Random Forest had the lowest MAE (0.0993), highest R-squared (0.5617), and lowest RMSD (0.1346). Gamma GLM predicted most accurately the minimum utility value (difference of 0.0496 to observed value) and Linear regression with stepwise selection predicted most accurately the maximum utility value (difference of 0.0657 to observed value). Based on MAE performance, the Random Forest method is observed to be the best model for predicting utility value using item scores for KCCQ, MLHF, & SAQ. However, it tended to over-predict severe health states. For example, the predicted minimum utility value for KCCQ, MLHF, & SAQ had a difference of 0.17979, 0.2772, & 0.1655 respectively to the observed values.

For KCCQ prediction using domain scores, Random Forest was observed to be the best method with the lowest MAE (0.0939). Gamma GLM predicted minimum & maximum utility value (difference of 0.1736 and 0.0131 to observed value) most accurately. In MLHF, Bayesian GLM had the lowest MAE (0.0828), highest R-squared (0.6928), and lowest RMSD (0.1097). Random Forest predicted most accurately minimum & maximum utility value (difference of 0.2331 & 0.056 to observed value). In SAQ, Linear regression with stepwise selection had the lowest MAE (0.0995), highest R-squared (0.5396), and lowest RMSD (0.1377) and also predicted most accurately the maximum utility value (difference of 0.323 to observed value). Bayesian GLM most accurately predicted the minimum utility value (difference of 0.1926 to observed value). Based on MAE performance, the random forest method was found to be the best model for predicting utility values using domain scores for KCCQ. Similar to item score, it tended to over-predict severe health states (difference of 0.2364 to observed value). For MLHF and SAQ, the Bayesian GLM was the best model, however, it under predicted severe health states (difference of 0.2543 and 0.1926 respectively).

AIC and BIC values are outlined in supplementary Table 1. These criteria were not applicable to the Random Forest models due to the nature of this non-parametric approach. The analysis revealed that for domain level predictions, the MLHFQ Bayesian GLM and the SAQ Linear Regression with stepwise selection demonstrated the lowest AIC and BIC values, indicating their superior fit among the evaluated models, consistent with other performance indicators.

Figure 1 illustrates the scatter plots of actual vs. predicted MacNew-7D using the selected best performing models. A strong linear positive correlation is observed between the actual and predicted MacNew-7D scores when using item scores for KCC, SAQ, & MLHF as well as for KCC & MLHF when using domain scores, with majority data points lying close to the line of best fit. However, when predicting MacNew-7D values using domain scores for SAQ, a moderately strong positive correlation (r = 0.766) with data points scattered away from the line of best fit are observed. The Bland-Altman plot in Fig. 2 shows a vast majority of data points lying close to the mean difference and within the limits of agreement, indicating good agreement between the actual and the predicted MacNew-7D values.

The cumulative distribution function plots (Fig. 3) exhibit a strong level of agreement between the observed and simulated MacNew 7D utility scores across the three datasets, KCC, MLHFQ, and SAQ. Both the item and domain scores demonstrate that the simulation model accurately captures the distribution of the actual data, indicating a precise fit. This is particularly evident as the cumulative distribution function curves for the simulated data closely mimic those of the observed data across the entire range of utility scores. It is worth noting that in the MLHFQ and SAQ domain scores, there is a slight deviation near the upper limit, which suggests potential differences at extreme values. Overall, the consistency of these plots across different instruments emphasises the reliability of our simulation model in reflecting real-world patient-reported outcomes.

Online resource files contain R scripts and detailed descriptions which can be used to predict the MacNew-7D utility scores from the best prediction models (Table 3).

Table 3

Goodness of fit results from three-fold cross-validation

Instrument | Method | Mean utility | Min utility | Max utility | RMSD | R-squared | MAE |
---|---|---|---|---|---|---|---|

Prediction using item scores | |||||||

KCCQ (n = 180) | Observed | 0.7264 | 0.0159 | 1.0000 | |||

Gamma GLM (link = identity) | 0.7408 | 0.0179 | 1.0491 | 0.1498 | 0.5237 | 0.1148 | |

Bayesian GLM | 0.7264 | 0.2720 | 0.9509 | 0.1323 | 0.5573 | 0.0991 | |

Linear regression with stepwise selection | 0.7264 | 0.2625 | 0.9569 | 0.1459 | 0.4798 | 0.1091 | |

Random forest | 0.7274 | 0.1956 | 0.9800 | 0.1272 | 0.5961 | 0.0929 | |

MLHFQ (n = 180) | Observed | 0.7264 | 0.0159 | 1.0000 | |||

Gamma GLM (link = identity) | 0.7378 | 0.0184 | 1.0657 | 0.1299 | 0.6082 | 0.0990 | |

Bayesian GLM | 0.7264 | 0.2741 | 0.9256 | 0.1090 | 0.7001 | 0.0861 | |

Linear regression with stepwise selection | 0.7264 | 0.2512 | 0.9513 | 0.1414 | 0.5480 | 0.1058 | |

Random forest | 0.7265 | 0.2931 | 0.9334 | 0.1144 | 0.6686 | 0.0818 | |

SAQ (n = 317) | Observed | 0.7351 | 0.1373 | 1.0000 | |||

Gamma GLM (link = identity) | 0.7353 | 0.1869 | 0.9681 | 0.1455 | 0.4924 | 0.1108 | |

Bayesian GLM | 0.7351 | 0.2599 | 0.9710 | 0.1435 | 0.4993 | 0.1080 | |

Linear regression with stepwise selection | 0.7351 | 0.3425 | 1.0131 | 0.1610 | 0.4130 | 0.1163 | |

Random forest | 0.7357 | 0.3524 | 0.9431 | 0.1346 | 0.5617 | 0.0993 | |

Prediction using domain scores | |||||||

KCCQ (n = 174) | Observed | 0.7314 | 0.0159 | 1.0000 | |||

Gamma GLM (link = identity) | 0.7364 | 0.1895 | 1.0567 | 0.1538 | 0.5192 | 0.1168 | |

Bayesian GLM | 0.7314 | 0.2468 | 0.9522 | 0.1245 | 0.6003 | 0.0955 | |

Linear regression with stepwise selection | 0.7314 | 0.2757 | 0.9673 | 0.1373 | 0.5298 | 0.1024 | |

Random forest | 0.7317 | 0.2523 | 0.9498 | 0.1275 | 0.5799 | 0.0939 | |

MLHFQ (n = 180) | Observed | 0.7264 | 0.0159 | 1.0000 | |||

Gamma GLM (link = identity) | NA | NA | NA | NA | NA | NA | |

Bayesian GLM | 0.7264 | 0.2702 | 0.9393 | 0.1097 | 0.6928 | 0.0828 | |

Linear regression with stepwise selection | 0.7264 | 0.2702 | 0.9393 | 0.1137 | 0.6675 | 0.0855 | |

Random forest | 0.7269 | 0.2490 | 0.9440 | 0.1149 | 0.6691 | 0.0834 | |

SAQ (n = 313) | Observed | 0.7355 | 0.1373 | 1.0000 | |||

Gamma GLM (link = identity) | 0.7351 | 0.3467 | 0.9380 | 0.1379 | 0.5391 | 0.1012 | |

Bayesian GLM | 0.7355 | 0.3299 | 0.9431 | 0.1403 | 0.5199 | 0.1024 | |

Linear regression with stepwise selection | 0.7355 | 0.3970 | 0.9677 | 0.1377 | 0.5396 | 0.0995 | |

Random forest | 0.7354 | 0.3598 | 0.9450 | 0.1397 | 0.5278 | 0.1027 |

×

×

×

## Discussion

This study aimed to map three commonly used cardiac disease health-related quality of life questionnaires to the MacNew-7D utility values in patients with heart disease. The results provide insights into the relationships between these instruments and the utility values derived from the MacNew-7D. Due to the lack of comparable mapping studies between KCCQ, MLHF, & SAQ to MacNew-7D it is difficult to perform a direct comparison of validity parameters of the study to current literature. This mapping facilitates estimation of utility values from any of the heart disease specific quality of life instruments used in this analysis.

The correlation analysis demonstrated significant associations between MacNew-7D and the various domains of the questionnaires. Moderate correlations were observed between MacNew-7D and all domains of KCCQ, MLHF, and SAQ except the symptom stability domain of KCCQ. A moderate correlation indicates that the MacNew-7D is valid and is capturing aspects of health-related quality of life that are relevant to the specific domains of these questionnaires. This may be important for clinicians and researchers who use these instruments to assess the impact of interventions or disease progression on patients’ health-related quality of life [22‐24]. Moreover, it may indicate valuable information about the effectiveness of treatments or interventions targeted at improving the specific domains of interest. Previously, policy makers and researchers have utilised mapping results to estimate utility scores, QALYs, and consequently incremental cost effectiveness ratios (ICERs), and incorporated them into decision-making processes in different research fields, for example, dialysis treatment [25], epilepsy management [26], and joint replacement surgeries [27]. Our study may facilitate economic evaluations, for example QALYs and cost-utility analyses in heart disease. This may be used for decision-making related to resource allocation, reimbursement decisions, health policy formulation, and treatment guidelines in the field of cardiovascular diseases. The MacNew-7D may instigate a more precise and specific assessment of HRQoL in individuals with heart disease, that may further play a key role in delivering informed and targeted healthcare decisions.

Due to the absence of an external validation dataset, a three-fold cross-validation method was employed to validate the mapping models. Cross-validation is a prevalent method used in machine learning when there is limited dataset or when an external validation dataset is not available [28]. It involves splitting the dataset into multiple subsets [28]. By using cross-validation, an analyst can estimate how well the mapping models generalize to unfamiliar data within the same dataset [29]. In this case, it assessed the model’s ability to obtain the relationships between the predictors (e.g., KCCQ, MLHFQ, SAQ) and the target variable (MacNew-7D utility scores) and indicated the model’s performance on new data from the similar populations. Cross-validation is a broadly used method in various fields including cancer studies, dermatology, and orthopaedics [30‐32]. In these studies, performance of the models was assessed using RMSD, R-squared, and MAE. Therefore, cross-validation provides a rigorous approach to assess the performance and generalisability of mapping models in the absence of an external validation dataset. In our analysis, we observed that the R-squared, RMSE, and MAE values were higher compared to some mapping studies, including the study by Klapproth et al. [33]. This study, which developed optimal models for mapping the EQ-5D-5 L crosswalk from the PROMIS-29 in the UK, France, and Germany, reported lower values of these metrics. Specifically, their nRMSE values were 0.076, 0.075, and 0.079 for the UK, France, and Germany, respectively, which are lower than those observed in our study. The differences in these metrics between our study and studies like Klapproth et al. [33] could be attributed to various factors, including the nature of the health conditions assessed, the specific patient populations involved, and the methodological approaches employed in model development. These factors underscore the complexity and variability inherent in mapping studies and highlight the need for context-specific considerations in the interpretation and application of such models.

For the KCCQ and SAQ, the Random Forest model exhibited the best performance. In general, the Random Forest model has been found to outperform other models in several studies in different healthcare contexts [34‐36]. The Random Forest algorithm’s ability to handle complex interactions, high-dimensional data, and reduce overfitting makes it a popular choice in machine learning-based healthcare research [37, 38]. However, it is important to note that the choice of the best-performing model may vary depending on the specific dataset and research question at hand [37]. The gamma GLM model accurately predicted the minimum and maximum utility values for KCCQ and SAQ. The gamma GLM model is particularly suitable for handling skewed and non-normally distributed data, which makes it a popular choice in mapping studies [39]. However, its performance in accurately estimating extreme values depends on the underlying distribution of the data and the assumptions of the model [39]. Other mapping studies also found that the gamma GLM model best predicted minimum and maximum utility value [40‐42].

Similarly, for the MLHFQ, the Random Forest model achieved the lowest MAE, while the Bayesian GLM model had the highest R-squared and lowest RMSD. The gamma GLM model accurately predicted extreme utility values. The Bayesian GLM model has performed well in comparatively few studies. One such study aimed to map the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ-C30) onto the EuroQol Five-Dimensional Five-Level (EQ-5D-5 L) questionnaire in lung cancer patients. This study had highest R-squared and lowest RMSD compared to other mapping models tested [31]. The Bayesian approach in modelling allows for the incorporation of prior knowledge or beliefs about the parameters and uncertainties into the model [43]. Moreover, it provides a framework to estimate posterior distributions of the model parameters and to make probabilistic inferences about the relationships between variables [43]. It should be noted, however, that the random forest models tended to over-predict severe health states across all three questionnaires. This may imply that the predicted utility values for individuals with poorer health or more severe symptoms were higher than the observed values. In other words, the Random Forest models underestimated the impact of the disease on the quality of life for individuals with more severe cardiovascular conditions. This observation has practical implications in healthcare decision-making and resource allocation. Over-predicting severe health states could potentially lead to an underestimation of the burden of the disease and may affect the allocation of healthcare resources. However, over and under-predicting are common issues in mapping [22, 33, 44‐46]. These may occur due to several reasons including when the model extrapolates beyond the observed data, when there is significant disparity in the distribution of different classes or categories, when the model fits the training data too closely, or if the training data has a limited representation of extreme cases [47, 48].

In comparing our study’s findings to existing mapping studies to EQ-5D, the most commonly used utility measure, we note several interesting parallels and distinctions. For instance, a study mapping the SAQ to EQ-5D in coronary health disease patients in China reported correlations (ranging from 0.62 to 0.71) similar to our findings. However, the smaller sample size in that study might limit comparability [49]. Another research mapping SAQ to EQ-5D observed weaker correlations (ranging from 0.18 to 0.59) compared to ours [50]. Regarding studies mapping the KCCQ to EQ-5D, they generally reported lower R-squared statistics (0.48 to 0.50) than what we found in our study [51]. Similarly, Hunger et al., in developing a mapping algorithm for Japanese and UK value sets, also reported lower R-squared statistics (0.45 to 0.52) [52]. This contrast may highlight the high sensitivity of MacNew-7D in assessing heart disease impacts. MacNew-7D, being specifically tailored to heart disease, including key dimensions like shortness of breath and chest pain [7], might capture nuances of heart disease effects more effectively than generic measures.

This suggests that for heart disease patients, a disease-specific instrument like MacNew-7D can provide a more accurate evaluation of HRQoL compared to generic utility measures. The initial concept of QALY was to enable comparison of cost-effectiveness across different conditions, leading to the prevalent use of generic preference-based measures. However, the recent trend towards condition-specific preference-based measures, such as MacNew-7D, has sparked a debate about the most effective approach for capturing patient experiences. Our MacNew-7D health state valuation study specifically highlights the substantial disutility linked to dimensions like shortness of breath and chest pain, aspects not as comprehensively captured by generic measures like EQ-5D. This suggests that while generic measures offer broad comparability, condition-specific measures can provide more detailed insights into certain patient experiences, especially when symptoms have significant clinical implications.

### Strengths and limitations

Including relatively large sample size and thorough validation process using cross-validation are few of the strengths of this study. The mapping models predicted MacNew-7D utility scores well, particularly item scores. Although MAE, RMSE, and R-squared are reliable parameters, these measures have limitations. It is recommended to consider multiple evaluation metrics and examine other aspects including the context and research question of the study, the interpretability of the model, and requirements of the analysis.

There are some limitations to consider. Firstly, the study sample only consisted of patients with heart failure and angina, which may limit the generalizability of the findings to other cardiovascular conditions. Additionally, the absence of an external validation dataset may introduce some uncertainty in the generalizability of the mapping models. Furthermore, the under-prediction of severe health states by the Random Forest models warrants further investigation and potential refinement of the mapping algorithms. Our study did not explore the use of mixture models due to the modest sample size and the complexity it would introduce. Recognizing this as a limitation, future studies with larger datasets may benefit from considering mixture models to potentially uncover latent sub-populations within heart disease patients, enhancing the precision of mapping algorithms.

## Conclusion

In conclusion, the mapping of the KCCQ, MLHFQ, and SAQ to the MacNew-7D in patients with heart disease showed promising results. The correlations between the questionnaires and the utility values derived from the MacNew-7D suggest that the MacNew-7D captures important aspects of health-related quality of life specific to cardiovascular disease. The mapping models demonstrated reliable predictive performance, although some limitations should be considered. Further research is needed to validate and refine the mapping algorithms and to explore their applicability to a wider range of cardiovascular conditions.

## Acknowledgements

Not applicable.

## Declarations

### Competing interests

Not applicable.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.