INTRODUCTION

Inclusive in-hospital patient management with appropriate resource allocation is a central challenge for today's healthcare systems.1, 2 Improved diagnostic and therapeutic measures have increased life expectancy but often do not achieve long-lasting cure.3 The affected patients are at high risk of an increased length of stay (LOS), and planning of post-acute care is often neglected during hospitalization, increasing the risk of readmission.4 Thus, early and efficient post-acute care planning, integrated into an accurate risk stratification,5 has the potential to optimize acute care resources by reducing LOS while preventing functional disability by avoiding frequent patient transfers.6,7,8

Several algorithms for predicting 30-day readmission risk have been proposed.9 Their purpose is to identify patients at risk for unplanned hospital readmission early, in order to potentially prevent these events. Yet, the majority of these scores have not been externally validated.10 Before publication of the “Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis” (TRIPOD) statement, there was no consensus on how to report prediction modeling studies.10 Although risk scores perform best in the environment where they were developed, researchers and clinicians might be interested in the merits of a score in a different setting. Such an assessment can guide adaptation or recalibration of an existing score to one’s own requirements or highlight the need to develop a new score.

In this study, we validated six previously published risk scores and compared their prognostic performance head-to-head in a large, contemporary, and independent cohort of medical inpatients at a Swiss tertiary care center. This report adheres to the TRIPOD statement for the reporting of prediction models.

METHODS

Information on the Studies Validated

Tsui et al. reported a risk score based on insurance data from Hong Kong covering 2.3 million admissions.11 Gildersleeve et al. used their own rural hospital database (16,889 admissions).12 Zapatero et al. generated their SEMI Index from a nationwide Spanish insurance registry encompassing almost 1 million internal medicine admissions.13 The PARA was established in a Swiss tertiary care hospital cohort with 11,074 stays.14 The LACE score was derived from data from 11 hospitals in Ontario (4812 admissions) and externally validated in an insurance cohort with 1 million admissions.15 Donzé et al. based their HOSPITAL score on 10,731 admissions from insurance data of 3 hospitals in the Boston (USA) area.16 For further information, please consult Table 2 and the corresponding section in the supplementary material.

Source of Data and Participants

Based on a 2016 systematic review of readmission risk scores,17 we identified studies on medical inpatients with a predefined overall area under the curve (AUC) of at least 0.7. We chose this cutoff because models with this discriminative ability are deemed useful.9 Additionally, we searched PubMed for studies published after 2015 including the terms “readmission” and “risk factors,” while excluding studies with the terms “surgery” and “operation.”

Our data stem from a retrospective analysis of our prospective cohort studies on medical inpatients (i.e., TRIAGE and InHospiTOOL), whose protocols have been published.18, 19 For this analysis, we used data from our suburban Swiss center only. From January 1, 2016, through February 28, 2019, we collected data on every patient aged 16 years or older admitted to our medical wards. Exclusion criteria are depicted in Figure 1 and were adopted from the original studies. As part of our quality control, patients were contacted by telephone 30 days after admission to collect quality measures (focused on their functional performance) and to identify readmissions. If patients or relatives could not be reached, we contacted the primary care physician, post-acute care institutions, or the registry office at the place of residence.

Figure 1

Flowchart with exclusion of patients.

As a retrospective, quality control study without any effects on individual patient outcomes, the ethical review board (EKNZ) waived the need for individual patient informed consent. The study has been conducted according to the principles of the Declaration of Helsinki.

Outcomes

The primary outcome was the first unplanned readmission within 30 days of the index admission; every readmission occurring within this period was counted as unplanned.

To provide a better overview of the predictive abilities on distinct patient populations, we performed subgroup analyses based on our main admission specialties.

Predictors

Our database comprises characteristics such as health insurance, demographics, and clinical features (e.g., diagnoses, medications, and laboratory values). Comorbidities were evaluated using the Charlson Comorbidity Index20 based on information at hospital discharge. Hospitalization characteristics include LOS, the number of admissions during the year before the index hospitalization, and visits to our emergency department (ED). Swiss procedure codes (CHOP codes) served as indicators of interventions during a stay. This system was originally based on the 2008 ICD-9-CM codes but was later adapted to the Swiss healthcare system and is updated annually.21

This study is based on our TRIAGE/InHospiTOOL (not to be confused with the HOSPITAL score) study cohort.18, 19 Therein, we decided to explicitly store laboratory results from admission instead of discharge. Hence, we had to access laboratory results (i.e., sodium and hemoglobin) at discharge through our electronic health record (EHR) system. We were not able to obtain discharge values before 2015 because the corresponding interface had not yet been instituted. As a proxy, we used admission values by carrying them forward.22

In the case of repeated hospitalizations more than 30 days apart, each admission was eligible to become a new index case. Data from the EHR are anonymized and entered into our data warehouse. In contrast to the SEMI Index13 and the HOSPITAL score,16 we used ICD-10 codes instead of ICD-9 codes. Further, it was unclear how diagnoses were determined by Tsui et al.11; thus, we used ICD-10 diagnoses as appropriate.
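The index-case rule described above can be sketched as follows. This is one plausible reading of the rule, with hypothetical function and variable names; the actual implementation in the study may differ:

```python
from datetime import date

def select_index_admissions(admission_dates):
    """Mark an admission as a new index case only if it occurs more than
    30 days after the most recent index admission; admissions within
    30 days count toward the readmission outcome of that index stay."""
    index_cases = []
    for day in sorted(admission_dates):
        if not index_cases or (day - index_cases[-1]).days > 30:
            index_cases.append(day)
    return index_cases

# A patient admitted three times: the January 15 stay is a readmission
# belonging to the January 1 index case; March 1 becomes a new index case.
stays = [date(2016, 1, 1), date(2016, 1, 15), date(2016, 3, 1)]
indexes = select_index_admissions(stays)  # → [date(2016, 1, 1), date(2016, 3, 1)]
```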

Statistics, Sample Size, and Missing Data

We collected all information on the variables from the derivation studies. If β-coefficients and intercept were provided,16 we used them to replicate the score (probability \( P=\frac{1}{1+{e}^{-\left(\mathrm{intercept}+\beta \times \mathrm{predictors}\right)}} \)). If coefficients or intercept were not reported, we emailed the corresponding authors.12, 14, 15 Otherwise, simplified scores were applied, and points were summed as suggested by the authors.11, 13 We did not perform any model updating.
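For scores replicated from coefficients, the predicted probability follows the standard logistic form; a minimal sketch with made-up intercept and coefficient values (not those of any published score):

```python
import math

def readmission_probability(intercept, coefficients, predictor_values):
    """Logistic model: P = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear_predictor = intercept + sum(
        beta * x for beta, x in zip(coefficients, predictor_values)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical example: intercept -2.0, a binary predictor (coefficient 0.5,
# value 1) and a count predictor (coefficient 0.03, value 10 prior admissions).
p = readmission_probability(-2.0, [0.5, 0.03], [1, 10])  # ≈ 0.23
```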

We assessed calibration graphically by plotting the observed risks (y-axis) against the predicted risks (x-axis), augmented by a lowess line. To compare discriminatory power, we calculated the AUC/C-statistic non-parametrically. All available admissions from our cohort were used, without an a priori sample size calculation.
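The non-parametric C-statistic equals the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient, with ties counting one half; a minimal sketch (function name and example values are illustrative):

```python
def c_statistic(risks, outcomes):
    """Non-parametric AUC: pairwise comparison of predicted risks between
    readmitted (outcome 1) and non-readmitted (outcome 0) patients."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes")
    # Count wins for the readmitted group; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and 30-day readmission outcomes.
auc = c_statistic([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0])  # ≈ 0.83
```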

As some of the data were gathered in clinical routine outside our TRIAGE study, we had missing data on the following predictors: number of medications, and hemoglobin and sodium at discharge. As our database was still quite comprehensive, we primarily performed a complete case analysis (see Fig. 1). To gauge the impact of missing data, we also performed an exploratory analysis based on multiple imputation (see supplementary material). Data are presented as medians and interquartile ranges (IQRs), or as counts and frequencies, as appropriate.

All significance tests were based on a two-sided α-error of 0.05. Statistical analysis was conducted using Stata software version 15.1 (Stata Corp., College Station, TX, USA).

RESULTS

Out of 11,575 patients with 15,639 admissions, we recorded 1149 readmissions (7.3%) within 30 days of admission (see Fig. 1). Table 1 compares the baseline characteristics of the patients included in the analysis with those of the overall cohort before application of the exclusion criteria. Before application of the exclusion criteria, the 30-day death rate was higher; in the included cohort, on the other hand, the LOS was longer.

Table 1 Baseline Characteristics of Patients Before and After Application of Exclusion Criteria

Data on individuals with missing values per predictor and outcome, and a comparison of characteristics between individuals with any missing value and those with completely observed data, are provided in supplementary table 1.

From the systematic review by Zhou et al.,9 we retrieved 13 studies reporting risk scores with an AUC > 0.7. Of these, our data sufficed to replicate the following six risk scores: the LACE Index, HOSPITAL score, SEMI Index, RRS score, PARA, and the score from Hong Kong by Tsui et al.11,12,13,14,15,16 The findings and cohort characteristics of these studies are summarized in Table 2. The scores for which we did not have sufficient data were mainly data-driven machine learning approaches based on North American hospital data,23,24,25,26,27,28,29 except for one study from the UK.29

Table 2 Characteristics and Predictors of the Validated Scores (in Cases of Derivation and Internal Validations, Only Data for Validation Shown)

Differences Between Validation and Development Cohorts

The readmission rate in our cohort was 7.3%, whereas it ranged from 7.0 to 14.8% in the derivation cohorts. Characteristics of the cohorts and their readmission rates are summarized in Table 2. The cohorts of the LACE, HOSPITAL, and SEMI scores are based on insurance data. Our cohort and the PARA and RRS scores are based on hospital data, whereas the score from Hong Kong is based on a mixture of the two.

Table 3 Discriminative Ability of Each Score in the Original Publication and in Our Cohort

Calibration

Calibration was assessed graphically by tenths of the predicted risk (in the case of validation based on coefficients) or by intervals of the score (in the case of reported point scores) (see supplementary figure 1). Additionally, we provide a tabular overview (see supplementary table 2). Overall, calibration was rather poor, except for the HOSPITAL score and the SEMI Index. Most models tended to overpredict the risk of readmission.
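Grouping by tenths of the predicted risk can be sketched as follows; the equal-sized-group scheme, function name, and example values are illustrative assumptions, not the study's exact procedure:

```python
def calibration_groups(predicted_risks, outcomes, n_groups=10):
    """Sort patients by predicted risk, split them into equally sized groups,
    and return (mean predicted risk, observed event rate) for each group."""
    paired = sorted(zip(predicted_risks, outcomes))
    n = len(paired)
    summary = []
    for g in range(n_groups):
        chunk = paired[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(risk for risk, _ in chunk) / len(chunk)
        observed_rate = sum(outcome for _, outcome in chunk) / len(chunk)
        summary.append((mean_predicted, observed_rate))
    return summary

# Two groups for illustration: a low-risk group (predicted 0.15, observed 0.0)
# and a high-risk group that overpredicts (predicted 0.85, observed 0.5).
groups = calibration_groups([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 0], n_groups=2)
```

Plotting the observed rate against the mean predicted risk per group, with points on the diagonal indicating good calibration, yields the kind of calibration plot described in the Methods.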

Discrimination

The AUCs of the risk scores were as follows: LACE Index, 0.53 (95% CI 0.50–0.56); HOSPITAL score, 0.73 (95% CI 0.72–0.74); SEMI Index, 0.47 (95% CI 0.46–0.49); RRS score, 0.64 (95% CI 0.62–0.66); PARA, 0.73 (95% CI 0.72–0.74); and the score from Hong Kong by Tsui et al., 0.73 (95% CI 0.72–0.75) (see Table 3). The subgroup analysis is presented in a forest plot weighted by the number of patients in each subgroup (see Fig. 2). Performance in the subgroups did not differ from the overall performance, except for oncology patients with the PARA score and nephrology patients with the SEMI Index.

Figure 2

Forest plot depicting overall discriminative performance of each score and for distinct subgroups of patients.

DISCUSSION

To our knowledge, this study is the first comprehensive head-to-head comparison of these risk scores in one patient population. As EHR systems have become ubiquitous, it is tempting to use risk scores as clinical decision support tools. Because such tools can strongly influence clinicians’ decisions, their performance must be thoroughly assessed. The authors of all six risk scores performed validation by splitting their original cohorts and reported an AUC > 0.7 in their validation cohorts. We found that the performance of the scores applied to our Swiss, tertiary care, single-center, non-surgical dataset varied greatly, with AUCs ranging from 0.47 to 0.73 and rather poor calibration. Subgroup analysis showed that the scores performed consistently across medical subspecialties. The two outliers (i.e., oncology for the PARA score and nephrology for the SEMI Index) might be explained by the low number of cases in these subgroups.

Any prediction score is prone to overly optimistic results in its original dataset owing to non-parsimonious addition of variables (i.e., overfitting).10, 30 It is therefore of utmost importance to test its predictive abilities in an independent cohort. The SEMI Index, PARA, RRS, and the score from Tsui et al. have so far been neither retrospectively validated nor prospectively assessed.

The HOSPITAL score has been prospectively validated by Donzé himself in Switzerland.31 His research group prospectively included 346 patients older than 50 years admitted to the general internal medicine wards of the cantonal hospital of Fribourg, a hospital roughly two-thirds the size of our own. They found an AUC of 0.70 (95% CI 0.62–0.79), with an 11.6% rate of the 30-day composite outcome (unplanned readmission and death). This is comparable with our result of 0.73 (95% CI 0.72–0.74), albeit we did not include death in our outcome, which might explain the difference. We abstained from using the SQLape® algorithm32 to separate unavoidable from avoidable readmissions, as there is currently no consensus on how to classify readmissions, and such classification carries the risk of introducing bias.19 Furthermore, we used CHOP codes instead of ICD-9-CM codes as a predictor. Nevertheless, we do not think this relevantly influenced our results: the same group published a simplification of the original HOSPITAL score that does not contain procedures as a predictor,33 and its AUC differs by only 2% from the original score. Moreover, this kind of adaptation would occur in every real-world application of a score, and as such rather strengthens our findings.

The LACE score has been analyzed in various retrospective and prospective studies. Many prospective studies used the LACE score merely for risk stratification; specific interventions were then taken to reduce the risk of readmission.34,35,36,37 Only one study specifically examined the predictive ability of the LACE tool.38 In 378 patients, Yazdan et al. dichotomized LACE points at different cutoffs and found AUCs ranging from 0.52 to 0.58. Another recent study retrospectively compared the LACE score against a machine learning approach.39 The authors analyzed a cohort of 10,732 patients and reported AUCs of 0.66 for the LACE score, 0.63 for the HOSPITAL score, 0.64 for the Maxim/RightCare tool, and 0.81 for their own B score.

Besides risk scores, many different strategies (discharge checklists, follow-up phone calls, home visits by nurses, etc.) have been assessed and all have the potential to reduce readmissions.19

Strengths and Limitations

Compared with the derivation cohorts, our overall readmission rate was at the lower end. This could be because we assessed readmission within 30 days of index admission rather than discharge; taking our median LOS of 5 days into account, this shortens the observation period by that amount. However, we do not think this biased our results. First, our cohort is large, which should level out possible fluctuations over time. Second, our readmission rate matches the 7.0% observed in the other Swiss study by Uhlmann and colleagues,14 and is comparable with the readmission rate of 5.45% reported by the five Swiss university hospitals in 2014.40 The setting was also similar, as both hospitals provide tertiary care for a greater area while providing basic care for patients living nearby.

Our 1149 readmissions occurred in only 290 patients, compared with 11,285 patients without a readmission. Because the other authors did not report these figures, we cannot directly compare our studies in this regard, but we have no reason to believe the other cohorts were dissimilar, as most readmissions are generated by high utilizers. An extensive Australian study of over 20,000 patients found that 80% of patients were admitted only once.41 Hence, applying a risk score makes all the more sense, as it allows hospital staff to concentrate their efforts on patients at high risk for readmission.

Only certain parts of our current work were based on our prospective TRIAGE/InHospiTOOL study series, and our work provides Swiss data only. We had up to 26% missing information on certain predictors, and as we performed a complete case analysis only, there is a potential for bias if data were not missing at random. In comparison (see supplementary table 1A), the amount of missing information on relevant predictors differed by no more than 7% between patients with and without readmission. Additionally, complete cases had slightly fewer medications at discharge and lower hemoglobin values than incomplete cases (see supplementary table 1B). There were further differences in the baseline characteristics before and after application of the exclusion criteria: in particular, median LOS rose from 2 to 5 days, the death rate decreased from 4.9 to 1.6%, and the number of hospital admissions and ED visits changed. Although this could have introduced selection bias, it should not decrease the comparability of our results, as we applied the same inclusion and exclusion criteria as the original studies.

The strengths of our work include, first, the large population in which we externally validated six scores, which allowed us to calculate precise estimates of the AUC and gave us enough power for sensible subgroup analyses. Second, the utility of readmission risk scores lies in their daily use, for which clinical data are needed to calculate a score. The LACE, Tsui et al., and SEMI scores, by contrast, were mostly based on insurance data, which can easily contain information not known to the clinician when deciding on discharge. By using only information available to physicians at discharge, we strengthen the validity of the models assessed. Furthermore, we compared scores developed in four different healthcare systems (i.e., Spain, North America, Hong Kong, and Switzerland). Such comparisons strengthen confidence in the reliability of a score, especially if performance remains similar across different healthcare settings. Also, a substantial part of our data was gathered through our prospective TRIAGE cohort, which strengthens confidence in the integrity of our exposures and outcomes.

There is an abundance of retrospective studies on readmission risk prediction,9 but scant literature on prospective, multicenter trials. A systematic review and meta-analysis examined various forms of readmission prevention interventions.42 The authors identified 47 trials, of which 22 were conducted in fewer than 200 patients and only 15 were multicenter. Moreover, no trial assessed a readmission risk prediction tool.

CONCLUSION

The comparison of six readmission risk scores revealed substantial differences in performance upon external validation. The HOSPITAL score, PARA, and the score from Tsui et al. showed the best predictive abilities and have high potential to improve patient care. Our work highlights the importance of rigorously assessing risk scores before they are implemented into daily routine. Interventional research is now needed to understand the effects of these scores when used in clinical practice.