Introduction

Cardiovascular disease (CVD) is a major cause of death in patients with type 2 diabetes, whose risk of developing coronary heart disease (CHD) is increased two- to fourfold [1]. Practice guidelines recommend calculating CVD risk to guide the treatment of cardiovascular complications. Several risk equations are available to estimate CVD risk, such as the Systematic Coronary Risk Evaluation (SCORE) [2] and the Framingham Risk Score (FRS) [3], but these were developed for the general (non-diabetic) population. In patients with type 2 diabetes, the ability of SCORE and FRS to distinguish between those at low and high risk (discrimination) is only moderate and their ability to correctly quantify the observed absolute risks (calibration) is poor [4, 5].

A few prediction scores have been developed specifically for diabetes patients, either newly detected patients or those already receiving treatment [6–10], of which the UK Prospective Diabetes Study (UKPDS) risk engine is most widely known and used. It estimates absolute CHD risk using traditional risk factors such as BMI, age, sex, smoking, systolic blood pressure and ratio of total cholesterol to HDL-cholesterol, plus the diabetes-specific factors duration of diabetes and HbA1c (see Electronic supplementary material [ESM] text and ESM Table 1) [8]. Several studies have validated the UKPDS risk engine with inconsistent results, as shown in a recent systematic review [4–7, 11, 12]. In general, the discrimination of the model was moderate and calibration poor. The methods used in the validation studies were diverse, with different study populations and different endpoints (e.g. CHD vs CVD), and the calculated time period varied. Furthermore, the methods of these validation studies could be improved. For example, none of the studies calculating a 10 year risk accounted for loss to follow-up or endpoints registered for a shorter duration than 10 years, which probably biased results.

Nonetheless, guidelines promoted and funded by the National Institute for Health and Clinical Excellence, Canadian guidelines and Dutch general practitioner guidelines advocate using the UKPDS risk engine among other risk models for risk quantification in clinical practice [13–15].

Therefore, the aim of our study was to quantify the discrimination and calibration of the UKPDS risk engine in a large cohort of patients with type 2 diabetes. We studied the risk engine’s performance in prediction of cardiovascular risk over various time intervals, including 4, 5, 6 and 8 years. Furthermore, we investigated whether the duration of diabetes or the choice of disease endpoint (CVD or CHD) has any impact on the performance of the UKPDS risk engine.

Methods

Study population and design

The study population consisted of patients with type 2 diabetes from the Dutch and Potsdam (Germany) contributions to the European Prospective Investigation into Cancer and Nutrition (EPIC-NL and EPIC-Potsdam, respectively). Both cohorts have been described in more detail by Beulens et al. and Boeing et al. [16, 17]. In brief, EPIC-NL consists of the Prospect cohort and the Monitoring Project on Risk Factors for Chronic Diseases (MORGEN) cohort. Prospect is a prospective population-based cohort of 17,357 women, aged 49–70 years, who participated in breast cancer screening between 1993 and 1997. The MORGEN cohort consists of 22,654 men and women, aged 20–59 years, recruited from three Dutch towns (Amsterdam, Maastricht and Doetinchem). From 1993 to 1997 each year a new random sample of about 5,000 participants was examined. The EPIC-Potsdam cohort recruited 27,548 participants between 1994 and 1998 and was based on general population registries.

In total, 1,861 individuals with type 2 diabetes were identified at baseline; 239 patients had a history of CVD or missing endpoint measurements and were excluded. This resulted in 1,622 patients with type 2 diabetes for inclusion in the current analyses. Participants from EPIC-NL were all patients with a confirmed diagnosis of type 2 diabetes. For EPIC-Potsdam 322 patients were confirmed as having type 2 diabetes; for 845 patients the diabetes type was unspecified. Diabetes cases in the EPIC-NL cohort were verified through medical records of the general practitioner or pharmacist, while diabetes cases in EPIC-Potsdam were verified through repeated self-report in follow-up questionnaires. All participants gave written informed consent prior to study inclusion. Both cohorts were approved by the local ethics committee. All information from EPIC-NL and EPIC-Potsdam was compared, recoded and merged into one uniform database.

Predictors and measurements

At baseline, all participants filled out a general questionnaire containing questions on demographic characteristics, smoking, presence of chronic diseases and other potential risk factors. Body weight and height were measured. Smoking was recoded into current smokers and non-smokers (former or never smokers). Blood pressure was measured twice for the participants in EPIC-NL and three times for EPIC-Potsdam participants. The measurement was performed on the left arm while the participant was in a supine position. The mean of these measurements was used in the analyses. In the EPIC-Potsdam and Prospect cohorts systolic and diastolic blood pressure was measured using a Boso oscillomat (Bosch and Sohn, Jungingen, Germany). In the MORGEN cohort a random zero sphygmomanometer (Hawksley and Sons, Lancing, UK) was used, which slightly underestimated the blood pressure compared with the Boso oscillomat.

Blood, 30 ml, was collected from all participants to obtain plasma, serum and erythrocytes. Total cholesterol, HDL- and LDL-cholesterol, and triacylglycerol levels were measured in frozen serum samples and HbA1c was measured in frozen erythrocytes.

Endpoints

Participants were followed for two primary outcomes: coronary events, defined as myocardial infarction and ischaemic heart disease (International Classification of Diseases [ICD]-9 codes 410–414 [www.icd9data.com/2007/Volume1/]; ICD-10 codes I20–I25 [www.who.int/classifications/icd/en/]) and cardiovascular events, defined as myocardial infarction, ischaemic heart disease or stroke (ICD-9 codes 430–438; ICD-10 codes I60–I67, I69). In EPIC-NL, incident morbidity cases were obtained through linkage with the Dutch National Medical Registry, which holds a standardised computerised database of all hospital discharge diagnoses throughout the country. In the Netherlands it is mandatory to fill out a hospital discharge diagnosis whenever a patient leaves the hospital. The vital status of EPIC-NL participants was obtained through linkage with the municipal population registries. The records of this database were linked to the EPIC-NL cohort with a validated probabilistic method [18].

In EPIC-Potsdam the major source of data on incident cases was questionnaires that were mailed to all participants every 2 years. Of these questionnaires, 95% were returned. Mortality data for EPIC-Potsdam participants were collected through cooperation with the local health offices of Potsdam and the state office of statistics of Brandenburg [19]. All diagnoses were coded according to the ICD.

Data analysis

Years at risk for developing the endpoints were calculated as the time between enrolment in the study and the diagnosis of one of the two endpoints (CHD or CVD), the date of death or the end of follow-up.
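The calculation of years at risk can be illustrated with a minimal sketch. The function name `years_at_risk`, its arguments and the 365.25 day year are assumptions made for illustration and are not taken from the study's analysis code.

```python
from datetime import date

def years_at_risk(enrolment, end_of_follow_up, event_date=None, death_date=None):
    """Return (years at risk, endpoint indicator) for one participant."""
    # Follow-up stops at whichever comes first: endpoint, death or end of follow-up.
    candidates = [d for d in (event_date, death_date, end_of_follow_up) if d is not None]
    stop = min(candidates)
    # The participant counts as a case only if the endpoint is what ended follow-up.
    had_event = event_date is not None and event_date == stop
    return (stop - enrolment).days / 365.25, had_event

# Hypothetical participant: enrolled in 1995, CHD diagnosis in 1999.
print(years_at_risk(date(1995, 1, 1), date(2003, 1, 1), event_date=date(1999, 1, 1)))
```

The same function would be applied once per endpoint (CHD or CVD), so that a participant can have different years at risk for the two outcomes.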

Missing values occurred on various predictor variables ranging from 4.2% (for systolic blood pressure) to 19.0% (for HDL-cholesterol). As missing values seldom occur completely at random, simply leaving those patients out of the analysis yields biased results. Accordingly, it is widely recommended to impute missing values rather than performing a complete subject analysis [20]. We used multiple imputation (MI) for our missing data on predictor variables using the MI by chained equations procedure, assuming that after correction for measured variables the patterns of ‘missingness’ can be considered to be at random.

The 4, 5, 6 and 8 year predicted risks for CHD and CVD were calculated using the UKPDS risk engine (see ESM). The measurements of HbA1c, blood pressure and lipid ratio were less precise than the estimates of the UKPDS. Therefore, in order to prevent overestimation of the risk, we used the beta values from the appendix of the paper as suggested by Stevens et al. [8].

Not all participants had a follow-up of 8 years; therefore, two types of analysis were conducted. The first analysis included only patients with type 2 diabetes who were followed up for at least the corresponding time; participants with follow-up shorter than the calculated risk period were excluded. The second analysis included all patients, with patients with a follow-up shorter than the calculated risk period included as non-cases.
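The difference between the two analyses comes down to how a participant with short follow-up is coded. The sketch below is one possible reading of that rule; the function name `binary_outcome` and the assumption that participants with an endpoint inside the risk period are always kept as cases are illustrative and not stated in the text.

```python
def binary_outcome(follow_up_years, had_event, horizon, include_short_follow_up):
    """Return 1 (case), 0 (non-case) or None (excluded) for one participant."""
    if had_event and follow_up_years <= horizon:
        return 1  # endpoint observed within the calculated risk period
    if follow_up_years >= horizon:
        return 0  # followed for the whole period without an endpoint in it
    # Follow-up ended before the horizon without an endpoint:
    # excluded in the first analysis, counted as a non-case in the second.
    return 0 if include_short_follow_up else None

# Hypothetical participant followed for 3 years without an endpoint, 5 year risk:
print(binary_outcome(3.0, False, 5, include_short_follow_up=False))  # first analysis
print(binary_outcome(3.0, False, 5, include_short_follow_up=True))   # second analysis
```

Counting such participants as non-cases lowers the observed risk, which is one reason the apparent overestimation can be larger in the second analysis.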

Model performance was assessed by measuring discrimination (the ability to discriminate between participants with or without an event) and calibration (the ability to quantify the observed absolute risk). The discriminative ability of the model was examined by calculating the c-statistic with 95% CI for each time period (4, 5, 6 and 8 year risk). The calibration of the model was assessed through visually inspecting the calibration plots and by calculating the Hosmer–Lemeshow χ2 statistic (HLχ2). Estimates for the c-statistic and HLχ2 were pooled using Rubin’s rule, in order to correct for the MI.
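The three quantities named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used in the study: the function names are hypothetical, the Hosmer–Lemeshow statistic is computed over ten equally sized groups of increasing predicted risk, and the pooling function applies Rubin's rules to a point estimate and its variance (for example the c-statistic); how the HLχ2 values were pooled is not described in enough detail to reproduce here.

```python
def c_statistic(risks, outcomes):
    """Share of case/non-case pairs in which the case has the higher predicted risk."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for rc in cases:
        for rn in controls:
            score += 1.0 if rc > rn else 0.5 if rc == rn else 0.0
    return score / (len(cases) * len(controls))

def hosmer_lemeshow(risks, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square over groups of increasing predicted risk."""
    ranked = sorted(zip(risks, outcomes))
    n = len(ranked)
    chi2 = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(r for r, _ in chunk)   # predicted number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_risk = expected / len(chunk)
        chi2 += (observed - expected) ** 2 / (len(chunk) * mean_risk * (1 - mean_risk))
    return chi2

def rubin_pool(estimates, variances):
    """Pool estimates from m imputed data sets; returns (estimate, total variance)."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    return pooled, within + (1 + 1 / m) * between
```

A large HLχ2 with a small p value indicates that observed and predicted event counts differ across risk groups, which is how poor calibration shows up in the results that follow.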

To take into account the time-to-event structure of our data, we analysed the data and assessed the discrimination of the UKPDS risk engine using Harrell’s c-statistic for censored data [21]. The calibration was examined by plotting predicted survival probabilities against right-censored failure times (using the R program, val.surv function, developed by F. E. Harrell Jr).
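For readers unfamiliar with Harrell's c-statistic, the idea can be sketched as follows. The study itself used R; this Python function is only an illustration, its name and arguments are hypothetical, and it ignores tied follow-up times for simplicity.

```python
def harrell_c(times, events, risks):
    """Concordance over usable pairs when follow-up is right-censored."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time ends in an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / usable

# Hypothetical data: two events (at 2 and 4 years) and two censored participants.
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 0], [0.4, 0.3, 0.2, 0.1]))
```

Unlike the fixed-horizon c-statistic, this measure uses every participant's follow-up time, so nobody has to be excluded or recoded as a non-case.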

As a longer duration of diagnosed diabetes is modelled in the survival part of the risk equation rather than in the linear predictor, each year longer duration results in a much higher predicted CVD risk. We examined whether performance of the UKPDS risk engine was affected by duration of diagnosed diabetes. Separate analyses were performed for patients who had had diabetes for over 10 years and patients who had had diabetes for up to and including 10 years. All statistical analyses were performed using R-2.10.1 for Windows (http://cran.r-project.org/).
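The effect of duration can be made concrete with a short sketch. It assumes that the risk engine has the structure R(t) = 1 − exp(−q d^T (1 − d^t)/(1 − d)), where q collects the linear predictor, T is the duration of diagnosed diabetes, t is the risk period and d is the yearly risk ratio for duration; this form and the values of q and d used below are assumptions for illustration only, and the actual equation and coefficients should be taken from the ESM and reference [8].

```python
import math

def ukpds_style_risk(q, d, duration_years, horizon_years):
    """Risk over `horizon_years` for a patient with `duration_years` of diagnosed diabetes."""
    # Duration enters as d ** T, multiplying the whole cumulative hazard.
    cumulative_hazard = q * d ** duration_years * (1 - d ** horizon_years) / (1 - d)
    return 1 - math.exp(-cumulative_hazard)

# Illustrative values only (not the published coefficients).
q, d = 0.01, 1.08
for duration in (0, 5, 10, 15):
    print(duration, ukpds_style_risk(q, d, duration, horizon_years=5))
```

Under this structure every additional year of diagnosed diabetes multiplies the cumulative hazard by d, whatever the other risk factors are, which is why the analyses were stratified at 10 years of duration.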

Results

During a mean exposure time of 8 years, 146 CVD cases were identified, of which 99 were cases of CHD. Baseline characteristics according to cohort are shown in Table 1, together with the characteristics of the UKPDS cohort. In EPIC-NL there were more smokers and participants had a higher total cholesterol/HDL-cholesterol ratio, but a shorter duration of diabetes compared with EPIC-Potsdam participants. Compared with the participants from the UKPDS population in which the risk equation was developed, EPIC-NL and EPIC-Potsdam included fewer smokers and participants were older and had higher levels of HbA1c.

Table 1 Baseline characteristics by country of 1,622 participants, with baseline characteristics of the UKPDS cohort [8]

Using the UKPDS risk engine the mean predicted 8 year risk was 15.9% while the observed 8 year CHD risk was 4.9%, resulting in an overestimation of 224%. For 8 year CVD risk, the UKPDS risk engine overestimated the CVD risk by 112%, as the observed 8 year CVD risk was 7.5%.
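The overestimation percentages follow from the relative difference between predicted and observed risk. The short check below uses a hypothetical helper name and assumes that the same mean predicted risk of 15.9% is compared with both the CHD and the CVD observed risks, which is consistent with the reported figures.

```python
def relative_overestimation(predicted_pct, observed_pct):
    """Overestimation as a percentage of the observed risk."""
    return 100.0 * (predicted_pct - observed_pct) / observed_pct

print(round(relative_overestimation(15.9, 4.9)))  # CHD
print(round(relative_overestimation(15.9, 7.5)))  # CVD
```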

Performance of the model for CHD outcome

In the first analysis (excluding participants with a shorter follow-up than the predicted duration) the c-statistic was 0.66 (95% CI 0.51–0.81) for 5 year risk (Table 2). The calibration was poor (HLχ2 = 61.9, p < 0.001 for 5 year risk) with a severe overestimation of the risk (Fig. 1a). Discrimination and calibration were similar for other calculated risk periods. In the second analysis (including patients with a shorter follow-up than the predicted duration), the discrimination was similar to that in the first analysis (c-statistic 0.65 [95% CI 0.50–0.80]), and the calibration was similarly poor (HLχ2 = 77.4, p < 0.001).

Table 2 Discrimination and calibration of the UKPDS risk engine for calculated risk periods of 4, 5, 6 and 8 years, with CHD as outcome
Fig. 1 Calibration plots for 5 year calculated risk for (a) CHD and (b) CVD. Values depict observed and predicted values with 95% CI. The dotted 45° line denotes ideal agreement between predicted and observed risk. Results based on first analysis (censored cases excluded)

Taking into account the time-to-event structure of the data, the overall discrimination was similar; Harrell’s c-statistic was 0.65 (95% CI 0.53–0.76). The overall calibration was similarly poor, showing a severe overestimation, comparable with the plots of the first and second analyses.

The discrimination of the model for patients with a duration of diabetes >10 years was similar to the discrimination of the model for patients with a duration of ≤10 years (ESM Tables 2 and 3). However, the calibration was better for 4 and 5 year risk prediction for patients with diabetes for >10 years, as the HLχ2 statistics were 15.2 (p = 0.347) for 4 year risk and 20.8 (p = 0.098) for 5 year risk, which indicates that the observed and predicted risks did not differ significantly. For other calculated risk periods the calibration was similarly poor for both patient groups with up to and over 10 years of diagnosed diabetes.

Performance of the model for CVD outcome

In the first analysis, discrimination for 5 year risk prediction was moderate to poor, with a c-statistic of 0.66 (95% CI 0.53–0.79) (Table 3). The calibration was poor (HLχ2 = 35.2, p = 0.002) with a severe overestimation (Fig. 1b). Again, similar results were obtained for the other calculated risk periods. For the second analysis the discrimination for 5 year risk prediction was similar to the first analysis (c-statistic 0.65 [95% CI 0.53–0.79]). The calibration was also poor (HLχ2 = 48.1, p < 0.001), but the overestimation was greater than for the first analysis. Overall discrimination for CVD, taking into account the time-to-event structure of the data, yielded similar results; Harrell’s c-statistic was 0.65 (95% CI 0.55–0.74). The calibration plot for right censored data was similarly poor, showing a severe overestimation comparable with the plots of the first and second analysis.

Table 3 Discrimination and calibration of the UKPDS risk engine for calculated risk periods of 4, 5, 6 and 8 years, with CVD as outcome

Overall, the performance of the model for predicting CVD was about the same as for predicting CHD. The discriminative ability of the model was similar for patients with up to and over 10 years of diagnosed diabetes (ESM Tables 4 and 5).

Consistent with the results for CHD risk, the calibration was better for patients who had diabetes for >10 years compared with patients with diabetes ≤10 years for 4 and 5 year calculated risk periods. The HLχ2 test was not significant, meaning there was no significant difference between observed and predicted risks.

Discussion

This study shows that the discriminative ability of the UKPDS risk engine is moderate and the calibration poor, with a severely overestimated CHD risk prediction. The performance was similar for prediction of CVD. The calibration of the UKPDS risk engine for 4 and 5 year prediction was better for patients who had been diagnosed with diabetes for >10 years than for patients who had been diagnosed for ≤10 years. However, this difference must be interpreted with caution because the number of patients in these subgroup analyses was much smaller, which reduces the power of the HLχ2 test to detect differences between observed and predicted risks.

The strengths of this study are its large sample size of patients with type 2 diabetes, the verification of diabetes cases and the variety of patients (from Germany and the Netherlands), which enhances the generalisability of the results. However, some limitations need to be addressed. First, the mean follow-up time was 8 years; therefore, we could not validate 10 year CVD and CHD risks. However, the UKPDS risk engine is, in principle, designed for all risk periods, including periods shorter than 10 years [8]. Second, our population consisted of all diabetes cases, not just individuals newly diagnosed with diabetes. Therefore, we could only validate the use of the UKPDS risk engine for patients who have been diagnosed with diabetes for some time. Finally, we had some missing values in the baseline factors, but we addressed this limitation using MI.

There are several explanations for the poor to moderate performance of the UKPDS risk engine to predict CHD and CVD risk in this population. First, the UKPDS risk engine was developed from a cohort that started including patients in 1977 [8]. Treatment of type 2 diabetes and prevention of CVD has improved since 1977 and the risk of developing CVD has declined with better treatment of type 2 diabetes [22]. Also, as diabetes is now detected at an earlier stage, therapeutic intervention can be initiated earlier, reducing CVD risk even further. Altogether, this is likely to explain the large differences in predicted and observed absolute risks that have led to poor calibration.

To further investigate the difference between the study populations, the model was fitted on our data and the regression coefficients were compared with the original values of the UKPDS risk engine. The greatest difference was observed for sex, with women having a slightly greater risk compared with men in our population, which is opposite to the original UKPDS model. This difference might be explained by different lifestyles between men and women in 1977 and nowadays. More men tended to smoke in 1977 which, combined with starting smoking at an earlier age, may have increased the CVD risk in men at this time [23].

The current cohort was established between 1993 and 1997 and differences in lifestyle between men and women may have changed over time, probably resulting in the observed change in the regression coefficient for sex. Furthermore, to some extent it can be explained by the high number of women in our population. When only the German cohort, which has equal numbers of men and women, was analysed, this difference in the CVD risk for women between our cohort and the UKPDS was eliminated.

The results of this study are comparable with those of other validation studies. Two studies observed a modest discrimination (c-statistics 0.74, 0.67) and an underestimation [5, 6] of CHD risk instead of an overestimation as we and others observed [4, 7, 11]. Yet, the number of diabetes patients in these studies was small (n = 125, n = 428). Furthermore, one of these studies estimated 10 year CVD and CHD risk, while the follow-up of their cohort was only 4 years [5]. A validation study in the EPIC-Norfolk population observed a good discrimination and an overestimation of the 10 year CVD risk [4]. However, a slightly different version of the UKPDS risk engine was validated, which was designed to calculate CVD risk instead of CHD risk [4]. Therefore, their results might not be directly comparable with the results of our study. Combining our results with the previous studies mentioned, we can conclude that the performance of the UKPDS risk engine for predicting CVD and CHD risks is only moderate to poor. Nonetheless, Dutch, Canadian and UK guidelines recommend using the UKPDS risk engine to calculate CHD risk in general practice [13–15]. The UKPDS risk engine is also used in large trials to calculate initial CHD risk [24]. It may not be advisable to use the UKPDS risk engine to calculate absolute risk as a basis to initiate treatment or to use risk ranking based on the outcome of the UKPDS risk engine, as performance of the UKPDS risk engine is moderate to poor.

Constructing a new diabetes-specific CVD risk model might enhance accurate risk prediction, particularly if using a more contemporary population. As treatment of diabetes has improved over time, CVD risk has been lowered. Furthermore, modifiable risk factors have changed over time: smoking is less common and there are better treatments for hypertension and to lower HbA1c concentrations. These developments have impacted on the associated risk for a cardiovascular event.

In summary, we observed that the UKPDS risk engine severely overestimated CHD and CVD risks in patients with type 2 diabetes. The discriminative ability was only moderate to poor. The results from various subgroup analyses were not substantially different. To enhance prediction of CVD and CHD in patients with type 2 diabetes, there is a need to update or construct a new and improved diabetes-specific model with better performance and, more importantly, better external validity [25, 26].