Introduction

Although screening for prostate cancer (PC) can reduce disease-specific mortality [1–3], key problems are overdiagnosis and unnecessary testing (i.e., prostate biopsies). Applying a purely PSA-based biopsy indication (PSA ≥ 3.0 ng/ml) resulted in 76% of biopsies being performed in men who were found not to have cancer on that biopsy [2]. Raising the PSA threshold above which men are recommended to undergo a biopsy causes a considerable number of cancers to be missed [4]. A more promising approach is to refine the biopsy indication by combining PSA with other patient and disease characteristics in a multivariable model [5].

We previously developed a multistep PC risk calculator based on data from the European Randomized Study of Screening for Prostate Cancer (ERSPC) Rotterdam (http://www.prostatecancer-riskcalculator.com). The calculator is meant as a decision aid for lay people (RC#1), general practitioners (RC#2), and urologists (RC#3 for men previously unscreened, RC#4 for men with a previous PSA test, RC#5 for men with a previous negative biopsy, and RC#6 to predict indolent PC) by providing individualized estimates of the risk of detecting PC on biopsy [6, 7].

The calculators have some limitations (i.e., they are based on sextant biopsy information) but have been validated in several cohorts with satisfactory results [8–10]. Clinical implementation may, however, be hampered by the need to perform a transrectal ultrasound (TRUS) before biopsy. We therefore developed variants of the original calculators that do not require TRUS measurements. In addition to its outcome (abnormal vs. normal), which is used alongside PSA, the digital rectal examination (DRE) can also provide an estimate of prostate volume and may hence avoid the need for TRUS before biopsy [11].

We aimed to test the validity of the DRE-based volume ERSPC risk calculator and to compare its performance to a biopsy strategy based on the PSA value and the outcome of the DRE (i.e., abnormal vs. normal with no information on prostate volume included in the risk prediction).

Patients and methods

Patients

Data were from the Prostate Biopsy Collaborative Group, which includes ten European and US cohorts as previously described [12]. We excluded 4 cohorts where prostate volume was not available, leaving 6 cohorts with a total of 15,300 men for analysis. These included four screening cohorts from the ERSPC: Göteborg, Sweden, Round 1 (GOTEBORG-R1), Göteborg Rounds 2–6 (GOTEBORG-R2–6), Rotterdam, the Netherlands, Round 1 (PSA ≥3.0 ng/ml) (ROTTERDAM-R1), and Rotterdam Rounds 2–3 (ROTTERDAM-R2–3); one other screening cohort (Tyrol, Austria; TYROL); and one US clinical cohort (Cleveland Clinic, Ohio; CCF). Biopsy in all ERSPC cohorts was indicated by PSA ≥3 ng/ml (4 ng/ml for Rotterdam during some years) and in the other cohorts by clinical indication, typically an elevated PSA, without a strict protocol. Biopsy schemes were 6-core for all ERSPC cohorts, greater than 8-core for CCF, and 6-, 10-, or 12-core for Tyrol depending on the year. All cohorts except for GOTEBORG-R1 and ROTTERDAM-R1 included some men who had been previously screened. Biopsies after a positive biopsy for PC were excluded from the analysis.

Methods

The ERSPC DRE-based volume calculators were developed on a cohort of 3,624 men who had never before had a biopsy (DREvol-RC3) and a cohort of men (N = 2,896) previously screened/biopsied (DREvol-RC4/5) [11]. The model based on PSA and DRE outcome only was developed on similar cohorts, resulting in a model suitable for men previously unscreened (PSADRE-model) and a model for men previously screened and/or biopsied (PSADRE-model; see “Appendix” for formulas). Characteristics of each cohort were summarized by descriptive statistics. For each biopsy in the dataset, risks were computed using the DREvol-RC3 (men not previously biopsied) or DREvol-RC4/5 (men previously biopsied) [11] as well as the model based on PSA and DRE alone (PSADRE-model), and these risks were compared to the observed presence of cancer at biopsy. Missing values were imputed based on the correlations among the predictor variables and their correlations with the end point. A single imputation was performed, using the first set of imputed values from a multiple imputation procedure (aregImpute, as implemented in R version 2.10.1) [13].

Since no data on prostate volume as assessed with DRE were available in the validation cohorts, TRUS-assessed prostate volume was recoded as in [11] into three volume classes as can be estimated by DRE. TRUS-assessed volumes <30 cc were recoded as 25 cc, volumes between 30 and 50 cc as 40 cc, and volumes ≥50 cc as 60 cc.
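The recoding rule above can be written as a short function. This is a minimal Python sketch for illustration only, not the authors' code; the function name recode_trus_volume is hypothetical, and the handling of the boundaries (30 cc assigned to the 40 cc class, 50 cc to the 60 cc class) follows the cut-offs as stated in the text.

```python
def recode_trus_volume(volume_cc: float) -> int:
    """Map a TRUS-assessed prostate volume (cc) to a DRE-style volume class.

    Cut-offs as stated in the text: <30 cc -> 25, 30 to <50 cc -> 40, >=50 cc -> 60.
    The function name is hypothetical and chosen for illustration.
    """
    if volume_cc < 30:
        return 25
    if volume_cc < 50:
        return 40
    return 60


# Example: recode a few TRUS volumes into the three classes.
recoded = [recode_trus_volume(v) for v in (22.0, 30.0, 47.5, 50.0, 81.3)]
```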

The predictive performance was evaluated by validation plots, comparing the observed frequency of the outcome to the predicted risks [14]. With perfect calibration, the loess smoother in this plot follows the 45° line. The calibration slope should be 1, with values below 1 indicating predictions that are too extreme, as might result from overfitting [13].
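As an illustration of these calibration measures, the sketch below estimates the calibration slope by regressing the observed outcome on the logit of the predicted risk, and the calibration-in-the-large intercept by fitting an intercept with the slope fixed at 1. It is a minimal pure-Python sketch under stated assumptions, not the analysis code used in this study (which was run in R); the function names and the hand-written Newton-Raphson fitting are choices made here, and predicted risks are assumed to lie strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_slope(pred, y, iters=25):
    """Fit y ~ a + b * logit(pred) by Newton-Raphson; return (a, b).

    b is the calibration slope; b < 1 suggests too extreme predictions.
    """
    x = [logit(p) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibration_in_the_large(pred, y, iters=25):
    """Fit y ~ a + 1 * logit(pred); return the intercept a.

    a > 0 indicates underprediction, a < 0 overprediction.
    """
    x = [logit(p) for p in pred]
    a = 0.0
    for _ in range(iters):
        g = h = 0.0
        for xi, yi in zip(x, y):
            mu = _sigmoid(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a
```

For predictions that match the observed event rates exactly, this sketch returns a slope of 1 and an intercept of 0. In practice, an established statistical package would be preferable to hand-written fitting.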

The discriminative ability of predictions is reflected in the spread of the predictions, stratified by the presence of cancer at biopsy. The area under the ROC curve (AUC, equivalent to c statistic) summarized discriminative ability, with pairwise testing by the DeLong test.
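The c statistic can be read as the probability that a randomly chosen man with cancer at biopsy receives a higher predicted risk than a randomly chosen man without cancer, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustrative Python sketch with a hypothetical function name, not the study's analysis code, and it does not implement the DeLong test.

```python
def c_statistic(pred, y):
    """Area under the ROC curve by pairwise comparison of cases and non-cases.

    pred: predicted risks; y: 1 if cancer was found at biopsy, else 0.
    """
    cases = [p for p, yi in zip(pred, y) if yi == 1]
    controls = [p for p, yi in zip(pred, y) if yi == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0   # case ranked above non-case
            elif pc == pn:
                concordant += 0.5   # ties count as one half
    return concordant / (len(cases) * len(controls))


# Toy example: three of the four case/non-case pairs are ranked correctly.
auc_example = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of biopsies; for cohorts of the size analysed here, a rank-based implementation from a statistics library would be more efficient.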

The clinical usefulness of the calculators was evaluated by decision curves [15].

We focused on the range from 10 to 40% for the decision threshold. This is a reasonable range of thresholds, since some patients might opt for biopsy with a chance of cancer of 10%, while few patients would refuse biopsy if their risk was greater than 1 in 2–3.
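For readers unfamiliar with decision curves, the net benefit at a threshold probability pt is commonly defined as the proportion of true positives minus the proportion of false positives weighted by the odds pt / (1 − pt). The sketch below applies that standard definition to a risk model and to the strategy of biopsying all men. It is an illustrative Python sketch, not the study's code; the function names and the toy data are hypothetical.

```python
def net_benefit(pred, y, threshold):
    """Net benefit of biopsying men whose predicted risk is >= threshold.

    Standard decision-curve definition: TP/n - FP/n * pt / (1 - pt).
    """
    n = len(y)
    tp = sum(1 for p, yi in zip(pred, y) if p >= threshold and yi == 1)
    fp = sum(1 for p, yi in zip(pred, y) if p >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_biopsy_all(y, threshold):
    """Net benefit of the reference strategy of biopsying every man."""
    return net_benefit([1.0] * len(y), y, threshold)


# Toy data (hypothetical): evaluate both strategies over the 10-40% range.
risks = [0.05, 0.15, 0.30, 0.60]
outcomes = [0, 0, 1, 1]
curve = [
    (pt, net_benefit(risks, outcomes, pt), net_benefit_biopsy_all(outcomes, pt))
    for pt in (0.10, 0.20, 0.30, 0.40)
]
```

A model is clinically useful at a given threshold when its net benefit exceeds both that of biopsying all men and that of biopsying none (net benefit 0).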

Results

Study population

Median age ranged from 61 to 67 years and median PSA levels from 3.5 to 5.8 ng/ml. The percentage of men with an abnormal DRE result ranged from 10 to 26% in the screening cohorts and was markedly lower in the clinical cohort, where only 6% of the men had an abnormal DRE. The proportion of missing data on prostate volume ranged from 0.4 to 30%. Although two of the ERSPC cohorts and the Tyrol cohort included men who were previously screened and/or biopsied, the cancer detection rates in the screening cohorts were similar (26–28%, Table 1).

Table 1 Descriptives of the 6 validation cohorts

Discrimination

In all validation cohorts, the AUC of the DREvol-RC was larger than that of the PSADRE-model (each pairwise comparison: p < 0.001). In the strictly protocol-based screening cohorts of the ERSPC centers, the AUCs for the DREvol-RC and PSADRE-model were 0.776 (0.735–0.816) and 0.719 (0.673–0.764) for GOTEBORG-R1; 0.609 (0.573–0.646) and 0.563 (0.525–0.601) for GOTEBORG-R2–6; 0.746 (0.724–0.767) and 0.692 (0.669–0.715) for ROTTERDAM-R1; and 0.659 (0.628–0.690) and 0.606 (0.571–0.638) for ROTTERDAM-R2–3, respectively. Hence, the improvement in AUC by using the DREvol-RC versus the PSADRE-model was approximately 0.05, although we note that ROTTERDAM-R1 is part of the development cohort, so its validation results are likely optimistic. In the two large non-ERSPC validation cohorts, the differences in AUC between the DREvol-RC and the PSADRE-model were similar: for the CCF cohort, the AUCs were 0.665 (0.646–0.684) versus 0.620 (0.600–0.640), and for the Tyrol cohort, 0.720 (0.705–0.735) versus 0.675 (0.659–0.690).
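The size of the improvement can be checked directly from the point estimates reported above. The short Python sketch below is only an arithmetic illustration; the variable names are chosen here, and the AUC values are copied from the text.

```python
# (DREvol-RC AUC, PSADRE-model AUC) per cohort, as reported in the text.
auc = {
    "GOTEBORG-R1": (0.776, 0.719),
    "GOTEBORG-R2-6": (0.609, 0.563),
    "ROTTERDAM-R1": (0.746, 0.692),
    "ROTTERDAM-R2-3": (0.659, 0.606),
    "CCF": (0.665, 0.620),
    "TYROL": (0.720, 0.675),
}

# Difference in AUC per cohort, rounded to the reported precision.
auc_diff = {cohort: round(rc - psadre, 3) for cohort, (rc, psadre) in auc.items()}

# Unweighted mean difference across the six cohorts.
mean_diff = round(sum(auc_diff.values()) / len(auc_diff), 3)
```

The per-cohort differences range from 0.045 to 0.057, consistent with the stated improvement of approximately 0.05.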

Calibration

As expected, the calibration of the DREvol-RC was very good for the ROTTERDAM-R1 cohort (Fig. 1). The validation cohort ROTTERDAM-R2–3 differs from the development cohort in that men with a previous negative biopsy were included in the development but not in the validation cohort, and results from round 3 were included only in the validation but not in the development cohort. Yet, the validation plot showed good calibration with a calibration slope close to 1 (0.96). Calibration-in-the-large (intercept of 0.20) indicated minor underprediction of cancer at biopsy. Calibration was also satisfactory for the GOTEBORG-R1 and GOTEBORG-R2–6 cohorts, but with some overprediction for probabilities above 40%. In the large, screening-based Tyrol cohort, minor underprediction was noted. In the clinically derived CCF cohort, considerable underprediction was noted, especially in the risk range below 50% (Fig. 1).

Fig. 1 Calibration plots for the DRE-based ERSPC RC

Net benefit

The DREvol-RC provided a net benefit over performing a prostate biopsy in every man and over performing a prostate biopsy on the basis of PSA and DRE for threshold probabilities of approximately 10% or higher in the cohorts representing men screened for the first time (GOTEBORG-R1 and ROTTERDAM-R1) and in the Tyrol cohort (Fig. 2). The net benefit over performing a prostate biopsy in all men in GOTEBORG-R2–6 and ROTTERDAM-R2–3 was smaller but nevertheless present for risk thresholds between 25 and 40%. Ignoring information on prostate volume decreased net benefit slightly. The net benefit curve for CCF suffers from the observed underprediction, with a net benefit only for higher thresholds (>35%), while the net benefit curves for the Tyrol cohort clearly show the additional value of (a) a multivariable approach (PSA + DRE) and (b) the inclusion of information on prostate volume.

Fig. 2 Net benefit curves for the DRE-based ERSPC RC (large dashed black) versus the screening strategies of referring no men to biopsy (horizontal line at 0), referring all men to biopsy (solid black) and referring men on the basis of PSA + DRE (small dashed black)

Discussion

A key measure for a prediction model is its ability to distinguish those who will have a biopsy-detectable PC from those who will not. In the ideal situation, this will lead to biopsy only in those men who indeed have PC, hence making the prostate biopsy a procedure to assess disease aggressiveness.

The recently developed DREvol-RC was previously validated in men screened at repeat screening rounds of ERSPC Rotterdam (4th and 5th screening round) [11], showing that volume estimation by DRE underestimated the TRUS-assessed prostate volume. However, the median values of the TRUS-assessed prostate volume (26.5, 45.6, and 69.3 cc) were close to the three predefined volume classes of 25, 40, and 60 cc, respectively. Since the sample size of the validation cohort in [11] was relatively small, additional studies were needed. The present study validated the DREvol-RC in six different cohorts, of both screening and clinical origin. The AUCs covered a wide range (from 0.61 to 0.78), which reflects both the fit of the prediction model to the cohort and the varying case-mix. Higher AUC values will be found in more heterogeneous cohorts [16]. In all cohorts, AUC values were substantially larger than those found for the model based on PSA and DRE alone (PSADRE-model). This was also true for the two largest validation cohorts, i.e., the Tyrol screening cohort and the clinical CCF cohort.

Better discrimination does not automatically imply better clinical usefulness. A model with better discrimination can suffer from major under- or overprediction, making it in fact a dangerous tool to use in clinical decision making. In the current study, the DREvol-RC was well calibrated with the exception of the CCF cohort, where a systematic underprediction was observed. Baseline data of the CCF cohort did not indicate it to be a high-risk cohort. The cancer detection rate was, however, considerably higher (39%) than in the other cohorts. This is most likely the result of the biopsy procedure comprising at least 10 cores, while the ERSPC RCs were based on sextant biopsies. Moreover, men who were actually biopsied in the CCF cohort were preselected based on clinical judgment. Biopsy may have been avoided in men whose elevated PSA levels were attributed to the presence of BPH. This idea is supported by the larger net benefit when using the DREvol-RC in cohorts where men were biopsied purely based on an elevated PSA level. Application of the DREvol-RC in clinical cohorts hence needs further study, and a correction for the miscalibration might need to be made.

A similar analysis for the Prostate Cancer Prevention Trial (PCPT) RC [17, 18] (including PSA, DRE outcome, age, and family history) shows limited to no clinical benefit as compared to the policy of biopsying all men. These differences in predictive performance have been observed in previous studies [8, 9, 19]. Prostate volume was highly predictive in a screening-based validation cohort of 525 men [9]. While prostate volume is lacking in the PCPT RC, it is crucial within the ERSPC RCs. The predictors PSA and prostate volume play opposing roles [6, 7, 20]: a high PSA level is positively related to PC at biopsy, whereas a large prostate volume is negatively related. Indeed, prostate volume is related to serum PSA in men with benign prostatic hyperplasia (BPH) and no evidence of PC, and the relationship depends on age [21, 22]. In middle-aged men, BPH may hence explain a high PSA. Histologically distinguishable BPH is present in about 8% of men aged 31–40 years, and this prevalence increases markedly with age to about 70% by the seventh decade of life [23].

In conclusion, identifying men at increased risk of having a biopsy-detectable PC requires consideration of multiple factors. Prostate volume is a key element in such risk prediction, and approximate estimation through DRE might enable a more widespread use of this calculator.