Introduction

Coronary artery disease (CAD) is one of the major causes of mortality and morbidity worldwide [1]. The initial assessment of a patient with chest pain usually consists of a stress electrocardiogram (ECG). However, its diagnostic accuracy is low [2] compared with conventional coronary angiography (CCA), the reference standard for diagnosing CAD. CCA, on the other hand, is an invasive technique and carries a small risk of complications [3, 4]. Myocardial perfusion imaging (MPI) is a non-invasive technique that is used clinically as a gatekeeper test before CCA.

MPI can be conducted using stress magnetic resonance imaging (MRI), contrast-enhanced echocardiography (ECHO), single-photon emission computed tomography (SPECT), positron emission tomography (PET) and, under development, computed tomography (CT). The only available extensive study directly comparing two techniques is the MR-IMPACT study [5], a multicentre randomised trial which found MRI to be superior to SPECT. Systematic reviews and meta-analyses have been published for most of the techniques, but none of these reviews compares MPI techniques with each other [6–10]. The comparability of these meta-analyses is questionable, mainly because of differences in publication period, literature search strategy, evidence selection and data analysis. Furthermore, these reports often include studies with verification bias, which may have led to overestimation of the sensitivity and underestimation of the specificity of the tests considered. To overcome these problems and allow a fair comparison between imaging tests, a systematic review of the different MPI techniques is required that applies the same selection criteria and methods of analysis to all techniques and excludes studies with (potential) verification bias.

The aim of this study was to determine and compare the diagnostic performance of stress MPI tests for the diagnosis of obstructive CAD, with CCA as the reference standard. We performed the review according to the PRISMA statement for such reviews [11, 12].

Materials and methods

Search strategy

We searched Medline and Embase for English-language literature published between January 2000 and May 2011 evaluating the presence of obstructive CAD by stress perfusion imaging tests, namely MRI, contrast-enhanced ECHO, SPECT and PET. In this meta-analysis we focus on functional imaging tests that evaluate perfusion as a measure of haemodynamically significant myocardial ischaemia, as opposed to anatomical imaging tests, such as coronary CT angiography, which evaluate structural abnormalities of the coronary arteries. We limited the search to publications from 2000 onwards to include only studies that evaluated state-of-the-art MPI techniques. This may have introduced a selection bias with respect to SPECT, because many SPECT studies were published before 2000. To address this problem we compared our results with a comprehensive review of SPECT studies by Heijenbrok-Kal et al. [13]. CT was excluded because the technique is still under development. Review articles were checked for potential additional studies. The search included keywords corresponding to the four index tests (MRI, ECHO, SPECT and PET), the reference test (CCA), the target condition (CAD) and diagnostic performance. We used numerous synonyms, including both ‘text words’ and MeSH (Medical Subject Headings) terms, to maximise the sensitivity of our search. See Appendix A in the Electronic Supplementary Material for a detailed description of the search strategy.

Study selection

Two authors reviewed article titles and abstracts for eligibility. Discrepancies were resolved by consensus.

We included studies if they met all of the following criteria: (1) the study assessed the diagnostic performance of stress perfusion MRI, stress perfusion contrast-enhanced ECHO, stress perfusion SPECT or stress perfusion PET as a diagnostic test for CAD, (2) a prospective study design was used, (3) the study population consisted of adult patients with known (previously diagnosed) or suspected CAD, (4) CCA was used as the reference standard test in all patients irrespective of the non-invasive test result, i.e. selective verification was not present, (5) obstructive CAD was defined as at least 1 vessel with at least 50 %, at least 70 % or at least 75 % lumen diameter reduction, and (6) absolute numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) were available at the patient level or could be derived adequately.

Studies were excluded if they met any of the following criteria: (1) the article was a review or meta-analysis, (2) patients had (suspected) acute coronary syndrome (ACS), (3) normal healthy volunteers or asymptomatic patients were included, (4) fewer than 30 patients were included (a criterion chosen to avoid zero counts of TPs, FPs, TNs or FNs), (5) (potentially) overlapping study populations were reported, (6) a very specific patient population (e.g. only patients with a heart transplant, left bundle branch block or aortic stenosis) was studied, or (7) the study focused on in-stent or graft stenosis after percutaneous coronary intervention (PCI) or coronary artery bypass grafting (CABG).

Data extraction

Two authors independently extracted data on author, journal, year of publication, technique used, country, hospital type, number of patients, mean age, percentage male, patient selection, brand of imaging device, magnetic field strength, radiotracer, contrast agent used, type of assessment (qualitative or quantitative), stressor used, CAD definition and the numbers of TP, FP, TN and FN. Discrepancies were resolved by consensus.

If a study reported pairs of sensitivities and specificities at different cut-off points, we extracted the pair with the highest sensitivity. When studies reported data for multiple CAD definitions (e.g. at least 50 % and at least 70 % stenosis), the highest sensitivity was used to calculate the overall estimates. This also applied when studies reported sensitivities and specificities for different observers.

Quality assessment

We used a modified QUADAS checklist (quality assessment of studies of diagnostic accuracy included in systematic reviews) [14] to assess the quality of the included studies. Two authors independently assessed the study quality of the included articles. Discrepancies were resolved by consensus.

Statistical analysis and data synthesis

We analysed the data at the patient level using a bivariate random effects regression model [15]. The model assumes a binomial distribution for the within-study variability (the sampling variability of the observed sensitivity and specificity within each study). It furthermore assumes normally distributed, potentially correlated random effects for the logit sensitivity and logit specificity across studies. The correlation between logit sensitivity and logit specificity reflects the inverse relation between sensitivity and specificity that arises when the positivity criterion is varied. Additionally, meta-regression was performed to explore the effect of differences in patient selection and CAD definition, taking into account the possible interaction between CAD definition and the techniques considered.
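
For reference, this bivariate model can be written as follows (a standard formulation of the bivariate random effects model; the notation below is ours and is not taken from the original analysis). For study i, let TP_i denote the number of true positives among n_1i diseased patients and TN_i the number of true negatives among n_0i non-diseased patients:

\[ TP_i \sim \mathrm{Binomial}\bigl(n_{1i}, \mathrm{Se}_i\bigr), \qquad TN_i \sim \mathrm{Binomial}\bigl(n_{0i}, \mathrm{Sp}_i\bigr) \]

\[ \begin{pmatrix} \mathrm{logit}(\mathrm{Se}_i) \\ \mathrm{logit}(\mathrm{Sp}_i) \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix}, \begin{pmatrix} \sigma^{2}_{\mathrm{Se}} & \sigma_{\mathrm{SeSp}} \\ \sigma_{\mathrm{SeSp}} & \sigma^{2}_{\mathrm{Sp}} \end{pmatrix} \right) \]

The summary sensitivity and specificity are obtained by back-transforming \mu_Se and \mu_Sp, and the covariance term \sigma_SeSp captures the (typically negative) correlation between logit sensitivity and logit specificity across studies.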

The data of each study were summarised in forest plots, and summary estimates with 95 % confidence intervals of sensitivity and specificity were calculated for each imaging technique. Additionally, we summarised these numbers in receiver operating characteristic (ROC) space, showing the summary estimates with a 95 % confidence region and a summary curve. To distinguish SPECT studies that used different protocols, we highlighted the studies that combined gated SPECT with technetium-99m (99mTc) as the radiotracer (Fig. 4). Similarly, MRI studies that included the assessment of delayed contrast enhancement were highlighted. Figures were created using Cochrane’s Review Manager (version 5, Copenhagen, Denmark).

To estimate the clinical utility of each technique we calculated the positive and negative likelihood ratios (LR+ and LR−). The likelihood ratio is the ratio of the probability of a certain test result in patients with the disease to the probability of the same test result in those without the disease. LR+ [= sensitivity/(1 − specificity)] describes this ratio for a positive test and LR− [= (1 − sensitivity)/specificity] describes it for a negative test. To illustrate the clinical utility, we used the LRs to calculate post-test probabilities across the range of possible pre-test probabilities (Fig. 5).
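
To illustrate how the LRs translate pre-test into post-test probabilities, a minimal Python sketch is given below (our own illustration; the inputs are the pooled MRI estimates from Table 3, and the resulting likelihood ratios differ slightly from the model-based values reported there):

def post_test_probability(pre_test_prob, likelihood_ratio):
    # Bayes' theorem in odds form: probability -> odds, multiply by LR, odds -> probability
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Pooled MRI estimates (Table 3): sensitivity 0.91, specificity 0.80
sens, spec = 0.91, 0.80
lr_pos = sens / (1.0 - spec)        # approx. 4.6
lr_neg = (1.0 - sens) / spec        # approx. 0.11

# Post-test probabilities for a patient with a 30 % pre-test probability of CAD
print(post_test_probability(0.30, lr_pos))   # approx. 0.66 after a positive MRI
print(post_test_probability(0.30, lr_neg))   # approx. 0.05 after a negative MRI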

Finally, we calculated the natural logarithm of the diagnostic odds ratio (lnDOR), which represents an overall summary estimate of diagnostic performance. The diagnostic odds ratio (DOR) is the odds of a positive test result in patients with disease divided by the odds of a positive test result in those without disease, which equals the ratio of the positive and negative likelihood ratios.
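
For a single study, the DOR and lnDOR can be computed directly from the 2×2 table, as in the sketch below (our own illustration; the summary lnDORs reported in this review were estimated with the bivariate model rather than by pooling such per-study values, and the continuity correction shown is a common convention that was not part of our analysis, in which studies with zero cells were simply omitted from the funnel plots):

import math

def diagnostic_odds_ratio(tp, fp, fn, tn):
    # Add a 0.5 continuity correction when any cell is zero (illustrative choice only)
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)                         # equals LR+ / LR-
    ln_dor = math.log(dor)
    se_ln_dor = math.sqrt(1/tp + 1/fp + 1/fn + 1/tn)    # standard error of ln(DOR)
    return dor, ln_dor, se_ln_dor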

We also created funnel plots to assess the presence of publication bias. The funnel plot shows the DOR horizontally and the standard error of the log-transformed DOR vertically. Publication bias typically arises when small studies with negative results (in our case, studies with a low DOR) remain unpublished. An asymmetric funnel plot, for example one with fewer studies in the lower left part of the graph, suggests the presence of publication bias.
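
A minimal plotting sketch that mirrors this layout is given below (our own, using matplotlib); it assumes per-study DORs and standard errors of ln(DOR) computed as above, and the logarithmic x-axis and inverted y-axis are our own presentation choices:

import matplotlib.pyplot as plt

def funnel_plot(dors, se_ln_dors):
    # DOR on the x-axis, SE of ln(DOR) on the y-axis; the y-axis is inverted
    # so that larger (more precise) studies appear towards the top of the funnel.
    fig, ax = plt.subplots()
    ax.scatter(dors, se_ln_dors)
    ax.set_xscale("log")
    ax.set_xlabel("Diagnostic odds ratio (DOR)")
    ax.set_ylabel("SE of ln(DOR)")
    ax.invert_yaxis()
    return fig, ax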

The statistical software package SAS version 9.2 (Proc NLMIXED; SAS Institute, Cary, NC, USA) was used for the analyses.

Results

The Medline (PubMed) and Embase searches yielded 1,649 unique studies (Fig. 1). On the basis of title and abstract, we excluded 1,405 articles. On the basis of the full text, we excluded a further 202 for the reasons detailed in Fig. 1. Review of the study characteristics showed considerable differences between the included studies (Table 1).

Fig. 1

Flow chart of systematic literature search

Table 1 Study characteristics

Forty-four studies met the inclusion criteria. The MRI articles comprised a total of 2,970 patients from 28 studies, the ECHO articles 795 patients from 10 studies and the SPECT articles 1,323 patients from 13 studies. No PET studies met the criteria, and PET was therefore excluded from the analysis. The overview of the QUADAS checklist for all studies demonstrates some differences in study quality (see Appendix B in the Electronic Supplementary Material). The funnel plots of MRI and SPECT suggest publication bias, whereas the funnel plot of ECHO shows no obvious evidence of publication bias (Fig. 2). Sensitivities and specificities varied across studies, whose sample sizes ranged from 30 to 823 patients (Tables 2 and 3). The forest plots show the sensitivity and specificity of each study with their 95 % confidence intervals depicted as horizontal lines (Fig. 3), grouped by CAD definition and study population and then sorted by sensitivity.

Fig. 2

Funnel plots. The diagnostic odds ratio (DOR) on the x-axis is plotted against the standard error (SE) of the log(DOR) on the y-axis. A symmetrical distribution of studies indicates the absence of publication bias. An asymmetrical distribution with, for example, relatively more small studies with a positive result (in the lower right part of the plot) would suggest the presence of publication bias. In the ECHO funnel plot Peltier et al. [56], in the SPECT funnel plot Astarita et al. [58] and in the MRI funnel plot Donati et al. [25] are not included, because their respective DORs could not be calculated (zero false negatives or false positives). a MRI, b ECHO, c SPECT

Table 2 Source data for MRI, ECHO and SPECT
Table 3 Measures of diagnostic performance for MRI, ECHO and SPECT, estimated using the bivariate random effects model
Fig. 3

Forest plots. The data are grouped by study population (known and suspected CAD versus suspected CAD only) and by CAD definition (≥50 % versus ≥70 % stenosis), and within each group sorted from lowest to highest sensitivity; data are reported at the patient level. a MRI, b ECHO, c SPECT. *When data were available for both CAD definitions (≥50 % and ≥70 %) the summary estimates only include data from CAD ≥70 % stenosis

With coronary angiography as the reference standard, the meta-analysis of the sensitivities and specificities of the different techniques (Table 3; Fig. 3) yielded for MRI a sensitivity of 0.91 (95 % CI 0.88–0.93) and a specificity of 0.80 (95 % CI 0.76–0.83). Perfusion ECHO showed a sensitivity of 0.87 (95 % CI 0.81–0.91) and a specificity of 0.72 (95 % CI 0.56–0.83). SPECT demonstrated a sensitivity of 0.83 (95 % CI 0.73–0.89) and a specificity of 0.77 (95 % CI 0.64–0.86). The ROC spaces show the summary estimate of sensitivity and specificity for each technique, surrounded by its 95 % confidence region (Fig. 4). The sensitivities of MRI and SPECT differed significantly (P = 0.03). In terms of specificity, no significant differences were found.

Fig. 4

ROC space with summary estimates for each technique and 95 % confidence regions. This figure shows the diagnostic performance of the studies relative to each other, with specificity (plotted in reverse) on the x-axis and sensitivity on the y-axis. Perfect diagnostic accuracy lies in the upper left corner, where sensitivity and specificity are both 1. a MRI, b ECHO, c SPECT, d All three techniques. The grey rectangles in a refer to the studies using delayed contrast enhancement and in c to the studies using gated SPECT with 99mTc as the radiotracer. The size of the rectangles corresponds to the inverse of the standard error of sensitivity and specificity, which correlates with the size of the study

We found no effect of CAD definition on the sensitivities (Table 3; P = 0.55). A CAD definition of ≥70 % stenosis, compared with ≥50 % stenosis, resulted in a significantly lower specificity for SPECT (Table 3; P = 0.045), but not for ECHO (P = 0.39) or MRI (P = 0.51). Furthermore, we found no effect of CAD definition on the lnDORs of MRI (P = 0.24), ECHO (P = 0.96) and SPECT (P = 0.34) (Table 3).

Similarly, MRI, ECHO and SPECT showed no significant differences in sensitivity, specificity or lnDOR when patients with suspected CAD and no prior history of CAD were compared with patients with known or suspected CAD (all P values >0.05; Table 3).

We did not observe an association between the use of gated SPECT in combination with 99mTc as the radiotracer and the diagnostic performance of SPECT (Fig. 4). MRI studies that assessed delayed contrast enhancement were associated with high sensitivities, albeit with a wide range of specificities (Fig. 4).

The positive likelihood ratios (LR+) of MRI, ECHO and SPECT were 4.43 (95 % CI 3.64–5.23), 3.08 (95 % CI 1.65–4.50) and 3.56 (95 % CI 2.07–5.04), respectively (Table 3). The negative likelihood ratios (LR−) of MRI, ECHO and SPECT were 0.12 (95 % CI 0.08–0.15), 0.18 (95 % CI 0.13–0.24) and 0.22 (95 % CI 0.14–0.31), respectively. Figure 5 illustrates the revised probability of CAD after a positive or negative test. The lnDORs of MRI, ECHO and SPECT were 3.63 (95 % CI 3.26–4.00), 2.83 (95 % CI 2.29–3.37) and 2.76 (95 % CI 2.28–3.25), respectively (Table 3). The lnDOR of MRI was significantly higher than that of SPECT (P = 0.006) and ECHO (P = 0.02). There was no significant difference between the lnDORs of SPECT and ECHO (P = 0.52).

Fig. 5

Revised probability of CAD. This figure shows the revised (post-test) probability of CAD (y-axis) as a function of prior (pre-test) probability (x-axis) of CAD for positive and negative MPI results, based on the likelihood ratios presented in Table 3 (overall analysis). MRI+, ECHO+ and SPECT+ represent the lines for a positive test result and MRI−, ECHO− and SPECT− represent the lines for a negative test result

Discussion

In this systematic review and meta-analysis we compared the diagnostic performance of different stress MPI techniques. MRI showed the best diagnostic performance with the narrowest confidence intervals; the latter is explained by the large number of patients studied with MRI. We found a significantly higher sensitivity for MRI compared to SPECT and a significantly higher lnDOR for MRI compared to both ECHO and SPECT. In contrast to previous meta-analyses [9], we compared the different imaging techniques using the same search strategy and methods of analysing the data. Furthermore, we only included studies without verification bias.

In our review we paid special attention to the issue of verification bias. Sensitivity may be overestimated and specificity underestimated if patients with a positive test result are more likely to be verified with the reference standard test. Diagnostic odds ratios are generally not, or only minimally, affected by verification bias [16]. Underwood et al. [9] reviewed the diagnostic performance of SPECT and attributed the overall low specificity of SPECT studies (0.70–0.75 for high-quality studies) to verification bias. In their review of SPECT studies, Heijenbrok-Kal et al. [13] did not exclude studies with verification bias and reported a sensitivity of 0.88 (95 % CI 0.87–0.90) and a specificity of 0.73 (95 % CI 0.69–0.74). By excluding studies with verification bias, we found a lower sensitivity of 0.83 (95 % CI 0.73–0.89) but a higher specificity of 0.77 (95 % CI 0.64–0.86). As pointed out above, the diagnostic odds ratio is less affected by verification bias and was essentially the same in the previous and the current review.
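
A stylised numerical illustration of this point, with hypothetical counts of our own (not data from any included study), shows why the DOR is unaffected when verification depends only on the index test result:

def observed_2x2_under_verification_bias(tp, fp, fn, tn, verified_negative_fraction):
    # Assume all test-positive patients undergo CCA but only a fraction of test-negative
    # patients do; the observed FN and TN counts then shrink by the same factor.
    return tp, fp, fn * verified_negative_fraction, tn * verified_negative_fraction

# Hypothetical fully verified table: sensitivity 0.80, specificity 0.70, DOR 9.3
tp, fp, fn, tn = 80, 30, 20, 70
btp, bfp, bfn, btn = observed_2x2_under_verification_bias(tp, fp, fn, tn, 0.5)
print(btp / (btp + bfn))           # observed sensitivity 0.89 (overestimated)
print(btn / (btn + bfp))           # observed specificity 0.54 (underestimated)
print((btp * btn) / (bfp * bfn))   # observed DOR 9.3 (unchanged)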

Nandalur et al. [7] and Hamon et al. [10] previously studied the diagnostic performance of myocardial perfusion MRI and found sensitivities of 91 % and 89 % and specificities of 81 % and 80 %, respectively, which is very similar to our findings. Unfortunately we could not include PET in the analysis, because no PET studies met our inclusion and exclusion criteria. Nandalur et al. [6] performed a meta-analysis of PET perfusion studies and found a sensitivity of 0.92 and a specificity of 0.85; however, their analysis included studies with potential verification bias. Stress perfusion CT is an emerging MPI technique, but we did not include it because of the low number of available studies and because perfusion CT is still in the technical development phase.

Other promising alternatives to CCA are non-invasive CT and MR coronary angiography. Schuetz et al. [17] compared CT and MR coronary angiography with CCA in a meta-analysis and found a sensitivity and specificity of 0.97 and 0.87, respectively, for CT and 0.87 and 0.70 for MR, suggesting that CT angiography has a better diagnostic performance than the MPI techniques analysed in this article. However, drawbacks of CT angiography are the use of iodinated contrast material, which poses a small risk of idiosyncratic reactions and nephrotoxicity, and the lack of functional information [18].

Limitations

We focused on the diagnostic performance of MPI. However, an MPI examination can yield more than perfusion images alone, such as functional information (e.g. left ventricular function, presence of wall motion abnormalities, presence of scar tissue). Our analysis does not take into account the possible impact of these parameters on the interpretation of the MPI test, and the results of MRI are therefore likely to be even better than we estimated.

Also, it is important to note that in clinical practice a small proportion of patients will be unsuitable for MRI, owing to contraindications or claustrophobia. Likewise, echocardiography relies on an adequate acoustic window. Unsuitable patients were often excluded from the original studies, which in turn could have resulted in an overestimation of the diagnostic performance in our analysis. Unfortunately, the included studies did not report sufficient information to explore these issues.

In the current review we included only studies that used the most advanced technology by restricting the search to studies published between 2000 and 2011, which implies that some large landmark SPECT studies performed in the 1980s and 1990s were excluded from our analysis. A previously published comprehensive systematic review sheds light on the effect of this exclusion criterion [13]. In that review, 103 SPECT studies with a total of 11,977 patients published between 1984 and 2002 were analysed. There is no overlap with the SPECT studies that we included. The diagnostic odds ratios for SPECT in the previous and the current review are essentially the same: an lnDOR of 2.8 (95 % CI 2.6–3.0) in the previous review compared with our lnDOR of 2.8 (95 % CI 2.3–3.3).

The funnel plots for MRI and SPECT suggest publication bias, which implies that our summary measures may be overestimated. Nevertheless, any such overestimation applies to both MRI and SPECT, so the comparison between them is less affected. The funnel plot for ECHO does not suggest publication bias.

Heterogeneity across studies is a limitation of meta-analyses of diagnostic performance. The included studies differ with respect to imaging technique, assessment method, stressor, radiotracer, contrast medium, CAD definition (lumen diameter reduction of at least 50 %, at least 70 % or at least 75 %), CAD prevalence, percentage of male patients, patient inclusion criteria, setting and country. Although we were able to analyse the effect of different CAD definitions and patient inclusion criteria, sample size limitations precluded subset analyses for the other cross-study variations. Due to chance there will always be variability between studies, but different types of bias may also influence the results. We used a random effects model, which adjusts the estimates and confidence intervals to account for between-study variation. Nevertheless, heterogeneity across studies remains an important limitation.

For calculation and precision purposes, we excluded studies with fewer than 30 patients. In this way, we minimised the number of studies with, for example, zero FPs or FNs. This exclusion criterion may have introduced a selection bias.

Another limitation of meta-analyses is the dependence on the level of detail reported in the original papers. For example, data on individual coronary territories were generally not available. Furthermore, most studies included a mix of known and suspected CAD patients or did not report the test characteristics for the subgroup of patients with suspected CAD and no prior history of myocardial infarction, PCI or CABG. Therefore, our subgroup analysis of suspected CAD was limited by a small sample size. Nevertheless, our analysis did suggest that the diagnostic performance of MPI tests is not substantially affected by the inclusion of patients with known CAD.

Although our results show that all tests are reasonably accurate, the likelihood ratios suggest that none of them can conclusively rule in or rule out the presence of disease [19]. This can also be seen in Fig. 5: the post-test probability after a positive test rarely exceeds 90 %, and the post-test probability of disease after a negative test may still be substantial. For example, at a pre-test probability of 80 % (pre-test odds of 4), a negative MRI with an LR− of 0.12 still leaves a post-test probability of roughly 32 % (post-test odds 4 × 0.12 = 0.48). Since MPI is intended as a gatekeeper test, ruling out disease is more important than ruling in disease. MRI performs quite well in this respect, with an LR− of 0.12 (0.08–0.15). SPECT and ECHO demonstrate less favourable LRs (Table 3).

The reference standard test for diagnosing CAD is CCA, and innovative technological developments in diagnosing CAD are most often compared with CCA. The limitation of CCA is that it evaluates the lumen diameter reduction of the coronary arteries, but a 50 % diameter reduction, for instance, does not always result in the same reduction in blood flow and does not necessarily lead to myocardial ischaemia. Alternative techniques such as fractional flow reserve (FFR) measure the pressure difference across a coronary stenosis. It is even possible that the imaging techniques we evaluated are better diagnostic tools than CCA to begin with, since they measure myocardial perfusion, which is the physiological basis of myocardial function. Thus, the less than perfect sensitivity and specificity may in part be attributable to imperfections of CCA rather than to limitations of perfusion imaging.

Clinical implications

The results of our systematic review and meta-analysis suggest that MRI is superior to ECHO and SPECT in diagnosing CAD. This conclusion is strengthened firstly by the findings of the MR-IMPACT study [5], a multicentre randomised trial which suggested that MRI is superior to SPECT, and secondly by the findings of the EuroCMR registry [20], which demonstrated that in patients who underwent stress MRI in the diagnostic workup of suspected CAD, invasive angiography could be avoided in nearly half of the patients. All in all, the results suggest that stress perfusion MRI is potentially useful as a gatekeeper test before CCA in patients with a low to intermediate prior probability of CAD, but this needs to be confirmed in a comparative cost-effectiveness analysis. Furthermore, more research on the diagnostic performance of stress perfusion ECHO, PET and CT is required to evaluate their clinical usefulness.

In conclusion, our results suggest that stress perfusion MRI is superior to stress perfusion contrast-enhanced echocardiography and SPECT for the diagnosis of obstructive CAD, and that echocardiography and SPECT have similar diagnostic performance.