Introduction

Low back pain (LBP) is usually defined as pain, muscle tension, or stiffness localised below the costal margin and above the inferior gluteal folds, with or without sciatica [1]. With a lifetime prevalence ranging from 11 to 84%, LBP is a major health problem worldwide and causes a substantial economic burden in Western societies [2–7].

LBP is occasionally the presenting symptom of an underlying pathology such as radiculopathy, spinal stenosis or another specific spinal condition [8]. The aim of the diagnostic process is to distinguish ‘simple’ back pain from back pain due to serious underlying disease or neurologic impairment [9]. Guidelines recommend starting the diagnostic triage with history taking and physical examination, in order to identify ‘red flags’ and classify patients into one of three categories: serious spinal pathology, nerve root (radicular) pain, and nonspecific LBP [8, 10].

Routine lumbar-spine imaging is not recommended in patients with LBP without symptoms suggesting serious underlying conditions [8, 10, 11]. However, if serious spinal pathology is suspected on the basis of red flags, diagnostic imaging may be performed, since delayed diagnosis and treatment are associated with poorer outcomes [8]. One of the imaging techniques available for this purpose is computed tomography (CT). CT now plays a vital role in spinal imaging and has largely replaced invasive techniques such as myelography, epidural venography and epidurography, particularly because it is associated with less morbidity [9, 12]. Caution is, however, necessary when choosing CT, particularly in younger patients, because of the gonadal radiation dose, which accumulates with repeated examinations. For this reason, many clinical guidelines recommend magnetic resonance imaging (MRI) as the imaging modality of choice. CT is suggested as the primary imaging technique for depicting disorders of bone structures [15]. CT is also used to detect chronic morphologic changes and has a well-recognized role in the diagnosis of spinal stenosis, herniated nucleus pulposus and facet joint abnormalities [13, 14]. Additionally, compared with MRI, CT is cheaper, the total testing time is shorter, and CT scanners are more widely available in hospital settings. These advantages must be weighed against the cumulative radiation dose of repeated examinations in younger patients. Moreover, even where MRI is readily available, patients with a cardiac pacemaker may require CT of the lumbar spine instead, and this need seems to be increasing with an aging population.

Estimates of the diagnostic accuracy of CT vary considerably across primary diagnostic studies. Potential sources of heterogeneity include differences in the pathology considered, CT protocols, study design, study populations, and methodological quality. Therefore, our aim is to provide evidence on the diagnostic accuracy of CT in patients with LBP or sciatica whose symptoms are suspected to be caused by a specific underlying pathology. Sciatica is here defined as nerve root pain or radiating leg pain. We also aim to assess the potential influence of various sources of heterogeneity on the outcomes.

Methods

Design

Systematic review of diagnostic accuracy studies.

Search strategy

We systematically searched the Medline, Embase and CINAHL databases (until December 2009). The search strategy was developed to identify publications for four separate systematic reviews, all concerning the diagnostic test accuracy of imaging techniques (MRI, CT, X-ray, or myelography) for identifying or excluding lumbar spinal pathology.

Study selection

Two review authors (AV, MW) independently selected the articles, based on title and abstract (Fig. 1). For final inclusion the studies had to fulfill the following criteria: (1) the diagnostic accuracy of CT was assessed in adult patients with LBP suspected to be caused by specific pathology (i.e. radicular syndrome, spinal stenosis, spinal tumors, spinal fractures, spinal infection/inflammation, spondylolisthesis, spondylolysis, ankylosing spondylitis, disc displacement, osteoporotic fractures, and other degenerative disc diseases), (2) the results were compared with those of a reference test (i.e. findings at surgery, expert panel opinion, diagnostic work-up, or MRI), (3) the design was a case–control or cohort study, either prospective or retrospective, and (4) the results were published as full reports with sufficient data to construct diagnostic two-by-two tables. Disagreements were resolved by consensus; a third review author (MvT) was consulted in case of persisting disagreement.

Fig. 1

Flow chart of selected articles

Data extraction and risk of bias assessment

Data extraction was performed by two review authors (RvR, MW) independently using a standardised form. Data were extracted on: (1) study design: prospective or retrospective observational study; (2) characteristics of the study population: setting, age, gender, pathology considered, duration and history of LBP, inclusion and exclusion criteria, enrollment, number of subjects (enrolled, eligible), and level of measurement; (3) test characteristics: type of index test, type of reference test, year and methods of execution, and outcome scales; and (4) diagnostic parameters: the two-by-two table or, if not available, the parameters needed to reconstruct it.

Two review authors (MW, RvR) independently assessed the risk of bias of each included study using the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [16, 17]. The QUADAS tool consisted of 11 items that referred to internal validity. We also identified nine further items described in the Cochrane Handbook for Diagnostic Test Accuracy Reviews [17] that were relevant to this review, and scored these as well. The 20 items were scored as “yes”, “no”, or “unclear” according to the classification definitions described in Appendix 1. A radiologist (AG) was consulted for the assessment of the CT technology used (item 13). Disagreements were resolved by consensus; in case of persisting disagreement a third review author (AV) was consulted. We did not weight the items and did not use a summary score, since the interpretation of summary scores is problematic and potentially misleading [18, 19].

Data synthesis and analysis

From each included study, we used the two-by-two table to calculate sensitivity and specificity with the corresponding 95% confidence intervals (95% CI). For descriptive analysis, sensitivity and specificity were presented in forest plots. In addition, we plotted the results in receiver operating characteristic (ROC) space, as sensitivity against 1 − specificity.
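The per-study calculation can be sketched as follows. The review does not state which confidence-interval method was used, so this sketch assumes the Wilson score interval; the function names and example counts are hypothetical and are not taken from any included study.

```python
import math
from typing import NamedTuple


class Estimate(NamedTuple):
    """A proportion with the lower and upper bounds of its 95% CI."""
    value: float
    lower: float
    upper: float


def wilson_ci(successes: int, total: int, z: float = 1.96) -> Estimate:
    """Proportion with a Wilson score interval (an assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return Estimate(p, max(0.0, centre - half), min(1.0, centre + half))


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> tuple[Estimate, Estimate]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = wilson_ci(tp, tp + fn)
    specificity = wilson_ci(tn, tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = accuracy_from_2x2(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity {sens.value:.1%} ({sens.lower:.1%}-{sens.upper:.1%})")
print(f"specificity {spec.value:.1%} ({spec.lower:.1%}-{spec.upper:.1%})")
```

An exact (Clopper-Pearson) interval is another common choice and gives somewhat wider intervals for small studies; which method the review used is not stated.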

For meta-analysis of paired sensitivity and specificity, we used a bivariate random effects method [20]. This method provides summary estimates of sensitivity and specificity with corresponding 95% CIs while accounting for variation within and between studies and for any correlation between sensitivity and specificity. We calculated a 95% confidence ellipse around the summary estimate of sensitivity and specificity and plotted the results in ROC space. We only conducted a meta-analysis if studies were sufficiently homogeneous (i.e. same pathology, same reference standard, comparable population, same study design). Analyses were carried out using Stata 10. All findings were presented in a summary of results table (Table 2), which included summary estimates of sensitivity and specificity, prior probabilities, diagnostic odds ratios, and likelihood ratios for the diagnostic accuracy of CT.
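Fitting the bivariate model itself requires a mixed-model fit (the review used Stata). As a much simpler illustration of the random-effects idea only, the sketch below pools logit-transformed proportions with the univariate DerSimonian-Laird estimator. This is a swapped-in technique: unlike the bivariate model, it pools sensitivity (or specificity) on its own and ignores the correlation between the two, so it will not reproduce the estimates reported in this review. The 0.5 continuity correction, the function name and the example counts are assumptions.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(counts: list[tuple[int, int]], z: float = 1.96):
    """Pool proportions x/n on the logit scale with DerSimonian-Laird random effects.

    counts: (events, total) per study, e.g. (TP, TP + FN) for sensitivity.
    Returns (pooled, lower, upper, tau2) with proportions back-transformed.
    """
    ys, vs = [], []
    for x, n in counts:
        # 0.5 continuity correction keeps the logit finite when x == 0 or x == n.
        a, b = x + 0.5, n - x + 0.5
        ys.append(math.log(a / b))
        vs.append(1 / a + 1 / b)  # approximate within-study variance of the logit

    # Fixed-effect weights and Cochran's Q statistic.
    w = [1 / v for v in vs]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # Moment estimate of the between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2


# Hypothetical (TP, TP + FN) pairs for three studies.
pooled, lo, hi, tau2 = dersimonian_laird([(40, 50), (30, 45), (55, 60)])
print(f"pooled sensitivity {pooled:.1%} ({lo:.1%}-{hi:.1%}), tau^2 = {tau2:.3f}")
```

The bivariate model replaces the two separate tau^2 values with a 2 × 2 between-study covariance matrix for logit sensitivity and logit specificity, which is what allows it to produce the confidence ellipse.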

Several factors may contribute to heterogeneity in diagnostic performance across studies. We investigated the potential influence of differences in pathology and in the reference standards used in the primary studies on sensitivity and specificity by comparing subgroups. If sufficient data were available, we assessed the possible bias introduced by negative scores on several important risk of bias items. Each of these items was added separately as a covariate to the bivariate model. The results were presented graphically and in a summary of results table.

Results

Literature search

Our search resulted in 9023 potentially relevant articles, of which 447 were retrieved in full text. The additional search and reference check yielded 85 possibly relevant articles, of which 38 were retrieved in full text. Finally, 19 articles met our inclusion criteria and were eligible for at least one of the four separate reviews conducted on the diagnostic accuracy of imaging in adult LBP patients to identify or exclude specific pathology (Fig. 1). Of these, seven articles focused on CT and were included in this review [21–27]. All studies described the diagnostic accuracy of CT in identifying lumbar disc herniation (Table 1).

Table 1 Characteristics of included studies

Risk of bias assessment

Figure 2 presents the risk of bias scores of the individual studies. The initial agreement between the reviewers was 78% (109 of 140 items), and disagreements were resolved by consensus. All studies used an acceptable reference standard, avoided differential verification, and pre-specified their objectives (items 2, 5 and 19). None of the studies reported enough information to assess the items on the delay between index test and reference test, observer variation, instrument variation, appropriate patient subgroups, appropriate sample size, and whether treatment or intervention was initiated between index test and reference test (items 3, 15, 16, 17, 18, and 20). Most studies poorly described the selection of patients, the blinding of reference test results, and whether cut-off values were pre-specified (items 1, 7 and 12), implying a high risk of selection and reviewer bias. In two studies [22, 26] not all patients had their diagnosis confirmed by a reference test (item 4), and in four studies [21, 23, 25, 26] the CT technology used has since been superseded (item 13). Because these two items were thought to influence the reported sensitivity and specificity, we added each of them separately as a covariate to the bivariate analysis.

Fig. 2

Risk of bias scores for each included study

Findings

For each study the extracted data (2 × 2 table) and sensitivity and specificity are presented in a forest plot (Fig. 3).

Fig. 3

Forest plot of seven comparisons of the seven included studies describing lumbar disc herniation as specific pathology with the estimated sensitivity and specificity with accompanying 95% confidence intervals. TP true-positive, FP false-positive, FN false-negative, TN true-negative

All studies described the accuracy of CT in identifying lumbar disc herniation, comprising a total of 498 disc explorations and 296 patient-level measurements. The prior probability of lumbar disc herniation varied from 49.2% [24] to 90.5% [21]. In these studies, lumbar disc herniation was defined as a herniated nucleus pulposus, including a protruded, extruded or sequestrated disc, or a herniation causing nerve root compression. One study used a four-stage expert panel consensus process as the reference standard, resulting in a sensitivity of 94% (95% CI 73–100%) and a specificity of 64% (95% CI 35–87%) [27]. Six studies used surgical findings as the reference standard [21–26]; we considered these studies sufficiently homogeneous for meta-analysis. The sensitivity and specificity of CT in identifying lumbar disc herniation in these studies ranged from 59 to 92% and from 45 to 87%, respectively. The results of the bivariate analysis are presented in Table 2 and plotted in ROC space (Fig. 4). The pooled summary estimate of sensitivity was 77.4% (95% CI 66.2–85.7%) and that of specificity was 73.7% (95% CI 61.8–82.9%), resulting in a positive likelihood ratio of 2.94, a negative likelihood ratio of 0.31, and a diagnostic odds ratio of 9.61.
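The derived ratios follow from the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity and DOR = LR+ / LR−. The sketch below (function name hypothetical) recomputes them from the rounded pooled estimates; it reproduces LR+ = 2.94 and LR− = 0.31, while the DOR comes out at about 9.60 rather than 9.61, presumably because the reported value was computed from unrounded estimates.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> dict[str, float]:
    """Derive LR+, LR- and the diagnostic odds ratio from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return {"LR+": lr_pos, "LR-": lr_neg, "DOR": lr_pos / lr_neg}


# Rounded pooled estimates for CT versus surgical findings, as reported above.
ratios = likelihood_ratios(sensitivity=0.774, specificity=0.737)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```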

Table 2 Results of the bivariate analysis with summary estimates of sensitivity, specificity, positive likelihood ratio (LR+), and negative likelihood ratio (LR−) and the accompanying diagnostic odds ratio (DOR) and the prior probability of lumbar disc herniation
Fig. 4

Summary ROC plots of sensitivity and specificity of six studies describing the diagnostic accuracy of computed tomography with surgical findings as the reference standard and lumbar disc herniation as specific pathology. The width of the rectangles is proportional to the number of patients with possible or without lumbar disc herniation; the height of the rectangles is proportional to the number of patients with lumbar disc herniation (proven or probable). The solid line is the summary ROC curve; the black spot is the mean value for sensitivity and specificity; the ellipse around the black spot represents the 95% confidence interval around this summary estimate

The influence of pre-defined potential sources of heterogeneity was assessed by adding each relevant QUADAS item as a covariate to the bivariate model (Table 2). We assessed the influence of partial verification bias and of the CT technology used (items 4 and 13). Adding the item on partial verification bias to the model resulted in pooled summary estimates of sensitivity and specificity of 76.7% (95% CI 64.7–85.6%) and 73.4% (95% CI 61.2–82.8%), respectively. After adding the item on use of an appropriate CT technology as a covariate, the summary estimates of sensitivity and specificity changed to 79.1% (95% CI 65.0–88.5%) and 76.0% (95% CI 60.1–87.0%), respectively. The item on selection bias (item 1) was poorly described and could therefore not be added as a covariate to the model.

We were unable to evaluate the influence of differences in pathology and reference standard on sensitivity and specificity, since all seven studies investigated the accuracy of CT in identifying lumbar disc herniation and six of them used surgical findings as the reference standard. Exploratory analyses of the influence of a prospective versus a retrospective design, and of measurements at disc level versus patient level, did not result in a different accuracy of CT.

Discussion

This review included seven studies on lumbar disc herniation. In the six studies that used surgical findings as the reference standard, the pooled summary estimates for CT were a sensitivity of 77.4% and a specificity of 73.7%. This means that a substantial proportion of patients are still misclassified as false-negative or false-positive. The use of newer CT technology resulted in slightly better accuracy than the use of older CT technology.

The results of this review should be interpreted with caution. First, the prior probability of the underlying pathology varied considerably between studies. The diagnostic value of CT also depends on the prior probability of the underlying pathology in the population investigated: in general, a high prior probability results in a high positive predictive value and a low negative predictive value, and vice versa [28]. The large variation in prior probabilities might be due to the selection of patients, as in five out of seven studies the selection procedure was unclear or inadequate, and selection bias might therefore have occurred. In addition, all included studies were performed in secondary care, where patients often have a higher prior probability because of referral.
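To illustrate how the prior probability drives the predictive values, the sketch below applies Bayes' theorem to the pooled sensitivity and specificity at the lowest (49.2%) and highest (90.5%) prior probabilities reported in the included studies. Combining the pooled estimates with individual study priors in this way is an illustrative assumption, not an analysis performed in the review; the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prior: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given prior probability via Bayes' theorem."""
    tp = sensitivity * prior                  # true positives per patient tested
    fp = (1 - specificity) * (1 - prior)      # false positives
    fn = (1 - sensitivity) * prior            # false negatives
    tn = specificity * (1 - prior)            # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Pooled CT estimates combined with the lowest and highest reported priors.
for prior in (0.492, 0.905):
    ppv, npv = predictive_values(0.774, 0.737, prior)
    print(f"prior {prior:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At the 49.2% prior, PPV and NPV are both in the mid-70s; at the 90.5% prior, PPV rises to about 97% while NPV drops to about 25%, so a negative CT result is of little value when the prior probability is high.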

Secondly, because there is no clear gold standard, studies were included if findings at surgery, expert panel opinion, diagnostic work-up, or MRI were used as the reference standard. In the end, one study used expert panel consensus and six studies used surgical findings as the reference standard. The problem with surgical populations is that only patients with a strong suspicion of a specific underlying pathology undergo surgery. The results of these studies can therefore easily be biased, leading to an overestimation of the diagnostic accuracy of the index test.

Thirdly, the accuracy of an index test also depends on the reliability of the test, the definition of a positive result and the technology used. As CT interpretation requires some degree of expertise, it is not surprising that the reliability of CT varies considerably. None of the studies reported data on observer variation, so the extent of its effect on the results cannot be estimated. The CT technology used can also influence diagnostic accuracy: adding it as a covariate showed higher sensitivity and specificity for newer CT technology. Most of the CT technology used in the included studies is rather outdated, as the most recent study was published in 1993; modern technology, not yet evaluated in the available studies, will probably show better results.

Finally, the included studies reported their results at the patient level or at the disc level. Presenting the results at the disc level leads to multiple inclusions of the same patients. Moreover, patients with signs of lumbar disc herniation are more likely to undergo multiple-level testing than patients without these signs, which might lead to an overestimation of the diagnostic performance of CT. In this review, four studies presented their results at the disc level only, but an exploratory subgroup analysis did not result in different pooled summary estimates.

Strengths and weaknesses of the review

To our knowledge, this is the first systematic review that provides evidence on the diagnostic accuracy of CT in LBP patients. One limitation of this study was the use of a filter to limit the primary literature search. The filter targeted study design, to overcome indexing problems related to terms such as sensitivity, specificity, accuracy or predictive value. After a random check, we assumed that using this filter would not lead to missing relevant studies. Second, the generalisability of the results is limited mainly by poor reporting in the original studies, which led to many unclear or inadequate scores on several QUADAS items. This means that the potential influence of bias is difficult to assess [29].

Implications for clinical practice

The summary estimates of sensitivity and specificity for CT in identifying lumbar disc herniation may be acceptable, but they also show that a substantial proportion of patients will be misdiagnosed. The accuracy of CT might differ between pathologies, but no studies were found evaluating the accuracy of CT for pathologies such as vertebral cancer, infection and fractures, so this remains unclear. Applicability to clinical practice also depends on the role assigned to the diagnostic test [30]. Most included studies report the stand-alone diagnostic value of CT, whereas in clinical practice CT is one part of a diagnostic process, which might improve diagnostic performance as a whole. Therefore, more research is needed before our results can be translated into clinical practice and policy.

Implications for research

Given the possible advantages of CT over MRI, future research should focus on the diagnostic performance of up-to-date CT technology, assessed in high-quality prospective cohort studies with an unselected population of patients with LBP. To provide clear evidence on when CT should or should not be used, analyses should be performed at the patient level and in combination with other diagnostic tools. Furthermore, to improve the accuracy and completeness of reporting, future diagnostic accuracy studies should comply with the STARD initiative [31].