Evaluating Bias and Variability in Diagnostic Test Reports

https://doi.org/10.1016/S0196-0644(99)70422-1

Abstract

Diagnostic testing is an important component of modern medical care. Unfortunately, many diagnostic tests are not rigorously evaluated before general application. Studies examining test characteristics often have methodologic flaws that impair their ability to provide reliable information on test performance. These flaws can introduce systematic nonrandom errors (biases) that distort measures of test accuracy. Other design errors can make it difficult to generalize the results of individual studies. These problems may enhance the apparent performance of poor tests while obscuring the performance of good tests, and they may result in the widespread use of tests with uncertain or limited efficacy. This article explores the ways in which studies of diagnostic test efficacy can be affected by bias and variability.

[Mower WR: Evaluating bias and variability in diagnostic test reports. Ann Emerg Med January 1999;33:85-91.]

INTRODUCTION

Diagnostic tests are an important component of modern medical care. They account for almost 25% of all US ambulatory care expenditures [1] and have transformed clinical care over the past 2 decades [2]. Physicians order diagnostic tests with the expectation that test results will clarify diagnostic thinking, influence therapeutic choices, and ultimately improve patient outcomes. This practice implicitly assumes that test results provide reliable information about disease status in individual patients.

SOURCES OF BIAS AND VARIABILITY

True biases arise from methodologic flaws that distort measures of test efficacy such as sensitivity and specificity. To accurately assess test efficacy, an investigator must reliably determine both disease status and test results. These 2 determinations must be made independently and accurately, or the resulting statistical associations may yield erroneously high or low measures of test efficacy [5]. Any factor that influences the assessment of disease status or test results can produce bias.
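As a point of reference for the biases discussed here, the following minimal sketch shows how sensitivity and specificity are computed once disease status and test result are known for every patient. The function name and the counts are illustrative assumptions, not data from this article.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Compute sensitivity and specificity from the cells of a 2x2 table.

    tp, fn: diseased patients with positive / negative test results
    fp, tn: nondiseased patients with positive / negative test results
    """
    sensitivity = tp / (tp + fn)  # fraction of diseased patients who test positive
    specificity = tn / (tn + fp)  # fraction of nondiseased patients who test negative
    return sensitivity, specificity


# Hypothetical counts, chosen only for illustration.
sensitivity, specificity = accuracy_measures(tp=80, fp=30, fn=20, tn=170)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Any error in classifying disease status or test result moves patients between cells of this table, which is how a flawed determination can push either measure up or down.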

Workup bias (verification bias)

Workup or verification bias occurs when a study is restricted to patients who have definitive verification of disease. Bias is introduced when patients with a positive (or negative) diagnostic test are preferentially selected to receive verification by the “gold standard” examination. In the case of positive test results, the patients selected for additional workup are more likely to have disease than those excluded and therefore are more likely to have a true-positive result. Alternatively,
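The effect of preferential verification can be illustrated with simple arithmetic. The sketch below assumes a hypothetical cohort and hypothetical verification rates (every test-positive patient and only 10% of test-negative patients receive the gold standard examination); none of these numbers come from the article.

```python
def apparent_accuracy(tp, fp, fn, tn, verify_pos, verify_neg):
    """Sensitivity and specificity computed only from verified patients.

    verify_pos / verify_neg are the assumed fractions of test-positive and
    test-negative patients who go on to gold standard verification.
    """
    v_tp, v_fp = tp * verify_pos, fp * verify_pos  # verified test-positive patients
    v_fn, v_tn = fn * verify_neg, tn * verify_neg  # verified test-negative patients
    sensitivity = v_tp / (v_tp + v_fn)
    specificity = v_tn / (v_tn + v_fp)
    return sensitivity, specificity


# Hypothetical cohort of 1,000 patients, 200 of whom have disease.
cohort = dict(tp=160, fp=120, fn=40, tn=680)

full = apparent_accuracy(**cohort, verify_pos=1.0, verify_neg=1.0)
partial = apparent_accuracy(**cohort, verify_pos=1.0, verify_neg=0.1)

print("all patients verified:  sens = %.3f, spec = %.3f" % full)
print("selective verification: sens = %.3f, spec = %.3f" % partial)
```

Under these assumptions the test itself is unchanged, yet restricting the analysis to verified patients raises the apparent sensitivity and lowers the apparent specificity, because most false-negative and true-negative results never enter the study.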

Spectrum and subgroup biases (case-mix bias)

Indices of test efficacy, such as sensitivity, specificity, and likelihood ratios, are often considered to be fixed properties of a test that do not vary as disease prevalence changes among a uniform population. However, they can vary substantially when measured in different patient populations [2, 5, 6, 11]. Test indices are particularly vulnerable to variation when they are measured in populations defined by characteristics such as demographic features (age, sex, race), clinical presentation
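A small worked example may help show why an index measured in one population need not transfer to another. The sketch below assumes, purely for illustration, that a test detects 95% of severe cases and 60% of mild cases; the subgroup labels, detection rates, and case mixes are hypothetical and are not taken from the article.

```python
def pooled_sensitivity(subgroups):
    """Overall sensitivity for a population made up of diseased subgroups.

    subgroups: list of (number_of_diseased_patients, subgroup_sensitivity)
    """
    total = sum(n for n, _ in subgroups)
    detected = sum(n * sens for n, sens in subgroups)
    return detected / total


# Assumed subgroup sensitivities (hypothetical).
SEVERE, MILD = 0.95, 0.60

# Two hypothetical study populations with different case mixes.
mostly_severe = pooled_sensitivity([(80, SEVERE), (20, MILD)])
mostly_mild = pooled_sensitivity([(20, SEVERE), (80, MILD)])

print(f"mostly severe cases: sensitivity = {mostly_severe:.2f}")
print(f"mostly mild cases:   sensitivity = {mostly_mild:.2f}")
```

The same test yields different pooled sensitivities in the two populations because each estimate is a weighted average over the subgroups that happen to be enrolled.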

ASSESSING THE LITERATURE

Judgments on diagnostic tests should be based on systematic review of the relevant literature using sound scientific principles. Not all articles are equally important. Diagnostic test reports should be examined carefully to determine whether crucial information is present, accounted for, or absent. Reports with significant biases and poor documentation should be discarded. Articles with sound methodology and careful documentation should be given detailed consideration [11]. The Figure summarizes

References (40)

  • CB Begg et al.

    Assessment of radiologic tests: Control of bias and other design considerations

    Radiology

    (1988)
  • The PIOPED Investigators

    Value of the ventilation-perfusion scan in acute pulmonary embolism: Results of the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED)

    JAMA

    (1990)
  • RJ Panzer et al.

    Workup bias in prediction research

    Med Decis Making

    (1987)
  • J Klein et al.

    Is “silent” myocardial ischemia really as severe as symptomatic ischemia? The analytical effect of patient selection bias

    Circulation

    (1994)
  • RH Fletcher et al.
  • RA Greenes et al.

    Assessment of diagnostic technologies: Methodology for unbiased estimation from samples of selectively verified patients

    Invest Radiol

    (1985)
  • P Doubilet et al.

    Interpretation of radiographs: Effect of clinical history

    AJR Am J Roentgenol

    (1981)
  • KS Berbaum et al.

    Tentative diagnoses facilitate the detection of diverse lesions in chest radiographs

    Invest Radiol

    (1986)
  • BJ McNeil et al.

    Paired receiver operating characteristic curves and the effect of history on radiographic interpretation: CT of the head as a case study

    Radiology

    (1983)
  • JT Ennis et al.

    Value of infarct-specific isotope (99m Tc-labeled stannous pyrophosphate) in myocardial scanning

    BMJ

    (1975)
Address for reprints: William R Mower, MD, PhD, UCLA Emergency Medicine Center, 924 Westwood Boulevard, Suite 300, Los Angeles, CA 90024; E-mail [email protected].
