Missing Data

Regression Methods in Biostatistics

Abstract

Missing data are a fact of life in medical research. Subjects refuse to answer sensitive questions (e.g., questions about income or drug use), are unable to complete an MRI exam because of metallic implants, or drop out of studies and do not contribute further data. In each of these cases, data are “missing” or not complete. How should this be accommodated in a data analysis? Statistical computing packages will typically drop from the analysis all observations that are missing any of the variables (outcomes or predictors). So, for example, a linear regression predicting a patient’s number of emergency room visits from their age, gender, race, income, and current drug use will drop any observation missing even one of those variables. Analysis of data using this strategy is called complete case analysis because it requires that the data be complete for all variables before that observation can be used in the analysis.
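As a minimal sketch of what complete case analysis does before any model is fit, the filter below keeps only observations with no missing values. The records, the variable names, and the `complete_cases` helper are hypothetical and chosen only to mirror the emergency room example; `None` stands in for a missing value.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"er_visits": 2, "age": 34, "income": 41000, "drug_use": 0},
    {"er_visits": 5, "age": 51, "income": None, "drug_use": 1},
    {"er_visits": 0, "age": 29, "income": 58000, "drug_use": None},
    {"er_visits": 1, "age": 62, "income": 37000, "drug_use": 0},
]


def complete_cases(rows, variables):
    """Keep only rows that have no missing value in any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


variables = ["er_visits", "age", "income", "drug_use"]
kept = complete_cases(records, variables)

# Two of the four observations are missing one variable each and are dropped.
print(f"{len(kept)} of {len(records)} observations retained")
```

Here half of the sample is discarded even though each dropped observation is missing only a single variable, which is the cost of this strategy.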

Notes

  1. Recall that the likelihood is the probability of observing the data. The probability of a specific count \(x\) for a Poisson model, conditional on the count being 1 or greater, is \(P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!\,(1 - e^{-\lambda})}\). The product over the entire sample is \(L = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod x_i!\,(1 - e^{-\lambda})^{n}}\), where \(n\) is the sample size and \(x_i\) is the count for individual \(i\). It is equivalent, and easier, to maximize the logarithm of \(L\). We can also ignore the factorial term, which does not depend on \(\lambda\), giving \(\log L = \sum x_i \log \lambda - n\lambda - n\log(1 - e^{-\lambda})\).
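The maximization described in this note can be sketched numerically. The code below is an illustration under stated assumptions, not the chapter's own implementation: the counts are made up, the function names are hypothetical, and bisection on the score equation is just one convenient way to locate the maximum. Setting the derivative of \(\log L\) to zero gives \(\bar{x} = \lambda / (1 - e^{-\lambda})\), so the root lies between 0 and the sample mean.

```python
import math


def log_lik(lam, counts):
    """Zero-truncated Poisson log likelihood, factorial term omitted."""
    n = len(counts)
    # -expm1(-lam) equals 1 - exp(-lam), computed stably for small lam.
    return sum(counts) * math.log(lam) - n * lam - n * math.log(-math.expm1(-lam))


def score(lam, counts):
    """Derivative of log_lik with respect to lam."""
    n = len(counts)
    return sum(counts) / lam - n / (-math.expm1(-lam))


def truncated_poisson_mle(counts, tol=1e-12):
    """Solve score(lam) = 0 by bisection on the interval (0, sample mean)."""
    if any(x < 1 for x in counts):
        raise ValueError("all counts must be 1 or greater")
    mean = sum(counts) / len(counts)
    if mean <= 1:
        raise ValueError("sample mean must exceed 1 for an interior maximum")
    lo, hi = 1e-6, mean  # score is positive near 0 and negative at the mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid, counts) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Hypothetical counts, all 1 or greater as the truncated model requires.
counts = [1, 2, 1, 3, 1, 2, 4, 1]
lam_hat = truncated_poisson_mle(counts)
print(f"estimated lambda: {lam_hat:.4f}")
```

The estimate is smaller than the sample mean because the observed counts exclude zeros, which inflates the mean relative to the underlying Poisson rate.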

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E. (2012). Missing Data. In: Regression Methods in Biostatistics. Statistics for Biology and Health. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1353-0_11
