Missing Data

Vittinghoff, Eric; Glidden, David V.; Shiboski, Stephen C.; McCulloch, Charles E.

doi:10.1007/978-1-4614-1353-0_11


Part of the book series: Statistics for Biology and Health (SBH)

Abstract

Missing data are a fact of life in medical research. Subjects refuse to answer sensitive questions (e.g., questions about income or drug use), are unable to complete an MRI exam because of metallic implants, or drop out of studies and do not contribute further data. In each of these cases, data are “missing” or not complete. How should this be accommodated in a data analysis? Statistical computing packages will typically drop from the analysis all observations that are missing any of the variables (outcomes or predictors). So, for example, a linear regression predicting a patient’s number of emergency room visits from their age, gender, race, income, and current drug use will drop any observation missing even one of those variables. Analysis of data using this strategy is called complete case analysis because it requires that the data be complete for all variables before that observation can be used in the analysis.
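
The dropping rule described above can be illustrated with a minimal sketch. The records, variable names, and the helper `complete_cases` are made up for illustration and are not taken from the chapter; `None` stands in for a missing value, and only a subset of the predictors named in the abstract is used.

```python
# Hypothetical patient records; None marks a missing value.
records = [
    {"er_visits": 2, "age": 34, "income": 41000, "drug_use": 0},
    {"er_visits": 0, "age": 51, "income": None, "drug_use": 0},
    {"er_visits": 5, "age": 27, "income": 18000, "drug_use": 1},
    {"er_visits": 1, "age": None, "income": 62000, "drug_use": 0},
    {"er_visits": 3, "age": 45, "income": 30000, "drug_use": None},
    {"er_visits": 0, "age": 63, "income": 55000, "drug_use": 0},
]

# Outcome and predictors that the (hypothetical) regression would use.
variables = ["er_visits", "age", "income", "drug_use"]


def complete_cases(rows, variables):
    """Keep only rows in which every listed variable is observed."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


complete = complete_cases(records, variables)
n_dropped = len(records) - len(complete)
print(f"{len(complete)} of {len(records)} observations kept, {n_dropped} dropped")
```

In this toy data set each incomplete record is missing only one value, yet the whole record is discarded, which is how a complete case analysis can lose a large share of the sample.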

Notes

1. Recall that the likelihood is the probability of observing the data. For a Poisson model, the probability of a specific count x, conditional on the count being 1 or greater, is \(P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!\,(1 - e^{-\lambda})}\). The product over the entire sample is \(L = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod x_i!\,(1 - e^{-\lambda})^{n}}\), where n is the sample size and \(x_i\) is the count for individual i. It is equivalent, and easier, to maximize the logarithm of L. We can also ignore the factorial term, which does not depend on λ, giving \(\log L = \sum x_i \log \lambda - n\lambda - n\log(1 - e^{-\lambda})\).
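
As a minimal sketch of the maximization, the code below evaluates the log-likelihood from this note, with its final term taken as n log(1 − e^{−λ}), and finds the maximizing λ numerically. Setting the derivative with respect to λ to zero gives the condition mean(x) = λ / (1 − e^{−λ}), which is solved here by bisection. The function names and the counts are hypothetical, and bisection is just one convenient choice of solver, not necessarily the approach used in the chapter.

```python
import math


def truncated_poisson_loglik(lam, counts):
    """Log-likelihood of a Poisson conditional on counts >= 1, without the factorial term."""
    n = len(counts)
    # -expm1(-lam) equals 1 - exp(-lam) but is more accurate for small lam.
    return sum(counts) * math.log(lam) - n * lam - n * math.log(-math.expm1(-lam))


def fit_truncated_poisson(counts, tol=1e-10):
    """Find the lambda maximizing the log-likelihood by bisection on the score equation."""
    if any(x < 1 for x in counts):
        raise ValueError("all counts must be 1 or greater")
    xbar = sum(counts) / len(counts)
    if xbar <= 1.0:
        raise ValueError("no finite positive maximizer when every count equals 1")
    # lam / (1 - exp(-lam)) increases from 1 toward infinity and exceeds xbar
    # at lam = xbar, so the root lies in the interval (0, xbar).
    lo, hi = 1e-12, xbar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


counts = [1, 2, 1, 3, 1, 4, 2, 1, 2, 3]  # made-up counts, all 1 or greater
lam_hat = fit_truncated_poisson(counts)
print(f"estimated lambda: {lam_hat:.4f}")
```

The estimate is smaller than the sample mean of the observed counts, as expected: conditioning on counts of 1 or greater removes the zeros, so the observed mean overstates λ.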

Author information

Authors and Affiliations

Department of Epidemiology and Biostatistics, University of California, San Francisco, 500 Parnassus Ave., MU-420 West, San Francisco, California 94143, USA
Eric Vittinghoff, David V. Glidden & Stephen C. Shiboski
Department of Epidemiology and Biostatistics, University of California, San Francisco, 185 Berry St., Suite 5700, San Francisco, California 94107, USA
Prof. Charles E. McCulloch

About this chapter

Cite this chapter

Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E. (2012). Missing Data. In: Regression Methods in Biostatistics. Statistics for Biology and Health. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1353-0_11

DOI: https://doi.org/10.1007/978-1-4614-1353-0_11
Published: 17 January 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1352-3
Online ISBN: 978-1-4614-1353-0
eBook Packages: Mathematics and Statistics (R0)
