Abstract
Missing data are a fact of life in medical research. Subjects refuse to answer sensitive questions (e.g., questions about income or drug use), are unable to complete an MRI exam because of metallic implants, or drop out of studies and do not contribute further data. In each of these cases, data are “missing” or not complete. How should this be accommodated in a data analysis? Statistical computing packages will typically drop from the analysis all observations that are missing any of the variables (outcomes or predictors). So, for example, a linear regression predicting a patient’s number of emergency room visits from their age, gender, race, income, and current drug use will drop any observation missing even one of those variables. Analysis of data using this strategy is called complete case analysis because it requires that the data be complete for all variables before that observation can be used in the analysis.
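To make the complete case rule concrete, the following is a minimal sketch in plain Python. The records, the variable names, and the helper `complete_cases` are hypothetical and chosen only to mirror the emergency room example above; statistical packages apply the same filter internally before fitting the model.

```python
# Hypothetical records for illustration only; None marks a missing value.
records = [
    {"er_visits": 2, "age": 34, "gender": "F", "race": "A", "income": 41000, "drug_use": 0},
    {"er_visits": 5, "age": 51, "gender": "M", "race": "B", "income": None, "drug_use": 1},
    {"er_visits": 0, "age": 29, "gender": "F", "race": "B", "income": 38000, "drug_use": None},
    {"er_visits": 1, "age": 62, "gender": "M", "race": "A", "income": 52000, "drug_use": 0},
    {"er_visits": None, "age": 45, "gender": "F", "race": "A", "income": 47000, "drug_use": 1},
]

# Outcome and predictors of the (assumed) regression model.
MODEL_VARIABLES = ["er_visits", "age", "gender", "race", "income", "drug_use"]


def complete_cases(rows, variables):
    """Keep only rows with an observed value for every listed variable."""
    return [
        row for row in rows
        if all(row.get(name) is not None for name in variables)
    ]


kept = complete_cases(records, MODEL_VARIABLES)
n_dropped = len(records) - len(kept)
print(f"{len(kept)} complete cases kept, {n_dropped} observations dropped")
```

Note that an observation is discarded even when only one variable is missing, so the analyzed sample can be much smaller than the number of subjects enrolled.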
Notes
- 1.
Recall that the likelihood is the probability of observing the data. For a Poisson model, the probability of a specific count \(x\), conditional on the count being 1 or greater, is \(P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!\,(1 - e^{-\lambda})}\). The product over the entire sample is \(L = \frac{\lambda^{\Sigma x_{i}} e^{-n\lambda}}{\Pi x_{i}!\,(1 - e^{-\lambda})^{n}}\), where \(n\) is the sample size and \(x_{i}\) is the count for individual \(i\). It is equivalent, and easier, to maximize the logarithm of \(L\). We can also ignore the factorial term, which does not depend on \(\lambda\), giving \(\log L = \Sigma x_{i}\log \lambda - n\lambda - n\log(1 - e^{-\lambda})\).
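As an illustration of maximizing this log-likelihood, here is a minimal sketch in plain Python. It is not the chapter's own code: the function names and the sample counts are hypothetical. Setting the derivative of \(\log L\) with respect to \(\lambda\) to zero gives the condition \(\bar{x} = \lambda/(1 - e^{-\lambda})\), where \(\bar{x}\) is the sample mean; the right-hand side increases with \(\lambda\), so the sketch solves it by bisection.

```python
import math


def truncated_poisson_loglik(lam, counts):
    """Log-likelihood of a zero-truncated Poisson sample, factorial term omitted."""
    n = len(counts)
    # -math.expm1(-lam) equals 1 - exp(-lam), computed accurately for small lam.
    return sum(counts) * math.log(lam) - n * lam - n * math.log(-math.expm1(-lam))


def fit_truncated_poisson(counts, tol=1e-10):
    """Maximize the log-likelihood by solving mean = lam / (1 - exp(-lam))."""
    if any(x < 1 for x in counts):
        raise ValueError("all counts must be 1 or greater")
    mean = sum(counts) / len(counts)
    if mean <= 1:
        raise ValueError("no interior maximum when every count equals 1")
    # lam / (1 - exp(-lam)) exceeds lam, so the root lies strictly below the mean.
    lo, hi = 1e-12, mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical counts, all 1 or greater, for illustration only.
sample_counts = [1, 2, 3, 1, 4, 2, 1, 3]
lam_hat = fit_truncated_poisson(sample_counts)
print(f"estimated lambda: {lam_hat:.4f}")
```

The estimate is smaller than the sample mean, as expected: conditioning on counts of 1 or greater inflates the observed mean relative to \(\lambda\).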
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Vittinghoff, E., Glidden, D.V., Shiboski, S.C., McCulloch, C.E. (2012). Missing Data. In: Regression Methods in Biostatistics. Statistics for Biology and Health. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1353-0_11
DOI: https://doi.org/10.1007/978-1-4614-1353-0_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1352-3
Online ISBN: 978-1-4614-1353-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)