Methods for the Analysis of Explanatory Linear Regression Models with Missing Data Not at Random

Quality and Quantity

Abstract

Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data that are missing not at random, mainly because of the difficulty of verifying the randomness of the unknown data. In practice, nonrandom missing data are analyzed with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish which strategy of analysis for random missing data works best in datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (frequent data structures in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
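The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the design used in the article: the regression coefficients, the correlation between predictors, the logistic missingness mechanism and the names `ols` and `simulate_bias` are all assumptions made for illustration. Only three of the simpler treatments are shown (complete case analysis, mean imputation and deterministic regression imputation); maximum likelihood and multiple imputation are omitted for brevity.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def simulate_bias(n=500, miss_rate=0.2, n_reps=200, seed=0):
    """Mean bias of the x2 slope under three simple missing-data treatments.

    All numeric settings here are illustrative assumptions, not the
    conditions studied in the article.
    """
    rng = np.random.default_rng(seed)
    b0, b1, b2 = 1.0, 0.5, 0.5  # hypothetical true coefficients
    slopes = {"complete_case": [], "mean_imputation": [], "regression_imputation": []}

    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        x2 = 0.7 * x1 + rng.normal(scale=np.sqrt(0.51), size=n)  # unit variance
        y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)

        # Missing not at random: the chance that x2 is missing depends on x2
        # itself. The realised missing rate is only roughly `miss_rate`.
        logit = np.log(miss_rate / (1 - miss_rate)) + 1.5 * x2
        missing = rng.random(n) < 1 / (1 + np.exp(-logit))
        obs = ~missing

        # 1. Complete case analysis: drop rows where x2 is missing.
        cc = ols(np.column_stack([x1[obs], x2[obs]]), y[obs])
        slopes["complete_case"].append(cc[2])

        # 2. Mean imputation: replace missing x2 with the observed mean.
        x2_mean = np.where(missing, x2[obs].mean(), x2)
        slopes["mean_imputation"].append(ols(np.column_stack([x1, x2_mean]), y)[2])

        # 3. Regression imputation: predict x2 from x1 and y using observed rows.
        g = ols(np.column_stack([x1[obs], y[obs]]), x2[obs])
        x2_reg = np.where(missing, g[0] + g[1] * x1 + g[2] * y, x2)
        slopes["regression_imputation"].append(ols(np.column_stack([x1, x2_reg]), y)[2])

    return {name: float(np.mean(vals) - b2) for name, vals in slopes.items()}

if __name__ == "__main__":
    for method, bias in simulate_bias().items():
        print(f"{method}: bias of x2 slope = {bias:+.3f}")
```

Varying `n` and `miss_rate` in `simulate_bias` gives a rough feel for how sample size and percentage of missing data, two of the factors named in the abstract, affect the bias of each treatment under this assumed mechanism.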

References

  • Arbuckle, J. L. (1997). Amos 3.6 for Windows. [computer program]. Chicago: SmallWaters Corporation.

  • Baker, S. G. & Laird, N. M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 83: 62–70.

  • Basilevsky, A., Sabourin, D., Hum, D. & Anderson, A. (1985). Missing data estimators in the general linear model: an evaluation of simulated data as an experimental design. Communications in Statistics – Simulation and Computation 14: 371–394.

  • Bernaards, C. A. & Sijtsma, K. (2000). Influence of imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behavioral Research 35: 321–364.

  • Delucchi, K. L. (1994). Methods for the analysis of binary outcome results in the presence of missing data. Journal of Consulting and Clinical Psychology 62: 569–575.

  • Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B., Losilla, J. M. & Judez, J. (1997a). Diagnostic agreement between clinician and the structured Diagnostic Interview for Children and Adolescents – DICA-R – in an Outpatient Sample. Journal of Child Psychology and Psychiatry 38: 431–440.

  • Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997b). Test-retest reliability of the Spanish adaptation of the Diagnostic Interview of Children and Adolescents. Psicothema 9: 529–539.

  • Gold, M. S. & Bentler, P. M. (2000). Treatment of missing data: A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization. Structural Equation Modelling 7: 319–355.

  • Graham, J. W. & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanism and use of follow-up data. Journal of Applied Psychology 78: 119–128.

  • Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P. & Schafer, J. L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle & S. West (eds). The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse Research. Washington, DC: American Psychological Association.

  • Graham, J. W., Hofer, S. M. & Piccinin, A. M. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. A. Seitz (eds). Advances in Data Analysis for Prevention Intervention Research. NIDA Research Monograph Series (#142). Washington, DC: National Institute on Drug Abuse.

  • Graham, J. W., Hofer, S. M. & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: an application of maximum likelihood procedures. Multivariate Behavioral Research 31: 197–218.

  • Greenlees, J. S., Reece, W. S. & Zieschang, K. D. (1982). Imputation of missing values when the probability of response depends on the variable being imputed. Journal of the American Statistical Association 77: 251–261.

  • Huisman, M., Krol, B. & Van Sonderen, E. (1998). Handling missing data by re-approaching nonrespondents. Quality & Quantity 32: 77–91.

  • Huisman, M. (2000). Imputation of missing item responses: some simple techniques. Quality & Quantity 34: 331–351.

  • King, G., Honaker, J., Joseph, A. & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95: 49–69.

  • Kromrey, J. D. & Hines, C. V. (1994). Nonrandomly missing data in multiple regression: an empirical comparison of common missing-data treatments. Educational and Psychological Measurement 54: 573–593.

  • Landerman, L. R., Land, K. C. & Pieper, C. F. (1997). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods and Research 26: 3–33.

  • Little, R. J. A. & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.

  • Little, R. J. A. & Rubin, D. B. (1989). The analysis of social science data with missing values. Sociological Methods and Research 18: 292–326.

  • Little, R. J. A. (1992). Regression with missing X's: A review. Journal of the American Statistical Association 87: 1227–1237.

  • Moinpour, C. M., Triplett, J. S., McKnight, B., Lovato, L. C., Upchurch, C., Leichman, C. G., Muggia, F. M., Tanaka, L., James, W. A., Lennard, M. & Meyskens, F. L. (2000). Challenges posed by non-random missing quality of life data in an advanced-stage colorectal cancer clinical trial. Psycho-Oncology 9: 340–354.

  • Navarro, J. B. & Losilla, J. M. (2000). Analysis of incomplete data with artificial neural networks: a simulation study. Psicothema 12: 503–510.

  • Norris, C. M., Ghali, W. A., Knudtson, M. L., Naylor, C. D. & Saunders, L. D. (2000). Dealing with missing data in observational health care outcome analyses. Journal of Clinical Epidemiology 53: 377–383.

  • de la Osa, N., Ezpeleta, L., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997). Convergent and discriminant validity of the Structured Diagnostic Interview for Children and Adolescents (DICA-R). Psychology in Spain 1: 37–44.

  • Othuon, L. O. (1999). The accuracy of parameter estimates and coverage probability of population values in regression models upon different treatments of systematically missing data. Dissertation Abstracts International Section A 59: 4359.

  • Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

  • Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman and Hall.

  • Schafer, J. L. & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behavioral Research 33: 545–571.

  • Simonoff, J. S. (1988). Regression diagnostics to detect non-random missingness in linear regression. Technometrics 30: 205–214.

  • SPSS Inc. (1999). Statistical Package for Social Sciences 10.0. [computer program]. Chicago: SPSS Inc.

  • Statistical Solutions Ltd. (1999). Solas 2.1 for Windows [computer program]. Cork, Ireland: Statistical Solutions Ltd.

  • Taylor, M. A. & Amir, N. (1994). The problem of missing clinical data for research in psychopathology. Some solution guidelines. The Journal of Nervous and Mental Disease 182: 222–229.

  • The MathWorks Inc. (1998). Matlab 5.2 [computer program]. Natick, MA: The MathWorks Inc.

  • Von Eye, A. (1990). Statistical Methods in Longitudinal Research. San Diego: Academic Press.

  • Wilson WindowWare Inc. (1996). WinBatch 2.0 [computer program]. Seattle, WA: Wilson WindowWare Inc.

  • Wothke, W. (1998). Longitudinal and multi-group modelling with missing data. In T. D. Little, K. U. Schnabel & J. Baumert (eds). Modeling Longitudinal and Multiple Group Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum Associates.

Navarro Pastor, J.B. Methods for the Analysis of Explanatory Linear Regression Models with Missing Data Not at Random. Quality & Quantity 37, 363–376 (2003). https://doi.org/10.1023/A:1027323122628
