Abstract
Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data that are missing not at random, mainly because of the difficulty of verifying the randomness of the unknown data. In practice, nonrandom missing data are analyzed with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish which analysis strategy designed for random missing data works best when applied to datasets with nonrandom missing data. The factors involved in the simulations are sample size, percentage of missing data, predictive power of the imputation model and existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (data structures that are frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
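The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's simulation code: the data-generating model, the missingness mechanism (the chance that `x1` is missing grows with `x1` itself), the coefficient values, and the names `ols_slope` and `simulate_bias` are all assumptions made for illustration. Only three of the simpler techniques are covered; maximum likelihood and multiple imputation are left out, and so is the interaction factor.

```python
import numpy as np


def ols_slope(x1, x2, y):
    """Return the OLS coefficient of x1 in the model y ~ 1 + x1 + x2."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate_bias(n=200, p_missing=0.2, rho=0.6, b1=1.0, reps=500, seed=0):
    """Average bias of the x1 coefficient under three missing-data treatments.

    n         : sample size
    p_missing : proportion of x1 values set to missing
    rho       : correlation between x1 and x2 (a stand-in for the
                predictive power of the imputation model)
    All defaults are hypothetical, not taken from the article.
    """
    rng = np.random.default_rng(seed)
    est = {"complete_case": [], "mean_imputation": [], "regression_imputation": []}
    for _ in range(reps):
        x2 = rng.normal(size=n)
        x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = b1 * x1 + 0.5 * x2 + rng.normal(size=n)

        # Assumed nonrandom mechanism: units with a high noisy score of x1
        # lose their x1 value, so missingness depends on the missing value.
        score = x1 + rng.normal(size=n)
        miss = score > np.quantile(score, 1 - p_missing)
        obs = ~miss

        # Complete case analysis: drop every unit with missing x1.
        est["complete_case"].append(ols_slope(x1[obs], x2[obs], y[obs]))

        # Mean imputation: replace missing x1 with the observed mean.
        x1_mean = np.where(miss, x1[obs].mean(), x1)
        est["mean_imputation"].append(ols_slope(x1_mean, x2, y))

        # Regression imputation: predict missing x1 from x2, with the
        # imputation model fitted on the observed cases only.
        Z = np.column_stack([np.ones(obs.sum()), x2[obs]])
        g, *_ = np.linalg.lstsq(Z, x1[obs], rcond=None)
        x1_reg = np.where(miss, g[0] + g[1] * x2, x1)
        est["regression_imputation"].append(ols_slope(x1_reg, x2, y))

    return {name: float(np.mean(vals) - b1) for name, vals in est.items()}


if __name__ == "__main__":
    for p in (0.1, 0.3):
        print(p, simulate_bias(p_missing=p, reps=200))
```

Varying `n`, `p_missing` and `rho` over a grid mimics three of the four design factors named in the abstract. How the biases compare depends on the assumed mechanism, so the output of this sketch should not be read as a reproduction of the article's results.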
References
Arbuckle, J. L. (1997). Amos 3.6 for Windows. [computer program]. Chicago: SmallWaters Corporation.
Baker, S. G. & Laird, N. M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 83: 62–70.
Basilevsky, A., Sabourin, D., Hum, D. & Anderson, A. (1985). Missing data estimators in the general linear model: an evaluation of simulated data as an experimental design. Communications in Statistics – Simulation and Computation 14: 371–394.
Bernaards, C. A. & Sijtsma, K. (2000). Influence of imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behavioral Research 35: 321–364.
Delucchi, K. L. (1994). Methods for the analysis of binary outcome results in the presence of missing data. Journal of Consulting and Clinical Psychology 62: 569–575.
Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B., Losilla, J. M. & Judez, J. (1997a). Diagnostic agreement between clinician and the structured Diagnostic Interview for Children and Adolescents – DICA-R – in an outpatient sample. Journal of Child Psychology and Psychiatry 38: 431–440.
Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997b). Test-retest reliability of the Spanish adaptation of the Diagnostic Interview of Children and Adolescents. Psicothema 9: 529–539.
Gold, M. S. & Bentler, P. M. (2000). Treatment of missing data: A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization. Structural Equation Modelling 7: 319–355.
Graham, J. W. & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanism and use of follow-up data. Journal of Applied Psychology 78: 119–128.
Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P. & Schafer, J. L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle & S. West (eds). The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse Research. Washington, DC: American Psychological Association.
Graham, J. W., Hofer, S. M. & Piccinin, A. M. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. A. Seitz (eds). Advances in Data Analysis for Prevention Intervention Research. NIDA Research Monograph Series (#142). Washington, DC: National Institute on Drug Abuse.
Graham, J. W., Hofer, S. M. & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: an application of maximum likelihood procedures. Multivariate Behavioral Research 31: 197–218.
Greenlees, J. S., Reece, W. S. & Zieschang, K. D. (1982). Imputation of missing values when the probability of response depends on the variable being imputed. Journal of the American Statistical Association 77: 251–261.
Huisman, M., Krol, B. & Van Sonderen, E. (1998). Handling missing data by re-approaching nonrespondents. Quality & Quantity 32: 77–91.
Huisman, M. (2000). Imputation of missing item responses: some simple techniques. Quality & Quantity 34: 331–351.
King, G., Honaker, J., Joseph, A. & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95: 49–69.
Kromrey, J. D. & Hines, C. V. (1994). Nonrandomly missing data in multiple regression: an empirical comparison of common missing-data treatments. Educational and Psychological Measurement 54: 573–593.
Landerman, L. R., Land, K. C. & Pieper, C. F. (1997). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods and Research 26: 3–33.
Little, R. J. A. & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Little, R. J. A. & Rubin, D. B. (1989). The analysis of social science data with missing values. Sociological Methods and Research 18: 292–326.
Little, R. J. A. (1992). Regression with missing X's: A review. Journal of the American Statistical Association 87: 1227–1237.
Moinpour, C. M., Triplett, J. S., McKnight, B., Lovato, L. C., Upchurch, C., Leichman, C. G., Muggia, F. M., Tanaka, L., James, W. A., Lennard, M. & Meyskens, F. L. (2000). Challenges posed by non-random missing quality of life data in an advanced-stage colorectal cancer clinical trial. Psycho-Oncology 9: 340–354.
Navarro, J. B. & Losilla, J. M. (2000). Analysis of incomplete data with artificial neural networks: a simulation study. Psicothema 12: 503–510.
Norris, C. M., Ghali, W. A., Knudtson, M. L., Naylor, C. D. & Saunders, L. D. (2000). Dealing with missing data in observational health care outcome analyses. Journal of Clinical Epidemiology 53: 377–383.
de la Osa, N., Ezpeleta, L., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997). Convergent and discriminant validity of the Structured Diagnostic Interview for Children and Adolescents (DICA-R). Psychology in Spain 1: 37–44.
Othuon, L. O. (1999). The accuracy of parameter estimates and coverage probability of population values in regression models upon different treatments of systematically missing data. Dissertation Abstracts International Section A 59: 4359.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman and Hall.
Schafer, J. L. & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behavioral Research 33: 545–571.
Simonoff, J. S. (1988). Regression diagnostics to detect non-random missingness in linear regression. Technometrics 30: 205–214.
SPSS Inc. (1999). Statistical Package for Social Sciences 10.0. [computer program]. Chicago: SPSS Inc.
Statistical Solutions Ltd. (1999). Solas 2.1 for Windows. [computer program]. Cork, Ireland: Statistical Solutions Ltd.
Taylor, M. A. & Amir, N. (1994). The problem of missing clinical data for research in psychopathology. Some solution guidelines. The Journal of Nervous and Mental Disease 182: 222–229.
The MathWorks Inc. (1998). Matlab 5.2 [computer program]. Natick, MA: The MathWorks Inc.
Von Eye, A. (1990). Statistical Methods in Longitudinal Research. San Diego: Academic Press.
Wilson WindowWare Inc. (1996). WinBatch 2.0 [computer program]. Seattle, WA: Wilson WindowWare Inc.
Wothke, W. (1998). Longitudinal and multi-group modelling with missing data. In T. D. Little, K. U. Schnabel & J. Baumert (eds). Modeling Longitudinal and Multiple Group Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum Associates.
Cite this article
Navarro Pastor, J.B. Methods for the Analysis of Explanatory Linear Regression Models with Missing Data Not at Random. Quality & Quantity 37, 363–376 (2003). https://doi.org/10.1023/A:1027323122628
DOI: https://doi.org/10.1023/A:1027323122628