Methods for the Analysis of Explanatory Linear Regression Models with Missing Data Not at Random

Navarro Pastor, José Blas

doi:10.1023/A:1027323122628


Published: November 2003

Volume 37, pages 363–376 (2003)

Quality and Quantity



Abstract

Since the work of Little and Rubin (1987), no substantial advances have been achieved in the analysis of explanatory regression models for incomplete data with values missing not at random, mainly because of the difficulty of verifying the randomness of the unknown data. In practice, nonrandom missing data are analyzed with techniques designed for datasets with random or completely random missing data, such as complete case analysis, mean imputation, regression imputation, maximum likelihood or multiple imputation. However, the data conditions required to minimize the bias derived from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo simulations were carried out to establish which analysis strategy designed for random missing data works best in datasets with nonrandom missing data. The factors varied in the simulations are sample size, percentage of missing data, predictive power of the imputation model and the existence of interaction between predictors. The results show that the smallest bias is obtained with maximum likelihood and multiple imputation techniques, although with low percentages of missing data, absence of interaction and high predictive power of the imputation model (data structures frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simplest regression imputation.
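The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's simulation design: the data-generating model, the missingness mechanism, the function names `ols` and `slope_bias`, and all parameter values are assumptions chosen for illustration. It covers only complete case analysis, mean imputation and deterministic regression imputation, and it leaves out maximum likelihood, multiple imputation and the interaction factor. The parameter `rho` stands in for the predictive power of the imputation model.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def slope_bias(n=500, missing_rate=0.2, rho=0.7, n_reps=200, seed=0):
    """Average bias of the estimated slope of y on x1 under three simple treatments.

    Assumed setup (hypothetical, not taken from the article):
    y = 1.0 * x1 + 0.5 * x2 + noise, with x1 missing not at random.
    """
    rng = np.random.default_rng(seed)
    true_beta = 1.0  # assumed true slope of y on x1
    estimates = {"complete_case": [], "mean_imputation": [], "regression_imputation": []}
    for _ in range(n_reps):
        x2 = rng.normal(size=n)
        # rho is the correlation between x1 and x2, i.e. how well x2 predicts x1
        x1 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = true_beta * x1 + 0.5 * x2 + rng.normal(size=n)

        # Missing not at random: large values of x1 itself are more likely to be lost
        score = x1 + rng.normal(size=n)
        missing = score > np.quantile(score, 1.0 - missing_rate)
        obs = ~missing
        ones = np.ones(n)

        # Complete case analysis: drop every row with a missing x1
        X_cc = np.column_stack([ones[obs], x1[obs], x2[obs]])
        estimates["complete_case"].append(ols(X_cc, y[obs])[1])

        # Mean imputation: fill missing x1 with the mean of the observed x1
        x1_mean = np.where(missing, x1[obs].mean(), x1)
        X_mean = np.column_stack([ones, x1_mean, x2])
        estimates["mean_imputation"].append(ols(X_mean, y)[1])

        # Regression imputation: predict missing x1 from x2, fitted on complete cases
        a = ols(np.column_stack([ones[obs], x2[obs]]), x1[obs])
        x1_reg = np.where(missing, a[0] + a[1] * x2, x1)
        X_reg = np.column_stack([ones, x1_reg, x2])
        estimates["regression_imputation"].append(ols(X_reg, y)[1])

    # Bias = mean estimate across replications minus the assumed true slope
    return {name: float(np.mean(vals) - true_beta) for name, vals in estimates.items()}

if __name__ == "__main__":
    for rate in (0.1, 0.3):
        print(rate, slope_bias(missing_rate=rate))
```

Varying `n`, `missing_rate` and `rho` in this sketch corresponds loosely to three of the factors named in the abstract. The size and direction of the biases depend heavily on the assumed missingness mechanism, so the output should not be read as a reproduction of the article's results.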


References

Arbuckle, J. L. (1997). Amos 3.6 for Windows. [computer program]. Chicago: SmallWaters Corporation.
Baker, S. G. & Laird, N. M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association 83: 62–70.
Basilevsky, A., Sabourin, D., Hum, D. & Anderson, A. (1985). Missing data estimators in the general linear model: an evaluation of simulated data as an experimental design. Communications in Statistics – Simulation and Computation 14: 371–394.
Bernaards, C. A. & Sijtsma, K. (2000). Influence of imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behavioral Research 35: 321–364.
Delucchi, K. L. (1994). Methods for the analysis of binary outcome results in the presence of missing data. Journal of Consulting and Clinical Psychology 62: 569–575.
Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B., Losilla, J. M. & Judez, J. (1997a). Diagnostic agreement between clinician and the structured Diagnostic Interview for Children and Adolescents – DICA-R – in an Outpatient Sample. Journal of Child Psychology and Psychiatry 38: 431–440.
Ezpeleta, L., de la Osa, N., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997b). Test-retest reliability of the Spanish adaptation of the Diagnostic Interview of Children and Adolescents. Psicothema 9: 529–539.
Gold, M. S. & Bentler, P. M. (2000). Treatment of missing data: A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization. Structural Equation Modelling 7: 319–355.
Graham, J. W. & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanism and use of follow-up data. Journal of Applied Psychology 78: 119–128.
Graham, J. W., Hofer, S. M., Donaldson, S. I., MacKinnon, D. P. & Schafer, J. L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle & S. West (eds). The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse Research. Washington, DC: American Psychological Association.
Graham, J. W., Hofer, S. M. & Piccinin, A. M. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. A. Seitz (eds). Advances in Data Analysis for Prevention Intervention Research. NIDA Research Monograph Series (#142). Washington, DC: National Institute on Drug Abuse.
Graham, J. W., Hofer, S. M. & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: an application of maximum likelihood procedures. Multivariate Behavioral Research 31: 197–218.
Greenlees, J. S., Reece, W. S. & Zieschang, K. D. (1982). Imputation of missing values when the probability of response depends on the variable being imputed. Journal of the American Statistical Association 77: 251–261.
Huisman, M., Krol, B. & Van Sonderen, E. (1998). Handling missing data by re-approaching nonrespondents. Quality & Quantity 32: 77–91.
Huisman, M. (2000). Imputation of missing item responses: some simple techniques. Quality & Quantity 34: 331–351.
King, G., Honaker, J., Joseph, A. & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95: 49–69.
Kromrey, J. D. & Hines, C. V. (1994). Nonrandomly missing data in multiple regression: an empirical comparison of common missing-data treatments. Educational and Psychological Measurement 54: 573–593.
Landerman, L. R., Land, K. C. & Pieper, C. F. (1997). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods and Research 26: 3–33.
Little, R. J. A. & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Little, R. J. A. & Rubin, D. B. (1989). The analysis of social science data with missing values. Sociological Methods and Research 18: 292–326.
Little, R. J. A. (1992). Regression with missing X's: A review. Journal of the American Statistical Association 87: 1227–1237.
Moinpour, C. M., Triplett, J. S., McKnight, B., Lovato, L. C., Upchurch, C., Leichman, C. G., Muggia, F. M., Tanaka, L., James, W. A., Lennard, M. & Meyskens, F. L. (2000). Challenges posed by non-random missing quality of life data in an advanced-stage colorectal cancer clinical trial. Psycho-Oncology 9: 340–354.
Navarro, J. B. & Losilla, J. M. (2000). Analysis of incomplete data with artificial neural networks: a simulation study. Psicothema 12: 503–510.
Norris, C. M., Ghali, W. A., Knudtson, M. L., Naylor, C. D. & Saunders, L. D. (2000). Dealing with missing data in observational health care outcome analyses. Journal of Clinical Epidemiology 53: 377–383.
de la Osa, N., Ezpeleta, L., Doménech, J. M., Navarro, J. B. & Losilla, J. M. (1997). Convergent and discriminant validity of the Structured Diagnostic Interview for Children and Adolescents (DICA-R). Psychology in Spain 1: 37–44.
Othuon, L. O. (1999). The accuracy of parameter estimates and coverage probability of population values in regression models upon different treatments of systematically missing data. Dissertation Abstracts International Section A 59: 4359.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman and Hall.
Schafer, J. L. & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behavioral Research 33: 545–571.
Simonoff, J. S. (1988). Regression diagnostics to detect non-random missingness in linear regression. Technometrics 30: 205–214.
SPSS Inc. (1999). Statistical Package for Social Sciences 10.0. [computer program]. Chicago: SPSS Inc.
Statistical Solutions Ltd. (1999). Solas 2.1 for Windows [computer program]. Cork, Ireland: Statistical Solutions Ltd.
Taylor, M. A. & Amir, N. (1994). The problem of missing clinical data for research in psychopathology. Some solution guidelines. The Journal of Nervous and Mental Disease 182: 222–229.
The MathWorks Inc. (1998). Matlab 5.2 [computer program]. Natick, MA: The MathWorks Inc.
Von Eye, A. (1990). Statistical Methods in Longitudinal Research. San Diego: Academic Press.
Wilson WindowWare Inc. (1996). WinBatch 2.0 [computer program]. Seattle, WA: Wilson WindowWare Inc.
Wothke, W. (1998). Longitudinal and multi-group modelling with missing data. In T. D. Little, K. U. Schnabel & J. Baumert (eds). Modeling Longitudinal and Multiple Group Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum Associates.


Author information

Authors and Affiliations

Departament de Psicobiologia i Metodologia, Universitat Autónoma de Barcelona, Edifici B, 08193, Bellaterra, Spain
José Blas Navarro Pastor


About this article

Cite this article

Navarro Pastor, J.B. Methods for the Analysis of Explanatory Linear Regression Models with Missing Data Not at Random. Quality & Quantity 37, 363–376 (2003). https://doi.org/10.1023/A:1027323122628


Issue Date: November 2003
DOI: https://doi.org/10.1023/A:1027323122628
