Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

Manly, Catherine A.; Wells, Ryan S.

doi:10.1007/s11162-014-9344-9

Short Note
Published: 20 July 2014

Volume 56, pages 397–409, (2015)
Research in Higher Education

Catherine A. Manly¹ &
Ryan S. Wells¹

Abstract

Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a commonly employed method for quantitative research. However, our analysis of a decade of higher education research literature reveals that the field has yet to make substantial use of this technique despite common employment of quantitative analysis, and that in research where MI is used, many recommended MI reporting practices are not being followed. We conclude that additional information about the technique and recommended reporting practices may help improve the quality of the research involving missing data. In an attempt to address this issue, we develop a set of reporting recommendations based on a synthesis of the MI methodological literature and offer a discussion of these recommendations oriented toward applied researchers. The recommended MI reporting practices involve describing the nature and structure of any missing data, describing the imputation model and procedures, and describing any notable imputation results.

Notes

Maximum likelihood (ML) is another modern technique that is, in theory, an excellent choice for handling missing data: it is always fully efficient, involves fewer implementation decisions than MI, and eliminates the possibility of conflicting imputation and analysis models (Allison 2012). MI can come close to full efficiency in practice with a high number of imputations, a bar that no longer requires unusual computing capacity. However, ML also has practical limitations. For example, it is implemented more often in statistical packages for structural equation modeling (SEM) than for other forms of analysis, and even in SAS, which implements ML more widely than other software, it is not yet possible to estimate a logistic regression model this way. A full discussion of ML (Allison 2002; Cox et al. 2014; Enders 2010) is beyond the scope of this research note.
Annotated Stata code that uses MI with publicly available data from the National Center for Education Statistics and that illustrates the recommended practices we discuss in this paper is available from the authors upon request or from the UMass Amherst ScholarWorks website at http://works.bepress.com/ryan_wells/21/.
To assess how much uncertainty the missing data contribute, examine the “missing information,” a concept clearly explained by McKnight et al. (2007). Missing information (γ) measures the influence of missing data on the results of a statistical analysis for a given number of imputations, given the known correlations between variables. It is good practice to investigate convergence for variables with a high fraction of missing information.
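The note does not give a formula for γ. As a minimal sketch, assuming the standard pooling rules of Rubin (1987) and the usual small-m adjusted definition of the fraction of missing information, γ for a single parameter could be computed from the m point estimates and their squared standard errors as shown here. The names `pool_rubin` and `PooledEstimate` are illustrative; they come from neither the paper nor Stata.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class PooledEstimate:
    estimate: float        # pooled point estimate (mean across imputations)
    total_variance: float  # T = W + (1 + 1/m) * B
    fmi: float             # fraction of missing information (gamma)


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (assumed: Rubin's rules).

    estimates: the m point estimates, one per imputed dataset
    variances: the m squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)   # pooled estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    if w <= 0:
        raise ValueError("within-imputation variance must be positive")

    t = w + (1 + 1 / m) * b   # total variance
    if b == 0:
        # No variation across imputations: missing data add no uncertainty.
        return PooledEstimate(q_bar, t, 0.0)

    r = (1 + 1 / m) * b / w              # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2      # Rubin's degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)   # adjusted fraction of missing information
    return PooledEstimate(q_bar, t, fmi)
```

A call such as `pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])` returns the pooled estimate, its total variance, and γ for that coefficient. In practice these quantities are reported by the MI software itself, so the sketch is meant only to show what γ summarizes.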
Evaluating the sensitivity of analysis results to the presence of nonrandom missing data is recommended, although it is not currently facilitated by standard statistical software and is beyond the scope of this paper (Allison 2002; Kenward and Carpenter 2007).
Burn-in iterations are the number of times the imputation process is repeated prior to saving the first complete dataset to memory (e.g. saving a dataset as m = 1). For example, the default burn-in iteration number for Stata’s mi impute chained command is 10, and is 100 for mi impute mvn. For MVN, a different number of between-imputation iterations may also be selected (Stata’s default is 100), which refers to the number of times the imputation process is iterated between saving one complete dataset to memory and the next (e.g. between m = 1 and m = 2), and this convergence aspect should also be investigated (Enders 2010). The researcher should know and evaluate the adequacy of the default number in the software used.
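The relationship between burn-in and between-imputation iterations can be illustrated with a little arithmetic. The sketch below assumes a single chain in which the first dataset is saved immediately after the burn-in period and each later dataset is saved after a further fixed number of iterations. The function `saved_iterations` is hypothetical and does not reproduce Stata's internals.

```python
def saved_iterations(m, burn_in, between):
    """Return the chain iteration at which each of the m datasets is saved.

    Assumes one chain: dataset 1 is saved after `burn_in` iterations and
    every subsequent dataset after another `between` iterations.
    """
    if m < 1 or burn_in < 0 or between < 1:
        raise ValueError("m and between must be >= 1, burn_in must be >= 0")
    return [burn_in + k * between for k in range(m)]


# Example with the MVN defaults quoted in the note (burn-in 100, between 100).
schedule = saved_iterations(m=3, burn_in=100, between=100)
```

Under these assumptions, the total chain length is `burn_in + (m - 1) * between`, which is the quantity to weigh against convergence diagnostics when judging whether the defaults are adequate.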
Another rule of thumb for reproducibility is to have m ≥ 100 × the largest FMI (fraction of missing information), that is, at least as many imputations as the largest FMI expressed as a percentage (StataCorp 2011). Because the FMI reflects more than the missing data rate alone, it is the better guide. However, it is more complex to calculate and is not known before the analysis is run, so m ≥ the percentage of missing data is a good starting point.
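Both rules of thumb reduce to simple arithmetic. The sketch below reads the FMI rule as m ≥ 100 × the largest FMI (the FMI expressed as a percentage), which is an assumption about how the rule is intended. The function names and the floor of two imputations are also illustrative choices, not part of the note.

```python
import math


def imputations_from_fmi(fmi_values):
    """Smallest m satisfying m >= 100 * largest FMI (assumed reading of the rule)."""
    largest = max(fmi_values)
    if not 0 <= largest <= 1:
        raise ValueError("FMI values must lie between 0 and 1")
    # round() guards against floating-point noise before taking the ceiling;
    # the floor of 2 is an assumption, since pooling needs at least two datasets.
    return max(2, math.ceil(round(100 * largest, 9)))


def imputations_from_missing_rate(percent_missing):
    """Starting value for m: at least the percentage of missing data."""
    if not 0 <= percent_missing <= 100:
        raise ValueError("percent_missing must lie between 0 and 100")
    return max(2, math.ceil(round(percent_missing, 9)))
```

A reasonable workflow under these assumptions is to start from `imputations_from_missing_rate`, run the analysis, and then check whether the resulting m also satisfies `imputations_from_fmi` for the largest FMI reported.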

References

Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage.
Allison, P. D. (2012). Handling missing data by maximum likelihood. Paper presented at the SAS Global Forum, Orlando, FL. http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf. Accessed 18 April 2013.
Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1), 40–49. doi:10.1002/mpr.329.
Buhi, E. R., Goodson, P., & Neilands, T. B. (2008). Out of sight, not out of mind: Strategies for handling missing data. American Journal of Health Behavior, 32(1), 83–92.
Burton, A., & Altman, D. G. (2004). Missing covariate data within cancer prognostic studies: A review of current reporting and proposed guidelines. British Journal of Cancer, 91, 4–8. doi:10.1038/sj.bjc.6601907.
Collins, L. M., Schafer, J. L., & Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351. doi:10.1037/1082-989X.6.4.330.
Cox, B. E., McIntosh, K., Reason, R. D., & Terenzini, P. T. (2014). Working with missing data in higher education research: A primer and real-world example. The Review of Higher Education, 37(3), 377–402.
Craig, L. E., Wu, O., Gilmour, H., Barber, M., & Langhorne, P. (2011). Developing and validating a predictive model for stroke progression. Cerebrovascular Diseases Extra, 1(1), 105–114. doi:10.1159/000334473.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. doi:10.1146/annurev.psych.58.110405.085530.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.
Heeringa, S., West, B. T., & Berglund, P. A. (2010). Applied survey data analysis. Boca Raton, FL: Chapman & Hall.
Hutchinson, S. R., & Lovell, C. D. (2004). A review of methodological characteristics of research published in key journals in higher education: Implications for graduate research training. Research in Higher Education, 45(4), 383–403. doi:10.1023/B:RIHE.0000027392.94172.d2.
Jelicic, H., Phelps, E., & Lerner, R. A. (2009). Use of missing data methods in longitudinal studies: The persistence of bad practices in developmental psychology. Developmental Psychology, 45(4), 1195–1199. doi:10.1037/A0015665.
Kenward, M. G., & Carpenter, J. R. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16(3), 199–218. doi:10.1177/0962280206075304.
Klebanoff, M. A., & Cole, S. R. (2008). Use of multiple imputation in the epidemiologic literature. American Journal of Epidemiology, 168(4), 355–357. doi:10.1093/Aje/Kwn071.
Lee, K. J., & Carlin, J. B. (2010). Multiple imputation for missing data: Fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology, 171(5), 624–632. doi:10.1093/Aje/Kwp425.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. Hoboken, NJ: Wiley.
McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. New York: Guilford Press.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74(4), 525–556. doi:10.3102/00346543074004525.
Royston, P. (2004). Multiple imputation of missing values. Stata Journal, 4(3), 227–241.
Royston, P., & White, I. R. (2011). Multiple imputation by chained equations (MICE): Implementation in Stata. Journal of Statistical Software, 45(4), 1–20.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. doi:10.1037/1082-989X.7.2.147.
Social Science Computing Cooperative (2012). Multiple imputation in Stata: Introduction. University of Wisconsin, Madison. http://www.ssc.wisc.edu/sscc/pubs/stata_mi_intro.htm. Accessed 27 September 2012.
StataCorp LP. (2011). Stata multiple-imputation reference manual: Release 12. College Station, TX: Stata Press.
Sterne, J. A. C., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal. doi:10.1136/bmj.b2393.
Treiman, D. J. (2009). Quantitative data analysis: Doing social research to test ideas. San Francisco: Jossey-Bass.
van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219–242. doi:10.1177/0962280206074463.
van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: CRC Press.
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064. doi:10.1080/10629360600810434.
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). Mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.
White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30, 377–399. doi:10.1002/sim.4067.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. doi:10.1037/0003-066X.54.8.594.

Author information

Authors and Affiliations

University of Massachusetts, Amherst, 256 Hills House South, Amherst, MA, 01003, USA
Catherine A. Manly & Ryan S. Wells

Corresponding author

Correspondence to Catherine A. Manly.

About this article

Cite this article

Manly, C.A., Wells, R.S. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research. Res High Educ 56, 397–409 (2015). https://doi.org/10.1007/s11162-014-9344-9

Received: 18 September 2013
Published: 20 July 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s11162-014-9344-9
