Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

  • Short Note
  • Research in Higher Education

Abstract

Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a commonly used method in quantitative research. However, our analysis of a decade of higher education research literature reveals that, despite the field's frequent use of quantitative analysis, it has yet to make substantial use of this technique, and that where MI is used, many recommended MI reporting practices are not followed. We conclude that additional information about the technique and recommended reporting practices may help improve the quality of research involving missing data. To address this issue, we develop a set of reporting recommendations based on a synthesis of the MI methodological literature and discuss these recommendations with applied researchers in mind. The recommended MI reporting practices involve describing the nature and structure of any missing data, describing the imputation model and procedures, and describing any notable imputation results.
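
As an illustration of the first recommendation, a description of the missing data can start from a few summary counts: the percentage missing for each variable, the number of complete cases (those listwise deletion would retain), and the frequency of each missingness pattern. The sketch below is our own Python example, not the authors' Stata code; it assumes records stored as dictionaries with None marking a missing value, and the variable names gpa, sat, and pell, like the data, are hypothetical.

```python
from collections import Counter


def describe_missing(rows, variables):
    """Summarize missing data in a list of records.

    Each record is a dict; a value of None (or an absent key) counts as missing.
    """
    n = len(rows)
    if n == 0:
        raise ValueError("no records to summarize")
    # Percentage of cases missing each variable.
    pct_missing = {
        v: 100 * sum(r.get(v) is None for r in rows) / n for v in variables
    }
    # A pattern is a tuple of flags, True where that variable is missing.
    patterns = Counter(tuple(r.get(v) is None for v in variables) for r in rows)
    complete = patterns.get(tuple(False for _ in variables), 0)
    return {
        "n": n,
        "pct_missing_by_variable": pct_missing,
        "complete_cases": complete,  # cases that listwise deletion would keep
        "pct_incomplete_cases": 100 * (n - complete) / n,
        "patterns": dict(patterns),
    }


# Hypothetical records; gpa, sat, and pell are illustrative variable names.
rows = [
    {"gpa": 3.2, "sat": 1150, "pell": 1},
    {"gpa": None, "sat": 1080, "pell": 0},
    {"gpa": 2.9, "sat": None, "pell": None},
    {"gpa": 3.7, "sat": 1320, "pell": 0},
]
summary = describe_missing(rows, ["gpa", "sat", "pell"])
print(summary["pct_missing_by_variable"])
print(summary["complete_cases"], summary["pct_incomplete_cases"])
print(summary["patterns"])
```

Reporting the pattern counts alongside the per-variable rates shows whether missingness is concentrated in a few cases or spread across many, which the per-variable rates alone cannot reveal.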

Notes

  1. Maximum likelihood (ML) is another modern technique that is, in theory, an excellent choice for handling missing data: it is always fully efficient (although MI can come close to full efficiency in practice with a large number of imputations, which no longer requires unusual computing capacity), it involves fewer implementation decisions than MI, and it eliminates the possibility of conflicting imputation and analysis models (Allison 2012). However, ML also has practical limitations. For example, it is implemented more often in statistical packages for structural equation modeling (SEM) than for other forms of analysis, and even in SAS, which implements ML more extensively than other software, it is not yet possible to estimate a logistic regression model with ML for missing data. A full discussion of ML (Allison 2002; Cox et al. 2014; Enders 2010) is beyond the scope of this research note.

  2. Annotated Stata code that uses MI with publicly available data from the National Center for Education Statistics and that illustrates the recommended practices we discuss in this paper is available from the authors upon request or from the UMass Amherst ScholarWorks website at http://works.bepress.com/ryan_wells/21/.

  3. To gauge the impact of uncertainty due to missing data, examine the "missing information," a concept clearly explained by McKnight et al. (2007). The fraction of missing information (γ) measures the influence of missing data on the results of a statistical analysis for a given number of imputations, given the known correlations between variables. Convergence is worth investigating for variables with a high fraction of missing information.

  4. Evaluating the sensitivity of analysis results to the presence of nonrandom missing data is recommended, although it is not currently facilitated by standard statistical software and is beyond the scope of this paper (Allison 2002; Kenward and Carpenter 2007).

  5. Burn-in iterations are the iterations of the imputation process run before the first completed dataset is saved to memory (e.g., saved as m = 1). For example, Stata's default number of burn-in iterations is 10 for the mi impute chained command and 100 for mi impute mvn. For MVN, the researcher may also select the number of between-imputation iterations (Stata's default is 100), that is, the number of iterations run between saving one completed dataset to memory and the next (e.g., between m = 1 and m = 2); convergence should be investigated for this setting as well (Enders 2010). Researchers should know the defaults of the software they use and evaluate whether those defaults are adequate.

  6. Another rule of thumb for reproducibility is m ≥ 100 × the largest FMI (fraction of missing information), that is, m at least as large as the largest FMI expressed as a percentage (StataCorp 2011). Because the FMI reflects more than the missing data rate, it is an even better guide. However, since it is more complex to calculate and is not known until the analysis has been run, m ≥ the percentage of missing data is a good starting place.
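
The claim in note 1 that MI can approach full efficiency with enough imputations can be quantified with Rubin's (1987) large-sample relative efficiency, 1/(1 + γ/m), where γ is the fraction of missing information and m the number of imputations. The Python sketch below is illustrative only; the FMI value of 0.3 is an assumed figure, and efficiency is measured against an MI estimate based on infinitely many imputations.

```python
def relative_efficiency(fmi, m):
    """Rubin's (1987) large-sample relative efficiency of an MI estimate.

    Compares an estimate based on m imputations with one based on infinitely
    many imputations, given the fraction of missing information (fmi).
    """
    if not 0 <= fmi <= 1:
        raise ValueError("fmi must lie between 0 and 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1 / (1 + fmi / m)


# Illustrative FMI of 0.3 (an assumed value, not an empirical result).
for m in (5, 10, 20, 50, 100):
    print(f"m = {m:3d}: relative efficiency = {relative_efficiency(0.3, m):.4f}")
```

Efficiency alone rises quickly toward 1, which is why larger m is now recommended mainly for stable standard errors and reproducibility rather than for point-estimate efficiency.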
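
The fraction of missing information discussed in note 3 is obtained when the m sets of results are combined. The following Python sketch applies Rubin's (1987) combining rules to a single parameter, assuming each completed dataset has already been analyzed to give a point estimate and its squared standard error; the input values are made up. It uses the large-sample degrees of freedom, so the γ it returns may differ slightly from software that applies a small-sample adjustment.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed datasets with Rubin's (1987) rules.

    estimates: point estimates Q_1..Q_m from the m completed-data analyses.
    variances: their squared standard errors U_1..U_m (all positive).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    if any(u <= 0 for u in variances):
        raise ValueError("variances must be positive")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    r = (1 + 1 / m) * b / w               # relative increase in variance due to nonresponse
    # Large-sample degrees of freedom; infinite when the imputations agree exactly.
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf
    gamma = (r + 2 / (df + 3)) / (r + 1)  # fraction of missing information
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df, "fmi": gamma}


# Made-up results from m = 5 analyses of one regression coefficient.
pooled = pool_rubin([1.0, 1.2, 0.9, 1.1, 1.05], [0.04] * 5)
print(pooled)
```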
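
The roles of burn-in and between-imputation iterations in note 5 can be seen in a toy version of data augmentation, the MCMC algorithm behind multivariate normal (MVN) imputation, reduced here to a single normally distributed variable. This Python sketch is a simplified illustration, not Stata's implementation: it assumes one variable, a noninformative prior, and None for missing values, and its burn_in and between arguments (hypothetical names) default to the mi impute mvn defaults of 100 given in note 5.

```python
import random
import statistics


def impute_normal_da(y, m, burn_in=100, between=100, seed=2015):
    """Toy data augmentation for one normal variable; None marks a missing value.

    Runs a single chain: `burn_in` iterations before the first completed dataset
    is saved (m = 1), then `between` iterations before each later one.
    Returns the saved datasets and the iteration at which each was saved.
    """
    if m < 1 or burn_in < 1 or between < 1:
        raise ValueError("m, burn_in, and between must be at least 1")
    observed = [v for v in y if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    rng = random.Random(seed)
    n = len(y)
    # Start the chain at the observed-data mean and standard deviation.
    mu, sigma = statistics.fmean(observed), statistics.stdev(observed)
    saved, saved_at = [], []
    for it in range(1, burn_in + (m - 1) * between + 1):
        # I-step: draw each missing value from N(mu, sigma^2).
        completed = [v if v is not None else rng.gauss(mu, sigma) for v in y]
        # P-step: draw sigma^2, then mu, from their posterior given the completed
        # data under the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
        ybar = statistics.fmean(completed)
        ss = sum((v - ybar) ** 2 for v in completed)
        chi2 = rng.gammavariate((n - 1) / 2, 2)  # chi-square draw with n - 1 df
        sigma = (ss / chi2) ** 0.5
        mu = rng.gauss(ybar, sigma / n ** 0.5)
        # Save after the burn-in period and then every `between` iterations.
        if it >= burn_in and (it - burn_in) % between == 0:
            saved.append(completed)
            saved_at.append(it)
    return saved, saved_at


y = [2.1, None, 3.4, 2.8, None, 3.0, 2.5]
datasets, saved_at = impute_normal_da(y, m=5)
print(saved_at)
```

Because all datasets here come from one chain, the between-imputation iterations are what reduce dependence between successive imputations; inspecting how the drawn parameters evolve across iterations is still needed to judge convergence.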
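
The two rules of thumb in note 6 can be written as a simple check. In this Python sketch the function names are hypothetical, the FMI values are made up, and reading "percent of missing data" as the percentage of incomplete cases (the rule suggested by White, Royston, and Wood 2011) is our assumption.

```python
import math


def starting_m(pct_missing):
    """Starting number of imputations: m >= percentage of missing data.

    Never fewer than two, since the between-imputation variance needs m >= 2.
    """
    return max(2, math.ceil(pct_missing))


def m_is_sufficient(m, fmi_values):
    """Check the rule m >= 100 x largest fraction of missing information (FMI)."""
    return m >= 100 * max(fmi_values)


# Hypothetical workflow: 23.4% of cases incomplete, FMIs from a first analysis.
m = starting_m(23.4)
fmis = [0.125, 0.25]
if not m_is_sufficient(m, fmis):
    m = math.ceil(100 * max(fmis))  # rerun the imputation with this larger m
print(m)
```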

References

  • Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage.

  • Allison, P. D. (2012). Handling missing data by maximum likelihood. Paper presented at the SAS Global Forum, Orlando, FL. http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf. Accessed 18 April 2013.

  • Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1), 40–49. doi:10.1002/mpr.329.

  • Buhi, E. R., Goodson, P., & Neilands, T. B. (2008). Out of sight, not out of mind: Strategies for handling missing data. American Journal of Health Behavior, 32(1), 83–92.

  • Burton, A., & Altman, D. G. (2004). Missing covariate data within cancer prognostic studies: A review of current reporting and proposed guidelines. British Journal of Cancer, 91, 4–8. doi:10.1038/sj.bjc.6601907.

  • Collins, L. M., Schafer, J. L., & Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351. doi:10.1037/1082-989X.6.4.330.

  • Cox, B. E., McIntosh, K., Reason, R. D., & Terenzini, P. T. (2014). Working with missing data in higher education research: A primer and real-world example. The Review of Higher Education, 37(3), 377–402.

  • Craig, L. E., Wu, O., Gilmour, H., Barber, M., & Langhorne, P. (2011). Developing and validating a predictive model for stroke progression. Cerebrovascular Diseases Extra, 1(1), 105–114. doi:10.1159/000334473.

  • Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.

  • Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. doi:10.1146/annurev.psych.58.110405.085530.

  • Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.

  • Heeringa, S., West, B. T., & Berglund, P. A. (2010). Applied survey data analysis. Boca Raton, FL: Chapman & Hall.

  • Hutchinson, S. R., & Lovell, C. D. (2004). A review of methodological characteristics of research published in key journals in higher education: Implications for graduate research training. Research in Higher Education, 45(4), 383–403. doi:10.1023/B:RIHE.0000027392.94172.d2.

  • Jelicic, H., Phelps, E., & Lerner, R. M. (2009). Use of missing data methods in longitudinal studies: The persistence of bad practices in developmental psychology. Developmental Psychology, 45(4), 1195–1199. doi:10.1037/a0015665.

  • Kenward, M. G., & Carpenter, J. R. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16(3), 199–218. doi:10.1177/0962280206075304.

  • Klebanoff, M. A., & Cole, S. R. (2008). Use of multiple imputation in the epidemiologic literature. American Journal of Epidemiology, 168(4), 355–357. doi:10.1093/aje/kwn071.

  • Lee, K. J., & Carlin, J. B. (2010). Multiple imputation for missing data: Fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology, 171(5), 624–632. doi:10.1093/aje/kwp425.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. Hoboken, NJ: Wiley.

  • McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. New York: Guilford Press.

  • Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74(4), 525–556. doi:10.3102/00346543074004525.

  • Royston, P. (2004). Multiple imputation of missing values. Stata Journal, 4(3), 227–241.

  • Royston, P., & White, I. R. (2011). Multiple imputation by chained equations (MICE): Implementation in Stata. Journal of Statistical Software, 45(4), 1–20.

  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.

  • Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. doi:10.1037/1082-989X.7.2.147.

  • Social Science Computing Cooperative (2012). Multiple imputation in Stata: Introduction. University of Wisconsin, Madison. http://www.ssc.wisc.edu/sscc/pubs/stata_mi_intro.htm. Accessed 27 September 2012.

  • StataCorp. (2011). Stata multiple-imputation reference manual: Release 12. College Station, TX: Stata Press.

  • Sterne, J. A. C., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 338, b2393. doi:10.1136/bmj.b2393.

  • Treiman, D. J. (2009). Quantitative data analysis: Doing social research to test ideas. San Francisco: Jossey-Bass.

  • van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219–242. doi:10.1177/0962280206074463.

  • van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: CRC Press.

  • van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064. doi:10.1080/10629360600810434.

  • van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67.

  • White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30, 377–399. doi:10.1002/sim.4067.

  • Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. doi:10.1037/0003-066X.54.8.594.

Author information

Correspondence to Catherine A. Manly.

About this article

Cite this article

Manly, C.A., Wells, R.S. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research. Res High Educ 56, 397–409 (2015). https://doi.org/10.1007/s11162-014-9344-9

  • DOI: https://doi.org/10.1007/s11162-014-9344-9
