The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration

  • Review Article
Educational Psychology Review

Abstract

Multilevel models are an increasingly popular method for analyzing data with a clustered or hierarchical structure. To use multilevel models effectively, one must have an adequately large number of clusters; otherwise, estimates of some model parameters will be biased. The goals of this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation demonstrating how a simple model is adversely affected by a small number of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.

Fig. 1
Fig. 2
Fig. 3

Notes

  1. On a more technical note, population values for standard errors cannot be set directly within a simulation design, so another quantity must serve as the population benchmark for the variability of the estimates (the quantity that the standard error estimates). Although different quantities can serve this purpose, the prevailing technique in the reviewed studies was to use the variability of the parameter estimates across replications. Because the same technique was implemented across studies, it is reasonable to compare these values across studies.

  2. As noted earlier, the level-2 variance component is included in the calculation of ICC values. In multilevel logistic models, however, the level-1 variance component is calculated differently than with continuous outcomes, and Goldstein, Browne, and Rasbash (2002) discuss four methods for its calculation. Most commonly, \( \frac{\pi^2}{3} \) (about 3.29) is substituted for the level-1 variance, since this is the variance of the logistic distribution with location 0 and scale 1. Other methods include simulation and Taylor series expansions. The problems associated with misestimated ICC values are the same as those presented for continuous outcomes.

  3. A diffuse inverse gamma prior is commonly used when a researcher wants to use Bayesian methods while limiting the impact of the prior distribution on the posterior distribution. It is the default prior distribution for variances in more user-friendly Bayesian software programs such as Mplus.

  4. If models that differ in their fixed effects are compared, full maximum likelihood (FML) must be used. The REML likelihood is based on residual contrasts that depend on the fixed-effects design, so REML deviances are not comparable across models with different fixed effects.
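
The benchmark described in note 1 (the variability of the estimates across replications) can be illustrated with a minimal Python sketch. It assumes a balanced random-intercept model with hypothetical values (level-2 variance tau = 1, level-1 standard deviation sigma = 1, 10 units per cluster), estimates the fixed intercept as the mean of the cluster means, and uses the standard deviation of the cluster means divided by the square root of the number of clusters as a simple stand-in for a fitted model's standard error. The function name `simulate_se_bias` and all settings are illustrative assumptions; this is not the simulation design of any reviewed study.

```python
import math
import random
import statistics


def simulate_se_bias(n_clusters, cluster_size, tau=1.0, sigma=1.0, reps=2000, seed=1):
    """Compare the average estimated SE of the intercept with the empirical SE.

    Hypothetical setup: y_ij = gamma_00 + u_j + e_ij with gamma_00 = 0,
    u_j ~ N(0, tau) and e_ij ~ N(0, sigma^2), balanced clusters.
    """
    rng = random.Random(seed)
    estimates, estimated_ses = [], []
    for _ in range(reps):
        cluster_means = []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, math.sqrt(tau))  # level-2 random intercept
            y = [u + rng.gauss(0.0, sigma) for _ in range(cluster_size)]
            cluster_means.append(statistics.fmean(y))
        # Balanced data: the intercept estimate is the mean of the cluster means.
        estimates.append(statistics.fmean(cluster_means))
        # Simple moment-based SE, used here in place of a fitted model's SE.
        estimated_ses.append(statistics.stdev(cluster_means) / math.sqrt(n_clusters))
    # Benchmark from note 1: variability of the estimates across replications.
    empirical_se = statistics.stdev(estimates)
    mean_estimated_se = statistics.fmean(estimated_ses)
    return {
        "empirical_se": empirical_se,
        "mean_estimated_se": mean_estimated_se,
        "relative_se_bias": (mean_estimated_se - empirical_se) / empirical_se,
    }


for j in (5, 10, 50):
    result = simulate_se_bias(n_clusters=j, cluster_size=10)
    print(j, {k: round(v, 3) for k, v in result.items()})
```

Because a sample standard deviation is biased slightly downward, the averaged standard error in this sketch tends to fall a little below the empirical SE when the number of clusters is small; any single run also includes Monte Carlo error.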
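
The most common approach in note 2 can be sketched in Python, assuming a random-intercept logistic model whose only level-2 variance component is the intercept variance (called `tau00` here, a hypothetical name). The ICC on the latent-response scale is then tau00 / (tau00 + pi^2/3). The numerical integration only checks the 3.29 constant; the simulation and Taylor-series methods mentioned in the note are not shown.

```python
import math

# Level-1 variance substituted in a multilevel logistic model:
# the variance of the logistic distribution with location 0 and scale 1.
LOGISTIC_LEVEL1_VARIANCE = math.pi ** 2 / 3  # about 3.29


def icc_logistic(tau00):
    """Latent-response ICC for a random-intercept logistic model."""
    if tau00 < 0:
        raise ValueError("a variance component cannot be negative")
    return tau00 / (tau00 + LOGISTIC_LEVEL1_VARIANCE)


def logistic_variance_numeric(lo=-50.0, hi=50.0, steps=100_000):
    """Midpoint-rule check that the standard logistic variance is pi^2 / 3."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = math.exp(-x) / (1.0 + math.exp(-x)) ** 2
        total += x * x * density * h  # mean is 0, so E[x^2] is the variance
    return total


print(round(LOGISTIC_LEVEL1_VARIANCE, 4))
print(round(logistic_variance_numeric(), 4))
print(round(icc_logistic(0.5), 4))
```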
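
Once both models in note 4 are fitted with full maximum likelihood (FML), the comparison is typically a likelihood-ratio (deviance difference) test. The Python sketch below assumes nested models and hypothetical deviance values, and refers the deviance difference to a chi-square distribution with degrees of freedom equal to the number of added parameters. The helper names `chi2_sf` and `deviance_test` are illustrative; the chi-square tail probability is computed with standard-library closed forms for integer degrees of freedom rather than a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Start from df = 1 (odd) or the degenerate df = 0 (even), then step by 2 using
    # sf(k + 2, x) = sf(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k, sf = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (0, 0.0)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def deviance_test(deviance_reduced, deviance_full, df_diff):
    """Likelihood-ratio test of nested models; both deviances should come from
    FML fits when the models differ in their fixed effects."""
    statistic = deviance_reduced - deviance_full
    if statistic < 0:
        raise ValueError("the reduced model should not fit better than the full model")
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical deviances: the full model adds one fixed-effect slope.
stat, p = deviance_test(deviance_reduced=1000.0, deviance_full=995.0, df_diff=1)
print(round(stat, 2), round(p, 4))
```

This sketch targets fixed effects, as in note 4. When the tested parameter is a variance component, its null value lies on the boundary of the parameter space and the plain chi-square reference is conservative (see Stram & Lee, 1994).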

References

References marked with an asterisk (*) were included in the review.

  • *Austin, P. C. (2010). Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures. The International Journal of Biostatistics, 6, Article 16.

  • *Baldwin, S. A., & Fellingham, G. W. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Psychological Methods, 18, 151–164.

  • Bell, B., Ene, M., Smiley, W., & Schoeneberger, J. (2013). A multilevel primer using SAS Proc Mixed. SAS Global Forum.

  • Bell, Schoeneberger, Smiley, Ene, & Leighton (2013). Doubly diminishing returns: an empirical investigation on the impact of sample size and predictor prevalence on point and interval estimates in two-level linear models. Paper presented at the Modern Modeling Methods Conference (M3). Storrs.

  • *Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Kromrey, J. D., & Ferron, J. M. (2014). How low can you go? An investigation of the influence of sample size and model complexity on point and interval estimates in two-level linear models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 10, 1–11.

  • Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.

  • *Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1, 473–514.

  • Butar, F. B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112, 63–76.

  • *Clarke, P. (2008). When can group level clustering be ignored? Multilevel models versus single level models with sparse data. Journal of Epidemiology and Community Health, 62, 752–758.

  • *Cohen, J. (1998). Determining sample sizes for surveys with data analyzed by hierarchical linear models. Journal of Official Statistics, 14, 267–275.

  • Dedrick, R. F., Ferron, J. M., Hess, M. R., Hogarty, K. Y., Kromrey, J. D., & Lee, R. (2009). Multilevel modeling: a review of methodological issues and applications. Review of Educational Research, 79, 69–102.

  • *Ferron, J. M., Bell, B. A., Hess, M. R., Rendina-Gobioff, G., & Hibbard, S. T. (2009). Making treatment effect inferences from multiple-baseline data: the utility of multilevel modeling approaches. Behavior Research Methods, 41, 372–384.

  • Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2012). Applied longitudinal analysis. Hoboken: Wiley.

  • Gardiner, J. C., Luo, Z., & Roman, L. A. (2009). Fixed effects, random effects and GEE: what are the differences? Statistics in Medicine, 28, 221–239.

  • Gelman, A. (2002). Prior distribution. Encyclopedia of Environmetrics.

  • Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton: CRC Press.

  • Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1, 223–231.

  • González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., & Santamaría, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics and Data Analysis, 51, 2720–2733.

  • Halekoh, U., & Højsgaard, S. (2012). pbkrtest: parametric bootstrap and Kenward Roger based methods for mixed model comparison. URL http://cran.r-project.org/web/packages/pbkrtest/pbkrtest.pdf [accessed on 14 March 2014].

  • Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.

  • Heo, M., & Leon, A. C. (2008). Statistical power and sample size requirements for three level hierarchical cluster randomized trials. Biometrics, 64, 1256–1262.

  • Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling. An overview and a meta-analysis. Sociological Methods & Research, 26, 329–367.

  • Hox, J. J. (1998). Multilevel modeling: when and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.), Classification, data analysis, and data highways (pp. 147–154). Berlin: Springer.

  • Hox, J. (2010). Multilevel analysis: techniques and applications (2nd ed.). Mahwah, NJ: Erlbaum.

  • Hox, J., van de Schoot, R., & Matthijsse, S. (2012). How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods, 6, 87–93.

  • Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983–997.

  • Kenward, M. G., & Roger, J. H. (2009). An improved approximation to the precision of fixed effects from restricted maximum likelihood. Computational Statistics and Data Analysis, 53, 2583–2595.

  • Kim, Y., Choi, Y. K., & Emery, S. (2013). Logistic regression with multiple random effects: a simulation study of estimation methods and statistical packages. The American Statistician, 67, 171–182.

  • *Konstantopoulos, S. (2010). Power analysis in two-level unbalanced designs. The Journal of Experimental Education, 78, 291–317.

  • Kowalchuk, R. K., Keselman, H. J., Algina, J., & Wolfinger, R. D. (2004). The analysis of repeated measurements with mixed-model adjusted F tests. Educational and Psychological Measurement, 64, 224–242.

  • *Kreft, I. G. G. (1996). Are multilevel techniques necessary? An overview, including simulation studies. Unpublished manuscript, California State University, Los Angeles.

  • *Maas, C., & Hox, J. (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127–137.

  • *Maas, C. J., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 1, 86–92.

  • *McNeish, D. M. (2014). Modeling sparsely clustered data: design-based, model-based, and single-level methods. Psychological Methods. DOI: 10.1037/met0000024.

  • *Meuleman, B., & Billiet, J. (2009). A Monte Carlo sample size study: how many countries are needed for accurate multilevel SEM? Survey Research Methods, 3, 45–58.

  • *Moineddin, R., Matheson, F. I., & Glazier, R. H. (2007). A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology, 7, 34.

  • *Mok, M. (1995). Sample size requirements for 2-level designs in educational research. Multilevel Modelling Newsletter, 7, 11–15.

  • Molenberghs, G., & Verbeke, G. (2004). Meaningful statistical model formulations for repeated measures. Statistica Sinica, 14, 989–1020.

  • *Paccagnella, O. (2011). Sample size and accuracy of estimates in multilevel models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 7, 111–120.

  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Thousand Oaks: Sage.

  • Satterthwaite, F. E. (1946). An approximate distribution of the estimates of variance components. Biometrics, 2, 110–114.

  • Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation modeling. Psychological Methods, 13, 150–170.

  • *Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample size for organizational research using multilevel modeling. Organizational Research Methods, 12, 347–367.

  • Searle, S. R., Casella, G., & McCulloch, C. E. (2006). Variance components. Hoboken: Wiley.

  • *Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational Statistics, 18, 237–259.

  • Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage.

  • Spilke, J., Piepho, H. P., & Hu, X. (2005). A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data. Journal of Agricultural, Biological, and Environmental Statistics, 10, 374–389.

  • *Stegmueller, D. (2013). How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science, 57, 748–761.

  • Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.

  • Van der Leeden, R., Busing, F., & Meijer, E. (1997, April). Applications of bootstrap methods for two-level models. Paper presented at the Multilevel Conference. Amsterdam.

Author information

Correspondence to Daniel M. McNeish.

About this article

Cite this article

McNeish, D.M., Stapleton, L.M. The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration. Educ Psychol Rev 28, 295–314 (2016). https://doi.org/10.1007/s10648-014-9287-x

  • DOI: https://doi.org/10.1007/s10648-014-9287-x
