
The Impact of APA and AERA Guidelines on Effect Size Reporting

Research into Practice | Educational Psychology Review

Abstract

Given the long history of effect size (ES) indices (Olejnik and Algina, Contemporary Educational Psychology, 25, 241–286, 2000) and various attempts by APA and AERA to encourage the reporting and interpretation of ES to supplement findings from inferential statistical analyses, it is essential to document the impact of APA and AERA standards on ES reporting practices. In this paper, we investigated that impact by examining findings from 31 published reviews and our own review of 451 articles published in 2009 and 2010. The 32 reviews were divided into two periods: before and after 1999. A total of 116 journals were reviewed. Findings from these 32 reviews revealed that since 1999, ES reporting has improved in terms of its rate, variety, interpretation, confidence intervals, and fullness. Yet several inadequate practices persisted: (1) the dominance of Cohen’s d and the unadjusted R², (2) the mere labeling of ES, (3) the under-reporting of confidence intervals, and (4) a lack of integration between ES and statistical tests. The paper concludes with Internet resources and recommendations for improving ES reporting practices.
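
The practices the abstract recommends translate directly into analysis code. Below is a minimal sketch (ours, not the authors'; the example data are hypothetical) of two of the practices at issue: reporting Cohen's d together with a confidence interval rather than alone, and adjusting R² for shrinkage rather than reporting the unadjusted value. The standard error of d uses the large-sample approximation of Hedges and Olkin (1985); the R² adjustment is the Wherry formula, one of the shrinkage estimators compared by Yin and Fan (2001).

    import math
    from statistics import mean, stdev

    def cohens_d_with_ci(x, y):
        """Cohen's d for two independent groups, with an approximate
        95% confidence interval from the normal approximation."""
        n1, n2 = len(x), len(y)
        # Pooled standard deviation across the two groups.
        sp = math.sqrt(((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2)
                       / (n1 + n2 - 2))
        d = (mean(x) - mean(y)) / sp
        # Large-sample standard error of d (Hedges & Olkin, 1985).
        se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
        z = 1.96  # ~97.5th percentile of the standard normal
        return d, (d - z * se, d + z * se)

    def adjusted_r2(r2, n, p):
        """Wherry-adjusted R^2 for n cases and p predictors."""
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    if __name__ == "__main__":
        treatment = [5.1, 6.0, 5.8, 6.4, 5.5, 6.2]  # hypothetical scores
        control = [4.8, 5.2, 4.9, 5.6, 5.0, 5.3]
        d, (lo, hi) = cohens_d_with_ci(treatment, control)
        print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
        print(f"adjusted R^2 = {adjusted_r2(0.30, n=120, p=5):.3f}")

Reporting the interval alongside the point estimate, as in the printout above, addresses two of the four inadequate practices the review identifies.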


References

  • Algina, J., & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 68, 233–244. doi:10.1177/0013164403256358.

  • Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized mean difference effect size: a robust parameter and confidence interval in the two independent groups case. Psychological Methods, 10, 317–328. doi:10.1037/1082-989X.10.3.317.

  • Algina, J., Keselman, H. J., & Penfield, R. D. (2006). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5, 2–13. Retrieved from http://www.jmasm.com.

  • Alhija, F. N.-A., & Levy, A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245–265. doi:10.1177/0013164408315266.

  • American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33–40. doi:10.3102/0013189X035006033.

  • American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: American Psychological Association.

  • American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.

  • Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us?: arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29, 664–672. Retrieved from http://journals.humankinetics.com/jsep.

  • APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: why do we need them? What might they be? American Psychologist, 63, 839–851. doi:10.1037/0003-066X.63.9.839.

  • Armstrong, S. A., & Henson, R. K. (2004). Statistical and practical significance in the IJPT: a research review from 1993–2003. International Journal of Play Therapy, 13(2), 9–30. doi:10.1037/h0088888.

  • Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. Psychological Methods, 13, 99–109. doi:10.1037/1082-989X.13.2.99.

  • Byrd, J. K. (2007). A call for statistical reform in EAQ. Educational Administration Quarterly, 43, 381–391. doi:10.1177/0013161X06297137.

  • Camp, C. J., & Maxwell, S. E. (1983). A comparison of various strength of association measures commonly used in gerontological research. Journal of Gerontology, 38, 3–7.

  • Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley's ε² and Hays' ω². Educational and Psychological Measurement, 35, 541–554. doi:10.1177/001316447503500304.

  • Cliff, N. (1993). Dominance statistics: ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494–509. doi:10.1037/0033-2909.114.3.494.

  • Cliff, N. (1996). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate Behavioral Research, 31, 331–350. doi:10.1207/s15327906mbr3103_4.

  • Cochran-Smith, M., & Zeichner, K. M. (Eds.). (2005). Studying teacher education: the report of the AERA Panel on Research and Teacher Education. Mahwah, NJ: Lawrence Erlbaum.

  • Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

  • Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

  • Crosnoe, R., & Cooper, C. E. (2010). Economically disadvantaged children's transitions into elementary school: linking family processes, school contexts, and educational policy. American Educational Research Journal, 47(2), 258–291. doi:10.3102/0002831209351564.

  • Delaney, H. D., & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. Psychological Methods, 7(4), 485–503. doi:10.1037/1082-989X.7.4.485.

  • Dunlap, W. P. (1999). A program to compute McGraw and Wong's common language effect size indicator. Behavior Research Methods, Instruments, & Computers, 31, 706–709. doi:10.3758/BF03200750.

  • Dunleavy, E. M., Barr, C. D., Glenn, D. M., & Miller, K. R. (2006). Effect size reporting in applied psychology: how are we doing? The Industrial-Organizational Psychologist, 43(4), 29–37. Retrieved from http://www.openj-gate.com/browse/Archive.aspx?year=2009&Journal_id=102632.

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

  • Fidler, F., Cumming, G., Thomason, N., Pannuzzo, D., Smith, J., Fyffe, P., Schmitt, R. (2005). Evaluating the effectiveness of editorial policy to improve statistical practice: the case of the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 73, 136–143. doi:10.1037/0022-006X.73.1.136.

  • Fox, C. L., & Boulton, M. J. (2003). Evaluating the effectiveness of a social skills training (SST) program for victims of bullying. Educational Research, 45, 231–247. doi:10.1080/0013188032000137238.

  • Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin, 70, 245–251. doi:10.1037/h0026258.

  • Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18. doi:10.1037/a0024338.

  • Garrison, A. M., & Kahn, J. H. (2010). Intraindividual relations between the intensity and disclosure of daily emotional events: the moderating role of depressive symptoms. Journal of Counseling Psychology, 57(2), 187–197. doi:10.1037/a0018386.

  • Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. doi:10.3102/0013189X005010003.

  • Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6, 135–146. doi:10.1037/1082-989X.6.2.135.

  • Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: univariate and multivariate applications (2nd ed.). New York: Routledge.

  • Harrison, J., Thompson, B., & Vannest, K. J. (2009). Interpreting the evidence for effective interventions to increase the academic performance of students with ADHD: relevance of the statistical significance controversy. Review of Educational Research, 79, 740–775. doi:10.3102/0034654309331516.

  • Hays, W. L. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.

  • Hedges, L. V. (1981). Distributional theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128. doi:10.2307/1164588.

  • Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490–499. doi:10.1037/0033-2909.92.2.490.

  • Hedges, L. V., & Olkin, I. (1984). Nonparametric estimators of effect size in meta-analysis. Psychological Bulletin, 96, 573–580. doi:10.1037/0033-2909.96.3.573.

  • Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.

  • Hess, M. R., & Kromrey, J. D. (2004). Robust confidence intervals for effect sizes: a comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

  • Hogarty, K. Y., & Kromrey, J. D. (2001, April). We've been reporting some effect sizes: Can you guess what they mean? Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

  • Hsieh, P., Acee, T., Chung, W.-H., Hsieh, Y.-P., Kim, H., Thomas, G. D., Robinson, D. H. (2005). Is educational intervention research on the decline? Journal of Educational Psychology, 97, 523–529. doi:10.1037/0022-0663.97.4.523.

  • Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: correcting error and bias in research findings. Thousand Oaks, CA: SAGE Publications.

  • Jitendra, A. K., Griffin, C. C., Haria, P., Leh, J., Adams, A., & Kaduvettoor, A. (2007). A comparison of single and multiple strategy instruction on third-grade students’ mathematical problem solving. Journal of Educational Psychology, 99, 115–127. doi:10.1037/0022-0663.99.1.115.

  • Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the standardized mean difference: bootstrap and parametric confidence intervals. Educational and Psychological Measurement, 65, 51–69. doi:10.1177/0013164404264850.

  • Keppel, G. (1973). Design and analysis: a researcher's handbook. Englewood Cliffs, NJ: Prentice-Hall.

  • Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129. doi:10.1037/1082-989X.13.2.110.

  • Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., Levin, J. R. (1998). Statistical practices of educational researchers: an analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386. doi:10.3102/00346543068003350.

  • Kieffer, K. M., Reese, R. J., & Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: a methodological review. The Journal of Experimental Education, 69, 280–309. doi:10.1080/00220970109599489.

  • Kirk, R. E. (1996). Practical significance: a concept whose time has come. Educational and Psychological Measurement, 56, 746–759. doi:10.1177/0013164496056005002.

  • Kraemer, H. C., & Andrews, G. (1982). A nonparametric technique for meta-analysis effect size calculation. Psychological Bulletin, 91, 404–412.

  • Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996. doi:10.1016/j.biopsych.2005.09.014.

  • Kromrey, J. D., & Coughlin, K. B. (2007, November). ROBUST_ES: a SAS macro for computing robust estimates of effect size. Paper presented at the annual meeting of the SouthEast SAS Users Group, Hilton Head, SC. Retrieved from http://analytics.ncsu.edu/sesug/2007/PO19.pdf.

  • Lipsey, M. W., & Wilson, D. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.

  • Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. (NCSER 2013–3000). Washington, DC: National Center for Special Education Research, Institute of Education Sciences, US Department of Education.

  • MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149. doi:10.1037/1082-989X.1.2.130.

  • Matthews, M. S., Gentry, M., McCoach, D. B., Worrell, F. C., Matthews, D., & Dixon, F. (2008). Evaluating the state of a field: effect size reporting in gifted education. The Journal of Experimental Education, 77(1), 55–68. doi:10.3200/JEXE.77.1.55-68.

  • Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: a comparative examination. Journal of Applied Psychology, 66, 525–534. doi:10.1037/0021-9010.66.5.525.

  • McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11, 386–401. doi:10.1037/1082-989X.11.4.386.

  • McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365. doi:10.1037/0033-2909.111.2.361.

  • Meline, T., & Schmitt, J. F. (1997). Case studies for evaluating significance in group designs. American Journal of Speech-Language Pathology, 6(1), 33–41. Retrieved from http://ajslp.asha.org/.

  • Meline, T., & Wang, B. (2004). Effect reporting practices in AJSLP and other ASHA journals, 1999–2003. American Journal of Speech-Language Pathology, 13, 202–207. Retrieved from http://ajslp.asha.org/.

  • Mohr, J. J., Weiner, J. L., Chopp, R. M., & Wong, S. J. (2009). Effects of client bisexuality on clinical judgment: when is bias most likely to occur? Journal of Counseling Psychology, 56, 164–175. doi:10.1037/a0012816.

  • Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, 236, 333–380. Retrieved from http://rstl.royalsocietypublishing.org/.

  • Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297. doi:10.1037/a0019294.

  • Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286. doi:10.1006/ceps.2000.1040.

  • Osborne, J. W. (2008). Sweating the small stuff in educational psychology: how effect size and power reporting failed to change from 1969 to 1999, and what that means for the future of changing practices. Educational Psychology, 28, 151–160. doi:10.1080/01443410701491718.

  • Paul, K. M., & Plucker, J. A. (2004). Two steps forward, one step back: effect size reporting in gifted education research from 1995–2000. Roeper Review, 26(2), 68–72.

  • Pearson, K. (1905). Mathematical contributions to the theory of evolution: XIV. On the general theory of skew correlations and nonlinear regression (Draper’s Company Research Memoirs, Biometric Series II). London: Dulau.

  • Peng, C.-Y. J., & Chen, L.-T. (2013). Beyond Cohen's d: alternative effect size measures for between subject designs. The Journal of Experimental Education (in press).

  • Peng, C.-Y., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review. doi:10.1007/s10648-013-9218-2.

  • Plucker, J. A. (1997). Debunking the myth of the "highly significant" result: effect sizes in gifted education research. Roeper Review, 20, 122–126. doi:10.1080/02783199709553873.

  • Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.

  • Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological Methods, 13, 19–30. doi:10.1037/1082-989X.13.1.19.

  • Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059. doi:10.1016/j.acn.2005.06.006.

  • Smith, M. L., & Honoré, H. H. (2008). Effect size reporting in current health education literature. American Journal of Health Studies, 23, 130–135. Retrieved from http://www.va-ajhs.com/.

  • Snyder, P. A., & Thompson, B. (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: review of practices and suggested alternatives. School Psychology Quarterly, 13, 335–348. doi:10.1037/h0088990.

  • Snyder, P., Thompson, B., McLean, M. E., & Smith, B. J. (2002). Examination of quantitative methods used in early intervention research: linkages with recommended practices. Journal of Early Intervention, 25, 137–150. doi:10.1177/105381510202500211.

  • Staudte, R. G., & Sheather, S. J. (1990). Robust estimation and testing. New York: Wiley.

  • Steiger, J. H. (2004). Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. doi:10.1037/1082-989X.9.2.164.

  • Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. Harlow, S. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Hillsdale, NJ: Erlbaum.

  • Sun, S. Y., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102, 989–1004. doi:10.1037/a0019507.

  • Thompson, B. (1999). Improving research clarity and usefulness with effect size indices as supplements to statistical significance tests. Exceptional Children, 65, 329–337. Retrieved from http://journals.cec.sped.org/ec/.

  • Thompson, B. (2002). What future quantitative social science research could look like: confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32. doi:10.3102/0013189X031003025.

  • Thompson, B. (2006). Foundations of behavioral statistics: an insight-based approach. New York: Guilford.

  • Thompson, B., & Snyder, P. A. (1997). Statistical significance testing practices. The Journal of Experimental Education, 66, 75–83. doi:10.1080/00220979709601396.

  • Thompson, B., & Snyder, P. A. (1998). Statistical significance and reliability analyses in recent Journal of Counseling & Development research articles. Journal of Counseling and Development, 76, 436–441.

  • Trusty, J., Thompson, B., & Petrocelli, J. V. (2004). Practical guide for reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling and Development, 82, 107–110.

  • Vacha-Haase, T., & Ness, C. (1999). Statistical significance testing as it relates to practice: use within Professional Psychology. Professional Psychology: Research and Practice, 30, 104–105.

  • Vacha-Haase, T., & Nilsson, J. E. (1998). Statistical significance reporting: current trends and usages in MECD. Measurement and Evaluation in Counseling and Development, 31, 46–57. Retrieved from http://mec.sagepub.com.

  • Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., Lance, T. S., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology, 10, 413–425. doi:10.1177/0959354300103006.

  • Vansteenkiste, M., Sierens, E., Soenens, B., Luyckx, K., & Lens, W. (2009). Motivational profiles from a self-determination perspective: the quality of motivation matters. Journal of Educational Psychology, 101, 671–688. doi:10.1037/a0015083.

  • Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25, 101–132. doi:10.2307/1165329.

  • Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Elsevier Academic Press.

  • Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604. doi:10.1037/0003-066X.54.8.594.

  • Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: a comparison of different analytical methods. The Journal of Experimental Education, 69, 203–224. doi:10.1080/00220970109600656.

  • Zientek, L. R., Capraro, M. M., & Capraro, R. M. (2008). Reporting practices in quantitative teacher education research: one look at the evidence cited in the AERA Panel Report. Educational Researcher, 37, 208–216. doi:10.3102/0013189X08319762.

Author information

Correspondence to Chao-Ying Joanne Peng.

Author Note

This research was supported in part by two Maris M. Proffitt and Mary Higgins Proffitt Endowment Grants of Indiana University, awarded to H.-M. Chiang and C.-Y. J. Peng, and C.-Y. J. Peng, respectively.

Cite this article

Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., et al. The Impact of APA and AERA Guidelines on Effect Size Reporting. Educational Psychology Review, 25, 157–209 (2013). https://doi.org/10.1007/s10648-013-9218-2
