Editor’s Note: Commentary by S. Wright, DOI 10.1007/s40037-016-0323-z
The work reported in this article should be attributed to the Centre for Medical Education, McGill University, and the Chaire de recherche en pédagogie médicale Paul Grand’Maison de la Société des médecins de l’Université de Sherbrooke.
Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. However, most item-property guidelines were developed for use with large cohorts of examinees; little empirical work has investigated whether such guidelines are suitable for item difficulty and discrimination coefficients estimated from small cohorts, such as those in medical education. We investigated the extent to which item properties vary across multiple clerkship cohorts to better understand the appropriateness of applying such guidelines to small cohorts.
Exam results for 32 items from an MCQ exam were used. Item discrimination and difficulty coefficients were calculated for 22 cohorts (n = 10–15 students). Discrimination coefficients were categorized according to Ebel and Frisbie (1991). Difficulty coefficients were categorized according to three guidelines by Laveault and Grégoire (2014). Descriptive analyses examined variance in item properties across cohorts.
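Under classical test theory, the two coefficients the study tracks are typically computed as the proportion of correct responses (difficulty) and the point-biserial correlation between an item's 0/1 score and each examinee's total score (discrimination). The sketch below illustrates this; the category labels and cut-offs are loosely modelled on commonly cited Ebel and Frisbie (1991) guidance and are illustrative only, not the exact values used in the study:

```python
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Classical difficulty (p-value): proportion answering correctly."""
    return mean(item_scores)

def point_biserial(item_scores, total_scores):
    """Discrimination: Pearson correlation between a dichotomous
    item score (0/1) and each examinee's total exam score."""
    mx, my = mean(item_scores), mean(total_scores)
    sx, sy = pstdev(item_scores), pstdev(total_scores)
    if sx == 0 or sy == 0:
        return 0.0  # no variance: coefficient undefined; 0 by convention here
    cov = mean((x - mx) * (t - my) for x, t in zip(item_scores, total_scores))
    return cov / (sx * sy)

def discrimination_category(d):
    """Illustrative five-level labels only; Ebel and Frisbie (1991)
    give the authoritative cut-offs and wording."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.20:
        return "fair"
    if d >= 0.10:
        return "poor"
    return "very poor"

# Hypothetical 5-student cohort answering one item
item = [1, 1, 1, 0, 0]
totals = [30, 28, 25, 20, 18]
print(item_difficulty(item))                          # proportion correct
print(discrimination_category(point_biserial(item, totals)))
```

With cohorts of only 10–15 students, as in the study, each coefficient rests on very few observations, which is exactly why the resulting category assignments can be unstable.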
A large amount of variance in item properties was found across cohorts. Discrimination coefficients for items varied greatly across cohorts, with 29/32 (91%) of items occurring in both Ebel and Frisbie’s ‘poor’ and ‘excellent’ categories and 19/32 (59%) of items occurring in all five categories. For item difficulty coefficients, the application of different guidelines resulted in large variations in examination length (number of items removed ranged from 0 to 22).
While the psychometric properties of items can provide information on item and exam quality, they vary greatly in small cohorts. The application of guidelines with small exam cohorts should be approached with caution.
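The instability reported above can be illustrated with a small simulation: repeatedly drawing cohorts of a dozen examinees from the same population and recomputing one item's point-biserial discrimination. Everything below (the logistic response model, cohort size, item count, seed) is an assumption for illustration, not the authors' data or method:

```python
import math
import random
from statistics import mean, pstdev

def point_biserial(item, totals):
    """Correlation between 0/1 item scores and total scores."""
    mx, my = mean(item), mean(totals)
    sx, sy = pstdev(item), pstdev(totals)
    if sx == 0 or sy == 0:
        return 0.0  # degenerate cohort: every examinee scored the same
    cov = mean((x - mx) * (t - my) for x, t in zip(item, totals))
    return cov / (sx * sy)

def simulate_cohort(rng, n_students=12, n_items=32):
    """Hypothetical exam: each response is a Bernoulli draw whose
    probability follows a logistic function of a normal 'ability'."""
    cohort = []
    for _ in range(n_students):
        ability = rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-ability))
        cohort.append([1 if rng.random() < p else 0 for _ in range(n_items)])
    return cohort

rng = random.Random(42)  # fixed seed for a reproducible illustration
coeffs = []
for _ in range(22):  # 22 cohorts, mirroring the study design
    cohort = simulate_cohort(rng)
    item0 = [row[0] for row in cohort]
    totals = [sum(row) for row in cohort]
    coeffs.append(point_biserial(item0, totals))

# The same item's discrimination swings from cohort to cohort
print(f"min={min(coeffs):.2f}  max={max(coeffs):.2f}")
```

The model parameters are arbitrary; the point is only that with roughly a dozen examinees per cohort, a single item's discrimination coefficient can land in very different guideline categories through sampling variation alone.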
Roediger HL, Karpicke JD. Test-enhanced learning: taking memory tests improves long-term retention. Psychol Sci. 2006;17:249–55.
Roediger HL, Karpicke JD. The power of testing memory: basic research and implications for educational practice. Perspect Psychol Sci. 2006;1:181–210.
Larsen DP, Butler AC, Roediger HL III. Test-enhanced learning in medical education. Med Educ. 2008;42:959–66.
Larsen DP, Butler AC, Roediger HL III. Repeated testing improves long-term retention relative to repeated study: a randomised controlled trial. Med Educ. 2009;43:1174–81.
Tamblyn R, Abrahamowicz M, Brailovsky C, Grand’Maison P, Lescop J, Norcini J, et al. Association between licensing examination scores and resource use and quality of care in primary care practice. JAMA. 1998;280:989–96.
Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA. 2002;288:3019–26.
Tamblyn R, Abrahamowicz M, Dauphinee D, et al. Physician scores on a national clinical skills examination as predictors of complaints to medical regulatory authorities. JAMA. 2007;298:993–1001.
Wallach PM, Crespo LM, Holtzman KZ, Galbraith RM, Swanson DB. Use of a committee review process to improve the quality of course examinations. Adv Health Sci Educ Theory Pract. 2006;11(1):61–8.
Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 2002;15:309–34.
Epstein RM. Assessment in medical education. N Engl J Med. 2007;356:387–96.
Wass V, Jones R, Van der Vleuten C. Standardized or real patients to test clinical competence? The long case revisited. Med Educ. 2001;35:321–5.
Crocker L, Algina J. Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston; 1986.
Ebel RL, Frisbie DA. Essentials of educational measurement. Englewood Cliffs: Prentice-Hall; 1991.
Nunnally J, Bernstein I. Psychometric theory, 3rd ed. New York: McGraw-Hill; 1994.
Laveault D, Grégoire J. Introduction aux théories des tests en psychologie et en sciences de l’éducation. Bruxelles: De Boeck; 2014.
Hogan TP, Stephenson R, Parent N. Introduction à la psychométrie. Montréal: Chenelière-Éducation; 2012.
Schmeiser CB, Welch CJ. Test development. Educ Meas. 2006;4:307–53.
Nevo B. Item analysis with small samples. Appl Psychol Meas. 1980;4:323–9.
Kromrey JD, Bacon TP. Item analysis of achievement tests based on small numbers of examinees. Paper presented at the Annual Meeting of the American Education Research Association, San Francisco. 1992.
Millman J, Green J. The specification and development of tests of achievement and ability. In: Linn RL, editor. Educational measurement, 3rd edn. New York: ACE/MacMillan; 1989. pp. 335–66.
Nunnally JC, Bernstein IH, Berge J. Psychometric theory. vol 226. New York: McGraw-Hill; 1967.
Health Professional Assessment Consultancy. Foundations of assessment – Programme 2016. http://facourse.webs.com/programme. Accessed 15 Jan 2016.
Davis BG. Quizzes, tests, and exams. In: Tools for teaching. 1993. https://www.elon.edu/docs/e-web/academics/teaching/Tools%20For%20Teaching.pdf. Accessed 19 Dec 2016.
Jones P, Smith RW, Talley D. Developing test forms for small-scale achievement testing systems. In: Downing SM, Haladyna TM, editors. Handbook of test development. New York: Routledge; 2006. pp. 487–525.
Laveault D, Grégoire J. Introduction aux théories des tests en sciences humaines. Bruxelles: De Boeck Université; 1997.
Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale
Bohn Stafleu van Loghum