01-02-2017 | Original Article | Issue 1/2017 | Open Access

Perspectives on Medical Education 1/2017

Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale

Meredith Young, Beth-Ann Cummings, Christina St-Onge
Important notes
Editor’s Note: Commentary by S. Wright, DOI 10.1007/s40037-016-0323-z
The work reported in this article should be attributed to: Centre for Medical Education, McGill University and Chaire de recherche en pédagogie médicale Paul Grand’Maison de la Société des médecins de l’Université de Sherbrooke.



Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. However, most item property guidelines were developed for use with large cohorts of examinees; little empirical work has investigated whether those guidelines are suitable for item difficulty and discrimination coefficients estimated from small cohorts, such as those in medical education. We investigated the extent to which item properties vary across multiple clerkship cohorts to better understand the appropriateness of using such guidelines with small cohorts.


Exam results for 32 items from an MCQ exam were used. Item discrimination and difficulty coefficients were calculated for 22 cohorts (n = 10–15 students). Discrimination coefficients were categorized according to Ebel and Frisbie (1991). Difficulty coefficients were categorized according to three guidelines by Laveault and Grégoire (2014). Descriptive analyses examined variance in item properties across cohorts.
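As a rough illustration of the two item properties named above, the sketch below computes a difficulty coefficient (proportion of examinees answering correctly) and a discrimination coefficient for one cohort. The abstract does not state which discrimination index was used; the corrected item-rest correlation here is an assumption, and the function names and the tiny response matrix are hypothetical, made up only for the example.

```python
from math import sqrt

def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly."""
    return sum(row[item] for row in responses) / len(responses)

def pearson(x, y):
    """Pearson correlation; returns None when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def item_discrimination(responses, item):
    """Assumed index: correlation of the item score with the total on all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

# Hypothetical cohort: one row per student, one 0/1 entry per item (not study data).
cohort = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for i in range(3):
    print(i, item_difficulty(cohort, i), item_discrimination(cohort, i))
```

With only a handful of examinees, one student changing one answer moves both coefficients noticeably, and an item that everyone answers the same way has no defined discrimination at all (handled here by returning None).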


A large amount of variance in item properties was found across cohorts. Discrimination coefficients for items varied greatly across cohorts, with 29/32 (91%) of items occurring in both Ebel and Frisbie’s ‘poor’ and ‘excellent’ categories and 19/32 (59%) of items occurring in all five categories. For item difficulty coefficients, the application of different guidelines resulted in large variations in examination length (number of items removed ranged from 0 to 22).
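The kind of tally reported above (items landing in both the lowest and highest discrimination categories across cohorts) can be sketched as follows. The cut points passed in are placeholders, not the boundaries of the Ebel and Frisbie categories, and the coefficients are invented for the example rather than taken from the study.

```python
def items_spanning_extremes(coeffs_by_item, low_cut, high_cut):
    """Count items whose coefficient is below low_cut in at least one cohort
    and at or above high_cut in at least one cohort."""
    count = 0
    for coeffs in coeffs_by_item:
        # Skip cohorts where the coefficient could not be computed (None).
        observed = [c for c in coeffs if c is not None]
        has_low = any(c < low_cut for c in observed)
        has_high = any(c >= high_cut for c in observed)
        if has_low and has_high:
            count += 1
    return count

# Hypothetical discrimination coefficients: one inner list per item, one value per cohort.
example = [
    [0.05, 0.45, 0.30],
    [0.25, 0.35, 0.30],
    [None, 0.10, 0.50],
]

# Placeholder cut points chosen only for illustration.
print(items_spanning_extremes(example, low_cut=0.20, high_cut=0.40))
```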


While the psychometric properties of items can provide information on item and exam quality, they vary greatly in small cohorts. The application of guidelines with small exam cohorts should be approached with caution.