Original reports
Nonspecialist Raters Can Provide Reliable Assessments of Procedural Skills
Introduction
Medical education is changing rapidly, and the training of procedural skills is shifting from traditional apprenticeship and time-based learning toward competency-based attainment of skills.1, 2 Competency-based learning is favored because trainees pass when they are competent, rather than after a prescribed period of time or a certain number of procedures, neither of which necessarily reflects competence.3, 4, 5, 6 Competency-based learning requires specialist assessment of procedural skills. Despite its advantages, competency-based learning still faces challenges, e.g., the limited availability of faculty time.7, 8, 9, 10 Some studies also suggest that knowing the identity of the trainee can influence assessment.11, 12 Technology holds some promise, as video recordings of performances create more flexibility and reduce the risk of bias.13, 14, 15
Studies show that rater training is beneficial and even suggest that a 1-hour frame-of-reference training session can sufficiently prepare raters to use a simple evaluation instrument for the assessment of procedural skills.16, 17 Studies have shown that medical students can be used in teaching settings instead of professors,18, 19, 20 and this could be translated to competency-based assessment, where nonspecialist raters could be used to further reduce the time specialists spend on assessment. The common perception is that specialists must assess procedural skills, but previous studies have shown that even nonmedically trained individuals can assess surgical skills.21, 22
Using nonspecialist raters would not only decrease the workload of specialists and minimize interpersonal bias but also offer a more economical solution in settings where competency-based assessment is needed. The use of nonspecialist raters must, however, be shown to be reliable and valid before it can be implemented as part of competency-based learning assessments.
The aim of this study was to explore the validity of nonspecialist raters' assessments of video-recorded procedures.
Section snippets
Design
This study was a blinded observational trial. Novices (senior medical students) and experienced doctors were video recorded while performing 2 flexible cystoscopies each. The recordings were anonymized and placed in random order and then rated by 2 experienced cystoscopists (specialist raters) and 2 medical students (nonspecialist raters). Flexible cystoscopy was chosen as it is a simple procedural skill that is crucial to master in a resident urology program.23
Participants
The novices participating in this
Results
Twenty-three novices and 9 experienced doctors participated in this study, giving a total of 64 video recordings, all rated by a pair of specialist raters and a pair of nonspecialist raters (256 ratings in total). The internal consistency of assessments was high, Cronbach's α = 0.93 and 0.95 for nonspecialist and specialist raters, respectively (p < 0.001 for both correlations). The interrater reliability was significant (p < 0.001) with a Pearson's correlation of 0.77 for the nonspecialist and
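The two reliability metrics reported above follow standard formulas: Cronbach's α compares the summed item variances with the variance of total scores, and interrater reliability is the Pearson correlation between two raters' scores. As an illustrative sketch only (not the study's actual analysis code, and with made-up data layout assumptions), they can be computed from a ratings matrix like this:

```python
from statistics import pvariance, mean
from math import sqrt

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    items: list of per-item score lists, where each inner list holds one
    checklist item's scores across all rated performances.
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per performance
    item_var = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

def pearson_r(x, y):
    """Interrater reliability as Pearson's r between two raters' total scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, two raters whose scores rise and fall together yield r close to 1, and a checklist whose items all track the same underlying skill yields α close to 1; published thresholds (e.g., 0.80 for high-stakes assessment) are then applied to these coefficients.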
Discussion
Our study showed that nonspecialist raters can provide reliable and valid assessments of video-recorded cystoscopies when their internal consistency, interrater reliability, and test-retest reliability are compared with those of specialist raters. The required strength of a reliability coefficient always depends on the purpose and consequences of the assessment.25 Overall, the interrater reliability was reasonable, approaching the 0.80 level needed for high-stakes
Conclusion
Our study suggests that nonspecialist raters can provide reliable assessments of video-recorded cystoscopies.
References (46)
- et al. Is a resident's score on a videotaped objective structured assessment of technical skills affected by revealing the resident's identity? Am J Obstet Gynecol (2003)
- et al. Computer-assisted video evaluation of surgical skills. Obstet Gynecol (1995)
- et al. Duration of faculty training needed to ensure reliable OR performance ratings. J Surg Educ (2013)
- et al. Crowd-sourced assessment of technical skills: a novel method to evaluate surgical performance. J Surg Res (2014)
- et al. The simulation centre at Rigshospitalet, Copenhagen, Denmark. J Surg Educ (2015)
- et al. Measuring to improve: peer and crowd-sourced assessments of technical skill with robot-assisted radical prostatectomy. Eur Urol (2016)
- et al. Teaching surgical skills—changes in the wind. N Engl J Med (2006)
- et al. Shifting paradigms: from Flexner to competencies. Acad Med (2002)
- et al. Three-year experience with an innovative, modular competency-based curriculum for orthopaedic training. J Bone Joint Surg Am (2013)
- et al. Time- versus competency-based residency training. Plast Reconstr Surg (2016)
- Building and assessing competence: the potential for evidence-based graduate medical education. Qual Manag Health Care
- The assessment of clinical skills is imperative in postgraduate specialty training. Ugeskr Laeger
- Implementation of competency-based medical education: are we addressing the concerns and challenges? Med Educ
- Reviewing residents' competence: a qualitative study of the role of clinical competency committees in performance assessment. Acad Med
- Reconceptualizing variable rater assessments as both an educational and clinical care problem. Acad Med
- The role of assessment in competency-based medical education. Med Teach
- Reliable and valid assessment of competence in endoscopic ultrasonography and fine-needle aspiration for mediastinal staging of non-small cell lung cancer. Endoscopy
- An integrable, web-based solution for easy assessment of video-recorded performances. Adv Med Educ Pract
- Toward reliable operative assessment: the reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surg Endosc
- How faculty members experience workplace-based assessment rater training: a qualitative study. Med Educ
- Student teachers can be as good as associate professors in teaching clinical skills. Med Teach
- A new first-year course designed and taught by a senior medical student. Acad Med
- Student-led tutorials in problem-based learning: educational outcomes and students' perceptions. Med Teach