Short communication
Examiner characteristics and interrater reliability in a communication OSCE

https://doi.org/10.1016/j.pec.2017.01.013

Highlights

  • Inter-individual examiner factors might influence interrater reliability.

  • Interrater reliability is higher in dyads of highly experienced examiners.

  • Participation in recent examiner training had no influence on interrater reliability.

  • Higher interrater reliability in same-gender dyads may point to gender-specific concepts.

  • High agreement between clinically active examiners hints at context specificity.

Abstract

Objective

To identify inter-individual examiner factors associated with interrater reliability in a summative communication OSCE in the fourth year of study.

Methods

The OSCE consists of 4 stations assessed with a four-item, five-point global rating instrument. A bivariate secondary analysis of interrater reliability in relation to 4 examiner factors (gender, profession, OSCE experience, examiner training) was conducted. Intraclass correlation coefficients (ICC) were calculated and compared between examiner dyads of different similarity.

Results

A total of 169 pairwise ratings from 19 different examiners in 16 dyads were analysed. Interrater reliability was significantly higher in examiner dyads of the same vs. different gender (ICC = 0.76, 95% CI = 0.65-0.83 vs. ICC = 0.41, 95% CI = 0.21-0.57), in dyads of two clinicians vs. non-clinical/mixed professions (ICC = 0.72, 95% CI = 0.56-0.83 vs. ICC = 0.57, 95% CI = 0.41-0.69), and in dyads with high vs. low/mixed OSCE experience (ICC = 0.73, 95% CI = 0.50-0.87 vs. ICC = 0.56, 95% CI = 0.41-0.69). Participation in recent examiner training had no influence on ICCs.

Conclusion

Better concordance of ratings between clinically active examiners may hint at the context specificity of good communication. Higher interrater reliability between examiners of the same gender may indicate gender-specific concepts of communication.

Practice implications

Medical faculties introducing summative assessment of communication competence should focus on the influence of examiner characteristics on interrater reliability.

Introduction

The “objective structured clinical examination” (OSCE) is a well-established method of assessing a student's clinical skills, including communicative competence [1], [2], [3], [4]. The reliability of an OSCE is influenced by various factors [5], [6], [7]. However, some authors suggest that even in a well-designed and valid OSCE, examiner factors remain the most important contributors to overall examination error [8], [9].

The specific examiner factors that may affect the reliability of an OSCE have not been well studied [9], [10], apart from a few studies on changes in individual examiner function over time caused by fatigue [11], [12] or by leniency at the start of an OSCE [13].

Wilkinson [8] found that the involvement of examiners in station construction made a positive contribution to interrater reliability (IRR). Many authors have addressed examiner training, which has been described as essential, especially for the use of global ratings [14], [15], [16]. However, differences between examiners often cannot be sufficiently eliminated by training programs; therefore, the selection of appropriate examiners should be emphasized [17], [18].

The analysis of our CoMeD-OSCE, an assessment of communication competence in challenging doctor-patient encounters [19], showed relatively low IRR compared with other studies [20], [21]. The aim of this exploratory secondary analysis is to identify inter-individual examiner factors that may influence IRR in communication skill assessments.


Methods

Bivariate secondary analyses of IRR in relation to examiner factors in a communication OSCE were performed.
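The abstract reports IRR as intraclass correlation coefficients computed separately for examiner dyads of different similarity. The available text does not state which ICC model was used, so the following Python sketch assumes a one-way random-effects, single-rater ICC (Shrout and Fleiss ICC(1,1)), which suits pooled data in which every performance is scored by two examiners but the examiner pair changes from dyad to dyad. The function names icc_oneway and icc_by_dyad_type, the record layout, and the example scores are hypothetical illustrations, not the authors' analysis code.

```python
from collections import defaultdict
from statistics import fmean


def icc_oneway(pairs: list[tuple[float, float]]) -> float:
    """One-way random-effects, single-rater ICC(1,1) for two scores per performance.

    Assumption (hypothetical layout): each tuple holds the two examiners' scores
    for one student performance.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two rated performances")
    grand_mean = fmean(score for pair in pairs for score in pair)
    row_means = [fmean(pair) for pair in pairs]
    # Between-performance mean square: how much performances differ from each other
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-performance mean square: disagreement between the two examiners
    ms_within = sum(
        (score - m) ** 2 for pair, m in zip(pairs, row_means) for score in pair
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


def icc_by_dyad_type(records: list[tuple[str, float, float]]) -> dict[str, float]:
    """Group (dyad_type, score_a, score_b) records and compute one ICC per group."""
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for dyad_type, score_a, score_b in records:
        groups[dyad_type].append((score_a, score_b))
    return {dyad_type: icc_oneway(pairs) for dyad_type, pairs in groups.items()}


if __name__ == "__main__":
    # Hypothetical scores; these are not data from the study.
    records = [
        ("same gender", 3, 3), ("same gender", 4, 4), ("same gender", 2, 3), ("same gender", 5, 5),
        ("mixed gender", 3, 4), ("mixed gender", 4, 2), ("mixed gender", 2, 3), ("mixed gender", 5, 4),
    ]
    for dyad_type, icc in icc_by_dyad_type(records).items():
        print(f"{dyad_type}: ICC = {icc:.2f}")
```

A one-way model treats the two examiners as interchangeable within each performance; if one fixed examiner pair had rated all students in a group, a two-way model such as ICC(2,1) might be the more appropriate choice.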

The Düsseldorf CoMeD undergraduate communication skills training program [22] is followed in the fourth year by a 4-station OSCE in which students encounter professional actors trained as standardized patients (SPs) for the assessment of the following types of communication: breaking bad news, sensitive issues (guilt and shame), handling emotions (aggression), and sharing

Results

A sample of 169 pairwise ratings (= 338 OSCE scores) from 19 examiners in 16 dyads was analysed (Table 1). Within OSCE sessions rated by two dissimilar examiners, the examiner with greater OSCE experience generally gave more lenient scores. Other examiner characteristics had no effect on global rating scores.
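The finding that more experienced examiners scored more leniently within dissimilar dyads is a within-pair comparison. One way to quantify it, assumed here rather than taken from the paper, is the mean paired score difference together with a paired t statistic. The function name paired_leniency and the example scores are hypothetical, and higher scores are assumed to mean more lenient ratings.

```python
import math
from statistics import fmean, stdev


def paired_leniency(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (mean difference, paired t statistic) for (experienced, less experienced) scores.

    Each tuple holds two scores for the same student performance; a positive mean
    difference means the more experienced examiner scored higher, i.e. more leniently
    (assumption: higher score = more lenient).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [experienced - less_experienced for experienced, less_experienced in pairs]
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        # All differences identical: t is infinite unless there is no difference at all
        t_stat = math.copysign(math.inf, mean_diff) if mean_diff else 0.0
    else:
        t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat


if __name__ == "__main__":
    # Hypothetical scores for performances rated by a mixed-experience dyad.
    mean_diff, t_stat = paired_leniency([(4, 3), (5, 4), (3, 3)])
    print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}")
```

A p-value would come from a t distribution with n - 1 degrees of freedom (for example via scipy.stats.ttest_rel), which the standard library does not provide; pooling pairs across several dyads also ignores that the same examiners recur, which a mixed model could account for.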

IRR was significantly higher in examiner dyads of the same gender, same professional background, and greater OSCE experience (Table 2). Participation in a recent training session had no

Discussion

Higher IRR is associated with current clinical practice, OSCE experience, and concordant examiner gender, but not with participation in recent examiner training. Other studies also found that examiner training often yielded little or no improvement in the reliability of an OSCE [17], [18]. Several approaches to examiner training have been reported, but little is known about their effect on examiner performance [28]. However, Wilkinson [8] reported that examiner experience is

Conclusion

Our finding of higher rating concordance between examiners of the same gender raises the hypothesis that hidden gender-specific concepts play an important role in assessing communicative competence. Better rating concordance among clinically active examiners hints at context specificity, which is present even in medical encounters that focus on communication.

Practice implications

Medical faculties introducing summative assessment of communication competence should focus on the influence of examiner characteristics on IRR.

Acknowledgements

We are grateful to the medical students and the examiners for facilitating and supporting our project. Our special thanks go to our CoMeD team and the staff of the Düsseldorf University Hospital Training Centre (TräF) for project organisation.

References (37)

  • P.R. Jeffries. A framework for designing, implementing, and evaluating simulations used as teaching strategies in nursing. Nurs. Educ. Perspect. (2005)

  • M.K. Burns. How to establish interrater reliability. Nursing (2014)

  • G.M. Humphris et al. Examiner fatigue in communication skills objective structured clinical examinations. Med. Educ. (2001)

  • K. McLaughlin et al. The effect of differential rater function over time (DRIFT) on objective structured clinical examination ratings. Med. Educ. (2009)

  • D. Hope et al. Examiners are most lenient at the start of a two-day OSCE. Med. Teach. (2015)

  • J. Crossley et al. Assessing health professionals. Med. Educ. (2002)

  • J. Van Dalen et al. Evaluating communication skills. Adv. Health Sci. Educ. Theory Pract. (1998)

  • J.A. Spencer et al. Communication education and assessment: taking account of diversity. Med. Educ. (2004)