Introduction
Competency-based medical education requires assessment of performance in authentic clinical learning environments via workplace-based assessments (WPBAs) [1–3]. Many WPBAs use a combination of quantitative performance ratings and narrative comments [4]. Historically, educators have viewed quantitative assessment data as more reliable and useful for summative decision-making, and narrative data as subjective and useful for formative feedback [5]. However, quantitative WPBA practices can result in psychometrically poor results [6], leading some to advocate for a “post-psychometric” era of assessment in which the subjective and contextual natures of WPBAs are embraced [7–9]. This has sparked interest in narrative data as a potentially more useful measure of performance than quantitative assessment data [5, 10–13].
Estimates suggest anywhere from 5 to 10% of residents “struggle” during their training, with many problems identified too late to maximize help to the learner [14–17]. Multiple studies have shown the value of narrative data in making summative decisions, while others have attempted to predict ongoing struggles with learner development [12, 18–20]. However, challenges exist when using narrative data in these ways. Assessors frequently “code” their comments, or use nonspecific and idiosyncratic language, requiring readers to interpret hidden meaning [21–24]. Patterns in narrative data often take time to develop, with themes only becoming clear after months or years of training. Comments can also be discordant from one another or from accompanying quantitative data, making interpretation and subsequent decision-making challenging. Narrative data are also cumbersome and time-consuming to analyze, particularly in large training programs where many comments are present [25]. Because of these challenges, training programs may struggle to use their narrative data to their full potential, particularly with residents early in training.
Carefully collected and analyzed narrative assessment data can provide value, particularly early in training, given their potential to capture rich details about a learner’s performance. Different approaches to finding value in narrative data have included the use of “keyword algorithms” or counting the number of words and percentage of assessments containing negative or ambiguous comments, which were associated with the need for remediation [15, 19]. However, keyword algorithms were much better at ruling out who would struggle than at predicting it, as evidenced by a positive predictive value of 23% [19]. Using negative or ambiguous comments was helpful when reading an entire residency file, but 12 of 17 residents in good standing still had negative comments [15]. These approaches highlight opportunities to better understand how narrative feedback analyzed early in training may differ between residents who struggle and those who do not, while also describing the actual differences in the content of narrative data. In this study, we aimed to explore the first 6 months of narrative comments from WPBAs in one internal medicine residency program with a robust program of assessment [26–29] to determine identifiable patterns that subsequently might predict who will receive lower quantitative entrustment ratings over the course of training. Recognizing the early signal that portends the need for additional support and intervention at the beginning of residency can provide a practical approach for clinical competency committees (CCCs) to surface what matters from the plethora of positive, nonspecific, and/or idiosyncratic narrative feedback.
Results
We organized the narrative comments that differed between typical and lower entrusted residents into two types of themes: three explicit/manifest and three implicit/latent themes (Table 1). The three explicit/manifest themes focused on specific aspects of resident performance and are: 1) Gaps in attention to detail, 2) Communication deficits with patients, and 3) Difficulty recognizing the “big picture” in patient care for lower entrusted residents. The three implicit/latent themes focused on how narrative data were written and are: 1) Assessors describe feedback as a deficiency rather than an opportunity to improve for lower entrusted residents, 2) Assessors make normative comparisons that identified a resident as being behind their peers for lower entrusted residents, and 3) Assessors warn of possible risk to patient care for lower entrusted residents. Direct quotes are included and labelled with resident number, month, and assessor role.
Table 1
Themes present in the first six months of narrative data associated with lower overall entrustment at the end of an internal medicine residency
Explicit/manifest: Resident performance |
1. Gaps in attention to detail | “There have been a few overlooks in regards to medications and orders that I have had to correct. I encouraged them to look at order and medication list on a daily basis as part of rounds to make sure that there is nothing important that is missing or needs to be removed” |
2. Communication deficits with patients | “Bedside presentations include words that are not understandable to the patient” |
3. Difficulty recognizing the “big picture” in patient care | “They could get a better handle of the overall picture of a patient instead of focusing only on the individual problems” |
Implicit/latent: Assessor description |
4. Describing feedback as a deficiency rather than an opportunity to improve | “Their knowledge base overall is not good enough to answer simple questions such as how different insulins work etc.” |
5. Normative comparisons that identified a resident behind their peers | “Knowledge base is below what would be expected for an early intern” |
6. Warning of possible risk to patient care | “Supervising resident and attending need to keep close eye on them, look at everything” |
Before describing differences in narrative comments between groups, it is important to note that many similarities existed. We did not elaborate on these for the final analysis but note them here briefly for context. The most common similarity between groups focused on the need to further medical knowledge or knowledge acquisition, including generic advice to “read more” or specific areas for knowledge expansion. Other similarities included the need to broaden differential diagnoses, increase confidence in clinical practice, improve efficiency in documentation and workflow, and gain more clinical experience. We determined that analyzing comments that were similar between the groups would not contribute to our study aim, so we did not explore these themes further.
Explicit/manifest theme #1: Gaps in attention to detail
Comments describing a lack of attention to detail were common in the lower entrusted residents. These comments described a need for improved thoroughness and accuracy of completed tasks, such as knowing all the details of a patient’s current presentation and reviewing past medical history and previous admissions. Examples of faculty comments in this area included: “needs to work on knowing the patient condition and collect[ing] and analyz[ing] the data more thoroughly” (R7, Oct, Faculty) as well as the need to pay “further attention to …chart review for new admissions [that allows a] better understanding of chronology of events in the recent past that inform the current admission” (R18, Aug, Faculty).
Lower entrusted residents had comments about “pay[ing] attention to detail when writing orders” (R8, Dec, Faculty). They also had gaps in performing medication reconciliation on admission and discharge from the hospital, with one assessor noting “more attention and analysis are needed on medication reconciliation” (R15, Aug, Faculty).
Assessors frequently commented on the data acquisition skills of lower entrusted residents with comments such as “be more thorough when obtaining a history from the patient” (R32, Aug, Faculty). Lack of organization was sometimes noted in narrative comments as a possible explanation for lacking attention to details. This was sometimes accompanied by advice, such as the assessor who noted a resident should “try making a check list of everything that needs done and cross off as you go” (R32, Nov, AHP).
Comments about documentation tasks were also common, such as forgetting to update notes and sign-outs, and the need to remove inconsistent, repetitive, or inaccurate information. An illustrative faculty comment in this area: “notes suffer[ed] from copy/paste and are not thoroughly reviewed …every day and edit[ed] as appropriate” (R15, Nov, Faculty).
Explicit/manifest theme #2: Communication deficits with patients
Assessors identified communication deficits in lower entrusted residents. Often this took the form of suggestions to improve patient communication, with three specific examples rarely found in typically entrusted residents. First, assessors suggested improving engagement with the patient through listening to patients, building rapport, and improving bedside manner. Second, assessors suggested using less medical terminology that may be unclear to patients, such as “bedside presentations include words [that are] not understandable to the patient” (R29, Aug, Faculty). Finally, assessors suggested more clearly articulating a plan to the patient, avoiding “a tendency to tell the patient too much” (R28, Aug, Faculty) and attending to an inability to “recognize when patients are not understanding what is being said” (R39, Nov, Faculty).
Explicit/manifest theme #3: Difficulty recognizing the “big picture” in patient care
Many comments described an inability of lower entrusted residents to synthesize information and recognize the bigger picture in patient care. Assessors described residents getting “bogged down with every detail” and suggested “keep[ing] an eye on the bigger picture” (R6, Aug, Faculty). They described this being illustrated when a resident struggled to sort primary from secondary problems, resulting in “difficulty prioritizing and then dealing efficiently with the most serious problems” (R28, Sept, Faculty). Finally, assessors encouraged “instead of focusing only on the individual problems …get a better handle of the overall picture [of patient care]” (R12, Nov, Faculty).
Implicit/latent theme #1: Assessors describe feedback as a deficiency rather than an opportunity to improve for lower entrusted residents
Assessors frequently used negative descriptors with lower entrusted residents compared with those with typical entrustment ratings, for whom constructive feedback was often framed as an opportunity to improve. Examples of negative descriptors from the former group included: “disorganized” and “… poor self-confidence, which limits their capacity to propose a plan of care for the patients” (R32, Dec, Faculty). In other cases, assessors explicitly used words such as “deficiency,” “problem,” “concern,” “weakness,” “difficulty” and “struggle” when narrative comments included constructive feedback. Illustrative narrative comments employing these terms include: “struggled developing a system of organization” (R31, Sept, Faculty), “… concern about the level of detail for their progress notes” (R2, Nov, Faculty), and “difficulty synthesizing information [for basic tasks]” (R13, Aug, Faculty).
Another way that assessors expressed negative narrative comments with lower entrusted residents was to call direct attention to something the assessor felt should have been done but was not by using the phrase “did not.” Examples include: “they did not present or possibly even find history of mitral valve repair” (R2, Dec, Faculty) or “did not come up with a differential diagnosis” (R37, Sept, AHP) or “they did not report her ‘white out chest x‑ray’” (R2, Dec, Faculty).
Finally, demonstrating the most extreme limit of this theme, assessors sometimes clearly conveyed a value-laden negative tone, such as describing a resident as “oblivious to what was going on” (R7, July, AHP) or “this intern glosses over things they do not understand” (R26, Dec, Faculty). This was also reflected in describing opportunities for improvement as concern for the resident’s potential ability to perform better. This is illustrated well by a faculty assessor who noted a resident’s “knowledge base overall is not good enough to answer simple questions such as how different insulins work” (R26, Sept, Faculty).
Implicit/latent theme #2: Assessors make normative comparisons that identified lower entrusted residents as behind their peers
When documenting narrative comments for lower entrusted residents, assessors sometimes used normative language. For example, assessors noted that a resident is “not at the same level as co-interns” (R37, Sept, AHP), “below what would be expected for an early intern” (R35, Sept, Faculty), and “lack[ing] more self-confidence than others …” (R26, Dec, Faculty). In other examples, assessors used less obvious language implying that a resident needed more help than their peers, with comments such as “relied heavily on senior to incorporate [information] independently” (R25, Oct, AHP) or a comment that a resident was “in the early stages” (R18, Dec, Faculty) for many basic tasks.
Implicit/latent theme #3: Assessors warn of possible risk to patient care for lower entrusted residents
Assessors used more language signaling risk to patients in two ways. First, concern over risk to patients was documented through using terms that describe potentially unsafe care, such as “mistake,” “inaccurate,” or “errors.” Examples include: “sometimes errors were caught in discharge med recs” (R21, Nov, AHP) and “even with direct supervision by a senior resident [presentations and examination skills] were often inaccurate” (R25, Oct, AHP). Second, risk concerns were described by conveying feelings about the potential for errors, including calls for closer supervision, using words such as “worry” and “concern” or even directly stating “supervising resident and attending need to keep close eye on them, look at everything” (R2, Dec, Faculty).
Discussion
We identified themes in narrative comments during the first 6 months of training that were present in residents who subsequently had lower entrustment ratings during residency, dividing themes into explicit/manifest and implicit/latent to explore differences in both residents’ performance and how assessors describe that performance. Narrative data can differ between higher and lower performing residents and thus can be used to discriminate between learners [18, 24]. Many faculty members describe their reading of narrative data as scanning for red flags, usually in the form of words or phrases [23]. We advanced this understanding of red flags by exploring themes in the narrative comments that were unique to learners who subsequently had lower entrustment ratings. While data suggesting extreme outlier performance can usually help identify residents with performance concerns, CCCs often rely on accumulated data and trends, both of which take time and potentially delay early identification [34]. Our findings could aid CCCs in their incorporation of narrative comments to support specific residents that may benefit most from earlier intervention. We hope these findings continue building emphasis on the implications of how narrative data can be used to guide decision-making (including predictive analytics and machine learning algorithms) in a program of assessment [20, 35].
In programs of assessment, numerical and narrative data are often obtained for formative purposes but used by CCCs to make summative decisions about a learner’s trajectory [36, 37]. Recognizing the right time to intervene on perceived concerns can be complicated. CCCs face two challenging scenarios: overreacting to specific comments and implementing remediation when it may not be necessary, or underreacting and not intervening while waiting for more data, despite valuable time passing to help a struggling resident. Differentiating signal from noise is a challenge in all early assessment efforts, although evidence suggests minimal narrative data are needed to discriminate between learners [18]. Our findings can be helpful in determining when to intervene and when to simply continue monitoring, allocating resources (faculty time, extra CCC discussion, remediation plans, etc.) where there is a higher likelihood of learners having continued struggles. Specifically, comments invoking the need to increase knowledge, build confidence, gain experience, and improve efficiency in workflow or documentation were present in both typical and lower entrusted residents. Therefore, these types of comments are less likely to help identify residents early in training who need additional intervention. However, comments describing a lack of attention to detail, difficulty communicating clearly with patients, or difficulty synthesizing details to see a bigger picture are potentially more likely to portend ongoing struggles, prompting a swifter reaction to consider whether intervention is warranted.
We found that beyond the specific details of performance, sometimes specific descriptors can also be a signal in lower entrusted residents. The adage, “it’s not what you say, but how you say it” applies to both verbal and written narrative feedback. Recent studies on narrative data support that faculty have consistent writing styles and that uncovering meaning often requires reading beyond the literal words [21–24, 38]. We found certain implicit/latent themes in written narrative feedback were disproportionately present in lower entrusted residents. These themes represent another layer in CCCs’ decision-making as they encounter narrative data that describe a resident as behind their peers, warn of risk to patient safety, or frame feedback as a deficiency. When narrative feedback explores common themes for all early residents (i.e., knowledge, efficiency, confidence), the implicit/latent characteristics represent an opportunity to still uncover signal in the noise.
Finally, in addition to harnessing and building upon our findings to identify signal, the process of using an iterative qualitative lens to analyze narrative data in a large program of assessment is transferable to other programs. Narrative data are often difficult to interpret for individuals, but more easily understood when viewed in aggregate [22]. Viewing narrative data in cohorts with larger aggregates, as seen in this study, can yield additional insights. Since meaning is contextual and dependent on cultural norms, programs analyzing their own narrative data for keywords and patterns may provide deeper understanding of comments that require more immediate and definitive interventions [21]. This can aid CCCs in better recognition of patterns or inform text-based applications of machine learning algorithms to help predict those residents who might benefit most from limited resources and earlier intervention to improve their developmental trajectory [35].
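As a rough illustration of how a program might begin scanning its own narrative data for keywords and patterns, the following is a minimal sketch in Python. It is not the method used in this study, which relied on qualitative analysis. The theme labels, the term lists (seeded only with words and phrases quoted in this article), and the function names `flag_comment` and `summarize` are hypothetical, and any real term list would need to reflect local context and assessor language.

```python
import re
from collections import Counter

# Hypothetical term lists, seeded only with words and phrases quoted in this
# article. A real program would need to derive and validate its own lists.
THEME_TERMS = {
    "deficiency_framing": ["deficiency", "problem", "concern", "weakness",
                           "difficulty", "struggle", "did not"],
    "risk_to_patient_care": ["mistake", "inaccurate", "error", "worry"],
    "normative_comparison": ["not at the same level",
                             "below what would be expected"],
}


def flag_comment(comment):
    """Return the set of theme labels whose terms appear in one comment."""
    text = comment.lower()
    flagged = set()
    for theme, terms in THEME_TERMS.items():
        for term in terms:
            # Match at a word start so "error" also catches "errors" and
            # "struggle" also catches "struggled".
            if re.search(r"\b" + re.escape(term), text):
                flagged.add(theme)
                break
    return flagged


def summarize(comments):
    """Count, per theme, how many of a resident's comments were flagged."""
    counts = Counter()
    for comment in comments:
        counts.update(flag_comment(comment))
    return counts


if __name__ == "__main__":
    # Example comments taken from quotes reported in the Results.
    example_comments = [
        "Knowledge base is below what would be expected for an early intern",
        "sometimes errors were caught in discharge med recs",
        "did not come up with a differential diagnosis",
        "read more",
    ]
    print(summarize(example_comments))
```

Simple matching of this kind will also flag benign uses of a word (for example, “problems” in a patient's problem list), which is consistent with the limited positive predictive value reported for keyword approaches; flagged comments would still need to be read in context by the CCC.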
Limitations
First, we analyzed data from one internal medicine residency program, which may limit transferability of the findings to other programs. Specifically, these findings may be more specific to medical-based training programs and less applicable to procedure-based specialties. Second, we used learning analytics to define typical and lower performance using quantitative ratings. However, those ratings as well as our modeling may not accurately categorize trainees by their performance. This possibility noted, we believe our program of assessment as well as performance analytic modeling are robust. Third, a sample of typically performing residents was analyzed and it is possible that if larger samples had been used the contrast between themes in lower performing residents could have changed. Fourth, given our methodology we cannot assert that the themes in the narrative data will predict residents who will struggle. Finally, we did not compare or contrast comments from different assessors and therefore we do not know if assessor-specific characteristics might impact the type or description of narrative data provided. Future study should explore this.