Introduction
Handovers, or the transfer of clinical information and responsibility from one clinician or team to another, occur frequently in health care. These transitions in care are vulnerable to communication failures that often lead to medical errors and harm to patients [1]. In response to this hazard, considerable attention has focused on interventions to improve patient safety during handovers [2], many of which were adapted from industries such as nuclear power and space aviation in which transition errors have high consequences [3]. These best practices aim to ensure that the necessary information is transmitted via communication protocols that include structured face-to-face and written sign-out, interactive questioning, and distraction-free settings [4].
Interventions that deploy these practices simultaneously (often referred to as a bundle) have yielded significant improvements in educational and clinical outcomes [5]. Medical schools and residency programmes are rapidly implementing handoff curricula that teach these best practices [2]. However, even with these gains, errors continue to occur during patient handovers, often in the form of information loss (e.g., drug allergy, critical comorbidity, relevant history or current treatments) or distortion (e.g., wrong medication dose, wrong surgical site, or incorrect diagnosis). Information loss and distortion increase when the cognitive load of the handover exceeds the working memory capacity of the clinician sender and/or receiver. Further improving patient safety will require a deeper understanding of human cognition, both to identify the challenges trainees face when learning how to give and receive sign-outs and to design an assessment that can help identify novel intervention targets and measure their efficacy.
Human memory consists of three main subsystems: sensory memory, working memory, and long-term memory [6]. Sensory memory perceives and briefly retains visual and auditory information [7]. Sensory information raised to conscious awareness enters the domain of working memory. Working memory retrieves relevant knowledge possessed by the learner and stored in long-term memory as schemata. Working memory then organizes and integrates the new with the already existing information to facilitate efficient storage in the form of new (or modified) schemata [8].
Originally developed by John Sweller in the context of studying how students solve problems [9], cognitive load theory (CLT) focuses on the implications of limited working memory for learning [10]. Unlike sensory and long-term memory, working memory is not infinite; it can hold only a limited number of independent information units at a time (4–7 ± 2) [11] and can actively process (i.e., organize, compare and contrast) no more than two to four elements at any given moment [8]. CLT researchers have distinguished between different types of cognitive load. In 1998, John Sweller argued for three types [12]:
1. Intrinsic: load associated with the task itself (i.e., working memory resources required to process the information essential to the task). Intrinsic load depends on the number of information elements, the interactivity of those elements, and the knowledge of the learner.
2. Extraneous: load not essential to the task but induced by the design of the task (e.g., how the information is presented) or the environment (e.g., background noise).
3. Germane: load imposed by the learner’s deliberate use of cognitive strategies to refine existing schemata and enhance storage in long-term memory.
Recent work by Sweller and others has suggested that germane load may best be understood as a component of intrinsic load rather than a separate type of load [13, 14]. In this view, a two-factor model (intrinsic and extraneous load) is preferred on theoretical grounds and best explains empirical results.
Given working memory limitations and the still developing schemata of trainees, the additive effects of these different types of load can easily exceed the working memory capacity of the trainee, resulting in impaired learning and performance. Regardless of how germane load is conceptualized, CLT uses three strategies to enhance learning: reduce extraneous load, titrate intrinsic load to the developmental stage of the learner, and increase germane load.
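The additive assumption described above can be written compactly. This is a notational sketch only: the symbols $L_I$, $L_E$, $L_G$ (intrinsic, extraneous, and germane load) and $C_{WM}$ (working memory capacity) are introduced here for illustration and are not the authors' notation.

```latex
% Triarchic (three-factor) formulation: the three loads add up
L_{\text{total}} = L_I + L_E + L_G

% Learning and performance are expected to suffer when total load
% exceeds the trainee's working memory capacity
L_{\text{total}} > C_{WM}

% Two-factor view: germane load is treated as part of intrinsic load
L_{\text{total}} = L_I + L_E
```

Read this way, reducing $L_E$ frees capacity, and titrating $L_I$ to the learner's developmental stage keeps $L_{\text{total}}$ within $C_{WM}$.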
Researchers have developed a number of techniques to estimate cognitive load [15, 16], including learner self-rating of effort [16–24], response time to a secondary task (e.g., visual monitoring task) presented during the primary task [16, 18], and psychophysiological measures (e.g., heart rate variability, pupillary response, and electrical skin conductance) [20]. Secondary task performance and physiological measures capture only overall cognitive load, but they do not depend on learner perception and can capture in real time how load changes over the course of the task. Learner self-rating has been the most commonly used strategy because it is inexpensive and has evidence of validity [25]. Paas developed a single item designed to measure overall cognitive load [22]. This measure has been used extensively, including in a recent study on cognitive load and surgical knot tying [26], but may actually measure intrinsic load rather than overall load [27, 28]. The NASA-TLX measures mental workload with a multi-item scale [21, 29]. It is unclear to what extent mental workload corresponds to cognitive load [13, 27].
The use of instruments that measure only overall load has presented challenges. For example, integrating visual and written information has been shown to reduce overall load and improve learning [30]. Some have assumed that this occurs due to decreased extraneous load [31] while others have argued that the benefits of data integration are also mediated by increased germane load [32]. The absence of measures of specific load types permits competing and sometimes contradictory explanations to exist in parallel. To address this challenge and further develop CLT, researchers have tested instruments that attempt to differentiate cognitive load types [13, 16–20, 24, 28]. To date, these studies are of variable methodological quality, focus mostly on classroom-based learning settings and have shown better results for items intended to capture intrinsic and extraneous load and only mixed results for germane load items [13].
The most promising efforts to collect validity evidence for a measure of load types have focused on content-specific learning (e.g., college statistics) in the classroom setting [13, 28]. This measure has recently been adapted for use in two medical education studies, though neither reports validity evidence for use of the measure in this context [33, 34]. In addition, Naismith et al. discuss how their own measure of load types compares with the Paas overall measure and the NASA-TLX [27]. The authors identified the need for the development of validity evidence of measures appropriate for workplace-based clinical procedures, in general, and handovers, in particular. Such measures are necessary to identify the cognitive mechanisms of current handover interventions and to develop new handover strategies that modulate intrinsic, extraneous, and germane loads in the desired directions. The authors developed a novel measure, the Cognitive Load Inventory for Handoffs (CLI4H). This measure was then tested in the context of a handover simulation that medical students completed during a multi-station objective structured clinical skills examination (OSCE). In order to provide evidence in support of the validity of the scores from this measure of cognitive load, the study addressed the following questions:
1. To what extent does the CLI4H yield factors consistent with intrinsic, extraneous and/or germane load?
2. How does the performance of the CLI4H compare with the Paas Cognitive Load scale, a single-item measure of cognitive load with evidence to support validity? Positive correlations would support construct alignment between the two measures.
3. Do the CLI4H scores vary, as predicted by CLT, with measures of amount of training and performance? According to CLT, students with greater prior training should experience lower intrinsic and germane load while students with higher performance should experience lower intrinsic load and higher germane load.
Discussion
This study represents the first published attempt to measure cognitive load types during a handover. The newly developed instrument, the CLI4H, generated mixed results. While the findings from the exploratory factor analysis are encouraging with respect to intrinsic and germane load, the items for extraneous load performed poorly. The extraneous load items themselves may not be adequate, even though they were tailored to handovers and consistent with the structure of extraneous load items that have performed reasonably well in other settings [13, 18, 24, 28]. This seems to have been the case with respect to the question about how well the student understood the handover protocol. Written comments from the students indicated confusion about this item. Shifting the focus of this item from understanding to ‘clarity about what protocol to use’ may help. In hindsight, ‘clarity’ better captures extraneous load than ‘understanding’, which relates more closely to intrinsic load. The item on accessibility of the information used a scale with two concepts: fragmentation and difficulty of organization. As a result, respondents may have focused on different concepts. Finally, the terminology item asks about ‘mental effort to understand’, which may have caused the item to split across extraneous and intrinsic load domains.
In addition to the construction of the extraneous items, the context may have been a primary contributor to the poor performance of these items. The handover occurred in a highly controlled environment in which there were no interruptions or background noise and no fragmentation of information. Consequently, the items focused on distractions and information fragmentation were not tested by the setting. Similarly, the standardized receivers were trained actors who likely did not simulate the ‘give and take’ of an actual clinician-receiver. As a result, we suspect communication was mostly unidirectional, making the item on the clarity of the terminology of questionable applicability. Taken as a whole, these limitations provide guidance for future efforts to measure extraneous load. Response process should be assessed more systematically in the development of new extraneous load items. Items should be tested in environments that better simulate sources of distraction in clinical handovers. Moreover, measurement of certain sources of extraneous load (e.g., clarity of terminology) will require the bi-directional communication of sender and receiver.
The germane load results are promising. However, a single item is not sufficient for confirmatory factor analysis, which will be necessary for further validation studies. More items need to be developed and tested. Moreover, germane load may be inadequately specified by our current models. Future items should include metacognition concepts, given the similarities between the concept of germane load and metacognition (anticipatory planning, monitoring and adapting action in real time, and reflection and evaluation afterward).
The findings from the correlational analyses provide some additional evidence of validity. The intrinsic load factor showed a positive association with Paas’ measure of cognitive load. While small, the magnitude (0.310) is in a similar range to the correlation found between intrinsic load and Paas’ overall measure (0.347, p < 0.01) in a recent study on cognitive load and the use of hypermedia [24]. Still, we expected the correlation to be higher. In addition, the intrinsic load factor was higher for students with less handover experience, which is consistent with CLT’s notion that a given task will present less intrinsic load as a learner’s skill increases. Although CLT predicts a negative correlation between intrinsic load and performance, our measure of intrinsic load did not correlate with either of our measures of performance (i.e., self-assessment of success and rating by the standardized resident). This is surprising and inconsistent with other studies [13, 17, 24]. However, the students may not have had sufficient external information and reflection skills to self-assess accurately [42]. In addition, there was very little spread in the performance ratings from the standardized residents (e.g., more than 40% of the students had the same score of 8). Therefore, the absence of a correlation between intrinsic load and performance likely reflects an inadequate measure of performance, attributable to the rating tool and/or the raters. The rating tool focused on whether the sender performed each step of the protocol. But variation in performance may arise less from compliance with each step than from the content quality within each step. One group has reported results on the initial testing of a handoff evaluation tool, the Handoff Mini-CEX, which includes a focus on the content quality [43]. Also, the standardized residents who did the performance ratings were actors who typically function as standardized patients and may not have sufficient clinical knowledge to rate the handover. It is less likely but also possible that the learners did not differ enough in their skill or that the intrinsic load of the handover itself was not sufficiently high to generate meaningful differences in performance between different levels of experience.
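The correlational reasoning in this paragraph can be illustrated with a short sketch. It assumes a Pearson product-moment correlation, which this section does not confirm as the statistic the study used, and every score below is invented purely for illustration; none of it is study data. The function name `pearson_r` is likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each score from its own mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # A measure with no spread cannot correlate with anything
        raise ValueError("correlation undefined: a variable has zero variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical intrinsic load factor scores and single-item effort ratings
# (illustrative values only, not taken from the study)
intrinsic_load = [2.0, 3.5, 4.0, 5.5, 6.0, 7.0]
paas_rating = [3, 4, 4, 6, 5, 7]
r = pearson_r(intrinsic_load, paas_rating)
print(f"r = {r:.3f}")
```

The zero-variance check reflects the limitation noted above: when most raters assign the same score, the performance measure has too little spread for any correlation with intrinsic load to emerge.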
The study found a negative correlation between the germane load factor and experience. In other words, the less experienced students dedicated more effort to understanding how to perform the handover. Theoretically, performance and learning should improve as germane load increases, again with the proviso that total load does not exceed the learner’s working memory capacity. Some studies have reported a positive correlation [18, 24] while others have not [13, 28]. Our results were similarly mixed: germane load correlated with the subjective measure of success, but not the performance rating by the standardized resident. Given the limitations of self-assessment as a performance measure, the more important point may be the inadequacy of our performance measure (i.e., rating by the standardized residents).
We found only a small association between the intrinsic load factor and the germane load factor, which supports the relative independence of these two constructs, an issue of some controversy in the CLT literature. The triarchic formulation posits that the three load types are separate and thus should not correlate. This perspective places the activities related to schema construction and automation (i.e., learning) in the domain of germane load [12]. Others have argued that intrinsic load encompasses schema acquisition and learning and that germane load represents additional activities that enhance learning such as the conscious application of learning strategies [44]. This perspective defines germane load differently but still maintains germane load as an independent type of load. Still others argue that germane and intrinsic load overlap so significantly that the two categories are redundant and best understood as a single type of load. This latter perspective has gained increasing support from CLT researchers [14, 45]. The results of this study suggest that the intrinsic load factor and the single germane load item are mostly independent. Yet, other recent studies that have found a third factor have wondered whether the factor may relate to a construct other than germane load [13]. That is a possibility with our results.
Limitations of this study, as addressed above, included an inadequate measure of performance due to non-clinician actors serving as raters and a performance measure that only focused on adherence to a format rather than the quality or accuracy of the information communicated. The simulation also failed to introduce common sources of extraneous load, making it difficult to assess this part of the instrument. These limitations serve as important lessons for subsequent research in this area, especially when the study occurs in a simulated environment such as an OSCE, in which non-clinical actors often rate trainees and occupy important roles, and sources of extraneous load are by design minimized. Future studies should use a meaningful performance measure (such as accuracy or quality of information conveyed). Testing should also occur in authentic clinical workplaces or use simulation scenarios that better capture the sources of extraneous load such as interruptions, fragmented information, terminology differences between sender and receiver, and perhaps hierarchies. While it was reasonable at this initial stage of instrument development to focus on the sender only and on the handover of a single patient amongst medical students with experience in handovers, future studies should examine cognitive load in both the sender and the receiver, examine sign-out of patient panels, and include trainees with a broader range of experience (e.g., students, residents, and fellows).
Conclusion
These are the first published results of an instrument designed to measure the cognitive load types associated with a handover. The study employed learners with different levels of experience which allowed the collection of validity evidence beyond factor structure. While preliminary, the results offer some support for the items measuring the intrinsic and germane load constructs. These can be refined and further tested, especially with more germane load items, a better measure of performance, senders and receivers, a broader spectrum of learner levels, and variation in patient complexity. Items for extraneous load require re-building and then testing in an environment that better simulates factors that induce extraneous load. The study’s limitations serve as important insights for future research efforts and represent a set of initial findings upon which future endeavors can build. The ability to measure cognitive load types is critical to our efforts to understand the cognitive load mechanisms of handover procedures. Such a measure will help the field better leverage CLT in order to identify handover procedures that manage intrinsic, extraneous, and germane load in the desired direction, and, thereby, enhance learning, reduce errors and avoid harm to patients.