A physician’s expertise in processing visual data is fundamental to accurately interpreting medical images, such as those encountered in cardiology, pathology, radiology, and dermatology practice [1]. Emerging research at the intersection of medicine, education, and cognitive science has recently revealed reliable differences in visual behaviour during expertise development. For instance, experts tend to move their eyes differently compared with novices, are faster and more accurate in identifying suspicious regions in a visual image, and are less vulnerable to the distracting effects of diagnostically irrelevant visual patterns [2‐4]. This research informs curricula and assessment methods for medical education and training, suggesting novel techniques for accelerating novice learning [5] and objectively assessing competency development [6]. However, existing research has to date been restricted to the review of static images. This is particularly unfortunate given that several medical specialties increasingly involve the review of dynamic visual imagery, such as when reviewing coronary angiograms or volumetric CT scans, or performing diagnostic fluoroscopy or laparoscopy. In these domains, educating and training the visual interpretive process is the linchpin of accurate diagnostic decision making.
Over the course of gaining medical experience, learners develop highly specialized expertise in processing visual data that guides and focuses attention and affords accurate mappings from perceived stimuli to candidate diagnoses. One prominent theory proposes that developing global perceptual strategies allows experts to make fine-grained distinctions between visual stimuli [7‐9], with experts encoding broader visual information than novices and quickly developing a relatively holistic representation of an overall configuration [10‐12]. Interestingly, much of this holistic processing can be done very quickly and without requiring the expert to fixate his or her eyes on the more global structure [13]. Experts develop specialized skills in visual search, object recognition, and decision making, resulting in higher efficiency in knowing where to look, what to look for, and what it means [14‐18]. Eye tracking provides an innovative tool for quantifying and possibly accelerating this expertise development, and for providing a basis for objective, formative feedback.
Eye tracking
Many experience-based differences in the visual interpretive process have been revealed through eye tracking [2, 4, 11‐13]. Monitoring eye movements is valuable for objectively characterizing the visual search process and, in some cases, predicting diagnostic outcomes. For the present study, eye movements can be parsed into two meaningful units: fixations and saccades. Fixations describe the momentary pauses of the eyes to foveate a restricted region of space, and saccades describe the rapid, ballistic movements of the eyes between successive fixations [19]. Fixations are characterized by their location in the visual world and their duration; in general, the more fixations and the longer their duration, the more visual attention and interest the viewer has in a region [20]. Saccades are thought to be preceded by an attentional shift to a different location, which results in a saccade to afford foveation of a new goal region [21, 22]. Saccade amplitude, usually measured in degrees, can reveal how large the shifts of attention are across a scene. The peak velocity of saccades can help researchers gain insight into varied states of workload and arousal. Specifically, higher peak saccade velocity correlates with greater sympathetic nervous system activation, for instance during states of arousal or uncertainty [23], possibly driven by excitatory inputs to oculomotor neurons from the reticular formation [24, 25].
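To make the fixation and saccade measures above concrete, the following minimal sketch labels gaze samples with a simple velocity-threshold rule and then summarizes each saccade by its amplitude and peak velocity. The sample format, the 30 deg/s threshold, and the function names are assumptions made for illustration; they are not the parsing procedure used in this study or by any particular eye tracking package.

```python
from math import hypot


def classify_samples(samples, velocity_threshold=30.0):
    """Label each gaze sample as 'fixation' or 'saccade'.

    samples: list of (t_seconds, x_deg, y_deg) tuples ordered by time.
    velocity_threshold: deg/s; 30 is an assumed, illustrative default.
    """
    labels = []
    for i, (t, x, y) in enumerate(samples):
        if i == 0:
            labels.append("fixation")  # no previous sample to compare against
            continue
        t0, x0, y0 = samples[i - 1]
        dt = t - t0
        velocity = hypot(x - x0, y - y0) / dt if dt > 0 else 0.0
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def summarize_saccades(samples, labels):
    """Return a list of (amplitude_deg, peak_velocity_deg_per_s) per saccade."""
    saccades = []
    i = 1
    while i < len(samples):
        if labels[i] != "saccade":
            i += 1
            continue
        start = i - 1  # last sample before the eye started moving
        peak = 0.0
        while i < len(samples) and labels[i] == "saccade":
            t0, x0, y0 = samples[i - 1]
            t1, x1, y1 = samples[i]
            peak = max(peak, hypot(x1 - x0, y1 - y0) / (t1 - t0))
            i += 1
        end = i - 1  # last sample of the saccade run
        amplitude = hypot(samples[end][1] - samples[start][1],
                          samples[end][2] - samples[start][2])
        saccades.append((amplitude, peak))
    return saccades
```

In practice the threshold would need tuning to the sampling rate and noise level of the eye tracker, and consecutive fixation samples would be merged into fixation events with a location and duration.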
Tracking the allocation of visual attention over a medical image allows us to dissociate a failure to view critical scene regions from a failure to accurately interpret those regions [26]. Specifically, if a physician fails to fixate on critical regions of a medical image, any resulting diagnostic errors can be attributed to a failure to find those regions. In contrast, if a physician fixates on critical regions but does not arrive at an accurate diagnostic decision, the error can be attributed to the interpretive process. This distinction is critically important for identifying possible sources of error and how they change as a function of expertise development, and for informing the development of tailored student assessments and training curricula. For example, eye tracking can be used to show expert eye movements to medical students, allowing them to learn viewing strategies for reviewing complex slides [5]. Eye tracking also holds potential for evaluating student competency progression, objectively assessing skill development, and providing formative feedback [6, 27].
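The distinction between search and interpretation errors can be sketched as a simple decision rule. The example below is illustrative only: the rectangular region format, the 0.1 s minimum fixation duration, and the function names are assumptions, not criteria taken from the cited work.

```python
def fixated_region(fixations, region, min_duration=0.1):
    """Return True if any sufficiently long fixation lands inside the region.

    fixations: list of (x, y, duration_seconds) tuples.
    region: (left, top, right, bottom) in the same coordinates as the fixations.
    min_duration: assumed minimum duration for a fixation to count.
    """
    left, top, right, bottom = region
    return any(
        left <= x <= right and top <= y <= bottom and duration >= min_duration
        for x, y, duration in fixations
    )


def classify_outcome(fixations, critical_region, diagnosis_correct):
    """Attribute a diagnostic outcome to search or interpretation."""
    if diagnosis_correct:
        return "correct"
    if fixated_region(fixations, critical_region):
        # The critical region was viewed, so the error lies in interpretation.
        return "interpretation error"
    # The critical region was never viewed, so the error lies in search.
    return "search error"
```

For instance, `classify_outcome([(120, 80, 0.3)], (100, 50, 200, 150), False)` returns `"interpretation error"`, because the single fixation falls inside the critical region but the diagnosis was wrong.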
Earlier research using eye tracking to understand expertise-related differences in visual search among medical professionals has been restricted to examining static images, owing to the complexity of tracking and interpreting eye movements over moving (dynamic) scenes [28, 29]. This complexity arises because the head and eyes are constantly moving not only relative to the computer monitor, but also relative to a moving scene with constantly changing stimulus locations. For instance, for a cardiologist reviewing an angiogram, a region of diagnostic importance may move across a scene as tool angulation or position changes, necessitating a tedious frame-by-frame coupling of region location and eye location over time. For this reason, most studies examining eye movements with medical images artificially restrict zooming and panning behaviour, simplifying analysis but also possibly reducing relevance to behaviour elicited during routine clinical practice [26, 30, 31].
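The frame-by-frame coupling of region location and eye location described above can be sketched as follows, assuming the region of interest has already been annotated for each video frame. The data layout, the fixed sample interval, and the function name are hypothetical choices for illustration.

```python
def dwell_time_on_moving_roi(gaze, roi_by_frame, sample_interval=1 / 60):
    """Total time that gaze falls inside a region that moves across frames.

    gaze: list of (frame_index, x, y) gaze samples.
    roi_by_frame: dict mapping frame_index -> (left, top, right, bottom);
        frames without an annotated region are skipped.
    sample_interval: seconds represented by each gaze sample (assumed fixed).
    """
    hits = 0
    for frame, x, y in gaze:
        box = roi_by_frame.get(frame)
        if box is None:
            continue  # no region annotated on this frame
        left, top, right, bottom = box
        if left <= x <= right and top <= y <= bottom:
            hits += 1
    return hits * sample_interval
```

The laborious part in practice is producing `roi_by_frame`, since the region must be located on every frame in which it appears.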
To increase the efficiency of interpreting eye movements over dynamic scenes, researchers can temporally couple eye tracking with logged interface behaviour to relate eye position to specific video frames [32‐34]. To prioritize video frames, an analysis of viewing behaviour can help infer a viewer’s interest in certain frames: using the ShotRank technique, the frequency and duration with which video frames are viewed indicate interest in the information available on those frames [35‐37]. Specifically, repeated interest in certain video frames (e.g., pausing, reviewing) can provide important information regarding the spatiotemporal distribution of critical diagnostic regions throughout a video. The ShotRank technique was applied herein to prioritize analysis of video time frames and associated eye movements. Some automated image processing and dynamic region of interest tracking techniques are also becoming available in eye tracking software packages [38‐40], but these remain heavily reliant on researchers manually defining and correcting regions over the course of the video.
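As a rough illustration of coupling gaze timestamps to logged playback and of prioritizing frames by viewing behaviour, the sketch below looks up which frame was on screen at a given time and ranks frames by how often and how long they were displayed. The log format, the function names, and the ranking rule (number of viewings first, total duration second) are assumptions made for this example; it is not the published ShotRank algorithm.

```python
from bisect import bisect_right
from collections import defaultdict


def frame_at(display_log, t):
    """Return the frame shown at time t, or None if nothing was displayed.

    display_log: list of (t_start, t_end, frame) intervals, sorted by
        t_start and non-overlapping (a hypothetical interface log format).
    """
    starts = [entry[0] for entry in display_log]
    i = bisect_right(starts, t) - 1  # last interval starting at or before t
    if i < 0:
        return None
    t_start, t_end, frame = display_log[i]
    return frame if t < t_end else None


def frame_interest(display_log):
    """Map each frame to (number of separate viewings, total viewing time)."""
    stats = defaultdict(lambda: [0, 0.0])
    for t_start, t_end, frame in display_log:
        stats[frame][0] += 1
        stats[frame][1] += t_end - t_start
    return {frame: (count, total) for frame, (count, total) in stats.items()}


def rank_frames(display_log):
    """Order frames from most to least viewed (assumed ranking rule)."""
    interest = frame_interest(display_log)
    return sorted(interest, key=lambda frame: interest[frame], reverse=True)
```

With a log such as `[(0.0, 1.0, 10), (1.0, 3.0, 11), (3.0, 3.5, 10), (3.5, 4.0, 12)]`, frame 10 is ranked first because it was revisited, and a gaze sample recorded at 2.0 s is assigned to frame 11.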
Over three million angiograms are performed each year in the United States and European Union [41, 42], with diagnosis and treatment decisions relying on a cardiologist’s interpretation of dynamic angiogram images. Yet the underlying interpretive process has not been adequately studied. Thus, this pilot study provided a first examination of the behaviours and eye movements characterizing cardiologists’ interpretation of coronary angiograms, with implications for the development of next-generation educational and training curricula and assessment methods.