In the complex visual environment of everyday life, we need to be able to select relevant over irrelevant information to achieve our goals. For example, if your goal is to read this article, it is important to attend to the paper in front of you and to ignore other distracting visual stimuli on your desk, such as a coffee cup or a pencil. From an evolutionary viewpoint, the ultimate goal in life is to survive (Öhman, Flykt, & Esteves, 2001). Accordingly, our visual system will likely have evolved in such a way that we do not ignore but rather prioritize distractors that may threaten us and endanger our survival, such as a big spider crawling across your desk. Moreover, visual prioritization of these so-called emotional stimuli should occur irrespective of your attempt to remain focused on your goal. Only by virtue of automatic visual prioritization are we able to respond immediately and adaptively to the situation and thereby enhance our chance of survival. However, whether, how, and when emotional stimuli modulate visual and attentional selection are questions that are not yet fully understood (de Gelder, van Honk, & Tamietto, 2011; Pessoa & Adolphs, 2010; Pourtois, Schettino, & Vuilleumier, 2013; Vuilleumier, 2005, 2015). For example, are emotional stimuli detected faster than other stimuli? Do emotional stimuli capture attention in a bottom-up manner, irrespective of ongoing top-down goal settings? Do emotional stimuli hold attention longer than neutral stimuli? Does emotion potentiate attention processes, or rather does emotion potentiate perception in a similar manner as attention potentiates perception? And what are the neural mechanisms involved in emotional attention?

In striving to answer these questions, eye-movement research with emotional stimuli can provide valuable insight into some of these processes. However, studies investigating eye-movement behavior in relation to emotional stimuli are limited. Moreover, paradigms and stimuli that are used in emotional eye-movement research are diverse, ranging from free viewing of pictures from the International Affective Picture System (IAPS) without a task (e.g., Alpers, 2008) to instructed saccades with controlled stimuli (e.g., Mulckhuyse, Crombez, & Van der Stigchel, 2013). Furthermore, much research on emotional attention is concerned with individual differences or psychopathology (for reviews on eye movements and affective disorders, see Armstrong & Olatunji, 2012; Richards, Benson, Donnelly, & Hadwin, 2014). Although these studies provide valuable insights into processes related to attentional selection of emotional stimuli, attentional biases in psychopathology are often specifically related to a particular disorder and cannot be generalized to a healthy population (Bar-Haim, Lamy, Pergamin, Bakermans-Kranenburg, & van IJzendoorn, 2007). In the current review, I will focus only on studies with emotional visual stimuli in healthy populations in which saccades are instructed and in which eye-movement behavior is the primary measured outcome of attentional selection. Relevant studies were identified through a systematic search of Google Scholar and Web of Science up to September 28, 2017, using the following search terms: (1) “eye movement” AND “emotion”; (2) “eye movement” AND “threat”; (3) “oculomotor” AND “emotion”; (4) “oculomotor” AND “threat”; (5) “saccade” AND “emotion”; (6) “saccade” AND “threat.” In addition, references of the studies included in this review were screened for relevance.

First, I will discuss why and how eye-movement behavior can provide insight into the temporal and spatial dynamics of attentional selection of emotional stimuli. Second, I will discuss the variety of stimuli used in emotion research and their possible different impact on visual selection. Subsequently, I will describe studies that investigated the influence of emotional stimuli on attentional and oculomotor capture in relation to different measurements, such as saccade latency, saccade trajectory, and saccade endpoint. Finally, I will discuss underlying neural mechanisms of oculomotor programming and present a neurobiological model that integrates emotion as a strong modulator of oculomotor control.

Covert attention and eye movements

Spatial attention can be allocated to a stimulus with or without eye movements. When attention is shifted to a location without eye movements, we speak of covert attention. When attention is shifted to a location with eye movements, we speak of overt attention (Posner, 1980). Behavioral studies on covert attention with manual responses have shown inconsistent results regarding the ability of emotional stimuli to capture attention in healthy populations (see for reviews Bar-Haim et al., 2007; Yiend, 2010). Specifically, early attention effects such as enhanced attentional capture are not always found in covert attention studies with manual responses (e.g., Koster, Crombez, Van Damme, Verschuere & De Houwer, 2004; Notebaert, Crombez, De Houwer, & Theeuwes, 2011; Mulckhuyse & Crombez, 2014). One reason why some of these studies fail to find an effect of emotion might be the response mode. Manual responses require a separate process of response selection (Hommel & Schneider, 2002), whereas eye movements are a direct manifestation of attentional selection. Moreover, eye movements are typically elicited faster than manual responses (Bannerman, Milders, de Gelder, & Sahraie, 2009; Bannerman, Milders, & Sahraie, 2009; Hunt, von Muhlenen, & Kingstone, 2007b). It is possible that emotional modulation of early attentional processing, such as attentional capture, is not always traceable in relatively slower manual responses. Therefore, saccadic eye movements, which are supposed to be executed fast and vigorously, may be more susceptible to emotional modulation of attentional selection.

Due to the strong coupling between covert and overt attention (Corbetta et al., 1998; Hoffman & Subramaniam, 1995; Rizzolatti, Riggio, Dascola, & Umilta, 1987), covertly attending to a location has a strong impact on oculomotor behavior (Godijn & Theeuwes, 2002). This has been shown by modulations of temporal and spatial characteristics of saccades when executed in the presence of a visually salient distractor that captures covert attention. For example, saccade latency increases when a visually salient distractor is presented at a different location than the target location (e.g., Mulckhuyse, van Zoest, & Theeuwes, 2008; Walker, Deubel, Schneider, & Findlay, 1997), fast saccades often land at an intermediate position between target and distractor (the so-called global effect; e.g., Findlay, 1982) and saccade trajectory deviates toward and/or away from a salient distractor presented along the saccadic path (e.g., McSorley, Haggard, & Walker, 2006; Mulckhuyse, Van der Stigchel, & Theeuwes, 2009; see for review Van der Stigchel, Meeter, & Theeuwes, 2006). Therefore, measurements such as saccade latency, saccade trajectory, and saccade endpoint can give us more insight into the spatial and temporal processes related to emotional modulation of early attentional selection.

Endogenous, exogenous, and emotional attention

Similar to covert attention studies, which differentiate between endogenous and exogenous attention (Posner, 1980), a distinction is made between endogenous and exogenous attention in eye-movement research (Berger, Henik, & Rafal, 2005). Endogenously driven saccades are voluntarily driven saccades that are induced by ongoing goals or task demands, such as looking to the left and right before crossing a street. Exogenously driven saccades are involuntarily driven saccades induced by visually salient events in the environment, such as instantly looking at a flashing light on a police car while you planned to look the other way. In this case, the eyes are captured automatically by a visual event due to its distinctiveness from the surroundings (Theeuwes, Kramer, Hahn, & Irwin, 1998). Besides these functional distinctions between endogenously and exogenously driven saccades, there is a clear distinction in the temporal characteristics between the two types of saccades. Specifically, endogenous saccades are initiated more slowly than are exogenous saccades (Godijn & Theeuwes, 2002; Henik, Rafal, & Rhodes, 1984). Because of this, it has been suggested that slower saccades (>200 ms) are under top-down (goal directed) control, whereas very fast saccades (<200 ms) are under bottom-up (stimulus driven) control.

In the lab, endogenously driven saccades are induced by task instructions, for example, (1) with a predictive symbolic cue at fixation in a cueing paradigm (Ro, Henik, Machado, & Rafal, 1997); (2) with an instruction prior to the trial, such as the instruction to make a saccade to the unique shape in a visual search paradigm (Godijn & Theeuwes, 2002); or (3) with the instruction to make a saccade to the opposite location of a suddenly presented stimulus in an antisaccade task (Walker, Walker, Husain, & Kennard, 2000; see for review Munoz & Everling, 2004). In all of these paradigms, the initiated saccades are supposed to be under top-down control, meaning that the saccades are initiated voluntarily based on task demands. In contrast, exogenously driven saccades are supposed to be under bottom-up control, meaning that the saccades are initiated involuntarily based on the visual saliency of a distractor. Therefore, to investigate exogenously driven saccades in the lab, one has to present task-irrelevant distractors that induce an error saccade irrespective of task demands. In other words, the distractor needs to interfere with ongoing goals or tasks and capture attention or the eyes based solely on its visual saliency. For example, exogenous saccades can be investigated with a visual search paradigm in which one has to saccade to a less salient target stimulus while a more salient distractor stimulus is simultaneously presented. Due to its visual saliency, the distractor captures the eyes on a subset of the trials and thus induces an exogenously driven saccade (Godijn & Theeuwes, 2002).

It has been suggested that top-down goal or behaviorally relevant information and bottom-up visual saliency information are integrated in a so-called priority map to direct attentional selection (Fecteau & Munoz, 2006). The priority map has a topographic representation and is supposed to develop spatial representations of objects and locations in the environment that are required for saccade programming in the superior colliculus (SC; Chelazzi & Corbetta, 2000). The SC has a retinotopic representation of the environment in order to determine where and when to make a saccade and is sometimes referred to as the common saccade map (Godijn & Theeuwes, 2002; Meeter, Van der Stigchel, & Theeuwes, 2010; Munoz, Dorris, Pare, & Everling, 2000; Trappenberg, Dorris, Munoz, & Klein, 2001). According to the competitive integration model of saccade programming (Godijn & Theeuwes, 2002), a saccade is executed to a location as soon as activation of that location in the saccade map reaches a certain threshold. This activation can be induced by top-down relevant information, bottom-up saliency-driven information, or both (Fecteau & Munoz, 2006).
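
The threshold idea can be illustrated with a minimal sketch, assuming a small saccade map in which each location accumulates activation from a top-down and a bottom-up input, and a saccade is triggered at the first location to reach threshold. The function name, the accumulation rate, and the threshold value are hypothetical choices for illustration and are not parameters of the model by Godijn and Theeuwes (2002); interactions between locations in the map are deliberately left out.

```python
def first_saccade(top_down, bottom_up, threshold=1.0, rate=0.125, max_steps=1000):
    """Return (location_index, time_step) of the first map location whose
    accumulated activation reaches the threshold, or None if none does.

    top_down and bottom_up are lists of input strengths, one per location.
    All values are illustrative assumptions, not fitted model parameters.
    """
    if len(top_down) != len(bottom_up):
        raise ValueError("top_down and bottom_up must have the same length")
    activation = [0.0] * len(top_down)
    for step in range(1, max_steps + 1):
        # Each location integrates the sum of its top-down and bottom-up input.
        for i in range(len(activation)):
            activation[i] += rate * (top_down[i] + bottom_up[i])
        # The most active location is the candidate saccade goal.
        winner = max(range(len(activation)), key=activation.__getitem__)
        if activation[winner] >= threshold:
            return winner, step
    return None


# A task-relevant target at location 1 and, optionally, a salient
# distractor at location 0 that accumulates activation faster.
target_only = first_saccade([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
with_distractor = first_saccade([0.0, 1.0, 0.0], [2.0, 0.0, 0.0])
```

In this toy version, a sufficiently strong bottom-up input lets the distractor location reach threshold before the target location, which corresponds to an erroneous, exogenously driven saccade.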

Recently, a new framework of attentional control has been suggested. In this framework, stimuli are either selected based on ongoing goals and task demands, based on visual saliency, or based on selection history (Awh, Belopolsky, & Theeuwes, 2012). Selection history refers to stimuli that have gained saliency due to past experience, and capture attention irrespective of goals and visual saliency. For example, if a specific stimulus has been selected on a previous trial, it can capture attention on the current trial, as in priming (Maljkovic & Nakayama, 1994). Likewise, if a specific stimulus has previously been associated with a reward, it can capture attention on the current trial irrespective of ongoing task demands or visual saliency (Della Libera & Chelazzi, 2006). Importantly, these stimuli are supposed to be under bottom-up control. It is entirely possible that the way emotional stimuli affect attentional selection also eludes the dichotomy between exogenous and endogenous attention processes in a similar manner as in selection history.

In the current review, I will refer to endogenous attention when a saccade is directed to a stimulus based on task demands and to exogenous attention when a saccade is directed to a stimulus based on visual saliency. Similar to the integration of task relevant (top-down) and visually salient information (bottom-up), emotional information may integrate with top-down and/or bottom-up information in the priority map (Belopolsky, 2015). Moreover, when an emotional stimulus is neither task relevant nor visually salient but does affect attentional selection, I will refer to emotional attention.

Emotional stimuli and neural processing

A stimulus is considered an emotional stimulus when it induces an emotional response, such as action tendencies, bodily responses, behavioral responses, and a change in subjective feeling (Brosch, Pourtois, & Sander, 2010). As previously mentioned, in emotion attention research, a variety of emotional stimuli is used, which ranges from complex pictures of social scenes to color stimuli that are associated with reward or punishment. Consequently, stimuli differ in the intensity of the emotion they elicit (Moors, 2009), and emotional stimuli can trigger a positive or negative response. Possibly, a different level of arousal and a different valence will influence attention mechanisms in their own specific way (Mather & Sutherland, 2011). For example, in modulation of attention by reward, the basal ganglia and dopaminergic projections from midbrain areas play a key role, whereas in modulation of attention by threat, the amygdala plays a key role (Vuilleumier, 2005, 2015), although the amygdala is also known to be involved in prioritization of positive emotional stimuli (Peck, Lau, & Salzman, 2013, see for review Sander, Grafman, & Zalla, 2003). In addition, reward learning may be more related to motivational processes than to emotional processes (see for review Bourgeois, Chelazzi, & Vuilleumier, 2016). Because of this, I will mainly focus on studies that have used negative emotional stimuli with different levels of arousal, ranging from the presentation of a schematic angry face to a colored circle that signals the possibility of receiving an electric shock.

With regard to the amygdala, it has been suggested that a subcortical phylogenetically old pathway, which projects from the retina to the superior colliculus and via the pulvinar to the amygdala, is specifically involved in visual processing of threatening stimuli (Liddell et al., 2005; Lipp & Waters, 2007; Morris, DeGelder, Weiskrantz, & Dolan, 2001; Morris, Öhman, & Dolan, 1999; Öhman & Mineka, 2001). This retinotectal pathway is mainly magnocellular, meaning that it is involved in processing coarse visual features, low spatial frequencies, motion, and luminance transients (Schiller, Malpeli & Schein, 1979). For example, it has been demonstrated that the amygdala responds more strongly to emotional expressions presented in low spatial frequency than in high spatial frequency (Vuilleumier, Armony, Driver, & Dolan, 2003). Amygdala activation even differs between subliminally presented wide eyes (a feature of a fearful expression) and narrow eyes (a feature of a happy expression), possibly due to the visual characteristics of these low and high spatial frequency stimuli (Whalen et al., 2004).

Therefore, it is important to take into account that processing of faces with an emotional expression, which may be biologically prepared, and processing of complex pictures of social scenes may not be mediated in a similar fashion and may thus affect attentional selection differently, even though both stimuli are viewed as emotional stimuli (e.g., Ekman & Friesen, 1975; Lundqvist, Esteves, & Öhman, 1999; Vuilleumier, 2002). For example, in an fMRI study by Hariri, Tessitore, Mattay, Fera, and Weinberger (2002), it was shown that the amygdala responded more strongly to fearful and threatening emotional expressions than to negative IAPS pictures. The authors suggested that this might be due to the difference in biological value of the different stimulus sets. They reasoned that faces have an intrinsic biological relevance for all individuals, whereas the IAPS pictures may have different biological values for each individual. They also found a differential laterality effect of the amygdala. That is, the left amygdala responded more strongly to IAPS pictures, whereas the right amygdala responded more strongly to faces. The authors suggested that this laterality effect may stem from enhanced cognitive processing when viewing complex IAPS pictures, possibly requiring language processing. Nevertheless, although there may be differences in neural processing of facial expressions and other emotional pictures, at the behavioral level the differences are possibly less evident. For example, in a meta-analysis of exogenous covert attention to emotional expressions of faces and to emotional scenes, no behavioral differences were found (Carretié, 2014).

Besides the specific role of the amygdala in processing emotional pictures, the amygdala is also known to be essential in processing fear-conditioned stimuli (Davis & Whalen, 2001; LeDoux, 2000; Sehlmeyer et al., 2009). In fear conditioning, a stimulus, either a biologically relevant stimulus, such as a face (e.g., Armony & Dolan, 2002), or a biologically neutral stimulus, such as a specific color (e.g., Mulckhuyse & Dalmaijer, 2016), is associated with an aversive event, such as a loud noise or an electric shock to the finger (see Fig. 1a). Subsequently, these stimuli are presented in an experiment in which the conditioned stimulus (CS+) is threatening because it signals a possible upcoming aversive event, whereas the other stimulus signals safety (CS−; see Fig. 1b). With respect to using fear-conditioned, biologically neutral stimuli, these stimuli have the benefit of having no prepared associations, and they are fully controlled for any differences in visual features, which is more difficult with, for example, IAPS pictures or emotional expressions of faces (Calvo & Nummenmaa, 2011).

Fig. 1

a Example of a fear-conditioning procedure in which the red circle is associated with an electric shock, and the green circle is not. Subsequently, these stimuli are presented in an experiment. b Example of an emotional variant of the oculomotor capture paradigm with fear-conditioned stimuli (Mulckhuyse & Dalmaijer, 2016). Participants make a speeded saccade to the target (cross) while at the same time an additional singleton distractor is presented. The distractor can either be absent (left panel), a CS+ distractor (middle panel), or a CS− distractor (right panel). Colors are counterbalanced between subjects. (Color figure online)
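
The design described in the caption can be sketched as a small trial list generator, assuming two colors that are swapped between participants and three distractor conditions (absent, CS+, CS−). The function name, the number of trials per condition, and the use of participant parity for counterbalancing are hypothetical choices for illustration, not the procedure of Mulckhuyse and Dalmaijer (2016).

```python
import random


def build_trials(participant_id, n_per_condition=2, seed=0):
    """Build a shuffled trial list for one participant.

    Counterbalancing (an assumption of this sketch): even-numbered
    participants get red as CS+ and green as CS-, odd-numbered
    participants get the reverse assignment.
    """
    colors = ("red", "green")
    cs_plus = colors[participant_id % 2]
    cs_minus = colors[(participant_id + 1) % 2]
    conditions = [("absent", None), ("CS+", cs_plus), ("CS-", cs_minus)]
    trials = [
        {"distractor": name, "color": color}
        for name, color in conditions
        for _ in range(n_per_condition)
    ]
    # A seeded generator keeps the trial order reproducible.
    random.Random(seed).shuffle(trials)
    return trials


trials_even = build_trials(participant_id=0)
trials_odd = build_trials(participant_id=1)
```

Because the color assignment is swapped across participants, any difference between CS+ and CS− distractors cannot be attributed to the color itself.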

Attentional capture and oculomotor control

In this section, I will discuss studies that investigated the influence of covert attentional capture by emotional stimuli on oculomotor control. With paradigms such as forced choice, spatial cueing, visual search, and prosaccade tasks, it has been shown that attentional capture by emotional stimuli affects saccade latency and saccade trajectory.

Saccade latency in forced choice

In a forced-choice task by Bannerman et al. (Bannerman, Milders, de Gelder, & Sahraie, 2009; Bannerman, Milders, & Sahraie, 2009), emotional and neutral stimuli were presented left and right of fixation. Participants were instructed to make a speeded saccade to either the emotional or the neutral stimulus, while saccade latency was measured. The latency of a saccade is defined as the time between target onset and the initiation of the saccade. The emotional stimuli in these studies consisted of pictures or schematic faces with a fearful, angry, or happy expression (Bannerman, Milders, & Sahraie, 2009) or pictures of faces with a fearful expression and fearful body postures (Bannerman, Milders, de Gelder, et al., 2009) presented in a pair with their neutral counterpart. Note that in this set-up, the saccades are driven by task demands and are therefore endogenously driven saccades. The pairs were either presented very briefly (20 ms) or for a relatively long time (500 ms; Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009, Experiment 4). Findings from both studies demonstrated that when the pairs were presented very briefly, saccades directed to the emotional stimulus had shorter latencies than saccades directed to the neutral stimulus. When the pairs were presented for a longer period of time (500 ms), the results were inconsistent (see also Nummenmaa, Hyona, & Calvo, 2006, Experiment 2). These results indicate that endogenously driven saccades are facilitated, that is, initiated faster, to an emotional stimulus than to a neutral stimulus. Note that the inconsistent results for the longer stimulus presentation might suggest that prioritization of emotional stimuli occurs only when participants have little time to process the stimuli; when they are given more time, the effect of emotion on endogenously driven saccades is less pronounced.
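
To make the latency measure concrete, the following minimal sketch derives saccade latency from a series of horizontal gaze samples, assuming that saccade onset is the first sample after target onset at which eye velocity exceeds a fixed criterion. The criterion of 30 deg/s, the function name, and the parameter names are assumptions for illustration; the studies discussed here may have used different detection criteria.

```python
def saccade_latency(times_ms, x_deg, target_onset_ms, velocity_threshold=30.0):
    """Return saccade latency in ms, or None if no saccade is detected.

    times_ms: sample timestamps in milliseconds (increasing).
    x_deg: horizontal gaze position in degrees, one value per sample.
    velocity_threshold: assumed onset criterion in degrees per second.
    """
    for i in range(1, len(times_ms)):
        if times_ms[i] < target_onset_ms:
            continue  # ignore eye movements made before the target appeared
        dt_s = (times_ms[i] - times_ms[i - 1]) / 1000.0
        if dt_s <= 0:
            continue  # skip duplicated or out-of-order samples
        velocity = abs(x_deg[i] - x_deg[i - 1]) / dt_s
        if velocity > velocity_threshold:
            # Latency is measured from target onset to the first fast sample.
            return times_ms[i] - target_onset_ms
    return None


# Synthetic trace sampled every 10 ms: the eye starts moving at 190 ms.
times = list(range(0, 300, 10))
gaze = [0.0] * 19 + [1.0, 4.0, 8.0] + [8.0] * 8
latency = saccade_latency(times, gaze, target_onset_ms=0)
```

Comparing the mean of such latencies between saccades directed to emotional and to neutral stimuli gives the facilitation effect described above.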

Saccade latency in spatial cueing

In the exogenous emotional spatial cueing task, either an emotional or a neutral task-irrelevant cue is briefly presented as a sudden onset to the left or right of fixation. Subsequently, a neutral target to which participants have to make a saccade is presented either at the cued location or at the opposite location. Bannerman, Milders, and Sahraie (2010a, 2010b) used this set-up to investigate the influence of fearful expressions (2010a) and fearful body postures (2010b) on saccadic behavior with different cue–target stimulus onset asynchronies (SOAs; 20 ms to 100 ms). That is, the cue duration varied while the interval between cue offset and target onset was always zero. Only in trials with a very briefly presented cue (≤40 ms) did they find decreased saccade latency when a saccade was directed toward the valid location previously cued with the emotional stimulus relative to the neutral stimulus. Moreover, in invalidly cued trials, latency of saccades directed toward the opposite location of the briefly presented cue was longer when the cue was emotional than when it was neutral. Bannerman et al. (2010a, 2010b) suggested that these findings reflect stronger automatic covert attentional capture by the emotional stimulus, which subsequently either facilitated the saccade when directed toward the same location or interfered with the saccade when directed toward the opposite location. However, to what extent attention was captured only by the emotional content of the stimuli in these designs is difficult to determine. Sudden onsets, or briefly flashed stimuli, are extremely visually salient and are known to capture attention in a bottom-up manner (Theeuwes, 1994; Yantis & Jonides, 1984). Therefore, any effect of emotion on attentional capture in the spatial cueing task of Bannerman et al. (2010a, 2010b) must have interacted with exogenous attention. It is possible that the sudden onset of the cue captured exogenous attention and the emotional content of the cue facilitated or enhanced subsequent visual processing of the target stimulus (Phelps, Ling, & Carrasco, 2006), thereby decreasing saccade latency when the target was presented at the same location as the cue.
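
The logic of the cueing analysis can be sketched as follows, assuming each trial is coded by cue type, cue validity, and saccade latency. The cueing effect per cue type is the mean latency on invalid trials minus the mean latency on valid trials; a larger effect for emotional than for neutral cues would correspond to the pattern of facilitation and interference described above. The function name and the trial coding are hypothetical and not taken from the cited studies.

```python
from statistics import mean


def cueing_effects(trials):
    """Return {cue_type: mean invalid latency - mean valid latency}.

    trials: iterable of (cue_type, validity, latency_ms) tuples, where
    validity is the string "valid" or "invalid" (an assumed coding).
    """
    cells = {}
    for cue_type, validity, latency_ms in trials:
        cells.setdefault((cue_type, validity), []).append(latency_ms)
    means = {cell: mean(values) for cell, values in cells.items()}
    return {
        cue: means[(cue, "invalid")] - means[(cue, "valid")]
        for cue in {cue for cue, _ in means}
        # Only report cue types for which both cells were observed.
        if (cue, "valid") in means and (cue, "invalid") in means
    }
```

The function expects one tuple per correctly executed saccade, for example `("emotional", "valid", 152)`, and returns one difference score per cue type.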

In a study with a different set-up, Nummenmaa, Hyona, and Calvo (2009) ruled out the possibility that the emotional stimulus was attended endogenously due to task demands or exogenously due to its visual saliency. In this emotional spatial cueing paradigm, participants were endogenously (arrow at fixation) or exogenously (flash in the periphery) cued to saccade to the left or to the right, while at different SOAs (−150 ms, 0 ms, 150 ms) task-irrelevant emotional and neutral IAPS pictures were bilaterally presented. Note that the emotional stimuli were task irrelevant, and, in addition, the pictures were presented on both sides of fixation, which prevented attention from being captured by one side of the display due to the visual onset, as was the case in the study by Bannerman et al. (2010a, 2010b). The cues, whether presented as an arrow at fixation or as a flash in the periphery, indicated the location toward which a saccade had to be made. Irrespective of SOA and type of cueing, saccade latency decreased when a saccade was directed toward the location of the emotional stimulus. This means that the emotional information at the cued location, even though task irrelevant and not visually salient, facilitated a saccade to its location. This is similar to the findings with emotional facilitation of endogenously driven saccades (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009) and exogenously driven saccades (Bannerman et al., 2010a, 2010b).

Akin to Nummenmaa et al. (2009), Schmidt, Belopolsky, and Theeuwes (2015, 2017) used pairs of cue stimuli (left and right of fixation) to avoid exogenous attentional capture by the emotional stimulus based on visual saliency. Moreover, in these studies, the task-irrelevant cue stimuli were briefly presented and no longer displayed when a saccade was initiated. The cues consisted either of a pair of a fear-conditioned colored diamond (CS+) and a differently colored diamond (CS−), or of a pair of two differently colored diamonds (neutral condition). Subsequently, after 50 ms (Schmidt et al., 2015), or after longer SOAs of 600 ms and 1,000 ms (Schmidt et al., 2017), an endogenous cue, an arrow at fixation, indicated the target location. Results from these studies showed that with a short SOA, saccades directed to the location previously occupied by a threatening cue were faster than were saccades directed to a neutrally cued location (Schmidt et al., 2015, 2017, Experiment 1). Moreover, with a longer SOA of 600 ms and 1,000 ms, the facilitation effect was still found. The authors suggested that attention remained allocated at the location previously cued with the threatening stimulus, thereby decreasing saccade latency to that location. In addition, saccades directed toward the opposite location of the threatening cue were slower than to a neutrally cued location when SOA was short (Schmidt et al., 2015, 2017). These findings, facilitation at longer SOAs and interference at the short SOA, were interpreted as consistent with the delayed disengagement theory, which states that emotional stimuli hold attention longer (Fox, Russo, Bowles, & Dutton, 2001; Fox, Russo, & Dutton, 2002).

Delayed disengagement from emotional stimuli was also found in an endogenous cueing study in which the cue at fixation was a schematic face that could either be emotional or not (Belopolsky, DeVue, & Theeuwes, 2011). The expression of the face was task irrelevant, but the tilt of the faces indicated the location toward which a saccade had to be made. Despite the irrelevance of the emotional expression, saccade latency was increased when the emotional expression was angry, suggesting that it held attention longer. Likewise, in the study by Schmidt et al. (2017), attention lingered longer on the location previously cued with a threatening rather than a neutral stimulus, as indicated by a facilitation of saccades even at longer SOAs.

In line with the premotor theory of attention (Rizzolatti et al., 1987) and similar to Bannerman et al. (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009, 2010a, 2010b), Schmidt et al. (2015, 2017) suggested that the threatening cue captured attention automatically and thereby facilitated the execution of a saccade toward its location and impaired the execution of a saccade toward the opposite location. Nevertheless, although the stimuli in this spatial cueing set-up did not capture attention exogenously because they were presented at both sides of fixation, the conditioned cues (CS+ and CS−) were also not entirely task irrelevant. Namely, the CS+ not only signaled threat but also meant that, in that particular trial, one could avoid a shock by reacting quickly. Therefore, these findings do not demonstrate bottom-up-driven attentional capture by threat but rather demonstrate the facilitating effect of a threatening task-relevant stimulus on voluntary oculomotor programming.

Saccade latency in visual search

In contrast to these spatial cueing studies in which facilitation or interference by emotional stimuli on saccade latency was found, visual search tasks with emotional expressions found no evidence for emotional modulation of saccade latency. For example, in a study by DeVue and Grimshaw (2017), participants were asked to make a speeded saccade to a uniquely colored target stimulus among distractors in a circular array. In a concentric circular array inside the stimulus array, pictures of irrelevant objects were presented. One of the pictures consisted of a face with a neutral or an angry expression or a butterfly, whereas the other pictures were photographs of inanimate objects. The location of the critical picture could either match the target location or not. Note that in this set-up the emotional and nonemotional distractor stimuli were neither visually salient nor task relevant. Although the results showed decreased latency for saccades directed to targets at the location of the faces, suggesting facilitation due to attentional capture, there was no additional effect of the emotional expression of the face. DeVue and Grimshaw (2017) concluded that emotional stimuli do not capture attention in a bottom-up manner and therefore do not affect early visual selection processes.

Likewise, in a study by Hunt, Cooper, Hungr, and Kingstone (2007a), in which the emotional expression was task relevant, no facilitation of saccades to emotional expressions was found. In this study (Hunt, Cooper, et al., 2007a), participants had to saccade to an upright happy or angry schematic face or to an inverted happy or angry schematic face among neutral distractor faces in a circular array. In half of the trials, one of the distractor faces had the opposite emotional expression (angry or happy) and was presented upright or inverted. Note that in the inverted condition, the target was defined by the configuration of the schematic face, whereas in the upright condition, the target was defined by the emotional expression. But even though participants searched for a particular emotional expression, it did not decrease saccade latency relative to search for the inverted target. However, Hunt et al. did find evidence for attentional holding by the emotional expressions when they were the upright distractor. Saccade latency of correct saccades increased in the presence of an upright emotional distractor relative to an inverted emotional distractor. Nevertheless, when task instructions changed and the emotion was no longer task relevant, this effect disappeared. The authors suggested that their findings indicate that a top-down search strategy for emotion was employed in the first instance and, consequently, the findings argue against the idea of automatic attentional grabbing and holding by emotional stimuli.

In contrast to these visual search studies with emotional expressions, visual search studies with fear-conditioned distractor stimuli have found evidence for attentional capture and holding. For example, several studies have used a modification of the oculomotor capture paradigm (Theeuwes et al., 1998). In this paradigm, participants are instructed to make a speeded saccade to a target presented among distractors in a circular array. Simultaneously with the appearance of the target, a sudden-onset distractor is presented. Typically, results show that saccade latency is increased in the presence of the distractor, suggesting that attention is first covertly shifted to the salient distractor before it moves on to the less salient target and a subsequent saccade can be initiated. In the emotional variant of the paradigm, the sudden-onset distractor can either be a threatening distractor (CS+) or a nonthreatening distractor (CS−; see Fig. 1b). With this paradigm, it was found that endogenously driven saccades to the target are slowed more by a threatening distractor than by a nonthreatening distractor (Hopkins, Helmstetter, & Hannula, 2016; Mulckhuyse & Dalmaijer, 2016). These results indicate that the threatening distractor captured covert attention automatically and thereby delayed the programming of a saccade to a different location. Nevertheless, note that the distractors in the studies described above, although completely task irrelevant, were presented as sudden-onset distractors. As discussed before, sudden onsets are extremely visually salient (Yantis & Jonides, 1984) and are known to capture attention in a bottom-up manner. The effect of emotion on attentional capture with this paradigm is again due to the integration of visually salient and emotional information. The question of whether these fear-conditioned stimuli would capture attention beyond their physical saliency is not answered in these studies.

It is possible that a nonvisually salient threatening distractor may not capture and hold attention. For example, in a study with a slightly different modification of the additional singleton paradigm (Nissens, Failing, & Theeuwes, 2017), it was shown that a nonvisually salient fear-conditioned distractor did not influence saccade latency of correctly executed saccades to the target. In this study, stimuli of different colors were presented in a circular array, and the target was defined by its shape. Participants were informed that the presence of one particular color of the distractors signaled whether or not they could receive a shock, whereas the presence of another particular color signaled they were safe. Although the nonvisually salient distractor did not increase saccade latency to the target relative to the safe distractor, it is worth noting that participants were informed that the shock would be administered if they were too slow in fixating the target. Most likely, in trials in which saccades were performed correctly and successfully (that is, fast enough to avoid a shock), attentional capture by the threatening distractor could be inhibited.

Saccade trajectory

Studies that investigated the trajectory of a saccade have shown clear evidence for emotional modulation of oculomotor control. In general, in these studies, participants are asked to saccade to a target stimulus while a distractor stimulus is presented alongside the saccadic path. Typically, short latency saccades (<200 ms) curve toward a distractor, whereas long latency saccades (>200 ms) curve away from a distractor (McSorley et al., 2006). The former process is explained by averaged activity of the distractor and target location in the saccade map resulting in an eye movement deviating toward the distractor. The latter process is the result of strong inhibition of the distractor location in the saccade map, resulting in an eye movement deviating away from the distractor (Tipper, Howard, & Jackson, 1997; Van der Stigchel et al., 2006). With fear-conditioned distractors, it was shown that the early activation process as well as the later inhibitory process were modulated by threat (Mulckhuyse et al., 2013). Short latency saccades curved more toward the threatening distractor than toward the nonthreatening distractor, and long latency saccades curved more strongly away from the threatening distractor than from the nonthreatening distractor. Saccade deviation toward a negative emotional stimulus has also been shown with pairs of IAPS pictures presented to the left and right of the target stimulus (McSorley & van Reekum, 2013). Similar to Mulckhuyse et al. (2013), saccades deviated more toward the negative emotional stimulus when saccade latency was short. Short saccade latencies in these studies were elicited by a gap paradigm (McSorley et al., 2006, 2009), in which the fixation cross was removed before target onset. In addition, note that the target stimuli in these studies were also presented as sudden onsets. When no gap manipulation is implemented and the target location is indicated by an arrow at fixation, it typically takes longer to initiate a saccade.
With these longer latency saccades, deviation away from an emotional stimulus was found in a study with pairs of face and house stimuli presented left and right of the saccade path (Schmidt, Belopolsky, & Theeuwes, 2012). Schmidt et al. (2012) found that saccades deviated more strongly away from an angry face picture than from a neutral face picture. Likewise, saccades deviated more strongly away from negative IAPS pictures than from neutral pictures that were presented next to the start point of the saccade in a study by Nummenmaa et al. (2009). However, in the latter study, the trajectory of the saccades was only modulated if the pictures were presented 150 ms prior to the target stimulus. When the emotional picture was presented together with the target stimulus, there was no effect. This is possibly due to the spatial configuration of the display. For example, in a slightly different set-up, West, Al-Aidroos, Susskind, and Pratt (2011) investigated saccade deviation induced by a neutral distractor along the saccade path due to an emotional stimulus presented at fixation. They were specifically interested in spatial or temporal inhibitory effects due to the emotional stimulus at fixation, reasoning that an emotional stimulus would activate the subcortical pathway and therefore influence oculomotor behavior. They found no effect on saccade trajectory, but saccade latency was decreased when the emotional stimulus was presented 200 ms before target onset. They suggested that an emotional stimulus at fixation may only influence temporal dynamics of saccade programming but not spatial dynamics. They argued that temporal modulations are mediated by reciprocal subcortical connections between areas involved in emotion processing and saccade execution (SC).
Because there are no reciprocal cortical connections between areas involved in emotion processing and the area involved in saccade inhibition (FEF), which is responsible for spatial modulation, they reasoned that emotion had no effect on spatial dynamics.
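
The trajectory measures discussed in this section can be made concrete with a minimal Python sketch. It assumes that a saccade is available as a list of (x, y) gaze samples and quantifies deviation as the signed peak perpendicular distance of the samples from the straight line between saccade start and end point, divided by saccade amplitude. This is one common convention and not necessarily the measure used in the studies cited above. The function names, the sign convention, and the use of 200 ms as a hard cutoff are illustrative assumptions.

```python
import math

# Boundary between "short" and "long" latency saccades mentioned in the text.
# Treating it as a hard cutoff is an assumption made for illustration.
LATENCY_SPLIT_MS = 200


def max_deviation(samples):
    """Signed peak perpendicular deviation from the start-end line, divided by amplitude.

    Positive values lie to the left of the start-to-end direction, negative to the right.
    """
    (x0, y0), (x1, y1) = samples[0], samples[-1]
    dx, dy = x1 - x0, y1 - y0
    amplitude = math.hypot(dx, dy)
    if amplitude == 0:
        raise ValueError("saccade start and end point coincide")
    # The 2-D cross product gives the signed distance of each sample from the line.
    distances = [(dx * (y - y0) - dy * (x - x0)) / amplitude for x, y in samples]
    peak = max(distances, key=abs)
    return peak / amplitude


def deviation_re_distractor(samples, distractor_xy):
    """Positive if the saccade curves toward the distractor side, negative if away."""
    (x0, y0), (x1, y1) = samples[0], samples[-1]
    dx, dy = x1 - x0, y1 - y0
    # Sign of the cross product tells on which side of the saccade path the distractor lies.
    side = dx * (distractor_xy[1] - y0) - dy * (distractor_xy[0] - x0)
    deviation = max_deviation(samples)
    return deviation if side >= 0 else -deviation


def latency_bin(latency_ms):
    """Label a saccade as 'short' or 'long' latency relative to the 200 ms split."""
    return "short" if latency_ms < LATENCY_SPLIT_MS else "long"
```

With such a measure, the pattern described above would correspond to larger positive values for threatening than for nonthreatening distractors in the short latency bin, and larger negative values in the long latency bin.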

However, the lack of a spatial effect in the study by West et al. (2011) is probably due to the configuration of the display. When an emotional stimulus is presented in the periphery along the saccade path, it does influence spatial dynamics (McSorley & van Reekum, 2013; Mulckhuyse et al., 2013; Schmidt et al., 2012), but not temporal dynamics of oculomotor programming. In particular, in all studies that found an effect on saccade trajectory, no effect was found on saccade latency. That is, latencies of saccades in the presence of an emotional or nonemotional distractor in the periphery were similar. Therefore, the stronger deviations toward and away from an emotional distractor do not indicate that the emotional stimulus captured attention faster or held attention longer than a neutral stimulus, but rather that the emotional distractor was a stronger competitor than the neutral one. Because the emotional distractor competed more strongly than the nonemotional distractor with the target within the saccade map, the trajectory of that saccade was modulated more strongly.
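
The idea of a "stronger competitor" in the saccade map can be illustrated with a deliberately simple toy model, sketched here in Python. It is not a model taken from any of the cited studies. It assumes that the initial saccade direction follows the weighted vector sum of the target and distractor locations, that a positive distractor weight stands for residual activation (short latency saccades), and that a negative weight stands for inhibition below baseline (long latency saccades). The function name and all weights are hypothetical.

```python
import math


def initial_direction_deg(target_xy, distractor_xy, distractor_weight, target_weight=1.0):
    """Direction (in degrees) of the weighted vector sum of target and distractor.

    A positive distractor_weight models residual activation at the distractor
    location; a negative one models inhibition of that location.
    """
    x = target_weight * target_xy[0] + distractor_weight * distractor_xy[0]
    y = target_weight * target_xy[1] + distractor_weight * distractor_xy[1]
    return math.degrees(math.atan2(y, x))


# Target straight up (90 degrees), distractor to the right of the saccade path.
target, distractor = (0.0, 10.0), (5.0, 5.0)
toward_weak = initial_direction_deg(target, distractor, 0.2)     # weaker competitor
toward_strong = initial_direction_deg(target, distractor, 0.4)   # stronger competitor
away_weak = initial_direction_deg(target, distractor, -0.2)      # weaker inhibition
away_strong = initial_direction_deg(target, distractor, -0.4)    # stronger inhibition
```

In this sketch, a larger absolute weight shifts the initial direction further from 90 degrees, toward the distractor for positive weights and away from it for negative weights, without any term for latency. This mirrors the argument that a stronger competitor changes the trajectory while leaving saccade latency unaffected.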

Summary and discussion: Attentional capture and oculomotor control

Most studies that investigated the effect of attentional capture by an emotional stimulus on oculomotor behavior found faster saccades to the emotional stimulus, reflecting enhanced attentional capture (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009, 2010a, 2010b; Nummenmaa et al., 2009; Schmidt et al., 2015, 2017), slower saccades away from an emotional stimulus, reflecting delayed attentional disengagement (Bannerman et al., 2010a, 2010b; Belopolsky et al., 2011; Hopkins et al., 2016; Mulckhuyse & Dalmaijer, 2016; Schmidt et al., 2015, 2017), stronger deviations toward an emotional distractor (McSorley & van Reekum, 2013; Mulckhuyse et al., 2013), or away from an emotional distractor (Mulckhuyse et al., 2013; Nummenmaa et al., 2009; Schmidt et al., 2012). However, because of the set-up of most studies, emotional attention is never isolated from endogenous and exogenous attention, but rather most studies show an integration of emotion with endogenous or exogenous attentional processes (see also Brosch, Pourtois, Sander, & Vuilleumier, 2011). Exogenous attentional capture, in particular, seems to be enhanced by emotion (Carretié, 2014). For instance, studies that did find an effect of emotion often presented the emotional stimulus as a sudden onset (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009, 2010a, 2010b; Hopkins et al., 2016; Mulckhuyse & Dalmaijer, 2016; Schmidt et al., 2015, 2017), whereas in visual search studies that did not find an effect of emotion, the target and distractor were presented together in the stimulus display and for relatively long durations (DeVue & Grimshaw, 2017; Hunt et al., 2007a). In other words, there was no additional luminance transient present together with the emotional signal.
Indeed, it has previously been shown that stimulating the magnocellular pathway with a fear-conditioned luminance-defined low-spatial-frequency grating enhances visual processing of this grating, whereas stimulating the parvocellular pathway with a fear-conditioned isoluminant (chromatic) high-spatial-frequency grating does not (Keil, Miskovic, Gray, & Martinovic, 2013), suggesting that visual saliency boosts the emotional signal. This view is consistent with a recently proposed model of exogenous attention to emotional stimuli, which states that exogenous attention to emotional stimuli relies strongly on the magnocellular system (Carretié, 2014). More evidence comes from the finding that especially emotional stimuli presented as luminance transients affect the oculomotor system. For example, in the studies by Bannerman and colleagues (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009, 2010a, 2010b), the oculomotor version was contrasted with a manual version in which participants responded manually to the location of the target. Whereas saccade latencies were specifically affected by short presentations of threat (≤40 ms), manual responses were affected by longer presentations of threat (>100 ms). Therefore, the authors suggested that briefly presented threat signals would be preferentially processed by a subcortical route. This is in line with an evolutionary explanation of threat detection, which suggests that threat is processed automatically via a retinotectal pathway including SC, pulvinar, and amygdala (LeDoux, 2000; Öhman & Mineka, 2001). If threat and luminance transients are both preferentially processed via this subcortical pathway, they could affect oculomotor programming in the SC more directly than, for instance, via cortical feedback from parietal or frontal areas. In the last section, I will elaborate more on the role of luminance processing in relation to emotional stimuli and oculomotor programming.

Oculomotor capture

The most obvious demonstration of bottom-up attentional capture by an emotional stimulus would be if gaze were directed involuntarily to an emotional stimulus that did not additionally elicit an endogenous shift of attention due to task demands or an exogenous shift of attention due to visual saliency. In other words, to demonstrate that emotional stimuli capture attention independently from visual saliency or task demands, the emotional stimulus should capture the eyes purely based on its emotional saliency. However, to my knowledge, no study has yet demonstrated oculomotor capture with an emotional distractor that is neither visually salient nor relevant for the task. Nevertheless, in the next section, I will discuss results that showed involuntary oculomotor capture by an emotional stimulus, although all emotional stimuli in these designs are to some extent visually salient or task relevant.

Oculomotor capture in forced choice and spatial cueing

In the forced-choice tasks by Bannerman et al. (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009) and Nummenmaa et al. (2006, Experiment 2), in which participants were instructed to either saccade to the emotional or to the neutral picture, the results showed that initial saccades were directed more often to the emotional pictures in both task conditions. That is, whether the instruction was to saccade to the emotional or to the neutral picture, the emotional picture captured the eyes more often. Likewise, in the spatial cueing study by Nummenmaa et al. (2009), in which the emotional and neutral pictures were presented bilaterally and were task irrelevant, more error saccades were directed to the location of the emotional picture when instructed (endogenously or exogenously) to saccade to the opposite location, suggesting bottom-up driven oculomotor capture by emotional stimuli.

In an antisaccade task by Kissler and Keil (2008), participants were asked to make a pro-saccade towards or an antisaccade away from a neutral, positive, or negative IAPS picture. Results showed that participants made more errors in the antisaccade condition when an emotional picture (negative or positive) was presented. However, more errors to emotional stimuli were observed only in the so-called fixation gap condition in which the fixation cross was extinguished before target onset. When a temporal gap is adopted in an oculomotor paradigm, activity of fixation neurons in the superior colliculus decreases, which disinhibits saccade-related neurons (Munoz & Wurtz, 1992). As a result, it becomes more difficult to suppress a reflexively driven saccade towards the onset stimulus (Munoz & Everling, 2004). Therefore, in the study by Kissler and Keil (2008), oculomotor control was more difficult in the presence of emotional information because participants were less able to suppress a saccade towards the emotional onset stimulus than towards the neutral stimulus.

The results from these studies described above seem to indicate that emotional stimuli indeed affect early attentional selection by capturing the eyes involuntarily. However, the emotional stimuli in these studies are all to some extent task relevant. That is, in the forced choice task (Bannerman, Milders, de Gelder, et al., 2009; Bannerman, Milders, & Sahraie, 2009; Nummenmaa et al., 2006, Experiment 2), the content of the stimulus indicated the location to which a saccade had to be made and in the antisaccade task (Kissler & Keil, 2008), the location of the stimulus indicated the direction of the saccade. Therefore, in both designs the stimulus needs to be attended covertly in order to perform the task. Thus, even though these saccades reflect oculomotor capture, the error saccades are not purely exogenously driven.

In the emotional version of the oculomotor capture paradigm (Theeuwes et al., 1998), in which an additional sudden onset distractor is presented among distractors and a target in a circular array, an effect of emotion on exogenous saccades was shown. In these studies, the threatening distractor captured not only covert attention but also the eyes more often than a nonthreatening distractor (Hopkins et al., 2016; Mulckhuyse & Dalmaijer, 2016). This was demonstrated even when participants were not explicitly aware of the contingency between the distractor and the threat (Hopkins et al., 2016). Other studies corroborate the finding that fear-conditioned distractors increase exogenously driven error saccades. For instance, in the study by Mulckhuyse et al. (2013), in which a distractor was presented next to the saccade path, the threatening distractor not only modulated the trajectory but also captured the eyes more often than the nonthreatening distractor.

Besides the involuntary nature of exogenous saccades, a typical finding of exogenous saccades is that they are executed very rapidly after stimulus onset. The mean latency of the error saccades to the threatening and nonthreatening distractor in the study by Mulckhuyse and Dalmaijer (2016) was indeed very short (<200 ms), suggesting exogenous oculomotor capture. However, saccade latency was not affected by threat.
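
The two measures contrasted here, how often the eyes are captured and how fast the error saccades are, can be summarized per condition with a minimal Python sketch. The trial format, the key names, and the function name are hypothetical and chosen only for illustration; they do not describe the analysis pipeline of any cited study.

```python
from statistics import mean


def capture_summary(trials):
    """Summarize oculomotor capture per distractor condition.

    Each trial is a dict with hypothetical keys:
      'condition'  : e.g. 'CS+' or 'CS-'
      'first_dest' : 'target' or 'distractor' (where the first saccade landed)
      'latency_ms' : latency of that first saccade in milliseconds
    """
    summary = {}
    for condition in sorted({t["condition"] for t in trials}):
        subset = [t for t in trials if t["condition"] == condition]
        # Error saccades are first saccades that landed on the distractor.
        errors = [t["latency_ms"] for t in subset if t["first_dest"] == "distractor"]
        summary[condition] = {
            "capture_rate": len(errors) / len(subset),
            "mean_error_latency_ms": mean(errors) if errors else None,
        }
    return summary
```

A higher `capture_rate` for the threatening than for the nonthreatening condition, combined with similar values of `mean_error_latency_ms`, would correspond to the pattern described above: more oculomotor capture by threat without a modulation of the latency of the error saccades.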

In the study by Nissens et al. (2017), the fear-conditioned distractor was not visually more salient than the other distractors, but it was relevant for the task because one had to be fast in this trial to avoid a shock. Nevertheless, even though this warning would most likely prevent looking at the distractor, the threatening distractor captured the eyes more often. In addition, in contrast to Mulckhuyse and Dalmaijer (2016), Nissens et al. (2017) found that more short latency saccades were directed toward the threatening distractor than toward the nonthreatening distractor. Nevertheless, the fastest error saccades to the threatening distractor were not any faster than those to the nonthreatening distractor. Moreover, the shortest saccade latencies to the distractors were relatively long, that is, over 200 ms, whereas most latencies of purely stimulus-driven saccades are typically shorter than 200 ms (Godijn & Theeuwes, 2002; Mulckhuyse & Dalmaijer, 2016; Trappenberg et al., 2001). In addition, the CS+ distractor signaled not only threat; it also meant that in that particular trial, one could avoid a shock by being fast. Therefore, the emotional stimulus was to some extent relevant to the task, and the findings do not reflect pure automatic oculomotor capture by threat.

In contrast to studies with fear-conditioned distractors, the visual-search studies with emotional expressions (DeVue & Grimshaw, 2017; Hunt et al., 2007a) did not find any evidence of more oculomotor capture by emotional expressions. Moreover, in the study of DeVue and Grimshaw (2017), the latencies of the error saccades to the faces with a neutral as well as an emotional expression were all below 200 ms. However, similar to Mulckhuyse and Dalmaijer (2016), saccade latency in these oculomotor capture trials was not modulated by the emotional expression of the faces.

In sum, although threatening distractor stimuli possibly affect the oculomotor system early in time (Nissens et al., 2017), a modulation of the very short latency saccades (<200 ms) has not yet been found (DeVue & Grimshaw, 2017; Mulckhuyse & Dalmaijer, 2016). A possible explanation might be a ceiling effect for saccades initiated under 200 ms. Alternatively, the lack of a finding on latency could imply that a visually salient threatening distractor is not processed faster than a visually salient nonthreatening distractor, as it does not reach the saccade map any faster. Research manipulating saccade latency, for example with a gap paradigm (McSorley et al., 2006, 2009), could give more insight into these early temporal effects of emotion on exogenously driven saccades. Furthermore, in visual search with emotional expressions, fearful expressions should be included as they might yield different results. Note that in the visual-search experiments with faces described above, only angry expressions were used as a negative emotional distractor. It is possible that a fearful expression, which shows more eye white, would affect the oculomotor system more strongly (see for reviews on facial expression and covert attention Frischen, Eastwood, & Smilek, 2008; Vuilleumier, 2002). The emotional value of a fearful expression is preferentially processed via low spatial frequencies and, similar to luminance transients, low spatial frequencies are processed via the magnocellular pathway (Vuilleumier et al., 2003), which provides the most dominant subcortical input to the SC, controlling oculomotor behavior (Lang, Öhman, & Vaitl, 1988) (Table 1).

Table 1 Description and main results of the described studies investigating eye movement behavior in response to emotional threatening stimuli

Neural mechanisms involved in emotional modulation of oculomotor behavior

Most research that investigated the neural mechanisms involved in the control of oculomotor behavior focused on bottom-up visual saliency and top-down goal relevance information to explain oculomotor behavior. The SC, or more specifically the superficial layers of the SC, is thought to be involved in saliency detection (White, Kan, Levy, Itti, & Munoz, 2017), whereas a more extended network, including parietal and frontal areas, is involved in signaling goal-relevant information (Schall, 1995). As discussed earlier, bottom-up and top-down information are integrated in a so-called priority map (Fecteau & Munoz, 2006), which may be situated in a distributed network including SC, and parietal and frontal areas, such as the lateral intraparietal area in monkeys (Bisley & Goldberg, 2010), homologue to the intraparietal sulcus (IPS) in humans, and frontal eye fields (FEF; Schall, 2002). The priority map may also be confined to the intermediate layers of the SC (White et al., 2017). The intermediate layers of the SC are associated with multisensory processing and visual-motor-related processes for shifting attention and gaze (Krauzlis, Lovejoy, & Zenon, 2013). Whereas the superficial layers receive only visual input (either directly from the retina or indirectly from visual cortex), the intermediate layers also receive input from the parietal and frontal areas and the basal ganglia, allowing for integration of bottom-up and top-down information (Krauzlis et al., 2013). A key structure involved in processing threatening information is the amygdala (Davis & Whalen, 2001), which receives highly processed visual information along the ventral pathway as well as coarse visual information from subcortical input (Vuilleumier, 2002, 2005). Although amygdala activity is associated with fast detection of threat (LeDoux, 2000; Öhman & Mineka, 2001), activity in the amygdala itself does not shift gaze.
Therefore, if we assume that it is amygdala activity that modulates oculomotor behavior, how would this occur?

One possibility is through a subcortical loop involving the SC, the pulvinar and amygdala (see Fig. 2, red line). Indeed, in rodents, primates, and humans, it has been shown that visual projections connecting SC with the amygdala through the pulvinar exist (Day-Brown, Wei, Chomsung, Petry, & Bickford, 2010; LeDoux, 1998; Linke, De Lima, Schwegler, & Pape, 1999; Tamietto, Pullens, de Gelder, Weiskrantz, & Goebel, 2012). This pathway is mainly magnocellular and would therefore explain the findings that especially threatening stimuli which stimulate magnocellular processing—such as luminance transients (fear-conditioned stimuli presented as onsets) and low spatial frequency information (fearful expression of faces)—modulate the oculomotor system. This interpretation suggests that oculomotor control is modulated by emotional stimuli in a bottom-up manner. Moreover, this would indicate that the pulvinar and amygdala, besides the superficial layers of the SC, are part of an emotional saliency map.

Fig. 2

Simplified model of the oculomotor system including the amygdala and three possible pathways by which the amygdala may modulate oculomotor behavior. In red, a subcortical loop, connecting the amygdala with the SC through the pulvinar. In blue and green, cortical connections by which the amygdala may amplify sensory processing in visual areas (blue), or by which the amygdala enhances processing of coarse visual features together with frontal areas (green). (Color figure online)

However, the idea of a direct retinotectal pathway to the amygdala has been heavily debated, and, so far, clear-cut evidence for functional processing via this pathway in humans is lacking (Pessoa & Adolphs, 2010). Another possible mechanism by which amygdala activity modulates the oculomotor system would be via cortical connections. The amygdala has reciprocal connections with many visual areas as well as with frontal areas. It is possible that amygdala activity amplifies sensory processing of threatening stimuli in the visual cortex, which is then projected back to the SC (see Fig. 2, blue lines). However, recently it has been shown that neurons in the superficial layers of the SC detect visual saliency before neurons in V1 do (White et al., 2017). Consequently, in this scenario, emotional saliency would be detected later than visual saliency, which would contradict any evolutionary explanation suggesting that, above all, threat needs to be prioritized. Other cortical projections from the amygdala to frontal areas, such as the orbitofrontal cortex (OFC), could play a role in the emotional modulation of attentional selection (see Fig. 2, green lines). The OFC has been implicated in fast recognition of the "gist" of a scene based on feedforward projections processing low spatial frequency information (Bar, 2003; Barbas, 2015; Kveraga, Boshyan, & Bar, 2007), possibly facilitating detection of threat (Blair, Morris, Frith, Perrett, & Dolan, 1999; Pourtois et al., 2013; Rempel-Clower, 2007; Timbie & Barbas, 2015). This would be consistent with the hypothesis that threatening stimuli are preferentially processed via fast magnocellular pathways (Vuilleumier et al., 2003). Accordingly, the amygdala would then modulate the oculomotor system via frontal areas including OFC.

Most likely, the amygdala is not the only source that detects emotional saliency and modulates selection (Pessoa & Adolphs, 2010; Vuilleumier, 2015). Moreover, modulation of oculomotor programming may occur in multiple ways and through multiple pathways (Vuilleumier, 2015), plausibly also depending on the other defining features of a stimulus. In daily life, an emotional stimulus is almost never exclusively emotionally salient and often coexists with physical salience or goal relevance. Moreover, learning and selection history (Awh et al., 2012) may play a major role in attentional selection of emotional stimuli. In the current framework of Awh et al. (2012), selection history emphasizes reward learning, but similar mechanisms may account for discriminative fear conditioning, in which participants learn to fear a specific stimulus (Belopolsky, 2015). Future eye-movement research should take these considerations into account when trying to identify how, when, and where emotion modulates attentional selection.

Conclusion

In the present review, I have discussed several eye-movement studies investigating the influence of emotional stimuli on attentional selection. Results demonstrate that emotional stimuli indeed capture covert attention and affect subsequent saccadic behavior, facilitating saccades toward their location and interfering with saccades toward the opposite location. Moreover, threatening stimuli capture the eyes more often than do neutral stimuli. In particular, emotional stimuli presented as sudden onsets seem to affect the oculomotor system, suggesting biased magnocellular processing of emotional stimuli. However, fast oculomotor capture by emotional stimuli that are neither visually salient nor task relevant has not yet been demonstrated. Therefore, the question of whether emotional attention can act independently from exogenous and endogenous attention to potentiate perception has not yet been answered with eye-movement research. However, the results from the studies reviewed above suggest that emotion modulates endogenous and exogenous attentional processes rather than acting as a separate attention system. Future work could try to dissociate emotional attention from exogenous and endogenous attention in order to understand the mechanisms by which emotional stimuli affect visual and attentional selection.