In 1978, Premack and Woodruff (1978) published a landmark article in which they introduced the theory-of-mind (ToM) concept. The authors stated that a ToM can be assumed to be possessed by an individual that can impute mental states to him- or herself and others. Empirically, Premack and Woodruff’s article solely concerned whether the chimpanzee is able to make such mental state attributions. Thus, the ToM notion was originally used within an animal context and became a useful way of characterizing animal cognitive ability in a variety of species (e.g., Dally, Emery, & Clayton, 2005; Penn & Povinelli, 2007). Soon after Premack and Woodruff’s article appeared, a number of developmental psychologists applied the ToM idea to human infants (e.g., Wimmer & Perner, 1983), and the notion has now been applied across a range of contexts including, for instance, schizophrenia (Harrington, Siegert, & McClure, 2005), autism (Baron-Cohen, 2000), Alzheimer’s disease (Gregory, Lough, Stone, Baron-Cohen, & Hodges, 2002), decision-making (Torralva et al., 2007), and evolutionary psychology (Povinelli & Preuss, 1995).

A number of authors have recently shown that mental state attributions can occur during gaze cueing, in which the observation of where another person is looking influences attention allocation in the observer (Nuku & Bekkering, 2008; Teufel, Alexis, Clayton, & Davis, 2010; Teufel et al., 2009; Teufel, Fletcher, & Davis, 2010). In the gaze cueing paradigm, a face is presented in the center of a display with its eyes and/or head directed to the left or right. A target is then presented either at the gazed-at location or in the opposite hemifield. The results typically show that the reaction time (RT) to determine the identity or presence of a target is reduced when it is presented in the gazed-at position (Friesen & Kingstone, 1998; Langton & Bruce, 1999). This is usually taken as evidence that seeing a gaze triggers a shift of attention in the observer. The basic gaze cueing effect is highly robust, and many variations of this paradigm have been developed, all aimed at understanding various aspects of social cognition (e.g., Frischen & Tipper, 2004; Hietanen, 2002; Kuhn, Tatler, & Cole, 2009). The most common explanation suggests that gaze cueing is a reflexive/bottom-up process that is driven by the mechanics of eye deviation perception (e.g., Baron-Cohen, 1995; Bayliss, di Pellegrino, & Tipper, 2004; Driver et al., 1999; Fernandez-Duque & Baird, 2005). According to this view, attention is shifted from the eyes because they deviate toward the gazed location. In addition to this explanation, a number of studies have demonstrated that mental state attribution can modulate the effect. Indeed, not only does gaze indicate where a person is looking, it also suggests that the individual is attending/perceiving something or someone at the gazed location. As Calder et al. (2002) pointed out, gaze “implies that the person may have some intention or goal towards this particular object. In other words, gaze engages the mechanisms involved in the attribution of intentions and goals to others” (p. 1130). In support of this notion, Nuku and Bekkering (2008) undertook a variant of the basic gaze cueing experiment in which they manipulated the agent’s ability to see. In their experiments, the gazer’s eyes were either closed versus open, or blocked out by a dark rectangle versus covered by sunglasses. The authors were thus able to examine “whether we infer that the agent is physically able to attend the target” (p. 340). Their results showed larger cueing effects when the agent was able to see the target. This clearly suggests that inferring the agent’s mental state (i.e., “seeing” vs. “not seeing”) influences the degree to which an agent shifts an observer’s attention.

Teufel et al. (2010a) and others (e.g., Caron, Butler, & Brooks, 2002) have pointed out, however, that the kind of design used by Nuku and Bekkering (2008) confounds potential mental state attribution and properties of the stimuli. For instance, Nuku and Bekkering’s experiments not only manipulated the agent’s perception, but also characteristics of the agent’s eye region, which may have generated the results obtained. Teufel et al. (2010a) eliminated this potential confound by presenting agents who wore mirrored goggles and telling their participants that these individuals could either see or not see (i.e., the goggles were either transparent or opaque). Importantly, therefore, the inducing stimuli were identical in both seeing conditions, with only the participants’ belief being manipulated. As in Nuku and Bekkering’s study, Teufel et al. (2010a) observed greater gaze cueing when participants were informed that the agent could see. In a second experiment, Teufel et al. (2010a) manipulated the probability with which the face cued the target location, such that the target was twice as likely to occur at the uncued location. It is known that gaze cues are able to shift attention even when an observer knows that a target is more likely to appear at a non-gazed-at location (e.g., Driver et al., 1999). Teufel et al. (2010a) found that participants were only able to voluntarily shift their attention away from the gazed location when told the agent could not see through the goggles. As with Nuku and Bekkering’s observations, this shows that gaze cueing can be modulated by the observers’ beliefs about whether the agent can or cannot see. In a further study using mirrored goggles, Teufel et al. (2009) employed a gaze perception aftereffect in which prolonged exposure to a face gazing in one direction altered subsequent perception of where the face was looking (Jenkins, Beaver, & Calder, 2006). Teufel et al. (2009) reported that this effect was enhanced when the observer believed that the agent could see through the goggles.

Langton (2009), however, has urged caution in concluding that mental state attribution modulates gaze effects. Langton suggested the possibility that the important attribution may concern whether or not the agent’s perceptual mechanisms are functioning, rather than its mental state. Langton also made the point that typical gaze cueing studies have presented an isolated gazer that is not actually looking at anything. Furthermore, as Teufel et al. (2010a) also pointed out, the mental state account does not concur with one of the basic findings from the large body of gaze cueing work: Attentional shifts induced by a gazing agent appear to be largely reflexive. For example, gaze-cued shifts of attention are characterized by their rapid time course and resistance to cognitive control (e.g., Driver et al., 1999). Additionally, objects that have no mental state (e.g., a glove) but that incorporate a pair of eyes are also effective in shifting attention to the looked-at direction (e.g., Quadflieg, Mason, & Macrae, 2004), and gaze cueing is unaffected by cognitive load (Law, Langton, & Logie, 2009). These findings suggest that gaze cueing is largely controlled by bottom-up mechanisms, with little contribution from higher processes that are responsible for mental state attribution.

These studies suggest that mental state attribution is sufficient to modulate gaze cueing. However, one of the questions that follow from the Nuku and Bekkering (2008) study is whether mental state attribution necessarily modulates gaze cueing. Evidence for this has come from Samson, Apperly, Braithwaite, Andrews, and Bodley Scott (2010), who argued that humans “spontaneously” compute the perspective of another individual. In their basic experiment, an image of a room was shown. In the center was a human avatar that looked either toward the left-hand or the right-hand wall. The participants were asked to judge the number of discs located on the two walls and were required to do this from either their own or the avatar’s perspective. Crucially, the experimenters manipulated the consistency of the avatar’s and the participant’s perspective; on some trials, the avatar and participant could see the same number of discs, whilst on other trials they could see different numbers of discs. For example, if the avatar looked to the left-hand wall and one disc was located on each of the two walls, the avatar saw one disc and the participant, by virtue of seeing the whole room, saw two. By contrast, if two discs appeared on the left-hand wall and none on the right, both the participant and avatar saw the same number of discs (that is, two, both located on the left-hand wall). Samson et al.’s central results showed that RTs to make the disc number judgment were reduced when the avatar’s viewpoint was consistent with the participant’s, relative to when their viewpoints were inconsistent. Importantly, this occurred even when participants were told to ignore the avatar’s perspective. The authors concluded that these results were due to the discs being “seen by the other person” (original italics) and that computation of other people’s perspective occurs spontaneously. If humans do indeed spontaneously compute another’s perspective, and if this computation affects spatial attention (as was suggested by Samson et al., 2010), attentional modulation by an avatar’s perspective should also occur in gaze cueing tasks.

In the present work, we conducted three gaze cueing experiments in which we manipulated the agent’s perspective by employing a technique commonly used in studies that assess mental state attributions of seeing in nonhuman animals (e.g., Hare, Call, & Tomasello, 2001). Animal behavior work often uses a physical barrier positioned such that it either allows a stimulus to be seen or occludes it. For instance, a chimpanzee may be tested to determine whether it knows that another chimpanzee is unable to see a food item, due to the position of the occluding barrier (see Footnote 1). Similarly, rather than changing some aspects of the cueing agent, we placed a physical barrier on either side of the agent. On “nonseeing” trials, these barriers fully occluded targets presented to the left or the right. By contrast, on “seeing” trials the barriers were moved so that they allowed the target to be seen. Thus, our use of physical barriers avoided the potential confounds that arise when aspects of the gazing agent itself are manipulated, such as when goggles are worn or the eyes are blanked out. If the attribution of seeing necessarily modulates gaze cueing, then the effect should be modified (e.g., decreased) when the agent’s vision is restricted by the barriers.

Experiment 1

In Experiment 1, the cue was a photograph of a female model whose head and gaze were oriented to the side (see Fig. 1). We also varied the interval between the onset of the cue and target to assess whether any mental state attribution effect changed over time.

Fig. 1. Stimuli used in Experiment 1. This example shows a valid trial in the “seeing” condition.

Method

Participants

A total of 38 participants from the University of Essex took part in exchange for course credit.

Stimuli and apparatus

The gaze cue agent was the head of a female 30 years of age. She looked out from a cardboard box that measured 18.9° in height and 12.8° in width. Door-like structures were incorporated into the sides of the box. When these doors were open, the agent could look out to the sides, but not when the doors were closed. This manipulation therefore generated the seeing and nonseeing conditions. When gazing to the side, the model was asked to also turn her head. The targets were black letters, S and H (3.5° high, 3.3° wide), that were placed to the left or right during the photographing. Thus, the model was actually looking at the letters. We edited the photographs so that only the cardboard box, the model, and the target letters were visible. The experiment was driven by an eMac computer incorporating a CRT monitor.

Design and procedure

A within-participants 2 × 2 × 3 factorial design was used. The first factor manipulated Cue Validity (valid, invalid), the second factor manipulated Visibility (seeing, nonseeing), and the third factor manipulated the Stimulus Onset Asynchrony (SOA) between the appearance of the cue and target (100, 400, and 800 ms). Although work in this field does not always use the same SOA values, the values that are typically employed are designed to index early reflexive processes and later, top-down mechanisms. In order to ensure that any mental state attribution did not need to be computed trial by trial, the visibility conditions were blocked and their presentation order counterbalanced. The SOA and validity manipulations were presented within each block in random order. The two blocks of trials in the experiment presented the (empty) cardboard box as a background with a fixation point located in its middle. Each trial began with the presentation of the model for 100, 400, or 800 ms, followed by the target. This display remained until response, and each new trial was initiated by the participant’s response on the previous trial. Participants were explicitly told that the face could either see or not see the target letter, depending on which barrier was presented. Participants were informed of this at the beginning of each visibility block. In all, 36 valid and 36 invalid trials were presented in both visibility blocks for each SOA, thus generating 432 trials in total. The numbers of different trial types were balanced, such that target types and target locations were equated. The face validly cued the target location on 50 % of trials, and participants were informed of this contingency. Twenty-four practice trials were included.
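The trial structure described above can be made concrete with a short sketch. It is illustrative only and is not the software used to run the experiment: the function names, the dictionary layout, and the random seed are assumptions, and the balancing of target identity and target location is omitted for brevity.

```python
import itertools
import random

VALIDITY = ("valid", "invalid")
VISIBILITY = ("seeing", "nonseeing")  # blocked factor
SOA_MS = (100, 400, 800)
TRIALS_PER_CELL = 36  # per validity x visibility x SOA cell, as stated in the text


def build_block(visibility, rng):
    """Return one visibility block with validity and SOA randomly intermixed."""
    trials = [
        {"visibility": visibility, "validity": validity, "soa_ms": soa}
        for validity, soa in itertools.product(VALIDITY, SOA_MS)
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)
    return trials


def build_session(first_block, seed=0):
    """Return both blocks; first_block sets the counterbalanced block order."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    order = [first_block] + [v for v in VISIBILITY if v != first_block]
    return [trial for vis in order for trial in build_block(vis, rng)]


session = build_session("seeing")
n_valid = sum(t["validity"] == "valid" for t in session)
print(len(session), n_valid / len(session))
```

Under these assumptions the session contains 2 × 2 × 3 × 36 = 432 trials, half of them valid, which matches the nonpredictive 50 % contingency reported to participants.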

Check for the validity of our visibility manipulation

In addition to the experiment proper, we also ran a test to determine whether our visibility manipulation was effective. Five participants who did not take part in the main Experiment 1 were shown 12 examples of our display, with doors either open or closed, and were asked which letters the model could see. The letters (Ss and/or Hs) were located in the gazed-at direction, either outside the box, on the inside door of the box (when the doors were closed), or in both the outside and inside positions. All five participants were 100 % correct. Thus, for instance, when the doors were closed, all stated that the model could only see the letter positioned inside the door of the box, and not the letter outside.

Results and discussion

The data from two participants were excluded because their error rates were greater than 20 %. An additional 3.4 % of responses were considered outliers (two SDs above or below a condition mean for each participant) and omitted from further analysis. Figure 2 shows the mean RTs for each condition. An ANOVA with Validity, Visibility, and SOA as within-participants factors revealed significant main effects of validity, F(1, 35) = 19.4, p < .001, η² = .36, and SOA, F(2, 70) = 51.1, p < .0001, η² = .59, but no significant main effect of visibility, F(1, 35) < 1. The interaction between validity and visibility was not reliable, F(1, 35) = 2.6, p > .11, nor was the three-way interaction, F(1, 35) < 1. The interaction between validity and SOA was, however, significant, F(2, 70) = 13.1, p < .001, η² = .27. With respect to the error rates, no effects were significant, all Fs < 2.9, all ps > .05.
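The outlier criterion (responses more than two SDs from a participant's condition mean) can be sketched as follows. This is a minimal illustration and not the authors' analysis script: the function name and the example RTs are invented, and the use of the sample SD (rather than the population SD) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def trim_outliers(rts, n_sd=2.0):
    """Drop RTs more than n_sd sample SDs from the mean of one
    participant-by-condition cell (sample SD is an assumption)."""
    if len(rts) < 2:
        return list(rts)
    cell_mean, cell_sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - cell_mean) <= n_sd * cell_sd]


# Made-up RTs (ms) for a single participant in a single condition.
cell = [512, 498, 530, 505, 521, 490, 515, 508, 1400]
kept = trim_outliers(cell)
print(len(cell) - len(kept), "response(s) removed")
```

The trimming is applied separately within each cell, so a response that is slow for one condition is judged only against that participant's other responses in the same condition.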

Fig. 2. Mean reaction times (RTs) and error rates from Experiment 1, together with standard error bars.

The first notable aspect of these results is the presence of an overall gaze cueing effect. Participants were faster to identify the target when it appeared in the cued location. This replicates the many previous reports of eye gaze triggering a shift in an observer’s attention (e.g., Friesen & Kingstone, 1998; Langton & Bruce, 1999). The significant Validity × SOA interaction is also in line with other gaze cueing studies that have demonstrated that the gaze cueing effect builds up over time (e.g., Driver et al., 1999). Indeed, Fig. 2 shows that no cueing effect occurred at the 100-ms SOA. The most important aspect of Experiment 1, however, is the absence of any visibility effect on gaze cueing. The demonstration that the face cued attention despite having its vision restricted suggests that the mental state attribution of “seeing” does not necessarily modulate the gaze cueing effect. Given the lack of a gaze effect at the 100-ms SOA, we performed additional analyses to assess any influence of visibility on the gaze effect in the 400-ms SOA condition. This condition might be the important one in which to examine whether mental state influences gaze cueing, because Teufel et al. (2010a) observed modulation of gaze cueing at a 400-ms SOA. Our results, however, showed no significant interaction between validity and visibility, F(1, 35) = 1.9, p > .17. Indeed, if one considers the means only (see Fig. 2), a larger cueing effect occurred in the nonvisible condition. This was also apparent in the 800-ms SOA condition. In sum, the results from Experiment 1 revealed a robust cueing effect, but one that was not modulated according to whether the agent could or could not see the target.

It should be noted that a previous study by Kawai (2011) claimed that barriers modulated gaze cueing. However, close inspection of the data shows clear evidence of gaze cueing effects at 105 ms in one of the “target occluded” conditions. Furthermore, the occluders also disrupted nonsocial shifts of attention triggered by arrow cues. Thus, the variation in cueing effects observed in that study was unlikely to be caused by changes to the mental state attributions made about the cue.

Experiment 2

The purpose of Experiment 2 was to assess our central question concerning mental state attribution and gaze cueing, using a different behavioral measure from that used previously. In Experiment 1 we had employed speeded motor responses to measure the gaze cueing effect. However, processes indexed by RTs may be adversely affected by response noise. As Milliken and Tipper (1998) pointed out, “the act of measurement may contaminate the measurement itself” (p. 216). An alternative to measuring processes that involve response-end mechanisms is to present stimuli under degraded conditions (e.g., brief displays) and measure the accuracy of perception. Because such measurements involve participants making a purely perceptual decision, rather than emitting a speeded motor response, they are less contaminated by response noise. The use of accuracy as a potentially more sensitive measurement than RT has previously been noted by many authors. For instance, Santee and Egeth (1982) suggested that under the “data-limited” conditions (Norman & Bobrow, 1975) of briefly presented displays, accuracy measures are more sensitive to perceptual processes (see also Gellatly, Cole, Fox, & Johnson, 2003; Milliken & Tipper, 1998; Rafal, Smith, Krantz, Cohen, & Brennan, 1990; Skarratt, Gellatly, Cole, Pilling, & Hulleman, 2014). Empirical support for this has come from Cole, Kuhn, Heywood, and Kentridge (2009), who showed that although color “singletons” do not automatically attract attention when RT is used to index capture, they do so when a “one-shot” change detection method is used.

In Experiment 2, therefore, we employed a change detection task in which so-called change blindness was induced. Change blindness is the phenomenon whereby observers often fail to notice a change to a visual scene if the change is masked by simultaneous visual transients (e.g., Simons & Rensink, 2005). The rationale for our use of the procedure was based on the link between attention and the degree to which change blindness is induced (e.g., Rensink, O’Regan, & Clark, 1997; Smith & Schenk, 2008). If a stimulus has attentional priority, one should expect it to be less susceptible to change blindness than a stimulus that does not receive attentional priority (Cole, Kentridge, & Heywood, 2004; Cole & Kuhn, 2009, 2010; Pisella, Berberovic, & Mattingley, 2004; Ro, Russell, & Lavie, 2001; Scholl, 2000; Smith & Schenk, 2008, 2010). In Experiment 2, we used the one-shot variant of the procedure, in which the changed item occurred once only. Crucially, the change occurred either at a location gazed at by a cueing agent or elsewhere in the display. As in Experiment 1, the gazing agent could either see the same stimuli as the participant or not.

Method

Participants

A group of 18 undergraduate participants were recruited from Durham University.

Stimuli and apparatus

The agent gazed at one of four positions (top right, top left, bottom right, and bottom left; Fig. 3). For nonseeing trials, green barriers obscured the agent’s view of the probe stimuli. For the seeing trials, windows appeared in the barriers that allowed the agent a clear view of the targets presented at the bottom and top positions. The stimulus letters were drawn from the set E, U, O, P, S, F, H, L, and A. Stimuli were generated using a Cambridge Research Systems ViSaGe graphics card and displayed on a 17-in. Sony Trinitron CRT monitor with a refresh rate of 100 Hz. Responses were collected using a button-box with two response buttons. Participants were seated 57 cm from the monitor and had their heads supported by a chinrest.

Fig. 3. Stimuli used in Experiment 2. The avatar gazes to one of four possible target positions.

Design and procedure

A within-participants 2 × 2 factorial design was used. The first factor manipulated Cue Validity (valid, invalid), and the second factor manipulated Visibility (seeing, nonseeing). Trials began with the appearance of the environment and a fixation point at the center of the monitor for 1,000 ms. The letter stimuli then appeared for 1,000 ms, followed by the gaze cue, which was present for 100 ms. The entire stimulus array was then occluded by a black mask for 50 ms. This mask was replaced by the changed stimulus array and gaze cue, which was present until the participant responded. During a 2,000-ms intertrial interval, a fixation point was presented on a blank gray screen. Seeing and nonseeing trials were randomly interleaved. A total of 200 trials were presented, with 20 % of these being no-change trials. When the target was present, the agent validly cued the target location on 25 % of trials. Participants were instructed to report seeing a change only when they were confident that one of the letters had changed. In practice, this meant that the participant had to know either the location or the identity of the change. If they were unsure whether a change had occurred, they were instructed to report that they had not detected a change.
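The trial counts and display timings implied by this procedure can be worked through in a few lines. The sketch below only restates the reported proportions and durations; the variable names are illustrative, and the chance-level comparison assumes that changes were equally likely at each of the four positions, which the text does not state.

```python
TOTAL_TRIALS = 200
NO_CHANGE_PROPORTION = 0.20  # stated in the text
VALID_PROPORTION = 0.25      # of change trials, stated in the text
N_CUED_POSITIONS = 4         # top/bottom x left/right

no_change_trials = round(TOTAL_TRIALS * NO_CHANGE_PROPORTION)
change_trials = TOTAL_TRIALS - no_change_trials
valid_trials = round(change_trials * VALID_PROPORTION)
invalid_trials = change_trials - valid_trials

# Validity expected by chance if changes were equally likely at each position
# (an assumption; the text does not say how change locations were sampled).
chance_validity = 1 / N_CUED_POSITIONS

# Display phases preceding the changed array, in ms, as listed in the procedure.
PHASES_MS = [("fixation", 1000), ("letters", 1000), ("gaze cue", 100), ("mask", 50)]
pre_change_ms = sum(duration for _, duration in PHASES_MS)

print(no_change_trials, valid_trials, invalid_trials, chance_validity, pre_change_ms)
```

On this reading, a 25 % validity rate with four candidate positions equals the chance rate, so the gaze cue would carry no information about where the change would occur.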

Results and discussion

The overall false alarm rate was 6.2 %. Figure 4 shows the mean accuracies for the four conditions. An analysis of variance (ANOVA) with Validity and Visibility as within-participants factors revealed a significant main effect of validity, F(1, 17) = 16.9, p < .01, η² = .5, but no significant main effect of visibility, F(1, 17) < 1. The interaction was not significant, F(1, 17) < 1. As with Experiment 1, these data show a robust gaze cueing effect. However, despite the use of a different measure (i.e., nonspeeded detection accuracy), the effect was again not influenced by what the agent could see.

Fig. 4. Results from Experiment 2 (accuracy). Standard error bars are also shown.

Experiment 3

A growing number of studies have begun to examine visual cognition during real social interaction (Skarratt, Cole, & Kingstone, 2010). Such studies have led to some revisions of what is known about visual attention (see Skarratt, Cole, & Kuhn, 2012, for an extensive review). For instance, gaze cues were for a long time assumed to be unable to induce inhibition of return (IOR; Posner & Cohen, 1984). However, attention shifts generated by observing the eyes of a real person sitting opposite consistently produce large IOR (e.g., Cole, Skarratt, & Billing, 2012; Skarratt, Cole, & Kingstone, 2010; Welsh et al., 2005). One can argue that issues concerning mental state attributions would particularly benefit from experiments that involve interactions with real people. This is based on the assumption that it should be easier to compute the mental state of a real person than of a schematic or even photographed representation. In Experiment 3, therefore, we used a real person as the cue, who sat facing the participant before looking toward one of the two possible target locations (see Fig. 5). Physical barriers located on either side of the gazer allowed the targets to be seen or not seen. This experimental setup also controlled for a possible confound that may have existed in Experiments 1 and 2: Although participants in those experiments were informed that the lateral barriers either allowed the targets to be seen by the gaze cue or not, it was not entirely evident that this was inferred. Participants may not have actually believed that the barriers rendered the targets nonvisible in the occluded conditions. This could have been for many reasons, including poor depth cues that may not have adequately conveyed the intended positions of the targets. Presenting a real person adjacent to real barriers ensured no ambiguity as to what the “cue person” could see.

Fig. 5. Setup for Experiment 3. The image shows a valid trial in which the barrier occludes the cue’s visibility of the target. The inset image shows what the cue person saw during the first part of each trial.

Method

Participants

A group of 16 participants from the University of Essex took part in exchange for course credit.

Stimuli and apparatus

The gaze cue person was the third author. He sat approximately 160 cm from the participant with his back to a projector screen lit from behind. The visible part of the screen measured 98 cm in height and 175 cm in length. The targets could appear 65 cm to either the left or right of the cue person’s nose. Time was taken to ensure that the cue person’s nose was always located centrally between the target locations. The occluding barriers were extendable screens that measured 85 cm in height and were extended to be 40 cm wide. They were positioned on two tables located on either side of the cue person. The targets were black letters presented on a uniform white screen, and were an S and an H that measured 13 cm in height and 11 cm in width. The experiment was driven by a MacBook Pro, and responses were made via a Cedrus Button Box. A standard LCD monitor was additionally located behind the participant’s head (see below).

Design and procedure

A within-participants 2 × 2 factorial design was employed. As we reported previously, the two factors were Validity (valid, invalid) and Visibility (seeing, nonseeing). Each trial effectively began with the cue person returning his head/gaze from the side to look straight ahead and directly into the eyes of the participant. Approximately 500 ms after this head return was completed, a 3–2–1 visual countdown (each 500 ms) was presented to the cue person via a computer monitor located behind the participant (and above the participant’s head) that was only visible to the cue person. This countdown occurred at the top of the monitor on either the left or the right and informed the cue person which side he should look toward when the countdown was completed. The position of the monitor enabled this information to be seen peripherally by the cue person—that is, without the need to look away from the participant. This countdown procedure ensured that the cue person moved his head at almost the same moment on each trial. The target appeared exactly 600 ms after the countdown was completed. This exact timing was achieved via the use of a video splitter; a single computer presented identical information to both the cue person’s monitor and the participant’s screen. The countdown information was of course hidden from the participant (by a black cloth hung over the top of the screen). We estimated that the total head movement time was approximately 600 ms. Thus, the target appeared at about the same time as the cue person finished turning his head, or to put it another way, 600 ms after the cue person began his head movement. In the seeing condition, the cue person fixated the target.

The visibility condition was blocked and presentation order was counterbalanced. Manipulating visibility was achieved by placing the barriers such that they either touched the presentation screen (nonseeing) or were moved forward by 25 cm, allowing the cue person to see the targets. Although it was clearly evident that this barrier positioning rendered the targets visible or not visible to the cue person, each participant was asked to confirm that this was the case. All agreed. Every other aspect of the experiment was as reported previously.

Results and discussion

Using the same definition described previously, 2.2 % of responses were deemed to be outliers and omitted from further analysis. Figure 6 shows the mean RTs for each of the four conditions. An ANOVA revealed a significant main effect of validity, F(1, 15) = 34.6, p < .001, η² = .7, but no significant main effect of visibility, F(1, 15) = 1.8, p > .19. The interaction was not significant, F(1, 15) < 1. With respect to errors, there was no significant main effect of validity, F(1, 15) < 1, or visibility, F(1, 15) = 2.1, p > .17, and the interaction was also not significant, F(1, 15) = 1.4, p > .24. Overall, these results concur with those reported in Experiments 1 and 2. A cueing effect was observed but was not modulated by what the cueing agent could see (see Footnote 2).

Fig. 6. Mean reaction times (RTs) and error rates from Experiment 3. Standard error bars are also shown.

General discussion

The ability to infer the mental states of other individuals is one of the central tenets of social cognition. Furthermore, the orienting of one’s attention around a visual scene on the basis of the behavior of another individual (i.e., social attention) can clearly occur as a result of a mental state attribution, as when we orient gaze because we would like to know what another person is looking at. Across three experiments, we assessed whether attributing a mental state of “seeing” or “nonseeing” necessarily modulates the gaze cueing effect. The results showed robust cueing effects, independent of whether or not the gaze cue could see the target. This suggests that the computation of the mental state of a gazer does not have a mandatory effect on gaze cueing. In the following section, we propose a new theoretical model of gaze cueing. In this model, we will also attempt to explain why mental state attribution effects vary across different experimental designs and to make new predictions about the conditions required in order to observe mental state effects on gaze cueing.

A schema theory of gaze cueing

To understand the boundary conditions of gaze cueing, we propose that gaze cueing can be considered within the theory of action control proposed by Norman and Shallice (1986) and Cooper and Shallice (2000; in this context, “actions” can refer to cognitive operations and motor outputs). Central to this view is the idea that action control is achieved by the activation of program-like representations called “schemas.” These schemas specify highly learned sequences of actions required to achieve a specific goal. Schemas are activated in a bottom-up fashion in response to the properties of the external environment. However, the threshold for activation of a schema can be modulated using top-down executive control (this construct corresponds to an “attentional resource”; Kahneman, 1973). Once activated, the operations specified by the schema are executed automatically (i.e., they are fast, are difficult to suppress, and do not require attention). With respect to social attention, we propose that repeated association between observed gaze direction and relevant stimuli leads to the formation of a gaze cueing schema that allows very rapid orienting of spatial attention to the gazed-at location. This idea is a more formal expression of the view that social attention is the consequence of learned associations, rather than an innate response to biological stimuli (e.g., Brignani, Guzzon, Marzi, & Miniussi, 2009).

The advantage of placing gaze cueing in this framework is that the factors that mediate the selection/deselection of schemas have been precisely specified by Cooper and Shallice (2000). Specifically, Cooper and Shallice argued that schemas have an activation value, which is the threshold that incoming excitatory influences must surpass in order to activate the schema. The activation value can be influenced by experience, such that repeatedly activating a schema lowers its activation value, and by top-down executive control processes, which can raise or lower the activation values. The level of excitation is determined by the presence of stimuli that match the trigger properties of the schema and by lateral influences from competing schemas. The probability of a schema becoming activated therefore depends on an interaction between the excitatory power of incoming sensory stimulation, practice, and the level of excitatory/inhibitory influence being exercised by the central executive.
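The interaction described above can be illustrated with a toy calculation. This is only a sketch and not Cooper and Shallice's (2000) implementation: the logistic link between excitation and activation probability, the linear practice term, and all parameter names and values (`practice_rate`, `noise`, the numbers in the demonstration) are assumptions introduced here for illustration.

```python
import math


def effective_threshold(base, practice_trials, top_down, practice_rate=0.01):
    """Hypothetical threshold: practice lowers it, top-down control shifts it.

    top_down > 0 stands for inhibitory control (raises the threshold);
    top_down < 0 stands for facilitatory control (lowers the threshold).
    """
    return base - practice_rate * practice_trials + top_down


def activation_probability(excitation, threshold, noise=1.0):
    """Assumed logistic link: probability that excitation surpasses the threshold."""
    return 1.0 / (1.0 + math.exp(-(excitation - threshold) / noise))


# Illustrative values only: the same inhibitory signal applied to a
# strongly and a weakly excited schema.
for label, excitation in (("strong input", 4.0), ("weak input", 1.5)):
    p_free = activation_probability(excitation, effective_threshold(1.0, 0, 0.0))
    p_inhibited = activation_probability(excitation, effective_threshold(1.0, 0, 0.5))
    print(f"{label}: p = {p_free:.3f} without inhibition, {p_inhibited:.3f} with inhibition")
```

Under these assumed values, the same top-down signal changes the activation probability more when the incoming excitation is weak than when it is strong, which is the sense in which the three factors interact rather than simply add.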

The probabilistic nature of schema activation is important, because it explains how a threshold model can lead to modulations of the magnitude of cueing effects (e.g., Wiese, Wykowska, Zwickel, & Müller, 2012). The rationale can be best explained with an example. Let us imagine two experiments. The first is the canonical gaze cueing task, in which the top-down influence on activation values is weak. The second task includes a manipulation of mental state in which the top-down influence on activation values is powerful. Let us assume a 90% probability of activating the gaze cue schema in the first of these experiments, a 50% probability of schema activation in the second, and that the benefit of attending to a cued location is a 15-ms reduction in RT. In the first experiment, for every ten valid trials, the cued location will be attended in nine, producing an average benefit of 9 × 15/10 = 13.5 ms. In the second experiment, the cued location will be attended on five of ten trials, producing an average benefit of 5 × 15/10 = 7.5 ms. As the probability of schema activation approaches zero, the average benefit likewise approaches zero. Thus, a probabilistic threshold model of gaze cueing can account for both the abolition and attenuation of cueing effects.
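The arithmetic of this example can be written as a short calculation. The sketch below assumes, as the example does, that an activated schema always yields the full 15-ms benefit and an inactive schema yields none; the function name `expected_cueing_benefit` is introduced here purely for illustration.

```python
def expected_cueing_benefit(p_activation, benefit_ms=15.0):
    """Mean RT benefit (ms) on valid trials when the schema fires with probability p_activation."""
    if not 0.0 <= p_activation <= 1.0:
        raise ValueError("p_activation must lie between 0 and 1")
    return p_activation * benefit_ms


# The two hypothetical experiments from the text, plus complete suppression.
canonical = expected_cueing_benefit(0.9)      # weak top-down influence
mental_state = expected_cueing_benefit(0.5)   # powerful top-down influence
suppressed = expected_cueing_benefit(0.0)     # schema never activated

print(f"canonical task: {canonical:.1f} ms")
print(f"mental state manipulation: {mental_state:.1f} ms")
print(f"schema fully suppressed: {suppressed:.1f} ms")
```

Because the mean benefit scales linearly with the activation probability, intermediate probabilities produce attenuated cueing effects and a probability of zero abolishes the effect, even though the benefit on any single trial is all-or-none.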

The schema theory of gaze cueing suggests a number of predictions about the conditions in which gaze cueing should be observed. First, if the cue information is powerful (e.g., it contains unambiguous information about gaze direction), the probability of engaging the gaze cueing schema would depend on the strength of the top-down control signals that regulate schema activation thresholds (i.e., how motivated is the observer to suppress the gaze cueing schema?). It would only be possible to inhibit the gaze cueing schema when the observer was very highly motivated to ignore gaze direction. Second, when the cue information is weaker (e.g., when the cue could be eyes or could be something else, or when gaze direction is ambiguous), schema activation would be more sensitive to the influences of executive control. In this case, the cueing effect should be attenuated by any manipulation that motivates the observer to inhibit the gaze cueing schema. These two predictions are consistent with the majority of the empirical data. Specifically, studies that have used unambiguous eye-gaze cues have tended to produce rapid, involuntary gaze cueing, even when participants know that the gaze direction is nonpredictive (e.g., Driver et al., 1999) and when contextual information suggests that gaze direction is irrelevant. In contrast, when some ambiguity regarding gaze direction is introduced by obscuring the eyes and using head-gaze as a cue (e.g., Nuku & Bekkering, 2008; Samson et al., 2010; Teufel et al., 2009) or by making the cues’ status as eyes ambiguous (Ristic & Kingstone, 2005), gaze cueing effects can be modulated by contextual information, such as knowledge that the cue can or cannot “see.” In this context, it should be noted that Wiese et al. (2012) manipulated observers’ beliefs about what the cue face “wants” to attend (the “intentional stance”). They reported that gaze cueing was attenuated when observers believed the cue did not want to see the targets, even when the gaze direction was unambiguous. This result can be accounted for by the schema theory, if one accepts that manipulations of intentional stance elicit more powerful top-down control of the cueing schema than do the line-of-sight manipulations used in other studies.

Third, the reflexive gaze cueing effect should follow a developmental trajectory, such that gaze cueing in infants and young children should be slow and under conscious control, but as the schema becomes established, the cueing effect should become increasingly fast and increasingly resistant to cognitive control. Thus, young children should show weak cueing effects, particularly under conditions of high cognitive load, whereas older children and adults should be unaffected by cognitive load (this second prediction is consistent with recent data from Law et al., 2009). There should also be a systematic reduction in the latency at which cueing effects can be observed as age increases. Fourth, modulation of the gaze cueing schema depends on the availability of executive resources, so reducing the availability of these resources by imposing cognitive load or engaging in ego depletion (Baumeister, Bratslavsky, Muraven, & Tice, 1998; Baumeister, Muraven, & Tice, 2000) should limit the capacity for schema control. The model therefore predicts that the modulation of gaze cueing by mental state attribution will be reduced or abolished under conditions of high cognitive load (see Schneider, Lam, Bayliss, & Dux, 2012, for support). Fifth, the usual response to seeing averted gaze is to orient the eyes to the gazed-at location, so a gaze cueing schema may produce concurrent activation of the oculomotor system. This prediction is consistent with recent evidence that gaze cues engage the oculomotor system (Grosbras, Laird, & Paus, 2005; Kuhn & Kingstone, 2009; Ricciardelli, Bricolo, Aglioti, & Chelazzi, 2002), although this activation is not required for gaze cueing (Friesen & Kingstone, 2003; Morgan, Ball, & Smith, 2014). Additionally, nothing is special about the social aspect of gaze cues. The model predicts that any overlearned cue–target association can become encoded as a schema, and thus show the same pattern of results as gaze cueing. This prediction is consistent with the well-established finding that arrow cues trigger attention shifts that are behaviorally similar to those triggered by gaze cues (Ristic, Friesen, & Kingstone, 2002; Stevens, West, Al-Aidroos, Weger, & Pratt, 2008; Tipples, 2002), and with recent evidence that overtraining any arbitrary association between stimulus property and spatial location can produce rapid, involuntary shifts of attention (Guzzon, Brignani, Miniussi, & Marzi, 2010).

Finally, the schema theory of gaze cueing explains why the phenomenon is observed in persons with autism (Leekam, Hunnisett, & Moore, 1998), a finding that is hard to explain from a mental state perspective. Specifically, because the cueing effect is the product of learning stimulus–response associations, a process that is intact in autism, people with autism should show reflexive cueing, assuming that they have been exposed to associations between gaze and relevant stimuli during development. However, these participants should experience problems with the modulation of gaze cueing in response to the mental states of a cue, because they do not attribute mental states to the gazer (they assume that it sees what they see), and so are not motivated to exert control over schema activation.

In summary, we have found that the attribution of the mental state of “seeing” to a gaze cue does not necessarily modulate gaze cueing effects. The irrelevance of an observed agent’s point of view was maintained for both depicted and real-life faces, and in tasks that indexed attention using both RT and response accuracy. We have proposed a schema theory of gaze cueing, which argues that mental state attribution can only influence reflexive gaze cueing when the information about gaze direction is ambiguous. This approach accounts for the failure to observe effects of mental state attribution in the present study as well as the positive results of previous studies, and it makes clear predictions about the results of future empirical studies.