The visual system is tasked with analyzing an enormous amount of information and one of its principle mechanisms for addressing this challenge is its ability to focus attention on a small region of the visual field. This capability is often described as a metaphorical spotlight (Posner, Snyder, & Davidson, 1980) or a zoom lens (Eriksen & St. James, 1986; Eriksen & Yeh, 1985) that can enhance the processing of information at a single region of the visual field to produce improvements in accuracy or faster reaction times for stimuli in the attended area.

One of the key questions driving attention research has been to understand whether this attentional focus is inherently restricted to a single location or whether it can be divided across multiple locations. The resolution of this question has critical implication for understanding the mechanisms that control the deployment of attention. For example, one proposed mechanism is “winner-take-all” (e.g., Itti & Koch, 2000), in which the most attended location in the visual field suppresses attention at all other locations. Winner-take-all is a simple, but powerful mechanism for explaining the tendency of attention to converge at one location in the visual field, to the exclusion of others.

However, several findings have provided evidence that attention can be divided into distinct regions in both sustained (McMains & Somers, 2004) and transient (Ansorge, 2004; Bichot, Cave, & Pashler, 1999; Dubois, Hamker, & VanRullen, 2009; Kawahara & Yamada, 2006; Kyllingsbæk & Bundesen, 2007) forms, and there have been corresponding efforts to clarify the debate as to what constitutes sufficient evidence of divided attention (Jans, Peters, & Weerd, 2010; but see Cave, Bush, & Taylor, 2010). These efforts have been accompanied by the development of theoretical and computational models of attention that explore candidate neural mechanisms for controlling attention (Cheal, Lyon, & Gottlob, 1994; Sandage, Trappenberg, & Klein, 2005; Zirnsak, Beuth, & Hamker, 2011). One lesson learned from some of these modeling efforts is that a system that behaves like a spotlight in most contexts can also exhibit divided attention in particular contexts (Sandage et al., 2005; Zirnsak et al., 2011).

One issue that has received relatively less focus is whether attention, when divided into two regions, provides the same effect as it does when restricted to a single location. This question has critical implications for our understanding of the mechanisms of visual perception. If a split attentional focus is less effective than a single attentional focus, the findings would support a limited-resource theory of attention in which a finite amount of processing resource can be deployed across one or more locations. However, if attention can be deployed at two locations with similar effects as at a single location, a limited-resource theory would be an insufficient explanation. To answer this question, in two experiments we tested the ability to report items from an RSVP stream with varying numbers of cues (Exp. 1) and varying temporal offsets between two targets (Exp. 2).

FormalPara Two modes of attentional deployment

The present experiments involve transient attention, as opposed to slower, sustained shifts of attention (Jonides, 1981). The key defining property of transient attention, for our purposes, is that it is triggered by a salient or task-relevant stimulus and begins within about 100 ms of the cue onset at the same location as the triggering stimulus (Nakayama & Mackeben, 1989; Wyble, Bowman, & Potter, 2009). This rapid deployment of attention has been suggested to be distinct from the slower, sustained form of attention (Berger, Henik, & Rafal, 2005). We will briefly review some of the most pertinent literature on both varieties of divided attention.

FormalPara Split deployment of sustained attention

Sustained shifts of attention are thought to be directed endogenously, by the volitional intent of the participant. Several studies have concerned the ability of attention to be divided between two locations in a sustained fashion. In such studies, participants typically split their attention between two distinct regions of a computer screen, and their attentional state is probed at key locations: at the attended locations, in the interstice (i.e., the space between the two targets), or in the area outside of the targets. These studies have involved behavioral (Awh & Pashler, 2000) as well as brain-imaging (McMains & Somers, 2004) data. Such studies typically uncover evidence that attention can indeed be split across two locations without attending to the region between them. However, Jans et al. (2010) argued against the generalizability of these findings, contending that human observers do not passively attend to two noncontiguous locations simultaneously, and thus that one’s ability to sustainably divide attention is a learned skill.

FormalPara Split deployment of transient attention

Attention can also be deployed rapidly in response to categorical targets (Wyble et al., 2009) or visual cues (Nakayama & Mackeben, 1989) at a given location. This is referred to as transient attention because the level of attention seems to follow a measurable time course. Directing attention with location cues causes shifts of attention to occur much more quickly than when endogenous cues are used, with observable benefits occurring less than 100 ms from the onset of the cue. In contrast, attention shifts produced by centrally presented cues require on the order of 300 ms to manifest (Müller & Rabbitt, 1989). A defining characteristic of transient attention is that the intended location of attention corresponds to the physical location of the cueing stimulus in the visual field, and this correspondence seems to have a particular significance for the visual system that enables the rapid deployment of attention.

Substantial evidence has indicated that shifts of stimulus-driven attention focus on a single location to the exclusion of others. For example, in studies of attentional dwell time (Duncan, Ward, & Shapiro, 1994; Petersen, Kyllingsbæk, & Bundesen, 2012), two targets are presented at two locations at different temporal offsets. The important finding of such studies has been that report of the second target (T2) suffers for a period of about 300 ms following presentation of the first target (T1), a finding that is similar to the attentional blink (Raymond, Shapiro, & Arnell, 1992), except that it occurs for spatially as well as for temporally offset stimuli. The dwell time finding suggests that a target stimulus triggers a shift of attention to its own location and that attention remains at that location for several hundred milliseconds, which makes it more difficult for stimuli at other locations to be perceived. The attentional-capture and attentional-cueing paradigms have also revealed evidence that attention shifts induced by featural singletons or by otherwise highly salient stimuli reduce the ability to report a target at a different location (Folk & Remington, 1998; Posner et al., 1980; Theeuwes, 1991). The data from such studies suggest that attention has a tendency to converge at a single locus in response to a salient stimulus.

However, these findings have to be reconciled with evidence that transient attention can be divided across two locations. Some of these studies have involved the simultaneous presentation of targets. In one such study, stimuli with two features in different locations were presented, and the probability of reporting features from both locations was equivalent to the product of the probabilities of reporting multiple features independently (Kyllingsbæk & Bundesen, 2007); this would not be the case if attention were deployed to only one of the two locations per trial. In another study, Bichot et al. (1999) contrasted the simultaneous and sequential presentation of two red digits that were to be attended, and they found no difference between the two conditions. In a follow-up experiment, they also presented cues at two locations to confirm that attention was not deployed in the intervening space. In a similar vein, Kawahara and Yamada (2006) found evidence for two discrete attentional foci by presenting dual rapid serial visual presentation (RSVP) streams (Forster, 1970). They found that the ability to report two simultaneous targets was enhanced if both targets occurred in the same locations as two immediately preceding targets. This effect was not observed if the two later targets appeared spatially between the earlier targets, suggesting that the attentional window created by the first two targets did not include the interstice. Further evidence of split transient attention has come from a study by Dubois et al. (2009), who replicated the final experiment of Bichot et al. (1999) and combined it with a mathematical formulation to demonstrate that attention was in fact being divided evenly between the two cued locations. Furthermore, using a GoodBourn and Holcombe (2013) dual RSVP task and found evidence of simultaneous selection from two locations within the same hemifield.

FormalPara Motivation for the present study

The aforementioned studies have collectively provided substantial evidence that transient attention can be divided across two separate locations, and consequently they indicate that the serial-spotlight model of attention is incomplete. However it is currently unknown what the consequences for dividing attention are, because the experiments that have employed dual spatial cues have not compared single and dual cues to an uncued baseline in order to measure the magnitude of the cueing benefit. The answer to this question has crucial implications for our understanding of the mechanisms underlying the deployment of attention. If attention is viewed as a strictly limited resource, the benefit from each of two simultaneous cues should be drastically reduced, relative to that produced by a single cue. The biased-competition model of attentional effects (Chelazzi, Duncan, Miller, & Desimone, 1998; Desimone & Duncan, 1995) also predicts a cost for dual cueing, because it explains attention benefits as a trade-off between the neural resources dedicated to each stimulus. Therefore, improvements in processing one cue should come at the expense of processing another, according to this theory.

To answer this question, in the present experiments we compared performance for identifying cued and uncued targets in trials with zero, one, or two cues (Exp. 1), and also determined whether the simultaneous presentation of two targets can result in better performance in identifying two discrete targets than can sequential presentation (Exp. 2).

Experiment 1

The first experiment was based on a novel dual-cueing paradigm in which four concurrent RSVP streams were presented, containing two targets among them. This experiment contrasted four conditions: no cues, two valid cues, one valid cue, and a valid cue paired with an invalid cue. The constellation of these conditions allowed us to explore the costs and benefits of cueing attention at multiple locations, relative to a single location. All participants had normal or corrected-to-normal vision.

Method

Participants

A group of 21 Syracuse University undergraduates with ages ranging from 18 to 23 years participated in this experiment. The participants were naive as to the purpose of the experiment. Of these participants, two were excluded from the analyses because their average T1 performance was two standard deviations below the group mean.

Apparatus and stimuli

The experiment was run on a Windows computer and was programmed in MATLAB using the Psychophysics Toolbox (Brainard, 1997). A 17-in. Dell CRT monitor was used to display this experiment. Four RSVP streams were presented in a horizontal array in the middle of the screen. The stimulus onset asynchrony (SOA) of all four RSVP streams was 67 ms. Each participant was instructed to place his or her chin on a chin rest, fixed such that the distance between the participant’s eyes and the central fixation was approximately 60 cm. At this distance, characters spanned approximately 0.8 × 0.4 visual degrees. The four RSVP streams were positioned in a horizontal row, with streams arranged –4.3, –1.3, +1.3, and +4.3 deg from the central fixation. The targets were drawn from a subset of letters from the Roman alphabet (B, C, D, F, G, J, K, L, N, P, R, T, X, Y, V), and the distractors were digits drawn from the set 2, 3, 4, 5, 6, 7, 8, 9. Between the two middle streams was a central fixation cross, which had a width of 0.3 deg. The fixation cross, the distractors, and the targets were black (RGB [0, 0, 0]) and were set against a gray background (RGB [150, 150, 150] on a scale of 0–255).

Design and procedure

The four RSVP streams updated synchronously and were each 20 items long. The two targets were always presented simultaneously and were preceded by between six and ten distractor-only frames. Four conditions were presented. In three of the four conditions, cues were used to direct attention to either one or two locations 67 ms prior to target onset (see Fig. 1). The cues were salient red bars presented so as to vertically bracket a particular RSVP stream. The bars were about 0.1 deg high × 1 deg in width and were presented 0.1 deg above and below the RSVP stream. The cues were time-locked to the onset of the RSVP frame immediately preceding the target and offset contiguously with the onset of the target-containing frame. Participants were told, “At the end of each trial, type in the letters you saw. There are at most two, and they are always different.” Participants were free to enter zero, one, or two letters at a response prompt, after which feedback was provided. Participants were shown a set of four practice trials with an SOA of 267 ms to illustrate each of the four conditions.

Fig. 1
figure 1

In Experiment 1, four horizontally arranged rapid serial visual presentation (RSVP) streams with an SOA of 67 ms were shown. On every trial, two targets (randomly chosen letters) were shown simultaneously at a randomly chosen time in the RSVP streams. Cues (pairs of red lines) directed attention one frame prior to the onsets of the targets. The four conditions shown above were randomly mixed within a single block. Target streams were counterbalanced across trials to include all combinations of locations of the two targets equal numbers of times across conditions

Two targets were employed in all four conditions. In the no-cue condition, both targets were uncued. The two-valid condition employed two cues that correctly indicated which two streams would contain targets. In the one-valid condition, just one cue was used. In the valid–invalid condition, one of two cues correctly indicated a target location, and one indicated a nontarget location. In the analyses described below, targets that were valid are referred to as C, whereas uncued targets are referred to as U. In some cases, when it is necessary to draw a distinction between two targets of the same kind, subscripts will be used, as in Ca Cb. Note that these designations are arbitrary, since the two cued targets are interchangeable in these analyses, except where noted. It is important to note that U denotes a target that was invalidly cued in the valid–invalid condition.

The stimuli were selected pseudorandomly, according to the restrictions that no distractor stimulus was immediately repeated within any RSVP stream and that both distractor and target stimuli would be unique among the four stimuli on screen at any given time. This experiment had a three-factor design, with the first factor being Condition (four levels), the second being the Location of One Target (four levels), and the third being the Location of the Other Target (three levels). Thus, the presentation of cues at each of the four conditions was fully counterbalanced. With the sole exception of the invalid cue in the valid–invalid condition, cue locations were determined by their respective targets. In that exceptional case, the invalid cue was presented in one of the two streams that did not contain a target, selected at random. Trials were randomly intermixed within a block of 144 trials, such that subjects could not develop an expectation about what trial type was to occur, nor predict the locations of the targets. Completing a trial took roughly 5 s, whereas the time between trials was approximately 5 s. The entire experiment took approximately 20 min for the typical participant to complete.

Results

To explore the effects of directing attention we compared accuracy scores between conditions (Fig. 2). First, to demonstrate that the cues were effective, we compared the accuracy for reporting a cued target C in the one-valid condition (M = .84, SE = .03) with the accuracy for reporting an uncued target U in the no-cue condition (M = .56, SE = .03), which revealed a highly significant difference [t(18) = –7.31, p < .001].

Fig. 2
figure 2

Data from Experiment 1: Accuracy of target reports from the four conditions (top) and joint accuracy of T1 and T2 (bottom). C, cued target; U, uncued (or invalidly cued) target

Presenting two valid cues rather than one produced a slightly smaller cueing benefit. C accuracy was slightly lower in the two-valid (M = .80, SE = .02) than in the one-valid (M = .84, SE = .03), t(18) = –2.31, p < .04, condition. Although this indicates a slight reduction in the cueing benefit from the one-valid condition, the total magnitudes of the cueing benefit on accuracy (with respect to the uncued target baseline in the no-cue condition) were highly similar in the one-valid (.34) and two-valid (.30) conditions.

Complementary to these findings, the presentation of a single cue substantially impaired the report of uncued targets. This was shown by comparing the raw accuracies of the uncued target U in the no-cue (M = .50, SE = .01) and one-valid (M = .30, SE = .02) conditions, where U accuracy was significantly lower in the one-valid condition [t(18) = 7.69, p < .01].

The cost of an invalid cue

To determine the influence of an invalid cue on a concurrently presented valid cue, we compared C report rates between the two-valid and valid–invalid conditions, and we found them to be nearly identical and statistically indistinguishable: C accuracy was not significantly different in the two-valid condition (M = .80, SE = .03) and the valid–invalid condition (M = .79, SE = .02), t(18) = 0.319, p = .75.

Similarly, the presence of an additional cue in the valid–invalid condition had no significant effect on accuracy for the uncued target in one-valid (M = .32, SE = .03) relative to valid–invalid (M = .28, SE = .03) trials, t(18) = 1.66, p < .12.

Was only one of the two cues effective per trial?

Comparing the data from the two-valid and one-valid conditions provides evidence that the enhancement of C accuracy in the two-valid condition does not reflect a mixture of trials in which only one of the two cues was effective per trial. If attention were being deployed exclusively to only one cue in the two-valid condition, we would expect the visual system to behave as it did in the one-valid condition, in which only one cue appeared in an individual trial. If this were true, since we could not know which of the two cues was attended on a particular trial, the accuracy boost from the attended cue should be randomly distributed across the two cued targets. Thus, the accuracy for both targets in the two-valid condition would be roughly equal to 58 %, which is the average of the C and U accuracies in the one-valid condition. Instead, both cued targets were reported, on average, 80 % (SE = .09) of the time. This suggests that both cues in the two-valid condition were individually almost as effective as the single cue in one-valid trials.

Independence of report probabilities

Another line of evidence suggestive of parallel deployment of attention to both cues would be to find evidence that the report levels of both targets were mutually independent in the two-valid condition. We did this first by computing joint accuracy (i.e., the percentage of trials in which both targets were correctly reported). The accuracy benefit afforded by two valid cues was clear, since joint accuracy was significantly higher in the two-valid (M = .64, SE = .04) than in the no-cue (M = .28, SE = .03) condition, t(18) = –8.279, p < .01.

We next compared the joint accuracy in two-valid condition, which was .64 (SE = .04), to the product of the raw probabilities of reporting the individual targets Ca and Cb, which was also .64 (SE = .03). If attention had a tendency to focus on one of the two cued locations, the probability of reporting both targets on a single trial should have been lower than the product of the individual probabilities of reporting both targets.

In an additional independence analysis for the two-valid condition, we compared its raw C accuracy against Cb | Ca (the accuracy of reporting one target, conditional on the other being reported also). If attention had a tendency to be captured by one cue and not the other, then Cb | Ca should be lower than the raw C accuracy. This was not true, since the raw C accuracy (M = .80, SE = .03) was nearly identical to Cb | Ca accuracy (M = .79, SE = .03), t(18) = 1.21, p = .24. These results argue against a model in which cued items are read out serially. Such a model would predict that the accuracy of reporting a second item, given correct report of another item, should be worse than the raw probability of reporting either of the two items.

Did the two cues evoke one large spotlight?

One explanation for these results could be that the two cues evoked a large attended region that encapsulated both of them and all interstitial stimuli. To test for this possibility, an analysis examined the valid–invalid U report rates for trials in which the uncued target occurred in either of the middle streams and the two cues appeared in the outer streams. This arrangement would put the target U between the two cues. However, U report was much lower in this case (M = .26, SE = .03) than for the positionally equivalent C in the two-valid condition (M = .79, SE = .04), t(18) = 13.63, p < .01. Note that for this analysis in both conditions, we only included trials in which one of the targets (U for valid–invalid and Ca or Cb for two-valid) occurred in one of the middle two streams and the other target occurred in one of the outer two streams. To ensure that the observed effect was not an artifact of selecting only trials in which U was flanked by two cues, in another analysis we compared U report in the valid–invalid condition when U was either physically between the two cues (M = .27, SE = .03) or outside of them (M = .29, SE = .03). No difference was apparent in this case [t(18) = –0.47, p = .65].

How did spatial eccentricity affect the report of cued and uncued targets?

The previous analysis may have been confounded by eccentricity effects, in that either uncued or cued report might be worse for targets in the outer two positions. To look for such confounds, we broke accuracy down by the spatial arrangement of the targets. We found that performance was very similar regardless of whether the targets were in the middle two or the outer two RSVP streams, for both cued and uncued targets. An analysis of raw C accuracy in the two-valid condition showed that when both targets occupied the middle two streams, C accuracy (M = .82, SE = .03) was not different from C accuracy when the targets occupied the outermost two streams (M = .82, SE = .04). Furthermore, one-way analyses of variance (ANOVAs) comparing single raw target accuracy across target arrangement permutations in both the no-cue [F(3, 18) = 1.02, p = .31, η 2 = .02] and the two-valid [F(3, 18) = 0.01, p = .91, η 2 = .03] conditions yielded null results, suggesting that the effect of target eccentricity on accuracy did not confound the comparison in the previous condition. For a visual comparison of target accuracy across the target spatial configurations, see Fig. 3. Note that in these analyses, we collapsed across mirror-symmetrical arrangements.

Fig. 3
figure 3

Data from Experiment 1: Raw target accuracies across the possible spatial arrangements of targets for the no-cue (a) and two-valid (b) conditions. The targets in each RSVP stream are represented by U (uncued) and C (cued), whereas distractors are represented by D. Note that each arrangement includes its respective symmetrical counterpart (e.g., CCDD also includes DDCC trials)

Does divided attention occur only across the two hemifields?

To determine whether the evidence for divided attention observed in this study resulted primarily from either hemisphere having its own independent attentional system (Alvarez & Cavanagh, 2005), we compared accuracy in conditions in which the two targets were on the same side or different sides. If attention in each hemifield within this paradigm is mediated by an independent system, we would expect reduced report rates for targets that were on the same side of the central fixation in the two-valid condition. However, this was not observed. Same-side C accuracy (M = .78, SE = .03) was not significantly different from different-side C accuracy (M = .80, SE = .02), t(18) = –1.22, p = .24, among two-valid trials. Furthermore, a hemifield independence model suggests that a cue in one hemifield should have less effect on an uncued item in the opposite hemifield. We found evidence to the contrary. In the one-valid condition, an uncued target presented on the opposite side of fixation from a cue was reported at about the same level as an uncued target in the same visual hemifield as a cued target (M = .33, SE = .04, and M = .3, SE = .03, respectively), t(18) = –0.85, p = .41.

Discussion

These results replicate earlier findings that attention can be deployed to two separate locations, and they thereby contribute to a growing corpus of evidence that attention can be divided across locations. More importantly, these results illustrate that attention cued at two locations produces almost the same enhancement of report as when attention is cued at one location. Thus, the data from Experiment 1 suggest that, at least on a short time scale, attention is not described well as either a pool of limited resource or a biased competition between attended stimuli. Moreover, the poor accuracy of reporting an uncued target in the presence of one or two cues suggests that, whereas it may be possible to divide attention across two locations, within a very short window of time (in this case, 67 ms), attention converges at the location of the cues, making it more difficult for a subsequent uncued target to be reported. These results thereby predict that in the absence of cues, the ability to report two simultaneous targets will be superior, relative to the ability to report two sequential targets. This prediction was tested in the next experiment.

Experiment 2

In Experiment 1, presenting a cue in one stream produced a sharp reduction in the ability to report a subsequent target in a different location. This finding suggests that attention rapidly converges on the location of an attended region, such that it is difficult to report targets in noncued locations, a theory that was suggested explicitly by Jefferies and Di Lollo (2009) and Dubois et al. (2009). This idea predicts that simultaneous targets (i.e., lag 0) should be easier to detect than sequential targets. This is an important test, because a serial-spotlight model should always predict superior performance for sequential items. There have been previous indications that simultaneous presentation produces superior report in several experiments (Duncan et al., 1994; Petersen et al., 2012; Shih, 2000), but the requisite statistical analyses have not been performed to confirm that the joint probability of reporting two targets is higher at lag 0 than at lag 1.

In Experiment 2, we used four RSVP streams with an SOA of 100 ms and two targets (T1 and T2) temporally separated in six lag conditions (0, 1, 2, 3, 5, and 7). The expected finding was that two targets, one presented before the other, would elicit an attentional blink (AB; Raymond et al., 1992) effect at SOAs between about 200 and 500 ms. A typical finding from such studies is that this AB effect is observed across all locations (i.e., a target at one location will elicit an AB at other locations; Visser, Zuvic, Bischof, & DiLollo, 1999; Shih, 2000); however, when the T2 occurs in the same spatial location as the T1, and immediately after it (i.e., about 100 ms later), performance will typically be quite good. This finding is called lag-1 sparing (Chun & Potter, 1995) and is observed to be strongest when the two targets share the same spatial location (Visser et al., 1999). The question of interest is what happens when two stimuli are presented simultaneously at different locations. If it is possible for attention to be momentarily divided, we would expect lag-0 superiority relative to lag-1—that is, further sparing at lag-0.

Method

Participants

A group of 58 Syracuse University undergraduates participated in this experiment. Five participants, whose mean performance on T2 | T1 trials was less than two standard deviations below the overall participants’ mean, were not included in the analyses. Participation satisfied a requirement for an introductory psychology course. Those who participated in Experiment 1 were not eligible to participate in Experiment 2, and participants were naïve as to the purpose of the experiment.

Apparatus and stimuli

Experiment 2’s apparatus and stimuli were identical with those of Experiment 1.

Design and procedure

All four streams refreshed simultaneously, with an SOA of 100 ms. Six lag conditions (0, 1, 2, 3, 5, and 7) were used. Both the first target stimulus (T1) and the second target stimulus (T2) of any trial were randomly assigned to one of four streams for all lag conditions, including lag-0, the simultaneous condition. However, for any lag-0 trial in which T1 and T2 would have occupied the same stream, only T1 was displayed (see Fig. 4). Presentations always consisted of 20 items per stream, and T1 occurred randomly in Positions 6–10 of its stream. In every trial, after the last presentation of RSVP, participants were shown either a dot or a comma for 100 ms at the location of the fixation cross. Trials were concluded by asking participants to indicate which target stimuli (T1 and T2) they believed had been presented by typing them on a keyboard. Participants were also asked whether they had seen a dot or a comma. This secondary dot/comma task was intended to increase the likelihood that participants would remain fixated in the center. A total of 96 trials were randomly intermixed within a block. The experiment was preceded by a block of six practice trials with an SOA of 300 ms. Completing a trial took approximately 5 s, making the experiment last approximately 25 min for the typical participant.

Fig. 4
figure 4

In Experiment 2, four horizontally arranged RSVP streams were shown with letter targets and digit distractors. The figure illustrates possible target arrangements in the lag-0 and lag-1 conditions. Four conditions with longer lags were also presented. Note that the locations of T1 and T2 were counterbalanced across trials to occur equally in all possible arrangements

Results

The results are shown in Fig. 5. T1 accuracy was first analyzed using a two-factor ANOVA, combining same- and different-stream trials together across lags, excluding the lag-0 condition. We observed main effects of stream association (same or different stream) [F(1, 52) = 14.68, p < .001, η 2 = .22] and lag [F(4, 208) = 13.58, p < .001, η 2 = .21], as well as a significant interaction [F(4, 208) = 7.28, p < .001, η 2 = .123] between the two. The same two-factor ANOVA was also run on T2 accuracy. Again, we observed main effects of stream association [F(1, 52) = 120.70, p < .001, η 2 = .70] and lag [F(4, 208) = 23.292, p < .001, η 2 = .31], as well as their interaction [F(4, 208) = 18.6, p < .001, η 2 = .264].

Fig. 5
figure 5

Data from Experiment 2. Joint accuracy refers to the proportions of trials in which both T1 and T2 were reported successfully on the same trial in the different-stream condition. The dotted lines in the top two panels represent accuracy for reporting a T1 presented without a T2. Error bars represent standard errors

Excluding lag-0 trials, T2 accuracy was overall higher in the same-stream (i.e., T1 and T2 occurred in the same stream) trials (M = .62, SE = .03) than in different-stream (i.e., T1 and T2 occupied different streams) trials (M = .43, SE = .03), t(52) = –11.08, p < .001. On trials with only a single target, participants correctly reported that target on 78 % of the trials (SE = .03).

Lag-0 sparing

Our principal focus was on the different-stream condition, because it contained simultaneous targets. First, the effect of lag on joint accuracy (the proportion of trials in which T1 and T2 were both accurately reported) was investigated using a one-way ANOVA, and we found a highly significant effect of lag [F(5, 260) = 17.68, p < .001, η 2 = .25].

The main result of interest in these data was the performance in the lag-0 condition for different streams. If attention could be deployed simultaneously to both streams, then we predicted superior performance at lag-0 as compared to lag-1. For this analysis, in the simultaneous condition, the two targets were equivalent, but we distinguished them arbitrarily as T1 and T2 for the sake of comparison with the lag-1 condition. We conducted this analysis in three ways, comparing raw T2 accuracy, joint T1–T2 accuracy, and average T1–T2 accuracy, which is the average number of T1s and T2s reported, regardless of whether they were reported together. All three of these analyses showed lag-0 sparing. T2 accuracy was shown to be significantly higher at lag-0 (M = .56, SE = .04) than at lag-1 (M = .43, SE = .04), t(52) = 4.50, p < .01. Joint accuracy of the two targets was likewise higher at lag-0 (M = .34, SE = .04) than at lag-1 (M = .26, SE = .05), t(52) = 3.37, p = .01. The average of the T1 and T2 accuracies was also significantly higher at lag-0 (M = .62, SE = .03) than at lag-1 (M = .58, SE = .03), t(52) = 2.04, p < .05.

Discussion

Although participants performed best overall in same-stream trials, when T1 and T2 occurred in different streams, participants were more likely to report simultaneously presented targets than targets presented in immediate sequence (lag-1), an effect we refer to as lag-0 sparing. The existence of lag-0 sparing suggests that attention can be deployed to two locations simultaneously, but that it quickly converges around the first of two targets when they are presented sequentially (Dubois et al., 2009; Jefferies & Di Lollo, 2009). This finding is most clearly evident in the superior T1–T2 joint and average performance in lag-0 trials relative to lag-1 trials.

It is important to note that, whereas performance was better at lag-0 than at lag-1, it was worse in both cases than at lag-7. This finding is consistent with earlier findings of limitations in the rate at which information can be encoded that are evident when comparing simultaneous to successive stimulus presentations (Duncan, 1980). However, we wish to draw a distinction between simultaneous encoding and simultaneous deployment of attention. The data presented here suggest that the ability to deploy attention to multiple targets does not necessarily imply that those targets can be encoded into memory without interference. Thus, it is expected that spacing targets farther apart in time, beyond the range of the attentional blink, will typically improve performance, regardless of whether attention is parallel or serial. For further discussion of interference during encoding in RSVP tasks, see Wyble, Potter, Bowman, and Nieuwenstein (2011), Dell’Acqua, Dux, Wyble, and Jolicœur (2012), or Dux and Marois (2009).

General discussion

These experiments explored the capabilities and limitations of attentional deployment in response to salient stimuli, with the intent of discovering how attention differs when it is deployed to one or two locations. The observed evidence suggests a parallel deployment of attention, in line with other previous work concerning divided attention, which we reviewed in the introduction. Moreover, these data allow a comparison of the costs and benefits of divided attentional deployment by comparing trials with no cues to trials featuring one or two cues.

To review our findings, in Experiment 1 we employed zero, one, or two spatial cues presented among four RSVP streams. Cues that were valid enhanced the ability to report a target at that location substantially. Importantly, the magnitude of this enhancement for two simultaneously cued targets was just barely smaller (a benefit of 30 % accuracy) than the enhancement elicited by a single cue (a benefit of 34 % accuracy).

The combination of four conditions in Experiment 1 provided evidence that attention was in fact distributed across multiple locations. A joint accuracy analysis suggested that the effect was not due to a mixture of stochastic deployment to one or the other cued target. Furthermore, by looking at trials in which a target appeared in between two cues, we confirmed that attention was not deployed in the interstice. Finally, by comparing performance when the two cues were in the same hemifield or between hemifields, and also when an uncued target in the one-valid condition was either on the same side of fixation, relative to the cued target, or on the different side, we did not find evidence for hemifield independence in this task. The combination of these results supports a model in which attention can be rapidly deployed to two locations at the same time and in which the magnitudes of the attentional enhancement are similar, whether one or two locations are attended. However, whereas we have shown that the intervening area between two cues is not attended when the cues are in different hemifields, our method did not allow us to eliminate the possibility that two cues in the same hemifield might produce a single attentional window encompassing both cues. Our results were also insufficient to discount the possibility that attention is deployed in a circular “doughnut” shape (Jans et al., 2010).

Experiment 2 provided clear evidence that, within a certain time range, it is easier to identify two targets at different locations when they are presented simultaneously. With as little as a 100-ms lag time between the two targets, attention seemed to converge on the location of the first target, producing impaired report of the second target, as described by Jefferies and Di Lollo (2009). This lag-0 sparing was found for a raw analysis of T2 accuracy as well as for a joint analysis of T1 and T2 accuracy and an analysis of average T1–T2 accuracy. This finding that the joint accuracy of two targets is superior for simultaneous relative to sequential presentation provides an important statistical confirmation of previous findings in Duncan et al. (1994), when targets were either simultaneous or sequential. Note that these results run counter to those of Bichot et al. (1999), who found no difference between simultaneous and sequential presentation. However, the paradigm used here required participants to detect targets by their category rather than by color, and this may have facilitated the ability to encode those items into memory.

Theoretical implications

The results of Experiment 1 indicate that identification performance can be enhanced to nearly the same degree by cues at one or two locations, but this deployment of attention comes at an enormous cost to uncued locations. Furthermore, the impairment of report at an uncued location is similar whether one or two cues are presented. These results provide important constraints on models of attention, because it is difficult to explain these data from a strict limited-resources perspective. If a fixed pool of resources were divided between two attention windows, then the two-valid and valid–invalid conditions of Experiment 1 should have produced markedly less enhancement for the cued items, in comparison to the benefit provided by a single cue in the one-valid condition.

These results seem to partially contradict the findings that supported the zoom-lens model, in which adding more cues to a display increased reaction times to detect a target (Eriksen & St. James, 1986). Similarly, Hogendoorn, Carlson, VanRullen, and Verstraten (2010) found differences in the time to access visual information by asking participants to report the time on one or two animated clock faces. They found that cueing two clock faces increased the variability and the latency of the reported times value relative to encoding a single clock face. To understand the difference between our results and these, it is important to bear in mind that these paradigms measure reaction time rather than accuracy, and thus may measure different aspects of the cognitive processes required for report. We suggest that limitations in the rate of information extraction may be distinct from limitations in attentional deployment. This idea is supported by modeling efforts aimed at understanding target encoding within the attentional blink paradigm, in which two targets are presented at different SOAs (Wyble et al., 2009; Wyble et al., 2011). In that model, the rate of consolidation of information into memory is affected by the number of items being encoded, such that even strongly attended items are encoded more slowly when multiple items have to be processed simultaneously. Thus, paradigms in which reaction time is a dependent variable may reveal the interference that results from encoding information from two locations that are strongly attended.

The present results also suggest that the control of attention is more complex than a biased competition between cued and uncued stimuli. Data from neurons in the inferotemporal cortex of monkeys suggest that stimuli compete with one another for representation by cells with large receptive fields in the later stages of the visual system (Chelazzi et al., 1998; Desimone & Duncan, 1995). According to this framework, attention might act by biasing the competition in favor of the attended stimuli, such that more of the neurons in the brain process information from the attended stimulus at the expense of the unattended stimulus. Although this theory explains many findings in the attention literature (including the results of the one-valid condition in the present Exp. 1), it cannot readily explain how attention can be enhanced at two locations to nearly the same degree as at one location. Biased competition describes an inherent trade-off between the processing of two stimuli, such that performance improves for a given stimulus at the expense of another. Therefore, the present results pose a difficulty for a theory in which biased competition is the primary means of deploying attention. Note that the finding of divided attention per se is not what argues against competition, since it is possible for a competitive framework to deploy attention to multiple locations concurrently. Even a winner-take-all model of attention can simulate divided regions of attention, given certain assumptions (Sandage et al., 2005). Rather, the similarity of the attentional enhancements between the one-valid and two-valid conditions of Experiment 1, as well as the superior joint performance at lag-0 relative to lag-1 in Experiment 2, argues against competition as the mechanism of attentional enhancement.

To explain these results, we propose a convergent gradient-field (CGF) model of attention, which combines existing theories of a gradient field (Cheal et al., 1994) and a convergence of attention (Jefferies & DiLollo, 2009). In this model, attention is initially deployed broadly across a region of the visual field (Jonides, 1983), monitoring for salient stimuli that can trigger the deployment of transient attention (Nakayama & Mackeben, 1989; Yeshurun & Carrasco, 1998). Following the onset of either cues or targets, transient attention is deployed at the corresponding location(s), producing a gradient of attention across the visual field, with one peak at the location of each target or cue (Cheal et al., 1994). We further propose that following the deployment of attention at one or more peaks, monitoring of the remaining visual field is momentarily suspended, effectively producing a convergence of attention around these peaks. Once attention has converged, it is difficult for attention to be deployed to other locations until processing of the currently attended stimuli is completed. This facet of the theory explains why report of an uncued item in the one-valid condition of Experiment 1 is so poor, because attention had already converged at the location of the cue prior to the onset of the targets, and therefore the target was less able to recruit attention. However, in the two-valid condition, attention could be deployed at both cued locations prior to the convergence of attention.

The CGF model behaves like a single spotlight of attention in most respects; if two attentional triggers (cues or targets) occur at different locations with a time lag of between 100 and 700 ms, attention will still be engaged by the first stimulus when the second appears, which limits the ability of the second stimulus to recruit attention. However, unlike a spotlight, the CGF model predicts a critical window during which it is possible to deploy attention to two locations. Note that this description of attention as a CGF resonates at least in part with a recently published computational model of attention (Zirnsak et al., 2011). This model also predicts that cues that are nearly simultaneous will produce attention deployed to both locations, if the SOA between the cues and the target is sufficiently short.

Importantly, in a gradient-field model, attention is represented as variations in the rate of information acquisition at different locations in the visual field (Cheal et al., 1994). Each peak in the gradient field produces an enhancement of processing at that location, and it is possible for multiple such peaks to coexist. Subsequent to the stage of attentional processing, multiple items would enter a stage of memory encoding and may interfere at that point. In the case of highly familiar stimuli, such as the letters used here, evidence suggests that it is possible to encode at least two items in parallel with a slight amount of interference, and the addition of further items to the encoding process produces progressively greater amounts of interference (Dell’Acqua et al., 2012; Wyble et al., 2011).

Our results also provide constraints on the time course of attentional convergence. By comparing the results of the uncued and one-valid conditions, Experiment 1 indicates that within just 67 ms of the onset of a spatial cue, the ability to report a target at a noncued location decreases dramatically relative to the condition in which there was no cue. This finding suggests that a highly salient cue causes attention to converge at the cued location within less than 100 ms. This time scale is broadly congruent with similar estimates from other paradigms that have used multiple simultaneous targets (Jefferies & DiLollo, 2009) or cues (Dubois et al., 2009).

Conclusion

The results of the present experiment confirm that attention converges rapidly around the location of a single cue, greatly reducing accuracy at other, uncued locations. However, if two cues appear simultaneously, attention seems to converge at both locations, producing cueing benefits that are nearly identical to those produced by a single cue. These results place important constraints on a theory of attention, because they preclude models in which attention is a strictly limited resource that must be divided across locations. A second experiment demonstrated convergent support for simultaneous enhancement at two locations by comparing the joint probability of reporting two items that were simultaneously or sequentially presented at different locations, revealing superior performance in the simultaneous condition. These results suggest that further studies comparing the magnitude of cueing effects with varying numbers of cues against an uncued baseline may provide critical data for attention models.