Search for visual objects and events is controlled by selection intentions. To find the object we are looking for, we need to know (and internally represent) the properties of this object. The critical role of top-down control in attentional selectivity was emphasized by William James (1890), who described “anticipatory ideational preparation” as one of the two mechanisms that implement the attentive process. In recent models of attention, top-down control is conceptualized as the guidance of attentional selectivity by attentional templates (e.g., Desimone & Duncan, 1995; Duncan & Humphreys, 1989) or top-down task sets (e.g., Folk, Remington, & Johnston, 1992). At the phenomenological level, attentional templates are explicit selection intentions. At the functional level, they are representations of task-relevant features and/or objects that are held in working memory and guide the attentional selection of objects and events that match these features (e.g., Carlisle, Arita, Pardo, & Woodman, 2011; Olivers, Peters, Houtkamp, & Roelfsema, 2011). In computational models, attentional templates are implemented as top-down weights that determine the impact of visual features on the central salience map that guides the spatial allocation of attention to visual objects (e.g., Müller & Krummenacher, 2006; Wolfe, 1994, 2007; see also Bundesen, 1990, for similar ideas). Although salient visual events such as abrupt onsets or feature singletons will sometimes attract attention in a purely bottom-up, stimulus-driven fashion (e.g., Theeuwes, 1991, 2010), top-down control can override the impact of bottom-up salience, such that visually salient events (feature singletons or abrupt onsets) will capture attention only if their features match a currently active attentional template. For example, Folk and colleagues (e.g., Folk & Remington, 1998; Folk et al., 1992) have demonstrated that spatially uninformative singleton cues trigger behavioral spatial-cueing effects indicative of attentional capture only when their features match the current task set, while physically identical cues will not capture attention when their properties are irrelevant in a given task context.

Event-related brain potential (ERP) markers of attentional selectivity provide a useful tool for demonstrating the impact of top-down task sets on the attentional selection of salient visual events. The N2pc component is an enhanced negativity that emerges around 200 ms after the onset of a visual search array at posterior scalp electrodes contralateral to the side of an attended stimulus, and it is linked to the spatial selection of candidate target items among distractors in visual search tasks (Eimer, 1996; Eimer & Kiss, 2008; Luck & Hillyard, 1994; Mazza, Turatto, Umiltà, & Eimer, 2007; Woodman & Luck, 1999). This component is not sensitive to preparatory attentional orienting prior to the arrival of a target stimulus (Kiss, Van Velzen, & Eimer, 2008) or to pure space-based selection in the absence of visual objects (Woodman, Arita, & Luck, 2009), but instead marks the spatially selective enhancement of visual processing for objects that match current target-defining features. Given these properties, the N2pc can be employed as an online electrophysiological marker of the current activity profile on the spatiotopic salience map that guides the attentional selection of visual objects (Wolfe, 1994). In support of the hypothesis that attentional capture is contingent on current task sets, it has been demonstrated that feature singleton cues trigger an N2pc when they match the current task set (e.g., red singleton cues during search for red targets), but not when the singleton feature is task-irrelevant (e.g., red singleton cues during search for small targets; e.g., Eimer & Kiss, 2008; Eimer, Kiss, Press, & Sauter, 2009; Lien, Ruthruff, Goodin, & Remington, 2008; Leblanc, Prime, & Jolicœur, 2008). Although there is also N2pc evidence for salience-driven attentional capture (Hickey, McDonald, & Theeuwes, 2006), recent research has shown that the presence of this apparent bottom-up effect is determined in a top-down fashion by observers’ knowledge about the temporal demands of the current attentional-selection task (Kiss, Grubert, Petersen, & Eimer, 2012).

The fact that attentional capture by feature singletons is determined by top-down task sets opens up the possibility to employ attentional-capture effects in order to study the nature and properties of such task sets. For example, the question of whether attentional target selection is guided by a feature-specific task set (e.g., “find the red object”) or is directed toward any feature discontinuity (“find the odd one out”; singleton search mode; Bacon & Egeth, 1994) can be decided by investigating whether singleton cues that do not match the current target features still capture attention. In a previous study (Eimer & Kiss, 2010), we demonstrated with behavioral and ERP measures that such cues did indeed trigger attentional capture when participants had to find one of two equally likely color singleton targets, indicating that they adopted a singleton search mode in this task. In contrast, no attentional capture by nontarget singletons was elicited in a task in which color served as the criterion for distinguishing go and no-go stimuli (see also Folk & Anderson, 2010, for similar findings), demonstrating that attention was now guided by a feature-specific task set.

In all previous studies that have used spatial-cueing paradigms to investigate the role of top-down task sets in attentional capture, the targets were defined in terms of one specific feature or feature dimension (e.g., “red” or “any color discontinuity”). However, visual search in real-world environments is rarely directed toward an isolated feature value (“find any red item”); it is usually guided more specifically toward visual target objects that are defined by a combination of different features (e.g., search for a “red” and “round” and “small” cherry). The task sets that are activated under such more naturalistic conditions are obviously more complex than the single-feature task sets that are typically studied in spatial-cueing experiments. The objective of the present study was to determine how such multiple-feature task sets are organized. One possibility is that each target-defining feature is represented separately and independently in an attentional template. According to this independent feature list account, search for a cherry is guided by a task set in which each cherry-defining attribute is represented separately and independently. This implies that any visual object that matches at least one target feature should attract attention (i.e., all objects that are red or round or cherry-sized).

This account is in line with the core assumptions of most current models of visual attention. For example, a central claim of Feature Integration Theory is that visual objects are generated by conjoining features during focal-attentional processing (e.g., Treisman, 1988; Treisman & Gelade, 1980). Because independent features exist preattentively, whereas integrated objects do not, attentional selection necessarily operates at the feature level on the basis of information that is provided by independent input modules for each perceptual dimension (e.g., color, orientation, or motion). The Guided Search model (Wolfe, 1994, 2007) makes similar assumptions. In this model, visual information is initially analyzed in parallel, in separate, feature-specific channels, and these channels send input to a spatiotopic salience map that determines where attention will be allocated. Top-down control is implemented as the task-dependent weighting of the inputs of different feature channels. Importantly, these weights are applied independently to each channel, and each channel contributes independently to the activity profile on the activation map. In other words, attentional control operates independently and in parallel for each feature channel.
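To illustrate the independence assumption shared by these models, the following sketch (our illustration, not code from any published implementation of Guided Search) computes a salience map as the top-down weighted sum of independent feature-channel activations; the channel names, weights, and activation values are hypothetical.

```python
import numpy as np

def salience_map(channel_maps, top_down_weights):
    """Top-down weighted sum of independent feature-channel activations."""
    total = np.zeros_like(next(iter(channel_maps.values())), dtype=float)
    for channel, activation in channel_maps.items():
        # Each channel contributes independently and additively, scaled by its weight.
        total += top_down_weights.get(channel, 0.0) * activation
    return total

# Six item locations; each channel codes how strongly each item deviates from the
# other items in that dimension (hypothetical local feature-contrast values).
color_contrast = np.array([0.0, 0.9, 0.0, 0.9, 0.0, 0.0])  # items 1 and 3 carry target-matching color contrast
size_contrast = np.array([0.0, 0.9, 0.0, 0.0, 0.0, 0.0])   # only item 1 also carries size contrast

# Task set "small red bar": both the color and the size channel receive a top-down weight.
weights = {"color": 1.0, "size": 1.0}
print(salience_map({"color": color_contrast, "size": size_contrast}, weights))
# -> [0.  1.8 0.  0.9 0.  0. ]: a fully matching item produces the strongest spatial bias,
#    but a partially matching item still raises activation at its location.
```

In this scheme, an object that matches only one target-defining feature still raises activation at its location, which is exactly the independence property that the present experiments put to the test.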

An alternative possibility is that when observers search for targets that are defined by a combination of features from different dimensions, attentional templates do not represent these features separately and independently, but instead as fully integrated object representations. If attentional templates were organized in this way, objects that share only some features with the current target object (e.g., a red rose encountered while searching for the cherry) should not capture attention, since individual shared features would have no independent role in the control of attention. In this case, attention would only be attracted by visual objects that possess all target-defining features (i.e., all objects that are red and round and cherry-sized), but not by partially matching objects. This alternative hypothesis, that visual search is guided by integrated representations of target objects, is consistent with object-based attentional-selection accounts (e.g., Duncan, 1984), and also with evidence that visual working memory (where top-down attentional templates reside) represents integrated visual objects rather than individual features (e.g., Luck & Vogel, 1997). However, this hypothesis is obviously at odds with key assumptions of Feature Integration Theory and Guided Search (see above).
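The two accounts therefore make different predictions about which objects should attract attention, and the contrast can be stated as two simple matching rules. The sketch below is our schematic illustration of that contrast, not an implementation drawn from any of the cited models; the attribute names are arbitrary.

```python
# Hypothetical attentional template for the cherry example.
cherry_template = {"color": "red", "shape": "round", "size": "small"}

def capture_feature_list(obj, template):
    # Independent feature list: any single shared feature is predicted to attract attention.
    return any(obj.get(dim) == value for dim, value in template.items())

def capture_integrated_object(obj, template):
    # Integrated object representation: only a complete match is predicted to attract attention.
    return all(obj.get(dim) == value for dim, value in template.items())

red_rose = {"color": "red", "shape": "irregular", "size": "medium"}
print(capture_feature_list(red_rose, cherry_template))       # True  -> capture predicted
print(capture_integrated_object(red_rose, cherry_template))  # False -> no capture predicted
```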

To study whether attentional templates are independent feature lists or integrated object representations, we employed behavioral and ERP measures of attentional capture and a variant of the spatial-cueing procedure introduced by Folk et al. (1992) that had been adapted for use with ERPs in our previous work (e.g., Ansorge, Kiss, Worschech, & Eimer, 2011; Eimer & Kiss, 2008; Eimer et al., 2009). On each trial, a cue array preceded a search array. The search arrays contained one singleton bar that always differed in both its color (red or blue) and its size (small or large) from the other five distractor bars, which were gray and medium-sized (see Fig. 1). Participants were instructed to attend to target bars in the search array that were defined by the combination of one specific color and size (e.g., a small red bar), and to report the orientation of these target bars (horizontal or vertical) with a buttonpress. Target bars were present on half of all trials. In the other half, the search arrays contained one color/size singleton bar that matched only one or neither of the two target-defining attributes. No response was required on these trials. Target color (red or blue) was varied across participants. Target size (small or large) was varied across blocks, with eight successive blocks in which participants had to respond to small target bars followed by eight blocks with large target bars, or vice versa.

Fig. 1 Schematic illustration of the trial sequence, stimulus displays, and response requirements in the present study, separately for blocks with small targets (left panel) and large targets (right panel)

The cue arrays that preceded each target array also contained a color/size singleton, and the location of this singleton was not informative with respect to the location of the feature singleton in the subsequent search array. Critically, the singleton cue could match both the current target color and the current target size (C+S+; e.g., a small red cue in blocks in which participants searched for small red target bars), match only the current target color but not target size (C+S–), match only target size but not target color (C–S+), or match neither of the two target-defining features (C–S–). The critical question was which of these cues would capture attention. To answer this question, we measured behavioral spatial-cueing effects (i.e., RTs to targets at cued vs. uncued locations) and N2pc components separately for each of the four cue types. The stimulus onset asynchrony between cue and search arrays was 200 ms, which is sufficient to obtain cue-triggered N2pc components that are not contaminated by early sensory ERP responses to subsequent search arrays (e.g., Eimer & Kiss, 2008; Eimer et al., 2009). We chose to employ only small singleton cues, and not cue arrays in which size singletons were larger than the context elements, because an earlier study (Kiss & Eimer, 2011) had shown that cue arrays with large size singletons trigger an early asymmetric sensory response prior to the N2pc, due to the fact that these arrays were not fully balanced across hemifields in terms of low-level visual attributes. Because no such early ERP asymmetry was observed for small-size-singleton cue arrays, these were used in the present study. Because small feature singleton cues were presented in all blocks, their designation as S+ or S– cues varied across the two halves of the experiment in which participants searched for small or large targets, respectively. They matched the current target size in the part of the experiment in which participants searched for small targets, and served as S– cues in the other half, in which large targets were task-relevant.

In line with previous demonstrations of task-set-contingent attentional capture (e.g., Eimer & Kiss, 2008; Folk et al., 1992), C+S+ cues were expected to capture attention, as reflected by behavioral spatial-cueing effects and an N2pc component, because they matched both target-defining features. In contrast, no attentional capture was predicted for fully mismatching C–S– cues. The critical question was whether attentional-capture effects would also be found for partially matching cues (C+S– and C–S+). If top-down task sets were represented as independent feature lists, these cues should indeed trigger attentional capture, since the presence of one target-defining feature would be sufficient to attract attention to its location. According to the alternative hypothesis that attentional templates are represented as integrated object representations, attentional-capture effects should be triggered solely by fully matching C+S+ cues, and not by cues that match only one of the two task-relevant attributes: Because individual target-defining features would be unable to guide the allocation of attention independently, the joint presence of both target features would be necessary to trigger task-set-contingent attentional capture.

Experiment 1

Method

Participants

A group of 14 volunteers took part in this experiment. One participant was excluded from further analyses because of an error rate exceeding 10 % in at least one condition, and another was excluded because of extremely slow response times (RTs; 3 SDs above the group mean in all conditions). The remaining 12 participants (mean age 28.1 years; five male, seven female) all had normal or corrected-to-normal vision.

Stimuli and procedure

On each trial, a cue display and a search display were presented in rapid succession, each for 50 ms and separated by a 150-ms blank interval. The search array consisted of six horizontal or vertical bars placed equidistantly along the circumference of an imaginary circle (diameter 8.4°) around the central fixation point (see Fig. 1). The orientation of each bar was randomly assigned. Each search display contained a singleton bar defined by a combination of color (red or blue; CIE chromaticity values .619/.339 and .154/.092) and size (small or large; 0.4° × 0.8° and 0.9° × 1.8°) that was presented randomly and equiprobably at one of the four lateral locations among gray (CIE chromaticity values .286/.312), medium-sized (0.6° × 1.3°) bars. All of the colors were equiluminant (10.3 cd/m²).

The targets were defined by a specific combination of color and size. Six participants searched for a small red bar in eight successive blocks, and for a large red bar in eight other successive blocks. The other six participants searched for a small blue bar in eight blocks and for a large blue bar in another eight successive blocks. Half of the participants started with small targets, the other half with large targets. On 50 % of the trials, the singleton bar in the search array was a color/size target, and participants were required to report its orientation (horizontal or vertical) by pressing one of two vertically aligned buttons. On the other half of the trials, the singleton bar matched only one or neither of the two target-defining attributes with equal probability, and participants were instructed to refrain from responding on these trials. Sixteen blocks of 48 trials each were run (eight blocks with small targets and eight blocks with large targets), resulting in a total of 768 trials. On 24 trials per block, the search arrays included a color/size target singleton bar. Arrays with singleton bars that matched only the target size, only the target color, or neither attribute were each presented on eight trials per block.

On each trial, the search array was preceded by a cue array composed of six sets of four closely aligned dots, presented at the same locations as the stimuli in the subsequent search array. Each cue array contained a color/size singleton that was presented randomly and equiprobably at one of the four lateral locations in the left or right hemifield. This singleton cue was always smaller (0.4° × 0.4°) than the other five items (each 0.8° × 0.8°) and was equally likely to be red or blue, while all of the other items in the cue array were gray. Singleton cues were defined with respect to their match with the currently active task set. C+S+ cues matched both the target color and the target size (e.g., small red singletons in blocks in which participants searched for small red target bars). C+S– cues matched the current target color but not its size (e.g., small red singletons in blocks in which large red bars were the targets). C–S+ cues matched target size but not target color, while C–S– cues matched neither of the two target-defining attributes.

Electroencephalography (EEG) recording and analysis

The EEG was DC-recorded continuously from 23 electrodes mounted in an elastic cap at standard positions of the extended International 10–20 System, sampled at 500 Hz with a 40-Hz low-pass filter. During recording, all electrodes were referenced to the left earlobe and were re-referenced offline to the average of both earlobes. The EEG was segmented into epochs from 100 ms before to 400 ms after cue onset. Epochs with artifacts (vEOG exceeding ±60 μV, hEOG exceeding ±25 μV, or any other channel exceeding ±80 μV) were removed from the analyses (on average, 9.3 % of all trials). The remaining epochs were averaged separately for each of the four cue types (C+S+, C+S–, C–S+, and C–S–). The N2pc components elicited by the color/size singleton cues were quantified on the basis of ERP mean amplitudes at posterior lateral electrodes PO7 and PO8 in the 190- to 270-ms time window following the onset of the cue array.
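As an illustration of how the N2pc measure described above can be computed, the following sketch assumes that artifact-free, baseline-corrected epochs (–100 to 400 ms relative to cue onset, sampled at 500 Hz) are already available as NumPy arrays for one cue type, together with the channel indices of PO7 and PO8 and the hemifield of the singleton cue on each trial. The variable names and data layout are our assumptions, not the authors' analysis code.

```python
import numpy as np

FS = 500                      # sampling rate (Hz)
EPOCH_START = -0.1            # epoch onset relative to the cue (s)
N2PC_WINDOW = (0.190, 0.270)  # mean-amplitude window after cue onset (s)

def window_mean(epochs, window=N2PC_WINDOW, fs=FS, t0=EPOCH_START):
    """Mean amplitude per trial and channel within a latency window.

    epochs: array of shape (n_trials, n_channels, n_samples).
    """
    start = int(round((window[0] - t0) * fs))
    stop = int(round((window[1] - t0) * fs))
    return epochs[:, :, start:stop].mean(axis=2)

def n2pc_amplitude(epochs, cue_side, po7_idx, po8_idx):
    """Contralateral-minus-ipsilateral mean amplitude at PO7/PO8 for one cue type.

    cue_side: sequence of 'left'/'right' labels giving the hemifield of the cue on each trial.
    """
    amps = window_mean(epochs)
    contra, ipsi = [], []
    for trial_amps, side in zip(amps, cue_side):
        if side == "left":    # cue in the left hemifield -> right electrode (PO8) is contralateral
            contra.append(trial_amps[po8_idx]); ipsi.append(trial_amps[po7_idx])
        else:                 # cue in the right hemifield -> left electrode (PO7) is contralateral
            contra.append(trial_amps[po7_idx]); ipsi.append(trial_amps[po8_idx])
    return np.mean(contra) - np.mean(ipsi)
```

Repeating this computation for each of the four cue types yields the values that enter the contralaterality analyses reported below.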

Results

Behavioral performance

The mean correct RTs for trials in which targets appeared at cued versus uncued locations are presented in Fig. 2 (top panel), separately for each of the four cue types (C+S+, C–S+, C+S–, and C–S–). RTs were faster in blocks in which participants searched for large targets among medium distractors than in blocks with small targets [560 vs. 605 ms; t(11) = 3.86; p < .003]. More importantly, RT spatial-cueing effects indicative of attentional capture were present in trials with C+S+ cues, but were very small or entirely absent on trials in which cues did not match the current target color, target size, or both.

Fig. 2 Mean correct response times (RTs) to cued and uncued target locations, separately for the four cue types (C+S+, C–S+, C+S–, and C–S–) in Experiment 1 (top panel) and Experiment 2 (bottom panel)

A first analysis contrasted the spatial-cueing effects on RTs in trials with fully target-matching and fully mismatching cues (C+S+ vs. C–S–). A main effect of cue validity, F(1, 11) = 8.83, p < .02, was accompanied by a highly significant cue validity by cue type interaction, F(1, 11) = 41.6, p < .001, reflecting the expected pattern of task-set-contingent attentional capture: A highly significant spatial-cueing effect of 37 ms was present on trials with C+S+ cues, t(11) = 5.05, p < .001, while no reliable RT differences emerged between targets at cued versus uncued locations on trials with C–S– cues (–4 ms; t < 1). The critical second analysis compared spatial-cueing effects on trials with fully and partially target-matching cues (C+S+ vs. C+S– vs. C–S+). We found a main effect of cue validity, F(1, 11) = 7.4, p < .02. Critically, a significant cue validity by cue type interaction was again obtained, F(2, 22) = 15.9, p < .001, demonstrating that the magnitudes of attentional capture differed between fully and partially matching cues. Follow-up t-tests revealed that in contrast to the highly significant 37-ms cueing effect triggered by C+S+ cues, no reliable cueing effects were found for either C–S+ cues (–2 ms; t < 1) or C+S– cues [9 ms; t(11) = 1.36, p = .2]. In summary, RT spatial-cueing effects indicative of attentional capture were only elicited by fully target-matching cues, but not by any of the other three cue types. This was further confirmed by Bonferroni-corrected pairwise comparisons of spatial-cueing effects between cue types, which demonstrated that the cueing effects triggered by C+S+ cues differed reliably from the valid–invalid RT differences observed for the other three cue types [all ts(11) > 3.89, all ps < .01]. In contrast, we found no difference between the (nonsignificant) spatial-cueing effects for any combination of partially matching and nonmatching cues [all ts(11) < 1.7, all ps > .117].
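For readers who wish to run this type of analysis on their own data, a minimal sketch of the Cue Validity × Cue Type repeated-measures ANOVA and the follow-up paired t-tests is shown below, using statsmodels and SciPy. The long-format data frame and its column names are assumptions about how per-participant condition means might be organized, not the authors' analysis scripts.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Assumed long format: one row per participant x cue type x cue validity,
# containing that participant's mean correct RT (ms) in the condition.
# rt = pd.read_csv("mean_rts.csv")  # columns: subject, cue_type, validity, rt

def cue_validity_anova(rt, cue_types):
    """Repeated-measures ANOVA with the within-subjects factors validity and cue type."""
    subset = rt[rt["cue_type"].isin(cue_types)]
    return AnovaRM(subset, depvar="rt", subject="subject",
                   within=["validity", "cue_type"]).fit()

def cueing_effect_ttest(rt, cue_type):
    """Paired t-test of RTs to targets at uncued versus cued locations for one cue type."""
    wide = rt[rt["cue_type"] == cue_type].pivot(index="subject",
                                                columns="validity", values="rt")
    return stats.ttest_rel(wide["invalid"], wide["valid"])

# Example calls (assuming the data frame exists):
# print(cue_validity_anova(rt, ["C+S+", "C-S-"]))   # first analysis
# print(cueing_effect_ttest(rt, "C+S+"))            # cueing effect for fully matching cues
```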

Error rates were low. Incorrect keypresses were registered on average on 1.8 % of all target trials (ranging between 0.7 %, for trials with valid C+S+ cues, and 2.6 %, for trials with invalid C+S+ cues). No main effects or interactions involving the factors Cue Color, Cue Size, or Cue–Target Location emerged for error rates. False alarms occurred on 0.1 % of all trials with nontarget singleton bars.

N2pc components

Grand-averaged N2pc components elicited in response to the four different cue types are shown in Fig. 3. N2pc amplitudes were largest when both cue color and size matched the current target-defining features (C+S+ cues). For C+S– and C–S+ cues, N2pc amplitudes were attenuated, and they were smallest for C–S– cues.

Fig. 3 Grand-average event-related brain potentials (ERPs) of Experiment 1, elicited at posterior electrodes PO7/8 contralateral and ipsilateral to the location of the color/size singleton cue, presented separately for blocks with small targets (S+ cues, top row) and with large targets (S– cues, bottom row), and for target-color cues (C+ cues, left side) and non-target-color cues (C– cues, right side). For each cue type, topographical N2pc scalp distribution maps are shown. These represent differences between brain activity measured in the N2pc time window (190–270 ms after cue onset) over hemispheres ipsi- and contralateral to the singleton cue and were constructed by spherical spline interpolation (Perrin, Pernier, Bertrand, & Echallier, 1989) after mirroring the difference amplitudes to obtain symmetrical but inverse amplitude values for both hemispheres

Analogous to the RT analyses, the first analysis of N2pc mean amplitudes (measured in the 190- to 270-ms poststimulus time window) compared the N2pc components for fully target-matching C+S+ cues and fully mismatching C–S– cues. A main effect of contralaterality, F(1, 11) = 41.1, p < .001, was accompanied by a Contralaterality × Cue Type interaction, F(1, 11) = 53.8, p < .001, confirming that the N2pc was larger for C+S+ than for C–S– cues (–2.79 vs. –0.62 μV; SEM: 0.4 vs. 0.17 μV). Follow-up analyses comparing contralateral and ipsilateral ERPs were conducted separately for both cue types, and here we found that the N2pc was reliably present not only for C+S+ cues, t(11) = 7.04, p < .001, but also for C–S– cues, t(11) = 3.69, p < .01. The second analysis compared the N2pc components for fully and partially target-matching cues (C+S+ vs. C+S– vs. C–S+). A main effect of contralaterality, F(1, 11) = 33.3, p < .001, was accompanied by a Contralaterality × Cue Type interaction, F(2, 22) = 14.2, p < .001, demonstrating significant N2pc amplitude differences between these three cue types (see Fig. 3). Follow-up t-tests confirmed that N2pc components were reliably present for partially target-matching cues [C+S– and C–S+; ts(11) = 3.99 and 3.49, respectively, both ps < .01]. However, the N2pc to fully target-matching cues was larger than the N2pcs to either of the two partially matching cues C+S– and C–S+ (–1.88 vs. –0.85 μV; SEM: 0.47 vs. 0.24 μV), as confirmed by Bonferroni-corrected pairwise comparisons [both ts(11) > 2.55, both ps < .014, one-tailed]. Finally, the N2pc to fully-mismatching C–S– cues was smaller than the N2pc to partially matching C+S– cues [t(11) = 3.4, p < .01, one-tailed] and smaller than the N2pc to C–S+ cues in a 200- to 270-ms postcue time window [t(11) = 2.01, p < .05, one-tailed].

Discussion of Experiment 1

The behavioral effects obtained in Experiment 1 were clear-cut. As predicted, spatial-cueing effects on RTs were present for fully target-matching C+S+ cues, but not for fully mismatching C–S– cues, thereby demonstrating that task-set-contingent attentional capture (Folk et al., 1992) is not just elicited under conditions in which task sets specify a single target feature or target dimension, but also when targets are defined by a combination of features from two dimensions. The critical behavioral finding of Experiment 1 was that partially matching C+S– and C–S+ cues failed to trigger reliable behavioral spatial-cueing effects. This contrast between the presence of large and reliable behavioral attentional-capture effects for fully matching cues and the absence of such effects for partially matching cues is remarkable, as it appears inconsistent with the assumptions of models such as Feature Integration Theory or Guided Search, which postulate that the top-down guidance of attention operates independently for different feature-specific channels. If this were correct, the presence of one target-defining feature singleton in the cue array should be sufficient to attract attention, irrespective of whether the other target-defining feature is also present. The fact that behavioral attentional-capture effects were only triggered by fully matching cues provides strong prima facie evidence against the idea that top-down attentional task sets are represented as independent feature lists, and it is in line with the hypothesis that fully integrated object representations play an important role in the control of attentional object selection.

However, the N2pc results obtained in Experiment 1 tell a different story. As expected, a large N2pc was triggered in response to C+S+ cues, indicating that these cues captured attention. Relative to C+S+ cues, N2pc components were strongly attenuated for C–S– cues, demonstrating that attentional capture by C+S+ cues was primarily contingent on the currently active top-down task set. The most important observation concerned the pattern of N2pc results on trials with C+S– and C–S+ cues. Even though N2pc amplitudes on these trials were reliably attenuated relative to trials with C+S+ cues (Fig. 3), it is obvious that partially target-matching cues triggered substantial N2pc components, which suggests that they captured attention. In fact, the reduction of N2pc amplitudes for partially matching cues relative to C+S+ cues is perfectly consistent with the hypothesis that attentional capture is determined independently by information from different feature dimensions. In guided search (Wolfe, 1994, 2007), the activation profile on the salience map that determines where spatial attention is allocated depends on the top-down weighting of input channels that code target-defining attributes. Because each channel contributes independently and additively to this activation profile, fully matching cues should trigger a stronger spatial bias than do partially matching cues, due to the input from two rather than just one top-down-weighted channel. When we compared N2pc amplitudes for C+S+ cues with the sum of the N2pc amplitudes in response to C+S– and C–S+ cues, no difference emerged [t(11) < 1], in line with the view that the N2pc reflects the additive contributions of independent feature channels to the current activation pattern on the salience map.
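A compact sketch of this additivity check is given below, assuming one contralateral-minus-ipsilateral N2pc amplitude per participant and cue type (computed, for example, as in the sketch in the Method section); it illustrates the comparison rather than reproducing the authors' script.

```python
import numpy as np
from scipy import stats

def n2pc_additivity_test(n2pc_full, n2pc_color_only, n2pc_size_only):
    """Compare the C+S+ N2pc with the sum of the C+S- and C-S+ N2pcs.

    Each argument: per-participant contra-minus-ipsi amplitudes (in microvolts),
    in the same participant order. A nonsignificant difference is consistent with
    additive contributions from independent color and size channels.
    """
    summed = np.asarray(n2pc_color_only) + np.asarray(n2pc_size_only)
    return stats.ttest_rel(np.asarray(n2pc_full), summed)
```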

Even if the contributions of the color and size channels to the N2pc were additive, they were certainly not equal, as the N2pc was reliably larger for C+S– than for C–S+ cues [t(11) = 2.37, p < .05; Fig. 3]. It is possible that because target color remained constant for each participant, while target size changed between the two halves of Experiment 1, participants based their search primarily on color and processed size only once a target-color object had been selected. Several aspects of the observed results make this account unlikely. First, it predicts that target-color cues should trigger RT spatial-cueing effects indicative of attentional capture, regardless of whether they match the current target size. This was clearly not the case, as reliable RT spatial-cueing effects were only triggered by fully matching C+S+ cues. There was no difference in the sizes of the (nonsignificant) spatial-cueing effects elicited by C+S– and C–S– cues [t(11) = 1.59, p = .14], which is also inconsistent with a generic attentional prioritization of target color. Furthermore, as this prioritization should only be observed in the second half of the experiment (after target size had changed while target color remained constant), N2pc amplitude differences between color-matching (C+) and color-nonmatching (C–) cues should be larger in this second half. In fact, we found a nonsignificant tendency for this difference to be larger in the first half [1.83 μV vs. 1.37 μV; t(11) = 1.35, p = .2], which again is not in line with the color prioritization account. A more likely explanation for the fact that N2pc amplitudes were larger for C+S– than for C–S+ cues is that color singletons generally trigger larger N2pc amplitudes than do feature singletons in other dimensions. For example, Seiss, Kiss, and Eimer (2009, Exp. 2) found smaller N2pc components in response to task-relevant shape singletons than to relevant color singletons. The question of whether such N2pc differences reflect a general tendency for color singletons to be more potent attractors of attention than singletons defined in other visual dimensions has not yet been systematically addressed.

Another notable observation was that although the N2pc to fully mismatching C–S– cues was smaller than the N2pc elicited by fully or partially matching cues, it was still reliably present. This suggests that these nonmatching cues sometimes captured attention in a purely salience-driven, bottom-up fashion. How can this observation be reconciled with our interpretation of the N2pc results in terms of attentional control by independent feature lists? The fact that N2pc amplitudes to C–S– cues were attenuated relative to the other three cue types demonstrates that bottom-up salience cannot be the sole cause of attentional capture by these cues, and that top-down control played an important role. However, the presence of an N2pc to C–S– cues certainly shows that attentional control of search for color/size targets was not perfect, and that salience-driven attentional capture by nonmatching singleton cues did occur on a subset of trials. Residual N2pc components for task-irrelevant visual feature singletons have already been observed in previous ERP experiments that studied single-feature search for color or shape (Eimer & Kiss, 2010; Seiss et al., 2009), and have also been interpreted in terms of imperfect attentional control. It will be important to investigate in future studies whether the increased working memory demands imposed by search for feature conjunctions relative to single-feature search produce larger impairments in the efficiency of top-down attentional control, and thus increase the probability of salience-driven capture by task-irrelevant singletons.

Overall, the pattern of behavioral and ERP results observed in Experiment 1 presents an intriguing conundrum. While the N2pc data strongly suggest that attentional object selection is guided independently by information from different feature dimensions, the pattern of behavioral spatial-cueing effects points equally forcefully toward integrated object representations as the critical unit of top-down attentional control. We will return to this issue in the General Discussion. In Experiment 2, we investigated the effects of a simple change in task instructions on the behavioral and ERP markers of task-set-contingent attentional capture. While targets were defined by a specific color/size conjunction in Experiment 1 (e.g., “red and small”), they were defined as a feature disjunction in Experiment 2. For example, participants had to respond to singleton bars when they were red or when they were small (including bars that were both small and red), and to ignore bars only when they were blue and large. Because targets were no longer associated with a distinct combination of color and size in Experiment 2, attentional target selection no longer needed to be guided by fully integrated object representations. If the behavioral spatial-cueing effects observed in Experiment 1 reflect attentional guidance by integrated object representations, a qualitatively different pattern of effects should be obtained in Experiment 2, in which target selection can be controlled independently by color and size: Cues that match only one of the two target-defining features should now trigger spatial-cueing effects indicative of attentional capture.

Experiment 2

Method

Participants

A total of 15 new participants were recruited to take part in Experiment 2. Three were excluded from the analyses due to excessive eye movements. The remaining 12 participants (mean age 26.3 years; five male, seven female) all had normal or corrected-to-normal vision.

Stimuli, procedure, and analyses

The stimuli and general design were the same as in Experiment 1, with the following exceptions. The singleton bar was now equally likely to be a small red, large red, small blue, or large blue bar. Participants had to respond to bars that matched the current target size (large or small, varied across two sets of eight successive blocks, as in Experiment 1), the current target color (red or blue, varied across participants), or both, and to ignore bars that matched neither of the two target features. Therefore, response-relevant target bars were present on 75 % of all trials. Although bars that matched only one of the two target features were now response-relevant targets, the corresponding cue types are still referred to as “partially matching” (C+S– and C–S+ cues), to maintain consistency with the terminology employed in Experiment 1. Each block now contained 64 trials, resulting in a total of 1,024 trials. In all other respects, the procedures were identical to those of Experiment 1. On average, 6.5 % of all trials were excluded from the analyses due to eye movement and other artifacts.

Results

Behavioral data

Figure 2 (bottom panel) shows the mean correct RTs for trials with targets at cued versus uncued locations, separately for the four cue types (C+S+, C–S+, C+S–, and C–S–). In marked contrast to Experiment 1, RT spatial-cueing effects indicative of attentional capture were now apparent not just for C+S+ cues, but also for cues that only matched one of the two possible target attributes. As in Experiment 1, the first analysis contrasted the spatial-cueing effects in trials with C+S+ versus C–S– cues. We found a main effect of cue validity, F(1, 11) = 11.9, p < .005, and an interaction between cue validity and cue type, F(1, 11) = 4.8, p < .05, again demonstrating the predicted pattern of task-set-contingent attentional capture: A significant spatial-cueing effect of 29 ms was present on trials with C+S+ cues, t(11) = 3.93, p < .002, and there was no reliable RT difference between targets at cued versus uncued locations on trials with C–S– cues (7 ms; t < 1). The critical second analysis again contrasted the attentional-capture effects for fully matching cues and for cues that matched just one target attribute (C+S+ vs. C+S– vs. C–S+). Unlike in Experiment 1, the main effect of cue validity, F(1, 11) = 40.7, p < .001, was now no longer accompanied by an interaction between cue validity and cue type, F(2, 22) < 1, suggesting that spatial-cueing effects indicative of attentional capture did not differ between these three cue types. Follow-up t-tests confirmed that the cueing effects triggered by C+S+, C–S+, and C+S– cues (29, 25, and 26 ms, respectively) did not differ from each other [all ts(11) < 1]. In summary, cues that matched both or just one target-defining feature elicited equivalent behavioral attentional-capture effects, while these effects were absent for nonmatching C–S– cues.

Incorrect keypresses were registered on average on 1.7 % of all target trials (ranging between 0.7 %, for trials with valid C+S+ cues, and 2.5 %, for trials with invalid C+S+ cues). No main effects or interactions involving the factors Cue Color, Cue Size, or Cue–Target Location emerged for error rates. False alarms occurred on 2.6 % of all no-go trials.

To confirm that qualitatively different patterns of attentional-capture effects were obtained for the partially matching cues in Experiments 1 and 2, the RT data obtained in both experiments on trials with C+S– and C–S+ cues were combined for an analysis that included Experiment as an additional between-subjects factor. We found a main effect of cue validity, F(1, 22) = 19.4, p < .001, and, more importantly, an interaction between cue validity and experiment, F(1, 22) = 11.3, p < .003, reflecting the absence of spatial-cueing effects for partially matching cues in Experiment 1, and the presence of such effects in Experiment 2.

N2pc data

Figure 4 shows the grand-average N2pc components elicited by the singleton cues, separately for the four cue types. In contrast to Experiment 1, N2pc amplitudes were very similar across all four cue types. This was confirmed by the analysis of C+S+ versus C–S– cues (–2.33 vs. –2.32 μV; SEM: 0.32 vs. 0.31 μV), which yielded a main effect of contralaterality, F(1, 11) = 60.3, p < .001, without any interaction between contralaterality and cue type, F(1, 11) < 1. Likewise, the analysis of cues that matched both target-defining features (C+S+) or only one of them (C+S– and C–S+; –2.34 vs. –2.48 μV; SEM: 0.29 vs. 0.34 μV) yielded a main effect of contralaterality, F(1, 11) = 64.6, p < .001, but no interaction between contralaterality and cue type, F(1, 11) < 1.

Fig. 4 Grand-average event-related brain potentials (ERPs) of Experiment 2, elicited at posterior electrodes PO7/8 contralateral and ipsilateral to the location of the color/size singleton cue, presented separately for blocks with small targets (S+ cues, top row) and with large targets (S– cues, bottom row), and for target-color cues (C+ cues, left side) and non-target-color cues (C– cues, right side). Topographical maps of N2pc scalp distributions are also shown for each cue type, calculated as in Fig. 3

To confirm that qualitatively different N2pc patterns were obtained for partially matching cues in Experiments 1 and 2, the N2pc data obtained in both experiments for C+S– and C–S+ cues were combined, analogous to the between-experiment analysis of the RT data. A main effect of contralaterality emerged, F(1, 22) = 77.4, p < .001, and, more importantly, an interaction between contralaterality and experiment, F(1, 22) = 5.9, p < .03, reflecting the fact that the N2pc amplitudes for partially matching cues were smaller in Experiment 1 than in Experiment 2.

Discussion of Experiment 2

The behavioral effects obtained in Experiment 2 were again clear-cut, but they were qualitatively different from the results observed in the first experiment. Reliable effects of spatial cueing on RTs were found not only on trials with C+S+ cues, but also on trials with C+S– and C–S+ cues. In contrast, C–S– cues that did not match either target-defining feature did not trigger spatial-cueing effects. This is the pattern of results that would be predicted on the basis of the hypothesis that the guidance of attention is based on independent representations of task-relevant visual features. Because disjunction targets were used in Experiment 2, the presence of one target-matching feature was sufficient to make a singleton bar in the search array task-relevant. There was no need for participants to employ an integrated object representation as an attentional template, and the behavioral results demonstrate that attention was indeed guided independently by separate feature channels. Interestingly, the impacts of these separate channels on behavioral attentional capture were nonadditive: The spatial-cueing effects were not larger for C+S+ than for C+S– and C–S+ cues. The fact that the pattern of behavioral spatial-cueing effects was qualitatively different in Experiments 1 and 2, in which the physical stimulus parameters were identical but the task demands either encouraged or discouraged the control of attention by integrated object representations, provides evidence that feature-based and object-based attentional templates have distinct roles in the top-down attentional guidance of search for visual singleton targets that are defined in multiple dimensions.

The N2pc results observed in Experiment 2 also differed from the pattern found in the first experiment. N2pc components were no longer largest for C+S+ cues and smallest for C–S– cues, but were in fact equal in size for all four cue types. In a previous study (Eimer & Kiss, 2010), we interpreted the presence of an N2pc for singleton cues that did not match the current target features as evidence that participants had adopted a singleton search mode (“find any feature discontinuity, regardless of its value”; Bacon & Egeth, 1994). The pattern of N2pc results observed in Experiment 2 suggests that when faced with a demanding “disjunction search” instruction, participants opted for this search mode, so that all feature singleton cues captured attention, regardless of their value. However, even though C–S– cues triggered large N2pc components, there were no corresponding behavioral spatial-cueing effects. As in Experiment 1, we are again faced with a puzzling dissociation between the electrophysiological and behavioral markers of attentional capture.

General Discussion

The aim of this study was to investigate whether the guidance of attention during search for targets defined by a combination of features from different dimensions (color and size) is based on independent feature lists or on integrated object representations. We measured the behavioral and electrophysiological markers of task-set-contingent attentional capture in two experiments that employed the spatial-cueing paradigm developed by Folk et al. (1992). The critical new factor was that the feature singleton cues differed from their context elements in both color and size, and that the task relevance of the features in these two dimensions was varied independently.

The pattern of behavioral spatial-cueing effects observed in both experiments provides strong prima facie evidence that attentional guidance is based on integrated object representations rather than on independent feature lists. In Experiment 1, in which targets were defined by a conjunction of color and size, only fully matching C+S+ cues triggered spatial-cueing effects indicative of attentional capture, whereas partially target-matching cues did not. If the allocation of attention was guided in parallel by independent inputs from different feature channels, as is postulated in the Guided Search model of visual object selection (Wolfe, 1994, 2007), the presence of one target-matching feature in the cue display should have been sufficient to trigger attentional capture, and spatial-cueing effects should therefore have been observed for fully as well as for partially matching cues.

It is conceivable that fully matching cues trigger a stronger spatial bias because the additive combination of target-matching inputs from two different dimensions produces a steeper location-specific gradient on the salience map. The observation that spatial-cueing effects were restricted to trials with C+S+ cues could thus be due to the fact that a strong cue-induced spatial bias is necessary to trigger reliable behavioral attentional-capture effects for subsequent targets. This possibility was ruled out by the behavioral results of Experiment 2, which used identical procedures, except that the presence of a task-relevant color or size was now a sufficient target criterion, irrespective of whether the other target feature was also present. Such a difference in the task instructions between Experiments 1 and 2 should not affect the relative impacts of fully versus partially matching cues on the activation profile of the salience map. However, spatial-cueing effects of equivalent size were observed for fully and partially matching cues in Experiment 2, suggesting that the spatial bias triggered by both types of cues was strong enough to produce reliable behavioral attentional-capture effects. The absence of such effects for partially matching cues in Experiment 1 is thus difficult to reconcile with the hypothesis that top-down attention is controlled independently by information from different feature dimensions. Instead, the qualitative differences in the patterns of attentional-capture effects between the two experiments suggest that participants utilized different attentional templates: In Experiment 1, attention was guided by fully integrated object representations because the targets were defined by a conjunction of features. In Experiment 2, in which feature conjunctions were no longer relevant for finding targets, participants employed independent representations of both target features.

The conclusion that attention can be guided by integrated object representations appears to be supported by the systematic pattern of behavioral spatial-cueing effects obtained in Experiments 1 and 2. However, such a conclusion would be in direct conflict with leading models of visual attention, such as Feature Integration Theory or Guided Search. Are these models mistaken with respect to the basic architecture of attentional control? The pattern of N2pc results obtained in the present study shows that such a conclusion would be premature. The properties of the N2pc (its spatiotopic organization, its sensitivity to the presence of target-defining features, and its link to object-based rather than purely space-based attentional selection) suggest that this component is a direct electrophysiological correlate of the current activation profile on the salience map that guides the allocation of attention. In Experiment 1, the N2pc was largest in response to C+S+ cues, but it was also reliably triggered, albeit in an attenuated fashion, by partially target-matching cues, and even by fully nonmatching singleton cues, despite the fact that none of these cues triggered behavioral spatial-cueing effects indicative of attentional capture. This N2pc pattern is consistent with the hypothesis that additive, top-down-weighted inputs from independent color and size channels to the salience map produce a strong spatial bias toward the location of fully target-matching cues, and a somewhat reduced bias for partially matching cues. In other words, the N2pc results observed in Experiment 1 suggest that attention is guided by independently represented target-defining features from different dimensions.

A similar dissociation between the N2pc and behavioral spatial-cueing effects was found in Experiment 2. Here, the N2pc to C–S– cues was equal in size to the N2pcs triggered by fully or partially matching cues, indicating that participants had adopted a singleton search mode in which all feature discontinuities captured attention, irrespective of whether they matched the properties of the current target (Bacon & Egeth, 1994). But, unlike feature-matching cues, C–S– cues did not trigger behavioral spatial-cueing effects indicative of attentional capture, which is not the expected pattern for singleton search mode. The presence of reliable cue-triggered N2pc components in the absence of corresponding behavioral attentional-capture effects in both experiments is clearly puzzling. In most previous ERP studies of task-set-contingent attentional capture during search for feature singleton targets (e.g., Ansorge et al., 2011; Eimer & Kiss, 2008; Eimer et al., 2009; Leblanc et al., 2008; Lien et al., 2008), the presence of an N2pc component for task-set-matching stimuli was associated with corresponding behavioral attentional-capture effects (but see Eimer & Kiss, 2010, Exp. 2). Why did we instead find systematic dissociations between the ERP and behavioral markers of attentional capture?

The major new aspect of the present study was that targets were now defined by a conjunction (or disjunction) of two features from different dimensions, rather than by a single constant feature. The presence of reliable N2pc components to C+S– and C–S+ cues in Experiment 1 demonstrates that when task sets specify a feature conjunction, each feature is able to trigger attentional capture. The absence of corresponding behavioral spatial-cueing effects for these cues suggests that attention was no longer focused on the location of these cues at the point in time when the target arrays were presented. In other words, attention had been rapidly disengaged from cues that only partially matched the current conjunctively defined target. The hypothesis that rapid attentional disengagement is responsible for the absence of behavioral spatial-cueing effects for task-irrelevant singleton cues is not new. Theeuwes and colleagues (Belopolsky, Schreij, & Theeuwes, 2010; Theeuwes, Atchley, & Kramer, 2000) have argued that all salient singleton stimuli capture attention in a task-set-independent, bottom-up fashion, and that behavioral task-set-contingent attentional-capture effects (Folk et al., 1992) are the result of fast attentional disengagement from nonmatching cues. We have previously shown that this explanation cannot account for task-set-contingent attentional capture during search for single-feature targets (Ansorge et al., 2011). However, the present findings suggest that a sequential pattern of task-set-contingent attentional capture followed by rapid disengagement of attention from nontarget objects might indeed be a core component of top-down attentional control during search for targets that are defined by a combination of features (see also Fukuda & Vogel, 2011, for links between working memory capacity and recovery from attentional capture). In Experiment 2, in which participants searched for a color/size disjunction, all feature singleton cues initially captured attention, but only target-matching cues produced behavioral spatial-cueing effects. Again, this suggests that attention was withdrawn in a task-set-contingent fashion from cues that matched neither target-defining feature.
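To make the proposed sequence explicit, the following schematic sketch (hypothetical names and rules, not a fitted model) separates an initial, feature-driven capture stage, which the cue-elicited N2pc is taken to index, from a later stage at which attention is maintained at the cued location only if the object satisfies the full task rule; for the singleton search mode suggested for Experiment 2, the first stage would instead be triggered by any feature discontinuity.

```python
def initial_capture(cue, template):
    """Stage 1: capture is driven by any template-matching feature (indexed by the N2pc)."""
    return any(cue.get(dim) == value for dim, value in template.items())

def attention_maintained(cue, template, conjunction_task=True):
    """Stage 2: attention stays at the cued location only if the object satisfies the task rule.

    For conjunction targets (Experiment 1) all template features must match;
    for disjunction targets (Experiment 2) a single matching feature suffices.
    """
    matches = [cue.get(dim) == value for dim, value in template.items()]
    return all(matches) if conjunction_task else any(matches)

template = {"color": "red", "size": "small"}
partial_cue = {"color": "red", "size": "large"}        # a C+S- cue
print(initial_capture(partial_cue, template))          # True  -> N2pc expected
print(attention_maintained(partial_cue, template))     # False -> no behavioral cueing effect
```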

While this temporal sequence of attentional capture followed by rapid disengagement of attention from cues that do not match all target-defining features can account for the dissociation between ERP and behavioral markers of attentional capture, other interpretations remain possible. For example, the N2pc might pick up any cue-induced spatially selective modulations of the activity profile on the salience map, while behavioral spatial-cueing effects could be generated at a later stage of attentional object selection at which this spatial bias has to exceed a more conservative threshold in order to produce such effects. Our temporal account would predict that shortening the cue–target interval should result in the emergence of behavioral spatial-cueing effects for partially target-matching cues. This will need to be investigated in future studies. Another question for future research concerns the time course of template-guided attentional target selection across trials. In the present experiments, target-defining features were specified at the start of each block and remained constant for a series of blocks. There is evidence that under such conditions, attentional templates are transferred from visual working memory to long-term memory (Carlisle et al., 2011). It will thus be important to investigate attentional guidance by feature lists or integrated object representations in paradigms in which the target features relevant for a given trial are cued anew at the start of each trial.

From a methodological perspective, the dissociation between electrophysiological and behavioral markers of attentional capture observed in the present study shows that it can be seriously misleading to draw general inferences about the architecture of top-down attentional control on the basis of behavioral data alone. Had we only measured behavioral spatial-cueing effects, we would have concluded that attentional capture and attentional target selection are controlled by fully integrated object representations. The N2pc results show that such a conclusion would be incomplete at best, and that the initial capture of attention is largely driven by mechanisms that operate independently for different feature dimensions.

The aim of the present study was to find out whether the attentional selection of conjunctively defined targets is guided by independent inputs from parallel feature channels or by integrated object representations. Our findings suggest a two-stage selection scenario that incorporates both alternatives. The N2pc results are in line with attentional guidance by independent features. They show that during search for feature combinations, each task-set-matching feature attracts attention irrespective of whether the other target feature is also present. The pattern of behavioral attentional-capture effects indicates that target selection is then achieved by the rapid deallocation of attention from only partially matching visual objects. In other words, task-set-contingent attentional capture is under the control of input from independent feature channels, while the maintenance of attention at the location of candidate target events is determined by whether they match the attributes of an integrated object representation.