Introduction

Most visual scenes comprise a huge amount of details, which cannot be processed simultaneously. Therefore, the attentional system tries to filter out irrelevant information. There is an ongoing debate as to what extent this process is driven by properties of a scene or by intentions of an observer (Egeth & Yantis, 1997; Theeuwes, 2010). Goal-directed (“top-down” or endogenous) modulation of attention indicates that participants can intentionally allocate their (visual) attention to subsets of a scene or search array. For instance, response times tend to decrease if participants are informed about the color of an upcoming search target. On the other hand, stimulus-driven (“bottom-up” or exogenous) modulation of attention is described as an automatic and involuntary process that depends on physical properties of the stimulus features (e.g., stimulus onset, luminance, color). Accordingly, the onset or presence of a salient stimulus or event usually facilitates search performance (Burnham, 2007; Desimone & Duncan, 1995). Several features like color, orientation, size, or motion have been identified to modulate the mechanisms of visual attention (Wolfe & Horowitz, 2017).

It has also been discussed that (stereoscopic) depth information represents a feature that modulates attentional processing. The visual system is constantly required to extract (stereoscopic) depth information from three-dimensional (3-D) surroundings as a necessity to interact with the environment. For instance, it is crucial to know whether an object is located in close proximity and thus constitutes an obstacle or, alternatively, is located in far distance and is not immediately behaviorally relevant. In general, the deployment of attention across 3-D space has been of interest in several previous studies, yet it is still debated to what extent depth information is used to guide attention and whether it operates exogenously or endogenously. A large part of the related literature indicates that attention, indeed, can be voluntarily directed to specific depth planes in 3-D space. Many studies employed visual search paradigms to investigate the interaction of (stereoscopic) depth information and attentional processing. In such experiments, the search array is usually distributed across two (or more) depth planes. For instance, Nakayama and Silverman (1986) observed that items located in an unattended depth plane, which shared a feature with the target (i.e., same motion or color), did not interfere with the search for that target. Response times did not increase with increasing set sizes, and observers reported that distinct depth planes could be searched effortlessly (Nakayama & Silverman, 1986). Subsequent investigations, however, revealed that deployment of attention in 3-D space is determined by the overall perceptual organization of a visual scene. Accordingly, attention was not exclusively related to (stereoscopic) disparity (i.e., arbitrary points in space) but rather was spread along perceived surfaces (He & Nakayama, 1994, 1995).

During the past 25 years, several studies further investigated the interaction of attentional processing and depth information. Although not all studies confirmed that depth information facilitates visual search (e.g., O’Toole & Walker, 1997), there is ample support for the notion that attentional mechanisms are depth sensitive. Yet it remains a controversial issue whether attention in 3-D space spreads automatically or requires endogenous control. In their line of research, Theeuwes and colleagues asked participants to identify the orientation of a colored target bar (tilted to the left or right). Participants responded faster when the depth position (depth plane) of the target was highlighted by a predictive cue (Theeuwes, Atchley, & Kramer, 1998). However, it was observed that a distractor singleton (i.e., a colored vertical bar that was not the target) located in a different depth plane than the target captured attention, slowing down responses. This effect was evident even when participants were confident about the target depth plane. Only when target and distractor were distributed across different depth planes and were at the same time differently colored could attentional capture be prevented (Theeuwes et al., 1998). Using a different visual search paradigm, Finlayson and colleagues found improved search performance when the search items were distributed across two depth planes compared with a search within a single fronto-parallel plane (Finlayson, Remington, Retell, & Grove, 2013). Furthermore, there was evidence that even small binocular disparities were sufficient to induce the impression of separate depth planes and, consequently, facilitated search. However, it was also shown that knowledge about the upcoming target depth plane constituted a prerequisite for higher search efficiency. The authors therefore concluded that depth information from binocular disparity is not processed involuntarily (Finlayson et al., 2013). Likewise, foreknowledge about depth planes was a prerequisite for improved search performance in a letter search task across separate depth planes (Dent, Braithwaite, He, & Humphreys, 2012).

In contrast, a recent study provided evidence that expectancy-discrepant (or surprising) depth information caused involuntary attentional capture (Plewan & Rinkenauer, 2017b). Using a variant of the surprise capture paradigm (Gibson & Jiang, 1998; Horstmann, 2002), participants initially searched for one of two letters in a circular array within a single fronto-parallel plane without any further information about the target location. Following this phase of uncued trials, the target location was unexpectedly predicted by a valid depth cue (i.e., an item presented in front of or behind the search plane). The depth cue was completely expectancy discrepant, but an immediate increase in task performance was still observed (i.e., lower error rates, faster responses). This effect, however, seems to be a relatively slow process, as it was only observed in association with a long interval (400 ms) between cue and target (Plewan & Rinkenauer, 2017b). In line with this idea of a slow integration of depth information, a recent brain imaging study revealed that segmenting a 3-D search display correlates with activations in higher-tier visual areas (Roberts, Allen, Dent, & Humphreys, 2015).

Another aspect related to depth processing that has been addressed in some studies is the relative depth position of items within a 3-D scene. This relates to relative differences between objects rather than their absolute distance from an observer. In case of a 3-D search task, the target can be located in a depth plane in front of or behind other items. Finlayson and Grove (2015) presented target and nontarget items, which were distributed across up to four distinct depth planes, and asked their participants to perform a search task. Targets in “near” depth planes were identified faster than those located in “far” depth planes, irrespective of their absolute distance to the observer (Finlayson & Grove, 2015). Using a flanker-task paradigm (Eriksen & Eriksen, 1974), it was shown that increasing separations between target and flankers in depth resulted in attenuated response-compatibility effects (Andersen & Kramer, 1993). Accordingly, interference was strongest when flanker stimuli were perceived in front of a target (crossed disparity) and in its close proximity. Additionally, a recent series of experiments using a simple reaction time paradigm reported that target objects, which are located closer to an observer, not only elicited shorter reaction times but also more forceful responses compared to target objects located farther away (Plewan & Rinkenauer, 2016, 2017a).

Such results were interpreted as support for the idea that there is an egocentric attentional search gradient through space. According to this view, attentional resources are not uniformly deployed across 3-D space and decrease along with increasing distance. In fact, objects in close proximity to an observer are considered to receive more attentional resources than similar objects farther away. This idea is further supported by studies investigating attentional reorientation in 3-D space. For instance, if attention is directed to a specific depth plane, attentional reorienting across depth planes takes time (Atchley, Kramer, Andersen, & Theeuwes, 1997) and can be performed faster toward closer objects (Chen, Weidner, Vossel, Weiss, & Fink, 2012). Such distinct effects, however, might be limited to experimental conditions performed under high perceptual load (Arnott & Shedden, 2000; Atchley et al., 1997). Other studies even indicated that attention cannot be restricted to a particular depth plane but rather spreads across 3-D space (Ghirardelli & Folk, 1996; Theeuwes & Pratt, 2003). For instance, Theeuwes and Pratt (2003) investigated inhibition of return (Posner & Cohen, 1984) to locations within and across depth planes. As expected, inhibition of return was observed when targets appeared at the cued location but also when the target was presented at a different depth plane (in front of or behind the cued location). Attention was thus not limited to a specific depth plane. A recent study, however, suggests that attentional mechanisms operate differentially in near and far depth planes. Inhibition of return was only associated with targets that were perceptually closer to the observers (Wang, Liu, Chen, & Zhang, 2016).

From a behavioral point of view, it seems plausible that depth information receives high processing priority. But, as outlined above, so far there is no general consensus about the interplay between (stereoscopic) depth information and attentional processing. In particular, it is unclear whether depth information has a similar impact on attentional processing as other well-studied stimulus features (e.g., color). Previously, the so-called additional singleton task (Theeuwes, 1991, 1992, 2010) has successfully been employed to investigate the contribution of different stimulus features to visual selection. In this task, a salient singleton is embedded in an otherwise neutral stimulus array (e.g., a green circle among red circles) or is additionally accompanied by an irrelevant singleton distractor (e.g., a red square). Participants are explicitly instructed to respond to the target and are aware that any other singleton is irrelevant to the task. Thus, increased reaction times associated with such irrelevant distractors imply attentional capture. Stimulus properties like color, form, or intensity have been shown to affect attentional capture to a different degree. For instance, reaction times increased when participants searched for a green diamond among green circles and an irrelevant red circle was presented simultaneously (Theeuwes, 1992). In contrast, there was no such interference in the opposite search condition when a distinctively colored singleton and an irrelevant form distractor were simultaneously displayed. As a result, it was proposed that spatial attention is shifted to the location with the highest salience (e.g., an irrelevant but salient distractor). In line with that, it was reported that initial saccades frequently landed on the location of a salient distractor (Theeuwes, de Vries, & Godijn, 2003). Some studies confirmed this effect of oculomotor capture. However, it has also been shown that oculomotor capture might require transient changes in the stimulus display (Wu & Remington, 2003) or can actively be suppressed (Gaspelin, Leonard, & Luck, 2017). Moreover, it was argued that attentional capture does not necessarily reflect shifts of spatial attention but may rather be explained in terms of nonspatial filtering costs, since the presence of multiple singleton items requires additional time (Folk & Remington, 1998).

In order to learn more about the role of depth information in the visual processing hierarchy, the additional singleton paradigm (Theeuwes, 1991, 1992) was adapted in the present study. In three experiments, depth or color singletons (i.e., items presented in a distinctive depth plane or color) served as target or irrelevant distractor, respectively. Based on findings of previous studies investigating attentional mechanisms in 3-D space, it can be expected that salient depth information defining the target facilitates target detection in a 3-D search array. More importantly, if depth information represents a modulator of attentional processing, distractors defined by depth are expected to involuntarily capture attention and inhibit search for other target features. Conversely, it is unclear whether the search for a target defined by depth is prone to attentional capture from a color distractor or distraction from other depth planes. If attention is not uniformly distributed across space and preferentially operates in closer depth planes, reduced or even no interference is expected from distractors displayed behind a target, whether these are color distractors or items presented in a farther depth plane. Conversely, a target presented in a more distant depth plane may be more susceptible to interference effects since all distracting information appears in front of it. Thus, the relative position of target and distractor items within a search array may be a crucial modulator of attentional processing.

Experiment 1

Method

Participants

Twelve volunteers (three women) participated in the experiment and received either course credit or 10 €/h. Participants’ ages ranged from 19 to 29 years (median age = 25 years). All participants reported no history of psychiatric or neurological disorders and had normal or corrected-to-normal vision. Stereo vision capability was verified using the TNO (Netherlands Organization for Applied Scientific Research) test for stereoscopic vision (all participants revealed stereo thresholds of ≤ 120 arcsec), and color vision was tested using Ishihara’s test for color blindness (Ishihara, 1983). According to the Edinburgh Handedness Inventory (Oldfield, 1971), one participant was left-handed and one was ambidextrous. The experiment was conducted in accordance with the Declaration of Helsinki, and all participants gave written informed consent. The experimental framework was approved by the Ethics Committee of the Leibniz Research Centre for Working Environments and Human Factors.

General procedure and experimental design

The experimental setup was generated using the virtual reality software Vizard 4 (© WorldViz, LLC). Stimulus material was presented via professional stereo head-mounted displays (HMD, nVisor ST50), with a resolution of 1,280 × 1,024, a refresh rate of 60 Hz (single frame duration of about 16.7 ms), and a 50° diagonal field of view. The visual focus of the HMD was set to 10 m. The two displays were positioned directly in front of the participants’ eyes, so that a vivid depth impression could be evoked via stereoscopic presentation. Participants were free to make head movements, yet visual stimulation was constant throughout the experiment, as stimulus coordinates were fixed to the HMD. Responses were recorded using custom-made response devices.

The additional singleton search task (e.g., Theeuwes, 1991, 1992) has previously been introduced to investigate top-down and bottom-up control of attention. In order to investigate the relationship between stereoscopic depth information and attentional processing, the experimental paradigm was adapted in the present study (see Fig. 1). Either six or nine rings were circularly arranged around a gray fixation point (diameter 0.4° visual angle) in front of a uniform black background. Each ring was rendered from a three-dimensional model of a torus, with an inner radius of 0.7° and a width of about 0.1°. The distance between each ring and the fixation point was 3.5°. One ring was a green colored singleton (target) while the remaining items were depicted in gray. Each ring encircled a white line segment (0.06° × 0.5°) which could be horizontal or vertical, or tilted 22.5° to either side with respect to horizontal or vertical orientation (see Fig. 1). All search items as well as the fixation point were displayed within the same depth plane, namely, a perceived viewing distance of 57 cm (central plane) with respect to the HMD device. However, depending on the experimental condition (see below), in each trial one distractor item could also be presented closer to (52 cm, near condition) or farther away from (62 cm, far condition) the participant. Visual angles were adjusted accordingly: Thus, the inner radius of the ring was 0.77° in near plane and 0.65° in far plane, while the width of the line segment was adjusted to 0.55° (near plane) and 0.46° (far plane). Adjusting the visual angle reflects the “natural” viewing experience; however, this is in contrast to related studies which kept the visual angle constant across depth planes (e.g., Atchley et al., 1997; Theeuwes et al., 1998). The binocular disparity between items in near and far depth planes was about 68.24 arcmin.
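The size adjustment and the disparity value can be approximated with elementary viewing geometry. The following Python sketch is illustrative only: the interocular distance of 6.4 cm is an assumption (the text does not state it), and the function names are hypothetical. Under this assumption, the small-angle disparity formula gives about 68.2 arcmin, and the rescaled ring radii land close to the reported 0.77° and 0.65°.

```python
import math

# Viewing distances taken from the text (cm).
CENTRAL_CM, NEAR_CM, FAR_CM = 57.0, 52.0, 62.0
# Assumed interocular distance (cm); not stated in the text.
IPD_CM = 6.4


def rescaled_angle(angle_deg, new_distance_cm, ref_distance_cm=CENTRAL_CM):
    """Visual angle of an object of fixed physical size after moving it
    from ref_distance_cm to new_distance_cm."""
    size_cm = ref_distance_cm * math.tan(math.radians(angle_deg))
    return math.degrees(math.atan(size_cm / new_distance_cm))


def relative_disparity_arcmin(d_near_cm, d_far_cm, ipd_cm=IPD_CM):
    """Small-angle approximation of the binocular disparity between
    two depth planes, in minutes of arc."""
    disparity_rad = ipd_cm * (1.0 / d_near_cm - 1.0 / d_far_cm)
    return math.degrees(disparity_rad) * 60.0


if __name__ == "__main__":
    print(round(rescaled_angle(0.7, NEAR_CM), 2))  # ring inner radius, near plane
    print(round(rescaled_angle(0.7, FAR_CM), 2))   # ring inner radius, far plane
    print(round(relative_disparity_arcmin(NEAR_CM, FAR_CM), 2))
```

The exact values depend on the assumed interocular distance and on whether the small-angle approximation is used, so small deviations from the reported numbers are to be expected.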

Fig. 1

Illustration of the stimulus material used in Experiments 1 and 2. The upper row outlines stimuli as employed in Experiment 1. The left image depicts the small set with a color target and a far distractor. The right image depicts the large set with a color target and a near distractor. Neither the condition without distractor nor the control condition without salient target and distractor is shown. The middle and bottom rows contain sample stimuli from Experiment 2. The left images represent small sets with far and near targets accompanied by a color distractor. The right images display large sets with far and near targets together with distractors displaced in the opposite direction (i.e., near and far, respectively). Again, the condition without distractor and the control condition are not shown. Dotted rings were displayed in green in the actual experiment; smaller and larger rings represent farther and nearer rings, respectively. Figures are not drawn to scale

Each trial was initiated by the onset of the fixation point centered within the central depth plane. Participants were asked to fixate this point throughout the experiment. After a variable interval of 500 to 1,000 ms, the remaining part of the search display appeared and was presented until participants made a response. Following a further interval of 1,500 ms, a new trial started automatically. Participants performed a two-alternative forced-choice task, namely, deciding whether the target line segment was oriented in a horizontal or vertical direction. In case of an erroneous response, a short acoustical feedback (100 ms) was given.

In total, four different experimental conditions were tested. Three of them differed only in terms of an irrelevant distractor: The colored target was accompanied by a distractor item presented in the near or far depth plane (henceforth, the near or far distractor condition), or alternatively was presented among otherwise neutral items (henceforth, the no distractor condition). In addition, there was a neutral control condition that contained neither a colored target nor any salient (depth) distractors. In this condition, only neutral items were presented, and thus the position of the critical line segment was not highlighted. Participants were instructed that the colored item (if present) would always be the target, and that there was no need to search in other locations. Conditions were presented in a block-wise manner while set size (small: six items; large: nine items) was manipulated randomly within blocks. Each participant performed two blocks per condition (i.e., eight blocks in total), and the sequence of blocks was individually randomized. Within each condition, all locations were equally likely to contain the target, while target orientation (horizontal vs. vertical) was randomly allocated on a trial-by-trial basis. In addition, it was ensured that distractors were never presented directly adjacent to the target. Participants performed 108 trials per condition and set size, which resulted in a total of 864 trials (108 × 2 [set sizes] × 4 [conditions]). Following each block, a short break was introduced that lasted at least 10 seconds and had to be terminated via button press. To familiarize themselves with the task, all participants performed 72 independent training trials including only the control and no distractor conditions. Overall, the experimental procedure took about 1 hour.
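The trial structure described above can be sketched as a short Python routine. This is not the original experiment code (which ran in Vizard); it is a hypothetical reconstruction under stated assumptions: the 108 trials per condition and set size are split evenly over the two blocks (54 per set size and block), target positions are fully balanced within a block, and the condition labels and dictionary keys are invented for illustration.

```python
import random

SET_SIZES = (6, 9)
# Assumption: 108 trials per condition and set size, split over two blocks.
TRIALS_PER_SET_SIZE_PER_BLOCK = 54
CONDITIONS = ("none", "near", "far", "control")  # hypothetical labels


def build_block(condition, rng):
    """Return a shuffled trial list for one block of the given condition."""
    trials = []
    for n_items in SET_SIZES:
        repetitions = TRIALS_PER_SET_SIZE_PER_BLOCK // n_items
        for target_pos in range(n_items):
            for _ in range(repetitions):
                distractor_pos = None
                if condition in ("near", "far"):
                    # Exclude the target itself and its two ring neighbours.
                    allowed = [p for p in range(n_items)
                               if (p - target_pos) % n_items
                               not in (0, 1, n_items - 1)]
                    distractor_pos = rng.choice(allowed)
                trials.append({
                    "condition": condition,
                    "set_size": n_items,
                    "target_pos": target_pos,
                    "distractor_pos": distractor_pos,
                    "orientation": rng.choice(("horizontal", "vertical")),
                })
    rng.shuffle(trials)  # set size varies randomly within the block
    return trials


def build_session(seed=None):
    """Two blocks per condition, presented in an individually random order."""
    rng = random.Random(seed)
    block_order = [c for c in CONDITIONS for _ in range(2)]
    rng.shuffle(block_order)
    return [build_block(c, rng) for c in block_order]
```

Calling `build_session(seed=1)` yields eight blocks of 108 trials each, which matches the stated total of 864 trials.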

Task accuracy and speed (i.e., mean reaction times in correct trials) during the distractor conditions were determined and subsequently submitted to a 3 (distractors) × 2 (set size) repeated-measures analysis of variance (ANOVA). Results derived from the control condition were analyzed separately. Statistical analyses were performed using the free statistical software R (https://www.R-project.org/). Obtained F values, p values, and generalized eta squared (\( {\eta}_G^2 \)) are reported (Bakeman, 2005; Olejnik & Algina, 2003). In case (post hoc) t tests were conducted, corresponding t values and Cohen’s d (Cohen, 1988) are specified. In order to calculate search slopes for each participant, reaction times derived from the small set were subtracted from those obtained with the large set size and divided by the difference in number of items (i.e., 9 − 6 = 3).
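The slope computation can be illustrated with a minimal Python sketch. The reaction times below are hypothetical and not taken from the study, and the one-sample t statistic is only one plausible way to test slopes against zero; the text does not specify which test was used, and the original analyses were run in R.

```python
import math
import statistics


def search_slope(rt_small, rt_large, n_small=6, n_large=9):
    """Search slope in RT units per item: (RT_large - RT_small) / (9 - 6)."""
    return (rt_large - rt_small) / (n_large - n_small)


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error, n - 1


# Hypothetical mean RTs (ms) of four participants: (small set, large set).
rts = [(520, 535), (498, 510), (560, 581), (530, 533)]
slopes = [search_slope(small, large) for small, large in rts]
t_value, df = one_sample_t(slopes)

if __name__ == "__main__":
    print(slopes, statistics.mean(slopes), t_value, df)
```

Computing the slope per participant first, and only then averaging, keeps the slope a within-subject quantity that can be tested against zero or compared between conditions.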

Results and discussion

Task performance was high in all experimental conditions as indicated by a mean error rate of about 2.16%. Due to these low numbers, erroneous trials were not further analyzed.

The control condition was analyzed apart from the remaining conditions since there was no salient target and clearly more effort was required. Reaction times were substantially slower compared with the remaining conditions, and, as expected, the data reveal a strong set-size effect associated with the search task. Targets were identified much faster in the smaller set (1.43 s) than in the larger set (1.89 s), t(11) = 8.73, p < .001, d = 2.52. The associated search slope was steep (154 ms/item).

Reaction times of the remaining experimental conditions (no distractor, near distractor, far distractor) are depicted in Fig. 2. A 3 (distractors) × 2 (small vs. large set) ANOVA revealed an effect of set size, \( F_S \)(1, 11) = 11.94, p = .005, \( {\eta}_G^2 \) = 0.004. More importantly, the different distractor types had no effect on search performance, \( F_D \)(2, 22) = 1.37, p = .27, \( {\eta}_G^2 \) = 0.007. Likewise, the interaction of distractor type and set size was also not significant, \( F_{D \times S} \)(2, 22) = 0.06, p = .95. Search slopes in all three conditions were only modestly positive, and none of these slopes statistically differed from zero (no distractor: 5 ms/item, p = .08; near distractor: 4 ms/item, p = .14; far distractor: 6 ms/item, p = .11). In addition, these slopes were substantially shallower compared with the control condition (all ps < .001). This can be taken as evidence for parallel search across all items (cf. Theeuwes, 1991).

Fig. 2

Mean reaction times from Experiment 1. Error bars represent 95% within-subject confidence intervals (Moray, 2008)

In contrast to a recent study that revealed attentional capture by expectancy discrepant depth cues (Plewan & Rinkenauer, 2017b), no effects related to irrelevant depth information were observed in the present experiment. Distractor items displayed in unattended depth planes did not interfere with the search for a salient color target. Previously, it has been discussed that an irrelevant distractor only captures attention when it is more salient than the target (Theeuwes, 1992). Most likely irrelevant depth information—as employed in the present experiment—is less salient than color information and thus does not capture attention involuntarily.

There was no uncertainty about the target identity and, accordingly, the attentional set could have been limited to the relevant feature (i.e., color). Under these conditions, visual search was not affected by additional depth information. However, the question remains what happens if participants search for targets located in predefined depth planes (near or far) while at the same time salient (color) distractors are displayed in a different depth plane. For this purpose, Experiment 2 was conducted with two new groups of participants, one searching for targets in the near depth plane and the other for targets in the far depth plane.

Experiment 2

The results from Experiment 1 imply that visual search for color targets is not prone to distraction from singletons displayed in other depth planes. Experiment 2 was designed to investigate the question whether the search for depth singletons is affected by a color distractor or a depth distractor displayed in a diametrically opposed depth plane.

Method

Participants

A new sample of 24 participants was recruited for Experiment 2. Participants were allocated to one of two subgroups (see below) in alternating order. All other criteria were identical to Experiment 1. Participants in Subgroup A were 20–34 years old (median age = 23 years, 11 women, two left-handed) while the age in Subgroup B ranged between 21 and 31 years (median age = 24 years, six women, all right-handed).

Procedure

The experimental framework was similar to Experiment 1. In this experiment, however, targets were singletons defined by depth instead of color. That is, the target was presented in front of or behind the items in the central depth plane. More specifically, Subgroup A was informed that the target would always appear in the depth plane located in front of the search array (near depth plane), while Subgroup B was informed that the target would be displayed behind the remaining search array (far depth plane; see Fig. 1). Again, participants in both subgroups underwent four experimental conditions (no distractor, color distractor, depth distractor, and control condition). No distractor indicates that only the depth singleton target was presented among neutral items. The color-singleton distractor was of the same color as the target in Experiment 1 and was always presented in the central depth plane among the nonsingleton items. Due to this stimulus configuration, distractor and target not only varied in terms of color but were also presented in different depth planes. However, only color was decisive for the distractor’s saliency because only this feature differed from the neutral nonsingleton items. Depth distractors were presented in front of or behind the search array, but were always diametrically opposed to the target depth plane. If the target was located in the near depth plane (Subgroup A), the distractor was displayed in the far depth plane, and vice versa (Subgroup B). No target or distractor singletons were presented in the control condition. Thus, the orientation of the critical line segment had to be identified in an otherwise neutral search array.

Data analysis was also equivalent to Experiment 1, except that target location was included as a between-subjects factor while distractor and set size were treated as within-subject factors. Accordingly, a 2 (group: near vs. far target) × 3 (distractors) × 2 (set sizes) ANOVA was conducted.

Results and discussion

Task performance in both groups was comparably high, as observed in Experiment 1. The mean error rate across all conditions was 1.85% in Subgroup A and 1.95% in Subgroup B, respectively. Due to these low numbers, erroneous trials were not further analyzed.

In line with Experiment 1, a pronounced set-size effect for both groups was evident in the control condition. Targets were identified substantially faster within the small set compared with the larger set (Subgroup A: 1.46 vs. 2.09 s, t(11) = −6.22, p < .001, d = 1.80; Subgroup B: 1.32 vs. 1.76 s, t(11) = −9.77, p < .001, d = 2.82). The associated search slopes had a mean of 211 ms/item (Subgroup A) or 146 ms/item (Subgroup B) and did not statistically differ from each other, t(11) = 1.74, p > .10.

The remaining reaction times are summarized in Fig. 3 and were submitted to a 2 × 3 × 2 ANOVA. A significant main effect of distractor was found, \( F_D \)(2, 44) = 13.08, p < .001, \( {\eta}_G^2 \) = 0.049, as well as a significant Group × Set Size interaction, \( F_{G \times S} \)(1, 22) = 6.64, p = .02, \( {\eta}_G^2 \) = 0.007. Reaction times did not differ between both groups, \( F_G \)(1, 22) = 1.06, p = .31, and the interaction of group and distractor only approached significance, \( F_{G \times D} \)(2, 44) = 2.60, p = .085. In addition, no significant effect of set size was obtained, as was true for the remaining interactions (all ps > .22). In order to further disentangle these effects, separate ANOVAs for each subgroup were performed. Reaction times in Subgroup A (near target) were significantly modulated by different distractor types, \( F_D \)(2, 22) = 7.71, p < .01, \( {\eta}_G^2 \) = 0.049. Set size did not affect reaction time, and there was also no interaction between both factors (all ps > .19). Accordingly, search slopes did not differ from zero (no distractor: −13 ms/item, color distractor: −7 ms/item, far distractor: −6 ms/item; all ps ≥ .19). Paired-sample t tests revealed that reaction times were faster in the no-distractor condition (all ps ≤ .017), while no difference between color and depth distractor conditions (p > .10) was evident. A similar pattern was observed in Subgroup B (far target). Targets displayed behind the search array (far depth plane) were identified slower when a distractor was presented simultaneously, \( F_D \)(2, 22) = 7.97, p = .01, \( {\eta}_G^2 \) = 0.07. There was a nonsignificant trend that targets in small sets were identified faster, \( F_S \)(1, 11) = 4.68, p = .053, \( {\eta}_G^2 \) = 0.016, while search slopes did not differ from zero (no distractor: 12 ms/item, p = .14; color distractor: 32 ms/item, p = .06; near distractor: 12 ms/item, p = .12). The interaction of distractor and set size also did not approach significance, \( F_{D \times S} \)(1, 11) = 1.92, p > .17. Again, reaction times were faster in the no-distractor condition (all ps ≤ .011), while no difference between color and depth distractor conditions was found (p > .07).

Fig. 3

Mean reaction times (RTs) from Experiment 2. Upper part shows RT from near target conditions and bottom part shows the analog data from far target conditions. Error bars represent 95% within-subject confidence intervals (Moray, 2008)

An inspection of reaction times obtained in the control conditions might suggest a difference in terms of general response speed between both groups (faster response in association with far targets). However, an inspection of reaction times in the remaining distractor conditions reveals that there is—if at all—an opposite trend (faster responses associated with near targets). The ANOVA also did not suggest any differences between both subject groups. Therefore, it seems unlikely that the observed effects can be attributed to a general difference between both subgroups.

More importantly, the results clearly revealed that participants benefit from the presence of depth singleton targets. Reaction times were substantially shorter than those observed in the control condition and not further modulated by increasing set size. Depth information is most likely utilized to trigger parallel processing of the search items, which holds true for targets presented in near and far depth plane. At the same time, search for a target in a predefined depth plane (near or far) was prolonged when an irrelevant distractor was presented simultaneously. This finding differs from Experiment 1 and denotes a competition between depth target and distractor singletons. Apparently, both distractors caused interference to the same degree (i.e., independent of the feature defining the distractor), although participants were confident about the target depth plane.

Moreover, it is also apparent that reaction times in general tended to be longer than those obtained in Experiment 1. To test this effect, mean reaction times from all no-distractor conditions (including trials from small and large sets) were collapsed and submitted to an additional between-subjects ANOVA with the factor target (near, far, color). Indeed, there was a main effect of target condition, F(1, 33) = 3.92, p = .03, \( {\eta}_G^2 \) = 0.19, but direct comparisons (via Welch t tests) revealed a significant difference only between the color target and the far target, t(16.43) = 3.11, p < .01, d = 1.27; all other ps ≥ .19. This may indicate that color represents a more salient feature that is processed faster than depth information. Consequently, depth distractors in Experiment 1 would have been unable to cause interference, as depth information is processed subsequent to color.

Another aspect that might have contributed to the observed differences between both experiments is the relative target position within the 3-D display. In fact, in Experiment 1, the actual target search was restricted to a single depth plane. In contrast, searching for a depth singleton target in Experiment 2 required (attentional) shifts across depth planes (i.e., from fixation point to the near or far depth plane). It is well established that functional networks for attention and eye movements are closely connected (Corbetta et al., 1998). Thus, it might be possible that interference caused by distractors was not related to attentional capture but rather to incompatibility of general oculomotor functions (e.g., vergence or accommodation). A third experiment was conducted in order to test whether (attentional) shifts across depth planes differentially modulate search performance.

Experiment 3

In Experiment 1, the visual search task could be performed within a single depth plane, whereas this was not possible in Experiment 2. Although participants were asked to focus on the fixation point throughout the experiment, at least a covert shift of attention from the fixation point to the target depth plane was necessary. It is unclear whether this contributed to the observed findings and caused prolonged reaction times. Thus, Experiment 3 was intended to explicitly test whether the initial position of the fixation point influenced the search task and whether this constitutes an attentional effect or reflects incompatibility of oculomotor functions. Accordingly, in this experiment, the fixation point was presented either in the central depth plane (as in Experiments 1 and 2) or in the same depth plane as the target (near or far). It can be hypothesized that targets are detected faster if they are displayed in the same depth plane as the fixation point.

Method

Participants

A new sample of 15 participants was recruited for Experiment 3. All criteria were identical to Experiments 1 and 2. Participants’ age ranged from 19 to 34 years (median age = 24 years, nine women, all right-handed).

Procedure

The experimental task was a close replication of the previous experiments, but only the small stimulus set (six items) was employed. Three within-subjects variables were manipulated in a factorial design: target depth plane (near or far), distractor (color or depth distractor), and fixation plane (equal or unequal to target). The target singleton was defined by depth, but in contrast to both previous experiments, each participant searched for targets in both depth planes (near and far). Prior to each experimental block, the target depth plane of the subsequent trials was specified. Overall, there were four experimental blocks separated by short breaks. Target position was randomized across blocks (i.e., two blocks each of near and far targets). The fixation point was displayed in the central depth plane (as in Experiments 1 and 2) or alternatively was presented in the same depth plane as the target. This was varied across blocks (e.g., near target + near fixation; far target + central fixation, etc.; see Fig. 4). In line with Experiment 2, depth distractors were always presented in the depth plane opposite the target depth plane, while color distractors were presented in the central depth plane among the remaining neutral items. Distractor identity was randomized within each block (i.e., color or depth distractor). Neither a condition without a singleton distractor nor a neutral control condition was included. Each block comprised 108 trials (54 depth distractor and 54 color distractor trials), resulting in a total of 432 experimental trials. Prior to the experiment, participants completed 72 training trials, such that the overall experimental procedure lasted about 30 minutes. Data analysis was performed in line with both prior experiments. Thus, a 2 (target depth: near vs. far) × 2 (distractor: color vs. depth) × 2 (fixation: central vs. target plane) ANOVA was calculated.
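
The block structure described above can be sketched as a trial list. This is a minimal illustration under stated assumptions: the function and field names are hypothetical, and the way block order and trial order are shuffled here is an assumption, since the text only states that target position varied across blocks and distractor identity was randomized within blocks.

```python
import itertools
import random

TARGET_PLANES = ("near", "far")
FIXATION_PLANES = ("central", "target")  # fixation unequal vs. equal to target plane
DISTRACTORS = ("color", "depth")
TRIALS_PER_DISTRACTOR = 54               # 54 + 54 = 108 trials per block


def build_blocks(seed=0):
    """Return four blocks, one per target plane x fixation plane combination."""
    rng = random.Random(seed)
    blocks = []
    for target, fixation in itertools.product(TARGET_PLANES, FIXATION_PLANES):
        trials = [
            {"target": target, "fixation": fixation, "distractor": distractor}
            for distractor in DISTRACTORS
            for _ in range(TRIALS_PER_DISTRACTOR)
        ]
        rng.shuffle(trials)  # distractor identity randomized within a block
        blocks.append(trials)
    rng.shuffle(blocks)      # block order shuffled (assumption)
    return blocks


blocks = build_blocks()
print(len(blocks), len(blocks[0]), sum(len(block) for block in blocks))
```

With four blocks of 108 trials each, the sketch reproduces the total of 432 experimental trials stated in the procedure.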

Fig. 4

Illustration of four possible stimulus relations in Experiment 3. In these schematic top views of the stimulus arrays, the dotted line represents neutral items, while only target (T), distractor (D), and fixation point (small circle) are depicted. The actual stimulus display as perceived by the participants was similar to the examples in Fig. 1. Upper left quadrant: Target is displayed in the near depth plane, whereas fixation point and (color) distractor appear in the central depth plane. Upper right quadrant: Target is displayed in the far depth plane, whereas the fixation point is located in the central depth plane and the (depth) distractor appears in the near plane. Lower left quadrant: Target and fixation point are displayed in the near depth plane, whereas the (depth) distractor appears in the far plane. Lower right quadrant: Target and fixation point are displayed in the far depth plane, whereas a (color) distractor appears in the central plane

Results and discussion

As observed in Experiments 1 and 2, task performance was very high. Across all conditions, participants made errors only in about 1.20% of the trials. Therefore, erroneous trials were not further analyzed.

Reaction times are summarized in Fig. 5. According to the 2 × 2 × 2 ANOVA, there was no significant main effect of fixation, FF(1, 14) = 2.42, p = .14, \( {\eta}_G^2 \) = 0.008. Thus, it was not decisive whether the fixation point was displayed in the same depth plane as the target. Likewise, there was no main effect of distractor, FD(1, 14) = 1.57, p = .23, \( {\eta}_G^2 \) = 0.003, indicating that both distractor types similarly influenced the search task. The main effect of target depth plane was significant, FT(1, 14) = 9.29, p < .01, \( {\eta}_G^2 \) = 0.002. Overall, faster responses were observed when targets were presented in the far depth plane. This effect was accompanied by a significant Target Plane × Fixation Plane interaction, FT×F(1, 14) = 6.43, p = .024, \( {\eta}_G^2 \) = 0.003. Participants reacted more slowly to targets presented in the near depth plane only when the fixation point was not displayed in the same depth plane as the target, color distractor: t(14) = −2.81, p = .01, d = 0.73; depth distractor: t(14) = −2.96, p = .01, d = 0.76. No differences between near and far targets were observed when fixation point and target were presented in the same depth plane (all ps ≥ .40). Finally, there was an interaction between distractor and fixation plane, FD×F(1, 14) = 4.95, p = .043, \( {\eta}_G^2 \) = 0.018, indicating that color distractors caused interference only when fixation point and target were displayed in different depth planes. No such effect was related to depth distractors. The remaining interactions did not approach significance (all ps ≥ .27).
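
As a plausibility check, the effect sizes of the paired comparisons can be related to the reported t values. The sketch below assumes that d was obtained as |t| divided by the square root of the number of participants, which is one common convention for paired samples; whether this exact formula was used is an assumption, and the function name is hypothetical.

```python
import math


def paired_cohens_d(t_value, n_participants):
    """Cohen's d for a paired t test, computed as |t| / sqrt(n)."""
    return abs(t_value) / math.sqrt(n_participants)


# t values from the near vs. far target comparisons (central fixation), n = 15.
for label, t_value in (("color distractor", -2.81), ("depth distractor", -2.96)):
    print(label, round(paired_cohens_d(t_value, 15), 2))
# prints: color distractor 0.73
#         depth distractor 0.76
```

Under this assumption, the computed values agree with the reported effect sizes of d = 0.73 and d = 0.76.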

Fig. 5

Mean reaction times (RTs) from Experiment 3. Error bars represent 95% within-subject confidence intervals (Morey, 2008)

Overall, this pattern of results was similar to the previous experiment and revealed that an (attentional) shift from fixation point to target depth plane did not generally slow search performance. Therefore, it is unlikely that the distractor effects observed in Experiment 2 originate from incompatibilities of oculomotor functions. However, it is also apparent that reaction times were slower when attention was shifted from the central to the near depth plane than when it was shifted from the central to the far depth plane. This was an unexpected finding, as various studies revealed that closer objects are processed faster (e.g., Finlayson & Grove, 2015; Plewan & Rinkenauer, 2016, 2017a; Theeuwes et al., 1998; Theeuwes & Pratt, 2003) or observed no general differences in reaction times (e.g., Finlayson et al., 2013; Plewan & Rinkenauer, 2017b; Theeuwes & Pratt, 2003). Some studies explicitly investigated spatial reorientation across different depth planes using 3-D versions of the spatial cueing paradigm (Posner, 1980). As expected, shifting attention across depth planes takes additional time. These costs were even more pronounced when reorienting from near to far stimulus locations was required. Accordingly, it has been suggested that more attentional resources are allocated to closer depth planes (Chen et al., 2012; de Gonzaga Gawryszewski, Riggio, Rizzolatti, & Umiltà, 1987; Downing & Pinker, 1985). However, it has also been shown that such effects may depend on the characteristics of the stimulus material. For instance, in a previous study, the virtual representation of the observers’ bodies within their field of view was manipulated. Attentional resources were focused on stimuli in close proximity when such a virtual body was presented, while they were allocated away from the observer when the virtual body was absent (Maringelli, McCarthy, Steed, Slater, & Umiltà, 2001).
Conceptually, the present study differs from such previous experiments since no reorientation of attention was required. Participants were confident about the target depth plane, yet subtle changes of the experimental setup altered the deployment of attentional resources. Likewise, when initial fixation was directed to the central depth plane, color distractors caused stronger interference. It is conceivable that these salient distractors bind attention to the central depth plane, which in turn delays shifts to the target depth plane.

General discussion

In the present study, we investigated the impact of relevant and irrelevant depth information on attentional processing. In line with previous studies, it was confirmed that depth information in general facilitates visual search. Target singletons defined by depth were associated with shorter reaction times (compared with neutral control conditions) and effortless parallel search. However, this process is prone to interference by irrelevant depth and color distractors. At the same time, the impact of irrelevant depth information was limited, as depth distractors did not inhibit the search for a salient color target. These findings indicate that attention is indeed depth sensitive, but it also appears that depth constitutes a relatively weak feature that may be integrated subsequent to other features. Thus, it remains unclear whether depth information operates in a strictly stimulus-driven manner.

A recent study provided evidence that unexpected depth information automatically captures attention (Plewan & Rinkenauer, 2017b). Participants performed a demanding letter search task, and suddenly the target was highlighted by a depth cue. This immediately improved search performance in terms of reaction times and error rates, although participants had no reason to expect informative cues at all. Thus, involuntary attentional capture was induced by depth information without a specific top-down setting. In contrast, results from the present experiments are less consistent. It was hypothesized that irrelevant depth information should interfere with the search for a color target. This, however, was not the case for any depth distractor. No differences from distractor-absent trials were observed. Previous investigations using the additional singleton task revealed that a distractor’s saliency constitutes a prerequisite for attentional capture (Theeuwes, 1992). According to the model proposed by Theeuwes, attention is directed to the stimulus feature that generates the strongest activation on a preattentive level. In the present context, it can be assumed that target color is more salient than deviations in depth, and thus there was no capture effect in Experiment 1 (color target), while in Experiment 2 (depth target) color and depth distractors similarly captured attention.

Opposing theories suggested that attentional capture is always, at least partially, under top-down control (for an overview, see, e.g., Burnham, 2007). For instance, capture effects are assumed to be contingent on task-related (endogenous) control settings or perceptual goals (Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994). Alternatively, it has been proposed that attentional capture may be a result of an (inappropriate) internal search mode (Bacon & Egeth, 1994). Following this idea, participants might have applied a singleton search mode in the present study instead of a more efficient feature search mode. Given that integration of depth information is apparently a relatively slow process, depth distractors could not modulate the search for color targets. Conversely, when depth singleton targets are processed, color or depth distractors pose a source of interference.

Color and depth information represent features that affect the allocation of attention, although the underlying processes may differ. Results derived from Experiment 1 indicate that color has a stronger impact on attentional processing (as it was not modulated by irrelevant depth information). If the impact of color on attention is indeed more pronounced, then it would have been expected that a color distractor also results in more pronounced effects of attentional capture. Findings from Experiment 2 did not confirm this assumption. Color and depth distractors equally slowed down responses. Experiment 3 provides evidence that color distractors may modulate reaction times more strongly than depth distractors. This differential effect was observed only when attentional shifts to the target plane were required and was not associated with a general slowdown of reaction times. Thus, it is unlikely that effects of attentional capture in the present experiments are related to functional differences in stimulus integration or incompatibilities of oculomotor functions. However, the inconsistent pattern of results suggests that the influence of depth information on attentional processing is more susceptible to specific task demands or stimulus configurations. Alternatively, it has also been proposed that effects of attentional capture are not exclusively spatial in nature (Folk & Remington, 1998), and thus additional processes (e.g., nonspatial filtering costs) might contribute to the integration of depth information.

In contrast to other stimulus features, depth information can be regarded as an ambiguous signal. For instance, the relative position or size of an object represents an important monocular depth cue and must be integrated along with stereoscopic disparity. Hence, attentional processes might be modified by subtle differences within a 3-D display. In line with that, it has been shown that facilitating effects of disparity information in visual search might depend on specific aspects of the stimulus configuration (O’Toole & Walker, 1997). The authors reported that search performance varied when targets were presented with crossed or uncrossed disparities (i.e., targets perceived in front of or behind a fixation plane). Stimulus size or dimensions of the search array were shown to be important variables as well. Previous studies often kept physical stimulus size constant, whereas in the present study perceived stimulus size was invariant across depth planes. This corresponds to the natural viewing experience, but at the same time physical stimulus properties (i.e., retinal size) vary between different depth planes. It has recently been shown that reaction times are determined by perceived rather than by physical stimulus size (e.g., Plewan, Weidner, & Fink, 2012; Sperandio, Savazzi, Gregory, & Marzi, 2009). Also, it was recently reported that variations of perceived and physical stimulus size within a 3-D display (similar to the present study) did not substantially modulate reaction times (Plewan & Rinkenauer, 2016, 2017a). Thus, it is unlikely that effects observed here can be reduced to differences in (physical) stimulus appearance.

Another aspect that found empirical support in several previous investigations is the idea of an egocentric search gradient through space. A recent study, for example, reported that targets that are located closer to an observer are detected faster. Hence, it was concluded that visual search is performed along an axis from near to far locations (Finlayson & Grove, 2015). However, there are also findings that cannot easily be integrated into this model (e.g., Atchley et al., 1997; Finlayson et al., 2013; Theeuwes et al., 1998). The same is true for the present results: Assuming that there is an attentional gradient operating from near to far space, closer objects should be regarded as behaviorally more relevant, and hence as more salient. Accordingly, if a near target and a far distractor are presented simultaneously the near target should always capture attention first. If, in contrast, a far target is presented along with a near distractor, then a strict model of an attentional search gradient would predict that the (irrelevant) near location is scanned first. Yet, in the present study, irrelevant color or depth distractors affected the search task equally and, in Experiment 3, there were even faster responses associated with targets presented in the far depth plane.

Taken together, the present study provided further evidence that stereoscopic depth information competes for the allocation of attentional resources with other distinct features like color. Presented among other task-relevant items, target or distractor items defined by depth may possess a lower saliency, which in turn leads to slower stimulus integration. Moreover, no evidence in favor of an egocentric depth-sensitive search gradient was observed, which might indicate that attentional resources are allocated uniformly across depth planes. It can further be assumed that attentional effects related to depth information strongly depend on specific aspects of the 3-D search array and that task requirements should also be considered a contributing factor.