Introduction

The three-dimensional (3D) world contains almost unlimited information. Due to limited capacity, the amount of information is reduced along the visual pathways (Anderson, Van Essen, & Olshausen, 2005). In this process the allocation of limited attentional resources is essential to make an adequate interaction with the environment possible. Directing attention to locations within the visual field that are of particular interest represents an important step in this processing sequence. It is well established that shifts of attention facilitate the processing of stimuli in attended compared to unattended regions (Posner, 1980). Likewise, several stimulus features like color, size, or motion have been identified to modulate attentional mechanisms and visual selection (Wolfe & Horowitz, 2017).

Visual selection represents the passage of information from an initial pre-attentive stage to attentive processing (Theeuwes, 2010). There is still an ongoing debate whether visual selection is initiated voluntarily on the basis of intentional or goal-directed behavior of an observer (“top-down”) or is driven by properties of the stimulus itself (“bottom-up”). Accounts in favor of bottom-up mechanisms predict that stimuli or events are always (automatically) selected when they are sufficiently salient (Theeuwes, 2010). One example comes from research using the additional singleton paradigm. In this visual search task, participants search for a predefined and salient target (e.g., a green square among green circles). It has been shown that an additional salient item (e.g., red circle) substantially prolonged reaction times (RTs). It was concluded that this distractor draws attention away from the target. This effect of attentional capture held even when participants were aware that the distracting feature (e.g., red color) was completely task irrelevant (Theeuwes, 1991, 1992). Top-down control, in contrast, indicates that attention is intentionally directed to a particular stimulus feature in the environment (Bacon & Egeth, 1994; Folk, Remington, & Johnston, 1992). The endogenous cueing paradigm introduced by Posner (1980) is a demonstration of top-down selection. If a centrally presented symbol predicts the target location, participants’ responses become faster and more accurate. Moreover, there is evidence that salient distractors do not capture attention under all circumstances and therefore attentional capture may be dependent on current top-down settings (Bacon & Egeth, 1994; Folk & Remington, 1998; Folk et al., 1992; Folk, Remington, & Wright, 1994).
In this regard, it has recently been shown that the need to search for a target (i.e., reduced number of potential target locations) modulated the impact of a salient non-target distractor (Bertleff, Fink, & Weidner, 2016, 2017).

Allocation of spatial attention has been mostly investigated in experiments presenting stimuli within a single fronto-parallel plane (e.g., a computer screen). This of course does not reflect natural viewing conditions and it is surprising that “depth” has often been neglected in previous research (van der Stoep, Serino, Farnè, Luca, & Spence, 2016). Interaction with a 3D environment requires extraction of information from different depth planes. It is possible to subdivide 3D scenes in distinct spatial regions, which differ in terms of their behavioral relevance (Previc, 1998). For instance, objects immediately surrounding an observer can directly be grasped and manipulated. On the other hand, elements located in the more distant external world are less relevant for direct behavioral adjustments but rather support more general functions like orientation or posture control. Thus, distance between object and observer might modulate perceptual processes and determine prioritization of certain stimuli. Some studies have already investigated the deployment of attention in depth and reported equivocal results. While there is evidence that information from unattended depth planes can be filtered out or that attentional processing operates differently in near or far depth planes, other investigations reported a less consistent pattern of results. Most studies used visual search paradigms to investigate the relation of depth and attention. For instance, Nakayama and Silverman (1986) presented search arrays of different sizes in which one target was defined by color, motion, stereoscopic depth, or a conjunction of these features. It was reported that RTs were not prolonged when items in an unattended depth plane shared features with the target. Thus, participants apparently were able to direct their attention to a specific depth plane. Similar results were reported in a study using a spatial cueing paradigm (Atchley, Kramer, Andersen, & Theeuwes, 1997). 
Four potential stimulus locations were equally distributed across two depth planes. The longest RTs were observed when an invalid position in an invalid depth plane was pre-cued. This also indicates that participants direct their attention to a specific depth plane. However, under conditions of low perceptual load the difference between valid and invalid depth-cues was no longer observed (Atchley et al., 1997). In another series of visual search experiments, elements of a search array were distributed across two depth planes while the identity of a colored bar (tilted to left or right) had to be indicated. There was interference from distractors presented in the invalid depth plane even when participants were completely confident about the target depth plane (Theeuwes, Atchley, & Kramer, 1998). Likewise, it was reported that search across different depth planes is only more efficient when the target depth plane and the task was an easy feature search (Finlayson, Remington, Retell, & Grove, 2013).

As outlined above there is no general consensus about the role of depth in attentional processing. Yet, several studies indicate that there is an egocentric attentional gradient through space that implies prioritization of stimuli in near compared to far depth planes (Andersen & Kramer, 1993; Arnott & Shedden, 2000; Blini et al., 2018; Chen, Weidner, Vossel, Weiss, & Fink, 2012; Downing & Pinker, 1985; Finlayson & Grove, 2015; Plewan & Rinkenauer, 2016, 2017; Wang, Liu, Chen, & Zhang, 2016). In a recent study, Finlayson and Grove (2015) presented a visual search task across up to four depth planes. Selection of target items was faster when they were presented in front of the search array (near depth plane), even though attention was directed to the most distant depth plane at the beginning of each trial. Converging evidence comes from a series of experiments using a simple reaction paradigm (Plewan & Rinkenauer, 2016, 2017): Spheres were presented in different distances (depth planes) and participants had to confirm the onset as fast as possible via manual response. In general, spheres elicited faster responses when they were perceptually closer to the observers (Plewan & Rinkenauer, 2017), and it was even evident that participants applied more response force in these conditions (Plewan & Rinkenauer, 2016). This effect was observed under natural viewing conditions (i.e., retinal size decreases with distance) where mechanisms of size constancy apply (Sperandio & Chouinard, 2015) and also when physical stimulus size was kept constant (i.e., perceived stimulus size increases with distance) across depth planes. Likewise, a recent study found that a stimulus identification task (sphere or cube) was performed faster in closer proximity to the observer (Blini et al., 2018), while this effect was independent of viewing conditions (perceived or physical size constant). Apparently, there is an advantage for stimuli located relatively closer to the observer. 
It was proposed that approaching or closer objects possess a higher behavioral urgency (Franconeri & Simons, 2003) and therefore elicit faster and more forceful responses (Plewan & Rinkenauer, 2016). For instance, an approaching ball might be regarded as more dangerous than a ball flying away from you. This notion is not undisputed (Abrams & Christ, 2005) and of course not all objects approaching the body can be considered as adverse. However, there is also evidence from neuroscientific studies suggesting a specific representation and prioritization of the space immediately surrounding the body (Brozzoli, Gentile, Petkova, & Ehrsson, 2011; Graziano & Cooke, 2006; Makin, Holmes, Brozzoli, & Farnè, 2012; Previc, 1998).

Recently, it was reported that targets defined by depth (depth-singletons) were able to modulate the deployment of attention (Plewan & Rinkenauer, 2018b). Participants performed a variant of the additional singleton task with targets and distractors defined by (stereoscopic) depth and color. This task involves not only detection of a certain stimulus but also identifying the orientation of a line segment within this stimulus. No interference was observed when participants searched for a colored target that was accompanied by a depth singleton in another depth plane. In contrast, when the target was defined by depth, there was interference by a colored distractor as well as by a depth singleton displayed diametrically opposed to the target (Plewan & Rinkenauer, 2018b). Moreover, another recent study reported that task performance improved immediately when a salient but completely unexpected depth cue highlights the target position in a demanding letter search task (Plewan & Rinkenauer, 2018a). Such studies had in common that task-relevant stimuli were presented in different depth planes. However, in a 3D environment it is a common task to select competing objects within the same depth plane while additional information is available in other depth planes.

Thus, the present study addressed the research question of whether there are differential behavioral effects when target and distractor are displayed in the same depth plane or distributed across different depth planes and whether this is further modulated by the relative position of target and distractor (near or far depth plane). The depth plane of a distractor should not be relevant if target selection is performed only on the basis of non-spatial stimulus properties. If, in contrast, depth is considered in the selection process, this should result in longer processing times when target and distractor appear within the same depth plane. Furthermore, it can be assumed that closer depth planes are prioritized and therefore it should be easier to focus attention to a near depth plane and at the same time neglect (irrelevant) information in more distant depth planes.

Experiment 1

Methods

Participants

A sample of 17 volunteers (12 women) participated in the experiment and received either course credit or a small remuneration (10€/h). Two participants did not finish the experiment and one participant was excluded from the data analysis due to unusually long response times. Ages of the remaining 14 participants ranged from 18 to 33 years (median: 24.5 years). None of the participants reported a history of psychiatric or neurological disorders and all had normal or corrected-to-normal vision. Stereo vision capability was verified using a TNO (Netherlands Organization for Applied Scientific Research) test for stereoscopic vision (all participants revealed stereo-thresholds of ≤ 120 arc s). According to the Edinburgh Handedness Inventory (Oldfield, 1971), all participants were right-handed.

All participants provided written informed consent prior to the experiment. The experimental framework was approved by the Ethics Committee of the Leibniz Research Centre for Working Environments and Human Factors.

Experimental setup and procedure

The experimental setup was similar to the work recently described by Plewan and Rinkenauer (2018b). Stimulus material was generated using the virtual reality software Vizard 4 (© WorldViz, LLC) and presented via professional stereo head-mounted displays (HMD, nVisor ST50) with a resolution of 1,280 x 1,024, a refresh rate of 60 Hz (single frame rate 16 ms) and a 50° diagonal field-of-view. The visual focus of the HMD was set to 10 m. Both displays were arranged such that they were placed closely in front of the participants’ eyes. Therefore, a vivid depth impression could be evoked via stereoscopic presentation. Participants were free to make head movements, yet visual stimulation was constant throughout the experiment, as stimulus coordinates were fixed to the HMD. Responses were recorded using custom-made response devices.

Participants performed a demanding visual search task, which was an adaptation of the additional singleton paradigm (Theeuwes, 1991, 1992). In this task, all stimulus elements encircle a line segment of varying orientation. Thus, target selection required participants to detect the target stimulus and also to identify the orientation of the line segment. In each trial of the experiment, six or nine rings were circularly arranged around a gray fixation point (diameter ~0.4° visual angle) in front of a uniform black background. Each ring was rendered from a three-dimensional model of a torus, with an inner radius of about 0.7° and a width of about 0.1°. The distance between each ring and the fixation point was ~3.5°. Each ring encircled a white line segment (~0.06 x 0.5°), which could be horizontal or vertical, or tilted 22.5° to either side with respect to horizontal or vertical orientation (see Fig. 1). As outlined above, the actual task was to decide – via button press with the right hand – whether a horizontal (left button) or vertical line (right button) was displayed within the target stimulus. Each trial started with onset of the fixation point. After a variable interval of 500–1,000 ms, the search array appeared and remained on the screen until the participant made a response. 1,500 ms after a response a new trial was automatically initiated.
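The stimulus dimensions above are given in degrees of visual angle. As a rough illustration, the following sketch converts them into linear extents at the 57 cm viewing distance reported below for the central depth plane. The function name and the use of the standard formula size = 2 · d · tan(θ/2) are assumptions made for this example, not a description of the original rendering code.

```python
import math

def visual_angle_to_size_cm(angle_deg, distance_cm):
    """Linear extent (cm) subtending `angle_deg` at `distance_cm`.

    Uses the standard geometric relation size = 2 * d * tan(angle / 2).
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Viewing distance of the central depth plane, taken from the text.
CENTRAL_PLANE_CM = 57.0

# Approximate stimulus dimensions reported in the text (degrees).
eccentricity_cm = visual_angle_to_size_cm(3.5, CENTRAL_PLANE_CM)  # ring to fixation
fixation_cm = visual_angle_to_size_cm(0.4, CENTRAL_PLANE_CM)      # fixation point
one_degree_cm = visual_angle_to_size_cm(1.0, CENTRAL_PLANE_CM)

print(f"1 deg at 57 cm is roughly {one_degree_cm:.3f} cm")
print(f"ring eccentricity is roughly {eccentricity_cm:.2f} cm")
```

At 57 cm, one degree of visual angle corresponds to almost exactly 1 cm, which is why this viewing distance is a common convention.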

Fig. 1

Illustration of the stimulus material used in Experiments 1–3. Upper images represent simplified front views of the small and large stimulus set. Dotted rings indicate target (T) and distractor (D) singletons, which were solid rings in the actual experiment and salient due to their depth position. Also, no green-colored distractor is displayed, while half of the experimental trials contained a colored distractor (see Methods section). As displayed in the lower left image, stimuli were distributed across three depth planes. In this example, the target is presented in the far depth plane while a distractor was displayed in the near depth plane. In the lower right image, simplified top-views of the stimulus configuration are depicted. All possible relations of target and distractor are outlined, while the remaining items in the central depth plane are represented by a dotted line. The figures are not drawn to scale

The perceived distance to the search array was 57 cm with respect to the observer. Depending on the experimental condition (see below) the actual target and a distractor item were presented in front of the search array (52 cm, henceforth near depth plane) or behind it (62 cm, henceforth far depth plane). The retinal disparity between the near and far depth planes was about 68 arc min and about 31 or 37 arc min between central and far or near depth plane, respectively. To mimic the natural viewing experience, perceived stimulus size was kept constant across depth planes. Accordingly, stimulus size of near and far objects was linearly scaled according to Emmert’s law (Emmert, 1881) and the principles of size constancy (i.e., physical size of near objects was slightly increased and decreased for far objects). Target depth plane (near, far) was randomly allocated on a trial-by-trial basis. A non-target distractor simultaneously appeared in the same or the diametrically opposed depth plane. This distractor was never directly adjacent to the target; moreover, it was green colored in half of the trials. Set size (six or nine items), distractor color (neutral or green), and distractor depth plane (same or different as target) were varied in a 2 x 2 x 2 factorial design. Distractor depth plane as well as distractor color were varied across experimental blocks, while set size was varied within blocks. Consequently, four different block types were tested (distractor in same depth plane, distractor in different depth plane, distractor in same depth plane + colored, and distractor in different depth plane + colored). Each block was repeated twice. Thus, in total the experiment consisted of eight experimental blocks that were interspersed by self-paced breaks. The sequence of blocks was individually randomized for each participant. Each block comprised 108 trials. 
Prior to the actual experiment, participants performed 72 trials to familiarize themselves with the task, which were discarded from further analyses. Overall, the experiment took about 90 min.
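The disparity values reported above can be approximated from the viewing geometry. The sketch below computes the vergence angle for each depth plane and takes differences between planes. The interocular distance of 6.4 cm is an assumption (the text does not report one), and the function names are introduced only for illustration.

```python
import math

# ASSUMPTION: interocular distance in cm; not reported in the text.
ASSUMED_IPD_CM = 6.4

def vergence_angle_arcmin(distance_cm, ipd_cm=ASSUMED_IPD_CM):
    """Angle (arc min) between the two lines of sight to a point straight ahead."""
    angle_rad = 2.0 * math.atan((ipd_cm / 2.0) / distance_cm)
    return math.degrees(angle_rad) * 60.0

def relative_disparity_arcmin(nearer_cm, farther_cm, ipd_cm=ASSUMED_IPD_CM):
    """Difference in vergence angle between two distances (arc min)."""
    return (vergence_angle_arcmin(nearer_cm, ipd_cm)
            - vergence_angle_arcmin(farther_cm, ipd_cm))

# Depth plane distances taken from the text.
NEAR_CM, CENTRAL_CM, FAR_CM = 52.0, 57.0, 62.0

near_far = relative_disparity_arcmin(NEAR_CM, FAR_CM)
near_central = relative_disparity_arcmin(NEAR_CM, CENTRAL_CM)
central_far = relative_disparity_arcmin(CENTRAL_CM, FAR_CM)

print(f"near vs. far:     {near_far:.1f} arc min")
print(f"near vs. central: {near_central:.1f} arc min")
print(f"central vs. far:  {central_far:.1f} arc min")
```

Under this assumed interocular distance, the three values fall close to the reported 68, 37, and 31 arc min. A different interocular distance would scale them roughly proportionally.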

Mean response times (RTs) were individually collapsed across all valid trials of each condition. Trials with erroneous answers or delayed responses (> 5 s) were regarded as errors and excluded from further analyses. Experimental conditions were then compared by means of a repeated-measures analysis of variance (ANOVA). Data analyses were performed using the free statistical software R (https://www.R-project.org). Obtained statistical parameters (F values, p values, and generalized eta squared, \( {\eta}_G^2 \); Bakeman, 2005; Olejnik & Algina, 2003) are reported. In case (post hoc) t-tests were conducted, corresponding t-values and Cohen’s d (Cohen, 1988) are specified.
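The preprocessing step described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the actual analyses. The trial record layout (keys `participant`, `condition`, `rt_s`, `correct`) and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MAX_RT_S = 5.0  # responses slower than 5 s are treated as errors (from the text)

def is_valid(trial):
    """A trial is valid if the answer was correct and given within 5 s."""
    return trial["correct"] and trial["rt_s"] <= MAX_RT_S

def mean_rt_per_condition(trials):
    """Mean RT of valid trials for each (participant, condition) cell."""
    cells = defaultdict(list)
    for trial in trials:
        if is_valid(trial):
            cells[(trial["participant"], trial["condition"])].append(trial["rt_s"])
    return {cell: mean(rts) for cell, rts in cells.items()}

def error_rate(trials):
    """Proportion of trials counted as errors (wrong or too slow)."""
    trials = list(trials)
    return sum(not is_valid(t) for t in trials) / len(trials)

# Hypothetical example records, made up for illustration only.
example_trials = [
    {"participant": 1, "condition": "same", "rt_s": 1.2, "correct": True},
    {"participant": 1, "condition": "same", "rt_s": 1.4, "correct": True},
    {"participant": 1, "condition": "same", "rt_s": 6.0, "correct": True},   # too slow
    {"participant": 1, "condition": "different", "rt_s": 1.1, "correct": False},  # wrong
    {"participant": 1, "condition": "different", "rt_s": 1.0, "correct": True},
]

cell_means = mean_rt_per_condition(example_trials)
example_error_rate = error_rate(example_trials)
```

The resulting cell means, one per participant and condition, are the values that would enter a repeated-measures ANOVA.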

Results and discussion Experiment 1

Task performance was high as indicated by a low error rate of about 2.3% of the trials across all experimental conditions. Due to these low numbers, error rates were not further analyzed. Mean RTs are summarized in Fig. 2 and Table 1. RTs were submitted to a 2 x 2 x 2 repeated-measures ANOVA, which revealed a significant main effect of distractor depth plane, F(1,13)=6.91, p = .02, \( {\eta}_G^2 \) = 0.023. Target selection was slower when the distractor was presented in the same depth plane as the target. There was a non-significant trend of a set size effect, F(1,13)=3.18, p = .097, \( {\eta}_G^2 \) = 0.005. Search duration was only modestly increased by a larger set size. This can be regarded as evidence for a parallel search (Theeuwes, 1991) and reveals that target items were salient within the search array. The main effect of distractor color as well as all interactions did not approach the conventional significance level (all p ≥ .23, \( {\eta}_G^2 \) ≤ 0.0086).

Fig. 2

Mean reaction times from Experiment 1. Labels indicate target-distractor relation (same – different) and distractor appearance (neutral – colored). Error bars represent 95% within-subject confidence intervals (Moray, 2008)

Table 1 Summary of Experiments 1–3. Results are displayed for each experimental condition and mean reaction times (RTs) are rounded to milliseconds (ms). Labels correspond to Figs. 2, 3, and 4. Standard deviations are presented in brackets

Target selection was clearly influenced by the relative position of target and distractor items within the 3D search array. Participants needed more time to identify the target when there was a distractor within the target depth plane, while the shortest RTs were observed when target and distractor were displayed in different depth planes. It is likely that participants implicitly separated the search array in distinct depth planes and therefore faced interference when multiple items were presented within one depth plane. The color of the distractor did not further enhance this effect, even though color (compared to depth information) has been suggested to be a stimulus property that more efficiently modulates attentional allocation (Plewan & Rinkenauer, 2018b).

Apparently, interference occurs when multiple similar items appear within the same depth plane. However, the present data provide no insight into whether the relative position of target and distractor within the 3D search array also modulates the response pattern. Previous findings indicate that objects in close proximity are processed faster and attention might operate along an egocentric gradient from near to far space (Finlayson & Grove, 2015; Plewan & Rinkenauer, 2016, 2017). To further investigate this question, two additional experiments were performed. They were largely similar to Experiment 1; however, the relative depth position of the target was explicitly manipulated. In Experiment 2 participants were explicitly informed about the target depth plane, while in Experiment 3 the target depth plane was randomly allocated.

Experiment 2

The framework of Experiment 2 was largely comparable to Experiment 1, but since distractor color had no substantial effect it was no longer manipulated. Instead, the target depth plane was included as experimental factor to further investigate the role of the target-distractor relationship within a 3D search array.

Methods

Participants

A new sample of 17 volunteers (nine women) was recruited for Experiment 2. Criteria and prerequisites were identical to Experiment 1. Participants’ age ranged between 18 and 32 years (median: 24). One participant (female, 26 years old) was excluded from analysis due to high error rates and unusually slow RTs. According to the Edinburgh Handedness Inventory (Oldfield, 1971), all participants were right-handed.

Experimental setup and procedure

The experimental setup and task were identical to Experiment 1. However, in this experiment, the depth plane of the target (near or far) during the upcoming block was displayed on the screen in red letters prior to each block. The target-distractor relationship (same or different depth plane) was fixed in each block, while set size was varied within blocks. Accordingly, three experimental factors were manipulated in a factorial design: Target depth plane (near or far), distractor depth plane (same or different as target), and set size (six or nine items). Again, four different block types were repeated twice (target near – distractor near, target near – distractor far, target far – distractor near, target far – distractor far) resulting in eight blocks of 108 trials. To familiarize themselves with the task, participants completed 72 trials prior to the experiment, which were not analyzed. Overall, the experiment took about 90 min.

Results and discussion Experiment 2

As observed in Experiment 1, participants performed well on the task and only a low number of errors was obtained (~4.2% errors across all experimental conditions). Therefore, error rates were not further analyzed.

Mean RTs are summarized in Fig. 3 and Table 1. As observed in Experiment 1, a 2 x 2 x 2 repeated-measures ANOVA revealed a significant main effect of distractor depth plane, F(1,15)=21.40, p < .001, \( {\eta}_G^2 \) = 0.083. RTs were shorter when the distractor was displayed in a depth plane opposed to the target. Also, there was a significant main effect of target depth plane, F(1,15)=11.41, p = .004, \( {\eta}_G^2 \) = 0.058. Targets presented in the near depth plane elicited faster responses. Again, there was no significant set size effect, F(1,15)=3.26, p = .091, \( {\eta}_G^2 \) = 0.012. Yet, the three-way interaction of target depth plane, distractor depth plane, and set size was significant, F(1,15)=4.84, p = .044, \( {\eta}_G^2 \) = 0.002. While there was almost no set size effect (10 ms) in the target near – distractor far condition, in the opposite condition the set size effect was most pronounced (73 ms; albeit a direct comparison revealed no significant difference; t(15)=2.05, p=.06, d=0.51). The remaining interactions also did not reach the conventional significance level (p≥.075, \( {\eta}_G^2 \) ≤ 0.0025).

Fig. 3

Mean reaction times from Experiment 2. Labels indicate target depth plane (near – far) and target-distractor relation (same – different). Error bars represent 95% within-subject confidence intervals (Moray, 2008)

The results confirm and extend the observations of Experiment 1. Longer RTs are observed when target and distractor are located within the same depth plane. Moreover, the findings are largely in line with the idea of an egocentric attentional gradient through space. Targets were selected faster in the near compared to the far depth plane. This apparently is not a uniform process, as the significant three-way interaction indicates that RTs increase when the target is located in the far depth plane while a large number of distractors is presented in the near depth plane. Accordingly, distractor interference is not only determined by the spatial target-distractor relation but also by the number of distractor items. Interference was highest when many distractors were presented in front of the target. Another informal observation from this experiment was that participants clearly used the information about the upcoming target depth plane. In Experiment 1 RTs ranged from 1,129–1,309 ms while RTs were substantially shorter (872–1,094 ms) in the present experiment. As already reported in previous research (Plewan & Rinkenauer, 2018b), participants seemingly use foreknowledge about the target depth plane to focus their attention in this distinct depth plane. This, however, is prone to interference from salient information displayed in other depth planes.

Experiment 3

Experiment 2 revealed that the relation of target and distractor (same or different depth plane) might influence visual target selection. Likewise, foreknowledge about the target depth plane facilitated search. To further investigate these effects, target and distractor depth plane were completely randomized in Experiment 3. Changing the (spatial) predictability of target and distractor items was expected to increase the influence of any given cue or salient information in the search array. Depth information can be regarded as the salient stimulus feature that summons attention. However, depth information potentially is a weaker modulator of attention compared to other stimulus features such as color (Plewan & Rinkenauer, 2018b). In Experiment 1, no substantial effects were associated with color (which was therefore not included in Experiment 2). Yet, it is possible that distinct features like color or depth information are differentially processed when the search task needs to be performed under more demanding or uncertain conditions. If stronger interference is associated with color this would be further evidence that depth information might be processed subsequent to other stimulus features. Hence, color was also reintroduced in Experiment 3.

Methods

Participants

A new sample of 16 volunteers (14 women) was recruited for Experiment 3. Criteria and prerequisites were identical to Experiments 1 and 2. Participants’ age ranged between 19 and 32 years (median: 21.5). According to the Edinburgh Handedness Inventory (Oldfield, 1971), three of them were left-handed.

Experimental setup and procedure

The experimental setup was identical to Experiments 1 and 2. The same experimental factors as in Experiment 1 were manipulated, but in addition target depth plane was also systematically varied. Unlike Experiment 1, target depth plane as well as its relation to the distractor (same or different depth plane) were completely randomized in this experiment. Thus, four experimental factors were manipulated in Experiment 3: Target depth plane (near or far), distractor depth plane (same or different as target), distractor color (neutral or green), and set size (six or nine items).

All experimental variations were repeated 54 times, but due to randomization they were not equally distributed across blocks. As in both previous experiments, participants performed eight blocks consisting of 108 trials each, which resulted in a total of 864 trials. In addition, 72 training trials were conducted prior to the actual experiment. The experimental procedure took about 90 min.
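The trial structure described above follows from the design: 2 x 2 x 2 x 2 = 16 conditions, each repeated 54 times, give 864 trials, which split into eight blocks of 108. The following sketch builds such a fully randomized trial list. The factor labels, function name, and seed are assumptions for illustration; the experiment itself was implemented in Vizard, and its code is not described in the text.

```python
import itertools
import random

# Factor levels as described in the text; the key names are hypothetical.
FACTORS = {
    "target_depth": ("near", "far"),
    "distractor_depth": ("same", "different"),
    "distractor_color": ("neutral", "green"),
    "set_size": (6, 9),
}
REPETITIONS = 54        # repetitions per condition (from the text)
TRIALS_PER_BLOCK = 108  # block length (from the text)

def build_blocks(seed=None):
    """Return a list of blocks, each a list of randomly ordered trial dicts."""
    rng = random.Random(seed)
    conditions = [dict(zip(FACTORS, levels))
                  for levels in itertools.product(*FACTORS.values())]
    trials = [dict(cond) for cond in conditions for _ in range(REPETITIONS)]
    rng.shuffle(trials)  # randomize across the whole experiment, not per block
    return [trials[i:i + TRIALS_PER_BLOCK]
            for i in range(0, len(trials), TRIALS_PER_BLOCK)]

blocks = build_blocks(seed=1)
print(len(blocks), "blocks,", sum(len(b) for b in blocks), "trials in total")
```

Because the shuffle runs over all 864 trials at once, each condition occurs exactly 54 times overall but an unequal number of times within any single block, as noted in the text.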

Results and discussion Experiment 3

Again, participants performed well on the task and error rates were low (~1.6% errors across all experimental conditions). Therefore, erroneous trials were not further analyzed.

Mean RTs are summarized in Fig. 4 and Table 1, and were submitted to a 2 x 2 x 2 x 2 repeated-measures ANOVA. This analysis revealed a significant main effect of distractor color, F(1,15)=14.74, p = .002, \( {\eta}_G^2 \) = 0.010. Accordingly, colored distractors elicited longer RTs than neutral ones. Also the distractor depth plane (same or different depth plane) had a significant effect on RTs, F(1,15)=6.67, p = .021, \( {\eta}_G^2 \) = 0.005. Collapsed across all conditions, RTs were shorter when target and distractor appeared in different depth planes (1,309 ms, same depth plane: 1,345 ms). A set size effect was also significant, F(1,15)=7.61, p = .015, \( {\eta}_G^2 \) = 0.034. Set size effects for each condition were determined as difference of large and small set divided by 3 (nine−six items). Effects ranged from 3 ms/item to 47 ms/item, and were particularly pronounced when the target appeared in the far depth plane. This observation was underlined by a significant target depth plane x set size interaction, F(1,15)=6.04, p = .027, \( {\eta}_G^2 \) = 0.0035. It is also important to note that set size effects are still relatively small. Using a similar experimental setting without a salient target, a set size effect of 154 ms/item was reported (Plewan & Rinkenauer, 2018b). Accordingly, the target in the present experiment can still be regarded as the salient item in the search array, which facilitated the visual selection process. There was no main effect of target depth plane, F(1,15)=1.07, p = .318, \( {\eta}_G^2 \) = 0.009. This indicates that there was no general advantage for targets in either depth plane without any prior task relevant information. Yet, there was an additional interaction between target and distractor depth plane, F(1,15)=6.50, p = .022, \( {\eta}_G^2 \) = 0.0028. This effect suggests that distractors presented in the target depth plane caused more interference when the target was located in the near depth plane. 
The remaining interactions did not approach the conventional significance level (all p > .15, \( {\eta}_G^2 \) ≤ 0.0011).
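The per-item set size effect described above is the RT difference between the large and small set divided by the difference in item count (nine minus six). A minimal sketch follows; the function name and the input RTs are made up for illustration and are not values from Table 1.

```python
SET_SIZES = (6, 9)  # small and large search arrays (from the text)

def set_size_slope(rt_small_ms, rt_large_ms, set_sizes=SET_SIZES):
    """Search slope in ms/item: RT difference divided by item difference."""
    small, large = set_sizes
    return (rt_large_ms - rt_small_ms) / (large - small)

# Hypothetical mean RTs (ms), chosen only to illustrate the arithmetic.
example_slope = set_size_slope(rt_small_ms=1300.0, rt_large_ms=1441.0)
print(f"{example_slope:.0f} ms/item")
```

Shallow slopes are conventionally read as a sign of efficient, largely parallel search, whereas steep slopes suggest item-by-item inspection.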

Fig. 4

Mean reaction times from Experiment 3. Labels indicate target depth plane (near – far), target-distractor relation (same – different), and distractor appearance (neutral – colored). Error bars represent 95% within-subject confidence intervals (Moray, 2008)

Experiment 3 indicates that allocation of attention in 3D space is differently modulated when no information about the target and its relation to the distractor is available. In general, attention is not biased towards the near or far depth plane, but the results suggest that stimuli are differentially processed depending on the distribution of items in the near or far depth plane. Two observations are of particular importance. On the one hand, target search in the near depth plane was more strongly influenced by a distractor item presented in the same depth plane. On the other hand, RTs for far compared to near targets were more prolonged when the search array contained a larger number of non-target items. In both cases, the color of distractors did not further modulate these effects.

Although targets in the near depth plane were not associated with faster RTs per se, the findings of Experiment 3 are also largely in line with the idea of an egocentric search gradient. When targets are located in the near depth plane, additional information in this depth plane causes strong interference. At the same time, stimuli presented behind this depth plane affect visual selection to a lesser degree. Conversely, targets in the far depth plane were selected more slowly when the search array (in front of the target) contained more items. Seemingly, participants search “through” the whole volume before they approach the most distant depth plane. In contrast to Experiment 1, a strong effect of distractor color was observed. Apparently, participants were unable to ignore the salient color item in Experiment 3. It can be speculated that participants easily inferred the target-distractor relationship (same/different depth plane) in Experiment 1. This in turn might have freed up attentional resources, which could then be used to filter out the salient (but irrelevant) distractor color. This was of course impossible in Experiment 3 as there was no consistency in the relation of target and distractor. Taken together, the results indicate that the allocation of attention is depth sensitive. However, it is most likely that not only the depth position guides attention but rather the structure of the whole 3D volume is taken into account together with task-specific aspects.

General discussion

Three visual search experiments revealed that the relation of target and distractor position within a 3D search array substantially modulates visual target selection. Mainly two conditions were compared: Two competing items (salient due to their depth position) were displayed either in a single depth plane or were distributed across two depth planes. Longer RTs were observed when target and distractor singletons appeared within the same depth plane. This can be regarded as evidence that the focus of attention can – at least to a certain extent – be oriented to distinct depth planes. However, this does not seem to be a uniform process that operates equally throughout 3D space. Attentional processing was clearly affected by the relative position of target and distractor and there was no general advantage related to targets located in the near or far depth plane.

The present findings are largely in line with previous studies emphasizing a contribution of stereoscopic depth information in the attentional processing stream. For instance, it was recently reported that depth singletons facilitate visual target selection (Plewan & Rinkenauer, 2018b). Apparently, participants used depth information to segment a 3D search array and focused their attention onto a particular depth plane. Other studies already reported that segmentation of a 3D search array improves search performance (Finlayson & Grove, 2015; Theeuwes et al., 1998). However, it was also shown that salient, but irrelevant depth singletons cause interference when presented along with a target in another depth plane (Plewan & Rinkenauer, 2018b). This observation is extended by the present findings. Simultaneous presentation of target and distractor in different depth planes caused less interference than competing information within the same depth plane. This observation indicates that participants can (voluntarily) narrow their focus of attention to a distinct depth plane, yet this does not exclude the possibility that irrelevant information from other depth planes still summons attention.

Several previous studies investigated the distribution of visual attention in 3D space using targets that were flanked by similar items (e.g., Andersen & Kramer, 1993; Eberhardt & Huckauf, 2017; Rinkenauer & Grosjean, 2008). In such experiments, the horizontal separation of target and distractor stimuli as well as their relative depth positions are varied (e.g., flankers were presented in front of or behind the target). It was consistently shown that the relative depth position of target and flanker stimuli changed the response pattern. For instance, stronger effects of crowding were reported when relevant and irrelevant items were displayed within the same depth plane (Eberhardt & Huckauf, 2017). It might be argued, however, that effects observed in the present study were related to physical size differences. Stimulus size was inversely scaled to distance (as in natural viewing conditions) and thus the (physical) size of target and distractor stimuli differed when both items were displayed in different depth planes. Yet some studies revealed that RTs are more susceptible to changes in perceived size (which was constant in the present experiments) than to differences in physical size (Plewan, Weidner, & Fink, 2012; Sperandio, Savazzi, Gregory, & Marzi, 2009). Likewise, depth-induced behavioral effects tend to be stable when perceived or physical size is varied across depth planes (Blini et al., 2018; Plewan & Rinkenauer, 2017). Finally, there was no consistent advantage for either near or far targets across the experiments, which would have indicated general differences in low-level stimulus processing. Thus, it is unlikely that effects observed in the present experiments are strongly related to differences in physical stimulus size.

Comparing the individual experiments of the present study, differences in overall response speed are apparent. The shortest RTs were obtained in Experiment 2, when participants were aware of the target depth plane, while the longest RTs were observed in Experiment 3, in the absence of reliable information about target depth plane or target-distractor relation (same or different). This can be regarded as additional evidence that participants successfully used task-relevant information to narrow their focus of attention to a distinct depth plane and reduced the influence of salient but irrelevant distractors. It has previously been reported that foreknowledge about the target location can be used to reduce the need to search for a target and elicits faster reactions (Bertleff et al., 2017). Yet, the present results indicate that participants were unable to completely ignore irrelevant information in unattended depth planes. Even in Experiment 2, where participants had full certainty about the target depth plane, RTs were relatively long. Although stereoscopic depth information is clearly used to guide attention, this is seemingly a weak and error-prone feature. As outlined above, color has been shown to be a stronger modulator of attention compared to stereoscopic depth information (Plewan & Rinkenauer, 2018b). Likewise, it has recently been reported that guidance of attention by binocular rivalry can easily be interrupted by other features (Zou, Utochkin, Liu, & Wolfe, 2017).

In contrast, it was recently shown that stereoscopic depth information is salient and summons attention instantaneously (Plewan & Rinkenauer, 2018a). In this study, participants performed a demanding letter search task, which was immediately facilitated as soon as a completely unexpected depth cue was introduced. Yet, it is difficult to compare both experimental conditions since participants in the present experiments were consistently confronted with competing information (from different depth planes). This point is important because there is other evidence that effects of stereoscopic depth on attentional processing strongly depend on the actual stimulus configuration (Finlayson et al., 2013; O’Toole & Walker, 1997) or foreknowledge about the target depth plane (Dent, Braithwaite, He, & Humphreys, 2012; Roberts, Allen, Dent, & Humphreys, 2015).

Another central finding of the present experiments was that attention was not uniformly distributed across the 3D search array. Not only the absolute target position (near or far depth plane) influenced target selection, but also the position of the distractor relative to the target caused differential effects. Several previous studies reported faster responses elicited by stimuli presented relatively closer to an observer (e.g., Finlayson & Grove, 2015; Gawryszewski, Riggio, Rizzolatti, & Umiltá, 1987; Plewan & Rinkenauer, 2016, 2017; Theeuwes & Pratt, 2003), which was often interpreted in accordance with an egocentric spatial gradient model. According to such models, attentional resources decrease along with distance to the observer or fall off behind the attended depth plane and hence predict faster processing of stimuli located closer to the observer. On a theoretical level this was often related to the idea that closer or approaching objects possess a higher behavioral urgency (Franconeri & Simons, 2003). Results from the present study can only partially be integrated into this theoretical assumption. In Experiment 2, there was a main effect of target depth plane, indicating overall faster responses associated with targets presented in the near depth plane. However, the target-distractor relation had a stronger effect and numerically faster responses were obtained when target and distractor were displayed in different depth planes, irrespective of the actual target depth plane (i.e., near or far). More importantly, no main effect of target depth plane was observed in Experiment 3. In some conditions, near targets elicited even longer RTs than far targets. At the same time, the relative position of the distractor was more critical in the near target conditions. 
Distractor depth plane (same or different with respect to target) had only a weak impact when targets were presented in the far depth plane, while stronger interference was observed when distractors coincided with targets within the near depth plane. One possible explanation is that participants did not focus their attention on the depth plane of the fixation point but rather chose to reside in the most distant plane to look “through” the whole search array. This in turn would have resulted in a reorientation of attention every time the target appeared in the near depth plane. This, however, is unlikely since Finlayson and Grove (2015) explicitly guided attention to the most distant depth plane prior to a 3D visual search task and still observed the fastest reactions for targets in the closest depth plane. Moreover, assuming that participants (in-)voluntarily shifted their attention to the far depth plane does not explain why a distractor in the same depth plane had a stronger impact on selection of near depth plane targets. Apparently, attentional resources are not distributed along a strict egocentric gradient but rather are specifically tuned depending on the current stimulus and task configuration.

Although not explicitly required, participants most likely performed at least covert shifts of attention across depth planes. Participants were asked to fixate a point in the central depth plane while targets appeared in front of or behind it. Some studies directly investigated shifts of attention or reorientation of attention in 3D space (e.g., Arnott & Shedden, 2000; Bourke, Partridge, & Pollux, 2006; Chen et al., 2012; Theeuwes & Pratt, 2003; Wang et al., 2016) and in line with the spatial gradient model it was often concluded that more attentional resources are allocated to proximate areas around the observed depth plane. For instance, Arnott and Shedden (2000) presented two similar objects sequentially in different depth planes (induced by random-dot autostereograms) and asked their participants to compare both items. It was observed that judgments were faster when the second object was presented in a closer depth plane than the first one. This effect, however, was only evident under conditions of high perceptual load (Arnott & Shedden, 2000). Other studies revealed asymmetrical effects with respect to the trajectories of attentional reorientation. For instance, it was reported that shifts of attention are performed faster towards unexpected stimuli in near depth planes (Chen et al., 2012) and that inhibition of return was more pronounced when targets were displayed in near depth planes (Wang et al., 2016). Thus, it might appear surprising that faster reactions were associated with the far depth plane in Experiment 3, while the opposite was true in Experiment 2 (shorter RTs in near depth plane conditions). But these findings may still be interpreted in agreement with the spatial gradient model of visual attention, assuming the gradient is flexibly adjusted to current features of the visual surroundings. 
In Experiment 3, participants had no foreknowledge about the target depth plane and therefore had to (re-)adjust their focus of attention in every trial (which was not necessary in Experiment 2). In this uncertain state, the focus of attention may spread along a gradient away from the observer, with more resources dedicated to the near depth plane. Accordingly, conflicting information in this (near) depth plane causes stronger interference, while additional, distracting information displayed in farther depth planes is more likely to be ignored. Conversely, selection of far depth plane targets was slower when the amount of irrelevant information located closer to the observer was increased and thus occupied attentional resources. In addition, sensitivity for different aspects of target selection might vary along the hypothetical egocentric spatial gradient. For example, it is quite conceivable that attention in proximate regions is loose and particularly suited for stimulus detection while more distant regions might be less attended in general but more sensitive for stimulus identification. Such effects might have further contributed to the task-related differences observed in the present study.

Taken together, the present study provides new insight into the mechanisms of attentional allocation in 3D space. The results indicate that stereoscopic depth information is differentially processed in near and far depth planes, and are largely in agreement with the idea that attention is organized along an egocentric spatial gradient. However, this gradient is not static and may be adjusted to changes of stimulus material or task demands. Thus, visual selection based on stereoscopic depth information seems to be an error-prone process that can easily be disrupted by other (salient) features.