Introduction

In everyday life, humans have to interact with a three-dimensional environment. In this respect, stereoscopic information constitutes an important source of information. For instance, it has been suggested that stereoscopic information substantially improves shape discrimination. Even if rich monocular shape information is available, discrimination ability benefits from stereoscopic viewing (Lim Lee & Saunders, 2011). In recent times, the availability and usage of stereoscopic displays have markedly increased. However, only a few studies have investigated the underlying perceptual and attentional mechanisms. Results from these studies were not consistently in line with data derived using conventional frontoparallel viewing conditions. It has been proposed that (stereoscopic) depth information might be processed differently compared with other stimulus dimensions (Nakayama & Silverman, 1986). Nakayama and Silverman revealed that targets that were distinct in terms of color and motion required an effortful serial search. In contrast, targets that were characterized by depth information and motion or color, respectively, were much easier to detect (parallel search). Results from other studies supported the idea of an egocentric attentional gradient through space (Andersen, 1990; Chen, Weidner, Vossel, Weiss, & Fink, 2012; Downing & Pinker, 1985; Finlayson & Grove, 2015; de Gonzaga Gawryszewski, Riggio, Rizzolatti, & Umiltá, 1987). In a recent study, for instance, Finlayson and Grove asked their participants to perform a visual search task across different depth planes. The authors found that targets presented closer to the observers were detected faster than those displayed farther away. This was true although participants’ attention was guided to the most distant depth plane prior to each search trial (Finlayson & Grove, 2015).
Also, it has been shown recently that objects that are perceptually located closer to an observer elicit shorter reaction times (Plewan & Rinkenauer, 2016, 2017). From a behavioral point of view, it seems plausible that closer objects are associated with a higher behavioral urgency and therefore receive processing priority (Franconeri & Simons, 2003). Such behavioral findings are supported by neurophysiological data. Some brain structures differentially process objects in near and far space (Wang, Li, Zhang, & Chen, 2016), and there are also reports of neural populations selective for crossed and uncrossed disparities (Parker, 2007). Conversely, some studies indicated that spatial separation of search arrays does not necessarily facilitate visual search. For instance, Theeuwes and colleagues conducted a series of attentional cuing experiments in which a target object had to be identified among distractors while attention was cued to one depth plane. The authors found that attention can be directed effectively to a specific depth plane. It was further reported that invalid information presented in another depth plane still captured attention and led to prolonged visual processing (Theeuwes, Atchley, & Kramer, 1998). It has been speculated that an attentional gradient might be related to high task demands or perceptual load (Arnott & Shedden, 2000; Atchley, Kramer, Andersen, & Theeuwes, 1997).

Another aspect that has not been investigated is the relationship between depth information and expectancy discrepant events, that is, involuntary attentional processing. Especially in real-world situations, the visual system often is confronted with novel information from different depth locations. Although salient stimuli are generally expected to capture attention automatically (for an overview, see, e.g., Burnham, 2007), it has been shown that this might not hold true for surprising or expectancy discrepant information. For instance, Gibson and Jiang asked their participants to search for one of two target letters within a circular array of distracting letters. After half of the trials, the target letter was unexpectedly presented in a novel and deviating color. Despite increased target saliency, no improved target detection rate was observed in this critical trial (i.e., first appearance of the colored target) but only in subsequent trials, which consistently included a predictive salient target (Gibson & Jiang, 1998). Apparently, it takes some time until a surprising salient stimulus feature captures attention.

To further investigate this issue, the experimental design was subsequently modified (Horstmann, 2002). In the first part of the experiment, letter positions were cued by uninformative placeholders. After half of the trials, one of these placeholders was surprisingly presented in a different color. In contrast to previous results by Gibson and Jiang, it was observed that this expectancy discrepant cue improved task performance, because it instantaneously captured attention. Moreover, there was no immediate but rather a delayed improvement of task performance in case the color cue was presented simultaneously with the letters (Horstmann, 2002). Increased performance in the critical trial, however, goes along with an increase in terms of response times. Processing of unexpected task features obviously requires additional time (Meyer, Niepel, Rudolph, & Schützwohl, 1991). On the basis of this observation, several other aspects of surprise capture have been investigated. For instance, it has been shown that the stimulus onset asynchrony between cue and target should be at least 400 ms (Horstmann, 2006; Horstmann & Becker, 2008) and that colored singletons failed to capture attention if their occurrence is not surprising (Horstmann, 2005). Likewise, a surprising cue on a distractor location inhibited task performance and response times (Horstmann & Becker, 2011). Yet, the recurring finding that a salient color cue captures attention directly with the first occurrence seemingly holds true for other stimulus features. For instance, the surprising onset of a motion cue provoked an increased proportion of correct answers (Becker & Horstmann, 2011). In this study, Becker and Horstmann (2011) presented the pattern of a rotating diamond at the target location. The motion cue also led to an instantaneous improvement of task performance, whereas this effect could not be observed if uninformative motion cues were presented in previous trials.

Accurate integration of novel or surprising information from different depth planes is an important task for the visual system. The surprise capture paradigm constitutes a solid empirical foundation to investigate the processing of expectancy discrepant depth information without an explicit attentional set. Using this paradigm, it was expected that surprising depth information would be able to capture attention on its first appearance, as has been demonstrated for other stimulus properties. Furthermore, it was hypothesized that there might be differential effects related to the relative position in depth. If there is a strong attentional search gradient through space, only cues presented in front of the search display should summon attention instantaneously, while depth information in a more distant location should be less behaviorally relevant.

Experiment 1

Methods

Participants

A sample of 46 volunteers (33 women, 13 men) participated in the experiment. Three participants reported being left-handed. All participants were remunerated (10€/h) or received course credit. Participants’ age ranged from 19 to 32 (median 24) years. All participants had normal or corrected-to-normal vision, and stereo vision capability was verified using the TNO test for stereoscopic vision (all participants showed stereo thresholds of ≤ 120 arcsec). The experiment was conducted in accordance with the Declaration of Helsinki, and all participants gave written, informed consent. The experimental framework was approved by the local ethics committee. Participants were assigned to one of two subgroups (see below) in alternating order. Due to one incorrect assignment, the two subgroups were not completely balanced.

General procedure and experimental design

The experimental setup was generated using the virtual reality software Vizard 4© (WorldViz, LLC). Stimulus material was presented via professional stereo head-mounted displays (HMD, nVisor ST50), with a resolution of 1,280 × 1,024, a refresh rate of 60 Hz (single frame duration of approximately 16.7 ms) and a 50° diagonal field-of-view. The visual focus of the HMD was set to 10 m. Both screen displays are placed closely in front of participants’ eyes. Therefore, a vivid depth impression can be evoked via stereoscopic presentation. Responses were recorded using custom-made response devices.

The experimental design was adopted from previous studies on surprise capture (Horstmann, 2002). Initially, a fixation point (0.4°) was presented for 1,000 ms in the center of the display with a perceived distance of 57 cm from the observer. In every trial, 12 rectangular grids (0.8° x 0.8° with a 2 x 2 structure) were displayed circularly around the fixation point (with a radius of 3.4°) within the same depth plane (Fig. 1). These placeholders were presented for 400 ms, which has been shown to be an appropriate duration for surprise capture to take place (Horstmann & Becker, 2008). Subsequently, this array was replaced by one of two target letters (“H” or “U”) along with eleven distractor letters (A, C, D, E, F, I, L, P, S, T, M). All letters were displayed for five frames (83 ms) with a height of approximately 0.72° and a width of approximately 0.55°. The positions of target and distractor letters were randomly allocated on a trial-by-trial basis with every target position being equally likely across the whole experiment. All stimuli were white colored in front of a uniform black background.
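The stimulus geometry and timing described above rest on two standard conversions: visual angle to physical extent at a given viewing distance, and duration to display frames at a given refresh rate. The following is a minimal sketch of both, not the authors' stimulus code. The function names are hypothetical, and the even spacing and starting angle of the 12 placeholders are assumptions, since the text only states that they were arranged circularly.

```python
import math

VIEWING_DISTANCE_CM = 57.0   # perceived distance of the search display (from the text)
REFRESH_RATE_HZ = 60.0       # refresh rate of the HMD (from the text)

def angle_to_size_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Extent (cm) of an object subtending angle_deg, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def duration_to_frames(duration_ms, refresh_hz=REFRESH_RATE_HZ):
    """Whole number of frames closest to the requested duration."""
    return round(duration_ms * refresh_hz / 1000.0)

def placeholder_positions(n_items=12, radius_deg=3.4):
    """(x, y) offsets in degrees for n_items on a circle.

    ASSUMPTION: items are evenly spaced and the first one sits at 0 degrees.
    """
    step = 2.0 * math.pi / n_items
    return [(radius_deg * math.cos(i * step), radius_deg * math.sin(i * step))
            for i in range(n_items)]

# Placeholder edge length and letter height at the display plane
print(f"0.8 deg  -> {angle_to_size_cm(0.8):.2f} cm")
print(f"0.72 deg -> {angle_to_size_cm(0.72):.2f} cm")

# Placeholder and target durations expressed in frames
print(duration_to_frames(400), duration_to_frames(83))
```

At a viewing distance of 57 cm, one degree of visual angle corresponds to roughly one centimeter, which makes the angular values in the text easy to read as approximate physical sizes.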

Fig. 1

Schematic time course of an experimental trial. The fixation point was replaced by placeholders after 1,000 ms. In the first half of the experiment, these placeholders (presented for 400 ms) were uninformative (upper image), while in the remaining trials the cue at the subsequent target position was spatially displaced (the lower image illustrates the case of a closer cue; arrow was not shown in the experiment). The actual target screen was presented for 83 ms, followed by a blank screen until response. Figures are not drawn to scale.

Participants performed a two-alternative forced-choice task, that is, they had to indicate whether an “H” or “U” was presented (left button “U”, right button “H”). Erroneous responses were accompanied by a short acoustical feedback (100 ms). After half of the trials (trial 49), the placeholder at the subsequent target position was surprisingly displaced in depth. The placeholder was presented closer (43 cm) to the observer for one subgroup (N = 24), while in the other subgroup (N = 22) this placeholder was rendered farther away (71 cm). The displaced placeholder also predicted the target location in the remaining trials (trials 50-96). Because participants had already encountered the displacement by then, it was no longer regarded as expectancy discrepant in these trials. In accordance with previous research on surprise capture, the first half of the experiment is henceforth termed precritical trials (no depth cue present, trials 1-48). The first appearance of the surprising depth cue is henceforth defined as critical trial (surprising depth cue present, trial 49), whereas the remaining trials are labeled as postcritical trials (depth cue present, trials 50-96). To familiarize themselves with the task, all participants performed 12 independent training trials without any depth cues before the actual experiment. As a measure of task accuracy, error rates across pre- and postcritical trials were determined individually for each participant. The same procedure was applied to estimate response times. The resulting parameters were submitted to paired two-sample t tests, Welch two-sample t tests, or regression analyses on trial numbers in order to compare both conditions. Cohen’s d (Cohen, 1988) is reported as a measure of effect size.
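The depth offsets of the displaced placeholder (43 cm and 71 cm, relative to the 57 cm display plane) can also be expressed as binocular disparities. The sketch below illustrates the standard vergence-angle geometry under two loudly stated assumptions: an interpupillary distance of 6.3 cm, which the text does not report, and a point lying straight ahead, which ignores the 3.4° eccentricity of the placeholders. The function names are hypothetical.

```python
import math

IPD_CM = 6.3          # ASSUMPTION: interpupillary distance, not given in the text
FIXATION_CM = 57.0    # distance of the search display (from the text)

def vergence_deg(distance_cm, ipd_cm=IPD_CM):
    """Vergence angle (degrees) for a point straight ahead at distance_cm."""
    return math.degrees(2.0 * math.atan(ipd_cm / (2.0 * distance_cm)))

def relative_disparity_deg(cue_cm, fixation_cm=FIXATION_CM, ipd_cm=IPD_CM):
    """Disparity of the cue relative to the fixation plane.

    Positive values are crossed (nearer than fixation),
    negative values are uncrossed (farther than fixation).
    """
    return vergence_deg(cue_cm, ipd_cm) - vergence_deg(fixation_cm, ipd_cm)

near = relative_disparity_deg(43.0)   # near subgroup
far = relative_disparity_deg(71.0)    # far subgroup
print(f"near cue: {near:+.2f} deg, far cue: {far:+.2f} deg")
```

Under the assumed interpupillary distance, the near cue carries a larger absolute disparity than the far cue even though both are displaced by 14 cm, because disparity falls off nonlinearly with viewing distance. This asymmetry may be worth keeping in mind when comparing the two subgroups.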

Results and Discussion

On average, 70.38% (standard deviation (SD) = 8.74) of the targets were correctly identified during the precritical trials. Task performance was markedly improved in the postcritical trials with about 80.67% (SD = 13.87) correct trials across participants (t(45) = 5.23, p < 0.01, d = 0.89; Fig. 2). The critical trial was erroneous in only two participants (i.e., 95.65% correct). Apparently, the surprising occurrence of the depth cue was able to capture attention immediately. Calculating the 95% confidence interval for the critical trial reveals a lower bound of 85.16%. Hence, accuracy in the critical trial was better than in the precritical trials and even superior to the mean accuracy rate in the remaining postcritical trials. Moreover, there was no improvement of accuracy observed during the precritical or postcritical trials. Separate regression analyses with trial numbers and proportion of correct responses in the pre- and postcritical trials confirm a lower accuracy rate in the precritical trials (0.7 vs. 0.78) with a modest and nonsignificant slope in either condition (precritical: 0.0003 (p = 0.69); postcritical: 0.001 (p = 0.17)). The predictions derived from the regression analyses thus also indicate that performance in pre- and postcritical trials was below the lower bound of the confidence interval estimated for the critical trial. The cue’s depth location (i.e., near vs. far) did not influence the results as there was no difference in terms of accuracy between both subgroups (all p > 0.05).
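The confidence interval for the critical trial is an interval for a single binomial proportion (44 correct responses out of 46 participants). The text does not name the method used. As a hedged illustration, the sketch below computes an exact (Clopper-Pearson) interval with the standard library only; the choice of method is an assumption, although its lower bound agrees with the reported 85.16%.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""

    def solve(f, target):
        # f must be increasing in p on [0, 1]; bisection finds f(p) == target
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper

# Critical trial of Experiment 1: 44 of 46 participants responded correctly
low, high = clopper_pearson(44, 46)
print(f"{low:.4f} to {high:.4f}")   # lower bound is approximately 0.8516
```

Because the interval is computed from a single trial per participant, it is wide; the comparison in the text relies only on its lower bound exceeding the mean accuracy of the other trials.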

Fig. 2

The proportion of correct responses of each experimental trial derived from Experiment 1. Pre- and postcritical trials are represented by filled and open grey circles, respectively. The linear fits of the associated regression analysis are visualized by the thin lines. The critical trial (trial number 49) is displayed as a solid black circle. Error bars denote 95% confidence intervals.

An inspection of the response times reveals a similar pattern. The mean response times recorded in the precritical trials (1.29 sec, SD = 0.59) were longer than those observed during postcritical trials (1.02 sec, SD = 0.36; t(45) = 4.53, p < 0.01, d = 0.55). The 95% confidence interval of the critical trial (mean = 1.02 sec) ranged from 0.92 to 1.12 seconds. Accordingly, the comparison of precritical trials and critical trial suggests that the first occurrence of an unannounced depth cue led to a substantial acceleration of response speed. A comparison of critical trial and postcritical trials revealed no additional difference in response times. Excluding reaction time data from those participants with erroneous responses in the critical trial did not meaningfully change the pattern of results. Also, reaction times were not differentially affected by the depth cue’s relative position (all p > 0.05). However, there is a nonsignificant trend that the subgroup of participants who perceived the depth cue as farther away reacted slightly faster (0.93 sec (0.26) vs. 1.11 sec (0.41); t(39.33) = 1.87, p = 0.07).
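The fractional degrees of freedom in the subgroup comparison are characteristic of a Welch two-sample t test. As a rough plausibility check, the sketch below recomputes the statistic from the rounded summary values alone. It assumes that the values in parentheses are standard deviations and that the far subgroup (N = 22) is the faster one; neither is stated explicitly in that sentence, and the function name is hypothetical.

```python
import math

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite df from means, SDs, and group sizes."""
    v1 = s1**2 / n1          # squared standard error of group 1
    v2 = s2**2 / n2          # squared standard error of group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# ASSUMPTION: near subgroup = 1.11 s (SD 0.41, N = 24), far subgroup = 0.93 s (SD 0.26, N = 22)
t, df = welch_from_summary(1.11, 0.41, 24, 0.93, 0.26, 22)
print(f"t({df:.2f}) = {t:.2f}")
```

With these rounded inputs the degrees of freedom come out very close to the reported 39.33, whereas the t value is somewhat smaller than the reported 1.87. The gap is plausibly due to rounding of the published means and standard deviations, but this cannot be confirmed from the summary values alone.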

The findings of Experiment 1 indicate that expectancy discrepant depth information does immediately capture attention. As has been shown for other stimulus features, (stereoscopic) depth information is associated with a sudden increase of task performance. However, these results are not completely in line with previous reports in two aspects. First, task performance in the critical trial was not only better compared with precritical trials but also compared with postcritical trials. Second, response times recorded in the critical trial were already markedly reduced. This is at odds with the common finding that surprising or novel information elicits prolonged response latencies. Therefore, it can be questioned whether the pattern of results actually represents surprise capture or instead is indicative of alternative attentional processes. To rule out the latter possibility, it would be necessary to see whether the attentional integration of depth information follows the common time course of surprise capture. It has been shown that surprise capture can reliably be observed if there is an asynchrony of approximately 400 ms (Horstmann, 2006; Horstmann & Becker, 2008). Thus, a reduced latency between depth cue and target should not elicit a surprise capture effect.

Experiment 2

The paradigm from Experiment 1 was adjusted to test whether the observed attentional effects can actually be attributed to surprise capture. To this end, only the latency between depth cue and target was changed; it was reduced to 100 ms. It has been proposed that surprise capture is relatively slow (Horstmann, 2006; Horstmann & Becker, 2008), and thus no surprise capture was expected under these conditions.

Methods

A new sample of 42 participants (18-32 years, 32 women, 10 men) was recruited for Experiment 2. All prerequisites as well as the experimental design were identical to Experiment 1 with the only exception that the interval between depth cue and target onset was reduced to 100 ms.

Results and Discussion

Overall, performance was comparable to Experiment 1 (Fig. 3). Proportion of correct responses was 73.81% (SD = 12.85) in the precritical trials and 85.92% (SD = 13.99) in the postcritical trials, signifying an improvement (t(41) = 6.55, p < 0.01, d = 0.90). In contrast to Experiment 1, the critical trial was erroneous in nine participants (i.e., 78.57% correct). The 95% confidence interval for the critical trial ranged from 63.19% to 89.70%, which did not provide evidence for immediate attentional capture. Again, there was no increase of accuracy observed during the precritical or postcritical trials. In fact, separate regression analyses indicated improved task performance in postcritical (0.86) compared with precritical trials (0.75) but revealed even slightly (nonsignificant) negative slopes (precritical: −0.0004 (p = 0.65); postcritical: −0.0002 (p = 0.64)). The two subgroups did not differ in terms of task accuracy (i.e., near vs. far target; all p > 0.05).
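The regression analyses on trial numbers test whether accuracy drifts within a block of trials. A minimal least-squares sketch of this kind of fit is given below. The per-trial proportions are hypothetical values invented for illustration, not the study's data, and the function name is an assumption.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL per-trial proportions of correct responses for the
# postcritical trials (50-96): a flat level with small alternating noise
trials = list(range(50, 97))
prop_correct = [0.86 + (0.02 if t % 2 else -0.02) for t in trials]

slope, intercept = ols_fit(trials, prop_correct)
mid_trial = sum(trials) / len(trials)
print(f"slope per trial: {slope:.5f}")
print(f"predicted accuracy at block midpoint: {intercept + slope * mid_trial:.3f}")
```

A slope near zero, as reported for both blocks, indicates that the accuracy difference between pre- and postcritical trials reflects a step change at the critical trial and not a gradual practice effect.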

Fig. 3

The proportion of correct responses of each experimental trial derived from Experiment 2. Pre- and postcritical trials are represented by filled and open grey circles, respectively. The linear fits of the associated regression analysis are visualized by the thin lines. The critical trial (trial number 49) is displayed as a solid black circle. Error bars denote 95% confidence intervals.

As observed in Experiment 1, response times obtained in the precritical trials (0.98 sec, SD = 0.29) were longer than those recorded in the postcritical trials (0.79 sec, SD = 0.27; t(41) = 6.99, p < 0.001, d = 0.69). The 95% confidence interval of the critical trial (mean = 0.82 sec) ranged from 0.71 to 0.93 seconds. Again, the first occurrence of an unannounced depth cue led to a substantial acceleration of response speed, while no response time differences between critical and postcritical trials were observed. Reaction times were not differentially affected by the placement of the cue closer to or farther from the observers (all p > 0.05).
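The effect sizes reported in both experiments can be approximated from the summary statistics. The text cites Cohen (1988) without stating which standardizer was used for the paired comparisons. The sketch below assumes the root mean of the two condition variances as the denominator; this choice is an assumption made for illustration, and the function name is hypothetical.

```python
import math

def cohens_d_avg_var(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean of the two condition variances.

    ASSUMPTION: this standardizer is chosen for illustration; the source
    does not specify how d was computed for the paired comparisons.
    """
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2.0)
    return abs(mean_b - mean_a) / pooled_sd

# Summary values as reported in the text (precritical vs. postcritical trials)
comparisons = {
    "Exp. 1 accuracy (%)":      (70.38, 8.74, 80.67, 13.87),
    "Exp. 1 response time (s)": (1.29, 0.59, 1.02, 0.36),
    "Exp. 2 accuracy (%)":      (73.81, 12.85, 85.92, 13.99),
    "Exp. 2 response time (s)": (0.98, 0.29, 0.79, 0.27),
}

for label, values in comparisons.items():
    print(f"{label}: d = {cohens_d_avg_var(*values):.2f}")
```

Under this assumption the two accuracy effects and the response time effect of Experiment 1 match the reported values after rounding (0.89, 0.90, and 0.55). The response time effect of Experiment 2 comes out marginally below the reported 0.69, which may reflect rounding of the published means and standard deviations.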

Improved task performance and decreased response times indicate that participants also benefitted from the introduction of a depth cue with reduced latency (100 ms). However, unlike in Experiment 1, the typical pattern of surprise capture was not observed. Accuracy rate in the critical trial did not differ from precritical trials. Thus, the notion that the integration of a surprising or expectancy discrepant stimulus is a relatively slow process seemingly holds true for depth information.

General Discussion

The present experiments revealed that surprising depth cues can capture attention, namely a depth cue improved search performance even though participants were uninformed about the cue-target relationship. In Experiment 1, the accuracy in the surprising critical trial actually was not only superior to the mean accuracy in the precritical trials but even superior to the mean accuracy in the postcritical trials. In Experiment 2, the latency between depth cue and target was markedly reduced and surprise capture was no longer obtained, although participants achieved higher accuracy rates in subsequent trials. In addition, there was no evidence that the relative position of the depth cue (i.e., near vs. far) influenced accuracy, albeit distant depth cues might elicit slightly faster response times.

Successful interaction with surprising or novel information is regularly required in everyday life. The surprise capture paradigm has been shown to be a useful tool to investigate such involuntary shifts of attention. For instance, it has previously been shown that expectancy discrepant color cues are able to capture attention on their very first appearance (Horstmann, 2002). Interestingly, participants’ attention is directed to the cue locations even though no attentional set for a particular stimulus feature is activated. To our knowledge, the present study provides the first evidence that depth information works similarly within this experimental framework. This is especially important, because there are inconsistent findings regarding the role of (stereoscopic) depth information in attentional search tasks. For instance, some studies suggested that depth information may facilitate attentional processes (Abrams & Christ, 2005; Andersen, 1990; Andersen & Kramer, 1993; Arnott & Shedden, 2000; Finlayson & Grove, 2015; Nakayama & Silverman, 1986), whereas other studies did not confirm such a general role of depth information (Dent, Braithwaite, He, & Humphreys, 2012; Finlayson, Remington, Retell, & Grove, 2013; Theeuwes et al., 1998).

The present findings strengthen the notion that surprising depth information can substantially modulate behavior. Task performance in the critical trial was not only better than performance in precritical trials but also better compared with postcritical trials. This is an unforeseen observation; in previous studies on surprise capture, task performance in the critical trial was equal to or even below that in postcritical trials (Horstmann, 2002). Experiment 2 verified that processing of expectancy discrepant depth information follows the typical time course of surprise capture. In general, surprise capture seems to be computationally expensive and thus relatively slow (Horstmann, 2006; Horstmann & Becker, 2008). It has been proposed that it originates from at least three sources (Horstmann, 2015): perception of a surprising stimulus, detection of discrepancy, and shift of attention. Thus, it is conceivable that these components variably interact with different stimulus features. For instance, the observation from Experiment 1 that accuracy rates of almost all postcritical trials are below the accuracy in the critical trial may suggest that only the first unexpected occurrence of a depth cue possesses an additional alerting effect. Similar to other features, the recognition of the novel information requires a certain amount of time (400 vs. 100 ms).

Regarding the resulting motor response, a surprising depth cue also might trigger a different response pattern. Responses to surprising stimulus material are generally expected to take more time. The inhibitory component of such surprise responses is regarded to originate from discrepant task information (Meyer et al., 1991). The first encounter of a depth cue certainly was expectancy discrepant, yet there was no inhibition of the motor response in the critical trial. Other cues, such as color or motion, have been shown to provoke longer response times in the critical trial (Becker & Horstmann, 2011; Horstmann, 2002, 2005). However, some studies on surprise capture kept the target item on the screen for a longer duration (e.g., until response), whereas in the present experiments the target was removed after a brief presentation. Accordingly, increased dwelling on the target might have inhibited responses in those previous experiments. Nonetheless, surprising depth information can be integrated with higher priority, and once registered it is seemingly processed faster than other stimulus properties.

This notion appears reasonable from a behavioral point of view. Depth information in general is supposed to have a strong influence on behavior, because it signals potential threats under natural viewing conditions. Thus, within the framework of the behavioral urgency hypothesis, approaching (or closer) objects are believed to demand instantaneous processing priority (Franconeri & Simons, 2003). However, there are contrasting results from a study by Abrams and Christ (2005), who employed stereoscopic depth information to create receding stimulus motion. Trials including the onset of receding motion cues were associated with faster response times compared with trials comprising static or shrinking cues, respectively. According to these studies, surprising peripheral depth cues represent a valuable source of information.

Alternatively, it might be argued that it is not depth information per se that captures attention but rather a stimulus onset, which is caused by spatial displacement. The target letter following a depth cue may be regarded as a stimulus onset along the depth axis. Thus, target and placeholder did not occupy the same spatial location, unlike in precritical trials. Abrupt stimulus onsets have been shown to capture attention particularly in conditions where no strong attentional set is established (Yantis & Jonides, 1990). This also may explain why the expected response time advantage of closer targets was not observed.

The present study reveals that expectancy discrepant depth information can capture attention irrespective of its location relative to an observer. Although the pattern of results is generally in line with similar studies on surprise capture, there are some remarkable differences. The most striking observation might be that surprising depth information did not evoke prolonged response times. Innovative technical devices, such as smart-glasses or head-mounted displays, offer the possibility to integrate virtual or augmented stereoscopic depth information into daily life and working environments. Therefore, the present findings strengthen the notion that stereoscopic depth information may constitute a useful cue to facilitate task performance. This might be particularly important for tasks that require rapid integration of novel or surprising information from different spatial locations.