Selective attention—the ability to focus on relevant information while ignoring irrelevant and potentially interfering distractors—is crucial for any kind of controlled cognitive processing. Research has investigated the circumstances under which task-irrelevant information affects behavior. The Stroop task (Stroop, 1935) is perhaps the most prominent example within this research tradition: In the classical Stroop task, participants are asked to respond to the color of a word (target) while ignoring its meaning (distractor). Participants typically fail to ignore the word meaning, which leads to impaired performance when the target and the distractor are incongruent (distractor interference).

One important recent finding is that distractor interference—in the Stroop task as well as in similar response interference paradigms—is reduced when additional stimuli besides the target and the distractor are presented (e.g., Lavie, 2005; Lavie, Hirst, De Fockert, & Viding, 2004; Tsal & Benoni, in press). As an attempt to explain these findings, a load theory has been proposed, in which it is assumed that available task-irrelevant information (perceptual load) consumes perceptual capacity that is no longer available for the processing of the distractor. Distractor processing should thus be prevented when the target is presented under concurrent perceptual load, with the result being improved selective attention. This was confirmed in studies manipulating the number of stimuli that needed to be attended in order to find and identify the target (e.g., Lavie, 1995; Lavie & Tsal, 1994).

Recently, a different line of research has been established assuming that the mere presence of additional stimuli is sufficient, and thus attention does not need to be allotted, for distractor interference to be reduced (Benoni & Tsal, 2010; Tsal & Benoni, in press). This explanation of the perceptual-load effect is based on the so-called Stroop dilution effect (Kahneman & Chajczyk, 1983) and is built upon the early visual interference account proposed by Brown, Roos-Gilbert, and Carr (1995). The Stroop dilution effect shows that interference effects in Stroop tasks are diminished when an additional neutral word is presented elsewhere in the visual field. Benoni and Tsal (2010) and Tsal and Benoni (in press) reported evidence that the dilution produced by neutral stimuli is critical, rather than perceptual load, in reducing distractor interference. They introduced a dilution condition similar to a high-perceptual-load condition, except that the task-irrelevant stimuli were clearly distinguished from the target (e.g., presented in a different font color). Contrary to the predictions of the perceptual-load account, interference was reduced in the dilution condition. The authors assumed that distractors are processed similarly in displays with few or with many additional stimuli, and argued instead that distractor interference is reduced in a high-load condition because stimuli interfere with each other at an early stage of visual processing, before lexical coding, by degrading each other’s feature representations. The source of the interference decrease is therefore seen at an early stage of parallel processing of the (visual) information.

In both lines of research (perceptual load and dilution), most studies have used visual materials, and task-irrelevant and task-relevant stimuli were always presented simultaneously (see Miles & Proctor, 2009, for a notable exception). To further extend our knowledge of the circumstances that improve selective attention, our research aim was twofold: First, we were interested in whether a reduction of distractor interference by the presence of additional stimuli can also be found in the auditory domain. Second, we aimed to investigate whether concurrent presentation of task-irrelevant and task-relevant stimuli is a necessary condition for interference reduction to occur.

For these purposes, we administered an auditory Stroop task (Leboe & Mondor, 2007). In the control condition, participants were presented with a complex high- or low-pitched tone sounding from a high- or low-positioned speaker. They were instructed to attend to the pitch and to ignore the location. Pitch and speaker position could be congruent (i.e., a high-pitched tone from a high-positioned speaker or a low-pitched tone from a low-positioned speaker) or incongruent (i.e., a high-pitched tone from a low-positioned speaker or a low-pitched tone from a high-positioned speaker). Performance in incongruent trials is typically slower and more error-prone than performance in congruent trials.

In the experimental condition, shortly before the Stroop stimulus sounded, four additional short or long tones were presented. Participants were instructed to ignore these additional tones. Because the task-irrelevant tones were presented somewhat before the Stroop tone, they were clearly distinguishable from the target. Participants could prepare for the Stroop tone in both conditions, because a fixation cross was presented on the computer screen immediately before the Stroop tone sounded. If task-irrelevant information can also reduce interference in the auditory domain, the Stroop interference effect should decrease significantly in the experimental condition, relative to the control condition in which the Stroop task was performed in the absence of additional stimuli. A reduced Stroop effect in the experimental condition would also show that distractor interference can be reduced even with nonconcurrent presentation of the to-be-ignored stimuli and the Stroop stimulus. This procedure might allow for distinguishing between the perceptual-load and dilution accounts: Since the task-irrelevant tones were clearly separable from the target, a reduction of interference would support the dilution account over the perceptual-load account.

Method

Participants

The participants were 36 students (22 women, 14 men) at the University of Freiburg with different majors, participating for course credit or as paid volunteers. Their mean age was 24 years, ranging from 22 to 43 years.

Stimuli and apparatus

All tones were presented at a sound pressure level of approximately 70 dB. Two loudspeakers were positioned approximately 11° above and below participants’ visual angle. A chinrest was used to keep head position constant for all participants. The distance from the chinrest to the computer screen was approximately 57 cm.

Auditory Stroop task

As in the study reported by Leboe and Mondor (2007), two tones were generated and stored before the experiment started. The tones were based on sine waves with a duration of 120 ms. The low-pitched tone consisted of a fundamental frequency of 362 Hz along with a first (724 Hz) and a second (1086 Hz) harmonic. Analogously, the high-pitched tone consisted of a fundamental frequency of 732 Hz plus the first (1464 Hz) and second (2196 Hz) harmonics. Relative to the fundamental frequency, the intensities of the first and second harmonics were set to 50% and 25%, respectively. Participants had to decide whether the tones were high or low by pressing “j” for high-pitched tones and “k” for low-pitched tones on a standard computer keyboard. Additionally, they were instructed to ignore the location (i.e., to ignore whether the tone was presented from the high-positioned or the low-positioned loudspeaker).
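As a rough illustration of the tone construction described above, the following Python sketch sums a fundamental and its first two harmonics. The sample rate, the function name `complex_tone`, the peak normalization, and the reading of "intensity" as linear amplitude are assumptions made for illustration; none of them is specified in the text.

```python
import math

SAMPLE_RATE = 44_100  # Hz; assumed, not reported in the text


def complex_tone(f0, duration_ms, weights=(1.0, 0.5, 0.25), sample_rate=SAMPLE_RATE):
    """Sum sinusoids at f0, 2*f0 and 3*f0 with the given relative amplitudes.

    The result is divided by the sum of the weights so that no sample can
    exceed +/-1.0 (a normalization chosen here, not taken from the study).
    """
    n_samples = round(sample_rate * duration_ms / 1000)
    total_weight = sum(weights)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(
            w * math.sin(2 * math.pi * f0 * (k + 1) * t)
            for k, w in enumerate(weights)
        )
        samples.append(value / total_weight)
    return samples


# Stroop tones as described: 120 ms, fundamentals of 362 Hz and 732 Hz.
low_tone = complex_tone(362, 120)
high_tone = complex_tone(732, 120)
```

Under these assumptions, each Stroop tone is a list of 5,292 samples that could be written to a sound file or passed to a playback library.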

To-be-ignored stimuli

In the experimental condition, participants heard a sequence of four to-be-ignored tones. These tones were presented from both loudspeakers simultaneously and were either short (90 ms) or long (150 ms). Tones were separated by a silent intertone interval of 800 ms. The pitches of the short and long tones were based on the mean frequency pattern of the high-pitched and low-pitched tones of the Stroop task, that is, a fundamental frequency of 547 Hz, a first harmonic (1094 Hz), and a second harmonic (1641 Hz). Relative to the fundamental frequency, the intensities of the first and second harmonics were again set to 50% and 25%, respectively. Tone length was counterbalanced across trials.
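The averaging of the two Stroop tones can be retraced with a short calculation. The sketch below averages the two tones component by component; the names `LOW_TONE`, `HIGH_TONE`, and `mean_partials` are introduced here for illustration only.

```python
# Components of the two Stroop tones as given in the text (Hz):
# fundamental, first harmonic, second harmonic.
LOW_TONE = (362, 724, 1086)
HIGH_TONE = (732, 1464, 2196)


def mean_partials(tone_a, tone_b):
    """Average two tones component by component."""
    return [(a + b) / 2 for a, b in zip(tone_a, tone_b)]


ignored_tone = mean_partials(LOW_TONE, HIGH_TONE)
print(ignored_tone)  # [547.0, 1094.0, 1641.0]
```

Because both Stroop tones are harmonic, the averaged components are again integer multiples of the averaged fundamental.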

Procedure and design

Figure 1 depicts the trial sequence. Each trial started with a blank screen of 300 ms. Subsequently, in the control condition the symbol “+” was presented for 200 ms to allow participants to prepare for the Stroop task. Participants then heard the Stroop tone (120 ms) and were instructed to respond within a response window of 1,500 ms. After a response was registered or 1,500 ms had passed, the next trial started after an intertrial interval of 500 ms. In the experimental condition, the initial blank screen was followed by the symbol “#” for 200 ms, to indicate the presentation of the to-be-ignored tones. Afterward, the four to-be-ignored short or long tones were presented. Then a blank screen was presented for 800 ms, followed by the symbol “+” presented for 200 ms. Finally, the Stroop task was presented as described above.

Fig. 1

Schematic depiction of the procedure. In the experimental condition (left panel), each trial started with a blank screen for 300 ms, followed by the symbol “#” for 200 ms, indicating the beginning of the to-be-ignored stimuli: Participants heard four short or long tones with a maximal total duration of 3,000 ms (at most four long tones of 150 ms each plus three intertone intervals of 800 ms each). Participants were instructed to ignore the tones. After a blank screen of 800 ms, participants saw the “+” symbol, indicating the beginning of the Stroop task. Participants heard a high-pitched or a low-pitched tone (120 ms) presented from a high-positioned or low-positioned loudspeaker and had to indicate whether the tone pitch was high or low. After a response was given or a response window of 1,500 ms had passed, participants saw a blank screen for 500 ms before the next trial started. In the control condition (right panel), the trial started with a blank screen for 300 ms, followed by a “+” symbol presented for 200 ms, indicating the beginning of the Stroop task. From there, the trial was identical to the experimental condition.
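The trial sequence can be laid out as a simple timeline. The sketch below lists the events up to the Stroop tone and sums their durations. It assumes that all events follow one another without overlap (e.g., that the “#” cue ends before the first to-be-ignored tone starts) and that all four tones within a trial have the same length; neither point is stated explicitly in the text. The function names are hypothetical.

```python
def trial_timeline(condition, tone_ms=150, n_tones=4, gap_ms=800):
    """Return (event, duration_ms) pairs up to and including the Stroop tone."""
    if condition not in ("control", "experimental"):
        raise ValueError(f"unknown condition: {condition!r}")
    events = [("blank", 300)]
    if condition == "experimental":
        events.append(("cue '#'", 200))
        for i in range(n_tones):
            events.append((f"ignored tone {i + 1}", tone_ms))
            if i < n_tones - 1:  # silence only between tones
                events.append(("silence", gap_ms))
        events.append(("blank", 800))
    events.append(("fixation '+'", 200))
    events.append(("Stroop tone", 120))
    return events


def stroop_onset_ms(events):
    """Time from trial start to the onset of the Stroop tone."""
    return sum(duration for name, duration in events if name != "Stroop tone")


print(stroop_onset_ms(trial_timeline("control")))                   # 500
print(stroop_onset_ms(trial_timeline("experimental", tone_ms=150)))  # 4500
print(stroop_onset_ms(trial_timeline("experimental", tone_ms=90)))   # 4260
```

Under these assumptions, the Stroop tone begins 500 ms into a control trial and between 4,260 and 4,500 ms into an experimental trial.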

The experiment had a 2 (congruency: congruent, incongruent) × 2 (condition: control, experimental) × 2 (order of task: control condition first, experimental condition first) design; the first two factors were varied within subjects, and the last factor was varied between subjects. In a block consisting of 30 trials, participants practiced the condition on which they would start. Subsequently, they completed two blocks of 40 trials each in that condition. After completion of the first condition, the second was administered as described above. In the practice blocks, participants received feedback about their reaction time (RT) and saw the word Fehler [“error”], printed in red, in case of an erroneous keypress. The experiment lasted approximately 30 min.
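One way to construct a 40-trial block for such a design is sketched below. The sketch assumes that pitch and location were fully crossed with equal frequencies, so that half of the trials were congruent; the text does not report the proportion of congruent trials, and the function name `make_block` is hypothetical. The key mapping ("j" for high, "k" for low) is taken from the task description.

```python
import itertools
import random


def make_block(n_trials=40, seed=None):
    """Build a shuffled block with equal numbers of all pitch x location cells."""
    cells = list(itertools.product(("high", "low"), ("high", "low")))
    if n_trials % len(cells) != 0:
        raise ValueError("n_trials must be a multiple of 4")
    trials = [
        {
            "pitch": pitch,
            "location": location,
            "congruent": pitch == location,
            "correct_key": "j" if pitch == "high" else "k",
        }
        for _ in range(n_trials // len(cells))
        for pitch, location in cells
    ]
    random.Random(seed).shuffle(trials)  # seeded for reproducibility
    return trials


block = make_block(seed=1)
n_congruent = sum(trial["congruent"] for trial in block)
```

With these assumptions, each block contains 20 congruent and 20 incongruent trials in random order.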

Results

RT analyses were based on trials in which no errors were made. Outliers in response latencies were deleted from each individual’s distribution according to Tukey’s criterion (i.e., more than 1.5 interquartile ranges below the first or above the third quartile); this led to the exclusion of 1.2% of the trials. Mean RTs in the control and experimental conditions were 443 and 444 ms, respectively; mean error rates in the control and experimental conditions were 6.55% and 4.60%, respectively. See Table 1 for mean correct response latencies (in milliseconds) and mean error rates (%) separated for condition (control condition, experimental condition) and congruency (congruent, incongruent).
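Tukey's criterion can be sketched in a few lines of Python. Quartile definitions differ across software packages, and the text does not say which one was used; the sketch below uses the "inclusive" method of the standard library as an assumption. The function name `tukey_trim` and the example latencies are made up for illustration.

```python
import statistics


def tukey_trim(rts, k=1.5):
    """Keep RTs within k interquartile ranges of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(rts, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [rt for rt in rts if lower <= rt <= upper]


# Made-up latencies (ms) for one participant; 1200 ms is an obvious outlier.
example_rts = [400, 420, 430, 440, 450, 460, 480, 1200]
trimmed = tukey_trim(example_rts)
```

In this toy example, only the 1,200-ms response falls outside the fences and is dropped.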

Table 1 Mean correct response latencies (in milliseconds) and mean error rates (%) of the Stroop task separated for condition (control condition, experimental condition) and congruency (congruent, incongruent)

RT and error data were analyzed in a 2 (congruency) × 2 (condition) × 2 (order of task) analysis of variance. Responses were faster and more accurate in congruent trials (i.e., main effects of congruency in RTs as well as error rates), F(1, 34) = 4.11, p = .05, \( \eta_p^2 \) = .11, and F(1, 34) = 10.10, p < .01, \( \eta_p^2 \) = .23, respectively. The main effect of condition was significant in the error data, F(1, 34) = 16.34, p < .01, \( \eta_p^2 \) = .33, revealing that more errors were made in the control condition. There was a tendency for participants to respond more slowly in the condition on which they started, F(1, 34) = 3.66, p = .06, \( \eta_p^2 \) = .10. This reflects a training effect: Participants’ reactions became faster during the course of the experiment.
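The relation between an F statistic and its partial eta squared is \( \eta_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error}) \). The sketch below applies this identity to the two congruency main effects; the function name is introduced here for illustration. Because the inputs are rounded F values, recomputed effect sizes may differ from reported ones in the second decimal place.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Congruency main effects: F(1, 34) = 4.11 (RTs) and F(1, 34) = 10.10 (errors).
print(round(partial_eta_squared(4.11, 1, 34), 2))   # 0.11
print(round(partial_eta_squared(10.10, 1, 34), 2))  # 0.23
```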

Importantly, the interaction between congruency and condition was significant in the RT data, F(1, 34) = 8.73, p = .01, \( \eta_p^2 \) = .20. Follow-up analyses (see below) indicated that the interference in RTs was larger in the control than in the experimental condition (cf. Table 1). The parallel effect in the error data just failed to reach significance, F(1, 34) = 3.94, p = .06, \( \eta_p^2 \) = .10; descriptively, the interference effect was smaller in the experimental condition (cf. Table 1). For all other effects, F < 2, p > .17.
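The congruency × condition interaction corresponds to a difference between two interference effects. The sketch below computes the interference effect (incongruent minus congruent mean RT) per condition and then the difference between conditions. All numbers are invented toy values chosen for easy arithmetic; they are not the data of this experiment, and the names are hypothetical.

```python
from statistics import mean


def interference(rts_congruent, rts_incongruent):
    """Stroop interference: mean incongruent RT minus mean congruent RT (ms)."""
    return mean(rts_incongruent) - mean(rts_congruent)


# Invented toy RTs (ms), NOT the study's data.
toy_rts = {
    "control": {"congruent": [430, 440, 450], "incongruent": [450, 460, 470]},
    "experimental": {"congruent": [440, 445, 450], "incongruent": [441, 445, 452]},
}

effects = {
    condition: interference(cells["congruent"], cells["incongruent"])
    for condition, cells in toy_rts.items()
}
# Difference of differences: the contrast tested by the interaction term.
interaction_contrast = effects["control"] - effects["experimental"]
```

In the toy values, interference is 20 ms in the control condition and 1 ms in the experimental condition, giving a contrast of 19 ms.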

Recall that the only difference between the conditions was the to-be-ignored tones presented before the Stroop stimulus. From this it follows that nonconcurrently presented task-irrelevant tones can help participants ignore the irrelevant distractor information. This was confirmed by follow-up analyses of the Stroop interference effects separated by conditions. These analyses revealed that the main effect of congruency was significant in the control condition for both the RT and error data, F(1, 34) = 9.46, p < .01, \( \eta_p^2 \) = .22, and F(1, 34) = 12.50, p < .01, \( \eta_p^2 \) = .27, respectively. In contrast, the main effect of congruency in the experimental condition did not reach significance for either RTs or error rates, F(1, 34) = 0.02, p = .88, \( \eta_p^2 \) < .01, and F(1, 34) = 2.75, p = .11, \( \eta_p^2 \) = .08, respectively. Thus, interference effects in the auditory Stroop task were leveled when auditory task-irrelevant information was presented just before the Stroop stimulus.

Discussion

In the present study, participants performed an auditory Stroop task under two conditions. In the control condition, participants heard only the Stroop tone and had to indicate whether they heard a high-pitched or a low-pitched tone. High- and low-pitched tones could be either congruent or incongruent with the to-be-ignored location from which the tones were presented, resulting in faster RTs and higher accuracy in congruent than in incongruent trials. In the experimental condition, just before the Stroop tone was presented, participants heard four short or long tones that they were instructed to ignore. Stroop interference effects were largely reduced in this condition.

These results demonstrate that (a) task-irrelevant auditory tones can reduce distraction in auditory selective attention and (b) interference reduction can be obtained for nonconcurrent presentation of task-irrelevant stimuli and the target.

As reviewed in the introduction, both perceptual-load theory and the dilution account predict interference reductions when additional to-be-ignored stimuli are presented. Which of these accounts is supported by the present experiment? Two observations support the dilution account: First, the results show that mean RTs did not differ between the conditions. Given that an increase in RTs is the crucial manipulation check for perceptual load (e.g., Lavie, 1995), our data speak against perceptual-load effects in the present experiment. However, note that an overall increase in RTs might be a problematic criterion for a perceptual-load manipulation check because it might reflect not only a capacity limit (which is assumed to be the relevant mediator of the perceptual-load effect; Lavie & De Fockert, 2003) but also an increase in task difficulty (see Footnote 1). Second, the to-be-ignored stimuli were clearly separable from the target (i.e., they were presented before the target and separated by a fixation stimulus); it was clear to participants that they did not need to be attended. In sum, the present findings are not easily interpreted as showing an effect of perceptual load; instead, they may be better explained as showing dilution effects.

Miles and Proctor (2009) found reduced compatibility effects when to-be-ignored stimuli were presented before a visuospatial compatibility task comprising symbolic distractors (words and arrows). The present results extend these findings in two ways: First, dilution effects occurred for distracting spatial locations (i.e., a nonsymbolic distractor); second, they occurred when both the distractor and the task-irrelevant information were presented auditorily.

The present results, as well as those of Miles and Proctor (2009), seem to contradict findings by Cho, Lien, and Proctor (2006). In their Stroop task, the to-be-named color was carried by either a color word or a simultaneously presented neutral word. Stroop dilution effects only emerged when a neutral word was the color carrier. In the present study, however, Stroop dilution was observed in the absence of neutral stimuli (i.e., task-irrelevant tones diluted the “embedded” location of the target tone, even though the target tone was also the carrier of the pitch), suggesting that Cho et al.’s results only apply to concurrent presentation of the task-irrelevant stimuli and the target–distractor compound.

A possible alternative interpretation of the present finding is that, due to the to-be-ignored tones, participants might have been unable to discriminate tone location (i.e., distractor information) in the experimental condition. In support of this notion, Leboe and Mondor (2007) found increased overall errors in the auditory Stroop task when participants responded to tone location rather than to tone frequency. Yet, in the present study, participants were clearly able to discriminate tone location, as demonstrated by the interference effects in the control condition. It is possible that the to-be-ignored tones further reduced the discriminability of tone location, thereby reducing interference in the experimental condition. This alternative explanation of the present finding, however, is indistinguishable from the dilution account (Kahneman & Chajczyk, 1983; see Tsal & Benoni, in press, for a discussion): When task-irrelevant stimuli are presented in addition to the distractor and target, the features of all stimuli compete with each other, degrading the quality of their representations.

Both the perceptual-load account and the dilution account predict that a larger number of additional stimuli should lead to greater interference reduction effects. However, the number of additional stimuli has not yet been manipulated parametrically in tests of either account. It remains to be seen whether, or under which circumstances, a single stimulus would be sufficient to eliminate distractor interference. Interestingly, recent research by Fischer, Plessow, and Kiesel (2010) showed that a single tone increased, rather than decreased, distractor interference (albeit in a different selective attention task); presumably, the tone signal served as a go signal, increasing perceptual detection of both the target and distractor information (Correa, Lupiáñez, Madrid, & Tudela, 2006).

In sum, the present findings show that auditory selective attention can be improved when task-irrelevant tones are presented before the target–distractor stimulus. The present study is the first inquiry into the nature of this effect in the auditory domain, and research is underway to investigate it in more detail.