Value-driven attentional capture in the auditory domain

Anderson, Brian A.

doi:10.3758/s13414-015-1001-7

Published: 22 October 2015

Volume 78, pages 242–250, (2016)

Attention, Perception, & Psychophysics

Abstract

It is now well established that the visual attention system is shaped by reward learning. When visual features are associated with a reward outcome, they acquire high priority and can automatically capture visual attention. To date, evidence for value-driven attentional capture has been limited entirely to the visual system. In the present study, I demonstrate that previously reward-associated sounds also capture attention, interfering more strongly with the performance of a visual task. This finding suggests that value-driven attention reflects a broad principle of information processing that can be extended to other sensory modalities and that value-driven attention can bias cross-modal stimulus competition.

Attention determines which among multiple competing stimuli are represented in the brain (Desimone & Duncan, 1995). Attentional selection has long been understood to arise from the interplay between the bottom-up physical salience of the stimulus (e.g., Theeuwes, 1992; Yantis & Jonides, 1984) and its relationship to the top-down goals of the observer (e.g., Folk, Remington, & Johnston, 1992). More recently, a wealth of research has demonstrated that reward information provides a third source of input to the attention system. When a large reward is received, attention is strongly primed to select the rewarded target (Hickey, Chelazzi, & Theeuwes, 2010; see also Della Libera & Chelazzi, 2006). More persistent attentional biases develop for stimuli that have been learned to predict a reward outcome (e.g., Anderson, Laurent, & Yantis, 2011a, b; Della Libera & Chelazzi, 2009; Raymond & O’Brien, 2009). Such previously reward-associated stimuli capture attention even when task-irrelevant and physically nonsalient (e.g., Anderson et al., 2011b), demonstrating that learned value plays a distinct role in the guidance of attention (referred to as value-driven attention; see Anderson, 2013, for a review).

Value-driven attention was originally identified in the domain of vision, and a large effort has been undertaken to characterize the influence of learned value on visual attention. This includes its spatial specificity (e.g., Anderson & Yantis, 2012; Failing & Theeuwes, 2014; Theeuwes & Belopolsky, 2012), contextual specificity (Anderson, 2015a, b; Anderson, Laurent, & Yantis, 2012), extension to different visual properties (e.g., Chelazzi et al., 2014; Laurent, Hall, Anderson, & Yantis, 2015; Lee & Shomstein, 2014), persistence (Anderson & Yantis, 2013), relationship with psychopathology (Anderson, Faulkner, Rilee, Yantis, & Marvel, 2013; Anderson, Leal, Hall, Yassa, & Yantis, 2014), mechanism of learning (e.g., Sali, Anderson, & Yantis, 2014; Le Pelley, Pearson, Griffiths, & Beesley, 2015), and neural mechanisms (e.g., Anderson, Laurent, & Yantis, 2014; Hickey & Peelen, 2015; Peck, Jangraw, Suzuki, Efem, & Gottlieb, 2009; Qi, Zeng, Ding, & Li, 2013).

Principles of salience-driven and goal-directed attention, although most extensively studied in vision, have been extended to other sensory modalities (see Spence, 2010, for a review). In the auditory domain, for example, a task-irrelevant sound can draw spatial attention to its approximate source of origin (e.g., Spence & Driver, 1994, 1997). Salient auditory singletons can capture attention and interfere with the ability to discriminate an auditory target (Dalton & Lavie, 2004, 2007), similar to the impact of salient singletons in the visual domain (e.g., Bacon & Egeth, 1994; Folk & Anderson, 2010; Theeuwes, 1992). Auditory singletons can also capture attention and interfere with the performance of a nonspatial visual search task when they share a defining feature with the visual target (e.g., both are singletons in their duration; Dalton & Spence, 2007). In addition, identifying a target sound can elicit an attentional blink that interferes with subsequent visual processing (Arnell & Jolicoeur, 1999), although the robustness of such temporal cross-modal interference may be limited (e.g., Duncan, Martins, & Ward, 1997). Shifting attention from processing auditory information to processing visual information and vice versa incurs a performance cost and recruits the same parietal mechanisms known to mediate attention shifts within a sensory modality (Shomstein & Yantis, 2004; Yantis et al., 2002).

Learned associations with reward can modulate the sensory processing of a stimulus within modalities other than vision (e.g., Pantoja et al., 2007), and previously reward-associated sounds can facilitate the processing of visual stimuli through cross-modal interactions (Pooresmaeili et al., 2014). There is also evidence that auditory stimuli can influence value-driven attention in the visual domain. Specifically, when a sound is used to indicate reward, visual stimuli that are paired with this sound subsequently capture attention (Miranda & Palmer, 2014). However, it is currently unknown whether the principle of value-driven attention similarly extends to sensory modalities beyond vision, such that previously reward-associated sounds automatically capture attention. Relatedly, it remains unknown whether value-driven attention can bias cross-modal stimulus competition, with previously reward-associated sounds interfering with the performance of a visual task. Evidence in the affirmative would suggest that value-driven attention reflects a broad principle of human information processing.

In the present study, I examine whether previously reward-associated sounds can automatically capture attention and interfere with the performance of a visual task. Across two experiments, I demonstrate that a sound previously associated with high reward interferes more strongly with the identification of a visual target than a sound previously associated with comparatively low reward. My findings extend the principle of value-driven attentional capture into the domain of auditory attention.

Experiment 1

In Experiment 1, participants first completed a training phase involving the detection of auditory targets. Participants listened for one of two spoken letters played over the computer speakers and pressed a key every time they heard one of those target sounds. One target sound yielded a high reward every time it was identified, whereas the other yielded a low reward. Nontarget spoken letters were also played, to which participants withheld responding. Irrelevant visual letters were presented during training to compete with the perception of the spoken letters (see Desimone & Duncan, 1995), requiring participants to select auditory stimuli while explicitly ignoring concurrent visual stimulation. This was intended to maximize the strength of any learned attentional bias on subsequent information processing, which could involve both enhanced activation of the target sound and suppression of concurrent visual input.

Once participants had experienced these sound-reward associations during training, these same sounds served as task-irrelevant distractors during a visual search task for a shape-defined target in a subsequent unrewarded test phase. The distractor sound could be that of a former high-reward target (high-value distractor), a former low-reward target (low-value distractor), or a former nontarget. All visual search trials involved a distractor sound in order to equate the alerting effects that auditory stimuli might have on task performance across conditions. Of interest was whether the high-value distractor sound impaired performance of the visual search task, consistent with cross-modal value-driven attentional capture in the auditory domain.

Method

Participants

Twenty-six participants were recruited from the Johns Hopkins University community. All reported normal or corrected-to-normal visual acuity and normal color vision. One participant was an outlier, with a capture score that deviated from the grand mean by more than 2.5 standard deviations, and was replaced; replacing this participant did not change any of the statistical conclusions.

Apparatus

A MacBook Pro laptop computer equipped with MATLAB software and Psychophysics Toolbox extensions (Brainard, 1997) was used to present the stimuli. Participants were seated approximately 65 cm from the laptop in a dimly lit room. Manual responses were entered using the keyboard on the laptop. The sounds were presented from the built-in speakers.

Training phase

Stimuli

A steady stream of colored letters (each 1.3° × 1.5° visual angle) was presented at the center of the screen (see Fig. 1a). The identity of each letter was selected from the set {A, C, F, G, H, J, K, M, N, P, R, T, U, V, X, Y}, and the color of each letter was selected from the set {blue, cyan, pink, orange, yellow, white}. On some letter displays, simultaneous with the onset of the visual letter, a spoken letter was played over the computer speakers. The spoken letter could be an “A,” “Y,” “X,” or “H.” The spoken letters were taken from the male voice stimulus set used by Shomstein and Yantis (2004) in an auditory attention task. The duration of each spoken letter was 240 ms, with 10 ms of silence added to the end of the sound file (for a total play time of 250 ms). A bank total reflecting earnings in the task was always visible 5.2° below the visually presented letters, centered along the x-axis in white 40-point font. Feedback concerning task performance appeared periodically between the visually presented letters and the bank total, also centered along the x-axis in white 40-point font.

Design

For the visually presented letters, letter color and letter identity were randomly selected from the respective stimulus sets with the rule that no color or identity could repeat on consecutive letter displays. Each of the four spoken letters was presented equally often, in an order that was randomized within each block of trials. Therefore, the correspondence between the identity of the spoken letter and the concurrent visually presented letter was at chance (1/16). Each participant listened for the same two target letters: “A” and “Y.” One target letter was assigned a high reward value and the other a low reward value; the assigned reward was awarded every time the target was correctly identified. Which target letter served as the high-reward target was counterbalanced across participants.
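The no-repeat rule for the visual stream can be illustrated with a short Python sketch. This is not the author's MATLAB code; the function name `sample_no_repeat`, the fixed seed, and the choice to draw uniformly from all items other than the previous one are assumptions made for illustration. The letter and color sets, and the 610 displays per block, are taken from the text.

```python
import random

LETTERS = list("ACFGHJKMNPRTUVXY")  # 16 visual letter identities from the text
COLORS = ["blue", "cyan", "pink", "orange", "yellow", "white"]

def sample_no_repeat(items, n, rng):
    """Draw n items at random so that no item appears twice in a row."""
    sequence = []
    for _ in range(n):
        # Exclude only the immediately preceding item (hypothetical rule detail).
        choices = [x for x in items if not sequence or x != sequence[-1]]
        sequence.append(rng.choice(choices))
    return sequence

rng = random.Random(0)  # fixed seed is an arbitrary choice for reproducibility
visual_letters = sample_no_repeat(LETTERS, 610, rng)  # one block of displays
visual_colors = sample_no_repeat(COLORS, 610, rng)
```

Because the visual letter is drawn independently of the spoken letter, and each spoken letter (A, Y, X, H) is a member of the 16-letter visual set, a match occurs on roughly 1 in 16 spoken-letter displays.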

Procedure

The training phase consisted of four blocks of trials, each of which comprised 610 letter-displays presented for a duration of 500 ms each. Each letter display contained a visually presented letter at the center of the screen. One hundred twenty of these letter displays also contained a spoken letter, the duration of which lasted the first 250 ms of the letter display. Letter displays that included a spoken letter were randomly preceded by three, four, or five letter displays with no spoken letter. Once participants initiated the block by pressing the space bar, the entire letter-display set ran through to completion. After each block, participants were provided a brief rest period. There were no practice trials.
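One way to lay out a block that satisfies these constraints is sketched below in Python. The text does not say how the gaps were balanced or where the remaining displays fell, so two assumptions are made and flagged in the comments: gaps of three, four, and five silent displays are used equally often, and any displays left over are silent and placed at the end of the block. The name `build_block` is hypothetical.

```python
import random

N_DISPLAYS = 610               # letter displays per block (from the text)
SPOKEN = ["A", "Y", "X", "H"]  # spoken letters (from the text)
PER_LETTER = 30                # 120 spoken letters / 4 identities

def build_block(rng):
    """Return 610 entries: a spoken letter, or None for a silent display."""
    sounds = SPOKEN * PER_LETTER
    rng.shuffle(sounds)
    # Assumption: gaps of 3, 4, and 5 silent displays occur equally often.
    gaps = [3, 4, 5] * (len(sounds) // 3)
    rng.shuffle(gaps)
    stream = []
    for gap, sound in zip(gaps, sounds):
        stream.extend([None] * gap)
        stream.append(sound)
    # Assumption: displays left over at the end of the block are silent.
    stream.extend([None] * (N_DISPLAYS - len(stream)))
    return stream

block = build_block(random.Random(1))
```

Under these assumptions the 120 spoken-letter displays and their preceding gaps occupy 600 displays, leaving 10 silent displays at the end of each block.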

Participants were instructed to press the space bar as quickly as possible whenever they heard a spoken “A” or “Y,” which served as the target letters. They were informed that doing so would result in a monetary reward added to their total earnings. Participants were also instructed to withhold responding to the other two spoken letters and were informed that responding to them would result in a small amount of money being deducted from their total earnings. Participants were also informed that the money they earned from the first part of the experiment would serve as their compensation for completing the entire study. Participants earned 10¢ whenever they identified the high-reward target sound and 2¢ whenever they identified the low-reward target sound; they lost 5¢ for pressing the space bar in response to a nontarget sound. The instructions made no reference to the amounts of money that could be gained or lost, or any association between these amounts and the letter identities, which had to be learned from experience in the task. Following a press of the space bar within 1,500 ms of the target sound presentation, participants received feedback indicating “+10¢” or “+ 2¢,” while their bank total was simultaneously updated. If participants did not respond within 1,500 ms of a target sound, the word “Miss!” was presented as feedback, and if participants responded within 1,500 ms of a nontarget sound, they received feedback indicating “- 5¢,” while their bank total was simultaneously updated. Miss and loss feedback remained on the screen for one letter display, and reward feedback remained on the screen for one letter display plus any letter displays occurring after a response was recorded that fell within the 1,500 ms response deadline. Note that reward was predicted only by the identity of the target sound and not by an associated motor response (which was the same for each target sound).
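The payoff rule described above can be summarized in a minimal Python sketch. The function name `score_display`, the outcome labels, and the treatment of a response at exactly 1,500 ms as on time are assumptions for illustration; the monetary amounts and the deadline come from the text.

```python
DEADLINE_MS = 1500       # response deadline (from the text)
TARGETS = ("A", "Y")     # target letters (from the text)

def score_display(spoken, rt_ms, high_target):
    """Return (change in cents, outcome label) for one spoken-letter display.

    rt_ms is None when no key press was recorded.
    """
    # Assumption: a response at exactly the deadline counts as on time.
    responded = rt_ms is not None and rt_ms <= DEADLINE_MS
    if spoken in TARGETS:
        if not responded:
            return 0, "miss"
        return (10, "reward") if spoken == high_target else (2, "reward")
    if responded:
        return -5, "loss"            # false alarm to a nontarget
    return 0, "correct rejection"    # no feedback is shown in this case

# Upper bound on earnings: 4 blocks, 30 presentations of each target per block.
max_cents = 4 * 30 * (10 + 2)
```

With each of the four spoken letters presented 30 times per block, a participant who detected every target and made no false alarms would earn 1,440¢ ($14.40) across the four blocks.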

Test phase

Stimuli

Each trial consisted of a fixation display, a search array, and (in the event that a correct response was not registered) a feedback display (see Fig. 1b). The fixation display contained a white fixation cross (0.7° × 0.7°) presented in the center of the screen, and the search array consisted of the fixation cross surrounded by six colored shapes (each 2.4° × 2.4°) presented along an imaginary circle with a radius of 4.8°. The six shapes comprising the search array consisted of either a diamond among circles or a circle among diamonds, and the target was defined as the unique shape. The color of each shape was drawn from the same set of six colors used during training (blue, cyan, pink, orange, yellow, white) without replacement on each trial. Different colors were included in the stimulus array to increase nontarget heterogeneity and thereby reduce the salience of the target (Duncan & Humphreys, 1989), maximizing the ability of the auditory distractors to effectively compete for attention. The feedback display, if presented, consisted of the words “Incorrect” or “Too Slow” in white 40-point font at the center of the display. All stimuli were presented on a black background.

Immediately preceding the search array, one of three sounds from the training phase was presented. The sounds used were the previous high-reward target (high-value distractor), the previous low-reward target (low-value distractor), and a previous nontarget (the sound “X”).

Design

One-third of the trials contained a high-value distractor, one-third a low-value distractor, and one-third a former nontarget distractor. For each of these trial types, the target was presented in each location equally often, and the bar inside the target was equally often vertical and horizontal. Thus, in the test phase, the distractor sounds were completely irrelevant to the performance of the visual search task.
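A balanced trial list of this kind can be sketched in Python as follows. The sketch assumes the three factors were fully crossed, which the text implies but does not state outright, and it leaves out target shape (diamond or circle) because the text does not say how shape was balanced. The 144-trial total is taken from the procedure.

```python
import itertools
import random

DISTRACTORS = ["high-value", "low-value", "former nontarget"]
LOCATIONS = range(6)                      # six positions in the search array
ORIENTATIONS = ["vertical", "horizontal"]
N_TRIALS = 144                            # test-phase trials (from the text)

# Assumption: distractor type, target location, and bar orientation are crossed.
cells = list(itertools.product(DISTRACTORS, LOCATIONS, ORIENTATIONS))
repeats = N_TRIALS // len(cells)          # 144 / 36 = 4 repetitions per cell
trials = cells * repeats
random.Random(2).shuffle(trials)          # arbitrary seed for reproducibility
```

Under this crossing, each distractor type occurs on 48 trials, and within each type every target location appears 8 times with each bar orientation appearing 24 times.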

Procedure

The experimenter started the test phase at the request of the participant, immediately following a self-paced break between the training and test phases. Participants were instructed to ignore the spoken letters and to focus on identifying the oriented bar within the unique shape. The test phase consisted of 144 trials, which were preceded by 20 practice trials that did not include the spoken letters.

Each trial began with a fixation display that was presented for a randomly varying interval of 500, 600, or 700 ms. Then, while the fixation display was still on screen, one of the three sounds was presented over the speakers for the full 250 ms duration. Immediately after the sound was finished playing, the search array appeared and remained on screen until a response was made or 1,200 ms had elapsed, after which the trial timed out. Sounds were presented prior to the presentation of the stimulus array in order to ensure that the identity of the sound, which unfolds over time, was sufficiently processed before the target could be selected. The timing of the distractor sounds relative to the stimulus array was informed by the time required for an auditory target to elicit an attentional blink in the visual domain (Arnell & Jolicoeur, 1999).

Participants made a forced-choice target identification by pressing the “z” and the “m” keys for the vertically and horizontally oriented bars within the targets, respectively. Thus, the stimuli in the search array were linked to a different response rule than the auditory targets in the training phase, thereby minimizing contributions from response priming by the distractor sounds to RT. The search array was followed immediately by error feedback (the words “Incorrect” or “Too Slow”) for 1,000 ms in the event that a correct response was not registered (this display was omitted following a correct response) and then by a blank 1,000 ms intertrial interval. No monetary rewards were given in the test phase. At the conclusion of the test phase, participants were paid the amount they had earned in the training phase.

Data analysis

In the training phase, RT in target detection served as the primary measure of interest. Hit rate (percentage of targets detected) and false alarm rate (percentage of nontargets eliciting a response) were also computed. In the test phase, both responses that were incorrect and responses that exceeded the timeout limit were scored as errors; only correct responses were included in the analysis of RT. For both phases of the experiment, RTs more than 2.5 standard deviations above or below the mean of their respective condition for each participant were trimmed.
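The trimming rule can be sketched in Python for a single participant and condition. The text does not say whether the sample or population standard deviation was used, nor whether trimming was applied once or iteratively; the sketch assumes the sample standard deviation and a single pass. The function name `trim_rts` is hypothetical.

```python
import statistics

def trim_rts(rts, criterion=2.5):
    """Drop RTs more than `criterion` SDs from the mean of one condition.

    Expects at least two RTs from a single participant and condition.
    """
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)  # assumption: sample SD, single pass
    if sd == 0:
        return list(rts)
    return [rt for rt in rts if abs(rt - mean) <= criterion * sd]

# Illustrative values only, not data from the study.
example = [500] * 19 + [2000]
trimmed = trim_rts(example)
```

In the illustrative list, the 2,000 ms value lies more than 4 standard deviations above the mean and is removed, while the remaining 19 values are kept.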

Results and discussion

Training phase

Participants were significantly faster to detect a high-reward target (M = 560 ms) than a low-reward target (M = 591 ms), t(25) = 4.67, p < .001, d = .92. This suggests that high-reward targets had greater attentional priority than low-reward targets. Because participants were not explicitly informed of the reward structure of the task, this attentional bias was the result of reward learning. Hit rate was very high and did not differ for high- and low-reward targets, t(25) = 0.48, p = .645 (99.4 % and 99.3 %, respectively). False alarm rate was very low (0.8 %).

To examine the evolution of the observed attentional bias for the high-reward target over time, the RT difference between high- and low-reward targets was computed separately by block and compared across the four blocks of training. This analysis revealed no main effect of block, F(3, 75) = 1.29, p = .285 (M = 28, 37, 30, and 41 ms across blocks). Although the attentional bias increased by 13 ms from the first to last block, this difference was not reliable, t(25) = 1.61, p = .119. An attentional bias for the high-reward target was evident as early as the first block, t(25) = 4.04, p < .001, d = .79, suggesting that the reward learning and consequent effects on attentional priority unfolded rapidly in this task.
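The reported effect sizes are numerically consistent with the common within-subject conversion d = t / √n, with n = 26 participants. The text does not state which formula was used, so the Python check below is an assumption about the computation rather than a description of it; the function name `cohens_d_paired` is hypothetical.

```python
import math

def cohens_d_paired(t, n):
    """Within-subject effect size from a paired t statistic (assumed formula)."""
    return t / math.sqrt(n)

N = 26  # participants (from the text)
d_overall = round(cohens_d_paired(4.67, N), 2)  # high- vs. low-reward RT
d_block1 = round(cohens_d_paired(4.04, N), 2)   # same contrast, first block
```

Both values round to the figures given in the text (.92 and .79).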

Test phase

A repeated-measures analysis of variance (ANOVA) on mean RT with distractor condition (former nontarget, low-value, high-value) as a factor revealed a main effect, F(2, 50) = 5.59, p = .006, ηp² = .183 (see Fig. 2). Participants were slower to report the target when a high-value distractor sound was emitted (M = 700 ms) compared to both a low-value distractor sound (M = 684 ms), t(25) = 2.59, p = .016, d = .51, and a former nontarget sound (M = 681 ms), t(25) = 3.64, p = .001, d = .71. The difference between the high- and low-value distractor conditions can only be explained in terms of relative value, as the actual sounds used were counterbalanced across these conditions. Accuracy did not differ by distractor condition, F(2, 50) = 0.17, p = .841. Accuracy was 84.0 %, 84.3 %, and 84.7 % across the former nontarget, low-value, and high-value distractor conditions, respectively.
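Partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity F·df1 / (F·df1 + df2). The Python sketch below applies that identity to the reported main effect as a consistency check; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)

# Main effect of distractor condition on RT: F(2, 50) = 5.59 (from the text).
eta_p2 = round(partial_eta_squared(5.59, 2, 50), 3)
```

The result rounds to .183, matching the value reported for the main effect.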

Experiment 2

Experiment 1 provides evidence that sounds previously associated with high reward automatically capture attention in a manner that biases stimulus competition away from input arising from the visual system. Such value-dependent auditory distraction occurred when participants were actively searching for a visual target, consistent with cross-modal value-driven attentional capture. However, there are aspects of the experimental design that complicate interpretation of the observed distraction.

During training, the auditory targets were associated with a motor response (participants were free to choose which hand they responded to targets with). Responding to visual targets during the test phase involved responding with each hand, depending on the orientation of the bar contained within the target, and thus required either the same or a different motor response than that previously associated with the high- and low-value distractor sounds. To the degree that the distractors elicited their previously associated motor response, the high-value distractor may have given rise to increased response conflict on some trials, which may have partly accounted for the observed slowing of RT in that condition.

Also, during the test phase of Experiment 1, the distractors were presented in advance of the stimulus array. This was done to ensure that the distractors were sufficiently processed to the degree that they would be able to effectively compete for attention with a visual target. However, although irrelevant to the visual search task itself, all distractors predicted the exact timing of the visual search array and participants may have therefore voluntarily attended to the distractors (in spite of the instruction to try to ignore them) in order to better prepare for the upcoming visual task. To the degree that this occurred, the cost in performance associated with the high-value distractor may be explained by delayed disengagement following voluntary attentional orienting rather than the initial capture of attention.

Finally, irrelevant visual letters were presented during the training phase in order to pair the selection of targets with the need to ignore visual input. Although this maximizes the magnitude of possible attention effects, it creates ambiguity in the nature of the value-driven attentional bias. It is unclear to what degree the value-driven bias observed in Experiment 1 reflects the selection of the auditory distractors versus the suppression of concurrent visual input.

To address these issues, Experiment 2 involved a conceptually similar approach with a simpler design. No visual letters were presented during training, and participants responded with different hands during the training phase (left hand) and test phase (right hand). The distractor sounds were played simultaneously with the onset of the stimulus array, making them no more predictive of the stimulus array than the onset of the visual stimuli themselves. Finally, the different colors were removed from the stimulus array at test (all stimuli were white) in order to enhance the visual salience of the target (Duncan & Humphreys, 1989) and thereby examine the robustness of value-driven auditory attentional capture.