Introduction

Arousal, the global state of activation of our central and autonomic nervous system, is one of the driving forces in human cognition and behavior. Stable individual differences in global arousal level can shape temperamental traits and contribute to psychopathology. In addition, global arousal level shows large intra-individual fluctuations, not just those expressed in the sleep-wake rhythm, but also on a much faster time scale: moment-to-moment, spontaneous as well as task-related fluctuations that have large effects on cognition, behavior, and the underlying brain circuitry (Cavanagh, Wiecki, Kochar, & Frank, 2014; Eldar, Cohen, & Niv, 2013; Mather & Sutherland, 2011; Murphy, Vandekerckhove, & Nieuwenhuis, 2014; Nassar et al., 2012; Pourtois, Schettino, & Vuilleumier, 2013; Sørensen, Vangkilde, & Bundesen, 2015). These fluctuations in arousal, which occur even in the face of constant sensory stimulation, give rise to substantial variability in task performance and cortical state – sources of variability that are often treated as a nuisance or noise. A better understanding of these effects of arousal will therefore allow psychologists and neuroscientists to account for large portions of hitherto unexplained variance in their data.

According to a recent proposal, arousal level boosts the competition during decision-making between representations that have high priority (e.g., because of bottom-up salience or top-down task relevance) and representations that have lower priority (Mather & Sutherland, 2011; see also Hockey & Hamilton, 1970; Smith, 1985). Accordingly, arousal amplifies the contrast between weak and strong neural inputs, thus strengthening already existing biases in perception and memory. Computational modeling studies (Eldar et al., 2013; Eldar, Niv, & Cohen, 2016) and neurobiological knowledge (Mather, Clewett, Sakaki, & Harley, 2016) suggest that these “winner-take-more/loser-take-less” dynamics are caused, at least in part, by the norepinephrine-mediated changes in global gain that accompany arousal. Recent research has begun to yield evidence for the arousal-biased competition (ABC) theory, showing for example that increases in arousal enhance perceptual learning of highly salient stimuli (Lee, Itti, & Mather, 2012; Sakaki, Fryer, & Mather, 2013) and of stimulus features that match a person's attentional predisposition (e.g., a bias toward visual or semantic information; Eldar et al., 2013), while impairing learning of lower-priority stimuli.

The goal of the present study was to test the ABC theory by examining the effects of arousal on bottom-up attentional biases, focusing in particular on a previous study in the domain of visual attention and short-term memory (Sutherland & Mather, 2012). Sutherland and Mather played negative (arousing) or neutral sound clips from the International Affective Digital Sound set (IADS; Bradley & Lang, 2007) to observers, before flashing brief arrays of letters. Observers were instructed to report as many letters as possible. The letters were of either high or low contrast, such that high-contrast letters would receive high priority due to their physical salience, and low-contrast letters would receive lower priority. Importantly, on each trial the display contained three high-contrast and five low-contrast letters, forcing the signals to compete for limited processing resources. In this situation, ABC theory predicts that if the arousal level of observers is heightened before viewing the letter display, this will boost the competitive advantage of letters that already have high priority in a “winner-take-more”/“loser-take-less” manner. Therefore, an observer should be biased even more in favor of the high-contrast letters, and this shift in priorities would necessarily be at the expense of priority given to low-contrast letters, due to limitations in processing capacity (Bundesen, 1990), as well as visual short-term memory capacity, typically estimated to be three to four items in young healthy participants (e.g., Cowan, 2001; Luck & Vogel, 1997; McAvinue et al., 2012; Zhang & Luck, 2008).

Sutherland and Mather’s (2012) results confirmed this prediction: Under arousal high-contrast letters were reported correctly more often, while low-contrast letters were reported less often. Importantly, there was no difference in the number of correctly reported letters overall, which supports the notion that arousal biases the allocation of attention, but does not affect the size of the visual short-term memory (VSTM) store that encodes the stimuli. This finding highlights a potential problem in Sutherland and Mather’s data analyses. They applied factorial repeated-measures ANOVAs to their data, with the independent factors type of sound (negative vs. neutral) and visual contrast (high vs. low), and calculated main effects and the interaction effect of these factors on the probability of correct letter report. Given that VSTM is of a more-or-less fixed size (Cowan, 2001; McAvinue et al., 2012; Luck & Vogel, 1997; Zhang & Luck, 2008), a signal processing bias towards stimuli with high bottom-up or top-down priority will increase the number of correctly reported stimuli of that type, which will necessarily lead to a lower number of correctly reported low-priority stimuli. In other words, the numbers of correctly reported high- and low-priority stimuli are not independent of each other; when presented simultaneously, the two types of stimuli are forced to share a fixed amount of processing and memory resources. As a result, the interaction terms in Sutherland and Mather’s ANOVAs, which were presented as a key measure of arousal-biased competition, are hard to interpret in that they essentially count the same effect twice: once as enhanced recall of high-contrast letters, and then again as impaired recall of low-contrast letters – two sides of the same coin.

In the analyses reported below, we captured the bias towards high-priority stimuli and the bias against low-priority stimuli in a single measure, which we compared between arousing and neutral trials using frequentist and Bayesian t-tests. We also report a re-analysis of the original data of Sutherland and Mather (2012), which shows that using this more appropriate measure does not change the outcome of their experiments.

We present the results of two main experiments modelled on the original experiment by Sutherland and Mather (2012). Experiment 1 was an attempt at replicating the results of the original paper, using a different arousing stimulus set, pictures from the International Affective Picture System (Lang, Bradley, & Cuthbert, 2008), while Experiment 2 was a close replication of the original experiment, using IADS sounds (Bradley & Lang, 2007). Both experiments tested the prediction that arousal boosts the competition between physically salient, high-priority stimuli and less salient, low-priority stimuli, thus increasing the selectivity of bottom-up attention.

Experiment 1

Experiment 1 was designed to test whether pre-trial arousal enhances the bottom-up attentional bias toward physically salient versus less salient stimuli in a multi-letter identification task. In contrast to Sutherland and Mather (2012), who used IADS sound clips to manipulate arousal, we used negative and neutral IAPS pictures (Lang et al., 2008). A disadvantage of the IADS digital sound set is that it is much smaller than the IAPS picture set (167 sounds vs. 1,182 pictures). This makes it difficult to define large and well-delineated categories of stimuli based on arousal and/or valence ratings. Conversely, the IAPS picture set allows the selection of relatively large sets of pictures with a high degree of dissimilarity in arousal and valence ratings. Figure 1 compares the arousal and valence ratings of the IADS and IAPS stimulus sets, and highlights the samples used in the present experiments. Because we were able to construct picture categories with a larger distance in arousal ratings, we assumed that, if anything, our arousal manipulation would be stronger than that of Sutherland and Mather. Indeed, the highly arousing IAPS pictures used here are known to cause robust sympathetic arousal responses, such as increased pupil dilation and skin conductance (Bradley, Miccoli, Escrig, & Lang, 2008), and have repeatedly been demonstrated to elicit a large late positive potential (LPP), an electrophysiological arousal response (e.g., Ásgeirsson & Nieuwenhuis, 2017; Brown, van Steenbergen, Band, de Rover, & Nieuwenhuis, 2012; Cuthbert, Schupp, Bradley, Birbaumer, & Lang, 2000).

Fig. 1

A scatterplot of all (a) IAPS pictures and (b) IADS sounds, plotted by arousal and valence/pleasure ratings (Bradley & Lang, 2007; Lang et al., 2008). The colored symbols (* and +) reflect the pictures and sound clips included in the current experiments, while the gray circles represent the remaining pictures and clips in the database. Note that the sound clips in panel b are the same as those used by Sutherland and Mather (2012)

Methods

Participants

Thirty-nine participants (26 female), aged 19–30 (mean = 23.5 years, SD = 2.7), took part in Experiment 1. All reported normal or corrected-to-normal vision and normal color vision. Participants were compensated with €4 or course credit. They were informed of the rights of human participants, in accordance with the Declaration of Helsinki. The study was reviewed and approved by the Ethics Committee of the Institute of Psychology at Leiden University (CEP number 9501177439).

Stimuli and apparatus

The experiment ran on Windows 7 desktop computers. Stimulus presentation and data collection were conducted in MATLAB, using the Psychophysics Toolbox (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997). Stimuli were presented on a white background (r = 255, g = 255, b = 255) on a CRT monitor set at a refresh rate of 100 Hz. Viewing distance was approximately 60 cm. The target stimuli were capitalized letters in Arial Bold typeface, subtending 3.4° of visual angle (height). The letter displays were adopted from Sutherland and Mather (2012). There were eight target letters on each trial, presented against a white background (Fig. 2): three high-contrast, or dark, letters (r = 102, g = 102, b = 102) and five low-contrast, or light gray, letters (r = 204, g = 204, b = 204). Observers were instructed to report as many letters as possible, regardless of stimulus contrast. Observers performed ten practice trials before performing the experimental task. The letters were presented in a circular array with a radius of 9°, measured from the center of the fixation cross to the center of each letter. On each trial eight letters were chosen at random from the English alphabet (excluding the letter I).

Fig. 2

The progression of a trial in Experiment 1, which was similar to that of Sutherland and Mather (2012). See Methods for details

Arousal was manipulated by presenting highly arousing negative pictures and neutral pictures from the International Affective Picture System (Lang et al., 2008). The arousing pictures had a very low valence rating and high arousal rating (Lang et al., 2008), while the neutral pictures had a medium valence rating (neither positive nor negative) and a low arousal rating. Each category consisted of 36 pictures (see Table 1), each of which was shown once during a full experimental session, which consisted of 72 trials.

Table 1 Identification numbers of all IAPS pictures and IADS sounds used in the current experiments.

To make sure that there were no major physical differences between the two picture categories, we calculated the perceived luminance of each pixel using the equation $L_i = .213\,R_i + .715\,G_i + .072\,B_i$, where $L_i$ is the luminance of pixel $i$, and $R_i$, $G_i$, and $B_i$ represent the normalized input to each color channel for that pixel of the display. The parameters of the equation emulate the sensitivity of human vision to light of different wavelengths (Rec. 709 standard), and yield a luminance value as a ratio of the full luminance of a computer monitor. The average luminance of each picture was calculated by averaging the luminance values of all its pixels. Average luminance was then compared between the arousing and neutral picture categories. Furthermore, we computed the root mean square error of the pixel luminance to get a coarse measure of picture contrast, and compared this measure between picture categories. Welch-corrected independent-samples t-tests did not reveal any differences in perceived luminance (t(70) = -.17, p = .86) or in root mean square contrast (t(70) = -1.29, p = .20).
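The luminance check described above can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB code: the function names and the four-pixel example "picture" are hypothetical, and root mean square contrast is taken here to be the standard deviation of pixel luminance, which is an assumption about how the measure was defined.

```python
import math

# Channel weights as given in the text (rounded Rec. 709 coefficients).
WEIGHTS = (0.213, 0.715, 0.072)

def pixel_luminance(r, g, b):
    """Relative luminance of one pixel from 8-bit RGB values (0-255)."""
    rn, gn, bn = r / 255, g / 255, b / 255  # normalize each channel to 0-1
    return WEIGHTS[0] * rn + WEIGHTS[1] * gn + WEIGHTS[2] * bn

def picture_stats(pixels):
    """Return (mean luminance, RMS contrast) for a list of (r, g, b) pixels.

    RMS contrast is computed as the standard deviation of pixel luminance
    (an assumption; the text does not spell out the exact formula).
    """
    lum = [pixel_luminance(*p) for p in pixels]
    mean = sum(lum) / len(lum)
    rms = math.sqrt(sum((v - mean) ** 2 for v in lum) / len(lum))
    return mean, rms

# Hypothetical four-pixel picture: white, black, and the two letter grays.
example = [(255, 255, 255), (0, 0, 0), (102, 102, 102), (204, 204, 204)]
mean_lum, rms_contrast = picture_stats(example)
```

Applying `picture_stats` to every picture in a category would yield the per-picture values that enter the between-category t-tests.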

Pictures, subtending 15.2° by 11.9° of visual angle, were presented in the center of the screen, and subsequently masked by a visual noise mask of the same size, made up of a matrix of 256 × 256 achromatic pixels of random luminance (see Fig. 2).

Procedure

Observers were instructed about the task upon arrival in the lab. The procedure of the experiment was explained by an illustrative figure, similar to Fig. 2. Observers were told that their task was to report as many target letters as possible, and to ignore the distractors. Then they performed 20 practice trials on which no IAPS picture was presented.

A typical trial procedure is illustrated in Fig. 2. Each trial began with a fixation cross in the center of the monitor for 4 s, followed by the presentation of an IAPS picture. The picture presentation was terminated after 1 s by a visual noise mask (500 ms). Thereafter, the fixation cross appeared again for an interval of varying duration (250–2,500 ms). The combined durations of the noise mask and subsequent fixation period matched those of Sutherland and Mather (2012), and were deemed appropriate to minimize spillover of arousal effects across subsequent trials. Indeed, several papers have reported clear behavioral (Demanet, Liefooghe, & Verbruggen, 2011; Kristjánsson, Óladóttir, & Most, 2013; Verbruggen & De Houwer, 2007) and EEG effects (Bradley & Lang, 2000; Brown & Cavanagh, 2017; Hajcak & Nieuwenhuis, 2006) of arousal in experimental designs with intermixed neutral and arousing IAPS pictures, and inter-trial intervals that were shorter than in the current study. Finally, the letter display appeared for 200 ms, followed by a “Recall Now” display. This message prompted the participant to type in the identities of all remembered target letters. When participants had reported all the letters they remembered, they initiated the next trial by pressing the spacebar.

Data analysis

To avoid the problem of interdependence between the numbers of high and low contrast letters reported from the same limited capacity memory store, we defined a novel primary dependent measure. This was the ratio between the proportion of correctly reported low-contrast targets and proportion of correctly reported high-contrast targets. This yields the attentional weight of a low-contrast target, relative to that of a high-contrast target, which is how it will be referred to henceforth. Using this measure, we ensured that inferential statistics were performed on a single summary measure.
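As a minimal sketch of this measure, the relative weight can be computed from report counts as shown below. The function name, its arguments, and the example counts are hypothetical; the sketch assumes the ratio is formed per participant and per condition from proportions pooled over trials, with five low-contrast and three high-contrast letters per display as described in the Methods.

```python
def relative_low_weight(n_low_correct, n_high_correct, n_trials,
                        low_per_trial=5, high_per_trial=3):
    """Ratio of proportion correct for low- vs. high-contrast letters.

    Counts are totals over n_trials trials of one condition
    (e.g., all negative-picture trials of one participant).
    """
    p_low = n_low_correct / (n_trials * low_per_trial)
    p_high = n_high_correct / (n_trials * high_per_trial)
    if p_high == 0:
        raise ValueError("No high-contrast letters reported; ratio undefined.")
    return p_low / p_high

# Hypothetical counts for one participant in one condition (36 trials):
# 45 of 180 low-contrast and 81 of 108 high-contrast letters reported.
weight = relative_low_weight(45, 81, 36)  # 0.25 / 0.75
```

A weight below 1 indicates that a low-contrast letter was less likely to be reported than a high-contrast letter; arousal-biased competition predicts a smaller weight on arousing than on neutral trials.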

The primary analyses were Bayesian paired-samples t-tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009), performed in JASP (JASP team, 2018) with the default Cauchy prior width (.707). Robustness analyses were performed using JASP and the BayesFactor package for R (version 0.9.12; Morey & Rouder, 2015). The aim in each experiment was to keep testing participants until we had strong evidence for or against arousal-biased competition. Strong evidence is the descriptor of a Bayes factor of 10 or higher (suggested by Jeffreys, 1961), meaning that the collected data are at least ten times more likely under HA than under H0, or vice versa. A key strength of the Bayesian approach is that it enables us to quantify evidence for both the alternative and the null hypothesis (for more information on Bayesian alternatives to frequentist statistics, see Jarosz & Wiley, 2014; Morey & Rouder, 2011; Rouder et al., 2009; Verhagen & Wagenmakers, 2014). Furthermore, Bayesian inference is immune to the problems associated with optional stopping of data sampling (e.g., Wagenmakers, 2007), which allows the researcher to monitor the strength of evidence until it is sufficiently compelling (Rouder, 2014). Therefore, we did not rely on a priori power calculations in the current study, but decided on a criterion for what we consider compelling evidence for or against the alternative hypothesis: a Bayes factor > 10.
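For readers who want to see what such a Bayes factor involves, the following is a rough Python sketch of the two-sided JZS Bayes factor for a paired t statistic, following the integral form given by Rouder et al. (2009). It is not the JASP or BayesFactor implementation; the function name, the simple midpoint integration, and the default number of steps are illustrative choices. Because it is two-sided, it is not expected to reproduce directional Bayes factors.

```python
import math

def jzs_bf01(t, n, r=0.707, steps=20000):
    """Two-sided JZS Bayes factor (H0 over HA) for a paired t statistic.

    t: observed t value; n: number of participants; r: Cauchy prior width.
    """
    nu = n - 1  # degrees of freedom
    # Likelihood of t under H0 (constants shared with HA are omitted).
    like_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        # Likelihood under HA for a given g, weighted by the
        # inverse-gamma(1/2, 1/2) density of g.
        a = 1 + n * g * r * r
        return (a ** -0.5
                * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * math.pi) ** -0.5 * g ** -1.5
                * math.exp(-1 / (2 * g)))

    # Integrate g over (0, inf) with the substitution g = x / (1 - x),
    # using the midpoint rule so that the endpoints are never evaluated.
    h = 1.0 / steps
    like_ha = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        g = x / (1 - x)
        like_ha += integrand(g) / (1 - x) ** 2 * h
    return like_h0 / like_ha
```

Under a sequential design of the kind described above, a function like this would be re-evaluated as participants are added, and testing would stop once the Bayes factor or its inverse exceeds 10.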

Results

The left panel of Fig. 3 shows the average probabilities of reporting a target as a function of visual contrast (high vs. low) and type of picture (neutral vs. negative). The data presentation is equivalent to that of Fig. 2 in Sutherland and Mather (2012, p. 1370), but the numerical pattern is reversed: under arousal there is a slight decrease in the proportion of correctly reported high-contrast letters, and an increase in low-contrast letters. To circumvent the methodological problems described in the Introduction, we did not perform independent analyses for the high- and low-contrast letters. Instead we examined the relative attentional weight of a low-contrast letter (see Methods).

Fig. 3

Results from Experiment 1, shown as proportion correct for low- and high-contrast targets (left), and as the relative weight of low-contrast targets (right). Error bars represent 1 standard error of the mean

A paired-samples t-test comparing the weights of low-contrast letters did not reveal a significant effect of type of picture (negative vs. neutral; t(38) = -1.487, p = .15, d = -.24). Moreover, the small numerical trend was opposite to the direction predicted by arousal-biased competition, meaning that, if anything, presenting a negative arousing picture increased, rather than reduced, the weight of the low-contrast targets. A Bayesian paired-samples t-test classified the evidence against the alternative hypothesis as strong, with a Bayes factor of 13.4 for H0 relative to HA.
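The reported effect size is consistent with the standardized mean difference for paired data, d = t / √n. The short Python check below illustrates this; the function name is hypothetical, and the assumption that d was computed in this way is ours, not something stated in the text.

```python
import math

def cohens_dz(t, n):
    """Effect size for a paired-samples t-test: d_z = t / sqrt(n)."""
    return t / math.sqrt(n)

# Values reported for Experiment 1: t(38) = -1.487 with 39 participants.
d_exp1 = round(cohens_dz(-1.487, 39), 2)
print(d_exp1)  # -0.24
```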

To make sure that these results could not be attributed to the larger number of trials in our study (72, compared to 40 in Sutherland & Mather, 2012), we re-analyzed the data after excluding trials 41–72. This led to the same outcome (BF01=13.4 for H0).

Experiment 2

Experiment 1 failed to reveal evidence for ABC theory (Mather & Sutherland, 2011), despite being a close replication of the original study of Sutherland and Mather (2012), with the exception of the arousing stimuli used (pictures vs. sounds). In Experiment 2, we ran an almost exact replication of the experiments presented by Sutherland and Mather (2012; Experiments 1 and 2, condition 1), with the same arousing stimuli (IADS sound clips), letter displays and number of trials, in an attempt to confirm or exclude an important role for the type of arousing stimuli.

Methods

Participants

Forty-one observers (32 female), 19–31 years old (mean = 23.7 years, SD = 3.1), participated in the experiment. Participants from Experiment 1 were excluded from participation. All reported normal or corrected-to-normal vision and normal color vision. Participants were compensated with €4 or course credit. One participant dropped out due to discomfort.

Stimuli and apparatus

Stimuli and apparatus were identical to Experiment 1, with the following exceptions. Sound stimuli were presented with Sennheiser HD202 headphones. Each IADS sound clip was 6 s long, compared to the 1-s presentation of IAPS pictures in the previous experiment. Figure 2 describes a typical trial in Experiment 2, if the picture and noise mask are replaced by a fixation cross and a 6-s sound clip. We used the same 40 sounds as Sutherland and Mather (2012). The sound numbers are listed in Table 1. Figure 1b shows the arousal and pleasure ratings of each sound (collected by Bradley & Lang, 2007). The sounds that formed the neutral and negative categories did not differ significantly in terms of mean peak amplitude (-3.3 and -3.6 dB, respectively, t(38) = -.857, p = .397), but did differ in terms of average RMS power (t(38) = -3.149, p = .003; a full list of sound properties is provided in Table 4 of Bradley & Lang, 2007).

Results

The left panel of Fig. 4 shows the average proportion of correctly reported low- and high-contrast targets. As in the previous experiment, the numerical pattern of results was opposite to that in Sutherland and Mather’s (2012) experiments, in that high-contrast letters were slightly less likely to be reported following negative arousing sounds than following neutral sounds. The weight of low-contrast letters did not significantly differ between sound types (t(39) = -.919, p = .36, d = -.15). A Bayesian paired-samples t-test yielded a Bayes factor of 10.5 in support of H0: the data were 10.5 times more likely under the hypothesis that arousal had no effect on the competition between high- and low-priority letters than under the hypothesis of an arousal-related increased bias for selecting high-contrast letters.

Fig. 4

Results from Experiment 2, shown as proportion correct for high- and low-contrast letters (left), and as the relative weight of low-contrast targets (right). Error bars represent 1 standard error of the mean

Experiments 3A and 3B: Manipulation checks

Experiments 1 and 2 did not reveal evidence for arousal-biased competition. Although our experimental designs were highly similar to those of Sutherland and Mather (2012, 2015), it is possible that our arousal manipulations were not effective in our study population. To exclude this possibility, we re-ran Experiments 1 and 2, with the addition of electroencephalography (EEG), to check whether the arousal manipulation worked. If the picture and sound stimuli successfully modulated arousal in our participants, we expected to find modulations of the electrophysiological LPP component (Brown et al., 2012; Cuthbert et al., 2000). Furthermore, we asked all participants to provide subjective arousal and valence ratings of each picture and sound.

Methods

Experiments 3A and B were identical to Experiments 1 and 2, respectively, with the following exceptions.

Participants

Thirty-two volunteers participated in the experiments. Sixteen observers (14 female; mean age = 22.2 years, SD = 2.9) took part in Experiment 3A, and 16 observers (11 female; mean age = 21 years, SD = 1.8) took part in Experiment 3B. Participants were compensated with course credits or a cash payment (€7.50).

EEG acquisition and analysis

Electroencephalographic measurements were performed using a 64-channel BioSemi recording system. Recordings were limited to 17 scalp channels (F3, Fz, F4, C3, Cz, C4, CP1, CPz, CP2, P7, P3, Pz, P4, P8, O1, Oz, O2), and the left and right mastoids. Eye movements were recorded with two pairs of bipolar electrodes placed around the observer’s right eye (HEOG, VEOG). Offline processing was performed with the EEGlab toolbox (Delorme & Makeig, 2004) for MATLAB. The data were re-referenced to the mastoid electrodes. The data were filtered using a band-pass filter (cut-offs: 0.1 Hz and 30 Hz), visually inspected for the presence of clear artifacts, and subjected to an independent component analysis. Artifact-related components were rejected and a second low-pass filter was applied (cut-off: 8 Hz). Data cleaning removed an average of 1.1 trials per participant in Experiment 3A (IAPS; range: 0–6 trials), and 1.6 trials (range: 0–6) in Experiment 3B (IADS). Event-related potentials (ERPs) were calculated from baseline-corrected data, split into epochs locked to the onset of the picture or sound stimuli. Trial averaging was performed separately for each stimulus category (neutral or negative) and each of the centroparietal electrodes (CP1, CPz, and CP2), where the LPP is known to be prominent (e.g., Ásgeirsson & Nieuwenhuis, 2017; Brown et al., 2012; Hajcak, Dunning, & Foti, 2009; Schupp et al., 2000).
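The final steps of this pipeline (baseline correction and per-category averaging of stimulus-locked epochs) can be illustrated with a small Python sketch. It is not the EEGlab code used in the study: the function names are hypothetical, the length of the baseline interval is not given in the text and is left as a parameter, and the toy epochs hold just four samples each.

```python
def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]

def average_epochs(epochs):
    """Average a list of equal-length epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def erp_by_category(epochs, labels, n_baseline):
    """Return one ERP per stimulus category for a single electrode."""
    erps = {}
    for cat in set(labels):
        selected = [baseline_correct(e, n_baseline)
                    for e, lab in zip(epochs, labels) if lab == cat]
        erps[cat] = average_epochs(selected)
    return erps

# Toy data: three epochs from one electrode, two baseline samples each.
epochs = [[1, 1, 3, 5], [2, 2, 6, 8], [0, 0, 1, 1]]
labels = ["negative", "negative", "neutral"]
erps = erp_by_category(epochs, labels, n_baseline=2)
```

In the study, this averaging was done separately for each of the three centroparietal electrodes before further analysis.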

Subjective ratings of stimuli

After the experimental session, observers were asked to rate each picture or sound in terms of arousal and valence on scales from 1 to 9. They typed the ratings using the numeric keypad of a regular keyboard. IAPS pictures remained visible until the arousal and valence ratings had been entered. A silhouette of a loudspeaker was visible while each IADS sound was played. Ratings of sounds were performed immediately after each sound clip ended.

Results

Behavioral data

Data were analyzed with the same methods as in Experiments 1 and 2. For Experiment 3A, a paired-samples t-test revealed a significant difference in the weight of a low-contrast letter between the negative and neutral conditions, but in the direction opposite to that predicted by the ABC theory, t(15) = -2.92, p = .011, d = -.73 (Fig. 5). A paired-samples Bayesian t-test yielded a Bayes factor of 11.7 against the hypothesis that arousal biased attention towards high-contrast letters. For Experiment 3B, the t-test did not reveal a difference between conditions, t(15) = -.42, p = .68, d = -.10, and the Bayes factor suggested moderate evidence for the null hypothesis (BF01 = 5.2).

Fig. 5

Behavioral results from Experiments 3A and B, shown as proportions of correctly reported high- and low-contrast letters (panels a and c), and as mean weights of low-contrast letters (panels b and d), aggregated across all participants

EEG measurements

ERPs were averaged across the three centroparietal electrodes: CP1, CPz, and CP2, separately for each observer and stimulus category. For Experiment 3A, a t-test on the averaged data in the time window between 400 and 600 ms following picture onset revealed a significant arousal modulation, t(15) = 5.13, p < .001, d = 1.28 (Fig. 6a). The time window was chosen based on previous studies of the LPP, which is usually near its maximum in this period (Codispoti, Ferrari, & Bradley, 2007; Hajcak et al., 2009; Schupp et al., 2000). This suggests that negative pictures increased cortical arousal.
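The dependent measure in this analysis is the mean ERP amplitude within a fixed time window. A minimal Python sketch of that step is given below; the function name, the sample times, and the amplitude values are hypothetical, since the sampling rate and epoch limits are not stated in the text.

```python
def window_mean(erp, times_ms, start_ms, end_ms):
    """Mean amplitude of an ERP over samples with start_ms <= t <= end_ms."""
    selected = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not selected:
        raise ValueError("No samples fall inside the requested window.")
    return sum(selected) / len(selected)

# Toy ERP sampled every 100-200 ms (real data would be far denser).
times_ms = [0, 200, 400, 500, 600, 800]
erp_negative = [0.0, 1.0, 2.0, 4.0, 6.0, 3.0]
lpp_negative = window_mean(erp_negative, times_ms, 400, 600)
```

Computing this value per observer for the negative and the neutral ERP gives the paired observations that enter the t-test.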

Fig. 6

The event-related modulations of centroparietal brain activity by IAPS pictures (a) and IADS sounds (b). The time windows chosen for statistical analysis are highlighted by gray backgrounds

An analogous analysis was performed for the ERPs obtained in Experiment 3B. However, this analysis was somewhat exploratory, since there is little literature on arousal modulations of evoked potentials by IADS sounds. The time window chosen for the analysis was 800–1,200 ms after stimulus onset. The exploratory results mirrored those of the IAPS experiment, albeit with a smaller effect size, t(15) = 2.84, p = .012, d = .71.

Subjective ratings

To test whether the participants found the pictures in the pre-defined negative stimulus category to be more arousing than pictures in the neutral category, the ratings of each participant were averaged within each of the two stimulus categories. A paired-samples t-test confirmed that IAPS pictures in the negative category were considered more arousing than those in the neutral category, t(15) = 12.47, p < .001, d = 4.4. An analogous analysis of IADS ratings also confirmed that negative sounds were rated as more arousing than sounds from the neutral category, t(15) = 6.6, p < .001, d = 1.0.

Average subjective ratings were also compared to the means of norm ratings provided by Lang et al. (2008) and Bradley and Lang (2007) for the IAPS pictures and IADS sounds, respectively (Fig. 7). IAPS ratings were very consistent with norm ratings: Arousal ratings correlated highly with the arousal ratings of the norm population, r(70) = .979, p < .001, and so did valence ratings, r(70) = .981, p < .001. IADS ratings were also highly correlated with norm ratings on the arousal, r(38) = .859, p < .001, and valence, r(38) = .879, p < .001, dimensions.
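These are item-level correlations: each picture or sound contributes one pair of mean ratings, so the degrees of freedom equal the number of stimuli minus two (72 pictures give r(70), 40 sounds give r(38)). The Python sketch below shows the computation; the function names and the short rating lists are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_df(n_items):
    """Degrees of freedom for a correlation over n_items stimuli."""
    return n_items - 2

# Hypothetical mean arousal ratings for four stimuli: observed vs. norm.
observed = [2.1, 3.0, 6.2, 7.1]
norm = [2.5, 3.1, 6.0, 6.8]
r_arousal = pearson_r(observed, norm)
```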

Fig. 7

The average ratings made by observers in Experiments 3A and B. Pre-defined categories are represented by circles (neutral stimuli) and diamonds (negative arousing stimuli). Norm ratings are included for comparison (triangles; Bradley & Lang, 2007; Lang et al., 2008)

Discussion

Experiment 3A demonstrated that arousing IAPS pictures evoked a robust LPP modulation in observers performing the same task as that used in Experiment 1. In addition, post-test ratings of picture stimuli confirmed that the observers found pictures in the negative category much more arousing than pictures in the neutral category. Despite the large physiological arousal response and subjective reports, participants’ behavior was opposite to the predictions of arousal-biased competition: Exposing observers to arousing pictures made them more likely to report low-contrast letters.

In Experiment 3B arousing IADS sounds evoked a significant LPP modulation that was less robust and delayed in time compared to the LPP modulation in Experiment 3A. Subjective ratings also confirmed that negative sounds were found to be more arousing than neutral sounds, but the categorical delineation was not as clear as it was for pictures. The reduced LPP modulation was to be expected, since interpretation of some IADS sounds requires integration of information over time, which would necessarily delay arousal responses on some trials. Furthermore, both norm ratings (Bradley & Lang, 2007) and the self-ratings obtained here suggest that IADS sounds are not as reliably arousing as IAPS pictures (which motivated Experiment 1). Yet, the overall conclusion is that the arousal manipulations were successful in Experiments 3A and B, and therefore presumably also in Experiments 1 and 2, which used procedures identical to the control experiments.

A re-analysis of the data from three studies of Sutherland and Mather

In the Introduction of the present paper, we discussed a problem with the way data were analyzed in the original study of Sutherland and Mather (2012). Because we used a different analysis procedure, we wanted to assess the data of Sutherland and Mather using the same procedure. Therefore, we obtained the publicly available data from three studies by Sutherland and Mather (2012, 2015, 2018), calculated the relative weights of low-contrast letters, and ran Bayesian t-tests.

First, we used data from the original publication (Sutherland & Mather, 2012, pooled data from Experiments 1 and 2, condition 1), collected from 110 participants (18–29 years old) in an experiment identical to Experiment 2 in the present report. The purpose of the analysis was to quantify the evidence for or against arousal-biased competition. A Bayesian t-test on the weights of low-contrast stimuli revealed strong evidence for arousal-biased competition (BF10=12.0 for HA; weight difference = .054).

Then, we ran the same analysis on the data of 55 participants from an ageing population (61–80 years old; Sutherland & Mather, 2015). The data from one participant were discarded due to extreme weights, which were more than 7 SDs from the group mean in one of the conditions. Although the data show a numerical trend in the direction predicted by the ABC theory, the Bayesian paired-samples t-test on the remaining 54 participants did not yield much evidence for arousal-biased competition in the ageing group. In fact, the Bayes factor revealed that the data were about twice as likely under H0 (no arousal-biased competition) as under HA (BF01 = 1.9; weight difference = .018).

Finally, we re-analyzed data from 55 young and healthy participants (18–29 years old) reported in Sutherland and Mather (2018), who were subjected to the same task procedure as in the studies mentioned above, except that the design included both positive and negative arousing sounds. When Sutherland and Mather compared the trials with arousing and neutral sounds, categorized based on the published arousal norm ratings for the included sounds, they found no effect of arousal level on the bias toward reporting salient, high-contrast letters. However, when they conducted analyses based on the participants’ own (post hoc) arousal ratings of each sound, they found that sounds with higher self-rated arousal were consistently associated with a greater bias towards reporting high-contrast letters. In our re-analysis based on pre-defined sound categories, we found a Bayes factor of 5.5 in support of the null hypothesis (i.e., against arousal-biased competition). A re-analysis based on the self-rated valence and arousal ratings yielded a Bayes factor of 4.9 in support of the ABC hypothesis. These Bayesian comparisons of the weights of low-contrast letters thus mirror the frequentist results of Sutherland and Mather (2018).

Bayes factor robustness check

The reported evidence against a reliable arousal-linked bias towards high-priority stimuli is based on Bayes factors, obtained from the Bayesian analog to paired-samples t-tests (Rouder et al., 2009). This method of analysis requires an assumption about the prior distribution of effect sizes, which is, in most cases, not known. In the Results sections of the two experiments, we presented Bayes factors obtained under the assumption of a neutral Cauchy prior of width .707. This assumption may be disputed, but due to the scarcity of data on arousal-biased attention in brief multi-element displays, it is difficult to propose a clearly superior alternative. Here, we assess the robustness of the current analyses using both uninformed and informed priors.
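For reference, the default Bayes factor for a one-sample or paired-samples t-test described by Rouder et al. (2009) can be written as below. This is a sketch of the general formula with the prior width written out explicitly, not a transcript of the present analyses: t denotes the observed t statistic, N the number of participants, \nu = N - 1 the degrees of freedom, and r the width (scale) of the Cauchy prior on effect size (here, r = .707).

```latex
BF_{01} =
\frac{\left(1 + \dfrac{t^{2}}{\nu}\right)^{-(\nu+1)/2}}
     {\displaystyle\int_{0}^{\infty}
        \left(1 + N g r^{2}\right)^{-1/2}
        \left(1 + \dfrac{t^{2}}{\left(1 + N g r^{2}\right)\nu}\right)^{-(\nu+1)/2}
        (2\pi)^{-1/2}\, g^{-3/2}\, e^{-1/(2g)} \, dg}
```

The prior width enters only through the term N g r^{2}, which is why the Bayes factor has to be re-evaluated whenever a different width is assumed.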

First, we estimated the evidence for the null hypothesis over a large range of priors. This was done by obtaining Bayes factors for the full range of prior widths, from .1 to 1.5 (Fig. 8). For the sake of simplicity, we focus on three prior widths, which we categorically name narrow, medium, and wide priors. This analysis revealed Bayes factors > 5.5 in both experiments under the assumption of a narrow prior, while Bayes factors under the assumption of medium and wide priors were all > 10 (Table 2). The analysis shows that, under a wide range of plausible assumptions about the distributions of effect sizes, the evidence lends moderate to strong support in favor of the null hypothesis.
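A sweep of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the code used for the analyses: it evaluates the default Bayes factor of Rouder et al. (2009) by simple numerical integration, the function name jzs_bf01 is hypothetical, and the t statistic and sample size are placeholder values, not results from Experiments 1 and 2.

```python
import math


def jzs_bf01(t, n, r=0.707, steps=4000):
    """BF01 for a one-sample/paired t statistic with a Cauchy(0, r) prior on effect size.

    The Cauchy prior is written as a normal prior with variance g * r**2,
    where g follows an inverse-gamma(1/2, 1/2) distribution.
    """
    nu = n - 1
    # Likelihood of the observed t under H0 (up to a constant shared with HA).
    null_like = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        a = 1 + n * g * r ** 2
        like = a ** -0.5 * (1 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return like * prior

    # Midpoint rule on (0, 1) after substituting g = u / (1 - u).
    h = 1.0 / steps
    alt_like = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        alt_like += integrand(u / (1 - u)) / (1 - u) ** 2 * h
    return null_like / alt_like


# Prior widths from .1 to 1.5 in steps of .1.
widths = [round(0.1 * k, 1) for k in range(1, 16)]

# Placeholder inputs for illustration only (NOT values from the experiments).
t_obs, n_obs = 0.5, 60

sweep = {r: jzs_bf01(t_obs, n_obs, r) for r in widths}
for r, bf in sweep.items():
    print(f"prior width {r:.1f}: BF01 = {bf:.2f}")
```

With a t statistic close to zero, widening the prior spreads its mass over larger effect sizes that the data do not support, so BF01 grows with the prior width.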

Fig. 8 Bayes factors for Experiments 1 and 2, estimated over a range of Cauchy prior widths (.1 – 1.5). Vertical lines mark categorically defined priors: narrow, medium, and wide

Table 2 Bayes factors (BF01) under three different prior widths

Second, we performed a final analysis of the data, using a prior based on the posterior distribution obtained from the re-analysis of Sutherland and Mather’s (2012) data, and a method adopted from Verhagen and Wagenmakers (2014). Hitherto, our Bayesian analyses compared the effect sizes of sampled data to a prior distribution of effect sizes with a maximum density around 0. In contrast, the method employed here asks whether the current effect size (i.e., from Experiments 1 and 2) is similar to what was found before (Sutherland & Mather, 2012) or whether it is absent (equality of effect size Bayes factor test; Verhagen & Wagenmakers, 2014, p. 1464). We pitted the null hypothesis (the effect sizes are equal) against the alternative hypothesis (the true effect size is smaller than the effect size obtained by Sutherland and Mather, 2012). The Bayesian t-test of the original effect size (Sutherland & Mather, 2012), after conversion to weight ratios (see Experiment 1, Methods), yielded a posterior distribution with a median of .386 and a 95% credible interval of .117 – .655. This posterior distribution was approximated by a normal distribution with a mean of .386 and a standard deviation of .137. When the Bayesian t-tests were repeated using this informed prior distribution, the resulting inverse Bayes factors were 37.9 and 22.9 for Experiments 1 and 2, respectively. These results present strong support for the hypothesis that the true effect sizes are smaller than the effect size obtained by Sutherland and Mather (2012).
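The normal approximation of the posterior can be checked with a short calculation. The sketch below assumes that the standard deviation is obtained by treating the reported 95% interval as symmetric around the median; this is one way to arrive at the reported value and is not necessarily how it was derived originally. Variable names are illustrative.

```python
from statistics import NormalDist

# Summary of the posterior reported for the re-analysis of the original data.
median, lower, upper = 0.386, 0.117, 0.655

# For a normal distribution, a central 95% interval spans about +/- 1.96 SDs.
z = NormalDist().inv_cdf(0.975)
sd = (upper - lower) / (2 * z)

# Normal approximation with the rounded SD, and the interval it implies.
approx = NormalDist(mu=median, sigma=round(sd, 3))
lo_q, hi_q = approx.inv_cdf(0.025), approx.inv_cdf(0.975)

print(f"SD implied by the interval: {sd:.3f}")
print(f"95% interval of the approximation: {lo_q:.3f} to {hi_q:.3f}")
```

A normal distribution with mean .386 and standard deviation .137 reproduces the reported interval to three decimals, which suggests that little is lost by the approximation.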

General discussion

The goal of the present study was to test the ABC theory (Mather & Sutherland, 2011; Mather et al., 2016) by examining the effects of transient changes in arousal on bottom-up attentional biases and visual short-term memory. In two experiments we examined whether arousal boosts the competition between physically salient, high-priority stimuli and less salient, low-priority stimuli, using arousing and neutral pictures (Experiment 1) and sounds (Experiment 2) to manipulate arousal. Using Bayesian statistics, we found strong evidence for the null hypothesis in both experiments: increased arousal did not boost the bottom-up bias towards reporting high-contrast letters. Thus, we were unable to replicate the “winner-take-more/loser-take-less” signature of arousal-biased competition.

We used the same task as Sutherland and Mather (2012): report as many letters as possible, regardless of stimulus contrast. In Experiment 1 we used negative IAPS pictures that, according to norm ratings, were more reliably arousing than the arousing IADS sound clips used by Sutherland and Mather, and neutral pictures that were less arousing than their neutral sound clips. Experiment 2 was an almost exact replication of Sutherland and Mather’s experiments, using the same sound clips as arousing and neutral stimuli. Therefore, we were surprised by the null results in both experiments. Indeed, when we reanalyzed the original data from Sutherland and Mather’s (2012, 2015) experiments using our composite measure of “winner-take-more/loser-take-less” effects, Bayesian statistics indicated strong evidence for arousal-biased competition in the younger population, while results were not diagnostic in the older population.

What might account for these large discrepancies? One possibility that we considered is experimenter bias: a researcher's cognitive bias may cause them to subconsciously influence the participants of an experiment. Given the computerized instructions and the limited interaction between the experimenters and our participants, we find it hard to imagine that the experimenters influenced the study outcome. Furthermore, Experiment 1 was carried out at a time when the experimenters still expected to get results consistent with arousal-biased competition. So the results are unlikely to reflect the experimenters’ expectations.

Another possibility is a subtle difference in task design or study context between our experiments and those of Sutherland and Mather (2012). One such difference concerns the task instructions. We inadvertently omitted one of their instructions: “due to the difficulty of the task it was emphasized that participants should be less concerned with avoiding errors and more concerned with recalling letters” (p. 1368). However, this does not seem to have affected the way our participants approached the task; the average number of correctly reported letters in our Experiment 2 was 3.75, very similar to the average number of 3.85 in the original experiments.

Another potential cause of the discrepancy might be systematic differences between the participant samples. Although two control experiments confirmed that our arousal manipulations were successful, it is hard to exclude the possibility that our sample of participants (most or all of whom were students at Leiden University) was less aroused by these manipulations than the participants tested by Mather and colleagues, who do not specify the population from which participants were sampled. For example, differences in the way research groups advertise a study, and whether they mention the use of disturbing sound clips, might cause a selection bias by affecting the probability that individuals high and low in harm avoidance sign up for the study; and traits such as harm avoidance can affect the impact of task-irrelevant emotional stimuli on the performance of cognitive tasks (e.g., Most, Chun, Widders, & Zald, 2005). Future studies on arousal-biased competition should chart relevant personality variables of the participants and examine whether these can explain individual differences in the presence and degree of arousal-biased competition.

In line with our current findings, we recently found that pre-trial arousal neither enhanced nor impaired selectivity in a focused visuospatial attention paradigm (Ásgeirsson & Nieuwenhuis, 2017), suggesting that arousal does not amplify the effects of top-down biases. Yet, we believe that there is substantial evidence that arousal level modulates the strength of pre-existing biases, including those associated with stimulus salience (Mather et al., 2016), attentional predisposition (Eldar et al., 2013, 2016), cognitive control (Warren, Murphy, & Nieuwenhuis, 2016), expectation (Hockey, 1970; Smith, 1985), as well as a number of more subtle biases (de Gee et al., 2017; Urai, Braun, & Donner, 2017). However, the exact circumstances under which these effects of arousal occur, and the direction of these effects, are still poorly understood. Future studies should include subjective or physiological trial-to-trial arousal measurements, to examine arousal-biased competition on a finer time scale and to take into account item-related arousal differences (Sutherland & Mather, 2018). Future research should also aim to delineate which effects of arousal on attentional selectivity are related to arousal per se (i.e., regardless of valence; Sutherland & Mather, 2018) and which are limited to negative valence (Gable & Harmon-Jones, 2010; van Steenbergen, Band, & Hommel, 2011). While these factors were confounded in the current study, a recent study made a first attempt at dissociating them, using the same task but including positive in addition to negative and neutral IADS sounds (Sutherland & Mather, 2018). Importantly, in that study the impact of stimulus salience on task performance was modulated by arousal ratings, but not by valence ratings. Finally, a computationally tractable modelling framework will be necessary to generate and test more precise predictions about arousal-biased competition (Warren et al., 2016).

Data are available at https://doi.org/10.6084/m9.figshare.6510302.v3