Salience-Based Selection: Attentional Capture by Distractors Less Salient Than the Target

Michael Zehetleitner; Anja Isabel Koch; Harriet Goschy; Hermann Joseph Müller

doi:10.1371/journal.pone.0052595

Abstract

Current accounts of attentional capture predict the most salient stimulus to be invariably selected first. However, existing salience and visual search models assume noise in the map computation or selection process. Consequently, they predict the first selection to be stochastically dependent on salience, implying that attention could even be captured first by the second most salient (instead of the most salient) stimulus in the field. Yet, capture by less salient distractors has not been reported and salience-based selection accounts claim that the distractor has to be more salient in order to capture attention. We tested this prediction using an empirical and modeling approach of the visual search distractor paradigm. For the empirical part, we manipulated salience of target and distractor parametrically and measured reaction time interference when a distractor was present compared to absent. Reaction time interference was strongly correlated with distractor salience relative to the target. Moreover, even distractors less salient than the target captured attention, as measured by reaction time interference and oculomotor capture. In the modeling part, we simulated first selection in the distractor paradigm using behavioral measures of salience and considering the time course of selection including noise. We were able to replicate the result pattern we obtained in the empirical part. We conclude that each salience value follows a specific selection time distribution and attentional capture occurs when the selection time distributions of target and distractor overlap. Hence, selection is stochastic in nature and attentional capture occurs with a certain probability depending on relative salience.

Citation: Zehetleitner M, Koch AI, Goschy H, Müller HJ (2013) Salience-Based Selection: Attentional Capture by Distractors Less Salient Than the Target. PLoS ONE 8(1): e52595. https://doi.org/10.1371/journal.pone.0052595

Editor: Joy J. Geng, University of California, Davis, United States of America

Received: August 2, 2012; Accepted: November 19, 2012; Published: January 28, 2013

Copyright: © 2013 Zehetleitner et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This research was supported by German Research Foundation (DFG, www.dfg.de/en/) grant EC 142 (Excellence Cluster “Cognition for Technical Systems”), DFG grant MZ-887/3-1, German-Israeli Foundation for Scientific Research and Development (www.gif.org.il) grant 1130-158.4, and a fellowship of the LMU Graduate School of Systemic Neurosciences (www.gsn.lmu.de), GSC 82/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Visual attention can be allocated in a stimulus-driven (bottom-up) or an observer-guided (top-down) fashion [1], with both sources of control combining to determine which location or object in the field is attended. The process of selection often is investigated in the realm of visual search. In this paradigm, the task is to find a pre-defined target among distractors and (depending on the task) indicate its presence or absence or make another decision based upon its features. Attentional selection in the search process has been subject to a variety of experimental studies [2]–[5] as well as computational models [6]–[10].

A variant of the visual search paradigm that permits attentional selection to be investigated precisely is the visual search distractor paradigm [11], [12]. In this paradigm, a task-relevant target singleton and an irrelevant distractor singleton (both carrying unique features compared to all other stimuli) are surrounded by homogeneous non-target stimuli. An example would be a display containing a predefined target, a grey tilted bar, and a distractor, a colored vertical bar, amongst grey vertical non-target bars. The task is to find the target while ignoring the distractor. Typically, the item with the highest feature contrast is selected first or ‘captures attention’ initially, as evidenced by reaction time (RT) interference (for distractor-present compared to -absent trials) when the distractor is characterized by a higher feature contrast (relative to the non-targets) than the target [3], [11]–[16], but not when it has a lower feature contrast [11]–[13]. On this basis, it has been claimed “that the initial shift of attention [is directed] to the most salient singleton” [3] and “that the bottom-up salience signal of the stimuli in the visual field determines the selection order” [3].

In terms of functional architecture, stimulus-driven selection in visual search is thought to be mediated by an attention-guiding ‘master’ [17], ‘activation’ [6], or ‘salience map’ [18]–[20], which codes the physical distinctiveness of each location in the field in terms of its total feature contrast against the surrounding locations: the more a stimulus differs from those in its surround (e.g. a bar tilted by 45°, as compared to 7°, amongst vertical bars), the stronger its salience signal. A winner-take-all mechanism then selects, for focal-attentional allocation, the location on the salience map that exhibits the highest level of activation. In terms of the computations involved, existing models assume that after low-level feature extraction, a center-surround algorithm returns contrast images for each feature channel; these feature contrast maps are later combined to form the feature-independent salience map, which serves as the basis for the attentional selection mechanism [18]. Although, in principle, attention is guided to the location with the highest activation, salience models typically assume noise to influence some stage(s) of salience computation [19], [20]. Noisy coding turns selection into a stochastic process: the more salient the target, the higher the probability that it is the first item selected. The assumption of noise influencing attentional guidance is shared by prominent models of visual search [6], [8], [10], [21].

Noise turns computed salience into a random variable with a certain distribution and an expected value. Consequently, these models require a differentiation of the concept of ‘salience’: salience may refer (i) to the expected value of the distribution of salience estimates, which corresponds to the distinctiveness of each item from its surround, as captured by contrast images or image statistics [22]–[24]; or (ii) to the actual outcome of the salience computation process on a given trial, which is subject to variability (due to noise) and can thus deviate from the expected value. To illustrate this differentiation, it is instructive to liken salience-based selection to motion (direction) discrimination treated as a decision process [25]. Discrimination of motion direction within random dot kinematograms is a frequently used paradigm in the modeling of decisions [26]. Typically in this paradigm, some 100 dots are moving within a bounded area (some 3° of visual angle in diameter): a proportion of dots move coherently to either the left or the right, while the remaining dots have random trajectories. The observer's task is to indicate the direction of the coherent motion. The decision model [25] presupposes the existence of motion-sensitive cells whose rate of firing is proportional to the coherence of motion in a specific direction. For the left versus right decisions, the relevant cells are those tuned to leftward and, respectively, rightward motion within their receptive fields. Hence, when a patch of dots is presented with a proportion of dots moving coherently e.g. to the right, signal detection models of this decision assume that the cells of both types exhibit activity, which is noisily distributed around different means. In particular, with rightward coherent motion in a random dot kinematogram, the activity induced in ‘right cells’ would be distributed around a mean value greater than that of the activity induced in ‘left cells’.
The higher the proportion of coherently moving dots in the display, the farther apart the means of the two activity distributions are. A decision could be made by drawing one sample of evidence from the ‘left’ unit and one from the ‘right’ unit, choosing that direction which shows a higher level of evidence [27]. Decision models that do not only describe the outcome of decisions (as is the case with signal detection models), but also the distribution of decision times assume that the noisy activity of the motion-sensitive cells is integrated, or accumulated, over time. The output of this accumulation process, the decision variable, is constantly compared against a decision criterion, until the decision is made. That is, the noisy activity of motion detectors (e.g. in MT) is accumulated into a decision variable (presumably in the lateral intraparietal sulcus, LIP), based on which the decision is made.

We propose a similar logic for salience-based selection. Instead of two motion detectors for the two relevant directions in a random-dot motion discrimination task, we posit salience detectors for each location of visual space which are sensitive to feature contrast. These detectors have previously been assumed to be noisy. Instead of a signal detection theory-based decision, such as in Guided Search 2.0 [10], we propose that each detector's activity is accumulated into a decision variable over time. All these decision variables are constantly compared against a criterion, with the first accumulator whose activity reaches the criterion leading to attentional selection of the respective location. Accordingly, this model of selection does not only describe the outcome, but also the time course of selection decisions. That is, salience-based selection, rather than being taken to consist of two successive steps, namely ‘salience computation’ followed by ‘attentional selection’, is considered a dynamic process in which a noisy signal is accumulated over time until it triggers a selection decision.
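
The accumulation-to-criterion logic can be illustrated with a minimal simulation sketch. It is not the model fitted in this article: the discrete-time random walk, the Gaussian noise, and all parameter values (`drifts`, `noise_sd`, `criterion`, `dt`) are assumptions chosen for illustration only. One accumulator per salient location races toward a common criterion, and the first to reach it determines which location is selected.

```python
import math
import random


def race_to_criterion(drifts, criterion=1.0, noise_sd=1.0, dt=0.001,
                      rng=None, max_steps=100_000):
    """Simulate one selection decision.

    Each location has an accumulator whose level grows by drift * dt plus
    Gaussian noise on every time step. Returns (index of the first
    accumulator to reach the criterion, selection time in seconds).
    All parameter values are illustrative assumptions.
    """
    rng = rng or random.Random()
    levels = [0.0] * len(drifts)
    step_sd = noise_sd * math.sqrt(dt)
    for step in range(1, max_steps + 1):
        for i, drift in enumerate(drifts):
            levels[i] += drift * dt + rng.gauss(0.0, step_sd)
        winner = max(range(len(levels)), key=levels.__getitem__)
        if levels[winner] >= criterion:
            return winner, step * dt
    raise RuntimeError("no accumulator reached the criterion")


def capture_proportion(target_drift, distractor_drift, n_trials=1000, seed=0):
    """Proportion of simulated trials on which the distractor wins the race."""
    rng = random.Random(seed)
    captures = 0
    for _ in range(n_trials):
        winner, _time = race_to_criterion([target_drift, distractor_drift], rng=rng)
        captures += (winner == 1)  # index 1 is the distractor
    return captures / n_trials
```

In this sketch, a distractor with a lower drift rate than the target still wins on a minority of trials, because the noisy accumulation paths overlap.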

Thus, as becomes apparent from the above considerations, there are two conceptually different notions of salience. The construct of physical feature contrast, which corresponds to motion coherence in the random dot kinematogram, is represented as sensory data by the activity of salience detectors in the brain (analogous to the activity of motion detectors representing motion coherence). This momentary neural representation is distributed around its mean, that is, it is a noisy signal. Because the expected salience value, that is the mean of the neural salience representation, is not linearly related to physical feature contrast [28], [29], it needs to be estimated. This estimation is the intent of current salience models [22]–[24]. However, relevant for selection on a given trial is the accumulated signal of the neural representation, which is the decision variable. For clarity, in the remainder of the article, we refer to the concept of expected salience value as stimulus salience and the actual or accumulated estimate as selection salience, because the latter is the basis for attentional selection on a given trial. Stimulus salience is related to physical stimulus properties: for instance, a horizontal bar among vertical bars has a higher stimulus salience than a bar tilted by 30°. Solely based on the value of stimulus salience, focal-attentional selection would have to favor the horizontal bar. However, owing to noise in the computation process, the resulting estimates (i.e. selection salience) are distributed around the expected value of stimulus salience. Hence, if the distributions of selection salience for horizontal and 30° orientation contrasts overlap, first selection of the 30° bar is possible in principle: the selection salience of the 30° bar can be higher on a given trial than that of the horizontal bar. 
Stimulus and selection salience do not usually have to be differentiated in standard visual search (detection) tasks with only one salient target being present – because, despite noise, the selection salience distributions of target and non-targets virtually never overlap and the selection salience of a non-target is thus virtually never higher than that of the target. However, this differentiation becomes important when two conspicuous stimuli are presented, but only one is task-relevant: if selection salience is higher for the irrelevant (distractor) stimulus, even though its stimulus salience is lower than that of the relevant (target) item, it will nevertheless be attentionally selected first.

Thus, because of the noisy salience computation, in the distractor visual search paradigm, attentional capture would occur when the distractor has a higher selection salience than the target. A distractor can have a higher selection salience if its stimulus salience is higher, equal, or even lower compared to that of the target, depending on the overlap between the distributions of the target's and the distractor's selection salience. Consequently, (i) the occurrence of attentional capture would be proportional to the relative stimulus salience of the target and the distractor and (ii) distractors even less stimulus salient than the target would capture attention in a proportion of trials. This implies that if the proportion of attentional capture events is high, RT interference would be large; and if it is low, interference would be small.
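
A worked example may make the overlap argument concrete. The sketch below assumes, purely for illustration, that the selection salience values of target and distractor are independent and Gaussian; the text specifies only that they are noisy, so the distributional form, the function name, and the parameter values are all hypothetical. Under that assumption, the probability that the distractor's selection salience exceeds the target's is Φ((μ_D − μ_T) / √(σ_T² + σ_D²)).

```python
import math


def capture_probability(mu_target, mu_distractor, sd_target=1.0, sd_distractor=1.0):
    """P(distractor selection salience > target selection salience),
    assuming independent Gaussian noise around each stimulus salience."""
    z = (mu_distractor - mu_target) / math.sqrt(sd_target ** 2 + sd_distractor ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Distractor stimulus salience relative to the target (arbitrary units).
for diff in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"distractor - target = {diff:+.1f}: "
          f"P(capture) = {capture_probability(0.0, diff):.3f}")
```

The probability is 0.5 for equal stimulus salience and stays above zero for less salient distractors, falling off gradually as the distributions separate.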

Note, however, that this hypothesis has never been tested directly. Most studies of attentional capture have used only singleton distractors that were more salient than the target [14], [30]–[33], and so cannot address this issue at all. On the other hand, there are a few studies that have contrasted (at most) two stimulus salience conditions [11]–[13], [34]. But even then, one cannot logically make any inferences about the stochastic dependency of selection (order) on stimulus salience (quite apart from the fact that interference effects heavily depend on the sample that is drawn from all possible stimulus salience values; that is, the studies with two settings are likely to have contrasted only extreme, low and high, values of stimulus salience). In other words, although salience and visual search models assume noise in the selection process accounting for attentional capture by less stimulus-salient distractors, there is, to our knowledge, as yet no empirical evidence for this assumption. Testing this assumption would require varying the salience of targets and distractors parametrically, rather than (just) dichotomously.

Against this background, the present study was designed to test the hypothesis of stochastic dependency between stimulus salience and attentional selection [10], [21], using a combined approach of behavioral evidence and quantitative modelling [18]–[20]. In the behavioral part, we parametrically manipulated the stimulus salience of pop-out targets and pop-out distractors – so as to be able to (i) examine the occurrence of attentional capture across a greater range of stimulus salience values and (ii) determine the quantitative relationship between stimulus salience and attentional selection, that is, selection salience. To achieve these aims, it was necessary to quantify the difference in stimulus salience between targets and distractors – which we did by means of a visual search go/no-go detection task in which each of the pop-out stimuli, whether it served as a target or a distractor in the visual search distractor task, was presented as a single, to-be-detected pop-out stimulus (i.e., without an irrelevant pop-out stimulus being present in the display). The detection RTs measured in this task served as estimates for stimulus salience. The difference in stimulus salience between a given target-distractor pair in the visual search distractor task was then quantified in terms of the difference in their associated detection RTs when they were presented alone in the visual search detection task. This procedure permitted us to compare stimulus salience across different dimensions.

Given that noise in the salience computation process turns attentional selection into a stochastic process, we expected (i) RT interference to be dependent on the relative stimulus salience and (ii) even less stimulus-salient distractors (compared to the target) to interfere, that is capture attention, in some proportion of trials. By contrast, if salience is not a random variable, as suggested by some authors [11], [12], or noise is too small to affect attentional selection between two salient stimuli, attentional capture should occur only with distractors more stimulus-salient than the target. In order to verify that RT interference by less salient distractors is indeed caused by attentional capture, we recorded eye movements in an additional experiment with distractors less salient than the target.

As a second step, we computationally modeled the results of the behavioral visual search distractor experiment; specifically, we modeled selection salience in the distractor paradigm based on the stimulus salience parameters estimated from the behavioral data in the detection task (see also [35]). The model we implemented is based on two-stage models of visual search, which assume that stimulus salience is computed spatially in parallel for all items in the display (stage 1) and then focal attention is allocated to the item with the highest selection salience value (stage 2). Note that our model only describes the first step of this process: the salience-based decision as to what location in space attention should select. The second step, including attentional engagement and stimulus identification, is outside the scope of the present model. The only model that (to our knowledge) has made the distinction between stimulus salience and selection salience explicit is Guided Search [10]. GS assumes that the selection salience value is stochastically related to stimulus salience, that is, pre-attentive salience coding for each item in the display is subject to noise, necessitating a signal-detection-type decision [36] as to which item to transfer to the second, focal-attentional processing stage. Signal detection models, in general, account for response proportions, such as those of hits and false alarms, but not for the temporal duration of the underlying decisions. Likewise, GS makes statements only about the proportion of selection decisions directed to the target versus to a non-target, but not the time course with which the decisions are made. However, pop-out targets can differ in the speed with which they are singled out, that is, they can be equivalent in terms of selection proportion (the target is always selected first), but differ in the time it takes until the item is selected.
Behaviorally, it has been demonstrated that targets that pop out (i.e., that have flat RT/set-size functions) can differ in detection RTs [37]–[40]. For example, among vertical bars, both a target tilted by 45° and one tilted by 12° pop out, but differ in their associated detection RTs. Töllner, Zehetleitner, Gramann, and Müller [41] demonstrated that such differences in RTs are indeed attributable to differences in selection times: the latency of the so-called N2pc component of the EEG, which is assumed to reflect the transition from pre-attentive to post-selective stimulus processing [42], [43], increased as a function of decreasing stimulus salience of the pop-out target. Given this finding and the notion that a selection decision is based on the accumulated sensory evidence [25], we considered it important to take into account the time course of selection decisions in our model; that is, we simulated the data of the visual search distractor paradigm in a new model of salience-based selection that assumes a time course of selection decisions and thus permits the proportion of capture trials to be predicted for a given salience difference (derived from the respective detection RTs) between target and distractor.

In summary, the present study had two goals, one empirical and one theoretical. Empirically, it was designed to test two central predictions of visual search and salience models: in a distractor paradigm, (i) RT interference should be proportional to the difference in stimulus salience between target and distractor, and (ii) interference should also be observed with distractors less stimulus-salient than the target. Furthermore, assuming that this RT interference is actually caused by attentional capture (rather than some filtering cost [44]) less stimulus-salient distractors should also be found to capture the eyes. Theoretically, the study was intended to computationally model the conceptual distinction between stimulus salience (as estimated by RTs in a search detection task without distractors) and selection salience, the noisy estimate of stimulus salience computed by the pre-attentive visual system. To this end, the data of the behavioral visual search distractor experiment were modeled, based on the behaviorally estimated stimulus salience parameters. The model makes predictions about which item is selected first, rather than about RT interference.

Behavioral Reaction Time Experiment

Methods

Ethics statement.

Participants gave their written informed consent. The study was approved by the ethics committee of the Department of Psychology, LMU Munich, in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki).

Participants.

Fifteen paid (€ 16) volunteers, with a median age of 27 (range 20–50) years, five of them male, all dextral and with corrected-to-normal visual acuity, participated in this study.

Stimulus presentation and data acquisition.

The experiment was conducted in a sound-insulated room, and was controlled by a program purpose-written in C++. Stimuli were presented on a 19″ View Sonic Graphics Series G 90 fB monitor at a resolution of 1,024×768 pixels and a refresh rate of 85 Hz; viewing distance was approximately 57 cm. Participants responded using their left and right index fingers, respectively, to press one of two vertically arranged buttons on a purpose-built response pad. RTs and response accuracy were recorded online.

The display consisted of 39 vertical broken grey bars presented on black background and arranged on three imaginary concentric circles (1.88°, 3.25°, and 4.63° of visual angle in radius, with 8, 12, and 18 bars, respectively) around the center of the screen, which was occupied by another bar. Bars were 0.25°×1.13° in size and had a 0.13°-gap randomly located at the top or the bottom of each bar. Targets differed from non-targets in orientation (7, 8, 9, 14 and 45° tilted from vertical), and distractors differed from non-targets in luminance (13.8, 14.8, 17.9, 19.4, and 25.5 cd/m² for distractors and 5.25 cd/m² for non-targets). A pilot experiment was conducted to ensure that target and distractor salience was sufficient for these stimuli to ‘pop out’ from the search array, that is, their associated detection times were independent of the number of non-targets in the display (see Text S1 and Table S1).
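
The display geometry can be sketched in a few lines. The radii and the number of bars per circle are taken from the description above; equal angular spacing, the start angle, and the function name `bar_positions` are assumptions, since the text does not specify how bars were distributed along each circle.

```python
import math

# (radius in degrees of visual angle, number of bars), as given in the text
RINGS = [(1.88, 8), (3.25, 12), (4.63, 18)]


def bar_positions(rings=RINGS, start_angle=0.0):
    """Return (x, y) positions in degrees of visual angle, center bar first.

    Equal angular spacing and the start angle are assumptions; the text
    specifies only the radii and the number of bars per circle.
    """
    positions = [(0.0, 0.0)]  # the bar occupying the screen center
    for radius, count in rings:
        for k in range(count):
            angle = start_angle + 2.0 * math.pi * k / count
            positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


print(len(bar_positions()))  # 8 + 12 + 18 bars on the circles plus 1 center bar
```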

Design and procedure.

Two 1-hr sessions were carried out on consecutive days, at the same time of day. The first part of each session was the distractor experiment; the second part was a post-experiment for stimulus salience measurement (for the latter, see Baseline salience measurement). The within-subject design of the distractor experiment was 2 (distractor present vs. absent)×5 (target salience)×5 (distractor salience) factorial, resulting in 25 salience difference conditions. A target was present on all trials; a distractor occurred randomly in 50% of the trials. Target and distractor were placed randomly at the 12 possible positions on the second circle to keep eccentricity constant. All salience difference conditions were presented in random order within blocks. Participants completed 20 blocks of 50 trials each day, yielding a total of 2,000 trials and 40 trials per salience difference condition.
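
The trial counts follow from simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the 40 trials per salience difference condition refer to distractor-present trials, which make up half of all trials.

```python
def trials_per_condition(blocks_per_day, trials_per_block, days,
                         distractor_proportion, n_conditions):
    """Return (total trials, distractor-present trials per condition)."""
    total = blocks_per_day * trials_per_block * days
    distractor_trials = total * distractor_proportion
    return total, distractor_trials / n_conditions


# 20 blocks x 50 trials x 2 days; 50% distractor-present; 5 x 5 conditions
total, per_condition = trials_per_condition(20, 50, 2, 0.5, 5 * 5)
print(total, per_condition)
```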

Each trial started with a white fixation dot (radius = 0.05°) presented for a duration uniformly distributed between 700 and 1,100 ms, which was then replaced by the search display, which remained present until response (Figure 1A). Participants were instructed to indicate, as quickly and accurately as possible, the gap location (top or bottom) of the target by pressing the upper or lower button, respectively. In case of an error, visual feedback was provided, followed by an additional 500-ms blank screen before the next trial. At the end of each block, participants were informed about their mean RT and error rate.

Figure 1. Experimental design and stimuli.

(a) A search display, consisting of 39 broken grey bars arranged around three imaginary concentric circles, was presented in the center of the screen, on a black background. There was always an orientation target; and in half of the trials (randomly determined), there was also a luminance distractor. Each trial started with a white fixation spot that was hidden while the display was presented until response. Inter-stimulus-intervals varied randomly in the range 900±200 ms. While ignoring a bright distractor, participants searched for a tilted target bar and decided, via a speeded button press, whether the gap was located at the top or the bottom of the bar. This response decision required focal attention to be allocated to the target. (b) 25 Salience difference conditions resulted from 5 orientation (7, 8, 9, 14, 45°) and 5 luminance (13.8, 14.8, 17.9, 19.4, and 25.5 cd/m²) contrasts.

https://doi.org/10.1371/journal.pone.0052595.g001

Baseline salience measurement.

Because salience is not linearly related to physical contrast [29], we used a behavioral measurement of salience, which was collected in a post-experiment after each session of the distractor experiment. Stimuli were the same as in this experiment. All target orientation and distractor luminance contrasts from the distractor experiment (Figure 1B) were presented as (to-be-detected) targets randomly intermixed with target-absent displays (as in the distractor experiment, targets never occurred on the outer circle). The design was 2 (target presence vs. absence)×2 (dimension luminance vs. orientation)×5 (contrast) factorial. Dimensions were blocked, contrasts were mixed within blocks. Participants' task was to indicate the presence of an orientation or luminance target via button press; response was to be withheld if no target was present. Four blocks consisting of 80 trials were performed each day, yielding a total of 640 trials and 32 trials per contrast condition. The stimulus display was presented until response or a maximum of 1,200 ms. Error feedback was provided visually, immediately after the false response.

Using these detection RTs as our measure of stimulus salience, we calculated the salience difference between stimuli by subtracting the distractor's detection RT from the target's detection RT. For example, if a target was detected with an RT of 300 ms and a distractor with an RT of 400 ms, then their salience difference was −100 ms. Note that items of higher salience are associated with shorter RTs; negative salience differences indicate a distractor less salient than the target, and positive differences a distractor more salient than the target. This salience difference measure served as the independent variable in the distractor experiment.
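
The salience difference measure can be written out as a short sketch. The contrast levels are those of the experiment, but the detection RTs in the two dictionaries are hypothetical values invented for illustration; they are not the measured data.

```python
def salience_difference(target_rt_ms, distractor_rt_ms):
    """Target detection RT minus distractor detection RT, in ms.

    Negative values indicate a distractor less salient than the target
    (the distractor took longer to detect).
    """
    return target_rt_ms - distractor_rt_ms


# Hypothetical mean detection RTs (ms) per contrast level, illustration only.
orientation_rts = {7: 420, 8: 400, 9: 385, 14: 350, 45: 310}            # targets (deg)
luminance_rts = {13.8: 410, 14.8: 395, 17.9: 360, 19.4: 345, 25.5: 320}  # distractors (cd/m^2)

# One salience difference per target-distractor pairing (5 x 5 = 25 conditions).
grid = {
    (ori, lum): salience_difference(t_rt, d_rt)
    for ori, t_rt in orientation_rts.items()
    for lum, d_rt in luminance_rts.items()
}

print(salience_difference(300, 400))  # the example from the text
```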

Data analysis.

Only correct-response trials were used for analysis (distractor experiment: 96.5%; baseline salience measurement: 99.0%), excluding RTs shorter than 150 and longer than 1,500 ms in the distractor experiment (0.8%) and shorter than 150 and longer than 1,000 ms in the baseline salience measurement (0.2%). The first 20 trials (first 10 trials of the baseline salience measurement) of each session and the first 3 trials of each block served as practice trials and were also excluded from analysis. RT interference was calculated by subtracting mean RTs for target-only trials from mean RTs for target-plus-distractor trials. Statistical data analysis was carried out with R software [45]. Regression analyses were conducted with n = 25 salience difference conditions (aggregated across 15 participants); t-tests for RT interference of less salient distractors were conducted with n = 15 participants.
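
The trimming and interference computation can be sketched as follows. The trial records and their field names (`rt`, `correct`, `distractor`) are hypothetical; the RT window is the one stated above for the distractor experiment, and the exclusion of practice trials is omitted for brevity.

```python
from statistics import mean


def valid_rts(trials, lower_ms=150, upper_ms=1500):
    """Keep RTs of correct trials that fall inside the [lower, upper] window."""
    return [t["rt"] for t in trials
            if t["correct"] and lower_ms <= t["rt"] <= upper_ms]


def rt_interference(trials):
    """Mean RT on distractor-present trials minus mean RT on target-only trials."""
    present = valid_rts([t for t in trials if t["distractor"]])
    absent = valid_rts([t for t in trials if not t["distractor"]])
    return mean(present) - mean(absent)


# Hypothetical trials, for illustration only.
trials = [
    {"rt": 650, "correct": True, "distractor": True},
    {"rt": 690, "correct": True, "distractor": True},
    {"rt": 1800, "correct": True, "distractor": True},   # too slow, excluded
    {"rt": 620, "correct": True, "distractor": False},
    {"rt": 640, "correct": True, "distractor": False},
    {"rt": 500, "correct": False, "distractor": False},  # error, excluded
]
print(rt_interference(trials))
```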

To test for the dependency of RT interference on relative salience between target and distractor, we used nonlinear least-squares estimation for regression function fitting. The nonlinear function followed the form:

$$\text{RT interference}(d) = \frac{a}{1 + e^{(p - d)/g}} \tag{1}$$

where a is the asymptote or maximum RT interference, d the salience difference, p the inflection point, and g the growth factor of the function.
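
The fitting step can be illustrated with a minimal sketch. It assumes that Equation 1 is a logistic function of the form a / (1 + exp((p − d) / g)), which matches the parameter descriptions (asymptote, inflection point, growth factor) but is an assumption here. A coarse grid search stands in for a proper nonlinear least-squares routine, and the data are synthetic and noise-free, generated from hypothetical parameter values.

```python
import math


def logistic(d, a, p, g):
    """Assumed form of the regression function: a / (1 + exp((p - d) / g))."""
    return a / (1.0 + math.exp((p - d) / g))


def sse(params, ds, ys):
    """Sum of squared errors between observed and predicted interference."""
    a, p, g = params
    return sum((y - logistic(d, a, p, g)) ** 2 for d, y in zip(ds, ys))


def grid_fit(ds, ys, a_grid, p_grid, g_grid):
    """Return the (a, p, g) grid point with the smallest sum of squared errors."""
    best_err, best_params = None, None
    for a in a_grid:
        for p in p_grid:
            for g in g_grid:
                err = sse((a, p, g), ds, ys)
                if best_err is None or err < best_err:
                    best_err, best_params = err, (a, p, g)
    return best_params


# Synthetic, noise-free data from hypothetical parameters (a=70, p=5, g=30).
ds = [-150, -100, -50, 0, 50, 100, 150]
ys = [logistic(d, 70, 5, 30) for d in ds]

fitted = grid_fit(ds, ys, range(50, 91, 5), range(-20, 21, 5), range(10, 51, 5))
print(fitted)
```

Fixing the inflection point at 0 corresponds to passing a single-element `p_grid`, which leaves only the asymptote and the growth factor free to vary.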

Goodness of fit comparison of the regression functions was carried out using the Bayes Information Criterion [46], which is calculated according to

$$\mathrm{BIC} = -2 \ln L + k \ln n \tag{2}$$

where L is the maximum likelihood of the data under the regression function, k the number of parameters to be estimated, and n the number of observations. Smaller BIC values indicate a better model fit.
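
The criterion can be computed directly from the standard definition, BIC = −2 ln L + k ln n. In the sketch below, the helper for the log-likelihood assumes independent Gaussian errors with variance estimated as RSS / n, a common choice for least-squares fits; whether the analysis software counts the error variance as an additional parameter is not specified in the text and is left to the caller via `n_params`.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayes Information Criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def gaussian_log_likelihood(residuals):
    """Maximized log-likelihood of a least-squares fit, assuming
    independent Gaussian errors with variance RSS / n."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


# With n = 25 observations, each additional free parameter adds ln(25) to the BIC.
print(round(bic(-80.0, 3, 25) - bic(-80.0, 2, 25), 2))
```

With 25 salience difference conditions, a model with one more free parameter must therefore improve −2 ln L by more than about 3.2 to be preferred.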

Results and Discussion

We investigated the order of attentional selection in a distractor experiment with a unique, orientation-defined pop-out target present on all trials and a unique, luminance-defined pop-out distractor randomly interspersed in half the trials (Figure 1A; for stimulus pop-out characteristics, see Text S1 and Table S1). Target orientation and distractor luminance were manipulated such that the salience difference between the two items was varied parametrically in 25 steps (Figure 1B). Stimulus salience was estimated in a post-experiment (Baseline salience measurement) in which no distractors were presented and targets could be defined in the orientation or the luminance dimension. The times required to detect these targets served as salience estimates for the stimuli in the distractor experiment (Figure 2). We used the mean salience difference values of all participants to predict RT interference on distractor-present, compared to distractor-absent, trials using nonlinear regression functions. RT interference in this task is commonly attributed to automatic prior selection of the distractor, and absence of interference to direct selection of the target [12].

Figure 2. Empirical data of the baseline salience measurement and data fitted by the accumulator salience model.

Left panel: five salience levels of orientation targets. Right panel: five salience levels of luminance targets. Symbols depict RT quantiles of each condition as follows: o = .1, Δ = .3, + = .5, × = .7, and ◊ = .9. Lines represent RTs generated by the model. Fitted RTs differ from empirical RTs by 5 ms on average (range: 0 to 28 ms). Additional parameter estimates were T_er = 300 ms, s_er = 70 ms, a = .08, and β = .294.

https://doi.org/10.1371/journal.pone.0052595.g002

Figure 3A presents the observed RT interference (for correct-response trials), averaged across participants (mean RT [± SEM] on distractor-present trials = 660 [±12.9] ms; mean RT interference = 28 [±4.4] ms), for luminance-defined distractors and orientation-defined targets as a function of their salience difference. RT interference was strongly correlated with the salience difference (n = 25; Pearson's r = .91 [t(23) = 10.8, p<.001]), indicative of the order of selection (‘target first’) being dependent on relative object salience. This relationship already exhibits the expected characteristics: (i) the magnitude of interference varies with the salience difference between target and distractor, and (ii) distractors considerably less salient than the target do interfere with search.
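
The test statistic for a Pearson correlation follows from t = r √(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below applies this standard formula; the function name is hypothetical. With the rounded value r = .91 it yields a t of about 10.5, so the reported t(23) = 10.8 was presumably computed from the unrounded correlation (a value near .914 reproduces it).

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r based on n observations (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


print(round(t_from_r(0.91, 25), 2))   # rounded r as reported
print(round(t_from_r(0.914, 25), 2))  # a hypothetical unrounded r
```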

Figure 3. Behavioral interference and modeled proportion of capture as a function of salience difference.

(a) Empirical RT interference, averaged across participants, represents the RT difference, in ms, between distractor-present and distractor-absent trials. Salience difference, averaged across participants, was derived from detection times in the baseline salience measurement requiring a simple target-present vs. target-absent decision (see Methods of Behavioral reaction time experiment). Negative x-values indicate distractors less salient, and positive x-values distractors more salient than the target. Dots represent mean values of RT interference for each salience difference condition (n = 25); arrows indicate the associated standard errors. Red dots indicate significant RT interference by distractors significantly less salient than the target (t-tests: p<.05). Solid curve: regression function curve R₂. (b) Proportion of capture in the distractor experiment was predicted by salience difference, derived from fitting empirical salience difference values. Again, dots represent mean values of RT interference for each salience difference condition (n = 25). The curve depicts the nonlinear relationship according to R₂.

https://doi.org/10.1371/journal.pone.0052595.g003

Next, we fitted two nonlinear regression functions to the data, one with the inflection point free to vary (R₁) and one in which it was fixed to 0 ms salience difference (R₂). We then compared the functions' goodness of fit by examining their Bayesian Information Criterion (BIC) values [46], where smaller BIC values indicate a better fit. Regression function R₁ yielded an asymptote of 73 ms, an inflection point of 7 ms, and a growth factor of 29 ms. For the nonlinear regression function R₂, where the inflection point was set to 0 ms, the RT interference asymptote was estimated to be 67 ms, and the growth factor to be 26 ms salience difference. BIC value comparison confirmed regression function R₂ (with the inflection point set to 0 ms) to fit the data better than R₁ (BIC_R1 = 178 vs. BIC_R2 = 175; see Table 1 for details).
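
To make the two regression variants concrete, the following sketch evaluates both with the parameter estimates reported above. Formula (1) is not reproduced in this section, so the three-parameter logistic used here (asymptote, inflection point, growth factor) is an assumption about its form, and the function name `rt_interference` is hypothetical.

```python
import math

def rt_interference(x_ms: float, asymptote: float, inflection: float, growth: float) -> float:
    """Assumed logistic form: interference (ms) rises from 0 towards `asymptote`
    and equals asymptote / 2 at the inflection point."""
    return asymptote / (1.0 + math.exp(-(x_ms - inflection) / growth))

# Parameter estimates as reported in the text (all in ms).
R1 = dict(asymptote=73.0, inflection=7.0, growth=29.0)   # inflection point free
R2 = dict(asymptote=67.0, inflection=0.0, growth=26.0)   # inflection point fixed at 0 ms

# Negative x: distractor less salient than the target; positive x: more salient.
for x in (-50, -25, 0, 25, 50):
    print(f"{x:>4} ms  R1: {rt_interference(x, **R1):5.1f} ms  R2: {rt_interference(x, **R2):5.1f} ms")
```

Under this assumed form, R₂ predicts half of its asymptote (33.5 ms) at a salience difference of 0 ms, and both functions predict nonzero interference at negative salience differences.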

Table 1. Parameter estimates of the model predictions fitted to empirical and modeled data.

https://doi.org/10.1371/journal.pone.0052595.t001

These results argue in favor of a proportional first selection of the distractor dependent on its salience difference to the target. The function where the inflection point was set to 0 ms indicates that equally salient targets and distractors are equally likely (50%) to be selected first. First-selection probability for a given item then increases as its relative salience increases. The shift of the inflection point into the positive range in regression function R₁ indicates that at the point at which selection probability is equal for both items, the target is actually less salient than the distractor (rather than the two stimuli being equi-salient). This might reflect an influence of top-down control, permitting the target to compensate for this discrepancy in relative salience. However, reconsidering our measure of relative salience, it is possible that target and distractor salience is not the same in the distractor experiment as measured in the baseline salience measurement. There are three possibilities of how they may differ between tasks. First, if a stimulus is presented alone as in the baseline salience measurement, the display is more homogeneous compared to when an additional distractor is presented – in which case salience might be overestimated in the baseline salience measurement relative to the distractor experiment. However, because this would apply to both the target and the distractor, this should not affect relative salience in the distractor experiment. A second reason for diverging relative salience in the distractor experiment derives from the fact that stimulus salience was measured after the distractor experiment. 
One might argue that assigning the role of target to the orientation dimension (and that of distractor to the luminance dimension) in the distractor experiment induces ‘priming’ for orientation-defined singletons, resulting in an overestimation of target salience and an underestimation of distractor salience in the subsequent baseline salience measurement. The implication is that at 0 ms salience difference, the distractor would actually be more salient than the target and the true point of equal salience would lie in the negative range of salience differences. However, according to Maljkovic and Nakayama [47], priming effects for the orientation dimension, as an aftereffect of having been assigned the target role in the distractor experiment, should dissipate within a few trials in the baseline salience measurement. Third, stimulus salience might be different in the distractor experiment because of top-down weighting [48]–[51]. When both stimuli are presented together, as in the distractor experiment, the weight of the target might be up-modulated and that of the distractor down-modulated. That is, the salience values determined in the baseline salience measurement would be under-estimates for targets and over-estimates for distractors. If this were the case, true equality of salience should be in the positive range of salience differences and the distractor would be even less salient than the target at the point of 0 ms salience difference. To test for the latter two possible types of salience estimation errors, we fitted regression functions with varying inflection points from −10 to 15 ms salience difference and calculated the corresponding BIC values. As Figure 4 shows, BIC was lowest for a regression function with the inflection point in the positive range of salience differences.
This implies that at 0 ms salience difference, in the distractor experiment, the distractor is still less stimulus-salient than the target and top-down weighting shifts the point of equal salience into the positive range. Consequently, our measure of salience difference is rather conservative; that is, RT interference by less salient distractors is actually even higher than we have assumed here.
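
The scan over fixed inflection points can be illustrated as follows. This is a minimal sketch under several assumptions: the regression function is taken to be logistic, BIC is computed in its common least-squares form n·ln(RSS/n) + k·ln(n), fitting is done by a coarse grid search rather than a proper optimizer, and the data are synthetic values generated for illustration only (they are not the study's data). All function names are hypothetical.

```python
import math
import random

def logistic(x, asymptote, inflection, growth):
    """Assumed form of the regression function."""
    return asymptote / (1.0 + math.exp(-(x - inflection) / growth))

def bic(rss, n, k):
    """Least-squares BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

def best_rss_fixed_inflection(xs, ys, inflection):
    """Coarse grid search over asymptote and growth with the inflection point held fixed."""
    best = float("inf")
    for asymptote in range(40, 101, 2):
        for growth in range(10, 51, 2):
            rss = sum((y - logistic(x, asymptote, inflection, growth)) ** 2
                      for x, y in zip(xs, ys))
            best = min(best, rss)
    return best

# Synthetic illustration data: 25 conditions, noisy logistic (NOT the study's data).
rng = random.Random(0)
xs = [-60 + 5 * i for i in range(25)]
ys = [logistic(x, 70.0, 5.0, 28.0) + rng.gauss(0.0, 4.0) for x in xs]

# Scan fixed inflection points from -10 to 15 ms; k = 2 free parameters each.
scan = {x0: bic(best_rss_fixed_inflection(xs, ys, x0), len(xs), k=2)
        for x0 in range(-10, 16, 5)}
best_x0 = min(scan, key=scan.get)
for x0, value in scan.items():
    print(f"inflection {x0:>4} ms  BIC {value:7.2f}")
print("lowest BIC at inflection point:", best_x0, "ms")
```

Because every function in the scan has the same number of free parameters, the BIC ordering here reflects the residual error alone. By contrast, when comparing R₁ with R₂, the extra free parameter of R₁ costs ln(25) ≈ 3.2 BIC points under this definition, which is about the size of the reported difference between the two functions.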

Figure 4. Course of BIC dependent on the inflection point of the regression function.

Regression functions were fitted according to formula (1), with the inflection point as fixed parameter. Inflection points are specified in ms of salience difference.

https://doi.org/10.1371/journal.pone.0052595.g004

The nonlinear regression function already implies that distractors less salient than the target do interfere with search. To examine RT interference by less salient distractors more closely, we conducted t-tests for all salience differences for which the distractor was significantly less salient (criterion of 0 ms salience difference) than the target. These tests confirmed there are indeed distractors less salient than the target that produced significant RT interference (Figure 3A).

Overall, the findings that RT interference is sigmoidally related to relative salience and that less salient distractors capture attention are compatible with visual search and salience models [10], [18]–[24] that assume that the salience coding and, thus, the selection process is subject to internal noise.

Computational Model

A second, theoretical goal of the present study was to develop and test a computational model of how stimulus salience translates into selection salience, that is, a model accounting for the variation in the outcome of the selection process based on stimulus salience – concretely by simulating the data of the distractor paradigm. Importantly, the model we devised makes predictions about the item that is selected first (rather than directly about RT interference) and takes noise and the time course of selection, based on stimulus salience, into account. Selection is assumed to involve a decision between all stimuli in the display and the dynamics of selection processes to be stochastic in nature [10], [19]–[24], with the outcome being dependent on stimulus salience and a noise component.

In more detail, the model assumes that the salience map develops over time probabilistically (Figure 5). Each item in the visual scene is represented by a sensory-evidence accumulator unit, the drift rate of which corresponds to stimulus salience. Accumulation is assumed to be a leaky and noisy process [52]. That is, sensory evidence does not accumulate infinitely, but comes to settle eventually around an asymptotic value (mathematically, the ratio of the drift rate to the leak). A selection decision is triggered as soon as sensory evidence for a specific location exceeds a threshold. In this model, stimulus salience determines the drift rate with which sensory evidence is accumulated, and selection salience is the accumulating, or accumulated, sensory evidence. In contrast to this dynamic process, which is continuous over time, conventional models of visual salience essentially envisage a snapshot-like topographic representation of the (physical) feature contrasts present in the scene, which serves as the basis for selection decisions: the location of maximum contrast is attentionally selected by a winner-take-all mechanism, the time course of which is usually not modeled explicitly.
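
The behavior of a single leaky, noisy accumulator can be sketched as a discretized stochastic process. The update rule below (Euler-Maruyama steps of dx = (drift − leak·x)·dt + noise) is one common way to write such a unit and is an assumption here, as are all parameter values and function names; none are the fitted estimates from the study.

```python
import math
import random

def simulate_path(drift, leak, noise_sd, dt, n_steps, rng):
    """Sample path of dx = (drift - leak * x) dt + noise_sd dW, starting at x = 0."""
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += (drift - leak * x) * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def first_passage_time(path, threshold, dt):
    """Time of the first sample at or above threshold, or None if it is never reached."""
    for i, x in enumerate(path):
        if x >= threshold:
            return i * dt
    return None

rng = random.Random(1)
drift, leak, dt = 2.0, 4.0, 0.001   # hypothetical values; asymptote = drift / leak = 0.5

# Without noise the path settles at the ratio of drift rate to leak.
noise_free = simulate_path(drift, leak, 0.0, dt, 5000, rng)
print("final value:", round(noise_free[-1], 4), " drift/leak:", drift / leak)

# A threshold below the asymptote is reached; one above it never is (without noise).
print("selection time, threshold 0.4:", first_passage_time(noise_free, 0.4, dt))
print("selection time, threshold 0.6:", first_passage_time(noise_free, 0.6, dt))

# With noise, the selection time varies from run to run.
noisy_times = [first_passage_time(simulate_path(drift, leak, 0.3, dt, 5000, rng), 0.4, dt)
               for _ in range(5)]
print("noisy selection times:", noisy_times)
```

The noise-free path shows why the asymptote corresponds to the salience value of a deterministic map, while the noisy runs show how the same drift rate yields a distribution of selection times.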

Figure 5. Stochastic model of salience-based selection.

(a) For each location in the visual field, salience is accumulated over time t = {t₁, t₂,…, t_k} by leaky accumulators. Gray jagged lines represent sample paths of sensory evidence accumulation over time, influenced by noise. Mean accumulation behavior is indicated by solid black lines. Salience asymptotes s (s_t = target salience, s_d = distractor salience, s_nt = non-target salience) indicate maximum salience when time is infinite and noise absent; asymptotes correspond to the salience values of map locations computed by deterministic models. (b) Selection time distributions (t = target, d = distractor) indicate selection time variation due to noise. Overlap of these distributions (red area) marks the range within which a distractor may be selected first even if it is less salient than the target. (c) The final salience pattern evolves over time, as illustrated by heat maps at different points in time.

https://doi.org/10.1371/journal.pone.0052595.g005

For simulating the results of the distractor experiment, in a first step, we fitted the model to the empirical baseline salience measurements in order to obtain parameter estimates for stimulus salience; in the next step, these parameters were used to simulate selection salience in terms of the probability of a distractor versus a target being selected first.
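
The second step, estimating how often the distractor is selected before the target, can be sketched as a race between two leaky accumulators. This is a simplified illustration under stated assumptions: only the target and distractor units are simulated (non-target items are omitted), the drift rates, leak, noise level, and threshold are hypothetical values rather than the fitted estimates, and the function names are ours.

```python
import math
import random

def first_selected(drifts, leak, noise_sd, threshold, dt, max_steps, rng):
    """Index of the first accumulator to reach threshold, or None if none does."""
    x = [0.0] * len(drifts)
    scale = noise_sd * math.sqrt(dt)
    for _ in range(max_steps):
        for i, drift in enumerate(drifts):
            x[i] += (drift - leak * x[i]) * dt + scale * rng.gauss(0.0, 1.0)
        leader = max(range(len(x)), key=x.__getitem__)
        if x[leader] >= threshold:
            return leader
    return None

def capture_probability(target_drift, distractor_drift, n_trials=2000, seed=0):
    """Proportion of decided trials on which the distractor (index 1) is selected first."""
    rng = random.Random(seed)
    wins = [0, 0]
    for _ in range(n_trials):
        winner = first_selected((target_drift, distractor_drift), leak=2.0, noise_sd=0.5,
                                threshold=0.4, dt=0.005, max_steps=2000, rng=rng)
        if winner is not None:
            wins[winner] += 1
    decided = sum(wins)
    return wins[1] / decided if decided else float("nan")

# Hypothetical drift rates: target fixed at 2.0, distractor less, equally, or more salient.
p_less = capture_probability(2.0, 1.0)
p_equal = capture_probability(2.0, 2.0)
p_more = capture_probability(2.0, 3.0)
print(f"P(distractor first): less salient {p_less:.2f}, "
      f"equal {p_equal:.2f}, more salient {p_more:.2f}")
```

In this sketch the capture probability is close to one half when the drift rates are equal and stays above zero when the distractor has the lower drift rate, because the noisy selection time distributions of the two units overlap.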