Vision presents a constant flood of information to the brain, but only a small fraction of the items in the visual field merit further processing, either because they are distinct from other items or because they are likely to help observers meet their long-term goals (Connor, Egeth, & Yantis, 2004). Visual search is a widely used experimental technique for studying how visual attention selects items of interest from all other visible items. Prominent models of visual search, such as Guided Search (Wolfe, 2007) and FeatureGate (Cave, Kim, Bichot, & Sobel, 2005), compare each item’s visual features to its neighbors’ features and to a set of target features. Activation is then conferred on each item in proportion to the difference between its own features and its neighbors’ features, and to the similarity between its own features and the target features. These two kinds of activation represent the outputs from bottom-up and top-down attentional processing, respectively.

Bottom-up processing operates on a visual stimulus presented to the eyes, whereas top-down processing represents perceptual judgments and other mechanisms that lie outside of visual perception per se (Firestone & Scholl, 2014). By clearly distinguishing between bottom-up and top-down processing, models of visual search implicitly acknowledge the classic perception–cognition divide, which asserts that perception is cognitively impenetrable (Pylyshyn, 1999). The visual features that are subjected to bottom-up processing include the color, form, and motion of display items, rather than any higher-level meaning attached to those items (Wolfe & Horowitz, 2004). For example, “2” and “9” have distinct shapes, but also represent distinct numerical quantities. Many visual searches that were initially adduced as evidence that alphanumeric characters’ semantic associations can drive visual search (e.g., Egeth, Atkinson, Gilmore, & Marcus, 1973; Jonides & Gleitman, 1972) could be explained more parsimoniously in terms of shape differences (Duncan, 1983; Krueger, 1984). Indeed, Wolfe and Horowitz (2004) expressed doubt that alphanumeric characters’ semantic associations could be shown to guide search, because manipulating a character’s semantic association typically entails also manipulating its shape.

Recently, researchers have developed various techniques to control for shape differences while manipulating semantic associations in visual search experiments (Godwin, Hout, & Menneer, 2014; Lupyan, 2008; Lupyan & Spivey, 2008; Schwarz & Eiselt, 2012; Sobel, Puri, & Hogan, 2015). These studies overcame the methodological challenge posed by Wolfe and Horowitz (2004), but seem unlikely to violate the cognitive impenetrability of perception. For example, Sobel et al. found that search for targets that are near each other on the number line (e.g., 5 and 6) is faster and more efficient than search for targets that are distant (e.g., 5 and 9), and they argued that proximity on the number line primarily influences top-down rather than bottom-up processing. In this article, we intend to use the distinction between bottom-up and top-down processing to shed some light on a current debate in the size congruity literature (Arend & Henik, 2015; Santens & Verguts, 2011).

In a traditional size congruity experiment (Besner & Coltheart, 1979), participants view two different numbers and select the numerically larger (or smaller) one. The target also has a different physical size than the other number, so in some trials the target’s numerical and physical sizes are congruent (e.g., a numerically and physically large target, such as a physically large 8), whereas in other trials the target’s numerical and physical sizes are incongruent (e.g., a numerically large but physically small target, such as a physically small 8). In other size congruity experiments, participants select the target on the basis of its physical size (Henik & Tzelgov, 1982). Response times (RTs) are typically faster when the target’s numerical and physical sizes are congruent than when they are incongruent, implying that numerical size and physical size are not processed completely independently (Santens & Verguts, 2011). Although it seems clear that the processing of numerical and physical sizes must interact, there remains disagreement about the locus in the processing stream at which the interaction occurs.

Two opposing accounts predominate in the size congruity literature (Santens & Verguts, 2011; Schwarz & Heinze, 1998). According to the shared-representation account (Schwarz & Heinze, 1998; Walsh, 2003), numerical and physical sizes are initially mapped onto a single mental construct and remain integrated throughout the entire processing sequence. In contrast, the shared-decision account (Faulkenberry, Cruise, Lavro, & Shaki, 2016; Santens & Verguts, 2011) asserts that numerical and physical sizes are initially mapped onto two distinct mental constructs, and that the processing of numerical and physical sizes proceeds along separate parallel pathways that only interact at the decision level. Recently, Risko, Maloney, and Fugelsang (2013) proposed an alternative account based on attention. They noted that in visual searches, large items capture attention more than do small items (e.g., Proulx, 2010; Proulx & Egeth, 2008), so visual capture by the physically larger number might contribute to the size congruity effect (SCE). Risko et al. manipulated the stimulus onset, reasoning that the first item to appear would have the opportunity to trigger attentional processing before the other item. If the first item to appear were the physically larger item, it should enjoy advantages due to both its temporal onset and physical size, but if the first item to appear were the physically smaller item, it should only have the temporal-onset advantage. Consistent with the authors’ hypothesis, the SCE was larger when the physically larger item appeared first than when the physically smaller item appeared first.

Arend and Henik (2015) questioned the validity of the results in Risko et al. (2013), on the basis of two methodological limitations. In Risko et al., the participants indicated which of two items was numerically larger, but they were never asked to indicate which item was numerically smaller, nor were they ever asked to respond to the items’ physical size. We aimed to build on the findings of Risko et al., to further explore the role of attention in the SCE, while remedying the methodological limitations identified by Arend and Henik. The discovery of attentional effects in a size congruity experiment by Risko et al. implies that the typical size congruity experiment is essentially a visual search task with just two search items. We adapted the size congruity paradigm to the visual search paradigm, and included all four conditions mentioned by Arend and Henik: Participants localized target items that were numerically smaller, numerically larger, physically smaller, or physically larger than the nontarget distractors.

To control for the possibility that our results could be explained by the shared-representation or shared-decision accounts, we extended a technique developed by Santens and Verguts (2011). In their second experiment, participants responded to a digit’s numerical size in one condition and to its parity (evenness) in another condition. The visual stimuli were the same in both conditions, so only the participants’ decision alternatives were manipulated; that is, a 2 would elicit a “small” response in the numerical size condition, and an “even” response in the parity condition. Santens and Verguts argued that if the same stimuli are used in both conditions, there should be no difference between the resulting representations, so the shared-representation account predicts that no difference in the SCEs should occur between conditions.

To control for both the shared-representation and shared-decision accounts, in our Experiments 1 and 2 participants were exposed to the same stimuli, and they had the same decision alternatives. The visual displays in Experiments 1 and 2 each contained a single target digit that was distinct from the nontarget distractors, due to its unique numerical and physical size. In Experiment 1, participants were instructed to select the item that had a unique numerical size, and in Experiment 2 they were instructed to select the item that had a unique physical size. Because the target in each display was unique both numerically and physically, each display would elicit the same decision, regardless of the experiment in which it appeared. Thus, the representation and decision alternatives were held fixed between Experiments 1 and 2, and the only difference was whether participants attended to numerical size or physical size. In the language of visual search, bottom-up attention was fixed while top-down attention was manipulated.

Although physical-size singletons have been shown to capture attention (Proulx, 2010; Proulx & Egeth, 2008), this seems to be attributable to the combination of bottom-up salience and top-down task settings (Kiss & Eimer, 2011). Thus, we hypothesized that the physical-size singleton target should elicit bottom-up processing in Experiment 1, but in Experiment 2 it should elicit both bottom-up and top-down processing. Accordingly, search should be less efficient (i.e., steeper slopes of RTs as a function of display size) in Experiment 1 than in Experiment 2. Once participants reached the target, on congruent trials the target’s numerical and physical sizes should both activate the correct response node, but on incongruent trials the target’s numerical and physical sizes should activate competing response nodes (Faulkenberry et al., 2016; Santens & Verguts, 2011), so in both Experiments 1 and 2 the responses should be faster in congruent than in incongruent trials. The target’s physical size can be directly extracted from its visual appearance, whereas determining the target’s numerical size entails an extra step of connecting its appearance with symbolic associations stored in memory (Lupyan, Thompson-Schill, & Swingley, 2010; Schwarz & Heinze, 1998). Thus, interference from incongruent physical size should engage more quickly than interference from incongruent numerical size, and the SCE should be stronger in Experiment 1 than in Experiment 2. In summary, we hypothesized that SCEs should occur in both Experiments 1 and 2, but steeper slopes and a stronger SCE should occur in Experiment 1 than in Experiment 2.

Experiment 1: search for a numerical size singleton

Method

Participants

We obtained permission from the University of Central Arkansas (UCA) Institutional Review Board to carry out all experiments, and we treated participants in accordance with the ethical guidelines stipulated by the American Psychological Association. In light of recent studies that have revealed an effect of numerical magnitude on visual search (Godwin et al., 2014; Reijnen, Wolfe, & Krummenacher, 2013; Schwarz & Eiselt, 2012; Sobel et al., 2015), we anticipated a similarly large effect of d = 1.25, for which a minimum of 14 participants per group would be needed to achieve 80% power at an alpha of .05 (Bausell & Li, 2002). A total of 14 UCA undergraduate students (12 female, two male) between the ages of 18 and 35 (mean = 21.1 years) volunteered for the experiment in exchange for course credit.

Apparatus

All experiments were conducted on a MacBook computer connected to a CRT monitor with a screen resolution of 1,024 × 768 pixels. Programs written in Real Studio Basic presented stimulus arrays to the monitor and gathered responses from the keyboard.

Stimuli

To reduce shape differences between the digits, we constructed versions of the digits 2, 3, 8, and 9 from line segments, as on the faces of digital clocks (see the screen shots in Fig. 1). All four digits were used in all conditions. At a viewing distance of 56 cm, the physically smaller digits were 0.61° wide × 1.2° tall, and the physically larger digits were 0.92° wide × 1.8° tall. Each visual array contained one target digit and either four, six, or eight distractor digits. The search items (target plus distractors) were distributed evenly around an imaginary circle with a radius of 5.9° that was centered on a fixation cross, consisting of two orthogonal line segments each 1.0° long. The fixation cross and digits were white (Commission Internationale de L’Eclairage [CIE] x/y coordinates of .29/.30, with a luminance of 60 cd/m²) against a black background. The target digit appeared in one of four quadrant locations: upper right, lower right, lower left, or upper left. The participants’ task in each trial was to indicate which side of the display contained the target. To ensure that the position of the target was readily distinguishable from the vertical meridian, targets were always placed at least 30° of arc away from vertical; that is, in terms of a clock face, targets in the upper right quadrant were placed at a randomly determined location between 1 o’clock and 3 o’clock, in the lower right quadrant between 3 o’clock and 5 o’clock, in the lower left quadrant between 7 o’clock and 9 o’clock, and in the upper left quadrant between 9 o’clock and 11 o’clock.
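The placement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the random location was drawn or how the remaining items were anchored, so the sketch assumes a uniform draw within the allowed clock-face range and spaces the other items evenly starting from the target. The names `clock_to_xy` and `item_positions` are hypothetical and are not taken from the original Real Studio programs.

```python
import math
import random

RADIUS_DEG = 5.9  # radius of the imaginary circle, in degrees of visual angle

# Clock-face ranges (in hours) allowed for the target in each quadrant.
QUADRANT_HOURS = {
    "upper right": (1, 3),
    "lower right": (3, 5),
    "lower left": (7, 9),
    "upper left": (9, 11),
}


def clock_to_xy(hours, radius=RADIUS_DEG):
    """Convert a clock-face position to (x, y), with 12 o'clock straight up."""
    angle = math.radians(hours * 30.0)  # 30 degrees of arc per hour, clockwise
    return radius * math.sin(angle), radius * math.cos(angle)


def item_positions(n_items, quadrant, rng=random):
    """Return n_items (x, y) positions on the circle; index 0 is the target.

    Assumption: the target location is drawn uniformly within the allowed
    range, and the other items are spaced evenly from the target onward.
    """
    low, high = QUADRANT_HOURS[quadrant]
    target_hours = rng.uniform(low, high)
    step = 12.0 / n_items  # even spacing around the full circle
    return [clock_to_xy(target_hours + i * step) for i in range(n_items)]
```

For example, `item_positions(7, "upper left")` returns seven positions whose first entry lies to the left of the vertical meridian by at least 30° of arc.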

Fig. 1. Stimulus arrays containing seven items (one target and six distractors) in each of the four target size conditions in Experiments 1, 2, and 3. The target’s numerical and physical sizes are congruent in the upper left and lower right displays, and incongruent in the lower left and upper right displays.

In the numerically small target condition, the target digit was a 2 or 3 and the distractor digits were 8s and 9s; in the numerically large target condition, the target digit was an 8 or 9 and the distractor digits were 2s and 3s. The target was also physically smaller or larger than the distractors in every display. The two levels of the target’s numerical size and two levels of the target’s physical size were manipulated orthogonally within subjects so that all participants were exposed to four levels of target size: numerically and physically small target (congruent), numerically and physically large target (congruent), numerically small but physically large target (incongruent), and numerically large but physically small target (incongruent).

Procedure

The experiment began with presentation of a series of instructional windows that participants could read at their own pace, and then click a button labeled “Next” to advance to the next window. Participants were informed they would be searching for a number less than 5 in one half of the experiment and greater than 5 in the other half of the experiment; the block order was counterbalanced across participants. Although the instructions explicitly described the numerical size of the target, they did not mention that the target digit would have a unique physical size in all displays.

Each trial began with the onset of the stimulus array, which remained visible until participants responded by pressing either “z,” to report that the target appeared on the left side of the display, or “/,” to report that the target appeared on the right side of the display. The latency between the onset of the stimulus array and the keypress was recorded for each trial. When the response was correct, the stimulus array disappeared, leaving only the fixation cross on the screen for 750 ms, followed by presentation of the stimulus array for the next trial. When participants made an error, a white screen with the word “Incorrect” in the middle appeared for 750 ms, followed by the screen containing just the fixation mark for another 750 ms until the stimulus array for the next trial appeared.

Each participant completed three replications of every combination of target’s numerical size (two levels), target’s physical size (two levels), target quadrant (four levels), target digit (two levels), and display size (three levels), for a total of 288 experimental trials. After completing half of the trials, participants were invited to take a short break and reminded that for the remainder of the experiment the target’s numerical size would switch. Except for blocking of the target’s numerical size, all other variables were randomly intermixed. The first six trials overall and the first six trials after the break were considered practice, so participants carried out a total of 300 (288 experimental + 12 practice) trials, lasting approximately 15 min. Results from the error and practice trials were excluded from the analysis.
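As a check on the trial counts above, the factorial design can be enumerated in a short Python sketch. The variable names and the dictionary layout are illustrative assumptions; only the factor levels, the three replications, and the 12 practice trials come from the text.

```python
from itertools import product

NUMERICAL_SIZES = ("small", "large")
PHYSICAL_SIZES = ("small", "large")
QUADRANTS = ("upper right", "lower right", "lower left", "upper left")
DIGIT_INDEX = (0, 1)       # which of the two digits in the target's category
DISPLAY_SIZES = (5, 7, 9)  # one target plus four, six, or eight distractors
REPLICATIONS = 3
PRACTICE_TRIALS = 12       # six at the start plus six after the break

TARGET_DIGITS = {"small": (2, 3), "large": (8, 9)}


def build_trials():
    """Enumerate every experimental trial as a dict of factor levels."""
    trials = []
    for num, phys, quad, idx, n_items in product(
        NUMERICAL_SIZES, PHYSICAL_SIZES, QUADRANTS, DIGIT_INDEX, DISPLAY_SIZES
    ):
        trial = {
            "numerical_size": num,
            "physical_size": phys,
            "quadrant": quad,
            "target_digit": TARGET_DIGITS[num][idx],
            "display_size": n_items,
            "congruent": num == phys,  # congruent when both sizes match
        }
        trials.extend(dict(trial) for _ in range(REPLICATIONS))
    return trials


experimental = build_trials()
total_trials = len(experimental) + PRACTICE_TRIALS
```

The enumeration yields 2 × 2 × 4 × 2 × 3 = 96 unique combinations, hence 288 experimental trials and 300 trials in total, half of them congruent. In the experiment itself the target's numerical size was blocked, so a real trial list would be split by numerical size before shuffling.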

Results

For each participant in each of the 12 conditions (3 display sizes × 2 numerical target sizes × 2 physical target sizes), a trimming program removed all RTs that were either greater than the mean plus three standard deviations for that condition or less than 100 ms; a total of 2.0% of the data points were removed. Error rates (i.e., trials on which participants gave the wrong response and trials removed by the trimming procedure) were submitted to a 3 × 2 × 2 × 2 analysis of variance (ANOVA) with display size, numerical target size, and physical target size as within-subjects variables, and block order (numerically small target first or numerically large target first) as a between-subjects variable. For Experiment 1 and all subsequent experiments, none of the main effects or interactions from the analysis of error rates were significant. Errors were not analyzed further and will not be discussed further. For the remainder of this article we will focus on analyses of RTs.
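The trimming rule can be illustrated with a minimal Python sketch applied to the RTs from one participant in one condition. Two details are assumptions, because the text does not specify them: the sketch uses the sample standard deviation, and it computes the mean and standard deviation once from all RTs in the cell before anything is removed. The function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, floor_ms=100.0, n_sd=3.0):
    """Keep RTs between floor_ms and the cell mean plus n_sd standard deviations.

    Assumptions: sample SD, computed in a single pass over the untrimmed cell.
    """
    if len(rts) < 2:
        # An SD cannot be computed; apply only the lower bound.
        return [rt for rt in rts if rt >= floor_ms]
    cutoff = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if floor_ms <= rt <= cutoff]


# Hypothetical cell of 24 RTs (ms): one anticipation and one very slow response.
cell = [500] * 22 + [50, 5000]
trimmed = trim_rts(cell)
```

In this made-up cell, the 50-ms response falls below the 100-ms floor and the 5,000-ms response exceeds the cutoff, so both are dropped and 22 RTs remain.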

The mean correct RTs were submitted to a 3 × 2 × 2 × 2 ANOVA with display size, numerical target size, and physical target size as within-subjects variables, and block order as a between-subjects variable. The significant interaction between numerical size and block order, F(1, 12) = 28.4, p < .001, ηp² = .70, was evidence for a practice effect. Responses were slower for the target’s numerical size presented in the first block, so participants who searched for numerically small targets in the first half of the experiment were slower for numerically small targets than for numerically large targets, and vice versa for participants who searched for numerically large targets in the first half of the experiment. However, the main effect of block order was not significant, and none of the other interactions with Block Order as a factor were significant, so the RTs depicted in Fig. 2 represent means pooled across both levels of block order.

Fig. 2. Response times as a function of the display size in Experiment 1. The error bars represent 95% confidence intervals (Loftus & Masson, 1994). To prevent any error bars from overlapping, some data markers in this figure and later ones have been jittered.

As is common in visual search experiments, RTs increased with display size, F(2, 24) = 22.5, p < .001, ηp² = .65. The mean slope of RTs as a function of the display size was 11.2 ms/item, meaning that search was relatively efficient as compared to other visual searches (Wolfe, 1998), and much more efficient than previous visual searches for digits that were all the same physical size (mean slope = 44.7 ms/item; Sobel et al., 2015). The significant main effect of physical size, F(1, 12) = 8.30, p = .014, ηp² = .41, indicates that search was faster when the targets were physically larger than the distractors. From examining Fig. 2, this effect appears to be driven at least in part by the RTs for physically large targets, which level off between seven-item displays and nine-item displays. Contrasts confirmed that for physically large targets, RTs were significantly different between five-item and seven-item displays, F(1, 12) = 8.39, p = .013, ηp² = .41, but were not significantly different between seven-item and nine-item displays, F(1, 12) = 0.157, p = .70, ηp² = .013. This finding is supported by the significant interaction between physical size and display size, F(2, 24) = 3.94, p = .033, ηp² = .25. The main effect of numerical size was not significant, but the Numerical Size × Physical Size interaction was, F(1, 12) = 31.5, p < .001, ηp² = .72, indicating that search was faster when the numerical and physical sizes were congruent than when they were incongruent. Simple-effect analyses confirmed that when the target was physically small, search was significantly faster when the target was numerically small than when it was numerically large, F(1, 12) = 6.97, p = .020, ηp² = .37, and when the target was physically large, search was significantly faster when the target was numerically large than when it was numerically small, F(1, 12) = 10.6, p = .006, ηp² = .47. No other interactions were significant.
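For readers who wish to relate the reported effect sizes to the F ratios, partial eta squared follows from F and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error), provided the error term is the one used for that F test. The Python sketch below applies this identity; the function name is an illustrative choice, not part of the original analysis.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# Effects reported above for Experiment 1.
display_size_effect = partial_eta_squared(22.5, 2, 24)
congruity_interaction = partial_eta_squared(31.5, 1, 12)
physical_size_effect = partial_eta_squared(8.30, 1, 12)
```

Rounded to two decimals, these three calls give .65, .72, and .41, matching the values reported for the display size effect, the Numerical Size × Physical Size interaction, and the main effect of physical size.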

Discussion

Experiment 1 revealed a SCE for visual search: Search was faster when numerical and physical sizes were congruent than when they were incongruent. The effect of physical size is consistent with previous work showing that larger items capture attention more than do smaller items (Proulx, 2010; Proulx & Egeth, 2008), although finding this effect was surprising, because we expected participants to adopt a top-down strategy to attend to numerical rather than physical size (Kiss & Eimer, 2011). In this context, the interaction between physical size and display size was illuminating. For displays containing five or seven items, participants relied on a relatively inefficient search strategy, but then for displays containing nine items, search was relatively efficient. Why did efficiency increase with display size in the physically large target conditions? Bottom-up salience increases with the density of display items (Bravo & Nakayama, 1992; Sobel, Pickard, & Acklin, 2009; Todd & Kramer, 1994), so perhaps dense (nine-item) displays boosted the bottom-up salience of the target to the point that participants did not need to rely on a top-down setting for physically large targets to capture attention. In Experiments 4 and 5, we explored this possibility by using dense displays.

Because manipulating numerical size typically entails a confounding manipulation of visual features (Wolfe & Horowitz, 2004), experimenters who argue that digits’ numerical sizes influence visual search need to carefully discount alternative explanations (Godwin et al., 2014; Schwarz & Eiselt, 2012; Sobel et al., 2015). The main effect of numerical size was not significant, and thus requires no such consideration, but the significant interaction between numerical and physical size warrants further examination. The numerically small digits (2 and 3) differed in both brightness and shape from the numerically large digits (8 and 9). Because the digits used in Experiment 1 consisted of line segments of equal length, each digit’s brightness was proportional to the number of its constituent line segments. Both “2” and “3” contain five line segments, whereas “8” contains seven and “9” contains six line segments, so for a particular physical size, the numerically large digits are brighter than the numerically small digits. For items presented on a dark background, as in our displays, the effect of brightness on visual search is asymmetrical: Relatively bright items are more salient than relatively dim items (Braun, 1994; Nothdurft, 2006). Thus, the brightness of numerically large digits could explain one of the simple interaction effects (for physically large targets, search was significantly faster for numerically large/relatively bright targets than for numerically small/relatively dim targets), but not the other simple interaction effect (for physically small targets, search was significantly faster for numerically small/relatively dim targets than for numerically large/relatively bright targets). As for differences in shape, although it is possible that participants memorized the targets’ shapes rather than their numerical sizes, this is unlikely to explain the significant interaction between numerical and physical size. 
Thus, neither brightness nor shape differences are plausible explanations for the interaction between numerical and physical size.

Because we intended to isolate effects originating at the level of attention from those originating at the levels of representation and decision, Experiment 2 involved the same stimuli and decision alternatives as Experiment 1. We hypothesized that instructing participants to attend to the target’s physical size rather than its numerical size should reduce the slope of RTs as a function of display size, and reduce the strength of the interaction between numerical and physical size. To test this, the participants in Experiment 2 were instructed to find the item with the unique physical size.

Experiment 2: search for a physical size singleton

Method

Participants

A total of 14 UCA undergraduate students (ten female, four male) between the ages of 18 and 25 (mean = 20.5 years) volunteered for the experiment in exchange for course credit. None had participated in Experiment 1.

Stimuli and procedure

The visual displays were the same as in Experiment 1, with the only difference being the instructions. All participants were instructed to search for the physically unique (either smaller or larger) item in every display. As in Experiment 1, the numerical size of the target switched halfway through the experiment, so for half of the participants the item with unique physical size was numerically large in the first half of the experiment and numerically small in the second half of the experiment, and vice versa for the other half of the participants. Because the target had a unique physical size for every display, there was no need to change the instructions halfway through the experiment, and all participants received the same instructions. The instructions did not mention that the target also had a unique numerical size in every display, or that the numerical size of the target would change in the second half of the experiment.

Results

The same trimming routine from Experiment 1 removed a total of 2.2% of the data points. The mean correct RTs were submitted to a 3 × 2 × 2 × 2 ANOVA with display size, numerical size, and physical size as within-subjects variables, and block order as a between-subjects variable. As in Experiment 1, we found a significant interaction between numerical size and block order, F(1, 12) = 4.80, p = .049, ηp² = .29, but the main effect of block order was not significant, and none of the other interactions with Block Order as a factor were significant, so the RTs depicted in Fig. 3 represent means pooled across both levels of block order. The main effect of display size was not significant, perhaps because the slope of RTs as a function of display size (6.0 ms/item) reflected a more efficient search than the 11-ms/item slope observed in Experiment 1. The significant main effect of physical size, F(1, 12) = 7.76, p = .016, ηp² = .39, indicates that search was faster when the targets were physically larger than the distractors.
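Search slopes such as those compared above are conventionally obtained by regressing mean RT on display size. The Python sketch below computes an ordinary least-squares slope. The text does not state how the slopes were estimated, so treat this as one plausible method, and note that the RT values are hypothetical numbers chosen only to produce a 6 ms/item slope; they are not data from the experiments.

```python
def rt_slope(display_sizes, mean_rts):
    """Least-squares slope (ms/item) of mean RT regressed on display size."""
    n = len(display_sizes)
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(display_sizes, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    return sxy / sxx


# Hypothetical condition means (ms) for five-, seven-, and nine-item displays.
example_sizes = [5, 7, 9]
example_rts = [600.0, 612.0, 624.0]
example_slope = rt_slope(example_sizes, example_rts)
```

Here RT rises by 12 ms for every two added items, so the slope is 6.0 ms/item. A slope near zero, as in a pop-out search, would indicate that RT does not depend on the number of items.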

Fig. 3. Response times as a function of display size in Experiment 2. The error bars represent 95% confidence intervals (Loftus & Masson, 1994).

The main effect of numerical size was not significant, but as in Experiment 1, the significant Numerical Size × Physical Size interaction, F(1, 12) = 5.78, p = .033, ηp² = .33, indicates that search was faster when numerical and physical size were congruent than when they were incongruent. However, although both simple interaction effects had been significant in Experiment 1, neither of the simple interaction effects was significant in Experiment 2. Furthermore, the effect size of congruence (Numerical Size × Physical Size interaction) was larger in Experiment 1 (ηp² = .72) than in Experiment 2 (ηp² = .33). To confirm this, we submitted the mean correct RTs from Experiments 1 and 2 to a five-way ANOVA with experiment as a between-subjects variable. A significant three-way interaction between numerical size, physical size, and experiment, F(1, 24) = 7.49, p = .012, ηp² = .24, confirmed that the effect of congruence was stronger in Experiment 1 than in Experiment 2. None of the other interactions with Experiment as a factor from the five-way ANOVA, and none of the other effects from the four-way ANOVA, were significant.

Discussion

The significant main effect of the target’s physical size was somewhat surprising in Experiment 1, but not in Experiment 2, because here participants were instructed to attend to physical size. The shallower slopes of RTs as a function of display size in Experiment 2 than in Experiment 1 confirmed our hypothesis that a top-down strategy to attend to physical size enhanced search efficiency. The smaller effect size of the Numerical Size × Physical Size interaction in Experiment 2 than in Experiment 1 confirmed our hypothesis that participants can extract the target’s physical size from the visual stimulus more quickly than they can connect the target’s shape to its numerical size. The correlation of shallow RT slopes with a smaller SCE in Experiment 2 implies that if the RT function were flat, due to an extremely salient target, the effect of congruence might vanish. In the remaining experiments we boosted the salience of the target by giving it a unique color (Exp. 3) and increasing the density of display items (Exps. 4 and 5).

Experiment 3: search for a numerical size and color singleton

Method

Participants

A total of 14 UCA undergraduate students (13 female, one male) between the ages of 19 and 23 (mean = 20.5 years) volunteered for the experiment in exchange for course credit. None had participated in the previous experiments.

Stimuli and procedure

The instructions were the same as in Experiment 1: Participants were asked to search for a number less than 5 in one block and a number greater than 5 in the other block. The visual displays were the same as in previous experiments, with a numerical size and physical size singleton target among white distractors. The only difference was that the target was also a red color singleton (CIE = .61/.33, 32 cd/m²).

Results

The same trimming routine used in the previous experiments removed a total of 2.1% of the data points. The mean correct RTs were submitted to a 3 × 2 × 2 × 2 ANOVA with display size, numerical size, and physical size as within-subjects variables, and block order as a between-subjects variable. As in previous experiments, we observed a significant interaction between numerical size and block order, F(1, 12) = 10.9, p = .006, ηp² = .48, but the main effect of block order was not significant, and none of the other interactions with Block Order as a factor were significant, so the RTs depicted in Fig. 4 represent means pooled across both levels of block order. None of the other effects were significant, and the essentially flat RT functions (–0.20 ms/item) suggest that the target popped out from the distractors regardless of the display size (Wolfe, 1998).

Fig. 4. Response times as a function of display size in Experiment 3. The error bars represent 95% confidence intervals (Loftus & Masson, 1994).

Discussion

As expected, when the target was sufficiently salient that it popped out from distractors regardless of display size, the SCE vanished. Of course, this may be because the most salient feature of the target (its color) was not a kind of “size,” thus minimizing the opportunity for interference between physical and numerical aspects of size. In Experiments 4 and 5, rather than manipulating salience on the basis of color, we boosted the salience of the target’s physical size by raising the display density. On the one hand, increasing the salience of the target based on physical size beyond what it was in Experiment 2 might be sufficient to abolish the SCE, as making the color salient did in Experiment 3. On the other hand, if the SCE hinges on some aspect of target size being its most salient attribute, it might reemerge in this case. In Experiment 1, search was faster for displays with nine items than for displays with seven items when the target was physically large. Perhaps the nine-item displays exceeded a density threshold, beyond which search was efficient enough to give rise to flat RTs as a function of display size.

Experiments 4 and 5 were designed to replicate the tasks in Experiments 1 and 2 while boosting bottom-up salience by packing more digits into each display. To create displays containing more than nine digits, in Experiments 4 and 5 each display item consisted of three digits, so even the smallest (five-item) displays contained 15 digits. Although most size congruity experiments have used single digits, Fitousi and Algom (2006) showed that the SCE extends to numbers with more than just one digit. In number comparison tasks, participants do not respond to the overall numerical size of multidigit numbers, but instead decompose the numerals into their constituent digits (Korvorst & Damian, 2008). The tendency for participants to focus on a given placeholder depends on the proportion of trials that rely on that placeholder to make a magnitude judgment (Macizo & Herrera, 2011). That is, comparisons of three-digit numerals with the same hundreds digit (e.g., 247 and 283) require processing of the tens digits to select the larger magnitude, but comparisons of numerals with different hundreds digits (e.g., 247 and 983) can rely on just the leading digits. In Experiments 4 and 5, all of the targets used different hundreds digits than the distractors, thereby encouraging participants to adopt a strategy of focusing just on the leading digit of each numeral.

Experiment 4: search for a three-digit numerical size singleton

Method

Participants

A total of 14 UCA undergraduate students (11 female, three male) between the ages of 18 and 25 (mean = 21.7 years) volunteered for the experiment in exchange for course credit. None had participated in any of the previous experiments.

Stimuli and procedure

The instructions were the same as in Experiment 1, except that the participants were instructed to search for a number less than or greater than 500. All of the targets and distractors were three-digit numerals, arranged on the same imaginary circle used in previous experiments (radius of 5.9°). The first (hundreds) digit was 2 or 3 for numerically small items, and 8 or 9 for numerically large items. The other (tens and units) digits were randomly selected from the range between 0 and 9. The blank space between any two digits in a single numeral was 20 % of the width of each digit, so most of the physically small numerals had a 0.12°-wide blank space between the digits (which were 0.61° wide), and most of the physically large numerals had a 0.18°-wide blank space between the digits (which were 0.92° wide). The exception was for the digit 1, which was just a vertical line centered in the same-sized imaginary rectangle occupied by the other digits, so it had an extra 0.30°-wide blank space for physically small numerals, or a 0.46°-wide blank space for physically large numerals. Screen shots from the four target size conditions are depicted in Fig. 5.
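The stimulus rules above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the authors' stimulus code: the function names, the use of Python's `random` module, and the inclusive reading of "between 0 and 9" are assumptions, and the total numeral width is derived here from the stated digit widths and the 20 % gap rule rather than reported in the text.

```python
import random

# Values taken from the text: digit widths in degrees of visual angle,
# and a blank gap equal to 20 % of the digit width.
GAP_FRACTION = 0.20
DIGIT_WIDTH_DEG = {"small": 0.61, "large": 0.92}

# Hundreds digits stated in the text for each numerical size.
HUNDREDS_DIGITS = {"small": (2, 3), "large": (8, 9)}


def make_numeral(numerical_size, rng):
    """Return a three-digit numeral (hypothetical helper).

    The hundreds digit follows the rule in the text; the tens and units
    digits are drawn from 0 to 9 inclusive (an assumed reading).
    """
    hundreds = rng.choice(HUNDREDS_DIGITS[numerical_size])
    tens = rng.randint(0, 9)
    units = rng.randint(0, 9)
    return 100 * hundreds + 10 * tens + units


def gap_width_deg(physical_size):
    """Blank space between adjacent digits, in degrees."""
    return GAP_FRACTION * DIGIT_WIDTH_DEG[physical_size]


def numeral_width_deg(physical_size):
    """Derived total width: three digits plus two gaps (not reported in the text)."""
    return 3 * DIGIT_WIDTH_DEG[physical_size] + 2 * gap_width_deg(physical_size)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
small_numerals = [make_numeral("small", rng) for _ in range(200)]
large_numerals = [make_numeral("large", rng) for _ in range(200)]
```

Because the hundreds digit alone places every numeral below 400 or at 800 and above, the "less than or greater than 500" judgment never depends on the tens or units digits. The rounded gap widths reproduce the 0.12° and 0.18° values given above.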

Fig. 5

Stimulus arrays containing seven three-digit items (one target and six distractors) in each of the four target size conditions in Experiments 4 and 5. The target’s numerical and physical sizes are congruent in the upper left and lower right displays, and incongruent in the lower left and upper right displays

Results

The same trimming routine from Experiment 1 removed a total of 1.8 % of the data points. The mean correct RTs were submitted to a 3 × 2 × 2 × 2 ANOVA with display size, numerical size, and physical size as within-subjects variables, and block order as a between-subjects variable. As in the previous experiments, we observed a significant interaction between numerical size and block order, F(1, 12) = 20.2, p = .001, ηp² = .63, but the main effect of block order was not significant, and none of the other interactions with Block Order as a factor were significant, so the RTs depicted in Fig. 6 represent means pooled across both levels of block order.

Fig. 6

Response times as a function of display size in Experiment 4. The error bars represent 95 % confidence intervals (Loftus & Masson, 1994)

The main effect of display size was significant, F(2, 24) = 20.2, p = .001, ηp² = .63. With mean RT slopes of 35.2 ms/item, search was less efficient than for single digits in Experiment 1, and very inefficient relative to typical visual searches (Wolfe, 1998), but comparable to the efficiency of search for digits that are all the same physical size (range: 27–63 ms/item; Sobel et al., 2015). In Experiments 1 and 2, the main effect of physical size was significant but the main effect of numerical size was not. In Experiment 4, this pattern was reversed; the main effect of physical size was not significant, but the main effect of numerical size was, F(1, 12) = 9.26, p = .010, ηp² = .44. As in Experiments 1 and 2, the significant interaction between numerical size and physical size, F(1, 12) = 28.4, p < .001, ηp² = .70, reveals that search was faster when the numerical and physical sizes were congruent than when they were incongruent. Simple-effect analysis confirmed that for physically small targets, search was faster when the target was numerically small than when it was numerically large, F(1, 13) = 10.3, p = .007, ηp² = .44, and for physically large targets, search was faster when the target was numerically large than when it was numerically small, F(1, 13) = 44.0, p < .001, ηp² = .77.

An unexpected significant three-way interaction between numerical size, physical size, and display size, F(2, 24) = 10.5, p = .006, ηp² = .47, implied that the two-way interaction between numerical and physical target size differed across the levels of display size. The three-way interaction appeared to be driven primarily by the steeper RT slopes for the incongruent conditions (mean slope = 59.8 ms/item) than for the congruent conditions (mean slope = 10.7 ms/item). To confirm this, the search slopes were submitted to a 2 × 2 ANOVA with numerical size and physical size as within-subjects variables. The interaction between numerical size and physical size was significant, F(1, 13) = 14.0, p = .002, ηp² = .51, but neither of the main effects of numerical size or physical size was significant.
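A search slope is the least-squares slope of mean RT regressed on display size, expressed in ms/item. The sketch below shows one way to compute it, assuming the three display sizes were five, seven, and nine items. The RT values are invented purely for illustration and are not data from this study; only the two reported mean slopes at the end come from the text.

```python
# Display sizes assumed from the text (five-, seven-, and nine-item displays).
DISPLAY_SIZES = [5, 7, 9]


def search_slope(display_sizes, mean_rts):
    """Least-squares slope of mean RT (ms) on display size (items)."""
    n = len(display_sizes)
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(display_sizes, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    return sxy / sxx


# Hypothetical mean RTs in ms, made up for this illustration only.
hypothetical_rts = {
    "congruent": [900.0, 920.0, 940.0],
    "incongruent": [1000.0, 1120.0, 1240.0],
}
slopes = {
    condition: search_slope(DISPLAY_SIZES, rts)
    for condition, rts in hypothetical_rts.items()
}

# Averaging the two reported mean slopes (59.8 and 10.7 ms/item).
mean_of_reported_slopes = (59.8 + 10.7) / 2
```

Averaging the reported incongruent and congruent slopes gives a value close to the overall mean slope of 35.2 ms/item reported for Experiment 4, as expected if the two congruency conditions contribute equally to the overall mean.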

Discussion

Although Experiments 1 and 2 had both revealed significant effects of physical size, the effect of physical size was not significant in Experiment 4. Perhaps the higher density of display items made both small and large physical sizes salient, so that the physically large targets did not enjoy an advantage over physically small targets. The dense displays in Experiment 4 apparently also boosted the brightness contrast between numerically large/relatively bright and numerically small/relatively dim items enough that the effect of numerical size was significant. Nevertheless, as in Experiment 1, the salience of brightness differences cannot explain the significant interaction between numerical and physical size, although it might explain the different effect sizes for the simple effects. That is, the simple effect size was greater for physically large targets (ηp² = .77), for which the target in the congruent condition was numerically large/relatively bright, than for physically small targets (ηp² = .44), for which the target in the congruent condition was numerically small/relatively dim.

The three-way interaction between numerical size, physical size, and display size and the steeper slopes for the incongruent conditions than the congruent conditions were surprising. As in Experiments 1 and 2, these results suggest that in the incongruent conditions, the target’s numerical and physical sizes activated competing response nodes in parallel (Faulkenberry et al., 2016; Santens & Verguts, 2011), but in Experiment 4 the distractors’ physical size was more salient, and so had more influence. The target accumulated activation in the correct response node, and over time eventually won the competition against the distractors. As the number of distractors increased, the net activation in the incorrect response nodes increased, and the target required more time to win the competition. In the congruent condition, no such competition was necessary, because the target’s numerical and physical sizes both activated the same response node, and the distractors’ numerical and physical sizes did not activate the incorrect response nodes.
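The verbal account above can be made concrete with a deliberately simple toy simulation. It is not the model of Santens and Verguts (2011) or of Faulkenberry et al. (2016), and it was not fitted to any data: the accumulation rates, the threshold, and the subtractive form of the competition are all assumptions chosen only to show how competition that grows with the number of distractors produces a positive slope in the incongruent conditions and a flat function in the congruent conditions.

```python
def steps_to_win(n_distractors, congruent,
                 target_rate=1.0, distractor_rate=0.05, threshold=100.0):
    """Time steps until the target's lead over the distractors reaches threshold.

    All parameter values are hypothetical. In the congruent condition the
    distractors are assumed to add nothing to the incorrect response node.
    """
    competing_rate = 0.0 if congruent else n_distractors * distractor_rate
    if target_rate <= competing_rate:
        raise ValueError("the target can never win with these rates")

    target_activation = 0.0
    distractor_activation = 0.0
    steps = 0
    # Accumulate activation step by step until the target's lead is large enough.
    while target_activation - distractor_activation < threshold:
        target_activation += target_rate
        distractor_activation += competing_rate
        steps += 1
    return steps


# Displays of five, seven, and nine items contain four, six, and eight distractors.
distractor_counts = [4, 6, 8]
congruent_steps = [steps_to_win(n, congruent=True) for n in distractor_counts]
incongruent_steps = [steps_to_win(n, congruent=False) for n in distractor_counts]
```

In this sketch the congruent finishing time does not depend on the number of distractors, whereas the incongruent finishing time rises with each added pair of distractors, mirroring the qualitative pattern of slopes in Experiment 4.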

We thought boosting the display density in Experiment 4 might abolish the SCE for participants attending to the target’s numerical size, but this did not occur. Instead, the effect size of the interaction between numerical and physical size was about the same in Experiment 4 (ηp² = .70) as it had been in Experiment 1 (ηp² = .72). In Experiment 2, instructing participants to attend to physical size reduced but did not abolish the SCE. For Experiment 5, we hypothesized that with a higher display density, as in Experiment 4, instructing participants to attend to physical size should reduce the SCE more dramatically than in Experiment 2.

Experiment 5: search for a three-digit physical size singleton

Method

Participants

A total of 14 UCA undergraduate students (12 female, two male) between the ages of 18 and 22 (mean = 20.4 years) volunteered for the experiment in exchange for course credit. None had participated in any of the previous experiments.

Stimuli and procedure

The displays were the same as in Experiment 4, and the instructions were the same as in Experiment 2. All participants were instructed to search for the physically unique (either smaller or larger) item in every display. As in Experiment 2, there was no need to change the instructions halfway through the experiment, and all participants received the same instructions. The instructions did not mention that the target also had a unique numerical size in every display, or that the numerical size of the target would change in the second half of the experiment.

Results

The same trimming routine from Experiment 1 removed a total of 1.7 % of the data points. The mean correct RTs were submitted to a 3 × 2 × 2 × 2 ANOVA with display size, numerical size, and physical size as within-subjects variables, and block order as a between-subjects variable. As in previous experiments, we found a significant interaction between numerical size and block order, F(1, 12) = 8.23, p = .014, ηp² = .41, but the main effect of block order was not significant, and none of the other interactions with Block Order as a factor were significant, so the RTs depicted in Fig. 7 represent means pooled across both levels of block order.

Fig. 7

Response times as a function of display size in Experiment 5. The error bars represent 95 % confidence intervals (Loftus & Masson, 1994)

None of the other main effects or interactions were significant. Crucially for our purposes, instructing participants to attend to physical size abolished the Numerical Size × Physical Size interaction. To confirm the effect of the instructions between Experiments 4 and 5, the mean RTs from both experiments were submitted to a five-way ANOVA with experiment as a between-subjects variable. The significant main effect of experiment, F(1, 24) = 15.4, p = .001, ηp² = .39, indicates that responses were faster in Experiment 5 than in Experiment 4. Besides the main effect of experiment, we also observed a significant interaction between experiment and every effect that had been significant in Experiment 4 but not in Experiment 5, indicating that the change in instructions abolished all of these effects: Display Size × Experiment, F(2, 48) = 20.9, p < .001, ηp² = .47 (mean slope in Exp. 4 = 35.2 ms/item; mean slope in Exp. 5 = –2.01 ms/item); Numerical Size × Experiment, F(1, 24) = 10.4, p = .004, ηp² = .30; Numerical Size × Physical Size × Experiment, F(1, 24) = 22.3, p < .001, ηp² = .48; and Numerical Size × Physical Size × Display Size × Experiment, F(2, 48) = 6.29, p = .004, ηp² = .21. No other interactions with Experiment as a factor were significant.
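For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the cross-experiment effects listed above. The function name is illustrative, and the calculation assumes that the reported F values and degrees of freedom are uncorrected.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) for the cross-experiment effects in the text.
reported_effects = [
    ("Experiment", 15.4, 1, 24),
    ("Display Size x Experiment", 20.9, 2, 48),
    ("Numerical Size x Experiment", 10.4, 1, 24),
    ("Numerical Size x Physical Size x Experiment", 22.3, 1, 24),
    ("Numerical Size x Physical Size x Display Size x Experiment", 6.29, 2, 48),
]

recovered = {
    label: round(partial_eta_squared(f, df1, df2), 2)
    for label, f, df1, df2 in reported_effects
}
```

Rounded to two decimals, the recovered values match the effect sizes reported for these five effects.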

Discussion

In Experiment 5, RTs were essentially flat across increasing display sizes, indicating that physical size was sufficiently salient for the target to pop out from the distractors, regardless of display size. The displays and decision alternatives were the same in Experiment 4 as in Experiment 5, and yet a significant SCE emerged in Experiment 4 but not in Experiment 5. As in Experiments 1 and 2, this shows that some SCEs can be explained by differences originating at the level of top-down attention, independent of the shared-representation and shared-decision accounts.

The faster RTs in Experiment 5 than in Experiment 4 present something of a puzzle. Experimental participants, particularly young college students, can generally be expected to do anything they can to fulfill their experimental obligations as quickly and with as little effort as possible. Although the participants in Experiment 4 were instructed to attend to numerical size, if they had instead attended to physical size, they could have been equally accurate (the physical singleton was also the numerical singleton) while completing the experiment more quickly (faster RTs) and efficiently (shallower slopes). Even though the participants in Experiment 3 were also instructed to attend to numerical size, they seemed to be perfectly willing to allow the salient target color to capture their attention. Apparently, the participants in Experiment 3 but not Experiment 4 noticed that the visually salient item was always the target. Participants’ failure to rely on the salience of the target’s physical size in Experiment 4 lends support to the claim that physical size differences require a combination of bottom-up and top-down processing to capture attention (Kiss & Eimer, 2011). However, the results from Experiment 1, in which search was faster for dense (nine-digit) displays than for less dense (seven-digit) displays, suggest that sufficiently salient physical size differences can eliminate the need for top-down attentional settings. Because all of the displays in Experiment 4 were denser (at least 15 digits) than the displays in Experiment 1, the target’s physical size should have been more likely to capture attention. We do not know why the participants in Experiment 4 failed to realize that they could search more quickly and efficiently by attending to physical size; this question remains for future research.

General discussion

The SCE arises when experimental participants who select one of two numbers that differ in numerical and physical size are quicker to select the target when its numerical and physical sizes are congruent than when they are incongruent. To explain the SCE, numerical and physical sizes have been presumed to be initially encoded either into a single representation or into separate representations that interact later, at the decision stage. We took a cue from Risko et al. (2013), who revealed a role for attention in the SCE, implying that the typical size congruity experiment is essentially a visual search task with just two search items. To isolate the roles of attention in the SCE from the shared-representation and shared-decision accounts, in Experiments 1 and 2, and again in Experiments 4 and 5, we held the stimuli and decision alternatives fixed while manipulating the kind of size (numerical or physical) to which participants should attend. In other words, we held bottom-up attentional processing fixed while manipulating top-down attentional processing. In all experiments, the target was the single item that had a unique numerical and physical size.

We hypothesized that instructing the participants in Experiment 1 to attend to numerical size would elicit bottom-up processing for physical size, but instructing the participants in Experiment 2 to attend to physical size would elicit both bottom-up and top-down processing. Furthermore, we expected physical size to be processed more quickly than numerical size. The shallower RT functions and smaller SCE in Experiment 2 than in Experiment 1 confirmed our hypotheses. One unexpected result from Experiment 1 was a downturn of RTs for the densest (nine-item) displays when the target was physically larger than the distractors. This downturn suggested that raising the density of the display items should boost the salience of the target’s physical size.

The shallower RT slopes and smaller SCE in Experiment 2 than in Experiment 1 suggested that displays with very salient targets would yield flat RT functions and abolish the SCE. In Experiment 3, the target was a different color than the distractors, and even though participants were instructed to attend to numerical size, the flat RT functions together with the lack of an SCE suggested that participants allowed the unique color to capture their attention.

In Experiments 4 and 5, we boosted the salience of the target’s physical size by packing more digits into the same number of display items. The SCE was about the same in Experiment 4 as it had been in Experiment 1, but the most surprising outcome was significantly steeper RT functions in the incongruent than in the congruent conditions. We argued that the salience of the distractors’ physical size equipped them to compete better with the target in Experiment 4 than in Experiment 1, such that in the incongruent conditions both the target and distractors accumulated activation, whereas in the congruent conditions only the target accumulated activation. As a result, the target required more time to win the competition as the number of distractors increased in the incongruent conditions than in the congruent conditions.

One benefit of extending the size congruity paradigm to the visual search paradigm is apparent from the richness of the data it affords. Whereas size congruity experiments primarily yield different RTs between conditions, at the very least visual search yields search slopes as well as RTs. Furthermore, as has become evident from our data set, seemingly quirky results such as unexpected points of deflection (Exp. 1) and slope differences between conditions (Exp. 4) may generate valuable insights. Another benefit of approaching the SCE from the perspective of visual search is that we can bring new theoretical tools to bear in our effort to further understand the phenomenon, including the concepts of bottom-up and top-down processing. Although we had intended to isolate the roles of attention in the SCE from the shared-representation and shared-decision accounts, we must acknowledge that our results have some bearing on the debate.

The shared-representation and shared-decision accounts

Our results show that models of the SCE need a component representing top-down attention. Neither the shared-representation nor the shared-decision models in Santens and Verguts (2011) explicitly include such a component. However, the late-selection model (analogous to the shared-decision model in Santens & Verguts, 2011) in Schwarz and Heinze (1998) has “subresponse selection” components that seem to represent top-down attention, insofar as they facilitate or inhibit the outputs from the numerical- and physical-size encoding stages. Thus, our results are compatible with the late-selection model in Schwarz and Heinze, because it accommodates top-down attention, and would be compatible with the analogous shared-decision model in Santens and Verguts if top-down attentional components were included in the model. A role for top-down attention could not be carved out of the early-selection model, because top-down attention cannot selectively facilitate or inhibit numerical or physical size once they are fused together into a single representation.

Another reason that our results are compatible with the late-selection/shared-decision account is that our effects originating at the level of top-down attention fed forward to influence behavior in the decision stage, as described by Santens and Verguts (2011) and Faulkenberry et al. (2016). Consistent with this view, we hypothesized that the targets and distractors accumulate activation in numerical- and physical-size nodes, and that the item that accumulates the most activation is selected. Furthermore, we argue that when Santens and Verguts manipulated the decision alternatives (e.g., numerical size vs. parity), they also inadvertently manipulated top-down attention. That is, participants who select the numerically small number need to attend to numerical size, and participants who select the even number need to attend to numerical parity.

Conclusions

The effects originating at the level of top-down attention in our experiments are consistent with a late-interaction model (Schwarz & Heinze, 1998) in which numerical and physical size remain separate until after they are submitted to a processing stage that selectively adjusts each channel’s signal strength. The output from top-down attention feeds forward to a shared decision stage in which both kinds of size interact (Santens & Verguts, 2011). The distinction between bottom-up and top-down attention in models of visual search (e.g., Cave et al., 2005; Wolfe, 2007) is incompatible with a shared-representation model of the SCE.

The advantage we found for the shared-decision over the shared-representation model is not just driven by our experimental results and the theoretical harmony between the shared-decision model and models of visual search, but also speaks to a lively debate currently taking place over widely disparate areas of cognitive science. Perception and cognition have traditionally been considered to be separate mental modules, so a definitive demonstration that cognition can penetrate perception would revolutionize our understanding of perception (Firestone & Scholl, 2014). Until the revolution occurs, another reason to remain skeptical of the shared-representation model is that it violates the classic perception–cognition divide. Overall, the present results solidly support a late-interaction, shared-decision model of the SCE, while simultaneously implicating a role for top-down attention. This situates the SCE, previously limited to numerical cognition, within a wider debate about the interplay between perception and cognition.