People search for visual objects countless times a day. Where are the keys? Where did I put my cell phone? Where is the apple in the fridge? Luckily, search rarely starts from scratch and search targets are often embedded in environments where the relationship among the objects is repeated. The ability to extract such regularities is a key property of our cognitive system, and extensive research has shown that people learn to utilize repetitions to more efficiently process visual scenes (Chun & Turk-Browne, 2008).

One type of visual learning is learning to associate spatial configurations with target positions. This effect, termed contextual cueing (CC; Chun & Jiang, 1998), has been demonstrated dozens of times, and it has been shown that even with little attention people can rapidly learn numerous target–context associations and that this memory is largely immune to interference and decay (Jiang, Song & Rigas, 2005; Mednick, Makovski, Cai, & Jiang, 2009; Rausei, Makovski & Jiang, 2007). CC remains robust across the life span (Merrill, Conners, Roskos, Klinger, & Klinger, 2013), and has been observed in young normally developed children (Dixon, Zelazo, & De Rosa, 2010), in children with autism spectrum conditions (Barnes et al., 2008; Brown, Aczel, Jimenez, Kaufman, & Grant, 2010), and in young adults with intellectual disabilities (Merrill, Conners, Yang, & Weathington, 2014), as well as in non-human primates (Goujon & Fagot, 2013). CC is therefore considered a ubiquitous phenomenon, constantly affecting our behavior, and hence it is an excellent tool for studying visual learning (Goujon, Didierjean, & Thorpe, 2015).

Most of the CC literature has been focused on how people learn spatial configuration regularities. Apart from a few studies that showed that observers can sometimes learn and utilize repetitions of identity information (Chun & Jiang, 1999; Endo & Takeda, 2004; Goujon, Didierjean, & Marmèche, 2009), the role of identity information in CC has overall been discounted. This is partly due to the implicit assumption that identity information should not dramatically affect spatial processing. This assumption is supported by the finding that CC is not bound to identity information (Nabeta, Ono, & Kawahara, 2003), and only under certain conditions does learning of spatial configurations not generalize to the new items' identities (Jiang & Song, 2005). Furthermore, it was recently reported that repeated configurations provide a stronger cue than repeated color information (Kunar, Johnston, & Sweetman, 2014), substantiating the notion that spatial information is a central factor in CC. Accordingly, prominent models of CC have often emphasized the spatial domain in context learning (e.g., Brady & Chun, 2007; Jiang & Wagner, 2004; Olson & Chun, 2002).

In spite of the dominance of the spatial domain in the CC literature, there are also good reasons to suggest that identity variability might modulate context learning. For instance, previous studies have revealed that identity information can facilitate perceptual and mnemonic processes (e.g., Konkle, Brady, Alvarez, & Oliva, 2010). Likewise, providing objects with distinct visual identities enhances performance in multiple-object-tracking tasks (Horowitz et al., 2007; Makovski & Jiang, 2009), suggesting that "what" information plays a role even in a strict "where" task, in which objects' identities are task-irrelevant.

The purpose of this study was to investigate whether spatial information and identity information are sufficient and/or necessary for CC. To that end, CC was tested using real-world objects that, in contrast to the typical simple meaningless stimuli, introduced identity variability. If context learning is basically spatial, then the inclusion of identity variability should not impair (and perhaps even facilitate) learning. On the other hand, if both spatial and identity information are necessary to form a context, then this variability might actually confine learning, and hence search would be facilitated only in those displays where both types of information are repeated.

Experiment 1

The method of all the experiments reported below follows standard CC procedures (Chun & Jiang, 1998). Importantly, and in contrast to the typical CC tasks, search stimuli were colored images of real-world objects. To foster identity processing, search targets were defined categorically (e.g., a backpack) and participants looked for the same target category throughout the experiment. Yet, on each trial, the exact image of the target was randomly sampled from 16 possible exemplars.

The goal of the initial experiment was to assess the extent to which identity information plays a role in CC. To that end, four display conditions were tested. The Location-Repeat displays mimicked typical CC experiments, in which only the locations, but not the identities, of the distractors were repeated, together with the target locations. In the Identity-Repeat condition, only the identities, but not the locations, of the distractors were repeated, together with the target locations. In the All-Repeat condition, both the identities and the locations of the distractors were repeated, together with the target locations. These conditions were compared to the New display condition, in which the target locations were repeated across trials (to rule out the possibility that any facilitation is merely due to repeated target locations), yet both distractor locations and identities were randomly determined on every trial (Fig. 1).
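The logic of the four display types can be made concrete with a short sketch. The Python code below is illustrative only and is not the experiment code (the study was run with Psychtoolbox in MATLAB). The function names, the representation of a display as grid-cell indices plus integer object IDs, and the pool of 350 placeholder IDs are assumptions made for illustration.

```python
import random

N_CELLS = 48        # 8 x 6 grid, as described in the Procedure section
N_DISTRACTORS = 14  # distractors per display in Experiment 1

def make_base_display(rng, object_pool):
    """Draw a target cell plus one fixed set of distractor cells and identities."""
    cells = rng.sample(range(N_CELLS), N_DISTRACTORS + 1)
    return {
        "target_cell": cells[0],
        "cells": cells[1:],
        "ids": rng.sample(object_pool, N_DISTRACTORS),
    }

def make_trial(base, condition, rng, object_pool):
    """Build one trial; what is kept from `base` depends on the condition."""
    keep_where = condition in ("All-Repeat", "Location-Repeat")
    keep_what = condition in ("All-Repeat", "Identity-Repeat")
    # Distractors may occupy any cell except the (always repeated) target cell.
    free_cells = [c for c in range(N_CELLS) if c != base["target_cell"]]
    cells = base["cells"] if keep_where else rng.sample(free_cells, N_DISTRACTORS)
    ids = base["ids"] if keep_what else rng.sample(object_pool, N_DISTRACTORS)
    # The target location is repeated in every condition, including New.
    return {"target_cell": base["target_cell"], "cells": list(cells), "ids": list(ids)}

rng = random.Random(0)
pool = list(range(350))  # hypothetical integer IDs standing in for object images
base = make_base_display(rng, pool)
location_trial = make_trial(base, "Location-Repeat", rng, pool)
identity_trial = make_trial(base, "Identity-Repeat", rng, pool)
new_trial = make_trial(base, "New", rng, pool)
```

In this sketch, only the two boolean flags distinguish the conditions, which mirrors the 2 × 2 structure (what repeated or not, where repeated or not) of the design.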

Fig. 1

A schematic illustration (items are not drawn to scale and the actual set-size was larger) of the four display types. In the All-Repeat condition, both the identity information (what) and the location information (where) are repeated across trials. In the Identity-Repeat condition, the identity information is preserved, but the locations of the items vary across trials. Conversely, in Location-Repeat trials, the spatial arrangement is preserved but the identity information changes across trials (as in the typical CC procedure). These conditions are compared against New display trials in which the target locations are repeated, but both the identity information and the spatial arrangements vary across trials

Methods

Participants

Participants in the study were students (18–35 years old) from the Open University of Israel who participated in the experiments for course credit; all reported having normal or corrected-to-normal visual acuity. A total of 30 subjects (9 males, age: M = 24.9 years) completed the first experiment.

Equipment and stimuli

Participants were tested individually in a dimly lit room. They sat unrestrained about 65 cm from a 17” (c. 43 cm) CRT monitor (resolution 1024 × 768, 85 Hz). The experiments were programmed using Psychtoolbox (Brainard, 1997; Pelli, 1997), implemented in MATLAB (www.mathworks.com). For each participant, a random set of 350 objects was selected from a total of 2400 colored images (1.89° × 1.89°) of real-world objects taken from the MIT dataset (http://cvcl.mit.edu/MM/).

Procedure and design

Subjects were randomly assigned to look for one of seven target categories (guitars, backpacks, sofas, butterflies, gift wrappers, shoes, or horses), each consisting of 16 exemplar images. In each trial, a randomly selected search target was presented together with 14 distractor objects. The objects were presented against a white background on an invisible 8 × 6 grid (21.6° × 16.2°, with a jitter of up to 0.54° within each cell to reduce collinearities). Subjects were instructed to press the space bar as soon as they found the target. To ensure accuracy, upon response, all the items disappeared and the digits 1–6 appeared randomly at the positions of the target and five additional distractors. Subjects were asked to press the key of the digit occupying the target's position. Feedback was given immediately after the subject responded. For correct trials, a green plus sign was presented for 500 ms, whereas after incorrect responses a red minus sign was displayed for 2000 ms. Then, after an interval of 500 ms, the next trial began.
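The grid layout described above can be sketched as follows. This Python snippet is a minimal illustration under stated assumptions: the grid is taken to be centered on the screen, cells are indexed in row-major order, and the jitter is drawn uniformly and independently on each axis. None of these details is specified in the text, and the function names are hypothetical.

```python
import random

GRID_COLS, GRID_ROWS = 8, 6
GRID_W_DEG, GRID_H_DEG = 21.6, 16.2  # grid extent in degrees of visual angle
MAX_JITTER_DEG = 0.54                # maximum jitter within a cell

CELL_W = GRID_W_DEG / GRID_COLS      # about 2.7 deg per cell
CELL_H = GRID_H_DEG / GRID_ROWS      # about 2.7 deg per cell

def cell_center(cell):
    """Center of a grid cell (0..47, row-major) in degrees from screen center."""
    row, col = divmod(cell, GRID_COLS)
    x = (col + 0.5) * CELL_W - GRID_W_DEG / 2
    y = (row + 0.5) * CELL_H - GRID_H_DEG / 2
    return x, y

def jittered_position(cell, rng):
    """Cell center plus a uniform jitter on each axis (assumed distribution)."""
    x, y = cell_center(cell)
    return (x + rng.uniform(-MAX_JITTER_DEG, MAX_JITTER_DEG),
            y + rng.uniform(-MAX_JITTER_DEG, MAX_JITTER_DEG))

rng = random.Random(1)
cells = rng.sample(range(GRID_COLS * GRID_ROWS), 15)  # 1 target + 14 distractors
positions = [jittered_position(c, rng) for c in cells]
```

With 15 items on 48 cells, roughly a third of the grid is occupied on each trial.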

Participants performed 20 consecutive blocks of 32 randomly intermixed trials. Each block consisted of 8 displays of each of the four conditions (Fig. 1). Finally, at the end of the experiment, a surprise familiarity test was administered and subjects were asked to rate the familiarity of the 32 repeat and new displays on a scale of 1 (low familiarity) to 5 (high familiarity).

Results and discussion

Accuracy was very high in all the conditions (above 98 %) and none of the three repeated conditions significantly differed from New (all p's > .06). Error trials as well as trials deviating more than 2.5 SD above or below each participant's mean of each cell (2.76 % of the correct trials) were removed from further analyses.
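The trimming rule can be illustrated with a short Python sketch. It assumes that a "cell" is one participant's correct-trial RTs in one design cell and that the sample standard deviation is used; the text does not specify either detail, and the RT values below are made up for illustration.

```python
from statistics import mean, stdev

def trim_outliers(rts, n_sd=2.5):
    """Keep RTs within n_sd sample standard deviations of the cell mean."""
    if len(rts) < 2:
        return list(rts)  # SD is undefined for fewer than two observations
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]

# Hypothetical RTs (ms) for one participant in one cell; 3000 ms is an outlier.
cell_rts = [780, 790, 800, 810, 820, 795, 805, 815, 785, 800, 810, 3000]
trimmed = trim_outliers(cell_rts)
```

Note that with very few trials per cell no observation can exceed 2.5 SD, so the rule only becomes effective once a cell holds roughly ten or more trials.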

Figure 2 depicts RT as a function of epoch (i.e., a bin of four consecutive blocks) and display condition. Planned repeated-measures ANOVAs were conducted in order to assess learning in each of the three repeated conditions. All these analyses revealed robust effects of epoch, in that RT became faster as the experiment proceeded (all p's < .001, ηp² > .40). Of greater interest, neither the Location-Repeat nor the Identity-Repeat displays differed from New displays (F's < 1), and there were no significant epoch-by-condition interactions (p's > .16). In contrast, the All-Repeat displays yielded faster responses than New displays, F(1,29) = 9.91, p < .01, ηp² = .26. This effect did not interact with epoch, F(4,116) = 1.06, p = .38, probably because learning was rapid and a significant effect had already emerged after 5 repetitions.
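The binning of blocks into epochs, and the resulting cueing effect (New RT minus repeated-condition RT), can be sketched as below. This is an illustrative Python example rather than the analysis code used in the study; the trial tuples and RT values are hypothetical, and the function names are assumptions.

```python
from collections import defaultdict
from statistics import mean

BLOCKS_PER_EPOCH = 4  # 20 blocks are binned into 5 epochs

def epoch_of(block):
    """Map a 1-based block number to a 1-based epoch number."""
    return (block - 1) // BLOCKS_PER_EPOCH + 1

def mean_rt_by_epoch(trials):
    """trials: iterable of (block, condition, rt_ms) -> {(epoch, condition): mean RT}."""
    cells = defaultdict(list)
    for block, condition, rt in trials:
        cells[(epoch_of(block), condition)].append(rt)
    return {key: mean(rts) for key, rts in cells.items()}

def cueing_effect(means, condition, epoch):
    """New minus repeated RT; positive values indicate search facilitation."""
    return means[(epoch, "New")] - means[(epoch, condition)]

# Made-up trials for one participant: (block, condition, RT in ms).
trials = [
    (1, "New", 1000), (2, "New", 900),
    (1, "All-Repeat", 950), (4, "All-Repeat", 850),
    (5, "New", 800),
]
means = mean_rt_by_epoch(trials)
effect_epoch1 = cueing_effect(means, "All-Repeat", 1)  # 950 - 900 = 50 ms
```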

Fig. 2

Experiment 1's results: mean RT as a function of epoch and display condition

Experiment 1's findings imply that the learning of arbitrary spatial arrangements does not tolerate large variability in identity information. To rule out the possibility that the lack of "pure" spatial learning in searching through real-world objects was due to overshadowing (stronger learning from the salient All-Repeat condition), and to increase the statistical power of the design, a follow-up experiment tested 16 displays of only the Location-Repeat and New conditions. Still no learning was observed under these conditions, as Location-Repeat displays did not differ from New displays, F(1,24) = 2.48, p = .13, and the display condition did not interact with epoch, F < 1.

Taken together, these results clearly demonstrate that the mere repetition of arbitrary spatial configurations or of identity information was not sufficient for learning. Instead, performance was enhanced only when both spatial and identity information were preserved across repetitions.

Familiarity test

The results of the familiarity tests of all experiments are presented in Table 1. Notably, subjects reported that both the Identity-Repeat and the All-Repeat displays were more familiar than the New displays, yet only the All-Repeat displays were in fact responded to faster than the New displays.

Table 1 Mean familiarity scores (range 1–5) as a function of display condition and experiment

Experiment 2

One might argue that the lack of learning in the Location-Repeat and Identity-Repeat conditions was due to the unique features of the design. That is, there are several methodological differences that might explain the inconsistency between Experiment 1 and past studies that have shown CC, in spite of some variability in identity information (L's rotated in different directions; Chun & Jiang, 1998), or spatial variability (Endo & Takeda, 2004). For instance, it is possible that the use of categorically-defined targets, instead of single-exemplar targets, might have biased the results. To test this possibility, subjects in Experiment 2 were looking for a single target throughout the experiment. Furthermore, to examine whether the lack of learning was due to other methodological differences, such as cluttered displays and insufficient repetitions, set-size was reduced to 12 items and each display was repeated 28 times.

Method

Fifty participants (8 males, age: M = 25.7 years) completed Experiment 2, which was identical to Experiment 1 except that only 11 non-target items were used, and the number of repetitions was increased to 28. Additionally, each subject was looking for a randomly selected single target throughout the experiment. A total of 24 participants completed Experiment 2a, which tested the Location-Repeat and New conditions (12 displays each); Experiment 2b tested the Identity-Repeat and New conditions.

Results and discussion

Accuracy was above 98.8 % in Experiment 2a with no difference between the Location-Repeat and New conditions (p = .3). In Experiment 2b, accuracy was slightly higher in the New condition (99.02 %) than in the Identity-Repeat condition (98.6 %, p = .05). Error trials as well as trials deviating more than 2.5 SD above or below each participant's mean of each cell (2.88 % of the correct trials) were removed.

Repeated-measures ANOVAs with epoch and condition were conducted separately for each experiment (Fig. 3). Both experiments revealed a strong effect of epoch (p < .001, ηp² > .29), and no interaction between epoch and condition (p > .36). Importantly, similar to the previous experiment, Location-Repeat displays were not different from New displays, F < 1. Thus, in spite of the less-crowded displays, more repetitions, and the use of a single target-exemplar, subjects still showed no benefit from spatial configuration repetitions. In Experiment 2b, there was some evidence that Identity-Repeat trials were responded to faster than New trials, F(1,25) = 6.03, p = .02, ηp² = .2. However, this small difference (13 ms) does not seem to reflect a learning effect because it was mainly driven by the first two epochs, whereas none of the final epochs showed a reliable difference between the Identity-Repeat and New conditions.

Fig. 3

The results of Experiment 2a (left) and 2b (right): mean RT as a function of epoch and display condition

Experiment 3

The results thus far argue against the notion that the mere repetition of spatial configuration or identity information is sufficient for CC. Yet, Experiment 1 also revealed that learning occurred when both types of information were repeated. Experiment 3 therefore aimed at replicating and extending this finding. Specifically, it took advantage of the fact that testing CC with real-world objects paves the way for examining the robustness of learning using hybrid-search tasks (e.g., Wolfe, 2012). These tasks combine perceptual search with memory search, in that subjects are asked to hold multiple target templates in mind while searching for a target (is there a phone, a key or a wallet in the display?). Since this task highlights the role of identity information, it should further encourage subjects to process it during search and therefore might facilitate learning in the Identity-Repeat condition. Furthermore, hybrid-search tasks have the advantage of testing the generality of learning and its robustness to memory load manipulations (Travis, Mattingley, & Dux, 2013; Vickery, Sussman, & Jiang, 2010).

Methods

Forty-nine participants (17 males, age: M = 25.5 years) completed Experiment 3, which was identical to Experiment 1, except that hybrid-search was tested and Location-Repeat displays were excluded. Consequently, each block consisted of 30 trials (10 displays per condition). A total of 24 subjects were tested in the Load-4 condition, in which they were assigned a random set of four target categories (e.g., look for a guitar, backpack, sofa, or butterfly), while 25 subjects were tested in the Load-8 condition; two additional target categories (beer mugs, keys) were created to enable Load-8.

Results and discussion

Accuracy was above 98 % and was not affected by load, condition, or their interaction, all p's > .23. Error trials as well as trials deviating more than 2.5 SD above or below each participant's mean of each cell (2.92 % of the correct trials in Load-4 and 2.8 % in Load-8) were removed.

A mixed analysis, with load (4, 8) as a between-subjects factor, and epoch (1–5) and display condition (Identity-Repeat, All-Repeat, and New) as within-subject factors was conducted (Fig. 4). As expected, there was a main effect of load, F(1,47) = 16.48, p < .001, ηp² = .26, a main effect of epoch, F(4,188) = 191.24, p < .001, ηp² = .80, and an interaction between the two, F(4,188) = 3.85, p < .01, ηp² = .08. Importantly, there was a main effect of display condition, F(2,94) = 11.26, p < .001, ηp² = .19, and this effect was not modulated by load, F(2,94) = 1.04, p = .36.
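For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom via the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the statistics reported above; the function name is ours, and the code is a convenience check rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Statistics reported for the mixed analysis of Experiment 3.
load = partial_eta_squared(16.48, 1, 47)        # main effect of load
epoch = partial_eta_squared(191.24, 4, 188)     # main effect of epoch
load_x_epoch = partial_eta_squared(3.85, 4, 188)
condition = partial_eta_squared(11.26, 2, 94)   # main effect of display condition

print(round(load, 2), round(epoch, 2), round(load_x_epoch, 2), round(condition, 2))
# prints: 0.26 0.8 0.08 0.19
```

The recomputed values match the reported effect sizes to two decimal places.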

Fig. 4

Experiment 3's results: mean RT as a function of load, epoch and display condition

More specifically, the Identity-Repeat displays were not different from the New displays, and this condition did not interact with epoch in either memory load (all p's > .06). In sharp contrast, the All-Repeat displays were significantly faster than New displays in both Load-4, F(1,23) = 9.27, p < .001, ηp² = .29, and Load-8, F(1,24) = 7.1, p = .01, ηp² = .23, and this effect did not interact with epoch (F's < 1). Thus, these data fully replicate and extend the previous findings: Even though identity processing is presumably emphasized under hybrid search conditions, identity repetition was not sufficient to facilitate search. Yet, the repetition of both identity and spatial information produced fast and robust learning regardless of memory load.

Familiarity test

Subjects were able to explicitly distinguish between the New and the All-Repeat displays in Load-4, but not in Load-8 (Table 1). Similar to before, there was also an indication of a familiarity effect in the Identity-Repeat condition, in the absence of search facilitation. Coupled with the previous results, these findings show that the effects of CC and familiarity are not parallel to one another, suggesting that the mechanisms underlying familiarity might be separable from those required for learning target–context associations.

General discussion

What constitutes a context? In the CC literature, it has been generally assumed that either the spatial configuration of the items or their identities can be regarded as the context that facilitates search (Chun & Jiang, 1998, 1999). The findings of the present study argue against this notion. No indication was found that observers can in fact extract and utilize arbitrary spatial configuration regularities in searching through real-world objects. This finding was replicated in multiple groups of subjects and thus undermines the view that repetition of spatial information is sufficient for context learning. Nonetheless, the results also showed that spatial information might be necessary for context learning, since no learning was observed when only identity information was repeated. Instead, CC was found only when both spatial and identity information were repeated. Still, despite this specificity, learning was robust and was not affected by memory load.

The finding that subjects were not able to learn spatial information regularities is inconsistent with the notion that the spatial dimension has a special role in CC (Brady & Chun, 2007; Jiang & Wagner, 2004; Olson & Chun, 2002). The present results also seem inconsistent with previous findings showing learning despite some spatial and identity variability (Chun & Jiang, 1999; Endo & Takeda, 2004). The results of Experiment 2 argue against the possibility that the lack of learning here was due to specific features of the design (e.g., overshadowing, crowded displays, lack of power) or to the use of categorically-defined targets. A plausible explanation for why subjects did not exhibit learning in any of the current experiments, when only the identity or the spatial information was repeated, is the use of complex colored objects that increased the heterogeneity of the displays and diminished learning (Feldmann-Wustefeld & Schubo, 2014). Another possibility is that the semantic meaning of the items, rather than their visual features, has restricted learning. Notably, the question of what confines CC is closely related to the question of what exactly people learn when both what and where information are repeated, and future research should delineate the boundary conditions of learning while searching through real-world objects.

The lack of CC also appears to contrast with Hout and Goldinger (2010, 2012), who found that both object consistency and spatial consistency can be used to facilitate search. Importantly, however, these studies did not test CC, and the search facilitation was attributed to increased familiarity with specific search displays that lowered search response thresholds. In contrast, it is currently debated whether CC affects response thresholds (Kunar, Flusberg, Horowitz, & Wolfe, 2007) or attentional guidance (Chun & Jiang, 1998). Yet, the present findings indicate that learning to associate target locations with contexts requires both identity and spatial information, and that this learning is largely independent of familiarity with the display.

A novel aspect of this study is the testing of CC under hybrid-search conditions. Previously, the effect of memory load on learning was addressed using dual-task designs, and it was found, for example, that performing a demanding spatial working memory task attenuated learning (Travis et al., 2013). Here, memory load was integrated into the task, and although this manipulation considerably impacted search latencies, it had no effect on learning (see also Vickery et al., 2010). This finding indicates that, first, learning is robust and can be found across variable task difficulties, and second, in contrast to spatial working memory, the type of memory that supports hybrid-search (be that activated long-term memory or visual working-memory; Cunningham & Wolfe, 2014) does not draw on the same resources required for CC.

In conclusion, CC demonstrates the powerful ability of humans to extract regularities from seemingly chaotic and noisy environments and to utilize this knowledge to influence fundamental cognitive processes. The present findings provide strong evidence that there is no learning of "where" without "what", and, at least for CC, both are required. Nonetheless, in spite of being highly specific, this learning is robust and is not modulated by memory load. Together, these findings place important constraints on CC and challenge the current thinking regarding what context is being learned.