Introduction
In our daily environment, we process information about objects, their shapes, colors, locations, and so on. In doing so, we also register co-occurrences between such features. For instance, imagine your trips to the supermarket: If your favorite pasta comes in a blue package and is always in the same aisle, you will pick it up based on location and color without registering further details; you can act routinely in such environments. This kind of learning can occur without any intention to learn, and we are usually not consciously aware of such learning processes or their contents. It is therefore termed implicit learning. It is an important feature of our cognitive system since it helps us to predict future events and thereby to act without effort (Clark,
2013). Another important characteristic of our system is that we learn to discriminate relevant from irrelevant information according to our action goals (Dreisbach & Haider,
2008,
2009; Haider & Frensch,
1996). If we want to buy our supermarket item, we will look only for blue packages, de-selecting other colors. This is a core ability of our attentional system and potentially shapes what we learn from our environment in such everyday actions. The goal of the current study is to examine the role of selective attention to cues, manipulated here through their task-relevance, in implicit learning processes.
In the field of implicit learning, there has been a long-standing debate about the conditions that are required for such learning processes. When do we notice that certain features of stimuli co-occur in a systematic fashion? Do they need to be part of the current action goal or, more broadly, the task-set? Given a confined task context, do features need to be task- or response-relevant to be associatively learned? Or do we encode information about all stimuli of the task at hand in a rather unselective manner, such that learning occurs automatically whenever prediction error is reduced by contingencies inherent in the environment?
Implicit learning
In the lab, we can study implicit learning processes in several different paradigms, like the serial reaction time task (Nissen & Bullemer,
1987), statistical learning paradigms (Fiser & Aslin,
2001; Reber,
1967), or in contextual cueing paradigms (Chun & Jiang,
1998), to name only a few. The research questions studied with these paradigms are rather similar; yet, research within the different paradigms is usually only loosely connected. Here, we focus mostly on the contextual cueing literature, but also integrate some findings from the other paradigms.
In the original contextual cueing paradigm, participants perform a visual search task in which they are asked to find a target letter “T” among a display of distractor letters “L”. In each block throughout the experiment, half of the displays are repeated distractor configurations that consistently predict the target location, while the other half are novel configurations. In each trial, participants report the orientation of the target letter. The contextual cueing effect (CC effect) is defined as a stronger decrease (steeper slope) in response time (RT) for repeated configurations than for novel configurations over the course of trials. Note that the configurations are not associated with the orientation of the target; thus, only the contingency between the distractor configuration and the target location can be learned, while the response remains unpredictable. The effect can be traced back to enhanced search efficiency, attentional guidance and selection, and, to a lesser extent, to response-related processes (Kobayashi & Ogawa,
2020; Kunar et al.,
2007; Schankin & Schubö,
2009,
2010; Sisk et al.,
2019). It results in long-term implicit learning effects (Chun & Jiang,
2003). When asked to explicitly discriminate repeated spatial configurations from novel ones, participants are typically not able to do so, and they do not report having learned anything. Therefore, this learning process is assumed to be implicit (Colagiuri & Livesey,
2016; but see Vadillo et al.,
2019).
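To make the measure concrete, the CC effect can be quantified as the RT difference between novel and repeated contexts across training. The following is a minimal sketch under hypothetical naming conventions (a trial-level table with columns rt, context, and epoch); it illustrates the logic of the measure, not the analysis pipeline of any particular study.

```python
import pandas as pd

def cc_effect(trials: pd.DataFrame) -> pd.Series:
    """RT advantage of repeated over novel contexts, per training epoch."""
    mean_rt = (trials.groupby(["epoch", "context"])["rt"]
                     .mean()
                     .unstack("context"))
    # Positive values = faster search in repeated contexts, i.e., a CC effect
    # that should grow over epochs if the repeated configurations are learned.
    return mean_rt["novel"] - mean_rt["repeated"]
```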
The classical setup of the contextual cueing task does not seem ideal for studying our research question, because it emphasizes the spatial dimension above all else. When studying the role of task-relevance in implicit learning, we want to compare different cues when they are task-relevant or task-irrelevant. In the classical contextual cueing task, such a comparison between cues would be inherently imbalanced: The predictive feature is the spatial configuration of the distractors, the visual search task is a spatial task, and the requested response is based on a spatial orientation judgement of the target.
Meanwhile, the contextual cueing paradigm has been used in ways that suggest the dominance of the spatial dimension in the task can be reduced. The empirical evidence shows that, within the paradigm, cues or contexts other than the spatial configuration of distractors are learned and can guide attention. In the visual domain, multiple studies have shown a CC effect when repeated natural scenes or complex geometric patterns predict the target location, although these effects involved explicit learning (Brockmole & Henderson,
2006; Brockmole et al.,
2006; Ehinger & Brockmole,
2008; Goujon et al.,
2012). With simpler stimulus material, it has been shown that background color and distractor identity can be implicitly learned to predict the target position. However, when such color or shape cues are predictive of the target location in addition to predictive spatial cues (the distractor configuration), only the spatial contingencies are learned; the color and shape contingencies are overshadowed (Endo & Takeda,
2004; Kunar et al.,
2006,
2013). It has further been shown that spatiotemporal sequences can guide attention (Olson & Chun,
2001), illustrating the wide scope of environmental cues that the cognitive system uses for predictions. It thus seems that a number of features can be used as cues, and that even entirely task-irrelevant features such as background color can be learned to predict the target position. Yet, the role of selective attention in discriminating task-relevant from task-irrelevant stimuli or features remains unclear in the field of implicit learning.
Attentional prerequisites for implicit learning
As a cautionary disclaimer: Attention is a widely used and too often under-defined term (Anderson,
2011). Here, we refer to attention as selective attention, not attention as a resource (as in, e.g., Frensch et al.,
1998; Nissen & Bullemer,
1987). In the studies we will review here, attention is also mostly operationalized as task-relevance: when a stimulus feature is task-relevant, it is considered to be attended and is consequently integrated into the learning process. This is to be kept separate from the question of whether the feature is processed consciously. In many ways, consciousness and attention are closely related notions (Jiang & Chun,
2003; Mack & Rock,
1998; Tsuchiya & Koch,
2009). It is crucial, when defining attention, to avoid regressive reasoning that invokes a homunculus which fulfils all assumed functions of attention and acts as a causal but unexplained factor in the cognitive system. Therefore, attention in our context is to be understood as the resulting effect of manipulating task-relevance, not as a causal factor in its own right. A test for conscious knowledge of the learned contents must be an additional step and is not assumed to correlate perfectly with attending to the to-be-learned features (Tsuchiya & Koch,
2009).
There are two lines of argument with opposing predictions when it comes to attentional prerequisites of implicit learning. The first suggests that task-irrelevant features are not processed in a way that allows for integration into the learning process, either arguing that the features are not processed sufficiently, or that their representational strength is too weak to translate into behavior (Turk-Browne et al.,
2005). The second argument suggests that task-irrelevant features are indeed processed to a degree that they can become part of contingencies which then form predictions (Kunar et al.,
2013; Miller,
1987).
As to the first line of argument, several studies have demonstrated learning effects only for task-relevant features. In visual search as well as statistical learning paradigms, participants were instructed to pay attention only to stimuli of one color and to ignore stimuli of another color (Jiang & Chun,
2001; Jiang & Leung,
2005; Turk-Browne et al.,
2005). Because learning of contingencies occurred for the attended color stimuli only, it was concluded that selective attention is a prerequisite for (implicit) learning. Similarly, participants were able to learn a spatial sequence of stimuli, but only additionally learned the contingencies with the identity of these stimuli when they were instructed to count them (Jiménez & Méndez,
1999; Jiménez et al.,
1999). Thus, stimulus identity was learned only when it was made task- or response-relevant (see also Dreisbach & Haider,
2008,
2009). Yet, Jiang and Leung (
2005) observed that contingencies in stimuli of an unattended color could be learned in some way: even though this learning did not manifest in behavior at first, it facilitated learning in a subsequent task. In a similar vein, the above-mentioned results of Jiang and Chun (2001) cannot be interpreted unambiguously. In their third, higher-powered experiment, they found potential evidence for learning of contingencies in a task-irrelevant color as well.
The second group of findings indicates that irrelevant information is also processed and that the learned content is used in future instances. For example, Miller (
1987) used a variant of the Eriksen flanker task (Eriksen & Eriksen,
1974). Unlike in the original paradigm, the flankers in his experiments were neither identical to the targets nor otherwise associated with a response. He observed that when these task-irrelevant flankers were consistently paired with a specific response, participants responded faster in those trials than when the flanker-response relation was changed. Hence, the irrelevant flankers had become associated with the particular response. Similarly, Kunar et al. (
2006,
2013) showed that in contextual cueing, task-irrelevant context features such as background color or texture were learned when they were predictive for target location.
An additional finding, however, is that context features like color, texture, or distractor identity are not learned when a spatial configuration is given as an additional cue (Endo & Takeda,
2004; Kunar et al.,
2013). This suggests that the spatial configuration could overshadow the learning of other predictive features. This may not be surprising because, as mentioned above, the task in contextual cueing paradigms inherently emphasizes the spatial dimension. In the literature on implicit sequence learning, for example, Koch and Hoffmann (2000) suggested that spatial relations of stimuli contributed significantly more to learning effects than other stimulus features. More generally, the spatial dimension might be distinctly represented in our cognitive system (Mayr,
1996; Paillard,
1991; Schintu et al.,
2014). In fact, the spatial dimension might not even be a perceptual feature as such, as it is so tightly bound to the motor system (Gaschler et al.,
2012; Goschke & Bolte,
2012; Koch & Hoffmann,
2000; Paillard,
1991).
These considerations suggest that the generalizability of findings on attentional mechanisms and learning in the contextual cueing paradigm is strongly limited. With very few exceptions (Endo & Takeda, 2004; Kunar et al., 2013), the paradigm has not been extended to test other, non-spatial stimulus features. This is particularly problematic when trying to draw conclusions about the learning of task-relevant and task-irrelevant features: Either spatial features overshadow all other visual features (Kunar et al., 2013) because they are weighted more strongly according to the task requirements, or the spatial dimension is represented entirely differently and thus shows different learning mechanisms than other visual features. Therefore, to conduct a more generalizable test of attentional mechanisms in implicit learning, we designed a novel variant of the task that de-emphasizes the spatial dimension. With this variant, we can contrast the learning of different visual features (color, shape) that are not problematic in terms of the task requirements or, potentially, their general representation in the cognitive system.
A second noteworthy point in the studies reviewed so far is that participants were not able to recognize predictive distractor configurations (Jiang & Chun,
2001; Kunar et al.,
2006,
2013) or recall the identity of flankers (Miller,
1987). In each case, this was taken as evidence for incidental or implicit learning. However, Vadillo et al. (2019) recently questioned the implicit nature of the CC effect, given insensitive awareness measures and the limited statistical power of many studies in the literature. We will address this with a carefully designed test of conscious awareness and discuss the issue further in light of our results in the General Discussion.
Overview of the study
The main goal of the current study was to examine whether selective attending, in terms of task-relevance, is needed to learn the contingencies between distractor features and target location within a contextual cueing paradigm. Importantly, whereas in the original contextual cueing paradigm the spatial configuration of the distractors cues the target location, we implemented nonspatial features of the distractors as cues. We manipulated the task-relevance of the predictive cue as follows: The shape dimension is task-relevant because the task is to assess the target’s shape (i.e., identity); thus, the distractor shapes, which are the predictive cues, need to be processed to perform the task. The color dimension, on the other hand, plays no role in any of the task’s processes, neither in the search process discriminating distractor from target shapes, nor in the response, which is based on shape. The color dimension is thus considered task-irrelevant. A second question concerned cue competition. If more than one feature predicts the target location, will that lead to overshadowing of the task-irrelevant feature, as Kunar et al. (2013) have shown for spatial features? Or are cue competition effects such as overshadowing or blocking the result of explicit or deliberate processes, and thus absent in incidental learning paradigms such as our variant of contextual cueing (De Houwer et al.,
2005; Schmidt & De Houwer,
2019)? A third question concerned the implicit nature of the acquired contingencies.
In all three experiments, the participants saw spatial configurations of distractors and had to find the target to answer whether a certain characteristic of the target was present. The spatial configurations of the distractors were novel in every trial and did not predict target location. Instead, either the shape (Experiment 1), the color (Experiment 2), or the color and shape (Experiment 3) of the distractors were predictive for the target location.
In Experiment 1, three of six distractor shapes each cued one of four potential target locations, whereas for the other three shapes, targets were randomly assigned to the four potential target locations. Note that in this context, shape is a task-relevant cue insofar as it needs to be processed to discriminate the target from the distractors. In Experiment 2, we used the distractor color as a feature to cue the target location. Again, three colors each cued one particular target location, and the other three colors were randomly paired with the target locations. The question here was whether the predictive color would be learned as a cue for the target location, even though it is neither task- nor response-relevant. To be more precise, color is neither relevant to the search task, as it does not distinguish target and distractors, nor relevant to the response, as each color is equally likely to appear with each target identity and no color judgement is required. In Experiment 3, we tested whether two distinct features of the distractors, shape (task-relevant) and color (task-irrelevant), would be learned as a compound associated with a target location, whether both features would be learned independently of one another, or whether only one feature would be learned (overshadowing).
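To illustrate the contingency structure, the sketch below renders the design of Experiment 1 in code form; the shape labels and location names are hypothetical placeholders, and Experiments 2 and 3 substitute or add color analogously.

```python
import random

TARGET_LOCATIONS = ["loc_1", "loc_2", "loc_3", "loc_4"]

# Three predictive distractor shapes, each consistently cueing one location;
# the remaining three shapes are paired with the locations at random.
PREDICTIVE_SHAPES = {"shape_A": "loc_1", "shape_B": "loc_2", "shape_C": "loc_3"}
UNPREDICTIVE_SHAPES = ["shape_D", "shape_E", "shape_F"]

def draw_target_location(distractor_shape: str) -> str:
    if distractor_shape in PREDICTIVE_SHAPES:
        return PREDICTIVE_SHAPES[distractor_shape]  # consistent contingency
    return random.choice(TARGET_LOCATIONS)          # unpredictive baseline
```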
General discussion
The main goal of the current study was to investigate the role of selective attention in the implicit acquisition of contingencies between features. We implemented these contingencies in a novel variant of the contextual cueing paradigm using identity cueing instead of the classical spatial configuration cueing. For the purpose of testing the role of selective attention, we manipulated the task-relevance of distractor features that predicted the target location. In Experiment 1, the predictive feature was the task-relevant shape of the distractors. In Experiment 2, it was the task-irrelevant feature color. In Experiment 3, we aimed to test cue competition effects and therefore presented compound cues of color and shape.
The results of the first two experiments showed that participants learned to predict the target location from the shape (Experiment 1) and from the color as well (Experiment 2). The RT differences between predictive and unpredictive search contexts emerged over the course of the training blocks in both experiments.
A generation task, which also contained a confidence measure, indicated that these learned associations were not explicitly represented. Participants were not able to report the correct target location for a predictive feature above chance (with a numerical exception in Experiment 2, for which, however, Bayesian analysis provided no substantial evidence), and they were no more confident in their response when they had generated the correct target location. This indicated that participants did not have metacognitive access to the acquired information that would enable them to distinguish between their correct and incorrect responses (Michel,
2023).
What do these findings offer in terms of understanding the role of selective attention in implicit learning? Attentional or selective mechanisms are essential to our cognitive system: in the visual system alone, we are bombarded with about 10^8 bits of information per second (Itti & Koch,
2000; Marois & Ivanoff,
2005). This requires mechanisms of selection, chunking, and binding (Fiser & Aslin,
2005; Wheeler & Treisman,
2002). As reviewed above, a number of studies suggested that task-relevance of a predictive feature, manipulated by instruction or by the nature of the task, is necessary for it to be learned implicitly (Jiang & Chun,
2001; Jiang & Leung,
2005; Jiménez & Méndez,
1999; Turk-Browne et al.,
2005). What is implicitly assumed when arguing for a central role of selective attention in implicit learning is that implicit learning is subject to capacity limits. However, this seems to contradict the widely confirmed finding that people can learn more than one contingency in parallel (Conway & Christiansen,
2006; Mayr,
1996; Wilts & Haider,
2023). In addition, our current findings suggest that contingencies involving task-irrelevant cues can also be learned. Thus, the argument for a functionally imperative role of selective attention in implicit learning may not be so compelling.
To resolve this contradiction, it might be useful to turn to research on action control, which has been moving in a similar direction. In the framework of the Theory of Event Coding (TEC; Hommel et al.,
2001), an event file is thought to be formed when we integrate stimulus features and responses into an episode that can then be activated by the respective stimulus or response features it entails (Hommel,
1998). Multiple series of experiments have tested the attentional prerequisites for a stimulus or context feature to be integrated into an event file. The conclusion from such experiments was that features are integrated into an event file when they are task-relevant (Chao et al.,
2022; Hommel,
2005; Huffman et al.,
2018), specifically, also when they can be used to discriminate targets from distractors (Hommel & Colzato, 2004). More recently, the modeling of this mechanism has been refined: it has been proposed that the selectivity of feature integration does not lie in the encoding and building of an event file, which is now thought to be automatic, but rather in the retrieval of the event file (Hommel et al.,
2014; Schmalbrock et al.,
2022). Thus, the question is not whether a feature is integrated a priori, but whether the weighting of a feature (Hommel et al., 2014; Memelink & Hommel, 2013) enables the retrieval of the episode (event file) on a future occurrence. The paradigms used in action control research often rely on trial-by-trial observations, examining the effect of the features and response of trial n on trial n + 1. We believe that, with our longer-term learning context, we can extend the scope of studying the processing of features beyond this trial-by-trial frame (Moeller & Pfister, 2022). In our view, our findings can be integrated into the TEC framework by assuming that features are learned to predict events or actions when they activate the respective event files containing such information, irrespective of the features’ task or response relevance. Applied to our current findings, a possible assumption concerning the underlying mechanism is that all features of a trial are integrated into an event file. Given that one feature is contingently paired with the target location (e.g., color), the retrieval of the episode containing the correct target location is strengthened over time (Hommel, 1998; Rescorla & Wagner, 1972). Consequently, it would not be task-relevance (or selective attention) that modulates implicit learning but rather the retrieval of episodes (event files); the question for implicit learning is whether a particular feature is capable of triggering the retrieval of a certain episode. If so, this leads to performance benefits, or, as we coined it here, implicit learning. This mechanism seems to be effective with task-relevant cues (distractor shape) and with task-irrelevant cues (distractor color).
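The strengthening assumption can be made explicit with the delta-rule formalization of Rescorla and Wagner (1972) cited above. As a textbook illustration, not a model we fitted to the data, the associative strength V of cue i is updated on every trial according to:

```latex
\Delta V_i = \alpha_i \, \beta \, \Bigl( \lambda - \sum_{j \in \text{trial}} V_j \Bigr)
```

Here, alpha and beta are salience and learning-rate parameters and lambda is the maximum strength the outcome (here, the target location) can support. A feature that is contingently paired with the target location accumulates strength over trials, regardless of its task-relevance, until retrieval of the episode becomes behaviorally effective.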
We acknowledge that our manipulation of task-relevance differs from that of the studies presented in the introduction. We provided a context in which all distractors needed to be evaluated with respect to whether their shape matched the target shape. This way, color was not task- or response-relevant. However, it may have been processed more strongly than irrelevant stimuli in previous studies, in which, for example, stimuli of a certain color did not have to be searched at all (Jiang & Chun,
2001; Jiang & Leung,
2005; Turk-Browne et al.,
2005). Yet, what is unique to our design is the distinction between feature dimensions at a higher level, marking shape as task-relevant and color as task-irrelevant, instead of marking one specific shape or one specific color as task-relevant or not. We argue that this is the more relevant question when it comes to specifying the building blocks of implicit learning. With this question, we test theoretical accounts that postulate processing in feature-specific modules, which may not be able to integrate information from different features that are not attended (Baars,
2005; Eberhardt et al.,
2017; Keele et al.,
2003). In our experiments, we find such learning effects across features, not just within one feature. Although not compatible with feature-wise processing in independent modules, this finding is in line with the learning mechanism we proposed above: when the information in a trial is encoded into an event file, contingencies within or across features can, in principle, be learned.
A notable limitation of our experiments is that task-relevance in our contextual cueing variant is confounded with the feature dimension of the cue. That is, shape is task-relevant, and color is task-irrelevant. We cannot balance these two factors, because if target color were the relevant feature, we would have a pop-out effect that would hardly be affected by the predictability of the distractors’ shapes or colors. We had no reason to believe that the two visual features would (ceteris paribus) differ in their potential to be associated with the target position. Other researchers have found CC effects for (background) color (Kunar et al.,
2006,
2013), but also learning effects for irrelevant but predictive shapes (Levin & Tzelgov,
2016), and even letters (Miller,
1987). From the visual recognition and visual scene processing literature, we would even hypothesize a primacy of shape information over color information (e.g., Biederman & Ju, 1988; Del Viva et al., 2016). Extrapolating this to our experiments, the likelihood of learning color contingencies should be further reduced. Note, however, that this effect was found with more complex stimuli, might be traced back to complexity reduction, and may therefore not transfer to our simple stimulus set-up. Thus, although a limitation of our design is that the two conditions, task-relevance and feature dimension, cannot be disentangled, there is no compelling argument as to why the feature dimension should be the main contributor to the effect. If anything, one would hypothesize an effect of feature dimension opposite to that of task-relevance. From our experiments, we would thus deduce that, in principle, both task-relevant and task-irrelevant features can be integrated and used for predictions (Experiments 1 and 2).
In Experiment 3, we used colors and shapes as compound cues and, after training, tested the learning of both features in isolation. In the interpretation of the results from the single cue blocks, the picture is more nuanced. First, overall, we observe no RT costs in predictive contexts from compound cues to single cues, that is, from training to the single cue blocks. Additionally, the descriptive differences between predictive and unpredictive contexts in the compound blocks (CC effects) are almost double the size of the differences in the shape and color single cue blocks. The same relationship is found when comparing the fixed effects estimates for context in the compound training of Experiment 3, which are almost double the estimates of the trainings in Experiment 1 (shape) and Experiment 2 (color). Thus, the learning effects in the single cue experiments (Experiments 1 and 2) and the single cue block effects seem additive with respect to the learning effect in the compound cue blocks of Experiment 3. Such summation effects have been shown in operant conditioning when comparing compound cue and single cue learning in animals (Mackintosh,
1976; Miles & Jenkins,
1973; Thein et al.,
2008).
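As a toy illustration of this "independent learning plus additive expression" reading (explicitly not a fitted model, and deliberately omitting the shared error term that produces cue competition in the Rescorla-Wagner formulation sketched above), consider two cues that each accrue strength via their own delta rule:

```python
ALPHA, LAMBDA = 0.1, 1.0  # assumed learning rate and asymptote

def train_alone(n_trials: int) -> float:
    """Strength of a single cue after independent delta-rule learning."""
    v = 0.0
    for _ in range(n_trials):
        v += ALPHA * (LAMBDA - v)  # no shared error term, hence no competition
    return v

v_shape = train_alone(100)
v_color = train_alone(100)

# Under additive expression, the benefit in compound blocks approximates the
# sum of the single cue benefits, i.e., roughly double each single effect.
print(v_shape, v_color, v_shape + v_color)
```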
Yet, our results from the single cue blocks remain somewhat ambiguous. It remains unclear whether there are reliable context effects in the single cue blocks at all. That the mixed model with random slopes for cue feature per participant fit the data best is a first indicator of individual variance in the learning effects of the two cues (a sketch of such a model specification is given below). However, in an exploratory individual participant analysis, we see no convincing evidence for overshadowing effects of either feature within participants. This is in itself interesting, because an overshadowing effect would have been probable not only due to feature saliency (Mackintosh,
1976) or individual preferences (Reynolds,
1961), but also because the shape feature was task-relevant and thus more likely to overshadow the task-irrelevant color cue. We manipulated task-relevance to alter attentional processes, and overshadowing effects are also believed to build on attentional mechanisms (Mackintosh,
1971), in the sense that although more than one contingency can be learned, not all learned contingencies are translated into behavior (Kaufman & Bolles,
1981; Matzel et al.,
1985). Thus, task-relevance should have influenced attentional processes, and it is conceivable that cue competition effects would in turn be modulated by the task-relevance of the cues. However, we observe no advantage for the task-relevant cue. This might point to reciprocal overshadowing, as has been observed in animals (Mackintosh, 1976; Miles & Jenkins, 1973), meaning that both features overshadow each other, resulting in the pattern of a summation effect described above.
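For concreteness, a random-slopes specification of the kind referred to above could be written as follows. This is a hedged sketch in Python's statsmodels with hypothetical column names and synthetic stand-in data, not the model reported in the results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for trial-level RTs from the single cue blocks.
rng = np.random.default_rng(0)
trials = pd.DataFrame({
    "rt": rng.normal(600, 50, 400),
    "context": rng.choice(["predictive", "unpredictive"], 400),
    "cue_feature": rng.choice(["shape", "color"], 400),
    "participant": np.repeat(np.arange(20), 20),
})

# Fixed effects of context and cue feature; by-participant random slopes for
# cue feature capture individual variance in which cue carries the effect.
model = smf.mixedlm("rt ~ context * cue_feature", trials,
                    groups="participant", re_formula="~cue_feature").fit()
print(model.summary())
```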
In a recent article, Schmidt and De Houwer (
2019) noted that there is surprisingly little research on the issue of cue competition, especially in implicit learning (but see Beesley & Shanks,
2012; Endo & Takeda,
2004 for evidence from contextual cueing; Cleeremans,
1997; Jiménez & Méndez,
2001, for evidence from implicit sequence learning). In multiple large studies, Schmidt and De Houwer (
2019) found no evidence for blocking or overshadowing in an implicit learning paradigm. They also labeled the predictive features (shapes and words) in their experiments as task-irrelevant, because the response itself was based only on color. In that respect, their findings are consistent with ours: Task-irrelevant features that are predictive (although, in their case, of the response) are still learned. In their case, the features were even learned equally strongly, without overshadowing or blocking each other. This fits our interpretation of Experiments 1 and 2: independently of task-relevance, cue contingencies can be learned. Our addition from Experiment 3 is that cue competition in our variant of an incidental learning paradigm does not result in overshadowing effects, even though one cue is task-relevant and the other task-irrelevant. Rather, our results are compatible with the notion of independent learning of cues, resulting in additive learning effects under compound presentation.
One last issue concerning our findings might be doubts about the implicit nature of the knowledge acquired in the contextual cueing paradigm. We claim that while participants' performance in the training phase reflected learning of the contingency between the respective feature and the target location, they were unable to express this knowledge explicitly. There is a long-standing debate about whether the common variant of contextual cueing is in fact based on non-conscious learning. It has been argued that studies have failed to test correctly for conscious knowledge (Luque et al.,
2017; Vadillo et al.,
2016,
2019), especially because they are underpowered, and measurement error leads to wrong conclusions regarding the implicit nature of the CC effect. In an attempt to empirically add to the debate, Colagiuri and Livesey (
2016) tested samples of over 600 participants and found no positive relationship between explicit knowledge and the cueing effect. Nevertheless, we take the criticism of the conventional testing for explicit knowledge seriously. Contextual cueing studies originally implement a recognition task: They show participants old and novel spatial configurations and ask them to categorize them as old or novel (Chun & Jiang,
1998). This means that they use a one-trial test for each configuration, often with small sample sizes, and it does not seem surprising that there is a reliability and power issue here (Smyth & Shanks, 2008). This is why we did not implement a recognition task, as in the common variant, but a task that exactly mirrors the training task, in order to enhance the sensitivity of our test (Shanks & St. John,
1994). Participants thus had every chance to express any knowledge or intuition from the training in the generation task. Additionally, we were not restricted to a one-trial test, as one is in the recognition tasks. Rather, we presented participants with the same cue (color or shape, with random spatial configurations) multiple times, making the measure more reliable (Smyth & Shanks,
2008). Note also that, given conscious awareness, the task of reproducing the contingencies between color or shape and target location should be considerably easier than recognizing spatial configurations of distractors and recalling the target location from them. A priori, we would therefore expect fewer false negatives (i.e., participants who have explicit knowledge but cannot demonstrate it in the task). With our method of testing both an objective performance measure and a subjective confidence measure, and in addition testing for their interdependence (Michel,
2023), we propose that what we observed here is indeed implicit knowledge.
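The two-part logic of this awareness test can be summarized in code. The sketch below uses hypothetical arrays, assumes a chance level of 1/4 (four potential target locations), and treats the confidence scale as an assumption; one simple way to operationalize metacognitive access is to compare confidence on correct versus incorrect generation trials.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
correct = rng.random(96) < 0.25        # hypothetical: generated location correct?
confidence = rng.integers(1, 5, 96)    # hypothetical 1-4 confidence ratings

# Objective measure: is generation accuracy above the 1-in-4 chance level?
objective = stats.binomtest(int(correct.sum()), n=correct.size, p=0.25,
                            alternative="greater")

# Metacognitive measure: is confidence higher for correct than for incorrect
# generations? No difference suggests a lack of metacognitive access.
metacognitive = stats.ttest_ind(confidence[correct], confidence[~correct])

print(objective.pvalue, metacognitive.pvalue)
```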