Introduction
Visual working memory (VWM) is a temporary buffer capable of actively storing, processing and manipulating the information coming from the visual world (Logie,
1986; Luck & Vogel,
2013). However, VWM is severely restricted in terms of capacity, with estimates converging on a limit of 3–4 items stored simultaneously (e.g., Alvarez & Cavanagh,
2004; Awh et al.,
2007; Fukuda et al.,
2010; Luck & Vogel,
1997). These capacity constraints lead to two different questions: First, can these storage limits be exceeded? And second, if so, what factors may contribute to increasing the capacity of VWM?
Behavioral, neurophysiological and even daily-life evidence has shown that VWM capacity limits can be surpassed. Moreover, several cognitive (e.g., long-term memory aids, familiarity, attentional selection) and perceptual factors seem to affect VWM capacity (Brady et al.,
2016; Heuer & Schubö,
2016; Jackson & Raymond,
2008; Qian et al.,
2017). Among these perceptual factors, organizational cues and particularly the Gestalt grouping principles (e.g., similarity, proximity, closure, common fate, continuity) have been a prolific area of interest (Palmer,
2003; Pomerantz & Kubovy,
1981; Wagemans,
2016). These organizational cues are known to improve perceptual performance (see Wagemans et al.,
2012 for an extensive review; Wertheimer,
1923) by parsing the visual scene into component objects according to certain rules (Duncan,
1984; Duncan & Humphreys,
1989; Kahneman & Henik,
1977; Moore & Egeth,
1997). Particularly, perceptual grouping seems to parse the visual scene at a preattentive stage of visual processing and to organize the visual information before other cognitive processes have access to it (Lamy et al.,
2006; Mack et al.,
1997). Nonetheless, even if grouping occurs preattentively or even inattentively, attention is still needed for items to access working memory and be encoded into it (Moore & Egeth,
1997). Hence, it would not be surprising if this early organization of the visual information improved information processing (Woodman et al.,
2003) leading to an increase in VWM performance, but the critical question is whether this early organization of the visual information also unintentionally biases the allocation of attentional resources over the items to be memorized (Duncan,
1984; Qian et al.,
2020).
There is consistent evidence of improved VWM performance when the items to be remembered can be linked by the presence of different grouping cues (e.g., Allon et al.,
2019; Gao et al.,
2016; Peterson et al.,
2015; Peterson & Berryhill,
2013; Woodman et al.,
2003; Xu & Chun,
2007; Zhang et al.,
2016; but see Li et al.,
2018 for a review of grouping effects on VWM). For example, Woodman et al. (
2003), using a pre-cued change detection task (CDT), found that grouping the items through spatial proximity improved VWM performance. Their results showed that grouped items tend to be stored together even if they are not directly cued. The grouping principle of similarity has also been widely studied in the context of VWM benefits. In fact, two different studies, Peterson and Berryhill (
2013) and Peterson et al. (
2015) found that color similarity increased VWM performance in a classical CDT. Other grouping principles, such as closure, connectedness, and collinearity, have also proven effective in improving VWM performance, but the effect sizes obtained are significantly smaller than those from studies employing the grouping principles of proximity and similarity (Gao et al.,
2016).
Although there is compelling evidence on the effects of perceptual grouping on VWM performance, the mechanisms behind this effect are less clear. One possibility is that perceptual grouping only serves as a means to organize visual information, allowing multiple individual items that share common features (like some grouping principles) to be treated as a single unit in memory, or “chunk”, a form of lossless compression that leads to more efficient processing of the information (Brady et al.,
2009; Corbett,
2017; Nassar et al.,
2018; Zhang & Luck,
2011). This would allow more items to be stored by freeing up resources that can be reallocated to other items (e.g., Bays & Husain,
2008), or by using fewer slots for the same amount of information. In agreement with this account of grouping effects, Kałamała et al. (
2017) found that the total number of objects maintained in VWM (
k value) was greater when some of the items to be remembered were grouped by whole-part similarity. Moreover, Peterson et al. (
2015) found that benefits to VWM derived from different grouping principles (similarity, proximity and connectedness) were echoed by a reduction of contralateral delay activity (CDA) amplitude, a result that can be interpreted in terms of a reduction of the resources devoted to VWM. On the other hand, the improvement in VWM performance could be derived from an encoding bias towards the grouped items (Li et al.,
2018; Peterson & Berryhill,
2013). According to this account, the VWM improvement would result from the preattentional processing of grouping cues and the object-based (or feature-based) attentional capture provoked by the grouped elements (Treisman,
1982; Vecera,
1994; Vecera & Farah,
1994). This would result in better encoding of the grouped items into VWM at the expense of poorer encoding of the non-grouped items, especially when the number of items exceeds VWM capacity (Awh et al.,
2006). In support of this explanation, Peterson and Berryhill (
2013) found that the VWM was restricted to probed items that were grouped during stimulus presentation, suggesting a bias toward encoding the grouped items. Moreover, Qian et al. (
2020) using a pre-cued CDT paradigm, showed that certain features (such as color) guide the attention in a mandatory and automatic manner, leading to a memory benefit for the items that shared this feature. Finally, in a recent meta-analytic study, Li et al. (
2018) provided a tentative explanation for the mechanisms behind the effect of grouping on VWM. These authors argued that the grouping effect seems to depend on the grouping relevancy of the tested feature. Specifically, they suggest that when the tested feature is grouping relevant (as in our study), the feature is processed first to form the perceptual group, and its storage will not involve competing for attention or memory resources, as grouping seems to occur at a preattentive stage (Duncan & Humphreys,
1989; Mack et al.,
1992). However, if an irrelevant feature is tested, storage of that feature should occur after the perceptual group is perceived, and the items will compete with one another for better storage. In this case, irrelevant features will obtain attentional priority at the encoding stage because they belong to the grouped items (Fine & Minnery,
2009; Melcher & Piazza,
2011).
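As an aside on how such capacity estimates are obtained: the k value reported by studies like Kałamała et al. (2017) is conventionally derived from change detection accuracy with Cowan's formula, k = N × (hit rate − false-alarm rate), where N is the set size of the memory array. The snippet below is a minimal illustration of that standard formula; the example rates are made up for illustration, not data from any of the studies cited here.

```python
def cowans_k(set_size: int, hit_rate: float, fa_rate: float) -> float:
    """Cowan's k: estimated number of items held in VWM, computed from
    single-probe change detection accuracy (set size, hit rate, and
    false-alarm rate)."""
    return set_size * (hit_rate - fa_rate)

# Hypothetical example rates (not from this study): with 6 items,
# 80% hits and 20% false alarms yield an estimated capacity of ~3.6.
print(round(cowans_k(6, 0.80, 0.20), 2))
```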
Given the mixed evidence found in the previous literature regarding the nature of the VWM improvements, the present study aims to explore the mechanisms that underlie the VWM improvement associated with perceptual grouping and, especially, the role that attentional processes play in it when grouping-relevant features are tested. We focused on two possible accounts for this beneficial effect. The first is a “chunking” process that compresses grouped items into a single “chunk” of information. This hypothesis states that grouping allows more efficient processing of the items to be remembered without compromising attentional resources, so more resources would be available to store the relevant information (Corbett,
2017). Conversely, the “encoding bias” hypothesis states that the presence of grouping cues is processed pre-attentively at the early stages of the perceptual stream, and automatically directs feature-based attention to the grouped elements (Duncan,
1984; Qian et al.,
2020; Vecera,
1994) during the memory task. This latter account predicts an attentional encoding bias that would impair the encoding of non-grouped items into VWM. To test these accounts, we combined a classical CDT with a variable-delay retrocue paradigm that directs attention to specific item locations within the internal representations of the items stored in VWM.
Accordingly, we conducted two experiments in which an array of six colored items was briefly presented at six different locations. In Experiment 1, participants were instructed to memorize all the items until the appearance of a probe, which could either have the same color as the memorized item or a different one (see Rensink,
2002 for a thorough review on change detection theoretical basis and methodologies). Two critical manipulations were added to the main task: (1) in some trials, two of the items of the memory array shared the same color (grouped by color similarity), and (2) three different endogenous retrocue conditions with variable delays were included. As previous literature indicates that the benefit of perceptual grouping occurs at the encoding stage of the VWM (Li et al.,
2018), we expect different effects depending on the delay of the retrocue. In the short-retrocue condition, the retrocue appears while the iconic-memory trace of the array persists (Becker et al.,
2000; Gegenfurtner & Sperling,
1993; Sperling,
1960). Thus, we expect similar facilitation in the change detection task for any cued item, regardless of whether or not it was previously grouped, as all the information in the memory array will still be available independently of the attentional resources allocated to each item. In the long-retrocue condition, the retrocue appears after the iconic memory trace has vanished and the array to be remembered is already encoded in VWM. If grouping not only organizes the visual scene but also induces an attentional encoding bias towards grouped items, then we expect the retrocue to have different effects depending on the items cued. In this case, grouped items will be better encoded and less prone to decay over time, as more attentional resources are allocated to them. This would make the cue more effective for the grouped items.
In Experiment 2, participants were instructed to ignore the items that shared the same color (grouped items), as they would not be probed in the CDT. This allowed us to test whether an encoding bias towards grouped items can be counteracted by top-down voluntary processes that filter out irrelevant information according to the goals of the task or whether, on the contrary, grouped items capture attention irrespective of the task demands. The rationale for this experiment was twofold. First, according to our encoding bias hypothesis, it would be interesting to address how the task demands could affect the capacity of the grouped items to bias the encoding of non-grouped items and, therefore, memory performance. Second, there is conflicting evidence in the previous literature regarding whether perceptual grouping enhances or hinders the inhibition of irrelevant information. On the one hand, Kimchi et al. (
2007,
2016) found that perceptual grouping can capture attention irrespective of task demands, leading to full processing of irrelevant items and hindering the inhibition of non-relevant information. A similar result was obtained by Zupan and Watson (
2020), who found that perceptual grouping reduced the number of distractors that could be inhibited. On the other hand, Allon et al. (
2019) showed that grouping by spatial proximity and the presence of illusory objects (another form of perceptual organization) improved filtering performance when the grouped items acted as distractors. However, given that different grouping principles are thought to have different processing demands (Driver et al.,
2001; Kimchi & Razpurker-Apfeld,
2004) and participants are forced to attend and process the grouped stimuli to filter them out (as grouping is the key feature that distinguishes relevant from irrelevant items), it is not clear whether the similarity cues employed in the present study will lead to a better or worse filtering performance.
To sum up, in Experiment 1, we expect to find: (1) a general improvement in change detection accuracy in trials in which some of the items of the memory array are grouped through color similarity. (2) According to the encoding bias hypothesis, we expect that this improvement will only occur when the items probed are previously grouped in the memory array, and no benefit or even a worsening effect will appear when the probes are non-grouped items of a grouped memory array. (3) We also hypothesize that the non-grouped items will be more affected by the delay of the retrocue, due to their worse encoding and increased susceptibility to decay over time. In Experiment 2, we hypothesize that, according to previous results in space-based presentations (Allon et al.,
2019), participants will be able to counteract the attentional capture generated by grouped items and filter them out as irrelevant to the task. This would lead to the opposite pattern of results to Experiment 1, with better change detection performance for arrays that contain grouped elements (four relevant elements) than for arrays that do not (six relevant elements).
General discussion
The purpose of the current study was to examine the effects of perceptual grouping (color similarity) over the VWM performance, and the mechanisms behind this performance improvement (e.g., Gao et al.,
2016; Li et al.,
2018; Peterson & Berryhill,
2013; Peterson et al.,
2015; Woodman et al.,
2003; Zhang et al.,
2016).
The results from Experiment 1 support the encoding bias hypothesis, based on three convergent outcomes. First, although there is a general benefit in change detection when some items in the memory array can be grouped, a closer look at this effect shows that the benefit only appeared for grouped probes (GP) and turned into a detrimental effect when non-grouped probes (NGP) were tested (see Fig.
2 left). If the presence of grouped items only affected the memory task by allowing more efficient processing of the information to be stored, without any effect on the attentional resources allocated to each item, then we should expect a similar benefit (and similar performance) for both grouped and non-grouped probes (Corbett,
2017; Nassar et al.,
2018). Second, the effect of the retrocue conditions differed between GP and NGP. Particularly, these differences progressively increased as the cue presentation interval became longer (see Fig.
2 right). This effect could be explained by the weaker encoding of non-grouped items and the increased decay rate of the memory trace over time (Barrouillet & Camos,
2012) or, alternatively, by a lower probability that non-grouped items are encoded at all, owing to the fewer attentional resources allocated to them (Anderson et al.,
2013; Zhang & Luck,
2008). If grouping only affected the organization of the information, without biasing attention, we should expect a similar effect of the different retrocue latencies for all the items in the memory array. Third, the presence of grouped items in the memory array (grouped arrays, GA trials) did not improve change detection performance for NGP compared to trials without grouped items (non-grouped arrays, NGA trials; see Fig.
2 left) as would be expected if perceptual grouping only organized the visual information without biasing the encoding of the items to be stored. Particularly, based on a pure “chunking” hypothesis, grouped items should be treated as a single object (reducing the number of effective items to be memorized to five) and, therefore, we would expect a better overall performance in trials containing grouped items (GA) compared to trials without grouped items (NGA). Instead, we found a mixed pattern of results depending on the retrocue condition. In long-retrocue trials, the presence of grouped elements in the memory array worsened change detection performance for NGP compared to arrays without grouped elements, a decrease in performance that did not appear in short-retrocue trials (see Fig.
2 left and Fig.
3). This pattern of results could be explained by a worse encoding of non-grouped items when other grouped items are present. In short-retrocue trials, even if the non-grouped items are poorly encoded due to the presence of grouped items, the aid of the retrocue, presented while the iconic memory trace is still available, helps to maintain the performance level. However, in long-retrocue trials, the poorer encoding and higher sensitivity to decay over time of the non-grouped elements, caused by the presence of grouped items, lead to a performance drop compared to trials in which no grouped elements appear. Surprisingly, this drop in performance did not appear in the no-retrocue condition, where, according to the encoding bias hypothesis, we expected to find it (even to a greater extent than in long-retrocue trials). One possible account for the absence of a significant effect in the no-retrocue condition is a floor effect in change detection performance. This explanation is supported by the fact that a similar minimum performance was found in all the analyses conducted in both experiments.
Taken together, the benefit in VWM performance seen in Experiment 1 seems to derive from a two-step process in which grouping cues are processed preattentively in the early stages of the visual stream and impose a particular organization on the scene. This organization automatically directs attention (and attentional resources) to the grouped items through automatic feature-based attentional capture, leaving fewer resources for processing the rest of the information (Awh & Jonides,
2001; Heuer & Schubö,
2016; Oberauer,
2002; Treisman,
1982; Vecera,
1994). This leads to an improvement for the grouped stimuli, with no performance boost or even a detrimental effect for non-grouped items.
In Experiment 2, the main objective was to explore whether the encoding bias found in Experiment 1 could be voluntarily counteracted by the current task goals. To this end, we ran the same CDT, but this time participants were instructed to ignore the grouped items, as they were irrelevant to the task. Accordingly, if participants were able to filter out the grouped items, we should find the opposite pattern of results to Experiment 1, with better performance in those trials in which two grouped items were part of the memory array (Allon et al.,
2019; Sawaki & Luck,
2010,
2011). This opposite pattern is indeed what we found in Experiment 2: change detection performance increased in GA trials. However, a closer look at the results revealed that this benefit was only significant when no retrocue was presented. A likely explanation is that the presence of a cue in both the short- and long-retrocue conditions helped performance in NGA trials stay at a similar level by signaling the target before it vanished from VWM (Heuer & Schubö,
2016). On the other hand, when no retrocue was available, GA trials outperformed NGA trials, as participants were able to filter out the grouped items as irrelevant (see Fig.
3). Interestingly, when grouped items became irrelevant due to the task goals, the different retrocue conditions (short and long) were equally effective regardless of the presence of grouped elements in the array. This contrasts with Experiment 1, where long retrocues were less effective for NGP in GA trials (see Fig.
2 right). Taken together, the evidence from Experiment 2 supports the ability of the participants to voluntarily override the attentional capture caused by the presence of grouped items according to the task goals. This result is congruent with those found with other grouping principles in similar tasks (Allon et al.,
2019), and contrasts with the effects found in time-based presentations (Zupan & Watson,
2020).
Sawaki and Luck (
2010) proposed a model of how salient singletons (i.e., stimuli that contain unique feature values) capture attention in a stimulus-driven manner, and of the degree to which top-down mechanisms can attenuate or modulate this capture; this model is fully congruent with the results of the present study. In their model, the authors posit that salient items generate an attentional capture signal that, in the absence of top-down control, automatically attracts attention (the bottom-up saliency hypothesis). This automatic deployment can be avoided by a top-down active suppression process that is contingent on the imposed task goals (the contingent involuntary orienting hypothesis). This account can explain the results found in Experiment 1 (through an automatic attentional capture provoked by the salience of previously grouped elements) and the opposite pattern found in Experiment 2 (due to the active suppression imposed by the task instructions).
However, even though our results support the encoding bias hypothesis derived from the attentional capture generated by grouped items, we cannot rule out that the two mechanisms work together at different stages of processing. It is feasible to view the early organization of the visual scene as a “chunking” process in which items that share a common feature (1) are treated as a single object, but (2) also produce a feature-based attentional capture that biases the encoding of information into VWM.
Finally, although our study was not explicitly designed to investigate the limits of information storage and maintenance in VWM, the results of Experiment 1 are consistent with a flexible resources account of working memory capacity and of the role of attention in VWM performance. In a recent study, Emrich et al. (
2017) proposed a model of VWM limitations in which performance is determined by the proportion of attentional resources allocated to the items during encoding. This account offers a good explanation of both the greater performance for GP and the differences in retrocue effectiveness between GP and NGP as retrocue latencies become longer. Specifically, the differences in change detection performance between grouped and non-grouped items increased with retrocue latency, indicating that the two kinds of memory representations behave differently over time, a result that can be accounted for from a flexible resources perspective, in which the quality and resolution of memory representations depend on the degree of attentional resource allocation (see also Bays & Husain,
2008; Brady et al.,
2011,
2016). The results of Experiment 2 were also compatible with the attentional resources account but, given that grouped items were never probed, we cannot discern whether these items received a lower amount of attentional resources or whether they were simply ignored and not encoded at all (which would lead to chance-level performance). Conversely, from a discrete resources point of view, the storage of information is an all-or-none process that either creates a representation at a fixed level of detail or no representation at all (Zhang & Luck,
2008). According to this model, once an item enters VWM, the level of precision and the strength of the memory trace should be the same for all the information stored, so we would expect to find similar differences between grouped and non-grouped probes in long- and no-retrocue trials.