Visual working memory (VWM) allows us to temporarily store and process relevant information from the visual world across temporary interruptions such as saccades. As such, it supports most cognitive tasks, but it is limited in capacity. Behavioral estimates of VWM capacity, defined here as the number of item representations stored simultaneously, converge on a limit of ~4 items (Alvarez & Cavanagh, 2004; Awh, Barton, & Vogel, 2007; Cowan, 2001; Luck & Vogel, 1997). Behavioral estimates of this limit have converged with neural estimates, as measured by an event-related potential component termed the contralateral delay activity (CDA; Vogel & Machizawa, 2004) and by functional magnetic resonance imaging (fMRI) data (Todd & Marois, 2004; Xu & Chun, 2006). In both cases, the neural signature amplitude increases with set size and asymptotes at an individual’s VWM capacity limit (for fMRI, see Todd & Marois, 2005; for the CDA, see Vogel & Machizawa, 2004).

These apparently biological constraints on VWM capacity prompt the following question: Can the storage of visual information within VWM be optimized by grouping cues that enhance perception? One relevant observation is that Gestalt principles of grouping facilitate visual perception (Wertheimer, 1924/1950), and some evidence has shown that they may also benefit VWM. Gestalt principles make grouped objects appear to “belong together” (Rock, 1986). Among the various types of Gestalt groupings, three are particularly relevant here: proximity, uniform connectedness, and similarity. Proximity refers to grouping of objects in physical space (Wertheimer, 1924/1950), uniform connectedness groups physically linked features into a single object (Palmer & Rock, 1994), and similarity refers to grouping based on repetition of features such as color (Wertheimer, 1924/1950). A large literature has documented the effects of Gestalt grouping on visual perception. Several key findings are worth reviewing before returning to VWM. First, the processing of Gestalt grouping cues is thought to occur preattentively (Duncan, 1984; Duncan & Humphreys, 1989; Kahneman & Treisman, 1984; Moore & Egeth, 1997; Neisser, 1967; but see also Ben-Av, Sagi, & Braun, 1992; Mack & Rock, 1998; Mack, Tang, Tuma, Kahn, & Rock, 1992). During this preattentive stage, the visual field is divided into discrete objects on the basis of Gestalt principles (Duncan, 1984; Neisser, 1967). Support for this perspective stems from perceptual judgments of grouped elements being made in the absence of attention (Driver, Davis, Russell, Turatto, & Freeman, 2001; Lamy, Segal, & Ruderman, 2006; Moore & Egeth, 1997; Russell & Driver, 2005). For example, perceptual discriminations remain accurate when stimulus arrays can be grouped by similarity, even during conditions of inattention (Moore & Egeth, 1997). As such, grouping appears to automatically facilitate visual perception.

Secondly, although visual arrays incorporating Gestalt grouping principles facilitate perceptual task performance, the degree of improvement varies. One robust finding is that discrimination is facilitated by proximity more than by similarity (Ben-Av & Sagi, 1995; Han, Humphreys, & Chen, 1999; Quinlan & Wilton, 1998); see Table 1. In contrast, similarity benefits emerge when uniform connectedness is included (Han et al., 1999). Other findings have shown that combining similarity and proximity benefits performance additively (Kubovy & van den Berg, 2008). Thus, it is clear that individual Gestalt principles are not equivalent, and a systematic hierarchy including an evaluation of Cue × Cue interactions is lacking.

Table 1 Studies investigating the impact of several Gestalt principles of grouping on perception

Returning to VWM, it is obvious that visual perception precedes VWM. Thus, it would seem logical that VWM should benefit from Gestalt principles. Several studies have reported improved VWM performance in grouped versus ungrouped conditions (e.g., Woodman, Vecera, & Luck, 2003; Xu, 2002, 2006; Xu & Chun, 2007). Other VWM studies have revealed the importance of perceptual organization by manipulating spatial configuration (e.g., Delvenne & Bruyer, 2006; Gmeindl, Nelson, Wiggin, & Reuter-Lorenz, 2011; Hollingworth, 2007; Jiang, Chun, & Olson, 2004; Jiang, Olson, & Chun, 2000; Rossi-Arnaud, Pieroni, & Baddeley, 2006; Treisman & Zhang, 2006), without explicitly investigating the importance of Gestalt principles.

Among the VWM studies testing Gestalt principles, a change detection VWM task showed that connecting two stimuli (set size = 6) improved accuracy by 6 %, and grouping by proximity improved performance by 12 % (Woodman et al., 2003). Similarly, VWM performance (set size = 3) was higher for stimuli grouped by common regions than for ungrouped stimuli (Xu & Chun, 2007). Furthermore, parametrically varying both connectedness and proximity between two features (e.g., color and orientation) modulates the VWM grouping benefit, because monitoring two features rather than one impairs accuracy increasingly with greater distance between the two features (Xu, 2002, 2006). In short, several Gestalt principles facilitate VWM. These findings suggest that other Gestalt principles may also benefit VWM to varying degrees.

Here, we tested whether the Gestalt grouping principle of similarity facilitates VWM. We selected similarity because it can be an incidental component of the visual arrays used in VWM experiments, which are often composed of repeated stimuli. As such, discovering whether the presence of similarity within arrays serves to optimize processing within VWM is important and relevant to current theoretical debates regarding the structure of VWM (see Alvarez & Cavanagh, 2004; Awh et al., 2007; Bays, Catalao, & Husain, 2009; Bays & Husain, 2008; Brady, Konkle, & Alvarez, 2011; Zhang & Luck, 2008).

Experiment 1

Here, we manipulated grouping and set size to determine whether similarity benefited VWM performance. The stimulus arrays (three, four, or six items) were grouped by similarity of color or were ungrouped; see Fig. 1a. We predicted that if similarity follows precedent, it would facilitate performance in a VWM change detection task. The parametric manipulation of set size allowed us to investigate whether any grouping benefit remained constant or interacted with load.

Fig. 1

(a) Experiment 1 stimuli and task sequences. (b) Experiment 2 stimuli and sequences. In both experiments, participants viewed a fixation cross, followed by a memory array including one of the experimental conditions. Following the stimulus presentation, a maintenance period occurred. Finally, a single probe stimulus appeared in one of the previously presented locations and remained until participants decided whether it matched the stimulus presented at that location in the memory array. In Experiment 1, participants were also prompted to make a confidence judgment on a Likert-type scale (1–6) regarding their decision. The spatial configurations of the stimuli are for illustration purposes only, and do not represent the actual spatial distances between items in the array.

Method

Participants

Ten undergraduate students participated (eight female, two male; mean age = 24.8 years). The University of Nevada Institutional Review Board approved all experimental protocols. Participants gave informed consent prior to the experiment.

Materials and stimuli

The stimuli were colored circles subtending 1.7° created in Adobe Photoshop CS5. Eight color categories were used: yellow, red, blue, green, purple, magenta, orange, and cyan. The fixation cross subtended 0.9°. The stimuli at each set size were arranged in a circular configuration with each item presented at a distance of 6° from fixation. The stimulus locations in the set size 3 (SS3) and set size 4 (SS4) conditions were counterbalanced between the six possible locations in the circular configuration, with the requirement that the stimuli must be adjacent. The locations of the grouped stimulus pairs were counterbalanced. The experiment was programmed using E-Prime (Psychology Software Tools, Pittsburgh, PA) and displayed on a 24-in. widescreen monitor with a refresh rate of 60 Hz running on a Dell Inspiron PC. The viewing distance was 57 cm from the monitor.

Procedure

A within-subjects 2 × 3 factorial design included the factors Grouping (grouped, ungrouped) and Set Size (SS3, SS4, SS6). A change detection VWM paradigm was used. Trials began with the presentation of a fixation cross (300 ms). A stimulus array (200 ms) including one of the six configurations (SS3 grouped, SS3 ungrouped, SS4 grouped, SS4 ungrouped, SS6 grouped, SS6 ungrouped) was followed by the maintenance period (1,000 ms). Next, a single probe stimulus appeared in one of the previously shown locations until a buttonpress response was made. The unprobed locations were indicated by unfilled white annuli (1.7°). Participants decided whether the probe matched the stimulus array (50 % chance). Additionally, participants made a confidence judgment using a Likert-type scale ranging from 1 (low) to 6 (high). Forty trials were presented per condition (240 total). Participants performed an articulatory suppression task throughout the experiment. Statistical analyses were conducted in SPSS, and all pairwise comparisons were Bonferroni corrected.
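The design described above can be sketched as a trial list. This is a minimal illustration in Python rather than the E-Prime script actually used: the function name `build_trials`, the exact 20/20 split of match and nonmatch probes within each condition, and the fully randomized trial order are assumptions made for the example. Only the factor levels, durations, and trial counts come from the Procedure.

```python
import itertools
import random
from collections import Counter

# Factor levels and counts as reported in the Procedure.
GROUPING = ("grouped", "ungrouped")
SET_SIZES = (3, 4, 6)
TRIALS_PER_CONDITION = 40

# Event durations in milliseconds, as reported in the Procedure.
FIXATION_MS = 300
ARRAY_MS = 200
MAINTENANCE_MS = 1000


def build_trials(seed=0):
    """Return a shuffled list of trial descriptions for the 2 x 3 design.

    Assumptions (not stated in the paper): probes match on exactly half
    of the trials in each condition, and trial order is fully randomized.
    """
    rng = random.Random(seed)
    trials = []
    for grouping, set_size in itertools.product(GROUPING, SET_SIZES):
        probe_matches = [True, False] * (TRIALS_PER_CONDITION // 2)
        for match in probe_matches:
            trials.append(
                {
                    "grouping": grouping,
                    "set_size": set_size,
                    "probe_matches": match,
                    "fixation_ms": FIXATION_MS,
                    "array_ms": ARRAY_MS,
                    "maintenance_ms": MAINTENANCE_MS,
                }
            )
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    trials = build_trials()
    counts = Counter((t["grouping"], t["set_size"]) for t in trials)
    print(len(trials))   # 240 trials in total
    print(len(counts))   # 6 conditions
```

Each of the six Grouping × Set Size cells contributes 40 trials, which gives the 240 trials reported per participant.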

Results and discussion

In the following analyses we applied a 2 × 3 repeated measures analysis of variance (ANOVA) including the factors Grouping (grouped, ungrouped) and Set Size (SS3, SS4, SS6) for measures of accuracy, reaction time, confidence, and capacity [K = set size × (hit rate – false alarm rate); Cowan, 2001, adapted from Pashler, 1988]. Experiment 1 revealed a significant grouping benefit across measures of accuracy [F(1, 9) = 50.08, MSE = 0.005, p < .001, ηp² = .85, β = .99], reaction time [F(1, 9) = 5.90, MSE = 217,774.90, p = .04, ηp² = .40, β = .58], confidence [F(1, 9) = 53.55, MSE = 0.09, p < .001, ηp² = .86, β = .99], and capacity (K) [F(1, 9) = 38.52, MSE = 0.45, p < .001, ηp² = .81, β = .99]; see Fig. 2 and Table 2. Not surprisingly, a main effect of set size emerged, such that increased load hurt performance in terms of accuracy [F(2, 18) = 31.65, MSE = 0.007, p < .001, ηp² = .78, β = .99], reaction time [F(2, 18) = 15.26, MSE = 62,181.68, p < .001, ηp² = .63, β = .99], and confidence [F(2, 18) = 30.74, MSE = 0.24, p < .001, ηp² = .77, β = .99]. Pairwise comparisons revealed significant decreases in performance between SS3 and SS4 (accuracy, p = .01; reaction time, p = .016; confidence, p = .006), SS4 and SS6 (accuracy, p = .02; confidence, p = .004), and SS3 and SS6 (accuracy, p = .001; reaction time, p = .003; confidence, p = .001). We found no effect of set size on capacity (p = .86). Finally, no measure revealed significant interactions between grouping and set size (accuracy, p = .53; reaction time, p = .77; capacity, p = .13; confidence, p = .32).
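The capacity formula quoted above, K = set size × (hit rate – false alarm rate), can be written out as a short Python sketch. The function names and the example response counts are hypothetical and are not taken from the data; the sketch also assumes the usual convention that a hit is a correct "changed" response on a change trial and a false alarm is a "changed" response on a no-change trial.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Capacity estimate K = N * (H - FA) for single-probe change detection."""
    return set_size * (hit_rate - false_alarm_rate)


def rates_from_trials(trials):
    """Compute (hit rate, false alarm rate) from (changed, said_changed) pairs.

    `changed` is True when the probe differed from the memory item;
    `said_changed` is True when the participant reported a change.
    """
    change_responses = [said for changed, said in trials if changed]
    same_responses = [said for changed, said in trials if not changed]
    if not change_responses or not same_responses:
        raise ValueError("need at least one change and one no-change trial")
    hit_rate = sum(change_responses) / len(change_responses)
    false_alarm_rate = sum(same_responses) / len(same_responses)
    return hit_rate, false_alarm_rate


if __name__ == "__main__":
    # Hypothetical responses for one participant in one SS6 condition:
    # 17 hits and 3 misses on change trials, 4 false alarms and
    # 16 correct rejections on no-change trials.
    example = (
        [(True, True)] * 17
        + [(True, False)] * 3
        + [(False, True)] * 4
        + [(False, False)] * 16
    )
    h, fa = rates_from_trials(example)
    print(f"{cowan_k(6, h, fa):.2f}")  # prints 3.90
```

Because K scales the corrected hit rate by set size, the same accuracy yields a larger K at SS6 than at SS3, which is why capacity and accuracy can pattern differently across set sizes.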

Fig. 2

Experiment 1 visual working memory change detection accuracy. The x-axis shows levels of the experimental factors Set Size and Grouping. The y-axis indicates accuracy in terms of proportions of correct trials. Error bars represent the standard errors of the means for each condition.

Table 2 Experiment 1 mean (and standard deviation) values by condition

The nature of the grouping benefit was that it emerged when the probed item was one of the grouped items rather than when it was one of the ungrouped items. This was confirmed by a 2 × 3 repeated measures ANOVA evaluating probe type (previously grouped, previously ungrouped) and set size (SS3, SS4, SS6). Of primary interest here was the significant main effect of probe [accuracy: grouped = .94, ungrouped = .80, F(1, 9) = 34.87, MSE = 0.009, p < .001, ηp² = .80, β = .99; reaction time: grouped = 1,910.39 ms, ungrouped = 2,214.60 ms, F(1, 9) = 11.21, MSE = 123,833.57, p = .009, ηp² = .56, β = .85]. Not surprisingly, the main effect of set size also reached significance, showing decreased accuracy and increased reaction times as in the first analysis [accuracy, F(2, 18) = 21.73, MSE = 0.009, p < .001, ηp² = .71, β = .99; reaction time, F(2, 18) = 7.58, MSE = 147,172.59, p = .004, ηp² = .46, β = .90]. Importantly, for accuracy, we observed a significant interaction between probe type and set size [F(2, 18) = 11.99, MSE = 0.007, p < .001, ηp² = .57, β = .98]. This interaction was driven by a greater benefit as load increased: SS3 (grouped = .97, ungrouped = .95, p = .56), SS4 (grouped = .96, ungrouped = .82, p = .009), SS6 (grouped = .90, ungrouped = .63, p = .001). In concordance with these findings, participants reported significantly higher confidence when the probed item was previously grouped (M = 5.47) than when it was previously ungrouped (M = 4.59) [t(9) = 5.82, p < .001].
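The growth of the probe-type benefit with load can be read directly off the accuracy means reported above. The following Python sketch simply subtracts the ungrouped-probe mean from the grouped-probe mean at each set size; the dictionary and function names are illustrative, and the calculation is descriptive arithmetic on the reported means, not a reanalysis of the data.

```python
# Mean accuracy by set size, as reported above:
# (probed item previously grouped, probed item previously ungrouped).
ACCURACY_BY_SET_SIZE = {
    3: (0.97, 0.95),
    4: (0.96, 0.82),
    6: (0.90, 0.63),
}


def probe_benefit(means):
    """Return grouped-minus-ungrouped accuracy for each set size."""
    return {
        set_size: round(grouped - ungrouped, 2)
        for set_size, (grouped, ungrouped) in means.items()
    }


if __name__ == "__main__":
    print(probe_benefit(ACCURACY_BY_SET_SIZE))
    # {3: 0.02, 4: 0.14, 6: 0.27}
```

The difference rises from .02 at SS3 to .27 at SS6, which is the pattern underlying the reported Probe Type × Set Size interaction.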

Finally, we tested whether the VWM probe in grouped conditions reflected different estimates of capacity. This analysis revealed a significant main effect of probe type [F(1, 9) = 55.96, MSE = 0.47, p < .001, ηp² = .86, β = .99], indicating that capacity estimates were significantly higher for trials in which the probed item had been grouped. No main effect of set size was apparent (p = .18). However, a significant interaction did emerge between probe and set size [F(2, 18) = 24.94, MSE = 0.442, p < .001, ηp² = .74, β = .99]. Pairwise comparisons revealed that the interaction was driven by higher capacity estimates as load increased when the probed item was previously grouped rather than ungrouped: SS3 (grouped = 2.76, ungrouped = 2.74; p = .92), SS4 (grouped = 3.64, ungrouped = 2.62; p = .01), and SS6 arrays (grouped = 4.70, ungrouped = 1.76; p = .001).

In addition to capacity estimates based on the number of objects per condition, we examined capacity estimates based on the number of groups per condition (e.g., SS3 grouped = two groups; SS3 ungrouped = three groups); see Table 2. These analyses revealed no significant main effects of grouping (p = .09) or set size (p = .76). However, we did find a significant interaction between set size and grouping [F(2, 18) = 5.93, MSE = 0.297, p = .01, ηp² = .40, β = .82]. The interaction was driven by higher group capacity for grouped SS6 than for ungrouped SS6 (p = .03). Pairwise comparisons also showed that group capacity was greater for grouped SS4 than for grouped SS3 arrays (p = .04). Group capacity was also higher for grouped SS6 arrays than for grouped SS3 arrays (p = .04). We observed no difference in capacity between grouped SS4 and SS6 arrays (p = 1.00). In the ungrouped conditions, no significant differences emerged between set sizes in terms of group capacity (all ps > .38).

Experiment 1 confirmed that grouping by similarity enhances VWM performance. When similarity was available, participants were more confident in their responses, suggesting that they were aware of the performance boost. The magnitude of the similarity benefit remained constant across set sizes. It is important to note that the benefit appeared to be driven by the trials in which a member of the group was probed at retrieval, implicating an encoding bias as a putative mechanism. Alternatively, because there were significant benefits with respect to capacity via grouping at larger set sizes, similarity may optimize VWM processes.

One limitation was that the spacing between grouped items remained constant, making it possible that these benefits were constrained by spatial proximity. Experiment 2 investigated the importance of spatial proximity in eliciting similarity benefits.

Experiment 2

To determine the contribution of proximity to the similarity benefit, we parametrically manipulated the proximity of grouped items. In the similarity condition, matching stimuli were next to each other. In the repetition conditions, identical stimuli were separated by one or two intervening stimuli. Additionally, we included a control condition in which no items repeated. We predicted that similarity would benefit VWM, but only when grouped items were proximal, as previous findings had indicated that proximity benefits VWM performance (e.g., Woodman et al., 2003; Xu, 2002, 2006).

Method

Participants

A group of 13 new undergraduates participated in Experiment 2 (ten female, three male; mean age = 22.5 years).

Materials and stimuli

The experimental protocols followed Experiment 1’s with stimulus modifications; see Fig. 1b. Six colored circles were presented in a circular array at a distance of 6° from fixation. In the similarity (S) condition two neighboring circles matched (6° apart). In the first repetition (R1) and second repetition (R2) conditions, one or two intervening stimuli separated the matched pair (9°, 12° apart). Finally, in the control (C) condition, no stimuli repeated. There were 48 trials per condition (192 total). Participants simultaneously performed an articulatory suppression task.

Results and discussion

All measures were subjected to a repeated measures ANOVA investigating the factor Condition (C, S, R1, R2); see Table 3. We found main effects of condition on accuracy [F(3, 36) = 4.21, MSE = 0.004, p = .01, ηp² = .26, β = .82] and capacity [F(3, 36) = 3.95, MSE = 0.58, p = .02, ηp² = .25, β = .79]; see Fig. 3 and Table 3. Pairwise comparisons revealed that accuracy and capacity in the S condition were significantly higher than in the C condition (accuracy, capacity: both ps = .01). No other pairwise comparisons reached significance (all ps > .30). We found no main effects of condition on reaction times [F(3, 36) = 0.33, p = .81] or the capacity for the number of groups [F(3, 36) = 1.38, p = .26].

Table 3 Experiment 2 mean (and standard deviation) values by condition
Fig. 3

Experiment 2 visual working memory change detection accuracy. The x-axis shows the four conditions of the experiment. The y-axis indicates the proportions of correct trials. Error bars represent the standard errors of the means.

To clarify whether similarity benefits were due to the probed item, we conducted a 2 × 3 ANOVA to compare probe type (previously grouped, previously ungrouped) and grouping condition (S, R1, R2). We observed a main effect of probe type, with better performance when a previously grouped item was probed [accuracy, F(1, 12) = 13.04, MSE = 0.013, p = .004, ηp² = .52, β = .91; capacity, F(1, 12) = 13.01, MSE = 1.93, p = .004, ηp² = .52, β = .91; reaction time, F(1, 12) = 5.25, MSE = 155,995.97, p = .04, ηp² = .30, β = .56]. However, we found no main effect of condition (accuracy, p = .27; capacity, p = .28; reaction time, p = .58), and no significant interactions (accuracy, p = .56; capacity, p = .57; reaction time, p = .06).

Experiment 2 replicated the similarity benefit observed in Experiment 1 and demonstrated the requirement for close spatial proximity. Proximal similarity benefited VWM by 9 % relative to no grouping, but intervening items eliminated the similarity benefit. This pattern extended to estimated capacity based on the number of objects; however, no benefits were observed for capacity based on the number of groups or for reaction times. As before, the similarity benefit appeared to be driven by the trials in which a previously grouped item was later probed.

General discussion

We asked whether the Gestalt principle of similarity facilitated VWM. These data established the existence of a constant performance benefit across set sizes (Exp. 1) and showed that similarity requires proximity (Exp. 2). Finally, for grouped trials, performance was higher when the probed items were members of grouped pairs during stimulus presentation, suggesting that we observed a bias toward encoding the grouped items. This observation has implications regarding VWM and the interpretation of these results.

Similarity can be included in the list of Gestalt principles that facilitate VWM—namely, proximity (Woodman et al., 2003; Xu, 2002, 2006), connectedness (Woodman et al., 2003; Xu, 2002, 2006), and common region (Xu & Chun, 2007). Similarity benefits (Exp. 1, 12 % average benefit across all set sizes; Exp. 2, 9 %) are consistent with those produced by uniform connectedness (6 %) and proximity (12 %; Woodman et al., 2003). One important distinction between the present results and those of Woodman et al. is that those previous researchers had included a cuing paradigm in their VWM change detection task. They found an encoding bias for items that could be grouped with the cued item. Even though we did not use a cue, we found that performance was superior when a previously grouped item was probed, as compared to when an ungrouped item was probed. This suggests the possibility that the similar items created an encoding bias by forming a prepackaged “chunk.” Thus, it is quite possible that the present results reflect an encoding bias for grouped items. Below we note several other interpretations.

One suggestion is that Gestalt principles of grouping reduce the neural requirements for VWM. For example, using fMRI, Xu and Chun (2007) found that grouped items were associated with lower-amplitude activations in the inferior intraparietal sulcus (IPS) during maintenance, relative to the same number of ungrouped items. Similarly, in support of a discrete-resource perspective, Anderson, Vogel, and Awh (in press) reported decreased CDA amplitudes when their stimuli formed collinear groups, as compared to random orientations. However, the overall number of grouped elements that could be stored was limited by a common discrete resource (Anderson et al., in press). Our finding that item capacity was greater in grouped conditions suggests that grouped arrays provide a storage advantage that remains subject to grouped capacity limits. The present data could therefore be interpreted through the lens of a discrete-resource perspective (e.g., Awh et al., 2007; Zhang & Luck, 2008), which would predict that items grouped via similarity require the same amount of mnemonic resources as does one item (Anderson et al., in press). In contrast, a flexible-resource perspective might interpret the present benefits as a reallocation of the available mnemonic resources to ungrouped items (Bays & Husain, 2008). Furthermore, such a reallocation of resources may be made possible via the compression of the grouped item representations in order to increase VWM efficiency (Brady, Konkle, & Alvarez, 2009). Our observation that VWM performance was superior in grouped conditions is consistent with this interpretation. However, it is a challenge to reconcile this view with the finding that performance benefits did not extend to ungrouped items in the grouped arrays.

An alternative proposal to consider is the labeled Boolean map perspective (Huang & Pashler, 2007). According to this perspective, conscious access to visual features from a given dimension occurs serially (Huang, Treisman, & Pashler, 2007). However, similarity permits items sharing a feature (e.g., two blue circles) to be mapped onto multiple locations but to be accessed simultaneously. This perspective also postulates that the structure of VWM may be composed of labeled Boolean maps. The number of Boolean maps that can be simultaneously maintained, rather than the number of discrete items, per se, might contribute to limited VWM capacity (Huang, 2010). According to this perspective, items grouped via similarity would require a single Boolean map and reduce the number of maps that were requisite to maintain the stimulus.

In Experiment 1, we found that similarity benefited capacity based on the number of groups. This is consistent with a Boolean map perspective. Specifically, we found an interaction between grouping and set size suggesting that similarity increasingly benefited performance when the number of groups exceeded the item capacity limits (i.e., in the SS6 arrays). Other findings were less consistent. In Experiment 2, the similarity benefit was contingent on proximity. In contrast, Boolean map theory would predict no difference in the similarity benefits for close and far items, because in both cases a single Boolean map should represent the items, regardless of spatial location (Huang & Pashler, 2007; Huang et al., 2007).

These findings present the potential benefits of Gestalt grouping cues for VWM. However, a “flip side” to the automaticity of grouping cues may limit their use as a strategic aid. When a VWM task varies the spatial configuration of stimuli between encoding and retrieval, performance suffers (Jiang et al., 2000). For example, in change detection tasks using complex visual arrays, the presentation of a single probe item, or of a spatially reconfigured probe array, at test leads to worse performance than does the reappearance of the original array (Jiang et al., 2000). Additionally, VWM accuracy drops even when task-irrelevant features (e.g., the orientation of elongated axes) change (Jiang et al., 2004). These data support the view that VWM uses spatial configurations as an organizing principle. The present data suggest that certain cues, namely similarity, rely on spatial proximity. Consequently, at least for similarity, the larger contextual organization may be built on a series of local contextual bindings. One way to test this would be to follow the lead of Jiang et al. (2000) by varying the relationship between the encoding and retrieval arrays, but also to include subsets of the original array that were either close or far apart.

Conclusion

At least four Gestalt principles facilitate VWM performance: Common region, uniform connectedness, proximity, and similarity all benefit VWM performance (Woodman et al., 2003; Xu, 2002, 2006; Xu & Chun, 2007). It remains unclear whether other Gestalt principles (e.g., continuity, closure, good continuation, or common fate) would benefit VWM. Future experiments examining other Gestalt principles and their interactions will elucidate the extent to which perceptual grouping can benefit VWM. Costs associated with these principles may limit their effectiveness.