As we make eye movements to explore our world, visual working memory maintains a stable representation of several objects across these saccades (Carlson-Radvansky, 1999; Carlson-Radvansky & Irwin, 1995; Irwin, 1991, 1992, 1996), allowing us to behave adaptively in our surroundings (Hollingworth & Luck, 2009). Previous research has shown that humans have the ability to temporarily store an average of three to four object representations in visual working memory across brief retention intervals (Irwin & Andrews, 1996; Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001). Nevertheless, how we maintain these representations is still unclear. Models of visual working memory have proposed that spatial mechanisms play an important role in maintaining object representations. Specifically, Logie and Baddeley (Baddeley & Logie, 1999; Logie, 1995) proposed that object and spatial working memory are distinct components that work in a cooperative manner, with the spatial component enabling rehearsal of both spatial and object representations while information is maintained in a passive visual buffer. Guided by these previous theoretical proposals, the goal of the present study was to test the hypothesis that overt and covert mechanisms of visual–spatial attention are used to help maintain object representations in visual working memory.

The basic idea that visual working memory consists of separate mechanisms that specialize in handling spatial and object information is consistent with evidence from dual-task paradigms (Baddeley & Lieberman, 1980; Baddeley & Logie, 1999; Logie & Marchetti, 1991). Moreover, a large body of neurophysiological evidence has supported this idea. Specifically, a leading hypothesis is that spatial and object working memory functions are segregated in dorsal and ventral pathways for working memory, much as occurs for perception (Goldman-Rakic, 1996; Jonides & Smith, 1997; Ungerleider, Courtney, & Haxby, 1998). This hypothesis has received support from several sources. First, dorsal- and ventral-stream sensory processing areas project differentially to separate memory-related areas of the prefrontal cortex (Ungerleider & Mishkin, 1982). Second, single-unit recording studies in monkeys have found that neurons in the principal sulcus of dorsolateral prefrontal cortex generally maintain information about spatial locations (e.g., Funahashi, Bruce, & Goldman-Rakic, 1989), whereas neurons in the inferior convexity of dorsolateral prefrontal cortex tend to maintain information about object identity (e.g., Wilson, O’Scalaidhe, & Goldman-Rakic, 1993). Third, neuroimaging studies in humans have found that object and spatial working memory tasks activate different networks of brain areas (Courtney, Ungerleider, Keil, & Haxby, 1996; Jonides et al., 1993). Thus, this body of evidence is consistent with the proposals that distinct mechanisms maintain object representations and handle spatial information in working memory. However, do the neurophysiological data rule out the idea that these mechanisms might interact to maintain object representations?

Findings from several studies appear to provide neurophysiological support for the hypothesis that spatial and object mechanisms in visual working memory interact to store and maintain object representations across time. Rainer, Asaad, and Miller (1998) found that prefrontal neurons that code for spatial location were interspersed with neurons that coded for object identity, indicating a lack of anatomical segregation of spatial and object information. In addition, they found that a majority of the neurons that they sampled in prefrontal cortex provided information about both an object’s identity and its spatial location during memory retention intervals (see also Rao, Rainer, & Miller, 1997). Similarly, some neuroimaging studies have found frontal areas that appear to be activated by both spatial and object information. For example, Postle and D’Esposito (1999) found that spatial and object versions of a visual working memory task activated the same areas in prefrontal cortex, although different posterior areas were active during the two different tasks. More recently, Vogel and colleagues (Vogel & Machizawa, 2004; Vogel, McCollough, & Machizawa, 2005) have shown that when people are remembering objects during the short retention intervals of a change-detection task, a contralateral event-related potential is observed while these objects are maintained. The lateralization of this signal indicates an inherent spatial specificity of the process of maintaining these object representations. Thus, although the behavioral and physiological data tend to support the notion of separate object and spatial working memory systems, a number of results appear to support the proposal that these distinct mechanisms interact to maintain representations of objects. The next question that we consider is whether mechanisms of visual–spatial attention are the source of these interactions.

Abundant evidence has revealed that covert and overt mechanisms of visual attention are used to help maintain representations of spatial locations. Eye and head movements are considered measures of overt attentional selection (e.g., shifting the high-resolution fovea to an important part of the visual scene), whereas covert attentional selection is the enhanced processing of certain items to the detriment of others in the absence of movements of the eyes or body (Posner, 1980). To examine the role of attention mechanisms in maintaining representation of spatial locations in visual working memory, Awh and colleagues (Awh & Jonides, 2001; Awh, Jonides, & Reuter-Lorenz, 1998) presented probe stimuli during the retention interval of a spatial working memory task. This allowed them to test the hypothesis that observers direct their spatial attention to the remembered locations to aid memory maintenance. They found that probe detection reaction times (RTs) were faster when the probes appeared at the remembered location rather than elsewhere in the display. Moreover, when the probe stimulus was presented at the original memorized location, spatial memory performance was significantly better than when the memorized and probe locations did not overlap. These experiments show that visual attention is deployed to help maintain spatial information in working memory.

Another line of work has shown that irrelevant eye movements (i.e., overt shifts of attention) interfere with the ability to remember spatial locations (Pearson & Sahraie, 2003; Lawrence, Myerson, & Abrams, 2004; Smyth, 1996). Baddeley (1986) and colleagues examined whether irrelevant eye movements impaired the maintenance of spatial information in working memory. They found that irrelevant eye movements interfered with memory for a specific spatial location relative to when observers were not induced to make irrelevant eye movements (see also Postle, Idzikowski, Della Sala, Logie, & Baddeley, 2006). One explanation for these findings is that eye movements create spatial representations that interfere with the memory representations of locations that the participants are trying to maintain. Alternatively, it is possible that moving gaze away from a remembered location prevents the maintenance of the location information because attention cannot stay locked there. Regardless of which account best explains these effects on working memory for spatial location, it is clear that overt and covert mechanisms of visual–spatial attention play an important role in the storage of spatial locations in working memory, either by aiding their maintenance or by preventing interference from new information.

In sum, much is known about the relationship between mechanisms of visual–spatial attention and the visual working memory storage of spatial locations, but the theoretical proposals that visual–spatial selection contributes to the storage of objects in visual working memory have not been as thoroughly tested. The goal of the present study was to test some specific predictions that grow out of the theoretical proposals that we have described. For example, do goal-driven eye movements actually aid in the maintenance of object representations? If this is the case, we predict that observers should spontaneously make eye movements to the locations of previously presented objects while these object representations are held in visual working memory. In addition, preventing eye movements during memory retention intervals should decrease change-detection accuracy. A similar logic has been applied to eye-movement studies of long-term memory (e.g., Spivey & Geng, 2001).

In Experiments 1 and 2, we examined overt measures of visual–spatial attention by tracking saccadic eye movements during the memory retention intervals of an object change-detection task. Experiment 3 tested the hypothesis that drawing spatial attention away from the locations of objects being maintained would disrupt object maintenance. Thus, for this study we used a multipronged approach to test the proposal that visual–spatial attention mechanisms are involved in the maintenance of object representations in visual working memory.

Experiment 1

To test the hypothesis that visual–spatial attention mechanisms participate in the maintenance of object representations in visual working memory, we had participants perform a change-detection task while we measured where their gaze was directed. As is shown in Fig. 1a, the participants in Experiment 1A were required to remember simple colored squares while we compared their change-detection accuracies across two conditions. In the fixation condition, participants were required to fixate a central point throughout each trial and to remember one, three, or six colored stimuli shown in a memory array presented for 500 ms. After a 5,000-ms retention interval a test array was presented, and participants reported with a buttonpress whether a change had occurred in one of the items. In the eye-movement condition, participants performed exactly the same task, but instead they were instructed that they did not need to fixate the cross in the middle of the screen and should move their eyes as they naturally would. Experiment 1B was identical to Experiment 1A, except that a different group of participants performed the change-detection task only under the eye-movement condition. This allowed us to sample more trials from each individual than was possible in Experiment 1A. With this larger sample, we focused on the behavior of the observers in the critical eye-movement condition and asked specific questions about the consequences of the fixations during the retention intervals on change-detection accuracy.

Fig. 1

Example stimuli and manual-response results from Experiment 1. (a) In the eye-movement condition, participants were free to move their eyes normally. In the fixation condition, they were instructed to keep their gaze on the center cross. (b) Mean accuracies (percentages correct) for Experiment 1A, as a function of condition and set size. Error bars show 95 % within-subjects confidence intervals (Cousineau, 2005) in this and subsequent figures

The design of Experiments 1A and 1B allowed us to address three questions about the relationship between the deployment of overt visual attention (i.e., patterns of fixation) and visual working memory maintenance. These questions progressed from general empirical observations of eye-movement behavior to more detailed examinations of the impact of that behavior.

First, by analyzing the pattern of fixations during the eye-movement conditions of Experiments 1A and 1B, we were able to test the hypothesis that participants overtly selected the spatial locations that were occupied by the objects in the memory array during the maintenance period (i.e., when the objects were no longer visible). If movements of the eyes are sensitive to the deployment of visual attention mechanisms to perform rehearsal of the object representations, then we should find that during the retention intervals, observers would make eye movements to fixate the locations that the memory items had previously occupied. If we were to observe such behavior, we could then ask more specific questions about its impact.

Second, we tested the hypothesis that fixating the locations of the objects during the change-detection task would improve the accuracy of the memory representations. Specifically, if overt mechanisms of visual attention participate in the active rehearsal of objects, we should observe that participants in Experiment 1A would be more accurate at detecting changes between the memory and test arrays when they were free to move their eyes (i.e., the eye-movement condition) as compared to when the task conditions prevented this (i.e., the fixation condition). Even if we found that people moved their eyes during the retention interval, there might still be no link between these measures of overt visual–spatial attention and the maintenance of object representations in visual working memory. If so, we expected that performance would not be better when observers were allowed to freely saccade during the retention interval, as compared to when fixating centrally. Indeed, a plausible competing prediction is that change-detection performance would be better in the fixation condition than when eye movements were allowed, because the visual transients caused by saccades might interfere with visual working memory maintenance. For example, movements of the eyes shift the retinotopic reference frame away from the allocentric reference frame, which might interfere with the maintenance of the object representations. Given this hypothesis, participants might spontaneously avoid making eye movements during the retention interval, but when they did make eye movements, performance would be worse than when the instructions required fixation. This hypothesis is a plausible contender because when people are remembering spatial locations, eye movements interfere with the maintenance of that visual–spatial feature (Lawrence et al., 2004; Pearson & Sahraie, 2003; Smyth, 1996).

Third, in Experiment 1B we tested the more specific prediction that if overt selection aids maintenance, we should find that change detection was superior when a given object was fixated during the retention interval. That is, would participants be more accurate at detecting a change of an object between the memory and test arrays if they had fixated its location during the blank retention interval? If deployments of overt visual attention to previously occupied object locations improved the retention of the information in visual working memory, we should observe an item-specific benefit at the individual-object level of analysis.

Method

Participants

A group of 26 volunteers (18–32 years of age) from Vanderbilt University and the surrounding community participated in both the eye-movement and fixation conditions of Experiment 1A. A different group of 16 individuals participated in Experiment 1B. All of the participants reported having normal or corrected-to-normal visual acuity and normal color vision. They provided informed consent and were compensated either monetarily or with course credit.

Stimuli

The stimuli consisted of solid-colored squares (each 1.2° × 1.2°) presented on a gray background (48.5 cd/m²) and centered approximately 7.5° from fixation (a black plus sign, 0.3° × 0.3°, < 0.01 cd/m²) with a minimum interitem spacing of 7.5°. On trials of set sizes 1 and 3, stimuli were randomly placed (sampled from a square distribution of the six possible stimulus locations with 7.5° interitem spacing) along the edge of a virtual annulus surrounding the center fixation. On trials with a set size of 6, the stimuli were distributed across the six possible locations. The color of each square was randomly selected with replacement from a set of seven colors: white (95.0 cd/m²), black (< 0.01 cd/m²), red (chromaticity coordinates of the CIE 1931 color space: x = .633, y = .334), blue (x = .144, y = .065), green (x = .278, y = .614), yellow (x = .420, y = .503), and magenta (x = .291, y = .146). The three different set sizes were randomly interleaved, and the one, three, or six squares were presented in both the memory and test arrays. The articulatory-suppression stimuli were two white numbers (95.0 cd/m², randomly selected from the digits 1 to 9 without replacement) centered 3.4° above the black fixation point (0.3° × 0.3°, < 0.1 cd/m²), with one number being centered 1.7° to the right and one the same distance to the left of the horizontal meridian.
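The stimulus geometry can be illustrated with a minimal sketch. It assumes that the six possible locations were equally spaced around a circle of 7.5° radius, in which case neighboring locations are 2 × 7.5° × sin(30°) = 7.5° apart, matching the stated interitem spacing. The starting angle of the ring, the function names, and the use of equal-probability sampling of locations are illustrative assumptions, not details reported above.

```python
import math
import random

ECCENTRICITY_DEG = 7.5   # distance of each square from fixation (from the text)
N_LOCATIONS = 6          # number of possible stimulus locations (from the text)
COLORS = ["white", "black", "red", "blue", "green", "yellow", "magenta"]


def possible_locations(ecc=ECCENTRICITY_DEG, n=N_LOCATIONS, phase_deg=0.0):
    """Return n equally spaced (x, y) positions, in degrees, on a circle.

    phase_deg is a hypothetical starting angle; the text does not report it.
    """
    step = 360.0 / n
    return [
        (ecc * math.cos(math.radians(phase_deg + k * step)),
         ecc * math.sin(math.radians(phase_deg + k * step)))
        for k in range(n)
    ]


def make_memory_array(set_size, rng=random):
    """Return a list of ((x, y), color) tuples for one memory array.

    Locations are drawn without repetition; colors are drawn with
    replacement, as described in the text.
    """
    locations = rng.sample(possible_locations(), set_size)
    colors = [rng.choice(COLORS) for _ in range(set_size)]
    return list(zip(locations, colors))
```

For example, `make_memory_array(3)` returns three squares at distinct locations, and `make_memory_array(6)` fills all six locations.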

Apparatus

Eye movements were measured using an EyeLink II infrared eyetracker (SR Research Ltd., Ontario, Canada) with eye position sampled at a rate of 250 Hz. Saccades were detected automatically using a velocity-criterion algorithm (35°/s) created by SR Research for use with the EyeLink II tracker. Participants made all responses using two buttons on a hand-held gamepad.
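The logic of a velocity criterion can be illustrated with a simplified sketch. This is not the SR Research algorithm, whose details are not described here. It assumes gaze positions already expressed in degrees of visual angle and uses simple sample-to-sample differences, with the 250-Hz sampling rate and 35°/s threshold taken from the text. The function name and return format are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 250.0       # sampling rate reported in the text
VELOCITY_THRESHOLD = 35.0    # deg/s criterion reported in the text


def detect_saccades(x_deg, y_deg, rate_hz=SAMPLE_RATE_HZ,
                    threshold=VELOCITY_THRESHOLD):
    """Return (onset_sample, landing_sample) index pairs for each saccade.

    A saccade is any unbroken run of sample-to-sample velocities above
    the threshold. This is a simplification for illustration only.
    """
    dt = 1.0 / rate_hz
    saccades = []
    start = None
    for i in range(len(x_deg) - 1):
        # Speed of the eye between sample i and sample i + 1, in deg/s.
        speed = math.hypot(x_deg[i + 1] - x_deg[i],
                           y_deg[i + 1] - y_deg[i]) / dt
        if speed > threshold:
            if start is None:
                start = i
        elif start is not None:
            saccades.append((start, i))
            start = None
    if start is not None:
        # The recording ended while the eye was still moving.
        saccades.append((start, len(x_deg) - 1))
    return saccades
```

The landing sample of each pair gives the saccadic endpoint, which can then be compared against the object locations.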

Procedure

In Experiment 1A, each participant was fitted with the head-mounted eyetracker cameras and given the instructions for the condition that each would perform in that session. During the fixation condition, we instructed participants to keep their gaze on the fixation point and to move their eyes as little as possible while performing the task. In the eye-movement condition, participants were told to move their eyes naturally while performing the task. All observers performed the fixation and eye-movement conditions during different sessions, with the order of the conditions counterbalanced across participants. Each condition consisted of 60 experimental trials and one 12-trial practice block. The researcher sat adjacent to the participant, although out of view, to ensure that participants were engaging in the articulatory-suppression task on each trial and that the eyetracker was continuously calibrated. Experiment 1B was similar to Experiment 1A, except that observers only participated in the eye-movement condition and we increased the number of trials to 120. The concurrent articulatory-suppression task was required to prevent participants from verbally recoding the object identities and storing them in verbal working memory.

Once the eyes were calibrated and drift correction was performed, each trial began with the articulatory-suppression task (repeating approximately three to four numbers per second) as soon as the numbers appeared. The digits were presented for 500 ms with a 1,500-ms stimulus onset asynchrony (SOA) between the articulatory-suppression stimuli and the memory array. The memory array was then presented for 500 ms, followed by a 5,000-ms blank retention interval, and then a 2,000-ms test array presentation. The set size of the memory array varied randomly across trials in the session. The test array remained visible for 2,000 ms or until the observer made the buttonpress response on the gamepad indicating whether the test array was the same as or different from the sample array. When the color of an item changed, it always changed to a color not present in the initial memory array (i.e., colors were sampled without replacement). The probability of a color change of one of the objects was 50 %, and participants were instructed to remember only the color of the objects because their spatial locations would never change. These instructions stressed the accuracy of the manual change-detection response, not its speed.
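The timing of a trial can be summarized by adding up the reported durations. The sketch below assumes that time zero is the onset of the articulatory-suppression digits and that the 1,500-ms SOA runs from digit onset to memory-array onset. The dictionary keys are illustrative names.

```python
DIGITS_MS = 500                 # duration of the articulatory-suppression digits
DIGITS_TO_MEMORY_SOA_MS = 1500  # digit onset to memory-array onset
MEMORY_MS = 500                 # memory-array duration
RETENTION_MS = 5000             # blank retention interval
TEST_MAX_MS = 2000              # test array, unless a response ends it sooner


def trial_timeline():
    """Return event times in ms relative to digit onset."""
    memory_on = DIGITS_TO_MEMORY_SOA_MS
    retention_on = memory_on + MEMORY_MS
    test_on = retention_on + RETENTION_MS
    return {
        "digits_on": 0,
        "digits_off": DIGITS_MS,
        "memory_on": memory_on,
        "retention_on": retention_on,
        "test_on": test_on,
        "latest_trial_end": test_on + TEST_MAX_MS,
    }
```

Under these assumptions, the retention interval spans 2,000 to 7,000 ms after digit onset, and a trial lasts at most 9,000 ms.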

Data analysis

Mean change-detection accuracy was analyzed using an analysis of variance (ANOVA) with the within-subjects factors Condition (fixation or eye movement) and Set Size (one, three, or six items). Eye-movement data were analyzed using custom MATLAB scripts. An eye movement counted as being directed to the object location if it fell within a 2.0° imaginary window centered on the location of an object in the memory-sample array. This allowed us to measure the number of saccades made to the objects during each trial and to determine which object locations were fixated. For analyses focusing on the maintenance period, we only counted saccades made during the 5,000-ms retention interval, when no physical stimuli were being presented. Data from the participants were discarded from the analysis if the number of saccades made during the fixation condition was greater than two standard deviations above the mean number of saccades made throughout the experiment. This criterion led to the replacement of one participant from Experiment 1A. For the Experiment 1B analyses, we focused primarily on the maintenance period. One observer who did not saccade during the retention interval was removed from the analysis, as well as a second observer who withdrew from the experiment before completing all of the trials because of boredom and fatigue. However, removal of these outliers was not necessary to obtain the pattern of results that we observed. When we entered all available data into the statistical analyses, the same results were obtained.
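The window-based classification of saccades can be sketched as follows. The original analyses used custom MATLAB scripts that are not reproduced here; this Python sketch is only an illustration. It assumes that the 2.0° window is a square with 2.0° sides centered on each object, and that saccade times are expressed in milliseconds from the offset of the memory array. All function names are hypothetical.

```python
WINDOW_DEG = 2.0      # window size from the text; a square window is assumed
RETENTION_MS = 5000   # retention interval from the text


def fixated_object(endpoint, object_locations, window_deg=WINDOW_DEG):
    """Return the index of the object whose window contains the endpoint.

    Returns None when the endpoint falls outside every window.
    """
    half = window_deg / 2.0
    ex, ey = endpoint
    for idx, (ox, oy) in enumerate(object_locations):
        if abs(ex - ox) <= half and abs(ey - oy) <= half:
            return idx
    return None


def retention_saccades(saccades, retention_ms=RETENTION_MS):
    """Keep only saccades, given as (t_ms, x, y), that landed during retention."""
    return [s for s in saccades if 0 <= s[0] < retention_ms]


def proportion_to_objects(saccades, object_locations):
    """Proportion of retention-interval saccades that landed on an object window."""
    kept = retention_saccades(saccades)
    if not kept:
        return 0.0
    hits = sum(
        fixated_object((x, y), object_locations) is not None
        for _, x, y in kept
    )
    return hits / len(kept)
```

Because the objects were 7.5° apart, windows of this size cannot overlap, so each endpoint is assigned to at most one object.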

Results

The memory accuracies from the fixation and eye-movement conditions of Experiment 1A are summarized in Fig. 1b. As expected, change-detection accuracy decreased as the memory set size increased. Of primary importance, accuracy was consistently higher in the eye-movement condition (94 % correct, collapsed across set sizes) than in the fixation condition (92.3 % correct). These findings resulted in significant main effects of condition, F(1, 25) = 9.85, p < .05, and set size, F(2, 50) = 42.95, p < .01. However, the interaction of these factors was not significant, F < 1.0, p = .46. These results were as would be predicted if being free to devote the spatial selection mechanism of the fovea to object locations enhances the accurate maintenance of the objects (see Footnote 1).

Our first observation while examining the eye-movement data was that when participants were free to make eye movements during the retention interval of the change-detection task in Experiments 1A and 1B, they spontaneously fixated the spatial locations that had been occupied by the objects in the memory array. This is illustrated with an example trial in Fig. 2a. To quantify this behavior, we measured the numbers of saccades made during the retention interval to object locations and to other locations on the screen. We found that 58.8 % of the saccades were made to the object locations during the memory retention intervals using our conservative, 2° measurement window centered on the objects (which spanned 1.2° × 1.2°). As is shown in Table 1, approximately 1–2 objects were fixated during the 5,000-ms retention interval of each trial. This eye-movement behavior was characterized by saccades to object locations interspersed with saccades back to the fixation point and to locations in the direction of the object locations, but outside our measurement windows. Note that after the saccades to other onscreen locations, the eyes returned to the same couple of object locations during the retention interval. This observation demonstrates that overt eye movements do visit the previous locations of objects during a working memory task, similar to the natural eye-movement behavior reported by Spivey and Geng (2001) in a long-term memory task.

Fig. 2

Eye movements from an example trial and the frequency histograms of the saccades made during the trials. (a) An example of eye-movement traces during the retention interval from a single participant. In this example trial, the participant did not fixate the item location that changed (the magenta to white square). (b) Latency histograms indicating the total numbers of saccades made to object as compared to nonobject locations, averaged across participants, during the retention interval for each set size in Experiment 1B

Table 1 Eyetracking metrics measured during the 5,000-ms retention interval of Experiments 1 and 2

Figure 2b shows the latency histogram of saccades to the object locations and to nonobject locations during the memory retention intervals in Experiment 1B. We wanted to be sure that the saccades that we interpreted as being due to maintenance were not simply due to participants fixating the locations of the objects immediately before the test array, as this might indicate that such eye movements were in preparation for the comparison of items in the test array to those in memory. Alternatively, the saccades could have occurred almost exclusively in the short interval after the offset of the memory array, as would be expected if the saccades that we observed during the retention interval were residual effects of encoding into working memory. Although there are slight increases in the number of saccades made to object locations at the beginning and end of the maintenance period, we found that fixations of the object locations occurred throughout the 5-s retention interval, consistent with the idea that these acts of overt attentional selection were performed to help maintain the object representations.

Figure 3 shows change-detection accuracy in Experiment 1B as a function of whether the item that changed was fixated during the retention interval. Saccades on the half of the trials with changes were classified as either fixating or not fixating the item that would change. The trials were then sorted accordingly. Change-detection accuracies were similar during both trial types: 79.8 % on the change-fixated trials and 80.7 % on the change-not-fixated trials. We found neither a significant main effect of trial type, F(1, 15) = 0.14, p > .7, nor an interaction of trial type and set size, F(2, 30) = 1.63, p > .2, but a significant effect of set size did emerge, F(2, 30) = 33.6, p < .01. Planned comparisons revealed that change-detection accuracy was significantly higher at set size 3 when the item that would ultimately change was fixated during the retention interval, F(1, 15) = 13.82, p < .01. These findings do not clearly support the prediction that fixations of specific objects result in an individual-item benefit. If this were the case, we should have observed a significant main effect of change-detection accuracy based on whether or not the changed item was fixated. In addition, it is unclear why such an item-specific benefit would be evident at set size 3 but not at set size 6, when visual working memory was more heavily taxed. Unlike the general benefit that we observed on change-detection accuracy when participants made eye movements, we did not see clear evidence for an item-specific benefit, an issue to which we returned in Experiment 2.
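The sorting of change trials by whether the changed item had been fixated can be sketched as follows. The trial record format (a dictionary with `changed_index`, `fixated_indices`, and `correct` fields) is a hypothetical structure chosen for illustration and is not the format of the original data.

```python
def accuracy_by_fixation(trials):
    """Split change trials by whether the changed item was fixated.

    Each trial is a dict with:
      'changed_index'   - index of the item that changed, or None on no-change trials
      'fixated_indices' - set of item indices fixated during the retention interval
      'correct'         - True if the change-detection response was correct
    Returns proportion correct for each trial type (None if no trials of that type).
    """
    counts = {"change_fixated": [0, 0], "change_not_fixated": [0, 0]}
    for trial in trials:
        if trial["changed_index"] is None:
            continue  # only change trials enter this analysis
        if trial["changed_index"] in trial["fixated_indices"]:
            key = "change_fixated"
        else:
            key = "change_not_fixated"
        counts[key][0] += int(trial["correct"])
        counts[key][1] += 1
    return {
        key: (n_correct / n_trials if n_trials else None)
        for key, (n_correct, n_trials) in counts.items()
    }
```

In practice this split would be computed separately for each participant and set size before the accuracies are entered into the ANOVA.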

Fig. 3

Results of the analysis examining the benefit of fixating during the retention interval the item that would change, across set sizes in Experiment 1B

Discussion

The findings of Experiment 1 supported the hypothesis that overt visual–spatial attention is used to aid the maintenance of object representations in visual working memory. Support for this hypothesis came in two forms. First, the participants were better at detecting changes in the colors of memoranda when they were allowed to make eye movements during the retention interval. Second, when participants were instructed that they were free to move their eyes naturally, they fixated the spatial locations previously occupied by objects in the memory array during the blank retention intervals. Our individual-item analysis suggested that fixating a particular item during the memory retention interval could result in better memory for that specific colored square in Experiment 1, at least at set size 3. However, because this potential effect was not systematic or strong, we reserved drawing conclusions but returned to this possibility in Experiment 2.

In Experiment 1, we required participants to remember simple colored squares across the retention intervals. We wondered whether the fixation behavior found in Experiment 1 would pale in comparison to when participants had to maintain more complex stimuli based on a conjunction of features. Previous work had suggested that attention (covert or overt) plays a special role in the maintenance of multifeature objects in visual working memory (Wheeler & Treisman, 2002). Thus, we wanted to determine the generality of the findings in Experiment 1 and to test the hypothesis that the reliance upon overt visual–spatial attention would increase when participants had to remember objects composed of a conjunction of features.

Experiment 2

In Experiment 2, we tested the hypothesis that visual–spatial attention is primarily used during visual working memory tasks to maintain conjunctions of object features (Wheeler & Treisman, 2002). If visual–spatial attention serves a role in the maintenance of feature conjunctions, above and beyond the basic role in maintaining simple feature representations that we demonstrated in Experiment 1, then we should see that eye movements to the items would be even more critical for correctly remembering the multifeature objects in Experiment 2. The design of Experiment 2 was essentially identical to that of Experiment 1, except that participants were required to remember objects that were composed of a conjunction of features. Figure 4a shows that each object was a colored Landolt square with a gap on one side. Participants had to remember both of these features to accurately perform the change-detection task, because either the color or the shape of the object could change between the memory and test arrays. We again tracked participants’ eyes during both fixation and eye-movement conditions. This allowed us to further test the hypothesis that the overt deployment of visual–spatial attention during the retention interval aids memory performance, using the same metrics that we had used in Experiment 1. In addition, a comparison of the utility of eye movements between Experiments 1 and 2 would allow us to test the hypothesis that overt selection is particularly important for maintaining conjunctions of object features. Although some have proposed that attention is used to maintain feature conjunctions in visual working memory (Wheeler & Treisman, 2002), other recent work has challenged this proposal (Johnson, Hollingworth, & Luck, 2008; Zhang, Johnson, Woodman, & Luck, 2012). 
Thus, the most recent empirical work suggested that the fixation of object locations during the retention interval in Experiment 2 should be essentially identical to that found in Experiment 1, because maintaining feature conjunctions is not particularly reliant upon attention.

Fig. 4

Example stimuli and manual-response accuracy from Experiment 2. (a) Either a color or an orientation change could occur. (b) Mean accuracies (percentages correct) for Experiment 2A, as a function of condition and set size

Method

The design of Experiment 2 was identical to that of Experiment 1, except as noted below.

Participants

A new group of 26 volunteers (18–32 years of age) from the same pool participated in both the free-eye-movement and fixation conditions of Experiment 2A after informed consent had been obtained. Four participants from Experiment 2A were replaced, using the same criterion used in Experiment 1, for failure to properly fixate. A separate sample of 16 participants volunteered for Experiment 2B.

Stimuli

In Experiment 2, the stimuli consisted of colored Landolt squares (each 1.2° × 1.2°, 0.1° line thickness) with a gap (0.45°) on the left, right, top, or bottom side. Stimuli were presented on a gray background (48.5 cd/m²) and centered approximately 7.5° from fixation (a black plus sign, 0.3° × 0.3°, < 0.01 cd/m²). The color of each object was randomly selected with replacement from the same set used in Experiment 1.

The probability of a change between the memory and test arrays was 50 %, with color and orientation changes being equally probable (i.e., 25 % each). On trials in which the color changed, the item took on a color that had not appeared in the memory array on that trial. When the orientation changed, the original orientation was replaced by one of the other three possible orientations with equal probabilities. Participants were instructed to remember both the shape and color of the objects because one feature of one object would change on half of the trials.
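The change rules for the test array can be sketched as follows, assuming that each object is represented as a (color, gap side) pair and that the changed item is chosen at random with equal probability. The function name, the data representation, and the way the changed item is selected are illustrative assumptions.

```python
import random

COLORS = ["white", "black", "red", "blue", "green", "yellow", "magenta"]
GAP_SIDES = ["left", "right", "top", "bottom"]


def make_test_array(memory_items, rng=random):
    """Return (test_items, change_type) for one trial.

    memory_items is a list of (color, gap_side) tuples. change_type is
    None (50 %), 'color' (25 %), or 'orientation' (25 %).
    """
    test_items = list(memory_items)
    draw = rng.random()
    if draw < 0.5:
        return test_items, None
    idx = rng.randrange(len(test_items))  # assumed: any item equally likely
    color, gap = test_items[idx]
    if draw < 0.75:
        # The new color must not have appeared anywhere in the memory array.
        used = {c for c, _ in memory_items}
        new_color = rng.choice([c for c in COLORS if c not in used])
        test_items[idx] = (new_color, gap)
        return test_items, "color"
    # Otherwise the gap moves to one of the other three sides.
    new_gap = rng.choice([g for g in GAP_SIDES if g != gap])
    test_items[idx] = (color, new_gap)
    return test_items, "orientation"
```

Because at most six of the seven colors can appear in a memory array, at least one unused color is always available for a color change.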

In Experiment 2B, participants performed only the eye-movement condition, and we increased the number of experimental trials from 60 to 120 to allow us to determine whether the individual items benefited from being fixated.

Results

The change-detection results of the fixation and eye-movement conditions of Experiment 2A are summarized in Fig. 4b. As expected, accuracy significantly decreased in both conditions as set size increased. Change-detection accuracy was higher in the eye-movement condition (82.7 % correct across set sizes) than in the fixation condition (79.9 % correct). These findings resulted in significant main effects of condition, F(1, 25) = 5.39, p < .05, and set size, F(2, 50) = 185.07, p < .01, although the interaction was not significant, F(2, 50) = 0.56, p > .5.

In Experiment 2, we again found that the participants spontaneously fixated the locations previously occupied by objects in the memory array during the blank retention interval when they were free to do so (see the eye movements from an example trial in Fig. 5a). We found that 65.6 % of the saccadic endpoints landed within the 2° windows centered on the object locations during the retention interval. Figure 5b shows the distribution of the saccadic latencies during the memory retention intervals. As is illustrated in Table 1, the saccades to object locations were focused on approximately two objects, with saccades returning to these locations after the eyes were directed to the fixation point and other near-object locations outside of our measurement windows. As in Experiment 1, we found that these fixations occurred throughout the retention intervals and were not simply an index of residual encoding processes or the anticipation of the onset of the test array.
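The proportion of saccadic endpoints landing on object locations could be computed along the following lines. This is a sketch under stated assumptions rather than the analysis actually used: we assume that the 2° window is a square, 2° on a side, centered on each object location, and that coordinates are expressed in degrees of visual angle relative to fixation. The function names `in_window` and `percent_on_objects` are hypothetical.

```python
def in_window(endpoint, center, width=2.0):
    """True if an (x, y) endpoint lies in a square window centered on `center`.

    A square window of `width` degrees is an assumption of this sketch.
    """
    half = width / 2.0
    return (abs(endpoint[0] - center[0]) <= half
            and abs(endpoint[1] - center[1]) <= half)

def percent_on_objects(endpoints, object_locations, width=2.0):
    """Percentage of saccade endpoints that land in any object window."""
    if not endpoints:
        return 0.0
    hits = sum(
        any(in_window(e, c, width) for c in object_locations)
        for e in endpoints
    )
    return 100.0 * hits / len(endpoints)
```

For example, with two objects at (7.5, 0) and (-7.5, 0) and four endpoints of which two fall inside a window, `percent_on_objects` returns 50.0.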

Fig. 5
Example of the eye movements during a trial and frequency histograms of the saccade endpoints during the trials. (a) Actual eye-movement data from an example participant, measured during the retention interval. In this example trial, the participant did fixate the changing item location (the red Landolt square rotated 180°). Note the break in the scan path due to a blink during the 5-s retention interval. (b) Latency histograms indicating the total numbers of saccades made to object as compared to nonobject locations, averaged across participants, during the retention interval in Experiment 2B

The individual-item results of Experiment 2B are shown in Fig. 6. In this analysis, we examined change-detection accuracy on the basis of whether or not participants fixated the location of the item that would ultimately change. With a high working memory load of six objects, participants were more accurate at detecting a change when gaze was directed to the item of the upcoming change during the retention interval. The ANOVA of change-detection accuracy as a function of whether the changing item was fixated in Experiment 2B yielded a significant main effect of set size, F(2, 30) = 105.52, p < .01, but as in Experiment 1, we did not find a significant main effect of fixating the changing item, F(1, 15) = 0.90, p > .3. However, we did find an interaction of set size and whether the changed item was fixated during the retention interval, F(2, 30) = 5.42, p < .01. Planned comparisons confirmed that this interaction was due to change-detection performance being not statistically different for trials on which the changing item was fixated as compared to when it was not at set sizes 1 and 3 [F(1, 15) = 0.53, p > .4, and F(1, 15) = 2.98, p > .1, respectively], but being significantly better at set size 6 when the changed item was fixated as compared to when it was not, F(1, 15) = 4.78, p < .05. In a follow-up analysis, we wanted to determine whether this individual-item benefit of fixating the object that would change was different depending on the nature of the upcoming change. We thought that it was possible that these fixations of individual items would differentially benefit the accuracy of detecting shape changes, given that this was an inherently spatial feature (i.e., the gap location). However, we found that fixating an item did not differentially affect the accuracy of detection of color or shape (i.e., gap location) changes, F(1, 13) = 0.85, p > .35. In summary, we did not find that fixating the items that would change at the end of the trial resulted in generally better change-detection accuracy, as would have been indicated by a significant main effect of item fixated. Below we will discuss this finding in relation to the results of Experiment 1.
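The cell means that enter an individual-item analysis of this kind could be tabulated as in the sketch below. It is illustrative only: the trial records, their field names (`set_size`, `changed_item_fixated`, `correct`), and the function name `accuracy_by_fixation` are hypothetical, and we assume that only change trials are passed in, because the fixated-versus-not distinction is defined only when an item actually changes.

```python
from collections import defaultdict

def accuracy_by_fixation(change_trials):
    """Percent correct keyed by (set_size, changed_item_fixated).

    `change_trials` is assumed to hold one dict per change trial with the
    hypothetical keys 'set_size', 'changed_item_fixated', and 'correct'.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_trials]
    for t in change_trials:
        key = (t["set_size"], t["changed_item_fixated"])
        counts[key][0] += int(t["correct"])
        counts[key][1] += 1
    return {k: 100.0 * c / n for k, (c, n) in counts.items()}
```

Computing this table separately for each participant yields the Set Size × Item Fixated cell means that a repeated measures ANOVA would take as input.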

Fig. 6
Results of the analysis examining the benefit of fixating during the retention interval the item that would change, across set sizes in Experiment 2B

Next, we entered the accuracy of the manual responses into a mixed-model ANOVA to obtain a between-subjects comparison of Experiments 1A and 2A in terms of the within-subjects factor Condition (eye movement and fixation). We found a significant main effect of experiment, F(1, 50) = 83.18, p < .01, due to generally higher accuracy when only color needed to be remembered in Experiment 1A, but the Experiment factor did not interact with condition, F(1, 50) = 0.03, p > .8, supporting our observation that the effects of being allowed to make saccades on change-detection accuracy were essentially identical across Experiments 1A and 2A. As expected, we also found a significant main effect of condition (eye movement vs. fixation), F(1, 50) = 10.13, p < .01. We ran another mixed-model ANOVA examining the individual-item benefit of fixations between Experiments 1B and 2B, entering the eye-movement data measuring the effect of fixating an item on change-detection accuracy. This analysis did not yield any significant main effects or interactions. This supports the conclusion that these item-specific analyses did not yield reliable effects in either experiment.

Discussion

In Experiment 2, we replicated the basic findings from Experiment 1 using more complex stimuli that required participants to remember multifeature objects. We again found that during the blank retention intervals, participants fixated the locations previously occupied by items in the memory arrays. Our analysis of change-detection accuracy in the fixation and eye-movement conditions supported the hypothesis that this overt attentional selection of locations previously occupied by the objects improved performance on the visual working memory task. Thus, these findings support the hypothesis that overt mechanisms of spatial selection are used to help maintain representations of objects in visual working memory.

It is notable that the analyses in which we examined whether fixating an item specifically benefited that item did not show a clear effect in either Experiment 1 or 2. It could be that such an effect would emerge with more power or with a memory testing procedure that was more sensitive than change detection. For example, it is possible that a cued recall procedure (Zhang & Luck, 2008) might reveal such a benefit. However, at this point we find that saccades generally increase the accuracy of change detection relative to trials in which saccades are not allowed, but we did not see clear evidence that fixating the location of a specific item consistently benefited that memory representation.

The comparisons of eye-movement effects and of the accuracies of change detection across Experiments 1 and 2 have additional implications for theories of how object representations are maintained in visual working memory. One theoretical proposal about visual working memory storage is that mechanisms of attentional selection are essential for maintaining conjunctions of features across time (e.g., Wheeler & Treisman, 2002). However, we found that patterns of eye movements were not distinguishably different, whether people were remembering single-feature objects (just color, in Exp. 1) or conjunctions of features (color and shape, in Exp. 2). The findings across Experiments 1 and 2 indicated that our saccadic index of the deployment of overt visual–spatial attention to object locations during memory retention was not significantly increased by requiring people to remember a conjunction of two features versus a single feature in visual working memory.

It is important to note that the interpretation that we have just discussed discounts the possibility that location is inherently a feature of the object representations stored in visual working memory. It is possible that location is also stored with other object features, like color, and in this way that even the to-be-remembered objects in Experiment 1 formed conjunctions of features. If this were the case, then in both Experiments 1 and 2 the participants needed to store conjunctions in visual working memory, accounting for the similar effects of eye movements across the experiments. Some evidence has suggested that location is not obligatorily encoded with other object features in visual working memory (Logie & Marchetti, 1991; Tresch, Sinnamon, & Seamon, 1993; Woodman, Vogel, & Luck, 2012), but definitively addressing this question requires additional evidence. Another model could also easily accommodate our findings: a model in which features are neither conjoined in object representations nor actively maintained via spatial location (unlike the proposal of Wheeler & Treisman, 2002) during a retention interval. Such a model would also predict that the demands in Experiments 1 and 2 would not be fundamentally different, except that in Experiment 2 an additional feature store would be actively maintaining information. The present findings narrow the space of models that could account for the data, but more work will be required to distinguish between the viable candidates.

Our findings thus far showed how an index of overt visual–spatial attention can be used to study how selective processing aids the maintenance of object representations in visual working memory. Next, we sought to provide converging evidence for our conclusions that mechanisms of spatial selection aid the maintenance of object representations, by using an interference paradigm.

Experiment 3

The goal of Experiment 3 was to see whether we could disrupt the normal use of visual–spatial attention in the maintenance of object representations. We used a probe-detection paradigm to test the hypothesis that being able to covertly attend to remembered objects’ locations aids memory maintenance. For example, it is possible that the benefit of being allowed to move one’s eyes in Experiments 1 and 2 was minimized by the fact that participants could covertly shift visual attention to the locations of the remembered objects in the fixation condition. In Experiment 3, we sought to interfere with both covert and overt shifts of attention in the service of maintaining objects in visual working memory by presenting a brief probe at the fixation point that was difficult to detect.

Participants performed the same change-detection task in three conditions, shown in Fig. 7. In the baseline condition, we instructed participants to remember an array consisting of a variable number of colored squares. In the visual-probe and auditory-probe conditions, participants performed a probe-detection task in addition to the change-detection task. During the visual-probe condition, the fixation cross would change from black to gray for 100 ms on 50 % of the trials. The onset of the probe was jittered such that it appeared at fixation at different times across trials. Participants were instructed to respond to this probe by pressing the spacebar. We expected that after seeing the memory array presentation in the visual-probe condition, participants would focus attention on the fixation cross in anticipation of the possible probe stimulus. However, on half of the trials, no probe was actually presented, meaning that on these trials the visual working memory and response demands were identical to those in the baseline condition, except that attention was focused centrally on the fixation point during the retention interval in anticipation of the possible probe stimulus. Because the probe was a very brief change in the luminance of the fixation point, detection of this signal required observers to focus attention in anticipation of the probe. To test whether any effects we would find were due to general dual-task interference, we also included an auditory-probe condition. Instead of a visual probe occurring on half of the trials, participants were instructed to listen for and respond to an auditory probe. In the auditory-probe condition, a 100-Hz tone was presented for 100 ms during the retention interval. Again, the probe was jittered across the retention interval, and participants had to respond by pressing the spacebar.

Fig. 7
Example stimuli from Experiment 3. (a) The stimulus sequence during the baseline condition. (b) The stimulus sequence during the visual-probe condition. (c) The stimulus sequence during the auditory-probe condition. In the probe conditions, 50 % of the trials contained a probe presented for 100 ms. The probe occurred at a 1,000-, 1,500-, or 2,000-ms stimulus onset asynchrony, with equal probabilities, during the retention interval

If participants typically deploy attention to the locations of objects to maintain their representations in visual working memory during blank retention intervals, the expectation of a visual-probe stimulus appearing at fixation should draw attention away from these memory-object locations. We predicted that by introducing this visual-probe task and requiring participants to focus attention at fixation, we would disrupt the use of covert and overt visual–spatial attention to aid in the maintenance of the objects in visual working memory. We expected that change-detection performance on the no-probe trials of the visual-probe condition would be significantly worse than that in the baseline and auditory-probe conditions, due to attention being focused on the fixation point during the retention interval. Moreover, to rule out effects of dual-task interference, we expected that performance would be comparable in the auditory-probe and baseline conditions, and better than performance in the visual-probe condition. Thus, any decrement in performance during the visual-probe condition relative to the auditory-probe condition would be attributed to participants’ inability to use their spatial attention during maintenance.

Method

Participants

A group of 38 volunteers (18–40 years of age) from the same pool participated for monetary compensation or class credit. After informed consent was obtained, volunteers performed all three conditions of the experiment. All of the observers reported having normal or corrected-to-normal vision and normal color vision.

Stimuli

The stimuli were presented on a gray background (53.4 cd/m²) at two eccentricities. The outer-annulus stimuli were centered at a radius of approximately 6.0° from the fixation point (a black plus sign, 0.2° × 0.2°, < 0.01 cd/m²) in the center of the monitor, and the inner-annulus stimuli at 1.8° from fixation. The memory stimuli were sets of three, six, or nine colored squares (each 1.25° × 1.25°). The memory and test arrays of the colored squares were randomly generated on each trial, such that three, six, or nine locations were selected, without replacement, from a set of 24 possible locations (minimum spacing of 2.4°). The color of each object was randomly selected from a set of seven colors with at most one replacement of each color: white (92.46 cd/m²), red (x = .642, y = .327; 22.62 cd/m²), blue (x = .152, y = .067; 9.66 cd/m²), green (x = .318, y = .569; 64.99 cd/m²), black (< 0.01 cd/m²), yellow (x = .478, y = .452; 65.23 cd/m²), and magenta (x = .304, y = .149; 7.04 cd/m²). The articulatory-suppression stimuli were strings of four white letters or numbers (i.e., “a, b, c, d,” “1, 2, 3, 4,” “w, x, y, z,” or “6, 7, 8, 9”; each letter spanned approximately 1° × 1.4°, 92.46 cd/m²). The visual-probe stimulus was identical to the fixation point, except that it was light gray (0.2° × 0.2°, 88.4 cd/m²) instead of black (< 0.01 cd/m²). The auditory-probe stimulus was a 100-Hz tone.
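The sampling constraints on the memory arrays can be made concrete with a short sketch. It assumes that "at most one replacement of each color" means that no color appears more than twice in an array, and it implements that by drawing from a pool holding two copies of each color; the original sampling procedure may have differed. Locations are represented only as indices 0 to 23, and the function name `sample_memory_array` is hypothetical.

```python
import random

COLORS = ["white", "red", "blue", "green", "black", "yellow", "magenta"]
N_LOCATIONS = 24  # possible stimulus locations, indexed 0..23 here

def sample_memory_array(set_size, rng):
    """Return a list of (location_index, color) pairs for one memory array."""
    # Locations are drawn without replacement.
    locations = rng.sample(range(N_LOCATIONS), set_size)
    # Two copies of each color in the pool, so no color can occur more
    # than twice (an assumed reading of "at most one replacement").
    colors = rng.sample(COLORS * 2, set_size)
    return list(zip(locations, colors))
```

With seven colors, an array of nine items must repeat at least two colors, which is why some replacement is needed at the largest set size.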

Procedure

At the beginning of each block of trials, the participants began the articulatory-suppression task after the numbers or letters were presented (repeating approximately three to four alphanumerics per second). Each condition was composed of four blocks, each with 36 trials. The order of the conditions was counterbalanced across participants. All of the observers participated in a 12-trial practice block before each condition. In all conditions, the fixation point was visible continuously during each trial, and the participants were instructed to maintain fixation throughout each trial.

In the baseline condition, a memory array was presented for 100 ms, followed by a 4,000-ms blank retention interval and then the test array. The test array remained on the screen for 5,000 ms or until the observer pressed the “z” key on the keyboard with the left middle finger, to indicate that the test array was the same as the memory array, or the “x” key with the left index finger, to indicate that the test array was different. The probability of a color change of one item in the array was 50 %, with same versus different trials randomly interleaved. The three set sizes were also randomly interleaved across trials. The instructions for all conditions stressed the accuracy of responses in the memory task.

The probe conditions were identical to the baseline condition, except that we instructed participants to detect a probe during the retention interval, which they were told would occur randomly on 50 % of the trials. In the visual-probe condition, the probe was a brief change at the fixation cross. In the auditory-probe condition, the probe was a brief tone. All probes were presented for approximately 100 ms and began 1,000, 1,500, or 2,000 ms after the retention interval began. The onset latency of the probe was randomized across trials. When the probe occurred, we required participants to press the spacebar on the keyboard as quickly as possible and within 1,000 ms of the onset of the probe to count as correct.
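The probe schedule and the scoring rule for probe responses can be sketched as follows. This is an illustration and not the software used to run the experiment: the names `schedule_probe` and `is_hit` are hypothetical, times are assumed to be measured in milliseconds from the start of the retention interval, and probe presence is drawn here as an independent coin flip on each trial, whereas the actual experiment may have fixed the number of probe trials per block.

```python
import random

PROBE_SOAS_MS = (1000, 1500, 2000)  # possible probe onsets after retention onset
RESPONSE_WINDOW_MS = 1000           # a press must follow probe onset within this

def schedule_probe(rng):
    """Return the probe onset in ms, or None on a no-probe trial."""
    if rng.random() < 0.5:  # assumed: independent 50 % draw per trial
        return rng.choice(PROBE_SOAS_MS)
    return None

def is_hit(probe_onset_ms, keypress_ms):
    """True if the spacebar press counts as a correct probe detection.

    `keypress_ms` is None when no key was pressed during the interval.
    """
    if probe_onset_ms is None or keypress_ms is None:
        return False
    return 0 <= keypress_ms - probe_onset_ms <= RESPONSE_WINDOW_MS
```

Under this rule, a press 549 ms after a probe at 1,500 ms (that is, at 2,049 ms) counts as a hit, whereas a press at 2,600 ms does not.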

Data analysis

We analyzed the means using an ANOVA with the within-subjects factors Condition (baseline, visual-probe, or auditory-probe) and Set Size (3 or 6). The data from set size 9 were excluded from the analyses because floor effects obscured potential differences due to the different types of probe conditions relative to the baseline.

Results

The participants responded to the presentation of the randomly interleaved visual probes within 1,000 ms on 100 % of the trials on which they appeared (mean RT = 549 ms). We found similar results with the auditory probes (100 % responses, mean RT = 529 ms). The mean change-detection accuracies for all three conditions are shown in Fig. 8a. Only trials in which the probe was not presented were included in the analyses, because we were interested in isolating the effect of focusing attention away from the objects during maintenance and not in the additional forms of interference caused by selecting and initiating the response to the probe. We found a significant main effect of condition, F(2, 74) = 57.44, p < .01, a significant main effect of set size, F(1, 37) = 131.93, p < .01, and a significant interaction, F(2, 74) = 60.98, p < .01. To determine the source of this interaction, we first confirmed that participants were more accurate at detecting changes in the baseline condition than in the visual-probe condition, F(1, 37) = 34.33, p < .01, and in the auditory-probe condition, F(1, 37) = 6.54, p < .05. Performance was significantly better in the auditory-probe condition than in the visual-probe condition, F(1, 37) = 8.15, p < .01. In addition, Fig. 8b shows that the largest difference in change-detection performance in the visual-probe condition relative to the baseline and auditory-probe conditions was at set size 6, with minimal differences at set size 3 (i.e., within the average participant’s visual working memory capacity). Comparisons at set size 6 showed that participants’ performance was significantly impaired during the no-probe trials of the visual-probe condition as compared to baseline, and in the auditory-probe condition as compared to baseline [F(1, 37) = 33.7, p < .01, and F(1, 37) = 8.85, p < .01, respectively]. However, these same comparisons at set size 3 did not yield significant differences (ps > .20).

Fig. 8
Results of Experiment 3. (a) Mean accuracies (percentages correct) as a function of condition. (b) Mean accuracies (percentages correct) as a function of condition and set size

Note that we excluded set size 9 due to floor effects at this set size across the conditions. However, the same general pattern of effects was observed when this set size was included in the analyses. Specifically, the omnibus ANOVA yielded significant main effects of condition, F(2, 74) = 4.27, p < .05, and set size, F(2, 74) = 263.43, p < .001, as well as a significant Condition × Set Size interaction, F(4, 148) = 10.02, p < .001. The critical planned comparisons between the auditory- and visual-probe conditions exhibited the same pattern when set size 9 was included; however, the interaction with set size was now in part driven by a reduction in the size of the effects across the different probes at set size 9.
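As a check on how the reported degrees of freedom relate to the design, the standard formulas for a fully within-subjects two-factor ANOVA can be worked through with n = 38 participants, a = 3 levels of Condition, and b levels of Set Size. The symbols n, a, and b are introduced here only for illustration, and the formulas assume no sphericity correction; whether one was applied is not stated.

```latex
\begin{align*}
\text{Condition:} \quad & F\big(a-1,\ (a-1)(n-1)\big) = F(2,\ 2 \times 37) = F(2,\ 74) \\
\text{Set Size } (b = 2): \quad & F\big(b-1,\ (b-1)(n-1)\big) = F(1,\ 37) \\
\text{Set Size } (b = 3): \quad & F\big(b-1,\ (b-1)(n-1)\big) = F(2,\ 74) \\
\text{Interaction } (b = 2): \quad & F\big((a-1)(b-1),\ (a-1)(b-1)(n-1)\big) = F(2,\ 74) \\
\text{Interaction } (b = 3): \quad & F\big((a-1)(b-1),\ (a-1)(b-1)(n-1)\big) = F(4,\ 148)
\end{align*}
```

These values match the degrees of freedom reported for the analyses with set sizes 3 and 6 only (b = 2) and for the omnibus analysis that included set size 9 (b = 3).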

Discussion

In Experiment 3, we found that our observers’ ability to maintain information in visual working memory was interfered with by attending to the fixation point. This interference was apparently due to simply attending to the fixation point, because we focused on the trials in which no probe was actually presented. Although we would like to rule out the possibility of attributing these results to dual-task effects, it appears that engaging in a similar but nonvisual probe-detection task also led to some amount of performance interference; however, these effects were not as pronounced as in the visual-probe condition. These findings, together with the eye-movement findings from Experiments 1 and 2, provide converging evidence for the hypothesis that we use overt and covert visual–spatial attention mechanisms to aid in maintaining object representations in visual working memory.

General discussion

In Experiments 1 and 2, we found that eye movements made during retention intervals to the locations previously occupied by to-be-remembered objects improved participants’ ability to detect changes of those objects in the subsequent test arrays. This was evidenced by change-detection accuracy being higher in the conditions in which eye movements were allowed, as compared to when maintaining fixation was required. In addition, we found that when eye movements were allowed, observers spontaneously fixated the locations of the memoranda during the retention interval. In Experiment 3, we provided converging evidence for the hypothesis that spatially selective mechanisms facilitate visual working memory maintenance of object representations, using an interference paradigm. That is, anticipation of a spatially predictable probe during the retention interval of the change-detection task interfered with memory for the objects. When attention could not be focused on the remembered objects, because it was focused on the fixation point in anticipation of the probe, memory for the objects was worse. In summary, we found multiple pieces of converging evidence suggesting that overt and covert selection of locations previously occupied by objects aids the memory of that information.

In our experiments, we have referred to visual–spatial attention as a mechanism that aids memory maintenance to connect our findings with ideas in the classic memory literature (Atkinson & Shiffrin, 1968). However, others have interpreted eye movements to the locations previously occupied by particular objects as being due to attention being directed internally to object representations (Astle, Nobre, & Scerif, 2009; Kuo, Stokes, & Nobre, 2011; Matsukura, Luck, & Vecera, 2007). For example, it has been proposed that spatial attention serves to protect the contents of visual working memory (Matsukura et al., 2007). Because we found that participants spontaneously made eye movements to item locations when no other distracting information was present, and that this resulted in higher accuracy relative to when the eyes were fixed, we propose that spatial attention may do more than filter out potentially distracting information and may instead actively participate in the maintenance of object properties, as was previously proposed in some models of visual working memory (Baddeley & Logie, 1999; Logie, 1995). This interpretation is consistent with previous research suggesting that making eye movements to the locations previously occupied by objects facilitates their retrieval from memory (Ferreira, Apel, & Henderson, 2008). Finally, our basic observation that change detection was more accurate when participants made a series of eye movements to empty locations on the monitor during the retention interval is striking, given the body of work demonstrating that the planning and initiation of saccadic eye movements is a demanding process involving multiple brain areas and stages of processing (e.g., Schall, 2002; Schall & Woodman, 2012). We could have easily found that making eye movements during a retention interval resulted in a reduction of memory accuracy, due to the cognitive demands of making saccades. Instead, it appears that overtly selecting the locations of the objects improves memory, even with the demands of making these eye movements.

The pattern of results that we observed, in which eye movements increased the accuracy of change detection when objects were remembered, differs from previous experiments that have examined the relationship between memory for spatial location and eye movements. Most relevant is a recent study by Godijn and Theeuwes (2012). They had participants remember the locations of a series of numbers that were shown for 10 s. Following a 7.5-s retention interval, the participants clicked a mouse to report the locations of the numbers on the screen in ascending order. Godijn and Theeuwes found that being able to make eye movements during the retention interval did not consistently improve performance in the spatial memory task (analogous to the eye-movement conditions in Exps. 1 and 2). In addition, they showed that performance was not significantly worse when participants were required to fixate one location during the retention interval, relative to a condition in which saccades to the remembered locations were allowed. The most obvious difference between the present study and that of Godijn and Theeuwes is the type of information that needed to be maintained. In the previous work people were maintaining spatial locations, and not the object features used in the present study. It appears that visual working memory handles object features (like color, shape, size, etc.) very differently from location information (Goldman-Rakic, 1996; Jonides & Smith, 1997; Ungerleider et al., 1998). The effect of eye movements on memory for object features relative to spatial locations appears to be another signature of the distinction between these two types of information.

It might initially seem counterintuitive that a spatial selection mechanism could help maintain features that are inherently nonspatial (i.e., color, form, etc.). However, the neurons throughout the visual system have spatial receptive fields. It is likely that attending to the location that was just recently occupied by an object of a given color helps enhance the activity of the neurons that code for both that location and color (e.g., Luck, Chelazzi, Hillyard, & Desimone, 1997), with attention being able to enhance the remembered feature because the lingering activation in the neurons coding for that color allows the feature to inhibit neurons with the same spatial receptive field but other color selectivities. For example, if a red item was presented in the lower left visual field, then shifting attention to that location in space could boost the firing rate of neurons representing the lower left visual field, with the higher firing rate of the red neurons due to the sustained working memory trace damping down the activity levels of the neurons with the same receptive field but coding for blue, green, yellow, and so forth. It is possible that such covert shifts of attention are sufficient to boost the maintenance-related activity of the object representations in visual working memory, with saccades simply following these shifts of attention to the locations of remembered objects (Hoffman & Subramaniam, 1995; Kowler, Anderson, Dosher, & Blaser, 1995) when saccades are not being actively suppressed, as in our fixation conditions of Experiments 1 and 2. Next, we will discuss how our findings are entirely consistent with several types of models of visual working memory.

In their multiple-component model of working memory, Logie and Baddeley (Baddeley & Logie, 1999; Logie, 1995) proposed that separate spatial and object working memory mechanisms operate in architecturally and functionally distinct subsystems. However, this model of working memory proposes that the spatial and object-based components of visual working memory interact in the process of maintaining representations of objects across time. That is, spatial mechanisms serve to actively rehearse the object representations held in the passive visual store. Although there is evidence for the separation of the buffers (Baddeley, 2003; Smith et al., 1995), the findings presented here support the prediction that they interact during the maintenance of objects in visual working memory. One common feature that these memory systems appear to share is their use of attention. Here we have shown that spatial attention is used to rehearse object information being held during maintenance. Many models of visual attention describe how visual working memory guides such attention (Desimone & Duncan, 1995), but our findings demonstrate an influence in the opposite direction. That is, attention helps determine what is maintained in visual working memory.

Another view of working memory is that it is the activated portion of long-term memory (Cowan, 1997, 1999; Lovett, Reder, & Lebiere, 1999). According to this view, attention is used to maintain the elevated activity of a limited number of representations in long-term memory. In this way, the capacity limits of visual working memory are a natural consequence of the limited capacity of attentional mechanisms. Our findings could also be accommodated by such an architecture. Under a model like the embedded-processes model of Cowan (1999), spatial location serves as an index through which visual attention can reactivate the task-relevant representations being stored in memory. Given such a view, the fact that people look at the previous location of an object while trying to recall it from long-term memory (Ferreira et al., 2008; Spivey & Geng, 2001; Zelinsky, Loschky, & Dickinson, 2011) is an example of using these spatial indexes to reactivate a long-term memory representation and bring it into working memory. This would suggest that the spatial traces left by objects in memory may be more directly accessible than the objects’ other features, both during fairly short retention intervals, like those we used, and across longer periods of time. The idea of a directly accessible spatial index for the nonspatial properties of objects is consistent with the empirical observation that neurons in the visual system inherently have spatial receptive fields (Desimone & Duncan, 1995), and it is central to Treisman’s (1988) proposal that a master map of locations is used to direct attention and organize the object representations in working memory (Wheeler & Treisman, 2002).

In summary, the present findings support models proposing that a spatial rehearsal mechanism is used to maintain veridical object representations in visual working memory (e.g., Logie, 1995), as well as models proposing that the deployment of attention to spatially indexed internal representations is what distinguishes visual working memory from long-term memory. The use of such a mechanism might naturally lead to shifts in gaze toward locations in our visual field in which we have perceptually processed items, as we have seen in the present study. In addition, when we try to maintain multiple objects in visual working memory, the reliance on such rehearsal processes may be increased. This appears to explain why previous studies using dual-task paradigms have failed to find significant interference when a probe has needed to be detected and a single object was maintained in visual working memory (Awh & Jonides, 2001; Postle et al., 2006).