Imagine yourself standing on top of a mountain, enjoying the scenery of snow-capped giants and little clouds drifting by. But when you close your eyes, your rich percept immediately starts to fade from mind, and a few moments later you can only remember a few elements. This example points out a fundamental aspect of human memory; it is initially brief and richly detailed, then moments later sustained but impoverished. Usually, the initial high-capacity form of memory is referred to as iconic memory (IM; Neisser, 1967; Sperling, 1960) and the sparse, yet sustained, form of memory as visual working memory (WM; Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001).

Importantly, recent findings have challenged this traditional, two-stage viewpoint with the discovery of an intermediate form of visual short-term memory, putatively termed fragile visual short-term memory (FM). FM seems to last much longer than IM (IM < 0.5 s, FM > 4 s), yet at the same time its capacity surpasses the limits of WM (WM capacity = 2–4 items, FM capacity = 5–15 items; Sligte, Scholte, & Lamme, 2008). Furthermore, FM seems to have more distinct features, as magnetic stimulation of the prefrontal cortex (Sligte, Wokke, Tesselaar, Scholte, & Lamme, 2011) and manipulations of attention during memory encoding (Vandenbroucke, Sligte, & Lamme, 2011) greatly reduce WM capacity while having little to no effect on FM capacity.

A controversy has emerged as to whether FM really is a new and distinct kind of memory, or whether it is some form of visual working memory. Matsukura, Luck, and Vecera (2007), Makovski and Jiang (2007), and Makovski, Sussman, and Jiang (2008) have suggested that working memory is not robust before onset of a new visual scene (such as a test display), and that only those memory items that are attended are prioritized/protected in a way that ensures survival of a new visual scene. According to this account, a partial-report cue during retention may serve to prioritize/protect items in VSTM that would otherwise be erased by a new visual scene. We sympathize with this kind of explanation, but it does not prove that FM does not exist. Rather, it suggests that a rewording of the phenomenon would settle the debate (instead of calling the stores FM and WM, they would be called rich WM before visual interference and stable WM after interference). Second, Matsukura and Hollingworth (2011) found that participants could only make use of FM after extensive training. Again, we think that this does not disprove the existence of FM. It is well known from the IM literature that subjects need time (even up to 200 trials) to learn how to use a partial-report cue. Perhaps FM access (or focusing the mental spotlight on VSTM) does not happen often in daily life, and is therefore difficult without any training. Summarizing, although there is no agreement about the exact nature of FM (is it really that distinct from WM?), it seems that everybody agrees that visual memory is rich yet volatile before the onset of a new visual scene, but impoverished yet robust after onset of a new visual scene.

The goal of the present study is not to confirm or falsify the proposition that WM and FM are essentially the same. Instead the purpose is to reveal under what conditions FM is erased, leaving only WM. Thus, our question is, what type of new visual scene erases FM—that is, any visual scene, or only very specific visual stimuli? This question is pivotal for two reasons. First, it could be informative about where to put FM on the IM–WM continuum. Is FM as fragile as IM, or as robust as WM, or something in between? Second, from an applied perspective, if we could answer the former question, we might be able to circumvent the brief duration of IM and the strict capacity of WM, and eventually to use FM in everyday life to overcome practical challenges involving visual short-term memory.

To date, very little is known as to why FM is erased in standard experimental settings: When a new visual display is shown that is very similar to the memorized scene (same objects, same locations), catastrophic interference occurs and subjects’ performance is reduced to WM levels. This interference by similarity might also be the reason why the test displays in standard WM change detection tasks are so potent in erasing FM. However, the question remains under what circumstances a relative sparing of FM in the face of new visual stimulation can be achieved. Importantly, investigating this issue may shed light on the characteristics of all forms of VSTM. To investigate this, we tested whether a new visual scene needs to spatially overlap with the to-be-recalled scene, and/or needs to consist of the same objects for erasure to happen. We operationalized this by presenting a new scene (which we will refer to, from here on, as the interfering display, or ID) that was either similar (composed of rectangles) or dissimilar (composed of circles) to the items stored in FM, and we presented these objects either at the same location as or at another location than the probed FM representation (see Fig. 1a).

Fig. 1

Stimuli and trial design. On each trial, subjects were shown a memory and a test display (a; end displays), and they had to detect changes between these successive displays. A cue (a; next-to-last display) appeared during the retention interval between the memory and test displays, to measure fragile visual short-term memory, or FM (b), or during the test display, to measure working memory, or WM (c). On FM trials, interfering displays, or IDs (a; boxed displays) could appear 300 ms before, 100 ms before, simultaneous with, 100 ms after, or 300 ms after the appearance of the cue. These IDs consisted of four objects that were all presented on the left or the right side of fixation. In effect, the objects in the ID could be on the same side as or the opposite side from the probed FM representation. Moreover, the IDs contained items either similar (rectangles) or dissimilar (circles) to the ones stored in FM

To preview our results: We found that FM is only erased when the new display consists of similar objects presented at the same location. When we uncoupled location and identity, FM was largely spared—for instance, when dissimilar objects were presented at the same location. These results imply that FM is a location-specific and object-specific storage. This reveals a qualitative difference between FM and WM (WM is not/is less affected by visual stimulation, which we confirmed in our second experiment), and between FM and IM (which is erased by any visual stimulation; Sligte et al., 2008). To return to our applied perspective, this study provides a first clue as to how visual settings should be designed so that FM can be useful in practical settings.

Experiment 1: FM is only erased by visual stimulation at the same location, consisting of the same objects

Method

Subjects

In this experiment, 34 students from the University of Amsterdam participated (15 female, 19 male; age range 18–34 years, mean age = 22.5, SD = 3.2), all having normal or corrected-to-normal vision. All of the subjects gave their written informed consent to participate in the study, which was approved by the local ethics committee of the University of Amsterdam. Subjects received financial compensation or course credits for their participation.

Equipment and stimuli

Stimuli were displayed on a 19-in. LG CRT display (type FB915BP) at a refresh rate of 60 Hz. The subjects were seated 75 cm from the monitor, and thus the total viewing angle of the display spanned 27.2° × 20.5°. All stimulus displays had a black background and contained a red fixation dot (13.52 cd/m²). The memory and test displays (Fig. 1a, two end displays) consisted of eight white (87.66 cd/m², 2.08° × 0.52° in size) rectangles having horizontal, vertical, or oblique (45° or 135°) orientations. Individual rectangles were placed on an imaginary circle (radius 4.68°) around fixation. The interfering displays (ID) contained either four rectangles that were chosen randomly from the four possible orientations or circles that were 2.15° in diameter (Fig. 1a, box of four displays). The rectangles and circles in the IDs were presented either in the same or in the other hemifield relative to where the cue appeared. The objects in the IDs were presented at the same positions where the rectangles appeared in the memory and test displays. The cue (Fig. 1a, next-to-last display) consisted of a three-pixel-thick line that at one of its ends was close (±0.93°) to fixation, and at its other end was close to the center of one rectangle (±1.6°). Note that the cues were thin lines, and thus were distinct from the presented objects (which were either circles or rectangles).

Trial design

In the FM (Fig. 1b) and WM (Fig. 1c) conditions, two displays, each containing eight oriented rectangles, were presented sequentially. Display 1 was the memory display, which was presented for 250 ms. Display 2 (the test display) remained visible until response. On 50 % of the trials, the orientations of the rectangles were exactly the same in both displays; on the other 50 %, one of the rectangles in the test display had changed orientation. The subjects were to indicate whether both displays were identical. Furthermore, a cue appeared 1 s after the disappearance of the first display. In both conditions, the cue pointed to the location where the changed rectangle would appear, if there was one. Importantly, in the WM condition, the second display appeared 100 ms before the appearance of the cue (so that the cue could not be used to access FM), while in the FM condition, the second display appeared 1 s after the appearance of the cue.

In addition, we presented 100-ms interfering displays (ID) in the FM condition either 300 ms before the cue, 100 ms before the cue, simultaneous with the cue, 100 ms after the cue, or 300 ms after the cue (see Fig. 1a). These IDs could appear on the same side as the cue or on the other side, and they could consist of rectangles or circles. Thus, in total 40 ID conditions were presented (no change/change [2] × ID timing [5] × ID location [2] × ID object [2]), in addition to 2 FM (no change/change) and 2 WM (no change/change) conditions without IDs. In total, subjects performed 1,144 trials on two separate days. Thus, each cell in the experiment contained 26 observations.
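The cell and trial counts above can be checked with a short enumeration. This is only a sketch of the design as described in the text; the variable names and condition labels are illustrative and do not come from the original experiment code.

```python
from itertools import product

# Factor levels as described in the text; names and labels are illustrative.
CHANGE = ("no change", "change")
ID_TIMING_MS = (-300, -100, 0, 100, 300)  # ID timing relative to the cue
ID_LOCATION = ("same side", "other side")
ID_OBJECT = ("rectangles", "circles")

# 2 x 5 x 2 x 2 = 40 FM cells that include an interfering display.
id_cells = [("FM",) + cell
            for cell in product(CHANGE, ID_TIMING_MS, ID_LOCATION, ID_OBJECT)]

# 2 FM + 2 WM cells without an interfering display.
baseline_cells = [(memory, change)
                  for memory in ("FM", "WM") for change in CHANGE]

n_cells = len(id_cells) + len(baseline_cells)
trials_per_cell = 26
total_trials = n_cells * trials_per_cell

print(len(id_cells), n_cells, total_trials)  # 40 44 1144
```

The 44 cells at 26 observations each give the 1,144 trials reported for the two sessions.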

Procedure

First, subjects were trained on the cued change detection task without IDs for about 30 min. Subsequently, they performed 11 blocks of 52 trials apiece (572 trials total) in which all 40 ID conditions and the four FM/WM conditions without IDs were randomly intermixed. On the second day, subjects again performed 11 blocks of 52 trials. The training served two purposes: first, to exclude unmotivated or unable participants, and second, to train participants in the use of the cues in general.

Data analysis

All statistical analyses were performed with repeated measures analyses of variance (ANOVAs) and paired t tests. Effect sizes are reported as Cohen’s d for the t tests and as η² for the ANOVAs.
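As a rough illustration of the paired comparison, the sketch below computes a paired t statistic together with one common effect-size variant, d_z (mean difference divided by the standard deviation of the differences). The text does not say which variant of Cohen's d was used, so d_z is an assumption here, and the accuracies in the example are hypothetical values, not data from the study.

```python
import math
import statistics


def paired_t_and_dz(cond_a, cond_b):
    """Return (t, df, d_z) for two equal-length lists of per-subject scores."""
    if len(cond_a) != len(cond_b):
        raise ValueError("both conditions need one score per subject")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # assumed variant; the paper may use another
    return t_value, n - 1, d_z


# Hypothetical percent-correct scores for five subjects (illustration only).
fm = [85.0, 78.0, 90.0, 70.0, 88.0]
wm = [66.0, 60.0, 71.0, 58.0, 65.0]
t_value, df, d_z = paired_t_and_dz(fm, wm)
print(f"t({df}) = {t_value:.2f}, d_z = {d_z:.2f}")
```

Other conventions (for example, dividing by a pooled standard deviation) give different values of d for the same data, so results from this sketch need not match the effect sizes reported in the article.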

Results

Subjects needed to score at least 75 % correct in the training session, averaged over all conditions. This criterion was not strict, since 75 % correct corresponded to four objects in memory (and so six in IM, four in FM, and two in WM would do). Six out of the 34 subjects were excluded because they did not reach this performance threshold. In the rest of the article, the results will be based on the remaining 28 subjects. As can be observed in Fig. 2a, a large spread is apparent in the accuracy of FM (black diamonds) and WM (white squares). Two subjects scored poorly on the task, but we did not want to remove these participants as they had passed the training. On average, FM accuracy was 82.56 % (SEM = 2.56) and WM accuracy was 64.5 % (SEM = 1.75), a highly significant difference [t(27) = 12.39, p < .001, d = 1.83].
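The correspondence between 75 % correct and four objects can be reproduced with a common capacity estimate for cued change detection, K = N × (hit rate − false-alarm rate). The text does not name the estimator, so this formula is an assumption. With equal numbers of change and no-change trials, hit rate minus false-alarm rate equals 2 × accuracy − 1.

```python
def capacity_from_accuracy(prop_correct, set_size=8):
    """Estimate the number of stored items from proportion correct.

    Assumes K = N * (H - FA) with equal numbers of change and no-change
    trials, so that H - FA = 2 * accuracy - 1. The estimator is an
    assumption; the text does not state which one was used.
    """
    return set_size * (2.0 * prop_correct - 1.0)


print(capacity_from_accuracy(0.75))              # 4.0
print(round(capacity_from_accuracy(0.8256), 2))  # 5.21
print(round(capacity_from_accuracy(0.645), 2))   # 2.32
```

Under this assumption, the mean accuracies reported above would correspond to roughly five items for FM and a little over two items for WM.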

Fig. 2

a A considerable spread in FM and WM accuracy existed across participants when no interfering displays (IDs) were shown. b Memory accuracy was highest for the FM condition without IDs (in black), intermediate for the FM conditions when IDs were shown (intermediate bars), and lowest for the WM condition (in white). The depicted accuracies are means over all ID–cue SOAs together. c Memory accuracy for each ID type and each ID–cue SOA separately (colored to match bars). FM accuracy (black line) and WM accuracy (dotted black line) are shown as references. It is evident that complete overwriting of FM only takes place when the ID consists of similar objects presented at the same location. In addition, it seems that when the ID is presented simultaneously with the cue, readout of FM is diminished

FM is an object-based and location-specific storage

For clarity, we averaged all ID timings together, as is shown in Fig. 2b. The averaged results tell a clear story: When the ID consisted of either circles or rectangles, but was presented on the opposite side from the cue, FM was hardly affected. However, when the ID consisted of rectangles and was presented on the same side as the cue, FM was nearly completely erased [as can be seen by the lack of a significant difference with WM accuracy; t(27) = 1.05, p = .30, d = 0.12]. These results strongly support the claim that FM is both location- and object-specific. Only when a new scene occupies the same space and consists of the same objects is FM erased; otherwise it endures, nearly unimpaired.

This point is further evidenced by Fig. 2c, which depicts accuracy as a function of the time between cue and ID. When an ID erases FM, accuracy will only be affected when the cue appears after the ID, not when the cue appears before the ID. This pattern is only observed when the ID consists of the same objects at the same location [the lowest line; F(1, 27) = 14.86, p = .001]. However, in the other conditions (the other lines), no such pattern is observed. In these cases, no significant difference in accuracy emerges, regardless of whether the IDs are presented before or after appearance of the cue [highest F value, in same-object–other-quadrant condition: F(1, 27) = 2.16, p = .153]. This indicates that in these conditions, no overwriting takes place. Note that in the same-object–same-location condition, even when the cue appears before the ID, accuracy is not at FM levels. This is in line with previous studies that have found that within a time window of 300 ms, interfering visual input impairs use of the cue (Becker, Pashler, & Anstis, 2000; Landman, Spekreijse, & Lamme, 2003).

Discussion

The results of Experiment 1 point to a simple conclusion: Not all visual stimulation is equal. Some visual stimulation hardly affects FM, whereas other visual stimulation seems to erase it to the point at which it becomes indistinguishable from WM. That is, the results indicate that FM is WM plus something extra. With the appropriate visual stimulation, this something extra is completely erased, while much visual stimulation leaves this extra largely unaffected. Moreover, these results confirm that FM is not just a long-lasting form of IM. After all, basically any visual stimulation will erase IM (Sligte et al., 2008), while only very specific visual stimulation erases FM. Thus, it seems that we are looking at a three-layered storage: meaningless visual stimulation (or a sufficient passage of time) will peel off the IM layer, and leave only the FM and WM layers. Presenting a visual scene consisting of the same objects at the same location will peel off the FM layer, and leave only WM.

But before we draw strong conclusions, let us first consider alternative explanations. Perhaps FM is not erased with the appropriate visual stimulation, but certain visual stimulation is distracting, causing a reduction in performance. Specifically, it could be that, since the to-be-remembered items are rectangles, participants adopted an attentional set for rectangles, and therefore all interfering displays consisting of rectangles would distract and thus hamper performance.

We would posit two arguments against this notion. First, both Vandenbroucke et al. (2011) and Makovski (2012) found that attentional interference in general hardly impacts FM. Second, if subjects had adopted an attentional set for rectangles, all displays consisting of rectangles should have been distracting. However, when the interfering display consisted of rectangles appearing on the opposite side from the cue, performance was virtually unaffected.

Another consideration is that the results of Experiment 1 are not specific to FM, but might hold equally well for WM. Perhaps, for specific memory tasks, certain types of displays are just more interfering than others. In this case, our claim that this tells us something about how FM is erased would not be incorrect, but the findings would just point to a larger rule: Similar items in a similar location affect all types of visual memory in the same way. To test whether the interference found in Experiment 1 is specific to FM, we conducted Experiment 2.

Experiment 2: IDs affect FM but not WM

Method, subjects, and procedure

Everything was the same as in Experiment 1, except for the following changes.

Nine subjects participated, seven of them naïve as to the purpose of the experiment, and two of the subjects were authors of this article (I.G.S. and Y.P.); the reported findings are essentially the same when these two participants are excluded from the analyses. Four of the subjects were male and five were female. All were right-handed and had normal or corrected-to-normal vision (age range 21–37 years, average 30.67 years, SD 4.99).

The procedure was as follows (see Fig. 3): Only two memory conditions (FM and WM) and five ID conditions (the same as in Exp. 1: no ID, same objects–same location, same–different, different–same, and different–different) were presented. First, the memory display was presented for 250 ms. Then, if an ID appeared, it would do so 700 ms after offset of the memory display; the ID would appear for 100 ms. Then, in the FM condition, 1 s after offset of the memory display, a retro-cue would be presented for 100 ms. Finally, 2 s after offset of the memory display, the test display would appear and would be visible until response. In the WM condition (in which no retro-cue was presented), a postcue would appear for 100 ms, and the onset of the postcue would be simultaneous with the appearance of the test display. Note that we chose to have the same time between memory display, ID, and test display in both conditions. We deemed this to be essential, since otherwise any differential effects of IDs on FM and WM could be due to differences in the timings. This did have the consequence that the time between cue and memory display was longer in the WM condition than in the FM condition. This, then, may have influenced overall performance in both conditions. However, this was not crucial, since Experiment 2 was set up to investigate the role of IDs in both memory types, not the difference in overall performance.
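The timing of an Experiment 2 trial can be laid out as a list of events. The sketch below only restates the delays and durations given in the text, with onsets measured from memory-display onset; the function and constant names are hypothetical and are not taken from the original experiment code.

```python
MEMORY_DURATION_MS = 250
ID_DELAY_MS = 700      # interfering display, after memory-display offset
CUE_DELAY_MS = 1000    # retro-cue (FM only), after memory-display offset
TEST_DELAY_MS = 2000   # test display, after memory-display offset
BRIEF_EVENT_MS = 100   # duration of the ID and of both cue types


def trial_timeline(memory_type, with_id):
    """Return (event, onset_ms, duration_ms) tuples for one trial.

    Onsets are relative to memory-display onset; a duration of None
    means the display stays on until the response.
    """
    offset = MEMORY_DURATION_MS
    events = [("memory display", 0, MEMORY_DURATION_MS)]
    if with_id:
        events.append(("interfering display", offset + ID_DELAY_MS, BRIEF_EVENT_MS))
    if memory_type == "FM":
        events.append(("retro-cue", offset + CUE_DELAY_MS, BRIEF_EVENT_MS))
    test_onset = offset + TEST_DELAY_MS
    events.append(("test display", test_onset, None))
    if memory_type == "WM":
        events.append(("postcue", test_onset, BRIEF_EVENT_MS))
    return events


for memory_type in ("FM", "WM"):
    print(memory_type, trial_timeline(memory_type, with_id=True))
```

Laid out this way, the memory display, ID, and test display have identical onsets in both conditions, and only the cue moves (1,250 ms in FM vs. 2,250 ms in WM).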

Fig. 3

The experimental design of Experiment 2 was similar to that of Experiment 1. A memory display was presented for 250 ms. In the FM condition, a retro-cue was presented 1,000 ms after the offset of the memory display (for 100 ms), and the test display was presented 2,000 ms after the offset of the memory display (and remained visible until response). In the WM condition, no retro-cue was presented, but a postcue appeared together with the onset of the test display (the cue would also be presented for 100 ms). Furthermore, in both conditions an interfering display could appear 700 ms after the offset of the memory display (for 100 ms)

The experiment consisted of 12 blocks of 50 trials and took approximately 55 min. The first two blocks were practice blocks, and were not included in the analysis. Furthermore, before the experiment, participants underwent a 15-min training. The training task was the same as the actual experiment, except that no IDs were presented. Participants had to score 70 % or higher during this training to participate in the actual experiment. This procedure led to the exclusion of one participant (who scored at chance level).

Results and discussion

See Fig. 4 for an overview. We performed a two-way ANOVA, with ID Type and Memory Type as factors. This revealed a main effect for memory type [F(1, 8) = 15.5, MSE = .018, p < .005, η² = .66], as performance was higher in the FM condition than in the WM condition. Furthermore, a main effect of ID type was apparent [F(4, 32) = 11.1, MSE = .004, p < .001, η² = .58], as interfering displays hampered performance. Finally, we found a significant interaction [F(4, 32) = 6.2, MSE = .003, p = .001, η² = .44], as interfering displays hampered performance more in the FM than in the WM condition.
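The reported effect sizes can be cross-checked against the F values. The sketch below assumes that η² refers to partial η², which can be recovered from F and its degrees of freedom as F × df_effect / (F × df_effect + df_error); the text does not state this explicitly.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported eta squared) as given in the text.
reported = [
    ("memory type", 15.5, 1, 8, 0.66),
    ("ID type", 11.1, 4, 32, 0.58),
    ("interaction", 6.2, 4, 32, 0.44),
]

for label, f_value, df_effect, df_error, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

Under that assumption, the computed values agree with the three reported effect sizes to two decimals.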

Fig. 4

a Overall performance of each subject in the FM and WM conditions. b Average accuracy in the different conditions for both FM and WM. These results reveal that FM is severely hampered when similar objects appear at the same location, but is hardly affected in all other conditions. In the WM condition, none of the IDs significantly affected performance

Follow-up tests revealed no significant effect of IDs in the WM condition (F < 1, p > .55), but we did find a significant effect of IDs in the FM condition [F(4, 32) = 23.5, MSE = .003, p < .001, η² = .75]. Follow-up t tests revealed that this ID disturbance in the FM condition was entirely due to severely hampered performance in the same–same condition [when rectangles appeared on the same side as the cue; for the no-ID relative to the same–same condition, t(8) = 7.12, p < .001, d = 1.38; for all other comparisons between no-ID and other ID types, ts < 1.3, ps > .25].

Finally, to ensure that our analysis was not biased against IDs playing a role in the WM condition, we directly compared the no-ID to the same–same conditions for both FM and WM (ignoring all other conditions). This revealed a significant interaction between memory type and ID [F(1, 8) = 27.7, MSE = .002, p = .001, η² = .78], indicating that the same–same condition hampered performance more for FM than for WM. As we mentioned above, for FM, performance was significantly lower in the same–same than in the no-ID condition, but this was not the case for WM [t(8) = 1.25, p = .25, d = 0.47]. Our last t test revealed that for FM, in the same–same condition performance dropped to WM levels [t(8) = 1.33, p = .22, d = 0.45]. This essentially replicated the results of Experiment 1 (in which the same–same performance for FM also dropped to WM levels) and suggests that the differential cue timings did not have a big impact on WM performance (since the same–same condition essentially seemed to reduce FM to WM; if FM had an added cue-timing advantage in the setup of Experiment 2, FM same–same performance should have been higher than WM performance).

First, note that in Experiment 2 we essentially replicated the results of Experiment 1. As in Experiment 1, interfering displays hampered FM most when they consisted of similar objects at the same locations. In fact, when the conditions of location and object similarity were not met, performance was hardly affected. Note, furthermore, that as in Experiment 1, when the interfering display consisted of the same objects at the same location, performance dropped to WM levels.

Second, and the main point of Experiment 2, note that interfering displays had a significantly smaller effect on WM than on FM. Of course, this does not mean that interference can never have any effect on WM performance. Our sample size was fairly small, so it seems reasonable that with a larger pool of subjects, WM disturbances could be found. However, the main point was not whether WM could be affected by visual (or other) interference. The crux of the Experiment 2 results was that FM and WM were differentially affected by visual interference. That is, FM is more disturbed by specific types of visual interference than is WM.

Consequently, we can conclude that the results of Experiment 1 were not due to a general rule regarding (visual) memory, but that these results reveal a specific aspect of FM.

General discussion

The goals of the present study were to unveil the functional underpinnings of FM and to compare the properties of fragile memory to those of more traditional working memory. To investigate this, we presented interfering displays with objects that were similar (rectangles) or dissimilar (circles) to the items stored in FM, and these objects were presented either in the same hemifield that was probed in the FM representation or in the opposing hemifield. We found that when the interfering display consisted of objects other than the to-be-recalled objects, FM was hardly affected. Moreover, the impairment that was observed seemed to be due to interference unrelated to overwriting (since performance was not lower when the cue appeared after, rather than before, the ID). When a new visual scene did consist of similar objects, it seemed that FM was only erased when these objects appeared at the same location as the to-be-recalled objects. When objects appeared at another location, only a slight impairment was observed, and again, the impairment seemed not to be due to FM erasure.

The first point to be made is that the present study suggests that the erasure of FM relates to different mechanisms than the erasure of either IM or WM. IM can be erased by the passage of very little time (<0.5 s; Sperling, 1960) or by meaningless visual stimulation (Sligte et al., 2008). However, neither of these erasure options applies to FM. Moreover, WM seems to sit at the other end of the continuum, with visual stimulation hardly impacting WM at all (as we showed in Exp. 2).

Another important finding is that FM, despite its name, is only somewhat fragile; only very specific types of visual stimulation cause interference. This may explain why Makovski (2012) did not find complete erasure of FM after presenting visual interference. His visual-interference stimuli were perhaps too dissimilar to the items in the memory display to completely erase FM. At any rate, this provokes an interesting follow-up question for future research: How similar do the objects (and the locations) have to be to cause erasure of FM?

In a similar vein, the present findings could yield a practical application, given that FM appears to be a richer store than WM. Apparently, in most everyday situations, we are not able to tap into this particular memory store. On the basis of the present results, memory stimuli could potentially be designed to avoid erasure, under the appropriate circumstances. For example, situating charts at different locations, or making them dissimilar enough in appearance might make it easier to remember information. Such a strategy could be helpful in situations in which large amounts of information have to be evaluated and integrated, for instance in financial or medical settings.

Also noteworthy is that the present results suggest that objects are bound to locations in FM. That is, FM does not store the objects as free-floating features, since if it did, it should not matter (with regard to erasure) where in the visual display the objects are presented. Since FM capacity exceeds the capacity normally associated with selective attention (FM capacity = ±15 items: Sligte et al., 2008; attention capacity < 4 items: Conway, Cowan, & Bunting, 2001; Cowan, Elliott, Saults, Morey, Mattox, Hismjatullina, & Conway, 2005), it seems reasonable that FM is not dependent on selective attention (cf. Vandenbroucke et al., 2011). This, then, suggests that object–location binding does not depend on selective attention (since it seems to occur in FM), contrary to some widely held notions (Treisman, 1998; Treisman & Gelade, 1980; Wolfe, 1994), and in accordance with Landman et al. (2003).

Finally, the present findings suggest that FM is located in the intermediate visual areas of the brain. The evidence for this assertion is that in early visual areas, stimulus representations are not object-specific (see Konen & Kastner, 2008, who found that object specificity starts to occur from V4 onward), whereas later visual areas are not location-specific (Desimone & Gross, 1979; Desimone, Schein, Moran, & Ungerleider, 1985; Gross, Rocha-Miranda, & Bender, 1972). This suggestion is in accordance with neural studies into FM (Sligte, Scholte, & Lamme, 2009).

Summarizing, the present results indicate that FM is both object- and location-specific. FM is not erased by all visual stimulation, but only when similar objects appear at the same location as the to-be-remembered objects. This means that, when it comes to robustness, FM lies between IM and WM: more robust than IM, but less robust than WM. This finding could have practical consequences, as previously described, such that people can tap into this rich form of visual memory, rather than having to rely on WM.