Visual short-term memory capacity predicts the “bandwidth” of visual long-term memory encoding

Fukuda, Keisuke; Vogel, Edward K.

doi:10.3758/s13421-019-00954-0

Open access
Published: 24 June 2019

Volume 47, pages 1481–1497, (2019)
Memory & Cognition

Keisuke Fukuda¹ &
Edward K. Vogel²

Abstract

We are capable of storing a virtually infinite amount of visual information in visual long-term memory (VLTM) storage. At the same time, the amount of visual information we can encode and maintain in visual short-term memory (VSTM) at a given time is severely limited. How do these two memory systems interact to accumulate a vast amount of VLTM? In this series of experiments, we exploited interindividual and intraindividual differences in VSTM capacity to examine the direct involvement of VSTM in determining the encoding rate (or “bandwidth”) of VLTM. Here, we found that the amount of visual information encoded into VSTM at a given moment (i.e., VSTM capacity), but neither the maintenance duration nor the test process, predicts the effective encoding “bandwidth” of VLTM.

Although visual long-term memory (VLTM) has large enough capacity to store a virtually infinite amount of visual information (Brady, Konkle, Alvarez, & Oliva, 2008; Standing, 1973), not every piece of information that we wish to remember is encoded. What limits our access to this unlimited memory storage? According to Atkinson and Shiffrin’s influential modal model of memory (Atkinson & Shiffrin, 1968, 1971; Rundus & Atkinson, 1970; Shiffrin & Atkinson, 1969), information is first encoded into short-term memory (STM), in which it is actively maintained, and this active maintenance is what grants access to long-term memory (LTM) storage. Despite its elegant simplicity, this model was later criticized, particularly for the proposed role of STM maintenance in LTM encoding (Craik & Watkins, 1973; Naveh-Benjamin & Jonides, 1984). This criticism led to the discovery of the importance of the nature of the encoding processes that information undergoes (Craik, 1983; Craik & Lockhart, 1972; Craik & Tulving, 1975; Craik & Watkins, 1973; Fisher & Craik, 1977; Moscovitch & Craik, 1976), and the characterization of the interaction between encoding and retrieval processes became a central theme of LTM research. As a result, one aspect of LTM encoding proposed in the modal model, namely its capacity limitation, has received little attention to date. That is, is there a capacity limitation in the amount of information encoded into LTM at a given time? If so, is this initial encoding bottleneck analogous to STM capacity? Recent studies that examined the limit of visual memory encoding had participants remember one object at a time, and therefore their results do not directly inform us about the existence of such an encoding bottleneck (e.g., Brady et al., 2008; Endress & Potter, 2014).
In order to fully characterize the mechanism of VLTM encoding, it is important to examine whether there exists a capacity-limited encoding bottleneck by directly manipulating the amount of information that needs to be encoded into VLTM at the same time. Here, by manipulating the number and the quality of visual information that needs to be encoded into VLTM simultaneously, we found that VSTM capacity predicts the “bandwidth” of VLTM encoding due to a shared encoding bottleneck.

Individual difference approach to examine the influence of VSTM capacity on encoding of VLTM

VSTM allows us to actively represent a limited amount of visual information in mind at a given time (Cowan, 2001; K. Fukuda, Awh, & Vogel, 2010a; Luck & Vogel, 2013). Although it is currently under debate as to how we should best characterize this capacity limitation (K. C. S. Adam, Vogel, & Awh, 2017; Bays & Husain, 2008; Fougnie, Asplund, & Marois, 2010; Luck & Vogel, 1997; Ma, Husain, & Bays, 2014; Rouder et al., 2008; van den Berg & Ma, 2018; Wilken & Ma, 2004; Zhang & Luck, 2008), researchers agree that, at a given moment, individuals on average can represent three to four simple objects worth of information in VSTM in a precise enough format to inform their decisions on what they remember. In other words, when individuals are presented with a visual display that contains more than three to four simple objects worth of information to remember, their VSTM representation of some parts of the display becomes incomplete or too imprecise to make accurate judgments about what they remember.

Furthermore, individuals reliably differ in their VSTM capacity (K. C. Adam, Mance, Fukuda, & Vogel, 2015; Awh, Barton, & Vogel, 2007; Cowan et al., 2005; K. Fukuda, Vogel, Mayr, & Awh, 2010b; K. Fukuda, Woodman, & Vogel, 2015; Shipstead, Harrison, & Engle, 2015); some individuals can represent four or more objects worth of information, while others can represent only two or fewer objects worth of information in a precise enough format to inform their decisions about what they remember. Here, we used these reliable individual differences in VSTM capacity to our advantage to test whether VSTM capacity determines the “bandwidth” of VLTM encoding. If VSTM capacity determines the amount of information successfully encoded into VLTM, individuals with high capacity should encode more items at a given time than those with lower capacity. Critically, this relationship should emerge only when the amount of visual information to encode saturates their VSTM (e.g., above Set Size 3 or 4).^{Footnote 1}

Experiments 1a and 1b: VSTM capacity predicts object VLTM encoding when VSTM is saturated

In Experiments 1a and 1b, we focused on the encoding of a relatively simple form of VLTM, namely object VLTM. After measuring individuals’ VSTM capacity, we had participants encode a varying number of pictures of real objects at a time. Subsequently, participants’ VLTM for the encoded pictures was assessed. If VSTM capacity determines the bandwidth of VLTM encoding, we should expect individual differences in VSTM capacity to predict VLTM performance only when the encoding set size saturates individuals’ VSTM capacity. To examine the effect of encoding intention on VLTM encoding, we ran the same experiment in both an incidental learning condition (i.e., individuals were unaware of the object recognition task; Experiment 1a) and an intentional learning condition (i.e., individuals were informed about the object recognition task prior to the object encoding task; Experiment 1b).

Method

Participants

After signing the consent form approved by the Institutional Review Board, 55 students at the University of Oregon (28 for Experiment 1a and 27 for Experiment 1b) with normal (or corrected-to-normal) vision participated for introductory psychology course credit.

Power calculation

In order to test our key prediction about the effect of VSTM capacity on VLTM encoding, we conducted a repeated-measures ANOVA, with one within-subjects factor of set size and one between-subjects factor of intention for learning. Anticipating that we would obtain a moderate effect size (i.e., f = 0.25; J. Cohen, 1988) of set size, the a priori power calculation, with an alpha level of 0.05, statistical power of 0.8, and correlation coefficients of 0.6 among the repeated measures, indicated that we would need 24 subjects (Faul, Erdfelder, Lang, & Buchner, 2007). This assures that our sample size was sufficient to detect a moderate-size effect with 0.8 statistical power.

As for the correlational analyses, we predicted that there would be a strong correlation (r = .6) between individuals’ VSTM capacity and VLTM performance for the objects presented in the supracapacity set size (i.e., Set Size 6). This is because of the causal role that we hypothesized VSTM capacity plays in VLTM encoding. Based on this assumption, we would have needed 19 participants to reliably observe the result with statistical power of 0.8. This assures that our sample size was sufficient to observe the targeted effect.
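
As a rough illustration of this kind of calculation, the sample size needed to detect a correlation can be approximated with the Fisher z transformation. The sketch below is an assumption-laden approximation, not the procedure used in the study: the function name `n_for_correlation` is hypothetical, a two-tailed test is assumed, and the normal approximation need not reproduce the exact figure reported above.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation with a two-tailed test (an assumption;
    the source does not state the number of tails or the exact method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    effect = math.atanh(r)                         # Fisher z of the expected r
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


if __name__ == "__main__":
    # Expected correlation of .6 with alpha = .05 and power = .8
    print(n_for_correlation(0.6))
```

Exact routines based on the sampling distribution of r can return a slightly different N than this approximation.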

For the comparison of the correlational strengths for individuals’ VSTM capacity and VLTM performance across set sizes, we were not able to estimate the sufficient sample size to reliably observe the results for our within-subjects design. However, the power for detecting the difference in correlation strengths increases when two correlations share one variable in common and the correlation between the other variables is available (Steiger, 1980). That is exactly how our experiments were designed.

Bayes factor analysis

In addition, to better assess the statistical significance and nonsignificance of our results, we used JASP software (JASP Team, 2019) and calculated Bayes factors using the default parameter setting (Cauchy prior centered on zero with a scale = 0.707). BF₁₀ denotes the odds ratio favoring the alternative hypothesis over the null hypothesis, and BF₀₁ denotes the odds ratio favoring the null hypothesis over the alternative hypothesis.

Stimuli and procedure

Color change detection task

A standard color change detection task was administered first to measure individuals’ VSTM capacity (see Fig. 1). In this task, either four or eight colored squares (1.15° × 1.15°) were presented for 150 ms on the screen with a gray background (memory array), and individuals were instructed to remember as many of them as possible over a 900-ms retention interval during which the screen remained blank. Then, one colored square was presented at one of the original locations in the memory array (test array), and participants judged if it was the same colored square as the original square presented at that location with a button press (“Z” if they thought it was the same, and “/” if different). The test array remained on the screen until their response. The change frequency was 50% to make sure that any response bias would neither benefit nor penalize their performance. The colors of the memory array were randomly selected from a highly discriminable set of nine colors (red, green, blue, yellow, magenta, cyan, orange, black, and white) without replacement. Participants performed 60 trials each for Set Sizes 4 and 8 conditions in a pseudorandom order.
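
The trial structure described above can be sketched as a simple list generator. This is a minimal illustration under stated assumptions: the names `make_trial` and `make_block` are hypothetical, stimulus locations and timing are omitted, and the rule for choosing the changed color (any of the nine colors other than the original at the probed location) is an assumption, since the text does not specify it.

```python
import random

COLORS = ["red", "green", "blue", "yellow", "magenta",
          "cyan", "orange", "black", "white"]


def make_trial(set_size, change, rng):
    """Build one trial: a memory array plus a single-probe test item."""
    # Colors are sampled without replacement, as described in the text.
    memory_colors = rng.sample(COLORS, set_size)
    probe_index = rng.randrange(set_size)
    original = memory_colors[probe_index]
    if change:
        # Assumption: the new color is any color other than the original.
        probe_color = rng.choice([c for c in COLORS if c != original])
    else:
        probe_color = original
    return {"set_size": set_size, "memory_colors": memory_colors,
            "probe_index": probe_index, "probe_color": probe_color,
            "change": change}


def make_block(n_per_set_size=60, set_sizes=(4, 8), seed=0):
    """Return a shuffled block with 50% change trials at each set size."""
    rng = random.Random(seed)
    trials = []
    for set_size in set_sizes:
        half = n_per_set_size // 2
        for change in [True] * half + [False] * half:
            trials.append(make_trial(set_size, change, rng))
    rng.shuffle(trials)  # pseudorandom trial order
    return trials
```

Fixing the change frequency at exactly half within each set size, rather than drawing it per trial, keeps the design balanced in every block.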

Object encoding task

Then, participants performed the object encoding task (see Fig. 2). This task was identical to the color change detection task except for two modifications. First, the stimuli presented were pictures of real objects (mean radius = 4.9°) borrowed from the study by Brady and colleagues (2008), and second, the tested set sizes were two, four, and six. Pictures were selected from a set of 2,400 different pictures without replacement so that none of the pictures that appeared in the memory arrays was presented more than once during the encoding task. Participants performed 40 trials each for Set Sizes 2, 4, and 6 in a pseudorandom order.

Object recognition task

Following the encoding phase, participants performed the object recognition task (see Fig. 2). In this task, participants were presented with one picture of a real object (mean radius = 4.9°), and they were asked to judge, with a button press, whether it was a picture that had been presented anytime, anywhere during the encoding phase (“O” for “Old” or studied, and “N” for “New” or never seen). The picture stayed on the screen until their response. Forty previously presented (old) pictures for each set size and 120 new pictures were tested in a pseudorandom order. Of note, a picture that was tested during the object encoding task was never tested in this task.

Results

Color change detection task

First of all, individuals’ performance on the color change detection task was converted to a VSTM capacity estimate for each set size (K4 for Set Size 4 and K8 for Set Size 8) using a standard formula (Cowan, 2001). K4 and K8 were averaged to compute a single metric for individuals’ VSTM capacity estimates (Kcolor). The mean Kcolor score was 2.6 (SD = 0.87) and 2.7 (SD = 0.71) for Experiments 1a and 1b, respectively. For demonstrative purposes, individuals were divided by a median split into high K (mean K = 3.4, SD = 0.52 for Experiment 1a, and mean K = 3.3, SD = 0.44 for Experiment 1b) and low K (mean K = 1.9, SE = 0.47 for Experiment 1a, and mean K = 2.1, SE = 0.40 for Experiment 1b) groups (see Fig. 3).
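
For readers unfamiliar with the capacity estimate, Cowan's K for single-probe change detection is commonly written as K = N × (hit rate − false alarm rate), where N is the set size. The sketch below assumes that form of the formula; the function names and the example rates are hypothetical and are not data from the study.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K for single-probe change detection.

    hit_rate: proportion of change trials correctly reported as "different".
    false_alarm_rate: proportion of no-change trials reported as "different".
    """
    return set_size * (hit_rate - false_alarm_rate)


def k_color(hit4, fa4, hit8, fa8):
    """Average the Set Size 4 and Set Size 8 estimates into a single score."""
    k4 = cowan_k(4, hit4, fa4)
    k8 = cowan_k(8, hit8, fa8)
    return (k4 + k8) / 2


if __name__ == "__main__":
    # Hypothetical rates for one participant (not taken from the study).
    print(k_color(hit4=0.90, fa4=0.10, hit8=0.60, fa8=0.20))
```

Because K is bounded by the set size, averaging across a set size near capacity and one well above it gives a more stable estimate than either condition alone.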

Object encoding task

In the object encoding task, the change detection accuracy for each set size was converted to a capacity estimate. The capacity estimate for each set size was K2 = 1.7 (SD = 0.26), K4 = 2.0 (SD = 0.94), and K6 = 2.2 (SD = 0.92) for Experiment 1a, and K2 = 1.6 (SD = 0.21), K4 = 2.1 (SD = 0.50), and K6 = 1.7 (SD = 1.01) for Experiment 1b (see Fig. 3). The results were analyzed by a repeated-measures ANOVA with two factors (learning intention, set size). As expected, there was a significant set size effect, F(2, 106) = 5.85, p < .01, η_p² = 0.1, BF₁₀ = 3.53. In other words, the K estimates increased from Set Size 2 to Set Size 4 and stopped increasing thereafter (as supported by marginally significant linear, F(1, 53) = 3.4, p = .07, η_p² = 0.06, and significant quadratic, F(1, 51) = 8.9, p < .01, η_p² = 0.14, effects). There was no main effect of learning intention, F(1, 53) = 1.1, ns, BF₀₁ = 2.71. Furthermore, we examined the correlation between individuals’ VSTM capacity estimated from the color change detection task and from the object encoding task. Here, we found that there was a significant positive correlation between the two estimates for both Experiment 1a (r = .65, p < .01) and Experiment 1b (r = .49, p < .01). This suggests that VSTM capacity estimated by the canonical color change detection task predicted the number of object representations that participants encoded into their VSTM.

Object recognition task

Here, we measured individuals’ corrected recognition performance (Pr = hit rate − false alarm rate) for each set size. The hit rate was calculated as the proportion of correct responses for old trials, and the false alarm rate was calculated as the proportion of incorrect responses for new trials. The results were first analyzed by a repeated-measures ANOVA with two factors (learning intention, set size; see Fig. 3). First of all, there was a strong set size effect, F(2, 106) = 22.0, p < .001, η_p² = 0.29, BF₁₀ = 8.60×10⁵. In other words, Pr scores decreased from Set Size 2 to Set Size 4 and stopped decreasing thereafter (as supported by both significant linear, F(1, 53) = 37.5, p < .001, η_p² = 0.41, and quadratic, F(1, 53) = 5.6, p < .05, η_p² = 0.1, effects). There was no main effect of learning intention, F(1, 53) = 2.61, ns, BF₀₁ = 1.26. Importantly, this set-size-dependent reduction in the recognition accuracy does not necessarily mean that participants encoded a smaller number of visual objects from the display as the set size increased. That is, even if one can remember two objects regardless of the set size, the likelihood that the encoded objects get tested in the recognition test decreases as the set size increases beyond two. Rather, our results demonstrate that there is a capacity limit in how much information we can encode into VLTM at a given time.
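
The corrected recognition score and the sampling argument above can be made concrete with a short sketch. The function names and example counts are hypothetical; the second function assumes a fixed number of encoded items per display and a randomly chosen tested item, and it ignores guessing.

```python
def corrected_recognition(hits, n_old, false_alarms, n_new):
    """Pr = hit rate - false alarm rate."""
    hit_rate = hits / n_old                  # "old" responses to old pictures
    false_alarm_rate = false_alarms / n_new  # "old" responses to new pictures
    return hit_rate - false_alarm_rate


def chance_tested_item_was_encoded(items_encoded, set_size):
    """Probability that a randomly tested item is among the encoded ones.

    Assumes a fixed number of items is encoded from every display
    (a simplification used only to illustrate the argument).
    """
    return min(items_encoded, set_size) / set_size


if __name__ == "__main__":
    # Hypothetical counts: 40 old pictures for one set size, 120 new pictures.
    print(corrected_recognition(hits=30, n_old=40, false_alarms=24, n_new=120))
    # An observer who always encodes two objects, at Set Sizes 2, 4, and 6.
    for set_size in (2, 4, 6):
        print(set_size, chance_tested_item_was_encoded(2, set_size))
```

Under these assumptions, an observer with a fixed two-item limit would show declining recognition accuracy with set size even though the number of encoded objects never changes.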

Next, we examined the correlations between individuals’ VSTM capacity and corrected recognition performance. Here, we found that although the correlations between the capacity estimate and the recognition performance were not significant for the subcapacity set size (r = .19, ns, and r = .20, ns, for Set Size 2 in Experiments 1a and 1b, respectively), they became stronger as set size surpassed their VSTM capacity (r = .47, p < .01, and r = .25, ns, for Set Size 4 in Experiments 1a and 1b, respectively; r = .67, p < .01, and r = .43, p < .05, for Set Size 6 in Experiments 1a and 1b, respectively). Critically, Steiger’s (1980) Z test revealed that the difference in the strength of the correlation with VSTM capacity for the subcapacity set size (i.e., Set Size 2) and the supracapacity set size (i.e., Set Size 6) was statistically reliable for both experiments (Z = 2.74, p < .01, for Experiment 1a, with the correlation between VLTM for Set Size 2 and Set Size 6 = 0.42; Z = 2.17, p < .05, for Experiment 1b, with the correlation between VLTM for Set Size 2 and Set Size 6 = 0.69). Thus, these results confirmed our hypothesis that VSTM capacity predicts the “bandwidth” of VLTM encoding.
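
To illustrate how two dependent correlations that share one variable can be compared, the sketch below implements one common form of Steiger's (1980) test, based on Fisher-transformed correlations and a pooled correlation estimate. Which variant and software the authors used is not stated, so this is an assumption; the function names are hypothetical, and with the rounded correlations from the text the output need not equal the reported Z values.

```python
import math
from statistics import NormalDist


def steiger_z(r_jk, r_jh, r_kh, n):
    """Compare correlations r_jk and r_jh that share variable j.

    r_kh is the correlation between the two non-shared variables and
    n is the sample size. Returns a Z statistic.
    """
    z_jk = math.atanh(r_jk)  # Fisher z transforms
    z_jh = math.atanh(r_jh)
    r_bar = (r_jk + r_jh) / 2  # pooled correlation estimate
    # Covariance term between the two Fisher-transformed correlations.
    psi = (r_kh * (1 - 2 * r_bar ** 2)
           - 0.5 * r_bar ** 2 * (1 - 2 * r_bar ** 2 - r_kh ** 2))
    cov = psi / (1 - r_bar ** 2) ** 2
    return (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)


def two_tailed_p(z):
    """Two-tailed p value from a standard normal Z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Rounded Experiment 1a values from the text: j = VSTM capacity,
    # k = VLTM at Set Size 6, h = VLTM at Set Size 2, n = 28.
    z = steiger_z(r_jk=0.67, r_jh=0.19, r_kh=0.42, n=28)
    print(z, two_tailed_p(z))
```

The test gains power when the two non-shared variables are themselves positively correlated, because that shared variance shrinks the standard error of the difference.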

Discussion

In Experiments 1a and 1b, we confirmed the first and most basic corollary of our “bandwidth” account of VLTM encoding, irrespective of the intention for learning. That is, VSTM capacity predicts the amount of VLTM encoded from a display only when the displayed information exceeds individuals’ VSTM capacity. This specificity is critical because it rules out the possibility that high-capacity individuals were better at any memory task simply because they tend to be more motivated.

Experiments 2a and 2b: VSTM capacity predicts relational VLTM encoding when VSTM is saturated

To extend our finding to an arguably different type of VLTM, we investigated the role of VSTM capacity in creating relational VLTM. Relational memory refers to the memory of interrelations among multiple memory representations, and researchers have argued that it has a specific reliance on the hippocampus and related medial temporal lobe regions (N. J. Cohen, Poldrack, & Eichenbaum, 1997; Davachi, 2006; Davachi & Wagner, 2002; Hannula & Ranganath, 2008; Kumaran & Maguire, 2005; Prince, Daselaar, & Cabeza, 2005; Squire, 1992). The experimental design was very similar to that of Experiment 1. After the measurement of VSTM capacity, participants proceeded to the relational encoding task followed by a VLTM recognition test. Here, we chose arrays of colored squares as the relational stimuli because, unlike pictures of real objects, the arrays are nearly identical in terms of their components (i.e., a selection of squares from nine possible colors) and differ only in the relative positions of the squares (i.e., where is the red square in relation to the blue square?). Therefore, to perform well on the later recognition test, it is critical to have encoded the relational information of the squares. If VSTM capacity also determines the “bandwidth” of VLTM encoding for relational information, individuals’ VSTM capacity should be positively correlated with the VLTM recognition performance only when their VSTM capacity is saturated during encoding (i.e., Set Size 8). As in the previous experiments, we ran two versions of the same study to test both incidental (Experiment 2a) and intentional (Experiment 2b) learning.