Retrieval mode distinguishes the testing effect from the generation effect
Introduction
A venerable way to study the nature of retrieval processes is to examine the effect of one retrieval on another. Experiments in which subjects repeatedly retrieve and reconstruct the past have a long history in memory research (e.g., Ballard, 1913, Bartlett, 1932, Brown, 1923, Gates, 1917, Tulving, 1967; see too Payne, 1987). Recently the effects of repeatedly testing memory have captured the attention of contemporary researchers who are interested in educational applications of retrieval practice (for overviews see McDaniel et al., 2007, Metcalfe and Kornell, 2007, Pashler et al., 2007, Roediger and Karpicke, 2006a). This renewed interest has also led to new examinations of the nature of the mnemonic effects of retrieval, which is the focus of this paper.
On the surface the testing effect seems to share many similarities with the generation effect (Jacoby, 1978, Slamecka and Graf, 1978). Consider the typical design of experiments that examine the effects of either retrieval practice (testing) or generation on subsequent retention. In most retrieval practice experiments, subjects study materials like word lists or text passages and take an initial test. The initial test often involves recall, but recognition and multiple-choice tests have also been used. The effect of initial retrieval is assessed on a later criterial test, which may be in the same format as the initial test or a different one and may occur relatively immediately or after a longer retention interval. The key finding is that practicing retrieval on the initial test enhances performance on the criterial test relative to a control condition in which subjects repeatedly study the materials, so that nominal exposure time is equated across the two conditions (for review see Roediger & Karpicke, 2006a).
Now consider the design of a prototypical generation effect experiment. Here subjects are induced to generate items during the initial learning phase. This may be accomplished in a variety of ways: subjects might be asked to complete a fragment of a target word (fr_ _nd), to produce the target when given an antonym as a cue (generate friend when given enemy), or to unscramble an anagram to form the target word (generate friend when given fndrie). The effect of generation is typically assessed on a criterial test of free recall or recognition. The key finding is that generation often enhances performance on the criterial test relative to a control condition where subjects read words intact (Jacoby, 1978, Slamecka and Graf, 1978; for review see Bertsch, Pesta, Wiscott, & McDaniel, 2007). There are important boundary conditions to the generation effect, and we will discuss them shortly.
The question we asked in this research was this: Are there any meaningful differences between the testing effect and the generation effect? Many authors continue to lump the two effects together, and with good reason: there is currently no well-developed empirical or theoretical basis for distinguishing them (cf. Carrier & Pashler, 1992). In addition, a number of researchers have espoused the benefits of retrieval practice for student learning (Karpicke and Roediger, 2008, McDaniel et al., 2007, Metcalfe and Kornell, 2007, Pashler et al., 2007), but this recommendation has not yet garnered wide support in education. In contrast, the effectiveness of generative learning activities has been largely embraced in educational circles (see Mayer, 2008; see too Chi, 2000, King, 1994, Wittrock, 1974, Wittrock, 1989). We point this out because any distinction between generation and retrieval practice would have not only theoretical implications but also practical implications for learning in educational contexts.
Two observations motivated this research. The first observation is that the instructions given to subjects in generation effect experiments differ from the instructions in retrieval practice experiments. In most generation effect experiments the subjects are instructed to generate target items and can rely on any strategy that might accomplish this task. Many of the tasks used to induce generation (like completing a word fragment, or producing a word that is conceptually related to a cue, or unscrambling an anagram) are similar to implicit memory tests because these generation tasks do not involve intentional retrieval. Subjects are instructed to produce target items but are not required to think back to a prior episode or experience (Graf and Schacter, 1985, Roediger and McDermott, 1993, Schacter, 1987). In contrast, in experiments that examine retrieval practice the subjects are instructed to retrieve items that occurred in a study episode. Most retrieval practice tasks involve intentional retrieval—subjects are instructed to reconstruct knowledge about a study event that occurred in a particular place at a particular time (retrieve the words from a particular list or recall ideas from a particular text). Recall tests likely involve generation but generation does not necessarily involve recovering the spatiotemporal context in which an event occurred.
The difference between generation and retrieval instructions essentially parallels the distinction between incidental and intentional retrieval and constitutes a difference in what Tulving (1983) called retrieval mode. Subjects are thought to be in an episodic retrieval mode when told to think back to the past, as they are on explicit memory tests. Moreover, subjects are thought to process retrieval cues differently in this cognitive state than they do under incidental retrieval conditions, where subjects do not consciously think back to the past (as on implicit memory tests; see Graf and Schacter, 1985, Roediger and Blaxton, 1987). It is possible to hold all conditions and test cues constant and manipulate only retrieval mode by varying the instructions given to subjects (cf. the retrieval intentionality criterion; see Schacter, Bowers, & Booker, 1989; see too Roediger, Weldon, Stadler, & Riegler, 1992). In the experiments reported here we examined whether the retrieval mode engaged in by subjects—intentional vs. incidental retrieval—would differentiate the testing effect from the generation effect.
The second observation that motivated this research is that generation effects are sensitive to aspects of experimental design that do not affect the testing effect. Specifically, when the final criterial test involves free recall, generation effects are often found in mixed-list (within-subject) designs but not in pure-list (often between-subject) designs (see Begg and Snider, 1987, Hirshman and Bjork, 1988, Schmidt and Cherry, 1989, Slamecka and Katsaiti, 1987). In contrast, testing effects are found in both within- and between-subject experiments when the final test involves free recall. For example, Carpenter, Pashler, and Vul (2006) and Roediger and Karpicke (2006b) obtained testing effects with both within- and between-subjects designs and with free recall as the criterial measure.
With respect to the generation effect, one explanation of the moderating influence of list composition is the item-order account first proposed by Nairne and colleagues (Nairne, Riegler, & Serra, 1991; see McDaniel & Bugg, 2008). This account is conceptually similar to other tradeoff or multifactor accounts of the generation effect, though differences between the accounts have been discussed elsewhere (see Hirshman and Bjork, 1988, McDaniel et al., 1990, McDaniel et al., 1988, Mulligan and Lozito, 2004). The idea behind the item-order account is that subjects encode attributes or features pertaining to the individual items in a list and to the order in which the items occurred. On a free recall test subjects use order information as a structure to guide retrieval of target candidates and use item information to discriminate which items actually occurred in a prior study episode (for elaboration of these ideas see Crowder, 1979, Hunt and Einstein, 1981, Hunt and McDaniel, 1993, Mandler, 1969, Nairne, 2006, Postman, 1972, Underwood, 1969). When subjects are required to generate items during learning this enhances the processing of item-specific features but disrupts the encoding of order information (Nairne et al., 1991). Therefore in mixed lists that contain both generated and read (intact) items, the generated items benefit from enhanced item processing and both types of item suffer from disrupted order processing. The result is a generation effect—better free recall of generated items than read items in mixed-list designs (Serra & Nairne, 1993; see too Gardiner and Arthurs, 1982, Slamecka and Graf, 1978, Slamecka and Katsaiti, 1987).
But the story is different in pure-list designs. A pure list of generated items benefits from enhanced item processing but suffers from disrupted order processing. In contrast, a pure list of read items does not benefit from enhanced item processing but also does not suffer from disrupted order processing. Thus there is often no difference in free recall of read vs. generated lists, and in fact sometimes there is an advantage of reading over generating (e.g. Nairne et al., 1991, Schmidt and Cherry, 1989). The enhanced item processing that occurs in a pure list of generated items is not sufficient to counteract disrupted order processing and produce an advantage in free recall relative to a pure list of read items.
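To make the tradeoff concrete, here is a minimal toy version of the item-order account in Python. It assumes, purely for illustration, that recall probability is additive: a baseline, plus an item-specific boost for generated items, plus an order bonus that every item in a list receives only when the list contains no generated items. The parameter values in the hypothetical `ToyParams` class are made up; they are not estimates from Nairne et al. (1991) or from the present experiments.

```python
from dataclasses import dataclass


# Hypothetical parameters chosen only to illustrate the item-order tradeoff.
@dataclass(frozen=True)
class ToyParams:
    base: float = 0.30         # recall probability with neither benefit
    item_boost: float = 0.10   # gain from enhanced item-specific processing
    order_bonus: float = 0.12  # gain from intact order encoding


def predicted_recall(generated: bool, list_has_generation: bool, p: ToyParams) -> float:
    """Toy additive item-order model for one item's free-recall probability.

    Generating boosts item-specific processing for that item only, but any
    generation in a list disrupts order encoding for every item in the list.
    """
    item = p.item_boost if generated else 0.0
    order = 0.0 if list_has_generation else p.order_bonus
    return p.base + item + order


def design_effects(p: ToyParams = ToyParams()) -> dict[str, float]:
    """Generated minus read recall in mixed-list and pure-list designs."""
    # Mixed list: both item types share the disrupted order encoding.
    mixed = predicted_recall(True, True, p) - predicted_recall(False, True, p)
    # Pure lists: the read list keeps its order bonus.
    pure = predicted_recall(True, True, p) - predicted_recall(False, False, p)
    return {"mixed_list": mixed, "pure_list": pure}
```

With these made-up values the mixed list shows a generation effect of about .10, while the pure-list comparison favors reading by about .02, because the order bonus available to a pure read list exceeds the item boost. In this toy model the pure-list effect is simply `item_boost - order_bonus`, so it predicts no generation advantage in pure lists whenever `order_bonus` is at least as large as `item_boost`.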
Generation effects are clearly sensitive to experimental design but retrieval practice effects do not appear to depend on this factor. Several prior studies have shown that practicing retrieval produces greater retention than repeated study in pure-list, between-subject designs that employ final free recall (see Carpenter, 2009, Carpenter and DeLosh, 2006, Hogan and Kintsch, 1971, Karpicke and Roediger, 2007, Roediger and Karpicke, 2006b, Thompson et al., 1978, Wheeler et al., 2003). The effects of list composition have not been examined as rigorously in the testing effect literature as they have been in the generation effect literature but a few studies have addressed the issue directly. Namely, Carpenter et al. (2006) examined both pure- and mixed-lists and showed positive testing effects on final free recall with both types of design (see too Carpenter, 2009).
In sum, we have two reasons to suspect there may be important differences between engaging in generation and practicing retrieval. First, generation conditions often involve incidental retrieval while retrieval practice conditions involve intentional retrieval. When subjects practice retrieval they must think back to and attempt to reconstruct what happened in a prior study episode. In contrast, subjects do not need to be in an episodic retrieval mode to complete a generation task. Second, generation effects depend on aspects of the experimental design in ways that retrieval practice effects do not. This is especially true when the criterial measure involves free recall. Adopting the perspective of the item-order framework might make it possible to identify the locus of any differences between generating and retrieving.
In the four experiments reported here we sought to determine whether manipulating retrieval mode, by giving subjects either incidental or intentional retrieval instructions, would distinguish the testing effect from the generation effect. Our aim was to hold all aspects of the procedure constant and manipulate only whether subjects incidentally generated or intentionally recalled during the critical generate/recall phase. This presented a handful of methodological challenges. First, in most generation effect experiments subjects do not study items before generating them, whereas in testing effect experiments subjects do study items before recalling them on an initial test. Of course, including a study episode for a recall condition but not for a generate (or read) condition would confound the experiment. Thus the subjects in all conditions experienced the target words under incidental learning conditions in an initial exposure phase prior to the read/generate/recall manipulation.
Second, it was critical to create conditions where manipulating incidental vs. intentional retrieval would not affect performance on the initial test. Any difference in performance on the initial test would cloud interpretation of differences observed on a subsequent criterial test (for elaboration see Underwood’s (1964) classic paper). Fortunately, several prior studies have demonstrated that it is possible to hold all test cues constant and manipulate only incidental vs. intentional retrieval instructions and observe virtually identical levels of performance in the two instructional conditions (e.g. see Geraci and Rajaram, 2002, Hamilton and Rajaram, 2001, Roediger et al., 1992). The materials and tasks used in the present experiments were designed to produce equivalent levels of performance in the initial “Generate” and “Recall” conditions (that is, under incidental or intentional retrieval instructions).
Finally, we suspected that using materials that afforded easy generation of target words (e.g. pairs of antonyms) would encourage subjects to use an incidental retrieval strategy rather than intentional retrieval even when subjects were instructed to do the latter. Thus the materials used in the present experiments were somewhat more difficult than materials commonly used in generation effect experiments. Whereas initial generation performance exceeds 90% in many generation effect experiments, initial performance was closer to 75% in the present experiments. The intent was to ensure that subjects could successfully generate targets under incidental retrieval instructions but would in fact think back to the prior study episode when given intentional retrieval instructions.
The general procedure was similar in each experiment. In an initial exposure phase (Phase 1) subjects viewed a list of words (e.g., love, diet) under incidental learning conditions. Then in Phase 2 one of three things happened. In a Read condition the subjects read the intact target words paired with related cue words (e.g., heart – love, eat – diet). In Generate and Recall conditions the subjects were given fragments of the target words paired with cue words (e.g., heart – l_v_, eat – di_ _) and instructed to complete the fragment. The only difference between the Generate and Recall conditions was the instructions. Subjects in the Generate condition were told to complete the fragment with the first word that came to mind that successfully completed it. Subjects in the Recall condition, in contrast, were told to use the fragment as a cue to help them recall a word that occurred in the first part of the experiment. Thus subjects in the Recall condition were placed in an episodic retrieval mode while subjects in the Generate condition were not. Finally, in Phase 3 the subjects were given a criterial test of free recall (Experiments 1 and 2) or recognition (Experiments 3 and 4).
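The Phase 2 manipulation can be sketched as a small data structure in which Generate and Recall trials show exactly the same cue and fragment and differ only in the instruction string. The instruction wording below paraphrases the procedure rather than quoting the actual materials, and the names `Pair`, `phase2_display`, `fits_fragment`, and `score_response` are hypothetical. Coding a word that completes the fragment but is not the studied target as an "alternate" is our assumption about how such responses might be classified.

```python
from dataclasses import dataclass

# Paraphrased instructions; only the instruction differs between Generate and Recall.
INSTRUCTIONS = {
    "Read": "Read each word pair.",
    "Generate": "Complete the fragment with the first word that comes to mind that fits it.",
    "Recall": "Use the fragment as a cue to recall a word from the first part of the experiment.",
}


@dataclass(frozen=True)
class Pair:
    cue: str
    target: str
    fragment: str  # underscores mark missing letters, e.g. "l_v_"


def phase2_display(pair: Pair, condition: str) -> tuple[str, str]:
    """Return (text shown on screen, instruction) for one Phase 2 trial."""
    shown = pair.target if condition == "Read" else pair.fragment
    return f"{pair.cue} - {shown}", INSTRUCTIONS[condition]


def fits_fragment(word: str, fragment: str) -> bool:
    """True if word has the fragment's length and matches every letter shown."""
    frag = fragment.replace(" ", "")  # "di_ _" -> "di__"
    return len(word) == len(frag) and all(f in ("_", w) for f, w in zip(frag, word))


def score_response(response: str, pair: Pair) -> str:
    """Classify a response as 'target', 'alternate' (fits but not studied), or 'error'."""
    response = response.strip().lower()
    if response == pair.target:
        return "target"
    return "alternate" if fits_fragment(response, pair.fragment) else "error"
```

For example, with `Pair("heart", "love", "l_v_")` both the Generate and Recall conditions display `heart - l_v_`; the response "love" scores as `target`, "live" as `alternate`, and "hate" as `error`.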
Experiment 1
The purpose of Experiment 1 was to see whether practicing retrieval would produce effects on future retention that differed from the effects of generation. Subjects read, generated, or recalled word pairs. A pure-list between-subjects design was used, and free recall was the criterial measure. As described above, generation effects are often absent in such designs, whereas testing effects are observed in similar designs. In Experiment 1 the cues on the initial test were held constant. The only
Initial generation/recall
Table 1 shows the proportion of targets correctly produced in the initial generate/recall phase. The proportion was nearly identical in the Generate and Recall conditions (.71 vs. .72, F < 1). Response times for correct responses (correctly generated or recalled targets) averaged 4.12 s and 4.04 s in the Generate and Recall conditions respectively (F < 1). Neither mean was significantly different from the 4 s presentation rate in the Read condition (Fs < 1). Finally the proportion of alternate targets
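Between-group comparisons such as Generate vs. Recall are reported as F tests with one numerator degree of freedom; for two independent groups this F equals the square of the pooled-variance t statistic. The sketch below uses only the Python standard library and assumes one score per subject (for example, the proportion of targets produced); the function name `two_group_f` is hypothetical.

```python
from statistics import mean, variance


def two_group_f(a: list[float], b: list[float]) -> tuple[float, int, int]:
    """F for two independent groups, computed as the squared pooled-variance t.

    Returns (F, numerator df, denominator df).
    """
    na, nb = len(a), len(b)
    # Pooled sample variance across the two groups.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t * t, 1, na + nb - 2
```

For instance, `two_group_f([1, 2, 3], [4, 5, 6])` returns an F of 13.5 (up to floating-point rounding) on (1, 4) df, which matches a one-way ANOVA on the same six scores.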
Discussion
The key finding in Experiment 1 was that there was no generation effect but there was a testing effect. That is, in a pure-list between-subjects design, practicing retrieval produced a significant advantage in final free recall relative to reading, but generating produced no effect. This pattern of results was suggested by prior research, and here it is captured within a single experiment. The independent variable that distinguished retrieval practice from generation was retrieval
Experiment 2
Experiment 1 showed that there is a clear difference between retrieval practice and generation and that the difference depends on intentional vs. incidental retrieval instructions. Indeed Experiment 1 demonstrated a scenario where a “generative” learning task produced no benefit over reading but retrieval practice produced a significant benefit. At this point we do not know exactly what is responsible for the different mnemonic effects of generating vs. retrieving. In the context of the
Initial generation/recall
Table 3 shows the proportion of targets correctly produced in the initial generate/recall phase. As was true in Experiment 1, the proportion was nearly identical in the Generate and Recall conditions (.76 vs. .75, F < 1). Response times for correct responses averaged 4.68 s and 4.69 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 2 (4.68 s) was significantly greater than the 4 s presentation rate in the Read condition (F(1, 39) = 7.34, ). The
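The comparison of mean response time with the fixed Read presentation rate can be treated as a one-sample test, for which F(1, n - 1) equals t squared. Read that way, F(1, 39) = 7.34 corresponds to |t| = √7.34 ≈ 2.71. The sketch below assumes one mean response time per subject; the function name `one_sample_f` is hypothetical, and treating the reported test as a one-sample comparison is our assumption.

```python
from statistics import mean, stdev


def one_sample_f(values: list[float], mu: float) -> tuple[float, int, int]:
    """F(1, n - 1) for testing whether the mean of values differs from mu.

    Computed as the square of the one-sample t statistic.
    """
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / n ** 0.5)
    return t * t, 1, n - 1
```

For example, `one_sample_f([4.0, 5.0, 6.0], 4.0)` gives an F of about 3.0 on (1, 2) df.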
Discussion
Experiment 2 provides a conceptual replication of Experiment 1 with a mixed-list design. There was a generation effect, replicating prior work with mixed lists, and there was also a retrieval practice effect. But most importantly retrieval practice produced greater final recall than generating. Based on the item-order tradeoff theory, the generation effect occurs in a mixed list because generation enhances item-specific processing but disrupts retention of order information for both generated
Experiment 3
Experiments 1 and 2 established that retrieval mode distinguishes the testing effect from the generation effect when the final criterial test involves free recall. The purpose of Experiment 3 was twofold. The first purpose was to see if the superiority of retrieval practice to generation would also be observed in a final item recognition memory test, a test which is presumably more sensitive to item information than to order information (Hunt and Einstein, 1981, Nairne et al., 1991). The second
Initial generation/recall
Table 5 shows the proportion of targets correctly produced in the initial generate/recall phase. The proportion was nearly identical in the Generate and Recall conditions (.73 vs. .75, F < 1). Response times for correct responses averaged 4.52 s and 4.69 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 3 (4.61 s) was not significantly greater than the 4.5 s presentation rate in the Read condition (F < 1). The proportion of alternate targets produced was
Discussion
Experiment 3 extended the findings from Experiments 1 and 2 to final recognition. Both generation and retrieval practice produced positive effects on subsequent recognition. There was also an advantage of retrieval practice relative to generation, conceptually replicating the results of Experiments 1 and 2, though the difference did not reach significance. Importantly, performance on the order reconstruction test was better in the Read condition than in the Generate or Recall conditions which
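The excerpt does not say how the order reconstruction test was scored. One simple possibility, shown here only as an assumption, is the proportion of studied words that a subject places back in their original serial position; rank-correlation measures are a common alternative. The function name `order_score` is hypothetical.

```python
def order_score(studied: list[str], reconstructed: list[str]) -> float:
    """Proportion of items placed in their original serial position.

    Assumes the reconstruction is a reordering of exactly the studied items.
    """
    if sorted(studied) != sorted(reconstructed):
        raise ValueError("reconstruction must be a reordering of the studied list")
    hits = sum(s == r for s, r in zip(studied, reconstructed))
    return hits / len(studied)
```

Swapping the first two of four words, for example, yields a score of .50, whereas a perfect reconstruction yields 1.0.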
Experiment 4
The purpose of the final experiment was to examine the effect of reading, generating, or recalling items on final recognition using a mixed-list design. In Phase 2 half the items were presented intact in Read trials and half were presented as fragments in Generate or Recall trials. The instruction to generate or recall items was manipulated between-subjects (just as was done in Experiment 2). The criterial test in Phase 3 was a yes/no recognition test. The prediction was that there would be a
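In a mixed-list yes/no recognition test, new lures belong to neither studied item type, so a single false-alarm rate serves as the baseline for every type. The sketch below computes hits minus false alarms for each item type. This corrected-recognition measure and the item-type labels (for example "read", "generate", "new") are assumptions for illustration, since the excerpt does not state which recognition measure was analyzed; `corrected_recognition` is a hypothetical name.

```python
from collections import defaultdict


def corrected_recognition(trials: list[tuple[str, bool]]) -> dict[str, float]:
    """Hit rate minus false-alarm rate for each studied item type.

    trials: (item_type, said_yes) pairs; item_type "new" marks unstudied lures.
    The lures belong to no studied type, so one false-alarm rate is
    subtracted from every type's hit rate.
    """
    yes: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item_type, said_yes in trials:
        total[item_type] += 1
        yes[item_type] += int(said_yes)
    false_alarm_rate = yes["new"] / total["new"]
    return {t: yes[t] / total[t] - false_alarm_rate for t in total if t != "new"}
```

With two "read" trials (one "yes"), two "generate" trials (both "yes"), and four lures (one "yes"), the sketch returns .25 for read items and .75 for generated items.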
Initial generation/recall
Table 7 shows the proportion of targets correctly produced in the initial generate/recall phase. There was no difference between the Generate and Recall conditions (.77 vs. .77, F < 1). Response times for correct responses averaged 4.47 s and 4.90 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 4 (4.65 s) was not significantly greater than the 4.5 s presentation rate for Read items (F < 1). Finally, the proportion of alternate targets produced was .13
Discussion
Experiment 4 used a mixed-list design and showed positive effects of generation and retrieval practice on a final recognition test. The key finding from the experiment was that retrieval practice also produced significantly better recognition performance than generation. The results lend further support to the idea that intentional retrieval in the retrieval practice condition produced greater item-specific processing than incidental retrieval in the generate condition—and consequently
General discussion
These four experiments have clearly established that there is an important difference between generating during learning and retrieving during learning and that the difference originates from retrieval mode. The Generate and Recall conditions in these experiments held all test cues constant and differed only in the instructions given to subjects. Intentional retrieval in the Recall condition consistently produced greater retention than incidental retrieval in the Generate condition.
One
Acknowledgments
We thank Siara Saliu, Ben Borgmann, Anna Crow, and Kayla Balensiefer for helping collect the data. We also thank James Nairne and Dan Burns for helpful comments.
References
Similarity and order in memory.
The orthographic distinctiveness effect on direct and indirect tests of memory: Delineating the awareness and processing requirements. Journal of Memory and Language (2002).
Effects of generation on memory for order. Journal of Memory and Language (1998).
The concreteness effect in implicit and explicit memory tests. Journal of Memory and Language (2001).
Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning and Verbal Behavior (1971).
Relational and item-specific information in memory. Journal of Verbal Learning and Verbal Behavior (1981).
The enigma of organization and distinctiveness. Journal of Memory and Language (1993).
On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior (1978).
Dissociating automatic and consciously controlled effects of study/test compatibility. Journal of Memory and Language (1996).
Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language (2007).
A contextual account of the generation effect: A three-factor theory. Journal of Memory and Language.
Self-generation and memory.
The generation effect as an artifact of selective displaced rehearsal. Journal of Memory and Language.
The effects of presentation and recall of material in free-recall learning. Journal of Verbal Learning and Verbal Behavior.
Oblivescence and reminiscence. British Journal of Psychology.
The English lexicon project. Behavior Research Methods.
Remembering: A study in experimental and social psychology.
The generation effect: Evidence for generalized inhibition. Journal of Experimental Psychology: Learning, Memory, and Cognition.
The generation effect: A meta-analytic review. Memory & Cognition.
To what extent is memory measured by a single recall? Journal of Experimental Psychology.
The generation effect: A test between single and multifactor theories. Journal of Experimental Psychology: Learning, Memory, and Cognition.
The efficient assessment of need for cognition. Journal of Personality Assessment.
Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition.
What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review.
The influence of retrieval on retention. Memory & Cognition.
Self-explaining expository texts: The dual processes of generating inferences and repairing mental models.
Processing strategies and the generation effect: Implications for making a better reader. Memory & Cognition.
Generation effects and the lack thereof: The role of transfer-appropriate processing. Memory.
Encoding context and the generation effect in multitrial free-recall learning. Canadian Journal of Psychology.
Recitation as a factor in memorizing. Archives of Psychology.
Implicit and explicit memory for new associations in normal and amnesic subjects. Journal of Experimental Psychology: Learning, Memory, and Cognition.
The generation effect: Support for a two-factor theory. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General.