Retrieval mode distinguishes the testing effect from the generation effect

https://doi.org/10.1016/j.jml.2009.11.010

Abstract

A series of four experiments examined the effects of generation vs. retrieval practice on subsequent retention. Subjects were first exposed to a list of target words. Then the subjects were shown the targets again intact for Read trials or they were shown fragments of the targets. Subjects in Generate conditions were told to complete the fragments with the first word that came to mind while subjects in Recall conditions were told to use the fragments as retrieval cues to recall words that occurred in the first part of the experiment. The instruction manipulated retrieval mode—the Recall condition involved intentional retrieval while the Generate condition involved incidental retrieval. On a subsequent test of free recall or recognition, initial recall produced better retention than initial generation. Both generation and retrieval practice disrupted retention of order information, but retrieval enhanced retention of item-specific information to a greater extent than generation. There is a distinction between the testing effect and the generation effect and the distinction originates from retrieval mode. Intentional retrieval produces greater subsequent retention than generating targets under incidental retrieval instructions.

Introduction

A venerable way to study the nature of retrieval processes is to examine the effect of one retrieval on another. Experiments in which subjects repeatedly retrieve and reconstruct the past have a long history in memory research (e.g., Ballard, 1913, Bartlett, 1932, Brown, 1923, Gates, 1917, Tulving, 1967; see too Payne, 1987). Recently the effects of repeatedly testing memory have captured the attention of contemporary researchers who are interested in educational applications of retrieval practice (for overviews see McDaniel et al., 2007, Metcalfe and Kornell, 2007, Pashler et al., 2007, Roediger and Karpicke, 2006a). This renewed interest has also led to new examinations of the nature of the mnemonic effects of retrieval, which is the focus of this paper.

On the surface the testing effect seems to share many similarities with the generation effect (Jacoby, 1978, Slamecka and Graf, 1978). Consider the typical design of experiments that examine either the effects of retrieval practice (testing) or generation on subsequent retention. In most retrieval practice experiments, subjects study materials like word lists or text passages and take an initial test. The initial test often involves recall but recognition and multiple-choice tests have also been used. The effect of initial retrieval is assessed on a later criterial test, which may have the same format as the first test or a different one and may occur relatively immediately or at a longer retention interval. The key finding is that practicing retrieval on the initial test enhances performance on the criterial test relative to a control condition in which subjects repeatedly study the materials, which equates nominal exposure time across the two conditions (for review see Roediger & Karpicke, 2006a).

Now consider the design of a prototypical generation effect experiment. Here subjects are induced to generate items during the initial learning phase. This may be accomplished in a variety of ways—subjects might be asked to complete a fragment of a target word (fr_ _nd) or produce the target when given an antonym as a cue (generate friend when given enemy as a cue) or told to unscramble an anagram to form the target word (generate friend when given fndrie). The effect of generation is typically assessed on a criterial test of free recall or recognition. The key finding is that generation often enhances performance on the criterial test relative to a control condition where subjects read words intact (Jacoby, 1978, Slamecka and Graf, 1978; for review see Bertsch, Pesta, Wiscott, & McDaniel, 2007). There are important boundary conditions to the generation effect and we will discuss them shortly.

The question we asked in this research was this: Are there any meaningful differences between the testing effect and the generation effect? Many authors continue to lump together the testing effect and the generation effect with good reason—there is currently no well-developed empirical or theoretical basis to distinguish the effects (cf. Carrier & Pashler, 1992). In addition a number of researchers have espoused the benefits of retrieval practice for student learning (Karpicke and Roediger, 2008, McDaniel et al., 2007, Metcalfe and Kornell, 2007, Pashler et al., 2007) but this has not yet garnered wide support in education. In contrast the effectiveness of generative learning activities has been largely embraced in educational circles (see Mayer, 2008; see too Chi, 2000, King, 1994, Wittrock, 1974, Wittrock, 1989). We point this out because any distinction between generation and retrieval practice would have not only theoretical implications but also practical implications for learning in educational contexts.

Two observations motivated this research. The first observation is that the instructions given to subjects in generation effect experiments differ from the instructions in retrieval practice experiments. In most generation effect experiments the subjects are instructed to generate target items and can rely on any strategy that might accomplish this task. Many of the tasks used to induce generation (like completing a word fragment, or producing a word that is conceptually related to a cue, or unscrambling an anagram) are similar to implicit memory tests because these generation tasks do not involve intentional retrieval. Subjects are instructed to produce target items but are not required to think back to a prior episode or experience (Graf and Schacter, 1985, Roediger and McDermott, 1993, Schacter, 1987). In contrast, in experiments that examine retrieval practice the subjects are instructed to retrieve items that occurred in a study episode. Most retrieval practice tasks involve intentional retrieval—subjects are instructed to reconstruct knowledge about a study event that occurred in a particular place at a particular time (retrieve the words from a particular list or recall ideas from a particular text). Recall tests likely involve generation but generation does not necessarily involve recovering the spatiotemporal context in which an event occurred.

The difference between generation and retrieval instructions is essentially parallel to the distinction between incidental and intentional retrieval and constitutes a difference in what Tulving (1983) called retrieval mode. Subjects are thought to be in an episodic retrieval mode when told to think back to the past, as they are on explicit memory tests. Moreover, subjects are thought to process retrieval cues differently in this cognitive state than they do under incidental retrieval conditions where subjects do not consciously think back to the past (as on implicit memory tests; see Graf and Schacter, 1985, Roediger and Blaxton, 1987). It is possible to hold all conditions and test cues constant and manipulate only retrieval mode by varying the instructions given to subjects (cf. the retrieval intentionality criterion; see Schacter, Bowers, & Booker, 1989; see too Roediger, Weldon, Stadler, & Riegler, 1992). In the experiments reported here we examined whether the retrieval mode engaged in by subjects (intentional vs. incidental retrieval) would differentiate the testing effect from the generation effect.

The second observation that motivated this research is that generation effects are sensitive to aspects of experimental design that do not impact the testing effect. Specifically when the final criterial test involves free recall, generation effects are often found in mixed-list (within-subject) designs but not in pure-list (often between-subject) designs (see Begg and Snider, 1987, Hirshman and Bjork, 1988, Schmidt and Cherry, 1989, Slamecka and Katsaiti, 1987). On the contrary, testing effects are found in both within- and between-subject experiments when the final test involves free recall. For example, Carpenter, Pashler, and Vul (2006) and Roediger and Karpicke (2006b) obtained testing effects with both within- and between-subjects designs and with free recall as the criterial measure.

With respect to the generation effect, one explanation of the moderating influence of list composition is the item-order account first proposed by Nairne and colleagues (Nairne, Riegler, & Serra, 1991; see McDaniel & Bugg, 2008). This account is conceptually similar to other tradeoff or multifactor accounts of the generation effect, though differences between the accounts have been discussed elsewhere (see Hirshman and Bjork, 1988, McDaniel et al., 1990, McDaniel et al., 1988, Mulligan and Lozito, 2004). The idea behind the item-order account is that subjects encode attributes or features pertaining to the individual items in a list and to the order in which the items occurred. On a free recall test subjects use order information as a structure to guide retrieval of target candidates and use item information to discriminate which items actually occurred in a prior study episode (for elaboration of these ideas see Crowder, 1979, Hunt and Einstein, 1981, Hunt and McDaniel, 1993, Mandler, 1969, Nairne, 2006, Postman, 1972, Underwood, 1969). When subjects are required to generate items during learning this enhances the processing of item-specific features but disrupts the encoding of order information (Nairne et al., 1991). Therefore in mixed lists that contain both generated and read (intact) items, the generated items benefit from enhanced item processing and both types of item suffer from disrupted order processing. The result is a generation effect—better free recall of generated items than read items in mixed-list designs (Serra & Nairne, 1993; see too Gardiner and Arthurs, 1982, Slamecka and Graf, 1978, Slamecka and Katsaiti, 1987).

But the story is different in pure-list designs. A pure list of generated items benefits from enhanced item processing but suffers from disrupted order processing. In contrast, a pure list of read items does not benefit from enhanced item processing but also does not suffer from disrupted order processing. Thus there is often no difference in free recall of read vs. generated lists, and in fact sometimes there is an advantage of reading over generating (e.g. Nairne et al., 1991, Schmidt and Cherry, 1989). The enhanced item processing that occurs in a pure list of generated items is not sufficient to counteract disrupted order processing and produce an advantage in free recall relative to a pure list of read items.
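The tradeoff described in the two preceding paragraphs can be illustrated with a toy additive sketch. It is not the authors' model and not a fit to any data: the baseline, the item bonus, and the order cost are hypothetical values chosen only so that the two factors cancel in a pure list, and the function names are invented for this illustration.

```python
# Toy additive sketch of the item-order account.
# All numbers below are illustrative assumptions, not values from the paper.
BASE = 0.30        # hypothetical baseline free-recall probability
ITEM_BONUS = 0.10  # hypothetical gain from enhanced item-specific processing
ORDER_COST = 0.10  # hypothetical loss from disrupted order encoding


def predicted_recall(item_enhanced: bool, order_disrupted: bool) -> float:
    """Return a toy recall score: baseline plus item bonus minus order cost."""
    score = BASE
    if item_enhanced:
        score += ITEM_BONUS
    if order_disrupted:
        score -= ORDER_COST
    return round(score, 2)


def predict(design: str) -> dict:
    """Predict recall of read and generated items in a 'pure' or 'mixed' design."""
    if design == "pure":
        # Order encoding is disrupted only in the list made up of generated items.
        return {
            "read": predicted_recall(item_enhanced=False, order_disrupted=False),
            "generate": predicted_recall(item_enhanced=True, order_disrupted=True),
        }
    if design == "mixed":
        # Generating some items disrupts order encoding for every item in the list.
        return {
            "read": predicted_recall(item_enhanced=False, order_disrupted=True),
            "generate": predicted_recall(item_enhanced=True, order_disrupted=True),
        }
    raise ValueError(f"unknown design: {design}")


pure = predict("pure")
mixed = predict("mixed")
```

With these assumed values the generated items outscore the read items only in the mixed design. Setting ORDER_COST above ITEM_BONUS would instead give read lists the advantage in pure lists, the pattern that is sometimes reported.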

Generation effects are clearly sensitive to experimental design but retrieval practice effects do not appear to depend on this factor. Several prior studies have shown that practicing retrieval produces greater retention than repeated study in pure-list, between-subject designs that employ final free recall (see Carpenter, 2009, Carpenter and DeLosh, 2006, Hogan and Kintsch, 1971, Karpicke and Roediger, 2007, Roediger and Karpicke, 2006b, Thompson et al., 1978, Wheeler et al., 2003). The effects of list composition have not been examined as rigorously in the testing effect literature as they have been in the generation effect literature but a few studies have addressed the issue directly. Namely, Carpenter et al. (2006) examined both pure- and mixed-lists and showed positive testing effects on final free recall with both types of design (see too Carpenter, 2009).

In sum, we have two reasons to suspect there may be important differences between engaging in generation and practicing retrieval. First, generation conditions often involve incidental retrieval while retrieval practice conditions involve intentional retrieval. When subjects practice retrieval they must think back to and attempt to reconstruct what happened in a prior study episode. In contrast, subjects do not need to be in an episodic retrieval mode to complete a generation task. Second, generation effects depend on aspects of the experimental design in ways that retrieval practice effects do not. This is especially true when the criterial measure involves free recall. Adopting the perspective of the item-order framework might make it possible to identify the locus of any differences between generating and retrieving.

In the four experiments reported here we sought to determine whether manipulating retrieval mode—by giving subjects either incidental or intentional retrieval instructions—would distinguish the testing effect from the generation effect. Our aim was to hold all aspects of the procedure constant and manipulate only whether subjects incidentally generated or intentionally recalled during the critical generate/recall phase. This presented a handful of methodological challenges. First, subjects typically do not study items prior to generating them in most generation effect experiments but subjects do study items prior to recalling them on an initial test in testing effect experiments. Of course, including a study episode for a recall condition but not for a generate (or read) condition would confound the experiment. Thus the subjects in all conditions experienced the target words under incidental learning conditions in an initial exposure phase prior to the read/generate/recall manipulation.

Second, it was critical to create conditions where manipulating incidental vs. intentional retrieval would not affect performance on the initial test. Any difference in performance on the initial test would cloud interpretation of differences observed on a subsequent criterial test (for elaboration see Underwood’s (1964) classic paper). Fortunately, several prior studies have demonstrated that it is possible to hold all test cues constant and manipulate only incidental vs. intentional retrieval instructions and observe virtually identical levels of performance in the two instructional conditions (e.g. see Geraci and Rajaram, 2002, Hamilton and Rajaram, 2001, Roediger et al., 1992). The materials and tasks used in the present experiments were designed to produce equivalent levels of performance in the initial “Generate” and “Recall” conditions (that is, under incidental or intentional retrieval instructions).

Finally, we suspected that using materials that afforded easy generation of target words (e.g. pairs of antonyms) would encourage subjects to use an incidental retrieval strategy rather than intentional retrieval even when subjects were instructed to do the latter. Thus the materials used in the present experiments were somewhat more difficult than materials commonly used in generation effect experiments. While many generation effect experiments see initial generation performance above 90%, initial performance was closer to 75% in the present experiments. The intent was to ensure that subjects could successfully generate targets under incidental retrieval instructions but that subjects would in fact think back to the prior study episode when given intentional retrieval instructions.

The general procedure was similar in each experiment. In an initial exposure phase (Phase 1) subjects viewed a list of words (e.g., love, diet) under incidental learning conditions. Then in Phase 2 one of three things happened. In a Read condition the subjects read the intact target words paired with related cue words (e.g., heart – love, eat – diet). In Generate and Recall conditions the subjects were given fragments of the target words paired with cue words (e.g., heart – l_v_, eat – di_ _) and instructed to complete the fragment. The only difference between the Generate and Recall conditions was the instructions. Subjects in the Generate condition were told to complete the fragment with the first word that came to mind that successfully completed it. Subjects in the Recall condition, in contrast, were told to use the fragment as a cue to help them recall a word that occurred in the first part of the experiment. Thus subjects in the Recall condition were placed in an episodic retrieval mode while subjects in the Generate condition were not. Finally, in Phase 3 the subjects were given a criterial test of free recall (Experiments 1 and 2) or recognition (Experiments 3 and 4).
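A minimal sketch of how Phase 2 trials might be represented may help make the manipulation concrete. It assumes a simple scheme in which chosen letter positions are masked with underscores. The class, the function names, and the instruction wording are hypothetical and are not taken from the experimental software.

```python
from dataclasses import dataclass

# Hypothetical instruction wording, paraphrased from the procedure description.
INSTRUCTIONS = {
    "read": "Read the word pair.",
    "generate": "Complete the fragment with the first word that comes to mind.",
    "recall": "Use the fragment as a cue to recall a word from the first part "
              "of the experiment.",
}


@dataclass(frozen=True)
class Trial:
    cue: str          # related cue word, e.g. "heart"
    display: str      # intact target or its fragment
    instruction: str  # text shown to the subject


def make_fragment(target: str, hidden: frozenset) -> str:
    """Replace the letters at the hidden positions with underscores."""
    return "".join("_" if i in hidden else ch for i, ch in enumerate(target))


def make_trial(cue: str, target: str, condition: str, hidden: frozenset) -> Trial:
    """Build one Phase 2 trial for the read, generate, or recall condition."""
    if condition not in INSTRUCTIONS:
        raise ValueError(f"unknown condition: {condition}")
    # Read trials show the intact target; the other two show the same fragment.
    display = target if condition == "read" else make_fragment(target, hidden)
    return Trial(cue=cue, display=display, instruction=INSTRUCTIONS[condition])


hidden_letters = frozenset({1, 3})  # mask the 2nd and 4th letters of "love"
read_trial = make_trial("heart", "love", "read", hidden_letters)
generate_trial = make_trial("heart", "love", "generate", hidden_letters)
recall_trial = make_trial("heart", "love", "recall", hidden_letters)
```

In this sketch the Generate and Recall trials share the same cue and fragment and differ only in the instruction field, which mirrors the design goal of holding test cues constant while varying retrieval mode.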

Section snippets

Experiment 1

The purpose of Experiment 1 was to see if practicing retrieval would produce effects on future retention that differed from the effects of generation. Subjects either read, or generated, or recalled word pairs. A pure-list between-subjects design was used and free recall was the criterial measure. As described above, there are often no generation effects in such designs but testing effects are observed in similar designs. In Experiment 1 the cues on the initial test were held constant. The only

Initial generation/recall

Table 1 shows the proportion of targets correctly produced in the initial generate/recall phase. The proportion was nearly identical in the Generate and Recall conditions (.71 vs. .72, F < 1). Response times for correct responses (correctly generated or recalled targets) averaged 4.12 s and 4.04 s in the Generate and Recall conditions respectively (F < 1). Neither mean was significantly different from the 4 s presentation rate in the Read condition (Fs < 1). Finally the proportion of alternate targets

Discussion

The key finding in Experiment 1 was that there was no generation effect but there was a testing effect. That is, in a pure-list between-subject design, practicing retrieval produced a significant advantage in final free recall relative to reading but generating produced no effect. These patterns of results were suggested by prior research and they are captured here within a single experiment. The independent variable that distinguished retrieval practice from generation was retrieval

Experiment 2

Experiment 1 showed that there is a clear difference between retrieval practice and generation and that the difference depends on intentional vs. incidental retrieval instructions. Indeed Experiment 1 demonstrated a scenario where a “generative” learning task produced no benefit over reading but retrieval practice produced a significant benefit. At this point we do not know exactly what is responsible for the different mnemonic effects of generating vs. retrieving. In the context of the

Initial generation/recall

Table 3 shows the proportion of targets correctly produced in the initial generate/recall phase. As was true in Experiment 1 the proportion was nearly identical in the Generate and Recall conditions (.76 vs. .75, F < 1). Response times for correct responses averaged 4.68 s and 4.69 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 2 (4.68 s) was significantly greater than the 4 s presentation rate in the Read condition (F(1, 39) = 7.34, ηp² = .16). The

Discussion

Experiment 2 provides a conceptual replication of Experiment 1 with a mixed-list design. There was a generation effect, replicating prior work with mixed lists, and there was also a retrieval practice effect. But most importantly retrieval practice produced greater final recall than generating. Based on the item-order tradeoff theory, the generation effect occurs in a mixed list because generation enhances item-specific processing but disrupts retention of order information for both generated

Experiment 3

Experiments 1 and 2 established that retrieval mode distinguishes the testing effect from the generation effect when the final criterial test involves free recall. The purpose of Experiment 3 was twofold. The first purpose was to see if the superiority of retrieval practice to generation would also be observed in a final item recognition memory test, a test which is presumably more sensitive to item information than to order information (Hunt and Einstein, 1981, Nairne et al., 1991). The second

Initial generation/recall

Table 5 shows the proportion of targets correctly produced in the initial generate/recall phase. The proportion was nearly identical in the Generate and Recall conditions (.73 vs. .75, F < 1). Response times for correct responses averaged 4.52 s and 4.69 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 3 (4.61 s) was not significantly greater than the 4.5 s presentation rate in the Read condition (F < 1). The proportion of alternate targets produced was

Discussion

Experiment 3 extended the findings from Experiments 1 and 2 to final recognition. Both generation and retrieval practice produced positive effects on subsequent recognition. There was also an advantage of retrieval practice relative to generation, conceptually replicating the results of Experiments 1 and 2, though the difference did not reach significance. Importantly, performance on the order reconstruction test was better in the Read condition than in the Generate or Recall conditions which

Experiment 4

The purpose of the final experiment was to examine the effect of reading, generating, or recalling items on final recognition using a mixed-list design. In Phase 2 half the items were presented intact in Read trials and half were presented as fragments in Generate or Recall trials. The instruction to generate or recall items was manipulated between-subjects (just as was done in Experiment 2). The criterial test in Phase 3 was a yes/no recognition test. The prediction was that there would be a

Initial generation/recall

Table 7 shows the proportion of targets correctly produced in the initial generate/recall phase. There was no difference between the Generate and Recall conditions (.77 vs. .77, F < 1). Response times for correct responses averaged 4.47 s and 4.90 s in the Generate and Recall conditions respectively (F < 1). The mean response time in Experiment 4 (4.65 s) was not significantly greater than the 4.5 s presentation rate for Read items (F < 1). Finally, the proportion of alternate targets produced was .13

Discussion

Experiment 4 used a mixed-list design and showed positive effects of generation and retrieval practice on a final recognition test. The key finding from the experiment was that retrieval practice also produced significantly better recognition performance than generation. The results lend further support to the idea that intentional retrieval in the retrieval practice condition produced greater item-specific processing than incidental retrieval in the generate condition—and consequently

General discussion

These four experiments have clearly established that there is an important difference between generating during learning and retrieving during learning and that the difference originates from retrieval mode. The Generate and Recall conditions in these experiments held all test cues constant and differed only in the instructions given to subjects. Intentional retrieval in the Recall condition consistently produced greater retention than incidental retrieval in the Generate conditions.

One

Acknowledgments

We thank Siara Saliu, Ben Borgmann, Anna Crow, and Kayla Balensiefer for helping collect the data. We also thank James Nairne and Dan Burns for helpful comments.

References (70)

  • M.A. McDaniel et al.

    A contextual account of the generation effect: A three-factor theory

    Journal of Memory and Language

    (1988)
  • N.W. Mulligan et al.

    Self-generation and memory

  • N.J. Slamecka et al.

    The generation effect as an artifact of selective displaced rehearsal

    Journal of Memory and Language

    (1987)
  • E. Tulving

    The effects of presentation and recall of material in free-recall learning

    Journal of Verbal Learning and Verbal Behavior

    (1967)
  • P.B. Ballard

    Oblivescence and reminiscence

    British Journal of Psychology

    (1913)
  • D.A. Balota et al.

    The English lexicon project

    Behavior Research Methods

    (2007)
  • F.C. Bartlett

    Remembering: A study in experimental and social psychology

    (1932)
  • I. Begg et al.

    The generation effect: Evidence for generalized inhibition

    Journal of Experimental Psychology: Learning, Memory, and Cognition

    (1987)
  • S. Bertsch et al.

    The generation effect: A meta-analytic review

    Memory & Cognition

    (2007)
  • W. Brown

    To what extent is memory measured by a single recall?

    Journal of Experimental Psychology

    (1923)
  • D.J. Burns

    The generation effect: A test between single and multifactor theories

    Journal of Experimental Psychology: Learning, Memory, and Cognition

    (1990)
  • J.T. Cacioppo et al.

    The efficient assessment of need for cognition

    Journal of Personality Assessment

    (1984)
  • S.K. Carpenter

    Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval

    Journal of Experimental Psychology: Learning, Memory, and Cognition

    (2009)
  • S.K. Carpenter et al.

    Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect

    Memory & Cognition

    (2006)
  • S.K. Carpenter et al.

    What types of learning are enhanced by a cued recall test?

    Psychonomic Bulletin & Review

    (2006)
  • M. Carrier et al.

    The influence of retrieval on retention

    Memory & Cognition

    (1992)
  • M.T.H. Chi

    Self-explaining expository texts: The dual processes of generating inferences and repairing mental models

  • P.A. deWinstanley et al.

    Processing strategies and the generation effect: Implications for making a better reader

    Memory & Cognition

    (2004)
  • P.A. deWinstanley et al.

    Generation effects and the lack thereof: The role of transfer-appropriate processing

    Memory

    (1996)
  • J.M. Gardiner et al.

    Encoding context and the generation effect in multitrial free-recall learning

    Canadian Journal of Psychology

    (1982)
  • A.I. Gates

    Recitation as a factor in memorizing

    Archives of Psychology

    (1917)
  • P. Graf et al.

    Implicit and explicit memory for new associations in normal and amnesic subjects

    Journal of Experimental Psychology: Learning, Memory, and Cognition

    (1985)
  • E. Hirshman et al.

    The generation effect: Support for a two-factor theory

    Journal of Experimental Psychology: Learning, Memory, and Cognition

    (1988)
  • J.D. Karpicke

    Metacognitive control and strategy selection: Deciding to practice retrieval during learning

    Journal of Experimental Psychology: General

    (2009)
  • Karpicke, J. D., & Smith, M. A. (2009). Separate mnemonic effects of retrieval practice and elaborative encoding....