Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory?

https://doi.org/10.1016/j.jml.2009.01.004

Abstract

Although substantial research has demonstrated the benefits of retrieval practice for promoting memory, very few studies have tested theoretical accounts of this effect. Across two experiments, we tested the retrieval effort hypothesis, which follows from the desirable difficulty framework [Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press] and states that difficult but successful retrievals are better for memory than easier successful retrievals. To test the hypothesis, we set up conditions under which retrieval during practice was successful but differentially difficult. Interstimulus interval (ISI) and criterion level (the number of times items were required to be correctly retrieved) were manipulated to vary the difficulty of retrieval. In support of the retrieval effort hypothesis, final test performance increased as the difficulty of retrieval during practice increased. Longer ISIs made items more difficult to retrieve than shorter ISIs but led to higher levels of final test performance. Additionally, as criterion level increased, retrieval became less difficult, and final test performance showed diminishing returns.

Introduction

Interest in the benefits of retrieval practice for subsequent memory has increased dramatically in recent years because of its important implications for student learning and scholarship. A wealth of research has shown that retrieval practice can be used not only to assess memory but also to improve it (for a recent review, see Roediger & Karpicke, 2006).

Although demonstrations that retrieval practice is beneficial for promoting memory are increasingly numerous, the extant literature is largely empirical rather than theoretical at this point. This is not to say that theoretical frameworks relevant to explaining effects of retrieval practice do not exist, and some recent work has reviewed existing findings in light of these accounts (e.g., Bjork, 1994; Carpenter et al., 2006; Roediger & Karpicke, 2006). However, very few studies have been designed to directly test a priori predictions of proposed theories (Carpenter & DeLosh, 2006; Glover, 1989; McDaniel & Masson, 1985). Put differently, retrieval practice effects are well documented, but the factors that underlie the effects are less well established. Accordingly, the goal of the current research was to provide a theoretical advance to the extant literature. To foreshadow, we first describe the general theoretical framework motivating the current work. We then introduce a specific hypothesis that follows from this framework and two experiments designed to directly test a priori predictions from the hypothesis.

The present work was motivated by the desirable difficulty framework (e.g., Bjork, 1994, 1999). The general principles of the framework specify that within any learning task or domain, difficult but successful processing will be better for memory than difficult but unsuccessful processing, a relatively intuitive claim. The more provocative claim is that successful but difficult processing will be better for memory than successful but easier processing.

Of course, applying this general framework to a particular task domain requires instantiating its claims in a form appropriate to the learning task of interest. Although frameworks cannot be tested directly, specific hypotheses that apply the framework's basic principles to a particular learning task can be. Accordingly, to apply the desirable difficulty framework to retrieval practice, the specific instantiation tested here is the retrieval effort hypothesis. Its basic claim is that not all successful retrievals are created equal: given that retrieval is successful, more difficult retrievals are better for memory than less difficult retrievals.

Thus, two conditions must be satisfied to directly test the retrieval effort hypothesis. First, retrieval during practice must be successful (hereafter, we simply use the term retrieval to refer to retrieval during practice). To satisfy this condition in the current work, items were practiced until they had been correctly retrieved a predetermined number of times. Second, the difficulty of retrieval must vary. To satisfy this condition, we manipulated two variables: interstimulus interval (ISI, defined here as the number of other items presented between successive practice trials of a given item) and criterion level (the number of times an item was required to be correctly recalled before it was dropped from practice). Each manipulation rests on an assumption about the relationship between the manipulated factor and retrieval difficulty. Below, we discuss each assumption in turn, followed by the prediction from the retrieval effort hypothesis that rests on that assumption.
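To make the practice-to-criterion procedure and the ISI definition concrete, here is a minimal sketch in Python. It assumes a single presentation queue in which an item that has just been tested is reinserted so that `isi` other trials come before its next test (fewer if fewer items remain), and it drops an item once it has been recalled correctly `criterion` times. The function names `practice_to_criterion` and `intervening_items` and the `recall` callback, which stands in for the participant's response, are hypothetical illustrations, not the software used in the study.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

Trial = Tuple[str, bool]  # (item, recalled correctly?)


def practice_to_criterion(
    items: Sequence[str],
    criterion: int,
    isi: int,
    recall: Callable[[str, int], bool],
    max_trials: int = 10_000,
) -> List[Trial]:
    """Hypothetical test-restudy loop: each item is tested until it has been
    recalled correctly `criterion` times, then dropped from practice.
    `recall(item, n_correct)` stands in for the participant's response."""
    queue = deque(items)
    n_correct: Dict[str, int] = {item: 0 for item in items}
    log: List[Trial] = []
    while queue:
        if len(log) >= max_trials:
            raise RuntimeError("criterion not reached within max_trials")
        item = queue.popleft()
        ok = recall(item, n_correct[item])  # test trial
        log.append((item, ok))
        if ok:
            n_correct[item] += 1
        # A restudy presentation of the item would follow the test here.
        if n_correct[item] < criterion:
            # Requeue so that `isi` other trials come first (fewer if fewer
            # items remain in the queue).
            queue.insert(min(isi, len(queue)), item)
    return log


def intervening_items(log: List[Trial], item: str) -> List[int]:
    """Number of other trials between successive trials of `item`."""
    positions = [i for i, (name, _) in enumerate(log) if name == item]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


def always_correct(item: str, n_correct: int) -> bool:
    """Toy participant who recalls every item on every test."""
    return True


if __name__ == "__main__":
    spaced = practice_to_criterion(["a", "b", "c"], criterion=2, isi=2, recall=always_correct)
    massed = practice_to_criterion(["a", "b"], criterion=2, isi=0, recall=always_correct)
    print(intervening_items(spaced, "a"), intervening_items(massed, "a"))
```

With `isi=2`, each item in the three-item list has two intervening trials between its tests; with `isi=0`, the trials are massed. One consequence of this simple queue design is that when `isi` is short relative to the list length, items near the back wait until earlier items reach criterion, so an actual experiment would need to control presentation order more carefully.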

The first assumption is that correct retrieval of items is more difficult after a longer ISI than after a shorter ISI (hereafter referred to as the ISI assumption). Evidence supporting this assumption comes from recent work by Karpicke and Roediger (2007a), in which response latencies for correct retrievals during practice were shorter for items correctly retrieved after an ISI of zero items (i.e., massed trials) compared to items retrieved after a longer ISI (either one or five intervening items).

Based on the ISI assumption, the retrieval effort hypothesis predicts that final test performance will be greater for items correctly retrieved after a longer ISI than for items correctly retrieved after a shorter ISI (hereafter referred to as the ISI prediction). Many previous studies have manipulated ISI, but they did so across a fixed number of practice trials, which necessarily yields a mixture of trials on which items were correctly retrieved and trials on which they were not. In contrast, no previous study has examined the effects of ISI between correct retrievals, which provides a stronger test of the retrieval effort hypothesis.

The second assumption is that as the number of times an item is correctly retrieved (i.e., criterion) increases, the difficulty of each next correct retrieval will decrease (hereafter referred to as the criterion assumption). Evidence supporting this assumption also comes from recent work by Karpicke and Roediger (2007a) reporting response latencies for practice test trials in which the item was correctly retrieved. Results indicated that as the number of correct retrievals increased (ranging from one to three), response latencies decreased.

Based on the criterion assumption, the retrieval effort hypothesis predicts that as the number of times items are correctly retrieved increases, the incremental benefit to final test performance will decrease; that is, a curvilinear relationship between number of correct retrievals and final test performance is predicted (hereafter referred to as the criterion prediction). Note that this is not to imply that more correct retrievals will not enhance final test performance; a reasonable expectation is that more correct retrievals will lead to higher levels of final test performance than fewer. Rather, the retrieval effort hypothesis predicts a greater increase in final test performance from earlier versus later correct retrievals, because difficulty of retrieval is greater earlier in learning compared to later in learning.
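The difference between a linear and a curvilinear relationship can be illustrated numerically. The sketch below is not the authors' analysis: it fits y = b0 + b1·x + b2·x² by ordinary least squares to a hypothetical saturating curve, 1 − exp(−0.3x), evaluated at the criterion levels listed in the design section (1, 3, 5, 6, 7, 8, and 10). The curve, the rate of 0.3, and the function names are assumptions chosen for illustration and are not data from the study. A negative quadratic term and shrinking gains per additional correct retrieval are the signature of diminishing returns; a purely linear relationship would give a quadratic term near zero.

```python
import math
from typing import List, Sequence


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Ordinary least-squares [b0, b1, b2] for y = b0 + b1*x + b2*x**2,
    solving the 3x3 normal equations with Cramer's rule."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    d = _det3(normal)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]  # replace one column with the right-hand side
        coeffs.append(_det3(m) / d)
    return coeffs


def marginal_gains(xs: List[float], ys: List[float]) -> List[float]:
    """Gain in y per additional correct retrieval between adjacent levels."""
    return [(y2 - y1) / (x2 - x1) for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]


LEVELS = [1, 3, 5, 6, 7, 8, 10]  # criterion levels from the design section


def hypothetical_recall(x: float, rate: float = 0.3) -> float:
    """Toy saturating curve for illustration only; not data from the study."""
    return 1 - math.exp(-rate * x)


if __name__ == "__main__":
    ys = [hypothetical_recall(x) for x in LEVELS]
    _, _, b2 = fit_quadratic(LEVELS, ys)
    print(f"quadratic term b2 = {b2:.4f}")
    print("gain per retrieval:", [round(g, 3) for g in marginal_gains(LEVELS, ys)])
```

Cramer's rule is used only to keep the sketch dependency-free; for real analyses a statistics package with proper trend tests would be the usual choice.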

Regarding previous findings bearing on the criterion prediction, research on overlearning has typically manipulated the number of trials or amount of practice time rather than the number of correct retrievals during practice (e.g., Kratochwill et al., 1977; Rohrer et al., 2005). Only one previous study has manipulated the number of correct retrievals during practice (Nelson, Leonesio, Shimamura, Landwehr, & Narens, 1982). Using paired associates (e.g., 48-dollar), Nelson et al. (1982) required participants to retrieve items one, two, or four times. On the final test four weeks later, performance increased as the number of correct retrievals increased. However, the fairly limited range of criterion levels (1, 2, or 4 correct retrievals) makes it difficult to determine whether the relationship between final test performance and the number of times items were correctly retrieved is linear or curvilinear. Karpicke and Roediger (2007b) have examined final test performance as a function of the number of times items were correctly retrieved with a larger range of criterion levels. Their participants learned word lists using conditions in which items could be correctly recalled up to 15 times. Of interest here, results indicated a curvilinear relationship between final test performance and the number of times items were correctly retrieved. However, because the primary interest in their study was not in evaluating the effect of increasing the number of correct retrievals, these analyses were conducted post hoc (see also Pyc, Rawson, & A., submitted for publication). Therefore, these results are difficult to interpret because items were not assigned to criterion level. Items correctly retrieved more times were likely the easier items, and the results may have been due in part to item difficulty effects.
Accordingly, to extend beyond previous research, we manipulated, across a wider range of criterion levels, the number of times items had to be correctly retrieved before being dropped from test-restudy practice. Because items were assigned to criterion levels rather than grouped post hoc, we were able to evaluate the relationship between the number of times an item is correctly retrieved and subsequent memory performance without confounding it with item difficulty.
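The item-difficulty confound in post hoc grouping, and the remedy of assigning items to criterion levels, can be illustrated with a small simulation. It assumes a toy model in which each item has a fixed easiness (its probability of being recalled on any practice trial) drawn uniformly from 0.1 to 0.9, and, in the post hoc case, every item receives the same number of practice trials. These values, the item count, and the function names `post_hoc_groups` and `assign_to_criterion` are hypothetical and are not parameters from the study.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence

LEVELS = [1, 3, 5, 6, 7, 8, 10]  # criterion levels from the design section


def post_hoc_groups(
    easiness: Sequence[float], n_trials: int, rng: random.Random
) -> Dict[int, List[float]]:
    """Post hoc approach: every item gets the same number of practice trials,
    and items are grouped afterwards by how many trials were correct."""
    groups: Dict[int, List[float]] = {}
    for p in easiness:
        n_correct = sum(rng.random() < p for _ in range(n_trials))
        groups.setdefault(n_correct, []).append(p)
    return groups


def assign_to_criterion(
    items: Sequence[float], levels: Sequence[int], rng: random.Random
) -> Dict[int, List[float]]:
    """Experimental approach: shuffle items, then deal them round-robin so each
    criterion level receives an equal share (when len(items) divides evenly)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return {level: shuffled[i::len(levels)] for i, level in enumerate(levels)}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical item easiness = per-trial probability of correct recall.
    easiness = [rng.uniform(0.1, 0.9) for _ in range(7000)]

    groups = post_hoc_groups(easiness, n_trials=10, rng=rng)
    low = [p for k, ps in groups.items() if k <= 2 for p in ps]
    high = [p for k, ps in groups.items() if k >= 8 for p in ps]
    print(f"post hoc: mean easiness {mean(low):.2f} (<=2 correct) vs {mean(high):.2f} (>=8 correct)")

    by_level = assign_to_criterion(easiness, LEVELS, rng)
    print("assigned:", {lvl: round(mean(ps), 2) for lvl, ps in by_level.items()})
```

In the post hoc grouping, items that happened to reach more correct retrievals are, on average, the easier ones, so any final-test difference between groups mixes the number of retrievals with item difficulty. Under random assignment, mean easiness comes out roughly equal across criterion levels, which is what allows the criterion manipulation to be interpreted on its own.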

In sum, Experiments 1 and 2 were designed to directly test predictions from the retrieval effort hypothesis. We created conditions in which retrieval would be successful (items were required to be learned to criterion) but differentially difficult (manipulations of ISI and criterion level). Experiment 2 extended beyond Experiment 1 by including a latency measure to evaluate the ISI and criterion assumptions, which are the bases for the predictions of the retrieval effort hypothesis.


Participants and design

One hundred twenty-nine participants enrolled in Introductory Psychology at Kent State University participated in return for course credit. ISI (short versus long) was a between-participant manipulation. Criterion level (1, 3, 5, 6, 7, 8, or 10 correct retrievals per item during practice) was a within-participant manipulation. To establish the generality of the retrieval effort hypothesis, we implemented two retention intervals (RI) between practice and final test, 25 min (short RI) or one week

Experiment 2

Results from Experiment 1 supported both predictions of the retrieval effort hypothesis: higher levels of final test performance were observed after longer versus shorter ISIs between correct retrievals. Additionally, the relationship between criterion level and final test performance was curvilinear rather than linear. These results indicate that conditions under which retrieval is successful but more difficult produce greater benefits to memory than conditions under which retrieval is

General discussion

Overall, the pattern of results from Experiments 1 and 2 confirmed predictions from the retrieval effort hypothesis, which states that successful but difficult retrievals will be better for memory than successful but easy retrievals. Specifically, both ISI and criterion level manipulations influenced the difficulty of successful retrieval during practice, which in turn led to differences in final test performance. Experiment 2 extended beyond the results of Experiment 1 by providing a measure

Acknowledgments

The research reported here was supported by the Institute of Education Sciences, US Department of Education, through Grant #R305H050038 to Kent State University. The opinions expressed are those of the authors and do not represent views of the Institute or the US Department of Education.

Thanks to Tina Burke, Sean Burton, Jill Peterson, and Ericka Schmitt for assistance with data collection. Thanks also to Heather Bailey and Nic Wilkins for assistance with data analyses. A special thanks to John

References (19)

  • J.D. Karpicke et al. (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language.
  • J.R. Anderson et al. (1998). The atomic components of thought.
  • R.A. Bjork (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
  • R.A. Bjork. Assessing our own competence: Heuristics and illusions.
  • S.K. Carpenter et al. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition.
  • S.K. Carpenter et al. (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review.
  • J.E. Driskell et al. (1992). Effect of overlearning on retention. Journal of Applied Psychology.
  • J.A. Glover (1989). The testing phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology.
  • J.D. Karpicke et al. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition.