The ability to accurately assess and monitor memory performance is essential for effective learning. A commonly used measure for examining memory monitoring is the judgment of learning (or JOL), whereby participants assess the likelihood of recalling a specific item on a later test. While some research has shown that people often make accurate memory predictions (Arbuckle & Cuddy, 1969; Nelson & Dunlosky, 1991; Underwood, 1966), other studies have shown that important inaccuracies also exist (e.g., Benjamin, Bjork, & Schwartz, 1998; Koriat & Bjork, 2006; Rhodes & Castel, 2008). Such inaccuracies and potential misconceptions about memory can reveal much about what influences the assessment and monitoring of our learning.

Previous work examining how people predict future memory performance suggests that JOLs are often based on ease of processing. For example, trigrams that appear closer to actual words are judged easier to remember than trigrams that are not pronounceable (Underwood, 1966); related word pairs are judged easier to remember than unrelated word pairs (Dunlosky & Matvey, 2001); and common concrete words are judged easier to remember than rare abstract words (Begg, Duft, Lalonde, Melnick, & Sanvito, 1989). However, more recent research has provided important demonstrations that ease of processing can be an unreliable predictor of memory performance. For example, Koriat and Bjork (2006) showed that JOLs did not discriminate between word pairs with strong forward-associative strength (e.g., kittens–cats) and those with strong backward-associative strength (e.g., cats–kittens), despite cued recall being significantly worse for the backward associates (see also Castel, McCabe, & Roediger, 2007). Thus, fluency, defined as the perceived ease of a mental task (Oppenheimer, 2008), is often used when metacognitive judgments are made, and this has been shown in memory tasks that involve encoding and perceptual fluency (Hertzog, Dunlosky, Robinson, & Kidder, 2003; Kelley & Jacoby, 1996; Rhodes & Castel, 2008), as well as retrieval fluency (Benjamin et al., 1998).

Despite the intuition that items that are easier to process are easier to remember (Miele, Finn, & Molden, 2011), research has shown that if the to-be-learned material is processed in a way that challenges the learner to a certain degree, learning is enhanced for this material, a concept known as desirable difficulties (Bjork, 1994; McDaniel & Butler, 2010). Examples of the benefits of desirable difficulties include enhanced memory when one is actively solving fill-in-the-blank word pairs, rather than passively reading them (e.g., rapid–f___ vs. rapid–fast; de Winstanley, Bjork, & Bjork, 1996; Slamecka & Graf, 1978), and enhanced memory for words successfully identified when they are presented rapidly versus when they are presented long enough for reading (Hirshman & Mulligan, 1991; Nairne, 1988). In addition, Maki, Foley, Kajer, Thompson, and Willert (1990) found that participants who read a set of mixed paragraphs having either deleted letters or intact letters performed better on comprehension tests for the paragraphs that had deleted letters, and this also led to enhanced metacomprehension (see also Rawson & Dunlosky, 2002). Desirable difficulties have also been shown to enhance learning in classroom settings (Diemand-Yauman, Oppenheimer, & Vaughan, 2011). However, very little research has directly examined metacognitive accuracy to determine whether participants are aware of the potential benefits of desirable difficulties when actually studying information.

The present study was designed to determine whether people are aware of the potential benefits of disfluent encoding and, thus, desirable difficulties. To assess this, participants studied a mixed list of words presented in either inverted fashion (i.e., rotated 180°) or upright (standard reading) and were asked to provide a JOL after each word. We predicted that participants would provide higher JOLs to upright words relative to inverted words, due to perceived fluency or ease of processing. However, we predicted that participants would actually recall more of the inverted words. Better recall for inverted words would provide evidence for desirable difficulties (Bjork, 1994): Reading words that are inverted is an obstacle that the learner must overcome, and if successful, the learner will benefit by having enhanced recall for such items. If JOLs do not differentiate between inverted and upright words, despite superior recall for inverted words, it would suggest that participants are not aware of the benefits of desirable difficulties.

Experiment 1

Participants in Experiment 1 studied a mixed list of words in which half of the words were presented upright and the other half inverted, in a randomized order. Participants made predictions of how likely it was that they would later recall each word. In addition, participants were asked to say each word aloud to ensure that they were processing the word correctly. According to the notion of desirable difficulties, recall for inverted words should be enhanced relative to upright words, but participants might predict better recall for upright words if predictions were based on processing fluency.

Method

Participants

Twenty undergraduate students from the University of California, Los Angeles, participated in the experiment for course credit and were tested individually.

Materials and apparatus

The experiment used Microsoft PowerPoint on computers with 17-in. monitors to display the words and prompts for JOLs. Each word was placed at the center of a white screen in black 20-point Times New Roman font. Half of the words were rotated 180° (inverted), while the remaining half were not rotated (upright). A list of 40 words was compiled from the English Lexicon database (Balota et al., 2007), selecting concrete nouns that were four to seven letters long. The list was block randomized such that within each block of eight words, there were two words of every word length equally divided for word orientation.
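
The block randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors actually used: the function name `build_study_list`, the `words_by_length` input format, and the placeholder word labels are all hypothetical, and the sketch assumes that "equally divided" means one upright and one inverted word per word length within each block of eight.

```python
import random

WORD_LENGTHS = (4, 5, 6, 7)             # four- to seven-letter words, as reported
ORIENTATIONS = ("upright", "inverted")  # each length gets one of each per block


def build_study_list(words_by_length, n_blocks=5, seed=None):
    """Return a block-randomized list of (word, orientation) trials.

    words_by_length is a hypothetical input: a dict mapping each word
    length to a pool of at least 2 * n_blocks words of that length.
    """
    rng = random.Random(seed)

    # Shuffle a private copy of each pool so word selection is random.
    pools = {}
    for length in WORD_LENGTHS:
        pool = list(words_by_length[length])
        rng.shuffle(pool)
        pools[length] = pool

    trials = []
    for _ in range(n_blocks):
        # Eight trials per block: 4 lengths x 2 orientations.
        block = [
            (pools[length].pop(), orientation)
            for length in WORD_LENGTHS
            for orientation in ORIENTATIONS
        ]
        rng.shuffle(block)  # randomize presentation order within the block
        trials.extend(block)
    return trials


# Placeholder labels stand in for the real stimuli, which are not listed here.
placeholder_pools = {n: [f"w{n}_{i}" for i in range(10)] for n in WORD_LENGTHS}
study_list = build_study_list(placeholder_pools, seed=1)
print(len(study_list))  # 40 trials: 5 blocks of 8
```

Shuffling within blocks, rather than across the whole list, keeps word length and orientation evenly spread over serial positions, which matters for the first-half versus second-half comparison reported in the results.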

Procedure

The participants sat in front of the computer and were presented instructions on the screen, which were read aloud by the experimenter. Participants were instructed that they would see words flipped upside down and words presented upright, that they had to say each word, that they should study the word for a later test, and that they would be prompted to provide a JOL after studying the word. The JOLs were made on a scale from 0 to 100, with 0 indicating that they would definitely not recall the word and 100 being that they would definitely recall the word later, and participants were encouraged to use intermediate values as appropriate. To ensure that participants read all words, participants said each word aloud, the word remained on the screen for 4 s, and then they were prompted to give a JOL for the studied word. There was a 1-s blank slide before the next word was presented. After the last word and JOL prompt, participants engaged in a distractor task for 1 min. They were then given a free recall test and had 2 min to say aloud any words they remembered from the list, and the experimenter recorded the words recalled. Afterward, participants were asked whether they thought that they had recalled more inverted or upright words or whether there was no difference, before being debriefed.

Results and discussion

Figure 1 presents predicted and actual recall performance and shows that word orientation influenced recall but that predictions were not influenced by word orientation. A 2 (word orientation: upright or inverted) × 2 (measure: JOL or recall) repeated measures analysis of variance (ANOVA) was conducted and revealed a main effect of word orientation, F(1, 19) = 6.47, MSE = 101.74, p = .02, ηp² = .25, a main effect of measure, F(1, 19) = 16.90, MSE = 363.45, p = .001, ηp² = .47, and a significant interaction between word orientation and measure, F(1, 19) = 6.94, MSE = 59.84, p = .02, ηp² = .27. Recall was significantly greater for inverted words (M = 35.29, SE = 2.56) than for upright words (M = 25.00, SE = 2.49), t(19) = 3.07, p < .01, d = 0.69, but there were no differences in JOLs between inverted words (M = 48.26, SE = 3.44) and upright words (M = 47.08, SE = 3.18), t(19) = 0.53, p > .05.
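
For readers who want to check the effect sizes against the test statistics, the following Python sketch recomputes them from the reported F and t values. It relies on two standard conventions that are assumptions here, since the text does not state which formulas were used: partial eta squared computed as (F × df_effect) / (F × df_effect + df_error), and Cohen's d for a paired comparison computed as t / √n. The function names are illustrative.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_paired(t_value, n):
    """Standardized mean difference for a paired t test (assumed convention)."""
    return t_value / math.sqrt(n)


# F values reported for the 2 x 2 ANOVA in Experiment 1, each with df = (1, 19).
reported_f = {"word orientation": 6.47, "measure": 16.90, "interaction": 6.94}
for effect, f_value in reported_f.items():
    print(effect, round(partial_eta_squared(f_value, 1, 19), 2))
# word orientation 0.25
# measure 0.47
# interaction 0.27

# Paired comparison of recall for inverted vs. upright words, t(19) = 3.07, n = 20.
print("d", round(cohens_d_paired(3.07, 20), 2))
# d 0.69
```

Under these conventions the recomputed values agree with the reported effect sizes to two decimal places.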

Fig. 1. Mean predicted (judgments of learning [JOLs]) and actual recall performances for upright and inverted words in Experiment 1. Error bars represent standard errors of the means in all figures.

People do not often encounter inverted text, which may make it (feel) unusual or distinct, and this could then lead to recall advantages (see Hunt, 2006, for a more comprehensive discussion of distinctiveness). In the present experiment, inverted words might have been perceived as less distinctive if they occurred later in the list, leading to less of a recall advantage. To examine this, we compared recall and JOLs in the first half and second half of the list, as displayed in Fig. 2. For recall, a 2 (word orientation) × 2 (list portion) ANOVA was conducted and revealed that there was a main effect of list portion; more words were recalled from the last half of the list (M = 34.28, SE = 2.26) than from the first half (M = 27.25, SE = 2.45), F(1, 19) = 6.95, MSE = 142.18, p < .05, ηp² = .27. There was also a main effect of word orientation such that recall was better for inverted words (M = 36.53, SE = 2.74) than for upright words (M = 25.00, SE = 2.49), F(1, 19) = 10.85, MSE = 244.93, p < .01, ηp² = .36. However, there was no significant interaction of word orientation and list portion, F(1, 19) = 1.46, p = .24, ηp² = .07. In terms of predictions, JOLs were higher for the first half of the list (M = 50.65, SE = 3.30) than for the last half (M = 44.73, SE = 3.07), F(1, 19) = 23.53, MSE = 29.75, p < .001, ηp² = .55. JOLs did not differ for inverted words (M = 48.29, SE = 3.44), as compared with upright words (M = 47.09, SE = 3.19), F < 1. There was no significant interaction of word orientation and list portion for JOLs, F < 1. Thus, it appeared that distinctiveness, as examined and defined in this manner, played very little role in influencing participants’ recall performance.

Fig. 2. Mean predicted (judgments of learning [JOLs]) and actual recall performance for upright and inverted words in the first half and second half of the list in Experiment 1.

Overall, participants’ JOLs were not sensitive to word orientation, but participants recalled more inverted words relative to upright words (Fig. 1). While JOLs and recall were positively correlated, these correlations did not differ between upright and inverted words. Interestingly, 65% of participants were able to accurately report which orientation they recalled better after the recall test. If some participants were able to accurately assess how they performed after one list, would participants perhaps learn to assign higher JOLs for inverted words, relative to upright words, after the first list in multiple study–test cycles? Experiment 2 examined this possibility in greater detail.

Experiment 2

In Experiment 2, participants were given three study–test cycles with different lists to determine whether JOLs may become sensitive to word orientation with task experience. Thus, the paradigm was very similar to Experiment 1, except that participants studied three different lists with words presented in inverted or upright fashion and were given recall tests after each list (but were not asked about their recall performance after each list). Other studies have shown that multiple study–test cycles can lead to improvements regarding JOL–recall calibration accuracy (e.g., Castel, 2008; Koriat, 1997; Koriat & Bjork, 2006; Rhodes & Castel, 2008; Tauber & Rhodes, 2010). Because some participants in Experiment 1 were able to accurately report which word orientation had the greater impact on recall, one might expect that on subsequent lists, participants would provide higher JOLs for inverted words and lower JOLs for upright words, as reflected by their own recall performance.

Method

Participants

Twenty-four undergraduate students from the University of California, Los Angeles, participated in the study for course credit and were tested individually.

Materials and apparatus

Experiment 2 was almost identical to Experiment 1, with the exception that three lists of 28 common words were constructed from the same source.

Procedure

The procedure was almost identical to that in Experiment 1, with the exception that after the free recall test on list 1, participants engaged in two more study–test cycles with new lists. Unlike in Experiment 1, postretrieval questions were not asked after each list.

Results and discussion

As is shown in Fig. 3, and similar to Experiment 1, recall performance was better for inverted words across all lists, but predictions did not differ between the two orientations. Separate analyses for recall and JOLs, as a function of list, are presented below:

Fig. 3. Mean predicted (judgments of learning [JOLs]) and actual recall performance for both types of word orientation as a function of list (lists 1–3) in Experiment 2.

Recall

Mean recall performance for each word orientation is presented in Fig. 3. The data were analyzed in a 2 (word orientation) × 3 (list) repeated measures ANOVA. Recall was significantly better for inverted words (M = 44.05, SE = 2.33) than for upright words (M = 34.52, SE = 1.82), F(1, 23) = 49.07, MSE = 66.55, p < .001, ηp² = .68. There was no main effect of list; recall did not appear to reliably differ from list 1 (M = 40.92, SE = 2.32) to list 2 (M = 39.58, SE = 2.19) to list 3 (M = 37.35, SE = 2.45), F(1, 23) = 1.47, p > .05, ηp² = .06. There was no significant interaction of word orientation and list, F < 1.

JOLs

Mean JOLs for each word orientation are presented in Fig. 3. The data were analyzed in a 2 (word orientation) × 3 (list) repeated measures ANOVA, which revealed no main effect of word orientation; JOLs did not differ for inverted words (M = 45.51, SE = 2.40) and upright words (M = 45.04, SE = 2.49), F(1, 23) = 0.10, p > .05, ηp² = .004. There was a significant main effect of list, since overall JOLs declined from list 1 to list 3, F(1, 23) = 12.28, MSE = 119.34, p < .001, ηp² = .35. There was no significant interaction of word orientation and list, F < 1.

The results from Experiment 2 demonstrated that participants were unaware that recall was better for inverted words. JOLs were reduced in later lists, and this resulted in better overall calibration. However, participants did not learn to differentiate JOLs for the upright and inverted words in later lists, in contrast with other research that has shown that task experience and knowledge updating can lead to improvements in JOL accuracy (e.g., Castel, 2008; Dunlosky & Hertzog, 2000; Tauber & Rhodes, 2010). In terms of JOLs in the present set of experiments, it appears that the relative benefits of desirable difficulties created by reading inverted text are not incorporated into JOLs, even after experience with several study–test sessions.

General discussion

The results from the two experiments showed that participants recalled more inverted than upright words but that JOLs did not differentiate between the two types of presentations, a finding that persisted with task experience. Thus, learners may not be aware of factors that can enhance learning, and ease of processing does not always predict learning (Bjork, 1994). However, some participants may be aware of the benefits of processing inverted words, as assessed by the postexperiment questionnaire in Experiment 1. It may be that JOLs are captured by current item-level processing that is less likely to tap memory knowledge or beliefs, whereas global judgments and questionnaires do tap that knowledge (e.g., Dunlosky & Hertzog, 2000; Kornell, Rhodes, Castel, & Tauber, in press). That is, while some participants may believe that effortful processing can enhance learning (Miele et al., 2011), this belief is not appropriately incorporated when item-specific JOLs are made. If desirable difficulties can, indeed, enhance learning (e.g., Diemand-Yauman et al., 2011; Maki et al., 1990), one must carefully consider how people can become aware of how to effectively enhance learning under challenging learning conditions. The present work shows that some degree of task experience does not necessarily allow for sufficient awareness of the potential benefits of effortful processing (relative to reading) and desirable difficulties.

It may be that distinctiveness is driving the superior recall for inverted words, since inverted words are not commonly encountered when reading. The precise manner in which distinctiveness (and more specifically, the perception of distinctiveness) could influence metamemory and memory in the present task is somewhat complex (see Hunt, 2006), especially given that half of the words were inverted and half were upright, making both text types “relatively” typical in the present task. If inverted words are considered more novel or unusual in the real world, relative to upright text, one might expect that inverted words presented at the earlier part of the list would be recalled better than later inverted words or JOLs would be sensitive to text orientation. But this was not found to be the case. Instead, reading inverted words may act like a generation effect (Slamecka & Graf, 1978), leading to more elaborative and effortful processing, which could then confer a memory advantage (see also Lindsay & Kelley, 1996, for similar dynamics that can alter recognition memory performance). However, participants’ JOLs were not sensitive to the benefits conferred by this type of processing, possibly because the task was more perceptual in nature, relative to generation tasks that involve more effortful retrieval processing, semantic processing, and/or production from word fragments (see Begg et al., 1989; de Winstanley et al., 1996).

We note that JOLs appeared to be better calibrated for inverted words relative to upright words (especially on later study–test cycles in Experiment 2), and this could be for a number of reasons. Calibration may be accurate as a result of JOLs being anchored at around 50%, which happens to be the same level of recall (e.g., if there were a longer delay, the JOLs would likely overestimate recall). JOLs may be better calibrated for inverted words because the analytical processing associated with inverted text leads to more analytical processing for the JOLs. It may be that people overestimate the effects of simply reading, relative to engaging in deeper processing. In the present study, participants read all words aloud, so JOLs could be based on the fluency associated with the end product of saying each word (and not simply the act of reading each word), which could result in no differences in JOLs for the two types of orientations. Thus, participants may encounter some ease of processing when pronouncing both inverted and upright words, leading to similar JOLs but overconfidence for upright text, due to fluency and a general guiding heuristic that suggests that upright text is easier to process, and the sometimes inaccurate assumption that more easily learned information will be remembered better. Future research could examine whether individual differences in the beliefs regarding ease of processing and theories of intelligence influence JOLs (e.g., Miele et al., 2011) and whether participants may be more or less likely to restudy inverted words on the basis of beliefs regarding ease of processing and learning.
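
The calibration pattern described above can be illustrated with the condition means reported in the two Results sections. The Python sketch below treats calibration as the signed difference between mean JOL and mean percent recall, with positive values indicating overconfidence; this operationalization is an assumption made for illustration, since the text does not define calibration formally. The Experiment 2 means are collapsed across the three lists, so the sketch cannot show the list-by-list trend.

```python
# (mean JOL, mean percent recall) as reported in the Results sections.
REPORTED_MEANS = {
    ("Experiment 1", "upright"): (47.08, 25.00),
    ("Experiment 1", "inverted"): (48.26, 35.29),
    ("Experiment 2", "upright"): (45.04, 34.52),   # collapsed across lists 1-3
    ("Experiment 2", "inverted"): (45.51, 44.05),  # collapsed across lists 1-3
}


def calibration_bias(mean_jol, mean_recall):
    """Signed JOL-minus-recall difference; positive means overconfidence."""
    return mean_jol - mean_recall


for (experiment, orientation), (jol, recall) in REPORTED_MEANS.items():
    print(experiment, orientation, round(calibration_bias(jol, recall), 2))
# Experiment 1 upright 22.08
# Experiment 1 inverted 12.97
# Experiment 2 upright 10.52
# Experiment 2 inverted 1.46
```

By this measure, overconfidence is smaller for inverted than for upright words in both experiments, because JOLs were nearly identical across orientations while recall was higher for inverted words.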

The present study provides important insight regarding how effortful processing can enhance memory and to what degree people are aware of the benefits of desirable difficulties. If people believe that information that is easily learned is easier to recall later, any form of disfluency encountered during learning will be perceived as detrimental. The present research suggests that people may not appreciate the benefits of effortful processing, possibly due to inaccurate mental models regarding the relationship between ease of processing and the efficiency of learning. The present results suggest that the seemingly “effortful processing” required to achieve mnemonic benefits does not have to be perceived as laborious by the learner; it may occur with the learner being unaware of it.