Is the Levels of Processing effect language-limited?
Introduction
Craik and Lockhart (1972) proposed that memory is a by-product of processing: the deeper the processing, the better the retention. Their paper is one of the most highly cited in the history of cognitive psychology (Roediger & Gallo, 2001), and “one of the most influential systematic conceptual frameworks within which problems of memory can be raised and investigated” (Tulving, 2001, p. 24). While the assumption of a series of levels leading from perceptual to semantic was subsequently abandoned (Craik & Tulving, 1975), Levels of Processing (LOP) has continued to serve as a broad theoretical framework, accounting for a wide range of data within the field of human memory and potentially providing a fruitful basis for further investigation (Conway, 2002). Furthermore, the principle underlying the levels approach is of considerable practical relevance, providing a valuable means of improving learning, in contrast to the common tendency for learners to rely on rote rehearsal.
On the other hand, despite many replications and the magnitude of the effects shown (a series of studies by Hyde and Jenkins (1969) and Walsh and Jenkins (1973) yielded an average effect size based on Cohen’s d of 2.27), the use of the framework to broaden our knowledge of human memory has been somewhat limited. One exception to this comparative lack of development comes from the demonstration by Tulving and Thomson (1973) of the importance of the match between encoding and retrieval in determining memory performance. This point was further developed with the introduction of the concept of Transfer Appropriate Processing (TAP), as proposed by Morris, Bransford, and Franks (1977). They showed that shallow phonological coding led to better performance than deeper semantic coding when rhyming words were used as retrieval cues for the items to be recalled, again demonstrating that memory performance depends crucially on conditions at retrieval, as well as at encoding. The concept of TAP is an important reminder that retrieval needs to be considered, but leaves open the question of how to determine transfer appropriateness.
In an attempt to develop the concept of TAP, Roediger (Roediger & Blaxton, 1987; Roediger et al., 1989) proposed linking it to the distinction between explicit episodic memory and more automatic implicit memory. Most explicit memory tasks involve processing in terms of meaning, hence benefiting from deeper encoding, while implicit tasks tend to be perceptually based, depending more on the exact replication of shallower encoding cues. However, although many examples in the literature fitted this pattern, it is not always possible to make a clear distinction between perceptual data-driven levels of analysis and analysis at a more conceptual or semantic level. Roediger, Srinivas, and Weldon (1989) proposed that any given situation could have components involving both levels of analysis, which might or might not trade off against each other. While plausible, this compounds the problem of measuring transfer appropriateness. Furthermore, data began to appear suggesting that dissociations occurred within the proposed perceptual and conceptual paradigms (Hunt & Toth, 1990), presenting further difficulties in using TAP as a way of developing the original LOP approach, and leading Roediger (2002, p. 321) to conclude: “we suggest that the field in general has not yet been able to develop an adequate characterization of procedures that account for memory phenomena despite efforts in this direction”.
One important question to be asked of any theoretical framework concerns its breadth of application. As Roediger and Gallo (2001, p. 42) observe, LOP can be regarded as “a special case of transfer-appropriate processing that applies to memory for words in meaning-based tests”. However, although language is clearly important, it is only part of our capacity to experience and remember the world, suggesting a need for LOP studies of non-verbal memory. We describe a series of experiments that began with the question of whether reliable LOP effects could be demonstrated using visual material. As relatively little is known, we adopted an exploratory approach of comparing LOP effects for a range of visual and verbal materials. Our results show that different types of visual materials all yield modest LOP effects, whereas verbal materials give a wider range of effects, such that the dramatic advantage to deep encoding typically found depends crucially on the nature of the material. These findings led us to propose a modified explanation of LOP effects that takes into account the “affordances” of a stimulus (Gibson, 1977) and applies to both verbal and non-verbal material.
An early critique of the LOP concept (Baddeley, 1978) noted the lack of evidence for LOP effects using visual stimuli. Although subsequent research on LOP has also been dominated by the use of verbal stimuli, a number of studies have been performed across a range of other modalities, though largely using implicit memory measures for which LOP effects were, unsurprisingly, found not to apply (Graf & Mandler, 1984; Jacoby & Dallas, 1981). There appears to be very little investigation of the LOP effect in studies of explicit episodic memory using nonverbal stimuli. Some exceptions to this generalization do, however, occur.
In the case of music, Halpern and Bartlett (2010) comment on a paucity of LOP studies in the literature, reporting only one positive result: Peretz, Gaudreau, and Bonnel (1998) found that judgments of the familiarity of a tune led to better subsequent recognition than judging the instrument playing the tune. Halpern and Bartlett add, however, that “the current authors failed to find LOP effects for unfamiliar music on numerous occasions (some published, some languishing in bottom drawers)” (Halpern & Bartlett, 2010, p. 234).
Attempts have also been made to study LOP effects in olfactory memory. Lyman and McDaniel (1986) varied encoding instructions in a study involving recognition of 30 odors after a 1 week delay. No difference in hit rate was found, but an advantage on a d′ measure suggested that attempting to name and define each odor or linking it to a life episode led to better performance than forming a visual image or simply trying to memorize each stimulus. A subsequent replication by Zucco (2003) again found a significant effect for d′ but not hit rate, with only the life episode condition showing a significant advantage. These results suggest a modest overall effect of deeper processing, operating mainly through reducing false alarm rate, far from the robust effects typical of verbal material.
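The dissociation between hit rate and d′ reported in these studies can be illustrated with a short calculation. The sketch below assumes the standard equal-variance signal detection definition, d′ = z(H) − z(FA); the hit and false alarm rates are hypothetical values chosen for illustration, not data from Lyman and McDaniel (1986) or Zucco (2003).

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance signal detection sensitivity: z(H) - z(FA)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates (not from the cited studies): identical hit rates,
# but the "deep" condition produces fewer false alarms.
shallow = d_prime(hit_rate=0.80, false_alarm_rate=0.30)
deep = d_prime(hit_rate=0.80, false_alarm_rate=0.15)

print(f"shallow d' = {shallow:.2f}, deep d' = {deep:.2f}")
```

With the hit rate held constant, any difference in d′ between the two conditions comes entirely from the false alarm rate, which is the pattern the olfactory studies describe.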
There have been rather more attempts to detect LOP effects in visual memory, reflected largely in studies of memory for faces. Warrington and Ackroyd (1975) report better face recognition following pleasantness judgments than following estimation of the person’s height, a somewhat challenging task from a portrait photograph. A much easier “shallow” task was used by Bower and Karlin (1974): judging the sex of the person portrayed. This proved less effective in facilitating subsequent recognition than did judgments of likeableness or honesty. This could, however, simply reflect the need to scan the face more intently in order to make these “deeper” judgments, as proposed by Winograd (1981), who found that an instruction to identify the most distinctive feature of a given face was more effective than the apparently deeper task of making a personality judgment. On the other hand, a study by Patterson and Baddeley (1977), which compared categorization on physical dimensions such as nose size and thickness of lips with judgments of pleasantness or intelligence, found the former to be slightly less effective. An attempt to increase depth of processing by providing a semantic context for each face, in the form of a description of the unfamiliar person’s occupation, background and habits, proved ineffectual, however (Baddeley, 1982; Baddeley & Woodhead, 1982). An attempt to maximize TAP by presenting the contextual information at both encoding and recognition did increase rate of detection, but this proved to be entirely attributable to inducing a positive response bias (Baddeley & Woodhead, 1982), with participants also more likely to erroneously say yes to a novel face if it was accompanied by a previously presented description. Once again, therefore, although it would be unwise to rule out the possibility of an LOP effect for faces, any such effects are clearly far weaker than those routinely found for verbal materials.
It could be argued, of course, that despite the obvious importance of faces, they are a rather special form of visual stimulus, with their own specific anatomical processing area (Kanwisher, McDermott, & Chun, 1997), possibly also associated with a relatively automatic link to emotional coding (Öhman, 2009). For that reason, it is important to extend the study of LOP effects in visual memory to other stimuli. Unfortunately, the small number of studies that have attempted this previously have used different methods and given contradictory results: D’Agostino, O’Neill, and Paivio (1977) found a positive effect using readily nameable line drawings, while Intraub and Nicklos (1985) found a negative effect for some of their cued recall conditions. This suggests the need for a more systematic approach.
The starting point for our investigation was the observation that any study of the role of LOP must deal with three variables: the nature of the initial encoding (deep versus shallow); the nature of the retrieval test, bearing in mind the importance of TAP; and the characteristics of the material to be remembered. Neither the method of encoding nor the range of materials involves a simple binary choice, hence the range of possible experiments becomes very large indeed. For that reason we fixed our deep encoding method, basing it on judgments of pleasantness, and always used a four-alternative forced-choice recognition retrieval measure. Holding constant the method of ensuring deep encoding and the testing procedure then allowed us to manipulate the variable central to our enquiry, the nature of the material, allowing comparison between visual and verbal memory and, importantly, of variations in material within each modality. This approach raises a number of further issues, which are discussed next.
The first concerns our selection of judgments of pleasantness as our deep encoding procedure. We did this because we needed a semantic judgment that is readily applicable to a wide range of materials. In his attempt to develop a measure of meaning that extended beyond verbal material, Osgood developed a complex rating scale, the semantic differential. Factor analysis suggested that it yielded three factors, of which the strongest was consistently the hedonically evaluative good–bad dimension (Osgood, May, & Miron, 1975). In the case of words, encoding on this dimension has been shown to produce a particularly powerful LOP effect (Hyde & Jenkins, 1969), and indeed Packman and Battig (1978) found that judgments of pleasantness were substantially more effective than other “deep” judgments such as concreteness or meaningfulness. Furthermore, the widespread use of pleasantness judgments in clinical assessments such as the Warrington (1984) recognition test involving words and faces reflects the fact that it is a task that participants find natural and relatively easy to use for both verbal and visual stimuli.
Choosing shallow encoding tasks is less straightforward, given that they need to be applicable to both visual and verbal material and to ensure that participants process the stimuli at the required level. To avoid the risk of basing our conclusions on a single atypical task, we used a range of different “shallow” processing instructions. Our earlier research, concerned with developing a clinical test of visual memory, opted to use door scenes as they are familiar and allow a range of degrees of similarity and resulting difficulty. Two lists of 12 doors tested using four-alternative forced choice proved both sensitive to memory deficit and patient-friendly (Baddeley, Emslie, & Nimmo-Smith, 1994).
Photographing doors subsequently proved addictive to A.B., resulting in a database of over 2000 visual stimuli. In order to increase their experimental usability we classified each item along a range of dimensions, thus making it relatively easy to select sets of differing levels of inter-item similarity (Baddeley, Hitch, Quinlan, Bowes, & Stone, in press). In addition to our having a very large, readily available set, doors have the advantage that, unlike faces, they almost certainly do not have a specific brain area devoted to their processing and are unlikely to have atypically strong links to emotional and social processing (Öhman, 2009).
Having established in pilot work that an LOP effect can be obtained using door scenes, we continued to include door stimuli as a baseline against which other types of visual and verbal stimulus material could be compared. This led to the question of which other types of material to use. In this essentially exploratory study, rather than setting up and testing precise hypotheses, we used pragmatic constraints to select our material. We opted for lists of 24–30 items per condition. Choosing four-choice recognition rather than two-alternative or yes/no recognition reduced baseline guessing, obviating the need for longer lists. We wanted to maintain certain characteristics of our doors test, namely that the items should come from a single broad semantic category, and that there should be sufficient similarity between items to allow a level of recognition approximately equal across materials. It is worth noting at this point that simply selecting visual recognition items from a wide range of categories, with distractors chosen at random, tends to lead to levels of performance of 90% or more, even with very long lists (Brady et al., 2008; Konkle et al., 2010a; Nickerson, 1965; Standing et al., 1970). The experiments that follow reflect these constraints.
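The point about baseline guessing can be made concrete. The sketch below assumes pure random guessing among equally likely alternatives and uses the binomial distribution to ask how likely a guesser is to reach a given score. The list length of 24 is taken from the range stated above, while the 75% criterion is an arbitrary illustration.

```python
from math import ceil, comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more correct out of n under a binomial model."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_items = 24       # list length within the 24-30 range used here
criterion = 0.75   # arbitrary illustrative score (proportion correct)
k = ceil(criterion * n_items)

for label, n_alternatives in [("two-alternative", 2), ("four-alternative", 4)]:
    chance = 1 / n_alternatives
    tail = p_at_least(k, n_items, chance)
    print(f"{label}: chance = {chance:.2f}, "
          f"P(score >= {k}/{n_items} by guessing) = {tail:.2g}")
```

Under these assumptions, the same score is far less likely to arise by guessing alone with four alternatives than with two, so a shorter list can still separate real memory from chance.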
Experiment 1 therefore compares recognition memory for door scenes or concrete words processed either “deeply” in terms of pleasantness, or more shallowly in terms of stimulus color. Experiment 2 attempts to replicate this with different sets of stimuli and a different shallow processing task, while Experiments 3a, 3b and 3c explore the generality of our initial findings by extending them to a broader range of visual and verbal materials, using the method of converging operations to determine which aspects of the material are crucial.
Design and procedure
A 2 × 2 within-participants design combined two types of material, doors and words, and two types of encoding instruction, involving judgments of pleasantness and color. All participants were tested on each of the four conditions in counterbalanced order. A total of 20 student volunteers were tested. They and all participants in the remaining studies were
Design
A total of 24 participants were each tested on three types of stimuli (doors, names and occupations), in each case processed at two levels, shallow and deep. Each encoded list was followed by an immediate four-alternative forced-choice test. All participants completed all six conditions, half beginning with the shallow condition and half with deep. The order of stimulus presentation within each encoding condition was counterbalanced using a 3 × 3 Latin square. Half began with the three deep
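One way to realise a 3 × 3 Latin square counterbalancing scheme of this kind is sketched below. It assumes a simple cyclic square and assigns participants to rows in rotation; the actual square and assignment rule used in the experiment are not given in the text, so both are illustrative assumptions.

```python
def cyclic_latin_square(items: list[str]) -> list[list[str]]:
    """Each row is the previous row rotated by one position, so every item
    appears once in each row and once in each serial position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


materials = ["doors", "names", "occupations"]
square = cyclic_latin_square(materials)

n_participants = 24
# Assumed assignment rule: participant i receives row i mod 3.
orders = [square[i % len(square)] for i in range(n_participants)]

for row in square:
    print(" -> ".join(row))
```

With 24 participants and this assumed rotation, each of the three presentation orders is used eight times, so each material appears equally often in each serial position.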
Experiment 3a
The materials selected were as follows:
- (1) The 240 doors used in Experiment 2.
- (2) A total of 240 clocks. These were selected from the internet using Google Search under five subcategories: circle clocks, square clocks, pendulum clocks, alarm clocks, and street clocks. All words on the pictures were removed using Adobe Photoshop CS 2.256.
- (3) A total of 240 verbal items came from food menus, again selected from the internet using search terms: Chinese food menu, English food menu, Japanese food menu, Dessert
Experiment 3b
The overall design is identical to that of 3a, except that different materials were used and different shallow processing judgments were required. In this case the shallow judgment was to report the dominant color of each stimulus.
Experiment 3c
This used the same overall design as 3a and 3b, using a pleasantness judgment for deep processing, this time compared with a shallow judgment of whether the stimulus had one dominant color or was multi-colored. Three types of material were used: one comprising the same 30 doors, and a second stimulus set comprising scenes as used by Konkle, Brady, Alvarez, and Oliva (2010b) at http://cvcl.mit.edu/MM/sceneCategories.html (Appendix). We used 240 scene images from 10 categories (streams, libraries,
General discussion
We will begin by summarizing our results before going on to suggest an interpretation. This will then be applied to the studies of the effect of LOP on verbal and nonverbal material more generally, as summarized in the introduction, before concluding with a discussion of the potential significance of our results for other recent studies of visual LTM.
We set out with a broad question: is the positive effect of deep processing limited to language-based materials? We compared the effect of
Author note
We are grateful to Philip Quinlan for his help and advice, and to Stephanie Chung, Po Fu, Stephanie Motley, Susan Oei, Stephen Rhodes, Natalie Whitehead and Suet Wong for their contribution to the development of material and to testing and to Fergus Craik and James Nairne for their constructive comments on an earlier draft.
References (54)
- Craik & Lockhart (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior.
- (1967). Subjective organization and list organization as determinants of free-recall and serial-recall memorization. Journal of Verbal Learning and Verbal Behavior.
- Graf & Mandler (1984). Activation makes words more accessible, but not necessarily more retrievable. Journal of Verbal Learning and Verbal Behavior.
- Morris, Bransford, & Franks (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior.
- Walsh & Jenkins (1973). Effects of orienting tasks on free recall in incidental learning: “Difficulty,” “effort,” and “process” explanations. Journal of Verbal Learning and Verbal Behavior.
- Baddeley (1978). The trouble with levels: A re-examination of Craik and Lockhart’s framework for memory research. Psychological Review.
- Baddeley (1982). Domains of recollection. Psychological Review.
- Baddeley, Emslie, & Nimmo-Smith (1994). Doors and people: A test of visual and verbal recall and recognition.
- Baddeley, A. D., Hitch, G. J., Quinlan, P., Bowes, L., & Stone, R. (2016). Doors for memory: A searchable database....
- Baddeley & Woodhead (1982). Depth of processing, context, and face recognition. Canadian Journal of Psychology.