Elsevier

Cognition

Volume 131, Issue 2, May 2014, Pages 263-283
Scrutinizing visual images: The role of gaze in mental imagery and memory

https://doi.org/10.1016/j.cognition.2014.01.003

Highlights

  • When visually imaging a previously seen shape observers move their eyes.

  • Gaze concentrates on empty regions where salient features originally appeared.

  • Gaze during imagery enacts gaze shown during perception of the same shape.

  • The fidelity of the enactment predicts memory accuracy for an object’s visuospatial properties.

  • Interfering with gaze during recall lowers the quality of the memory.

Abstract

Gaze was monitored by use of an infrared remote eye-tracker during perception and imagery of geometric forms and figures of animals. Based on the idea that gaze prioritizes locations where features with high information content are visible, we hypothesized that eye fixations should focus on regions that contain one or more local features that are relevant for object recognition. Most importantly, we predicted that when observers looked at an empty screen and at the same time generated a detailed visual image of what they had previously seen, their gaze would probabilistically dwell within regions corresponding to the original positions of salient features or parts. Correlation analyses showed positive relations between gaze’s dwell time within locations visited during perception and those in which gaze dwelled during the imagery generation task. Moreover, the more faithful an observer’s gaze enactment, the more accurate was the observer’s memory, in a separate test, of the dimension or size in which the forms had been perceived. In another experiment, observers saw a series of pictures of animals and were requested to memorize them. They were then asked later, in a recall phase, to answer a question about a property of one of the encoded forms; it was found that, when retrieving from long-term memory a previously seen picture, gaze returned to the location of the part probed by the question. In another experimental condition, the observers were asked to maintain fixation away from the original location of the shape while thinking about the answer, so as to interfere with the gaze enactment process; such a manipulation resulted in measurable costs in the quality of memory. We conclude that the generation of mental images relies upon a process of enactment of gaze that can be beneficial to visual memory.

Introduction

In his book Inquiries into Human Faculty and its Development (1883), Sir Francis Galton discussed mental imagery as a special ability of human visual memory. Specifically, he wondered whether mental images could be “so clear and sharp as […] to be scrutinized with nearly as much ease and prolonged attention as if they were real objects.” Galton prompted his informants to “think of some definite object—suppose it is your breakfast-table as you sat down to it this morning—and consider carefully the picture that rises before your mind’s eye […] Is the image dim or fairly clear? […] Are all the objects pretty well defined at the same time, or is the place of sharpest definition at any one moment more contracted than it is in a real scene?” Reports about the “definition” of the imagined breakfast items varied very much across individuals; however, a common report was that one or two objects would appear much more distinct than the others but these could come out clearly if attention be paid to them. Thus, different objects were not clear all at once but only successively, by focusing attention on them at different time points.

About a century later, although accounts of imagery no longer relied exclusively on introspective reports, modern cognitive psychologists also concluded that whenever we generate a visual image of an object, the different parts of the object are not clear all at once but only successively (e.g., Hebb, 1968, Neisser, 1976). Kosslyn (1980, 1994) has also put forward an influential computational model for visual imagery, according to which each part of an image is added in successive steps (Kosslyn et al., 1988, Kosslyn et al., 1983). Visual images take time both to generate and to inspect and, in many respects, they strongly resemble the normal perception of objects at close range, where a high-resolution perceptual representation of the object cannot be achieved in a single glance but a series of eye movements must bring into ‘foveal’ focus the different parts of the object.

One remarkable finding of several studies of imagery is that while imagining something there appears to be a lot of motor activity, which resembles the exploratory movements typically made during perceptual scrutiny of an object or scene. Jacobson (1932; see also Totten, 1935) had originally observed with a galvanometer that engaging in imagery (e.g., recollection) resulted in the measurement of action potentials in muscle groups that were specific to the body part which was imaginatively moved (e.g., during visual imagination, movements of the eye-balls were registered, while when thinking, one could register brief contractions in muscles of the tongue). Moreover, several researchers have noticed a remarkable similarity in the duration of imagined actions compared to the time it takes to perform them (e.g., Decety, 1996, Decety et al., 1989, Jeannerod, 1994, Parsons, 1987). These findings clearly implicate the presence of motor processing during imagery, although the motor processes would often seem to constitute only a subset of those activated during overt movement (Ellis, 1995).

According to recent studies, gaze patterns (i.e., fixations and/or direction of saccades) that are measured in real time during recollection of a previous event look remarkably similar to the scanpaths during a perceptual recognition test of the same scene, despite the fact that when thinking about the episode there is nothing at all to look at on a blank computer screen. This phenomenon has been repeatedly observed in a variety of studies (e.g., Moore, 1903, Altmann, 2004, Brandt and Stark, 1997, Brandt et al., 1989, de’Sperati, 2003, Gbadamosi and Zangemeister, 2001, Hollingworth, 2005, Humphrey and Underwood, 2008, Jeannerod and Mouret, 1962, Johansson et al., 2006, Laeng and Teodorescu, 2002, Laeng et al., 2007, Martarelli and Mast, 2013, Renkewitz and Jahn, 2012, Spivey and Geng, 2001). It would seem that, when retrieving a visual image or episode, not only do spontaneous eye movements occur, but these also tend to reflect the content of the original scene. Deckert (1964) had observed that participants instructed to imagine a beating pendulum developed pursuit ocular movements of a frequency comparable to the frequency of a previously seen real pendulum. Intriguingly, studies of rapid eye movements or REM during sleep also would seem to show some relationship between the types of eye movements and the content of dreams (e.g., Aserinsky and Kleitman, 1953, Dement and Kleitman, 1957, Doricchi et al., 2007, Hong et al., 1997, Hong et al., 2009, Roffwarg et al., 1962) as well as time-locked activity within primary visual cortex (Miyauchi, Misaki, Kan, Fukunaga, & Koike, 2009).

At first glance, the above phenomena are puzzling because it seems a meaningless expenditure of bodily energy and cognitive effort to move the eyes about when there is nothing to be seen. Purposeful saccades that cannot garner any visual input appear completely paradoxical in relation to normal visual processing, since the pattern of saccadic movements during perception seems to be purposefully guided towards visual information or ‘objects’ that are relevant for the cognitive system at that particular time (e.g., Einhäuser et al., 2008, Findlay and Gilchrist, 2003, Hayhoe and Ballard, 2005, Noton and Stark, 1971a, Noton and Stark, 1971b, Rothkopf et al., 2007, Rucci et al., 2007, Schütz et al., 2012, Stark and Ellis, 1981, Trommershäuser et al., 2009, Yarbus, 1967). Importantly, eye movements indicate the occurrence of shifts in spatial attention (Craighero et al., 2004, Deubel and Schneider, 1996, Henderson, 1992, Moore and Fallah, 2001, Rolfs et al., 2011, Shepherd et al., 1986) and covert visual attention may consist in the motor preparation of an eye movement (Rizzolatti et al., 1983, Rizzolatti et al., 1987). Hence, oculomotor activity could overload the cognitive system and/or interfere with other processes (cf. Loftus, 1972). Since the early days of research on mental imagery, both Francis Galton and Alfred Binet (Hadamard, 1945, pp. 72–73) had suggested that there may be an antagonism between the vividness or detail of a visual image and the presence of other activities.

A solution to the above puzzle is to assume that, contrary to the idea that such “empty” looks during recollection and imagination are either deleterious or irrelevant to cognition, they may actually serve some useful function. There is growing evidence for shared mechanisms of perception and imagery (e.g., Kan et al., 2003, Kosslyn and Thompson, 2000). In addition, the idea that perception is “active” or “embodied” has been gaining strength over the years within the cognitive sciences and neurosciences (Barsalou, 1999, Ellis, 1995, Findlay and Gilchrist, 2003, Gibbs, 2006, Gibson, 1979, Pezzulo et al., 2001, Pulvermüller and Fadiga, 2010). This perspective stresses the idea that the visual system does not merely register its environment but explores it and poses questions by “grasping” objects with the eyes and/or hands (Ballard et al., 1997, Castelhano et al., 2009, Karn and Hayhoe, 2000, Land et al., 1999). If perception and imagery share processing mechanisms, then also imagery may be “active” in the sense that adjustments of the body organs, even in a vacuum, could play a significant role in the retrieval of internally stored information. A straightforward hypothesis, already well-formulated by Hebb (1968), is that such an empty gaze serves the function of assisting the mental re-construction of a representation. According to Hebb (1968, p. 470), “if the image is a reinstatement of the perceptual process it should include the eye movements […] and if we can assume that the motor activity, implicit or overt, plays an active part we have an explanation of the way in which the part-images are integrated sequentially”. Neisser (1976) also speculated that the act of constructing an image would require eye movements like those originally made in perceiving, because imagery is a process of visual synthesis and construction, much like perception.

The fundamental “Hebbian” idea behind the present study is that eye fixations can provide a sort of “scaffolding structure” for generating a visual image part-by-part. As put by Mast and Kosslyn (2002), eye movements could play an important role in allowing one to visualize a montage, a composite created on the basis of memories of multiple fixations. In other words, a single object’s image may be constructed in a manner that is not that different from imagining a scene; since an object has a categorical spatial structure between its parts (Laeng, Shah, & Kosslyn, 1999), these can be treated as separate units or “objects”. Thus, gaze could trigger sequences of memories and could also help to position correctly each image of a part relative to other parts. For example, we may have vivid imagery of, say, a cat, when we go through (some of) the motions of looking at something and determining that it is a cat, even though there is actually no cat (Thomas, 2011). Thus, contrary to the idea that motoric activity during imagery may be an epiphenomenon, a meaningless spill-over of mental activity while imagining being back in a previously encountered situation, which in itself could bear no meaningful effect on cognitive processing (e.g., Marks, 1973, Teichner et al., 1978), we believe that the present phenomena actually reflect something very important about the nature of mental representations.

Most current models of episodic memory do posit that one of the key functions of imagery is to allow reconstructing the past and, in particular, to generate specific predictions based on past experience (Addis et al., 2007, Hassabis et al., 2007, Moulton and Kosslyn, 2009, Schacter et al., 2007, Schacter et al., 2008). That is, imagery allows making explicit and accessible aspects of a specific situation. If someone’s gaze is engaged during recollection, despite actually “looking at nothing”, this might tell us a great deal about mechanisms involved in memory recall (Ferreira et al., 2008, Ryan et al., 2000). Specifically, memory representations are based on integrating input from various sources with spatial information, which would seem to be registered by default in working memory as part of a dynamic motor system (Altmann, 2004, Altmann and John, 1999, Ballard et al., 1997, Hodgson et al., 2002, Logie, 1995, Richardson et al., 2009, Richardson and Kirkham, 2004). Thus, the visual system automatically registers a spatial index or pointer to a position in the visual field as a core element of an episodic trace, also in circumstances in which actions are not required, the location information is not relevant for solving the task, and there is no intention or demand to learn the spatial information (Laeng et al., 2007, Richardson and Spivey, 2000). Kent and Lamberts (2008) have proposed that memory retrieval is generally elicited by “mental simulation” (Barsalou, 1999); supposedly, when the integrated memory episode is reactivated at a later time, the spatial index relating to an object or part will also be automatically retrieved (Bourlon et al., 2011, Hoover and Richardson, 2008), which in turn triggers the eyes to move to the indexed location in which the part originally appeared. As Ballard et al. (1997, p. 724) point out: “Because humans can fixate on an environmental point, their visual system can directly sample portions of three dimensional space […] and as a consequence, the brain’s internal representations are implicitly referred to an external point.” Thus, gaze direction may indicate a retrieval attempt for a specific item of information (Renkewitz & Jahn, 2012). In fact, outside of on-screen laboratory experiments, locations in the environment are rarely completely empty. Therefore, gaze might garner useful contextual visual cues (like noticing an empty chair) when attempting to recall visual information. Finally, returning the eyes to the former location of an object could also improve memory for information associated with that object (e.g., Hollingworth, 2006, Johansson and Johansson, 2013), especially if spatial information contributes to maintaining the continuity and integrity of the “object file” or event (Hommel, 2004, Hommel et al., 2001, Kahneman et al., 1992).

In support of the above idea that the motoric activity during imagery plays a functional role, there exists some evidence that the accuracy of memory retrieval can be disrupted when someone who is holding an image in mind is restrained from making an eye movement or deliberately moves in an image-irrelevant way (e.g., Andrade et al., 1997, Antrobus et al., 1964, Barrowcliff et al., 2004, Gunter and Bodner, 2008, Postle et al., 2006, Ruggieri, 1999, Singer and Antrobus, 1965). Several studies have shown the same phenomenon with other movement types; e.g., the recall of an imagined path can be disrupted by a concurrent movement of the arm (Quinn, 1994). A counterclockwise manual rotation hinders the concomitant clockwise mental rotation of a visual object and vice versa; however, a counterclockwise manual rotation does facilitate a counterclockwise mental rotation of a visual object (Wexler, Kosslyn, & Berthoz, 1998). Demarais and Cohen (1998) observed that, while solving transitive inference problems with the terms left/right or above/below, participants spontaneously made more horizontal than vertical saccades during the former task but they showed the reverse pattern for the latter. Glenberg and Kaschak (2002) found that, when judging whether a sentence was sensible (e.g., “close the drawer”), participants had difficulty making such a judgment if required to make a response in the opposite direction. Dijkstra, Kaschak, and Zwaan (2008) have found that participants could retrieve autobiographical information more efficiently when their body positions while being queried were similar to the body position they had during the original event.
In eye-tracking studies, when participants perform a problem-solving task and simultaneously their eye movements are “guided” either according to a scanpath related to the problem’s solution or in an irrelevant way, the former gaze patterns lead to more successful problem-solving than the latter ones (e.g., Grant and Spivey, 2003, Thomas and Lleras, 2007). Laeng and Teodorescu (2002) specifically found that memory suffered when spontaneous fixations during recall were prevented by enforcing fixation on a central cross at the time the participant attempted to answer a question regarding a previously seen object, which strongly suggests that the eye movements occurring during image generation are not epiphenomenal or a consequence of the experiment’s task demands (Jolicoeur & Kosslyn, 1985). Instead, these findings strongly suggest that, by disrupting a spontaneous action pattern, the memory system may be hindered in the retrieval of the details of a mental representation, and that eye movements play a functional role in the process of recollecting and re-constructing a previous perception. Consistent with the findings of Laeng and Teodorescu (2002; Experiment 2), subsequent studies that forced fixation during retrieval have found evidence that eye movements play a functional role for memory, since this procedure reduced episodic memory performance (Johansson et al., 2012, Johansson and Johansson, 2013, Mäntylä and Holm, 2006).

In the specific case of visual imagery, eye-tracking studies support the idea that the original locations get automatically stored as a part of the scene’s representation and, to some extent, a trace of the whole oculomotor sequence may also be kept, as originally suggested by Noton and Stark (1971a, 1971b), although the evidence for a “scanpath” memory remains weak (cf. Johansson, Holsanova, & Holmqvist, 2006). Moreover, a key aspect that is still unclear is whether these eye movements during imagery simply return to a generic position occupied by an object (e.g., its center of mass) or they actually “mirror” to some degree the details or parts of a single object, as classic accounts of imagery would appear to imply. The few extant studies in which observers were asked to visualize single pattern stimuli remain ambiguous in this respect (i.e., Brandt and Stark, 1997, Laeng and Teodorescu, 2002, Martarelli and Mast, 2013, Noton and Stark, 1971a, Noton and Stark, 1971b). Therefore one aim of the present study is to directly address the question of whether fixations during imagery do not simply occur over a generic, center-of-mass, position of the object but also on specific locations corresponding to an object’s features. If imagery of a single object can be based on a global encoding of the shape as a single unit (cf. Kozhevnikov, Kosslyn, & Shepard, 2005), then it might be possible that no more than a generic gravitation of gaze over the region previously occupied by the object would be observed. However, a large literature on looking at patterns has revealed that the eye’s “dwell time” is a function of the information value of specific parts or features of an object (e.g., Buswell, 1935, Deco and Schürmann, 2000, Kaufman and Richards, 1969, Leek et al., 2012, Mackworth and Morandi, 1967, Renninger et al., 2007, Yarbus, 1967, Zusne and Michels, 1964).
Thus, we hypothesize that when reconstructing the image of a single object, like the figure of an animal or a geometrical shape, fixations during imagery will concentrate in the locations of those ‘parts-rich’ regions of the shape where gaze mainly dwells during perception.

In the present study, we monitored gaze by use of an infrared remote eye-tracker during perception and imagery of geometric forms (Experiment 1) and figures of animals (Experiments 2 and 3). In the first experiment, we provide evidence that when observers look at an empty screen and at the same time generate a detailed visual image of a simple geometrical form that was previously seen, their gaze probabilistically dwells within a region that corresponds to the original positions and shape of the imagined object. In Experiments 2 and 3, we show that the more faithful an observer’s gaze enactment between perception and imagery over specific parts of figures of animals, the more accurate is the observer’s memory. Moreover, when participants were queried in a recall phase about properties of each one of the animal pictures, gaze returned to the location of the part probed by the question; in addition, interfering with this process (by asking participants to maintain fixation away from the original location of the shape while thinking about the answer) resulted in measurable costs in the quality of memory.
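The comparison of dwell time between perception and imagery mentioned in the Abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the rectangular grid of regions, the `(x, y, duration_ms)` fixation format, and the function names `dwell_by_region` and `pearson` are assumptions made purely for illustration, and the fixation values are invented.

```python
from math import sqrt


def dwell_by_region(fixations, n_cols, n_rows, width, height):
    """Sum fixation durations (ms) inside each cell of an n_cols x n_rows grid.

    `fixations` is a list of (x, y, duration_ms) tuples in screen pixels.
    The grid layout is a hypothetical stand-in for regions of interest.
    """
    dwell = [0.0] * (n_cols * n_rows)
    for x, y, duration in fixations:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip fixations that fall outside the screen
        col = int(x * n_cols / width)
        row = int(y * n_rows / height)
        dwell[row * n_cols + col] += duration
    return dwell


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_a * var_b)


# Invented fixations for one observer and one shape (not data from the study).
perception = [(120, 80, 400), (130, 90, 250), (600, 420, 300)]
imagery = [(110, 100, 350), (620, 400, 200), (300, 300, 50)]

p_dwell = dwell_by_region(perception, n_cols=4, n_rows=3, width=800, height=600)
i_dwell = dwell_by_region(imagery, n_cols=4, n_rows=3, width=800, height=600)
r = pearson(p_dwell, i_dwell)  # higher r = imagery gaze revisits perception regions
```

Under these assumptions, a larger `r` for a given observer would correspond to a more faithful re-enactment of the perceptual gaze pattern during imagery, which is the quantity the study relates to memory accuracy.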

Section snippets

Experiment 1

In the first experiment, we showed pictures of equilateral triangles on a computer monitor, while the participants’ eye fixations were monitored by an infrared eye-tracker. The triangles were always shown centered over the same gray background (see Fig. 1) but their orientation was in half of the trials upright (i.e. with one corner pointing up) and in the other half upside-down (i.e. one side was on the top and one corner pointed down). To introduce some variety in the stimuli, the internal

Experiment 2

The previous experiment used simple geometric figures that the observers visualized. In these shapes, the key features are evenly or symmetrically distributed. In contrast, living organisms and many man-made objects (e.g., automobiles) tend to have a hierarchical distribution of features, also rather asymmetric, so that several of the relevant features may be densely crowded within a same section of the object’s body (e.g., the head of the animal or the front of the automobile). Moreover, many

Experiment 3

Although the findings of the previous two experiments are highly consistent with a theoretical account positing that imagery emulates perception not only phenomenally or subjectively but also in terms of the oculomotor operations that occur during perception, one could remain skeptical that we really tested imagery in the traditional sense of the concept. That is, imagery often refers to the internal re-creation of a visual stimulus on the basis of top-down knowledge, as in “please imagine an

General discussion

The present experiments showed highly correlated patterns of gaze between perception and imagery of the same visual object. Specifically, at imagery retrieval, gaze fixations were likely to re-occur over the same regions of space as those scrutinized during the encoding or perceptual scrutiny of the shape. In the first experiment, we observed that imagining upright versus upside-down triangles resulted in a pattern of fixation that mirrored the shape and orientation of each form over the same

References (207)

  • H. Deubel et al.

    Saccade target selection and object recognition: Evidence for a common attentional mechanism

    Vision Research

    (1996)
  • F. Ferreira et al.

    Taking a new look at looking at nothing

    Trends in Cognitive Sciences

    (2008)
  • G. Ganis et al.

    Brain areas underlying visual mental imagery and visual perception

    Cognitive Brain Research

    (2004)
  • M.R. Greene et al.

    Reconsidering Yarbus: A failure to predict observers’ task from eye movement patterns

    Vision Research

    (2012)
  • R.W. Gunter et al.

    How eye movements affect unpleasant memories: Support for a working-memory account

    Behaviour Research and Therapy

    (2008)
  • M. Hayhoe et al.

    Eye movements in natural behavior

    Trends in Cognitive Sciences

    (2005)
  • B. Hommel

    Event files: Feature binding in and across perception and action

    Trends in Cognitive Sciences

    (2004)
  • M.A. Hoover et al.

    When facts go down the rabbit hole: Contrasting features and objecthood as indexes to memory

    Cognition

    (2008)
  • S.A.H. Jones et al.

    Memory for proprioceptive and multisensory targets is partially coded relative to gaze

    Neuropsychologia

    (2010)
  • J.H. Kaas

    Topographic maps are fundamental to sensory processing

    Brain Research Bulletin

    (1997)
  • D. Kahneman et al.

    The reviewing of object files: Object-specific integration of information

    Cognitive Psychology

    (1992)
  • C. Kent et al.

    The encoding–retrieval relationship: Retrieval as mental simulation

    Trends in Cognitive Sciences

    (2008)
  • S.M. Kosslyn et al.

    Sequential processes in image generation

    Cognitive Psychology

    (1988)
  • G.T.M. Altmann

    Language-mediated eye movements in the absence of a visual world: The ‘blank screen paradigm’

    Cognition

    (2004)
  • J. Andrade et al.

    Eye-movements and visual imagery: A working memory approach to the treatment of post-traumatic stress disorder

    British Journal of Clinical Psychology

    (1997)
  • J.S. Antrobus et al.

    Eye movements accompanying daydreaming, visual imagery, and thought suppression

    Journal of Abnormal and Social Psychology

    (1964)
  • E. Aserinsky et al.

    Two types of ocular motility occurring in sleep

    Journal of Applied Physiology

    (1953)
  • D.H. Ballard et al.

    Deictic codes for the embodiment of cognition

    Behavioral and Brain Sciences

    (1997)
  • M. Bar et al.

    Top-down facilitation of visual recognition

    Proceedings of the National Academy of Sciences

    (2006)
  • A.L. Barrowcliff et al.

    Eye-movements reduce the vividness, emotional valence and electrodermal arousal associated with negative autobiographical memories

    The Journal of Forensic Psychiatry & Psychology

    (2004)
  • L.W. Barsalou

    Perceptual symbol systems

    Behavioral and Brain Sciences

    (1999)
  • T.G. Bever et al.

    Analysis by synthesis: A (re)-emerging program of research for language and vision

    Biolinguistics

    (2010)
  • I. Biederman

    Recognition-by-components: A theory of human image understanding

    Psychological Review

    (1987)
  • Brandt, S. A., Stark, L. W., Hacisalihzade, S., Allen, J., & Tharp, G. (1989). Experimental evidence for scanpath eye...
  • S.A. Brandt et al.

    Spontaneous eye movements during visual imagery reflect the content of the visual scene

    Journal of Cognitive Neuroscience

    (1997)
  • B. Bridgeman

    Conscious vs. unconscious processes: The case of vision

    Theory and Psychology

    (1992)
  • J.R. Brockmole et al.

    Eye movements and the integration of visual memory and visual perception

    Perception & Psychophysics

    (2005)
  • G.T. Buswell

    How people look at pictures

    (1935)
  • M.S. Castelhano et al.

    Viewing task influences eye movement control during active scene perception

    Journal of Vision

    (2009)
  • O. Chelnokova et al.

    Three-dimensional information in face recognition: An eye-tracking study

    Journal of Vision

    (2011)
  • B.N. Cuthbert et al.

    Imagery: Function and physiology

  • D.D.J. de Grave et al.

    The effect of the Müller–Lyer illusion on saccades is modulated by spatial predictability and saccadic latency

    Experimental Brain Research

    (2010)
  • M. DeAngelus et al.

    Top-down control of eye movements: Yarbus revisited

    Visual Cognition

    (2009)
  • G.H. Deckert

    Pursuit eye movements in the absence of a moving visual stimulus

    Science

    (1964)
  • G. Deco et al.

    A neuro-cognitive visual system for object recognition based on testing of interactive attentional top-down hypotheses

    Perception

    (2000)
  • W. Dement et al.

    The relation of eye movements during sleep to dream activity: An objective method for the study of dreaming

    Journal of Experimental Psychology

    (1957)
  • A. Demichelis et al.

    Motor transfer from map ocular exploration to locomotion during spatial navigation from memory

    Experimental Brain Research

    (2012)
  • C. de’Sperati

    Precise oculomotor correlates of visuospatial mental rotation and circular motion imagery

    Journal of Cognitive Neuroscience

    (2003)
  • K. Dijkstra et al.

    Body posture facilitates retrieval of autobiographical memories

    Cognition

    (2008)
  • F. Doricchi et al.

    The “ways” we look at dreams: Evidence from unilateral spatial neglect (with an evolutionary account of dream bizarreness)

    Experimental Brain Research

    (2007)