Introduction

It is hard to deny the importance of learning the spatial layout of the environment in our daily lives, as we go to work, do errands, find restaurants, and manage to get back home. In order to navigate successfully, we must acquire some knowledge of the spatial relationships between these locations. Successful navigation may involve scene and place recognition, reliance on salient landmarks, route knowledge, and/or survey knowledge (Wiener, Buchner, & Holscher, 2009). Route knowledge enables one to follow a known path from one location to another, whereas survey knowledge includes some configural information and gives one the ability to take novel shortcuts and detours between locations, traversing paths that have never been taken before. There are thus different types of spatial knowledge that a navigator might acquire during exploration of a new environment, which could depend on the structure of that environment, how it is explored, or the effort devoted to learning it.

Appleyard (1970) was one of the first to note that passengers on a bus seem to acquire only route knowledge of a city, whereas bus drivers have a much greater level of survey knowledge. Taxi drivers may have even greater knowledge than bus drivers, since they navigate novel routes through the city (Maguire, Woollett, & Spiers, 2006). This intuition immediately suggests that the difference between passive exposure and active exploration has important implications for spatial learning. But the anecdote raises more questions than it answers. Is there, in fact, a systematic difference between active and passive learning? If so, what are the differences in the resulting spatial knowledge? What constitutes “active” exploration specifically—the physical activity of self-motion and its sensory–motor consequences, or the cognitive activity of choosing a route or attending to and encoding particular aspects of the environment?

The purpose of this review is to investigate how the mode of exploration in a new environment influences the resulting spatial knowledge. We focus on the distinction between active and passive spatial learning and ask how they contribute to landmark, route, and survey knowledge.

We begin by arguing that the active/passive dichotomy is too coarse a distinction, for “active” learning encompasses a number of potential components. Our goal is to tease out the active and passive contributions to these types of spatial knowledge and identify gaps in the existing literature. We start with the sensory–motor components of physically walking through an environment and then pursue cognitive mechanisms that may play a role in active learning. We discuss how literature on spatial updating contributes to larger issues of spatial navigation. We then turn to attention and working memory, which operate in tandem to selectively encode different aspects of the environment; active manipulation of spatial information in working memory can yield greater learning. Research on these topics has been hampered by inconsistent methods, making both qualitative and quantitative comparisons difficult. Throughout the review, we point out these inconsistencies, while attempting to draw firm conclusions wherever possible.

The results suggest that there is a relation between active exploration and the acquisition of spatial knowledge. Specifically, we argue that the idiothetic information available during walking contributes to metric survey knowledge and appears to interact with attention. We find that some aspects of places and landmarks can be learned without much effort but that full route and survey knowledge require the allocation of attention and encoding in working memory. Different components of working memory may be responsible for encoding certain aspects of the environment, while mental manipulation of spatial information may also play a role in learning.

Active and passive spatial learning

Despite Appleyard’s (1970) observation, studies comparing active and passive spatial learning have yielded surprisingly mixed results. One reason for the heterogeneous findings is that active exploration actually involves several complex activities that are often confounded in experimental designs.

Components of active learning

To test passive learning, experimenters typically present visual information about the path of self-motion through the environment—such as the sequence of views seen by an explorer—to a stationary observer in the form of a video or series of slides. Active learning, however, may not be limited to physical movement alone. In addition to the motor control of action, active learning could include the resulting sensory information about self-motion and several cognitive processes (Gaunet, Vidal, Kemeny, & Berthoz, 2001). Specifically, we can identify five distinct components of active exploration that potentially contribute to spatial knowledge: (1) efferent motor commands that determine the path of locomotion, (2) reafferent proprioceptive and vestibular information for self-motion (1 and 2 are collectively referred to as idiothetic information; Mittelstaedt & Mittelstaedt, 2001), (3) allocation of attention to navigation-related features of the environment, (4) cognitive decisions about the direction of travel or the route, and (5) mental manipulation of spatial information. These components may be grouped into those that involve physical activity (motor control and reafferent information) and those that involve cognitive activity (attention, decision making, and mental manipulation) (Wilson, Foreman, Gillett, & Stanton, 1997). For present purposes, we will refer to navigation that involves any or all of these five components as active. But the aim of this review is to refine the concept by identifying which of these components actually play a role in spatial learning. We attempt to elucidate their relative contributions to particular forms of spatial knowledge and whether they act independently or interact in some way.

On the basis of theoretical considerations, we would expect these components of active learning to differentially affect what the explorer learns about specific aspects of spatial structure. First, we hypothesize that idiothetic information plays an essential role in the acquisition of survey knowledge. Survey, or “map,” knowledge is believed to depend upon information about the metric distances and directions between locations, such as that provided by the motor, proprioceptive, and/or vestibular systems, together with a process of path integration. Although passive vision also provides information about the depth and visual direction of objects, spatial perception is subject to large affine distortions (Koenderink, van Doorn, & Lappin, 2000; Loomis, Da Silva, Fujita, & Fukusima, 1992; Norman, Crabtree, Clayton, & Norman, 2005). The idiothetic systems specifically register distance and turn information along a traversed path, providing a basis for path integration, and thus might be expected to improve the accuracy of survey knowledge.

Second, we hypothesize that active decision making about the path of travel is sufficient for the acquisition of route knowledge, in the absence of idiothetic information. Given that route knowledge is believed to consist of a sequence of turns at recognized locations (place–action associations) along a learned route (Siegel & White, 1975), making decisions about turns on one’s path should be sufficient to acquire useful route knowledge, without metric information.

Third, we hypothesize that the acquisition of route and survey knowledge depends on the allocation of attention to corresponding environmental properties. For example, assuming that place–action associations depend on reinforcement learning mechanisms, explicitly attending to conjunctions of landmarks and turns should facilitate route learning (Chun & Turk-Browne, 2007; Sutton & Barto, 1998). Similarly, attending to information about the relative spatial locations of places should enhance survey learning. On the other hand, to the extent that object encoding and recognition are automatic processes (Duncan, 1984; O’Craven, Downing, & Kanwisher, 1999), landmark learning should not require the allocation of attention. Finally, these components may interact. For instance, actively making decisions about his or her route may lead the observer to attend to different features of the environment than when following a prescribed route.

Note that many experiments make use of desktop virtual reality setups (desktop VR), in which participants use a joystick to steer around a virtual environment presented on a monitor. This process is quite different from walking around an environment: Although desktop VR does involve some physical hand movements, actual walking provides qualitatively different motor, proprioceptive, and vestibular information.

Navigation versus spatial updating

The issue of active and passive learning has also come up in recent research on the topic of spatial updating. Spatial updating occurs when an observer maintains information about the spatial relations among objects as he or she moves around in the environment. Spatial updating is thus closely related to path integration and probably shares many of the same mechanisms, including reliance on visual and idiothetic information. However, there are important methodological differences between spatial updating and navigation paradigms that make it difficult to compare the findings. Experiments on spatial updating typically present a small set of objects in a central location that are all viewed simultaneously, so the participant can perceive the spatial relationships between the objects. In navigation experiments, by contrast, the observer is typically embedded in a larger environmental layout and views objects sequentially, so he or she must path integrate between them to derive their spatial relationships. Despite these differences, some researchers have used results from spatial updating to support claims concerning navigation. We believe that active and passive spatial updating should not be confused with active and passive navigation. We attempt to clarify this rather unwieldy body of literature.

Limitations of the literature

To illustrate some of the challenges in conducting research on active and passive learning, we begin with a few introductory examples. Gaunet et al., (2001) attempted to isolate the motor/proprioceptive component of active learning using desktop VR. They asked three groups to follow routes in novel environments: active, passive, and snapshot. The active group physically handled a joystick to steer but did not make decisions about the travel path; the experimenters verbally instructed participants to “go straight” or “turn left.” The passive group simply watched a video of the same route through the environment, while the snapshot group saw sample views taken from the video, instead of continuous motion. The authors failed to find an active/passive effect: There were no group differences in pointing back to the start location from the end of the path or in a scene recognition task. The only difference occurred in route drawing, and even then there was no difference between the active and passive groups; rather, the snapshot group had larger distance and angle errors. The implication of these results is that some spatial knowledge can be obtained from all three modes of exploration. However, the absence of an active advantage might be due to the reduced motor and proprioceptive information when a joystick is used or the lack of decision making during exploration.

Other evidence points to an active/passive effect. Carassa, Geminiani, Morganti, and Varotto (2002) reported that self-controlled exploration with a joystick in desktop VR led to greater wayfinding abilities than did passively following an avatar through the environment. However, this result is confounded by the fact that the active group was instructed to use the most efficient exploration procedures, which could have promoted different spatial processing; in addition, the visual input was not equated for the two groups. Some research suggests that it may be the motor component that yields an active advantage. Farrell et al., (2003) found that using a keyboard both to actively explore and to follow a prescribed route in desktop VR led to fewer errors when tested in a real environment, as compared with participants without prior experience in the environment; in contrast, passively watching a video of the route did not yield such an improvement. However, visual input was not equated in the active exploration and route-following conditions, and it is not clear whether the difference between route-following and passive viewing conditions is due to motor control of the keyboard or to a difference in attentional deployment.

These studies highlight some key challenges facing research on active and passive navigation. First, the use of desktop VR fails to provide appropriate idiothetic information about self-motion. Motor control of a joystick or keyboard is qualitatively different from that of legged locomotion, and the resulting proprioception specifies the joystick deflection or the number of key presses, rather than properties of the step cycle, while vestibular information specifies that the observer is stationary. These sources of information could be vital for keeping track of the distance traveled and the magnitude of body rotations. The size of the display may also affect performance on spatial tasks (Tan, Gergle, Scupelli, & Pausch, 2006). In addition, the relation between motor efference and visual reafference in desktop VR is different from that in walking and, thus, may affect visual path integration.

Second, it is difficult to isolate and test the active and passive components. The difference between the active and passive groups in Gaunet et al. (2001), for example, consisted of only the motor and proprioceptive information arising from use of a joystick. The null result in this study thus may not be surprising. To adequately test the contribution of physical activity, an ideal experiment would compare one group that walked around in the environment with full locomotor control and information about self-motion, guided by an experimenter (to prevent decision making), with a group that watched a matched video of that exploration.

Third, it is important to equate the size and visibility of the environments. Being able to see the entire layout at once may yield different effects than being immersed in the environment and moving around to view the layout. In the former case, the spatial relations among objects are immediately visible, whereas in the latter case, they must be determined via path integration.

Finally, it is important to match the views seen by participants to the greatest extent possible, including the visual angle of the display. Often, researchers allow active participants to freely explore the environment but guide passive participants through a standard preplanned route. The active groups may thus have exposure to the environment that the passive groups do not, making comparisons uncontrolled. Both Carassa et al. (2002) and Farrell et al. (2003) failed to match the route of exploration of the passive groups with that of the active groups.

If our review of active and passive learning were limited to studies using real-world or ambulatory virtual environments with walking observers, matched views, and appropriate idiothetic information, the discussion would be very short. Even the most complete studies tend to have one or more of these limitations. We thus attempt to draw some preliminary conclusions about active and passive spatial learning from the available literature, bearing in mind that they must be clarified by further research.

Idiothetic information, decision making, and attention in spatial learning

In this section, we address the contributions of idiothetic information and decision making during exploration to landmark, route, and survey learning; we also discuss attention as it relates to these factors. We begin by exploring attempts to cross aspects of physical movement with the ability to make decisions about exploration.

An illustrative example comes from the developmental literature. When young children actively explore a playhouse, they are better at finding novel shortcuts and reversing routes than are children who are led around or carried around by their parents (Hazen, 1982). Thus, making route decisions appears to improve children’s spatial learning over being led on a route; such decision making may also drive attentional allocation. On the other hand, in this instance, idiothetic information did not appear to contribute to spatial learning in children, for there was no advantage to being led over being carried. When navigators can make their own decisions about the direction of travel, they may then test predictions about how their own actions affect their subsequent views of the environment (Gibson, 1962; James, Humphrey, & Goodale, 2001) or the change in direction and magnitude of their own movements (Larish & Andersen, 1995).

Most research that focuses on decision making tends to use desktop VR, making it difficult to assess the role of idiothetic information. Conversely, research on idiothetic information tends to ignore the role of decision making and attention. Finally, we examine the relation between research on spatial updating and the question of active and passive spatial learning. We argue that the scale and visibility of the environment are important factors to consider when these two literatures are interpreted.

Motor control and decision making in desktop VR

A comprehensive examination of the active and passive distinction was carried out in a series of experiments by Patrick Péruch, Paul Wilson, and their colleagues, using desktop VR. Péruch, Vercher, and Gauthier (1995) first examined differences between active and passive learning in a semiopen environment, using a within-subjects design. In the active condition, participants were able to explore freely using a joystick, giving them both motor and cognitive control. In the passive-dynamic condition, they watched a video of exploration, while in the passive-snapshot condition they viewed slides of exploration. During the test phase, they were asked to navigate through the environment to each of four landmarks, taking the shortest route possible. The active condition led to significantly higher performance on this task than did the passive-dynamic condition, which, in turn, was significantly better than the passive-snapshot condition. There were also individual differences, such that some people tended to perform well in all conditions, while others fared poorly throughout.

This finding stands in contrast to that of Gaunet et al. (2001), who reported no difference between the active and the passive-dynamic conditions; hence, the effect might be attributable to decision making during exploration in the present experiment, as compared with a prescribed route in Gaunet et al. However, Péruch et al. (1995) also provided more exposure to the environment: Whereas participants in Gaunet et al. saw a given section of a route only once, the present participants learned a relatively small semiopen environment and typically traveled through a given section two to three times, which could have promoted active learning. In addition, Péruch et al. used different environmental layouts in each condition, and the passive video was not matched to the active condition, so the effect could have been due to variations between conditions. Thus, it is unclear whether it is active motor control, active decision making, the exposure to the environment, or the difference in layout that accounts for the active advantage.

To address some of these problems, Wilson et al. (1997) conducted an experiment with five groups in a yoked design, using desktop VR with a keyboard (where “active” denoted decision making). The active-with-movement group both made decisions about their path and controlled the movement by pressing the keyboard, whereas the passive-without-movement group simply viewed the corresponding display. The active-without-movement group decided where to go but communicated the decision to yoked passive-with-movement participants, who carried out the action with the keyboard. The control group simply performed the test trials without previous exposure to the environment and should, thus, have performed at chance. In the test phase, participants were virtually dropped at one of three landmarks and were asked to point to the other two landmarks; they also drew a map of the environment. All experimental groups had significantly smaller pointing errors than did the control group, indicating that some survey learning had taken place. However, there were no differences between any of the experimental groups, such that neither motor nor cognitive activity proved to have an effect. In a second experiment, the authors used a simpler environment and test tasks similar to those of Péruch et al. (1995). Even so, they found no differences between the experimental groups and only one significant difference between the control group and the passive group.

To reconcile their opposing findings, Wilson and Péruch (2002) joined forces to examine the issue of motor and cognitive control, again using desktop VR. They designed a yoked setup in which active participants explored the environment while passive viewers either sat next to them and watched their movements together with the display or only viewed a video of the display. The results in this case show that passive participants were more accurate at pointing to the targets when sitting next to the active participants than when watching the video; they were also more accurate than active participants in wayfinding. These results contradict both the previous findings of either no difference (Wilson, 1999; Wilson et al., 1997) or better performance by active observers (Péruch et al., 1995; Tan et al., 2006). To resolve these inconsistent findings, the authors tested both the active and passive conditions in a within-group design, with all yoked pairs sitting side-by-side during exploration. In this case, they found no differences between conditions for any of the dependent measures.

A related experiment investigated the contribution of active and passive exploration to scene recognition in desktop VR. Christou and Bülthoff (1999) paired active explorers who used a track ball to explore a virtual house with passive observers who watched a video of the display. They found that participants were best at recognizing scenes of the environment on the basis of views they had previously observed and were also better at recognizing scenes from novel views than from mirror-reversed familiar views. However, there was no difference between the active and passive groups: Both showed higher accuracy and faster reaction times for familiar views than for novel views of the same environment. The results are consistent with the acquisition of view-based scene representations but show that active learning is no better than passive learning, even for recognizing scenes from novel viewpoints. When passive observers viewed only snapshots of the display, performance on novel views dropped dramatically, to the level of mirror-reversed views. This result confirms an advantage of continuous visual motion during exploration of the environment.

Taken together, these results offer little support for a role of decision making in spatial learning. Any effects of active versus passive exploration in desktop VR are small and unreliable and may be susceptible to minor procedural differences. In addition, the reduced motor and proprioceptive information from small movements of a joystick or keyboard does not adequately test the idiothetic contribution. We thus take a more detailed look at the role of idiothetic information in studies of active walking.

Idiothetic information during walking

Much of the research on idiothetic information during locomotion goes beyond desktop VR by using ambulatory VR—environments that are presented in a head-mounted display with a head-tracking system, so the participant can walk through the virtual environment. In an early study, Grant and Magee (1998) reported results consistent with an idiothetic contribution. Participants were guided on a prescribed route in a large-scale real environment and a matched virtual environment with an interface that allowed them to walk in place, so decision making and the visual sequence were controlled; they were subsequently tested on finding locations in the real environment. Participants who walked in the real environment were faster to find locations in the test than were those who walked in place (reducing idiothetic information) or used a joystick in the virtual environment. The walk-in-place group also showed some advantages over the joystick group, such as taking shorter paths in the test. These results suggest a role for idiothetic information; however, the real-environment group also had a larger field of view and free head movements, as compared with the VR groups.

Additional studies have also examined the contributions of idiothetic information to wayfinding or route knowledge. Ruddle and Lessels (2009) had participants search for hidden objects in a room-sized virtual environment. They found better performance for those who walked, as compared with those who physically rotated but translated with a joystick and those who used a joystick for both rotation and translation. In contrast, Riecke et al. (2010) reported an advantage only for physical turns over joystick alone. They found that the addition of physical translation aided learning only by leading to less total distance traveled during search.

Ruddle, Volkova, Mohler, and Bülthoff (2011b) examined body-based contributions to route knowledge. They had some participants walk in a virtual environment to follow a specified route and then asked them to retrace the route and repeat the out-and-back route several times. Other participants made physical rotations but used a joystick for the translation component. Overall, the walking group had fewer errors, primarily when traveling in the reverse direction on the route. Ruddle, Volkova, and Bülthoff (2011a) similarly found an advantage for walking over physical rotations and purely visual exploration of a virtual marketplace. In this experiment, participants searched for four target objects, and then, after returning to the start location, they had to find the objects again and estimate distances and directions to the other objects. In a small-extent environment, the walking group traveled less to find the target objects and had more accurate estimates of the distance between targets. In a large-extent environment, participants walked using an omnidirectional treadmill, walked with a linear treadmill but used a joystick for rotations, physically rotated but used a joystick for translations, or used a joystick for both rotations and translations. In the larger environment, those participants who walked using a treadmill (either omnidirectional or linear) had more accurate estimates of distance and direction between targets. Together, these two studies indicate that motor and proprioceptive information are vital to learning routes, as well as to some survey knowledge, while rotational information contributes minimally to wayfinding.

Most other examinations of idiothetic contributions to spatial learning focus primarily on survey knowledge. Chance, Gaunet, Beall, and Loomis (1998) examined spatial learning from path integration in fairly simple virtual mazes, which contained one to three target objects separated by barriers. The experimenters varied the availability of idiothetic information by having participants walk or steer with a joystick on a prescribed path through the environment, thus eliminating decision making. Participants were instructed to keep track of the object locations along the path; at the end of the path, they reported the location of each object by referring to the hands of a clock to indicate their estimate (Experiment 1) or by turning to face the object’s location (Experiment 2), without feedback. Participants who physically walked had lower absolute pointing errors than did those who used a joystick to traverse the path, but only after considerable exposure to the testing procedures and environments (on the third trial in each maze, Experiment 1) or with a path that had no decision points, possibly allowing for more attention to the location of the objects (Experiment 2). However, participants who used a joystick to traverse linear segments but physically turned in place to change direction were in between and were not significantly different from either group. These findings indicate that idiothetic information about translation and rotation during locomotion (and possibly each separately) is important to keeping track of one’s position and acquiring spatial relations in the environment.

Waller and Greenauer (2007) conducted a similar experiment in which participants traveled a prescribed path through a series of hallways with several noted locations. The walk group had visual and idiothetic information, the wheeled group had only visual and vestibular information, and the still group viewed videos of the display. Participants were asked to point and estimate distances between all possible pairs of locations. In contrast to Chance et al. (1998), there were no overall differences in pointing errors between conditions, but there was a significant advantage for the walk group when the pairs of locations were linked by a large number of turns. Mellet et al. (2010) likewise found no differences in relative distance judgments when comparing those who learned object locations by walking in a simple real hallway and those who learned by using a joystick in VR. Taken together, these ambulatory studies suggest a contribution of motor and proprioceptive information (although perhaps not vestibular information) to spatial learning, but only on sufficiently complex paths and after repeated exposure to the environment.

The environments used in the last several studies were fairly simple, with few, if any, path intersections or choice points. They were also fairly small, the size of a room or a building, although objects were not simultaneously visible. It is possible that idiothetic information is more useful for spatial updating in a small-scale environment than for learning survey knowledge in a large-scale space. Longer paths may lead to increasing drift in path integration—particularly, vestibular information—eventually rendering it unreliable for estimating distance and direction (see Collett, Collett, Chameron, & Wehner, 2003; Etienne, Maurer, Boulens, Levy, & Rowe, 2004; Etienne, Maurer, & Séguinot, 1996; Müller & Wehner, 2010, for the drift and resetting of path integration in animals). To test this hypothesis, Waller, Loomis, and Steck (2003) varied the magnitude and fidelity of vestibular information that participants had access to while exploring a large real-world environment. Some participants were driven in a car on a route through the environment while receiving full visual and vestibular information. Others rode in the car while viewing a video in an HMD that matched the vestibular input, but with a reduced field of view. A third group viewed the same video while the car traveled on a different route, such that visual and vestibular information were incongruent. A final group watched the video while sitting still, receiving no vestibular input. Participants were asked to estimate distances and directions between all 20 possible pairs of locations on the route. Those who had full visual and vestibular information were more accurate than any of the other three groups, which did not differ from each other. These results suggest that vestibular input contributes to survey knowledge of a large environment only when it is paired with a large field of view. The differences between the full-information and congruent groups might be due to the field of view but could also be attributed to active head turns or to visual fidelity in the full-information condition.

An additional limitation of this experiment is the absence of proprioceptive information. To remedy this limitation, Waller, Loomis, and Haun (2004) presented both proprioceptive and vestibular information during exploration. Participants traveled a prescribed route by walking in a virtual environment while wearing an HMD, viewing a matched video in the HMD while sitting, or watching a matched video in the HMD that was smoothed to minimize head jitter and rotation. They kept track of five locations along the route and, at the end, gave pointing estimates between all possible pairs. Participants who walked through the environment were more accurate than those who watched either of the videos, indicating that idiothetic information contributes to survey knowledge of the environment. It remains to be determined whether this effect is due to the motor and proprioceptive information, the vestibular information, or their combination.

Another line of evidence stems from research on alignment effects in spatial cognition. Early work had found that participants are more accurate in making spatial judgments when they are aligned with the initial learning orientation (e.g., to a map) than when facing the opposite direction (Evans & Pezdek, 1980; Presson & Hazelrigg, 1984; Richardson, Montello, & Hegarty, 1999; Thorndyke & Hayes-Roth, 1982). Such alignment effects imply that the learned spatial representation is orientation specific. However, recent evidence suggests that even a small amount of motor and proprioceptive information can reduce alignment effects (Richardson et al., 1999; Rossano, West, Robertson, Wayne, & Chase, 1999; Sun, Chan, & Campos, 2004). Sun, Chan, and Campos found that participants who walked on a prescribed route through a real building during exploration had lower overall pointing errors to landmarks than did those who rode a stationary bike on the same route through a virtual building, presented in an HMD. However, they reported no alignment effects in either group. Alignment effects even disappeared when exploration was controlled with a mouse, despite reduced motor and proprioceptive information. Passively watching a video of the corresponding display, however, resulted in the same kinds of alignment errors as those observed in map learning. These results indicate that very little motor efferent and proprioceptive information, without vestibular information, may be sufficient to yield orientation-free spatial knowledge. The absence of alignment effects should be noted with caution, since it does not necessarily correlate with superior spatial knowledge. Rather, their absence indicates that spatial knowledge is not view dependent, although this conclusion seems at odds with the results of Christou and Bülthoff (1999) for scene recognition. Better spatial knowledge is acquired when actively walking, but this result could be due to a larger field of view in the real environment.

In sum, the evidence offers qualified support for an idiothetic contribution to spatial learning. The addition of motor, proprioceptive, and possibly vestibular information due to walking during exploration appears to improve performance on survey tasks such as pointing, over and above passive vision alone (Chance et al., 1998; Waller et al., 2004). Similar results are also seen in route learning and wayfinding tasks (Riecke et al., 2010; Ruddle et al., 2011a; Ruddle et al., 2011b). This pattern seems to hold especially with complex paths or repeated exposure to the same environment (Chance et al., 1998; Waller & Greenauer, 2007), suggesting that passive vision may be sufficient for simple environments (Mellet et al., 2010) and that idiothetic learning may build up over time. Other positive results could be attributable to a larger field of view or free head movements in the walking condition (Grant & Magee, 1998; Sun et al., 2004; Waller et al., 2003). Thus, the general pattern of results is consistent with a role for idiothetic information in active spatial learning, although the relative contributions of locomotor efference, proprioception, and vestibular information remain to be determined.

However, these studies did not attempt to control for the allocation of attention during exploration, which may also be an important contributor to active learning. It is possible that attention is allocated to different aspects of the environment in active and passive experimental conditions. For example, active exploration requires greater interaction with the environment, which may lead participants to attend more to the spatial layout (Wilson & Péruch, 2002). Thus, we turn to possible effects of attention during exploration.

Attention to spatial and nonspatial properties

Wilson et al. (1997; Wilson, 1999) speculated that the null results in their desktop VR experiments might be explained by similar patterns of attention in both active and passive conditions. They had instructed all participants in both conditions to pay attention to the spatial layout. Thus, they hypothesized that when passive observers attend to spatial properties, they perform as well as active explorers.

Conversely, other results suggest that active/passive differences appear when attention is directed to nonspatial aspects of the environment. Attree et al. (1996; Brooks, Attree, Rose, Clifford, & Leadbetter, 1999) instructed participants to attend to objects while taking a route through a desktop virtual environment—specifically, to “study the objects . . . and try to find an umbrella which may or may not be there.” Active participants explored the environment using a joystick and performed better on subsequent recall tests of spatial layout than did passive participants who viewed a corresponding display. On the other hand, passive participants were only marginally better than active participants on object memory. These results suggest that when passive observers attend to spatial properties, they learn the layout as well as active observers, but when they attend to objects, their layout learning suffers. In contrast, active explorers may attend to the spatial layout in order to successfully navigate through the environment even when instructed to attend to objects, so they acquire better spatial knowledge than do passive observers in that condition.

However, Wilson (1999) found no active advantage for spatial learning when attention was directed to objects. Wilson and Péruch (2002) pursued this issue further by instructing half of their yoked active/passive participants to attend to the spatial layout and the other half to attend to the objects in the environment. The object attention group recognized more objects than did the spatial attention group, and passive participants in the spatial group recalled fewer objects than did the three other groups. However, spatial tests of pointing and judging distance revealed no differences between any of the groups. The only effect on spatial learning was that the spatial attention group was better at drawing a map than was the object attention group; consistent with the authors’ original hypothesis, active participants were only marginally better than passive participants. These results cloud the picture further, leading Wilson and Péruch to conclude that findings of attentional influence on spatial learning are unreliable.

Taken together, there is no consistent evidence that directing attention to spatial layout or objects influences spatial learning, although it does appear to affect object learning. Part of the inconsistency may be due to the use of different measures of spatial knowledge: Tests of layout recall seemed to show an attentional effect (Attree et al., 1996; Brooks et al., 1999), whereas standard tests of survey knowledge, such as pointing and distance estimates, did not (Wilson, 1999; Wilson & Péruch, 2002). However, this failure to find an effect of attention on survey tasks in desktop VR is not particularly surprising. The acquisition of survey knowledge depends on metric information during learning, and the evidence we just reviewed indicates that it is provided by idiothetic information during walking. Desktop VR is thus an inherently inadequate paradigm in which to test the role of attention; we return to the question in the Attention and Incidental Spatial Learning section.

It remains possible that active exploration may provide an advantage because the greater interaction with the environment leads participants to attend to spatial layout, but as yet there is little support for this hypothesis. Thus, the active advantage during walking discussed in the previous section (Idiothetic Information During Walking) appears to be attributable to idiothetic information, rather than to increased spatial attention in the active condition.

Idiothetic information in spatial updating

The active/passive distinction has also become important in the recent literature on spatial updating. For present purposes, we will consider spatial updating to be the problem of keeping track of the spatial relations between the observer and a small array of objects as the observer moves around the environment. Spatial updating is closely related to the problem of path integration, but the experimental paradigms have important differences. In most spatial-updating tasks, the environment usually consists of a small array of objects that can be viewed all at once, and the task emphasizes maintaining the spatial relations among objects as one’s viewpoint changes. In contrast, in path integration tasks, the observer is typically embedded in a larger layout of objects that cannot be viewed simultaneously, and the task emphasizes keeping track of one’s position and orientation within that environment; this is typically assessed by judgments of the location of the observer’s starting point. Both spatial updating and path integration require measuring the distances traveled and angles turned by the observer and probably share common mechanisms of integrating information about self-motion. However, the tasks are sufficiently different that it is not clear whether experimental results transfer from one paradigm to the other.

It is important to point out a key difference between the spatial-learning and spatial-updating literatures. Whereas the active/passive question in spatial learning applies to movement during exploration and learning, in spatial updating it typically applies to movement after an object array has already been learned. There is no evidence that active movement while a small array of objects is examined aids spatial learning (provided that there is sufficient static information to specify the 3-D configuration). Participants allowed to freely walk around while learning a spatial array were no more accurate at later spatial judgments than were those who viewed the display from a single monocular viewpoint (Arthur, Hancock, & Chrysler, 1997), and free movement during learning does not preclude alignment effects (Valiquette, McNamara, & Smith, 2003). Thus, active/passive spatial updating is chiefly concerned with whether a known set of object relations is updated during locomotion. Given that spatial learning of a layout of objects presumably depends on keeping track of their positions as one moves about, evidence from spatial updating and path integration may have implications for spatial learning. In addition, some wayfinding tasks may appear on the surface to require path integration. However, we do not wish to assume that all wayfinding requires accurate survey knowledge derived from path integration or spatial updating. Alternative navigation strategies based on sequences of views, route knowledge, or the ordinal relationships among objects may be sufficient for many wayfinding tasks.

The question of active and passive spatial updating focuses on the relation between visual and idiothetic information. Rieser, Guth, and Hill (1986) initially reported that after a layout of objects is learned, pointing estimates from a novel location are faster and more accurate when participants physically move to that location than when they just imagine moving to it, regardless of whether the participants physically moved to one of the objects or to a random location in the room. While it is not clear whether the times for physical movement and imagined movement were equated, this result suggests that visual imagery and idiothetic information may be intrinsically coupled. When a self-rotation is imagined, pointing errors and latencies increase with the magnitude of the imagined rotation, just as they do with a physical rotation (Farrell & Robertson, 1998; Rieser, 1989). Conversely, when asked to ignore their movements after traveling to a new location, people make errors similar to those made when they imagine the movement (Farrell & Robertson, 1998; Farrell & Thomson, 1998). These results indicate that idiothetic information and the corresponding visual imagery cannot be easily decoupled, implying that visual spatial updating may automatically accompany physical movement.

Moreover, imagining or ignoring movement seems to be an effortful process. When forced to make their responses in these conditions quickly, people are prone to errors, whereas when given time to mentally update their position before responding, their performance is the same as when they physically move (Farrell & Thomson, 1998). In contrast, Waller, Montello, Richardson, and Hegarty (2002) found no difference in errors between participants who were asked to ignore a physical rotation and those who stayed in place. However, they did not measure response latencies, so participants who ignored the rotation may have had sufficient time to mentally realign their orientation. People who are blind from birth do not show this discrepancy between imagined and actual movement; they have poor performance in both cases (Rieser et al., 1986) but may also form spatial representations through other means that can at times be superior to those of sighted individuals (Afonso et al., 2010). In contrast, late-blind people show the same discrepancy as sighted individuals (Rieser et al., 1986). Thus, once the relationship between visual and idiothetic information for self-motion is acquired, the calibration appears to be long-lasting and functionally useful.

The evidence presented so far suggests that spatial updating is automatic with physical movement, but it is unclear exactly which components of idiothetic information are vital to this process or whether visual information is also sufficient. It is possible that some combination of visual, motor, proprioceptive, and/or vestibular information for self-motion is either sufficient or necessary for spatial updating. If vestibular information is sufficient, passively moving a person around an environment should yield accurate updating; if efferent control is necessary, performance will suffer. Féry, Magnac, and Israël (2004) tested this question by sitting participants in a rotating chair, giving them primarily vestibular input. Participants first learned an array of objects and then rotated through a prescribed turn. Those who controlled their rotation via a joystick had smaller pointing errors than did those who were passively turned, although the latter were not completely random. In this situation, it appears that having some measure of control over when the rotations start and stop, without deciding how far to turn, improves the accuracy of spatial updating. This result points to the importance of motor and/or proprioceptive information in spatial updating and also suggests a subsidiary role for vestibular input.

In contrast, Wraga, Creem-Regehr, and Proffitt (2004) found that motor efference added little to spatial updating beyond the contributions of vestibular input. Their participants sat in a swivel chair during learning and testing of objects in a virtual environment and either used their feet to turn or were passively turned by the experimenter. The active condition added little to either speed or accuracy of spatial updating. Wraga et al. also examined the effects of self-motion, as compared with display motion. In the same environments as above, participants either stood in place and rotated to learn the layout or stood still and used a joystick to rotate the virtual environment. In this case, the addition of vestibular and proprioceptive information in the self-motion condition led to shorter response times and fewer errors, as compared with visual and motor information alone in the display motion condition. An interesting difference between these two studies is that the objects in Wraga et al.’s displays surrounded the participant, such that only one object was visible at a time. In contrast, Féry et al. (2004) used a layout where all of the objects were learned from a single perspective.

While vestibular and proprioceptive information provide an advantage over visual information alone, there is some evidence that the latter might be sufficient for spatial updating. Riecke, Cunningham, and Bülthoff (2007) tested participants in a virtual replica of a well-known environment. Participants were rotated in place and then pointed to target locations. The researchers crossed two types of visual rotations—the full scene or just an optic flow pattern—with either a physical rotation or no physical rotation. While optic flow alone was not sufficient for spatial updating, the full scene including landmarks was sufficient for automatic and obligatory spatial updating, even without a physical rotation. In the case of a well-known environment, a rich visual scene may be enough for spatial updating to occur by means of view-based place recognition. It should be noted that the environment was learned while participants were walking or driving around the town, such that views could be related via idiothetic information. Rich visual information may be sufficient for spatial updating once the environment is learned, but these results do not address the question of whether visual information is sufficient for spatial learning.

The contributions of visual and idiothetic information have also been tested in studies of path integration (Harris, Jenkin, & Zikovitz, 2000; Kearns, Warren, Duchon, & Tarr, 2002; Loomis et al., 1993; Tcheang, Bülthoff, & Burgess, 2011). In a standard triangle completion task, participants walk on two prescribed outbound legs of a triangle and then are asked to turn and walk back to their starting point on the homebound leg. Klatzky, Loomis, Beall, Chance, and Golledge (1998) showed that turn errors on the homebound path are low when participants actively walk on the outbound path or when they actively turn for the rotation but only view optic flow during the translation on the outbound legs. However, participants who only received visual input or who imagined the outbound legs exhibited large turn errors, demonstrating the importance of idiothetic information. Allen, Kirasic, Rashotte, and Haun (2004) reported that when young adults were led blindfolded or were pushed in a wheelchair on the outbound legs, their performance was the same on the homebound path. Older adults, in contrast, suffered decreased performance in the wheelchair condition, when only vestibular information was available. These results indicate that vestibular information is sufficient—and motor and proprioceptive information not essential—for path integration in younger adults, whereas the latter are necessary in older adults, due to a loss in vestibular function with age. Kearns (2003) dissociated optic flow from idiothetic information during triangle completion by varying the visual gain in ambulatory VR. She found that idiothetic information accounted for about 85% of the response for both turns and distance on the homebound path, whereas optic flow accounted for about 15% of the response. Tcheang et al. also altered the visual gain during path integration tasks to determine the contribution of visual information. They were able to predict triangle completion errors made while participants were blindfolded after a visual adaptation paradigm using a multimodal integration model. In sum, it appears that motor, proprioceptive, and vestibular information all contribute to path integration, with visual information for self-motion playing a significant but lesser role.

So far, the evidence suggests a degree of automatic updating of spatial relations based on idiothetic information when a person walks around an environment. Such spatial updating would seem to be at odds with findings of viewpoint dependency in scene and object recognition. A number of studies have shown that learning a scene from one viewpoint and then making judgments about the scene from a novel viewpoint, either actual or imagined, impairs performance (e.g., Shelton & McNamara, 1997, 2001; Tarr, 1995). For example, Shelton and McNamara (1997) had participants learn an array of objects from two viewing directions and then asked them to make spatial judgments from several imagined orientations. Angular error and response latency for the learned orientations were significantly lower than those for other imagined orientations. These results support the notion that people have a viewpoint-dependent representation of spatial configurations, such that they have better access to scene information in familiar views. In some cases, viewpoint dependency may be overridden by the presence of an intrinsic reference axis or frame of reference in the environment (Mou, Fan, McNamara, & Owen, 2008; Mou & McNamara, 2002; Shelton & McNamara, 2001).

Spatial updating is relevant here because it could mitigate the limitations of viewer-centered spatial knowledge. As we have seen, observers are less accurate at making spatial judgments from a novel viewpoint. However, it is not clear whether this effect is due to a change in the orientation of the objects or a change in orientation of the viewer (Simons & Wang, 1998). If a person automatically updates his or her position during active self-motion, as suggested by Féry et al. (2004), he or she should have similar performance at new and learned viewpoints, provided that sufficient idiothetic information is available.

Simons and Wang (1998) probed this hypothesis by directing participants to learn an array of five objects from one viewpoint. They then divided the participants into two groups: The different-viewpoint group walked to a new viewpoint for testing, whereas the initial-viewpoint group walked around but returned to their initial position for testing. On half of the trials, the object array was constant, so the initial-viewpoint group received the same view as before, but the different-viewpoint group received a new view of the array. On the other half of the trials, the object array rotated such that the different-viewpoint group actually saw the same view of the array as they originally learned, while the initial-viewpoint group received a new view (participants were informed about the type of trial they were receiving). The participants’ task was to identify which of the objects had been moved to a different relative position. The initial-viewpoint group did very well when they saw the learned view of the array, but performance suffered when they saw the rotated array. In contrast, the different-viewpoint group had similar performance in both conditions, indicating that they could judge object relations from the learned view and were able to update their position to the new view. Without the corresponding idiothetic information, the initial-viewpoint group could not adjust. In another experiment, the participants in both groups were disoriented during the move to the test viewpoint. In this case, neither group performed as well with the new view as with the learned view, consistent with an idiothetic contribution to spatial updating.

One potential problem in the initial Simons and Wang (1998) paradigm is that the different-viewpoint group had information (in this case, idiothetic) about the magnitude of rotation and might have anticipated the new view by mentally rotating the array before test while they walked to the new viewpoint, whereas the same-viewpoint group did not have information about the magnitude of rotation. Thus, it is not clear whether the effect provides evidence for active spatial updating or simply mental rotation. It is necessary to show that the effect is stronger with active idiothetic updating than with other information, such as control of the movement, that would allow anticipation of the new view.

Wang and Simons (1999) subsequently explored the updating process in more detail. In one condition, participants controlled a lever that rotated the display, giving them information about the magnitude of rotation without physically moving to a new location. In the other condition, the experimenter controlled the lever, and participants merely viewed the display. There was no difference in performance between the two conditions, indicating that idiothetic information from physical movement plays the key role in spatial updating, not cognitive control of the rotation or information about its magnitude. Finally, performance was only marginally better when participants received the learned view at the initial viewpoint than when they were passively wheeled to a new view and viewpoint; performance in both conditions was comparable to that in earlier experiments in which participants actively walked to the new viewpoint. Thus, vestibular information appears to be sufficient for spatial updating, whereas motor and proprioceptive information are not essential. These results are consistent with those of Allen et al. (2004), who also reported the sufficiency of vestibular information for path integration in young adults, but are contrary to the findings of Féry et al. (2004), who found a greater contribution of motor efference to spatial updating during rotation.

Wang and Simons effectively demonstrated that spatial updating can occur when one actively moves around a display, but they used only small angular differences in viewpoint—47° (Simons & Wang, 1998) and 40° (Wang & Simons, 1999). Although they found no difference between the different-viewpoint/same-view and different-viewpoint/different-view conditions, both conditions showed somewhat reduced accuracy, as compared with the same-viewpoint/same-view condition. It is thus possible that the updating achieved by active movement around the display was not complete. Motes, Finlay, and Kozhevnikov (2006, Experiment 2) used a similar task, requiring participants to actively move around a learned scene. They found that reaction time increased and accuracy suffered as the angular distance from the learned view increased, consistent with view-dependent scene recognition (Shelton & McNamara, 1997, 2001). They did not, however, include a group that remained stationary while the array moved, so it is difficult to determine whether the active group had complete or partial updating.

Other experiments failed to find an active updating effect. Using a more difficult scene recognition task with a greater time delay between encoding and testing, Motes et al. (2006, Experiment 1) found no advantage when the observer moved, as compared with when the array moved and no nonidiothetic information about the magnitude of the rotation was available; if anything, participants responded faster when the array moved. Similarly, when using a viewpoint misaligned with the learned viewpoint, Roskos-Ewoldsen, McNamara, Shelton, and Carr (1998) found no difference between participants who were passively wheeled to a new location and knew their orientation in the room and participants who were disoriented while being wheeled to the new location; both groups had higher errors than did those tested from the learned viewpoint. On the other hand, Waller et al. (2002) reported evidence of active updating. They found no view-based alignment effects when people actively walked to a misaligned viewpoint, but the effects reappeared when participants were disoriented. They also obtained evidence that rotating in place alters the orientation-specific representation of the layout. Finally, Teramoto and Riecke (2010) found that the dynamic visual information obtained from movement can substitute for physical movement in a virtual object recognition task: Seeing the whole environment move produced performance equivalent to that when walking to a new viewpoint, suggesting that visual information might be sufficient for spatial updating during object recognition.

On balance, the literature is generally consistent with the occurrence of spatial updating during active movement. Thanks to an established calibration between idiothetic and visual information for self-motion, active movement produces coordinated updating of viewer-centered object locations and visual imagery and tends to reduce view-dependent alignment effects. A couple of dissenting reports suggest that spatial updating may be compromised by larger rotations or more difficult recognition tasks. Active updating is clearly based on idiothetic information, although there are conflicting results about whether vestibular information is sufficient or whether motor and proprioceptive information are necessary. There are some suggestions that visual information for place recognition or self-motion may be sufficient for spatial updating under certain conditions.

Conclusions: Idiothetic information, decision making, and attention

In this section, we have examined the contributions of idiothetic information, decision making, and attention to spatial learning, primarily using VR techniques. The pattern of evidence reviewed so far indicates that idiothetic information during walking plays an important role in active navigation, a pattern that is generally consistent across the spatial-learning, path-integration, and spatial-updating literatures, with some exceptions. In principle, idiothetic information could help explorers keep track of their position and orientation and relate the spatial locations of objects as they walk around the environment. In contrast, there is little evidence that making decisions about one’s path or attending to the spatial layout (as opposed to isolated objects) during exploration makes a contribution to spatial learning. However, these conclusions must be regarded as preliminary because the available evidence is limited and inconsistent.
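
This kind of bookkeeping amounts to path integration, or dead reckoning, over the walked path. A minimal sketch of the computation (ours; a generic textbook formulation, not a model taken from the studies reviewed here):

```python
import math

def path_integrate(steps, start=(0.0, 0.0), heading=0.0):
    """Accumulate (turn, distance) steps -- the metric information that
    idiothetic systems could supply during walking."""
    x, y = start
    for turn, dist in steps:
        heading += turn
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
    return (x, y), heading

# An illustrative L-shaped walk: 4 m forward, a 90-degree left turn, 3 m forward.
pos, heading = path_integrate([(0.0, 4.0), (math.radians(90), 3.0)])

# The integrated position supports a homing response (a novel shortcut):
home_distance = math.hypot(*pos)
home_bearing = (math.degrees(math.atan2(-pos[1], -pos[0]) - heading) + 180) % 360 - 180
print(round(home_distance, 2), round(home_bearing, 1))  # 5.0 m, 126.9 deg left
```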

One important limitation is that studies of decision making and spatial attention discussed so far have been done in desktop VR, which has failed to yield reliable evidence of any active advantage in spatial learning, whereas most studies of idiothetic information have been done using prescribed routes in ambulatory VR. An exception is a recent study by Wan, Wang, and Crowell (2010), who found no evidence that path choice improved path integration in the presence of full idiothetic information. However, the authors did not examine its influence on the resulting spatial knowledge. Thus, there is no research investigating the contribution of these three components to spatial learning in the same experimental paradigm, especially regarding route knowledge. As a consequence, possible additive effects or interactions between them remain unexamined. Further studies in ambulatory environments are needed to investigate whether decision making and spatial attention contribute to spatial learning when normal idiothetic information is also available.

Second, we point out that the spatial-learning literature has focused primarily on metric survey knowledge, as opposed to weaker route, ordinal, or topological knowledge. In most cases, the research involves survey tasks such as standing (or imagining standing) at one location and pointing to other locations or making distance judgments between locations. These tasks probe metric knowledge of the environment, which appears to depend on the metric information provided by the idiothetic systems. This focus on metric knowledge might explain why an active advantage depends on idiothetic information. Only a few studies have tested other tasks that could be based on weaker spatial knowledge (e.g., Grant & Magee, 1998; Hazen, 1982; Péruch et al., 1995; Ruddle et al., 2011b; Wilson et al., 1997; Wilson & Péruch, 2002). For example, Hazen reported better route finding by children who had freely explored than by those who were led by their parents, suggesting a role for decision making in route knowledge. Similarly, making decisions about exploratory behavior has also been found to enhance other types of spatial memory (Voss, Gonsalves, Federmeier, Tranel, & Cohen, 2011). Thus, whether there is an active advantage in learning weaker forms of spatial knowledge, and the components on which it depends, remains a largely unexplored question.

A third limitation is that the studied environments, both real and virtual, have varied widely in size. There is some evidence that spatial abilities at different scales are partially, although not totally, dissociable (Hegarty, Montello, Richardson, Ishikawa, & Lovelace, 2006). The spatial updating literature relies primarily on arrays of objects on a tabletop, and path integration research typically covers a few meters, whereas spatial learning research has used room-, building-, or campus-sized environments. The main concern is that small object arrays can be seen simultaneously and spatial updating requires information only about self-rotation, whereas larger, more complex environments cannot be viewed all at once and require more sophisticated path integration to relate objects and views. As a consequence, spatial updating focuses on active movement after an object array is learned, while studies in larger environments focus on active movement while a spatial layout is learned.

Despite the varying methods, scales, and extents, some common themes emerge. There is evidence that under certain circumstances, rich visual information is sufficient for spatial updating, but it is also clear that optic flow alone is not sufficient. Most important, all three literatures appear to demonstrate a role for idiothetic information. Presumably, this advantage occurs because spatial updating and path integration depend on similar mechanisms of self-motion perception, and path integration is important for the acquisition of survey knowledge in larger environments.

Attention and incidental spatial learning

We now focus more directly on the cognitive dimensions of active spatial learning. We begin by examining the role of attention. This section will investigate what aspects of the environment can be learned passively, without much attentional deployment, and what aspects do require attention. The research reviewed thus far offers little support for a contribution of attention to active spatial learning. In those experiments, however, attention was manipulated by explicitly instructing participants to study the spatial layout or environmental objects. In this section, we review two other paradigms in an attempt to clarify the role of attention in spatial learning. First, we examine the literature on intentional and incidental learning, in which attention is manipulated by varying the participant’s awareness of an upcoming test or by employing interference tasks during learning. Second, we consider research that uses orienting tasks to direct attention more narrowly to specific aspects of the environment. In both cases, we examine the effects of these manipulations on acquiring different types of spatial knowledge, including landmark, route, and survey knowledge.

Incidental and intentional learning of spatial information

Consider the possible effects of the observer’s intentions on spatial learning. If learning the environmental layout is facilitated by active attention, explorers who are informed that they will be tested on the layout and intentionally learn it may perform better than if they are not informed of the upcoming test. On the other hand, if spatial properties are acquired automatically and learning is incidental, the awareness of the test should not make a difference.

An early experiment by Lindberg and Garling (1983) investigated whether survey knowledge was automatically encoded as observers were guided along a route through a real environment. Estimates of straight-line distances and directions showed no differences in errors or latencies between intentional- and incidental-learning groups. However, the incidental group was taken through the route three times while the experimenters pointed out the reference locations. Given these demand characteristics, it seems likely that the incidental participants inferred the purpose of the study and paid attention to spatial information, leading them to perform like the intentional group. In addition, distance and direction estimates improved in both groups with increased exposure to the environment, suggesting an effortful process. The results thus do not support incidental learning of survey knowledge and may even imply the opposite.

Van Asselen, Fritschy, and Postma (2006) investigated the intentional and incidental encoding of route knowledge. Half of their participants were told to pay attention to the route they took through a building because they would be tested on it later. The other half were told only that they needed to go to another room in the building, giving them no reason to pay particular attention to the route. The intentional-encoding group more accurately filled in the route on a map of the building and made fewer errors when reversing the route on foot than did the incidental-encoding group. Interestingly, the two groups were equally good at identifying landmarks and putting those landmarks in the correct temporal order. In this case, it appears that learning a route is not an automatic process, whereas acquiring some landmark and ordinal knowledge may require less effort. In this paradigm, however, it is possible that participants in the incidental-encoding group attended to such environmental properties even without knowledge of the upcoming test, making null results for landmark learning difficult to interpret.

Other evidence from interference tasks suggests that some attention is required to learn even simple elements of a route, such as the sequence of landmarks and landmark–action associations. Albert, Reinitz, Beusmans, and Gopal (1999) instructed their participants to learn a route from a video display. Those who performed a difficult verbal working memory task while watching the videos were less proficient at putting landmarks in the correct order than were those who were allowed to fully attend to the video. The distractor task also interfered with associating landmarks with the appropriate turns on the route, learning the spatial relationships between landmarks, and even associating landmarks with the correct route. Similarly, Anooshian and Seibert (1996) found that intentional learners who performed a visual imagery task while viewing a route were more likely to make errors in assigning snapshots of scenes to the correct route. These interference tasks appear to affect conscious recollections, not measures of familiarity (Anooshian & Seibert, 1996). Incidental memories may thus provide a sense of being familiar with landmarks, but they are not sufficient to guide the navigator through a route; one may have a sense of having been at a place before but have no idea which direction to turn or where that place fits into a route or spatial layout.

However, two notes of caution must be sounded before concluding that acquiring the elements of route knowledge requires attention. First, these two studies (Albert et al., 1999; Anooshian & Seibert, 1996) relied on videos to present the routes, so participants did not have access to the idiothetic information that appears to be important for spatial learning. Second, both reports used distractor tasks, which not only interfere with attention, but also place a high demand on working memory. We take up working memory load in the Working Memory and Spatial Learning section.

Incidental encoding of small-scale spatial layouts has also been examined, with mixed results. In children, intentional learning of an object array proves to be no better than incidental learning, suggesting that spatial information may be acquired with little effort (Herman, Kolker, & Shaw, 1982). In adults, alignment effects have also been reported with both intentional and incidental learning of the layout of objects in a room. In the incidental condition, these effects indicate that participants learned the layout from one or two viewpoints, which tend to be aligned with their initial orientation or with the walls of the room (Valiquette et al., 2003). Explicit instructions to intentionally learn the layout also lead to alignment with the walls of the room. Strong reference axes may influence both intentional and incidental learning of a layout (Mou & McNamara, 2002).

On the other hand, intentional learning appears to improve performance when the task is to reproduce the layout by placing the objects on a table immediately after viewing, rather than to make spatial judgments from imagined positions (Rodrigues & Marques, 2006). When the reproduction task is delayed for several minutes, the performance of the incidental group suffers, while the intentional group remains fairly accurate. Participants in incidental and intentional conditions also appear to use different memorization strategies: Intentional learners focused on the locations of the objects, whereas incidental learners tried to remember the object names. These results suggest that spatial information can be acquired while attention is focused elsewhere but is not retained over time.

Intentional learning may be based on associative-reinforcement mechanisms, whereas incidental learning can occur without reinforcement. Reinforcement learning “blocks” or “overshadows” later learning, so learning one piece of information interferes with future learning. In contrast, incidental learning does not act as a blocker and, thus, does not prevent future learning. Doeller and Burgess (2008; Doeller, King, & Burgess, 2008) observed blocking when people performed tasks that emphasized learning the relationship between objects and a landmark, but not when the tasks emphasized learning the relationship between objects and an environmental boundary. These findings imply that spatial relations among landmarks must be intentionally encoded, whereas spatial relations with boundaries are learned incidentally. Thus, not only do local features, like landmarks, help in learning a layout, but intentional processing of the relations among those features leads to greater spatial learning. Global environmental features, such as boundaries, are not explicitly “associated” with object locations but appear to be acquired more automatically.

Before concluding, we should note that even without explicit instructions to attend to the spatial environment, participants in these studies may still allocate attention to the spatial layout. They may be inherently inclined to attend to spatial properties, or the demand characteristics of the experiment may lead them to do so. Such concerns notwithstanding, these studies suggest that explorers learn limited properties of landmarks and routes incidentally, without explicit attention. However, full route knowledge and survey knowledge appear to require the intention to learn, implying the need for attention to the relevant spatial relations. Specifically, incidental encoding allows the observer to identify landmarks, their relation to boundaries, and, in some cases, their sequential order, although there is conflicting evidence on this point. On the other hand, intentional encoding appears to be necessary for place–action associations, reproducing a route, and spatial relations between landmarks. For small-scale spatial layouts that do not require as much exploration and integration, there appears to be little difference between incidental and intentional learning, although the latter may lead to better long-term retention.

Differential attention to environmental properties

Another paradigm for investigating the role of attention in spatial learning is to manipulate the prior information or the orienting task that is presented to the participant. These manipulations aim to direct attention to particular aspects of the environment and appear to influence whether places, sequences of landmarks, routes, or survey knowledge are acquired during learning. This strategy assumes that some attention is actively allocated, but only to specific aspects of the environment.

The type of information about the environment that is presented prior to learning can push participants toward encoding particular aspects of the layout. For example, Magliano, Cohen, Allen, and Rodrigue (1995) gave their participants information on the landmarks, route, or overall layout before they viewed a slideshow of a route, with instructions to learn that information. All groups were able to recognize landmarks from the route and to put landmarks in their correct temporal order. In survey tasks, the controls who received no additional information performed better than those given landmark information, indicating that there is a cost associated with being given information that is inappropriate for a survey task. Despite being able to put landmarks in sequential order, the control and landmark groups performed poorly when asked to give directions for the route, indicating that they did not associate actions with particular landmarks. These results are consistent with findings discussed earlier (e.g., Magliano et al., 1995; van Asselen et al., 2006) that some landmark and ordinal knowledge is acquired without much effort but that full route and survey knowledge requires attention, additional information, or active manipulation of that information.

Directing attention to different features of the environment by manipulating the orienting task during learning can also influence the type of spatial knowledge that is acquired. For example, when instructed to only learn locations in the environment, participants encode sequences of landmarks without much effort but appear to have difficulty placing landmarks in context, including the appropriate action to take at a landmark and the spatial relations among landmarks, suggesting a potential dissociation between place knowledge, on the one hand, and route and survey knowledge, on the other (e.g., Albert et al., 1999; Magliano et al., 1995; van Asselen et al., 2006). However, attention to landmarks at the expense of turns while a route is learned can also impair the ability to put landmarks in sequential order, particularly in older adults (Lipman, 1991). Tracking a particular landmark over time also adversely affects the acquisition of survey knowledge, as tested by placing locations on a map that contains two given reference points (Rossano & Reardon, 1999). This task led participants to encode locations with respect to the tracked landmark at the expense of accurately encoding them with respect to each other.

To compare learning of places and actions, Anooshian (1996) guided participants along a route that contained simulated landmarks (photographs of, e.g., a fire station), while instructing them either to anticipate place names or to learn turns on the route. For the place group, the landmarks were visible on the first walk through the route but were covered on the three subsequent walks, and participants were tested on their memory for the location each time. For the turn group, the landmarks were visible each time through the route, and participants were tested on what action they needed to take at each landmark. Interestingly, the place group was not only better at later recalling landmarks as they walked the route, but also better at naming the next landmark in the sequence and pointing to landmarks from new positions. While this result might seem surprising, the orienting task required the place group to learn the upcoming landmark at the next location, so they acquired the landmark sequence and, apparently, some configurational knowledge. In contrast, the turn group simply had to associate the current landmark with an action, without anticipating the next landmark. These results suggest that attending to the sequence of places on a route (with idiothetic information) can lead to greater survey knowledge than can attending to place–action associations, the basis of route knowledge.

Other evidence also indicates that the orienting task can influence whether route knowledge or survey knowledge is acquired. For instance, day-to-day use of a building typically involves repeatedly traversing a few familiar routes. Moeser (1988) found that nurses who worked in a complex hospital building did not acquire survey knowledge of the building even after 2 years of experience. This finding suggests that the daily task of following known routes does not inexorably lead to survey knowledge, contrary to Siegel and White’s (1975) well-known hypothesis that landmark knowledge is learned first, followed by routes, and that survey knowledge eventually emerges. In contrast, Taylor, Naylor, and Chechile (1999) found that experimentally manipulating the orienting task influences the spatial knowledge that is acquired. Participants given the goal of exploring a complex building to learn the quickest routes through it were better on later tests of route knowledge than were those instructed to learn the building’s layout. However, the opposite effect was not observed in this case: The two groups performed equally on tests of survey knowledge, presumably because the route-learning group had explored the building widely to find efficient routes. Finally, participants who learned the building by studying a map tended to show an advantage on both route and survey tests over those who learned it by walking in the environment (see also Thorndyke & Hayes-Roth, 1982, for comparisons of map and route learning without orienting tasks).

There also appear to be important individual differences in learning survey, as well as route, knowledge (Wolbers & Hegarty, 2010). After 24 participants were driven through two connected routes, Ishikawa and Montello (2006) found that 17% of them had relatively accurate survey knowledge after one exposure, and only another 25% achieved accurate survey knowledge after ten exposures (where “accurate” is rather liberally defined as an absolute pointing error less than 30°). Only half of the participants improved their survey knowledge over time, again contrary to Siegel and White’s (1975) hypothesis.
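
For concreteness, absolute pointing error is the angular difference between the judged and true directions, wrapped around the circle before it is compared with a criterion. A small sketch (ours, with invented judgments) of how the 30° threshold described above might be applied:

```python
def abs_pointing_error(judged_deg, true_deg):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = abs(judged_deg - true_deg) % 360
    return min(diff, 360 - diff)

# Invented example judgments; the 30-degree criterion is the (liberal)
# accuracy threshold described above (Ishikawa & Montello, 2006).
pairs = [(350, 20), (95, 80), (180, 140)]
errors = [abs_pointing_error(j, t) for j, t in pairs]
print(errors, sum(errors) / len(errors) < 30.0)  # [30, 15, 40] True
```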

Older adults appear to have difficulty retracing a route (Wilkniss, Jones, Korol, Gold, & Manning, 1997), putting scenes from the route in the correct order, and selecting the most informative landmarks for navigation (Lipman, 1991). Rather than paying attention to landmarks that are relevant to finding the route, they appear to attend to those that are most perceptually salient. Likewise, children tend to select highly noticeable but spatially uninformative landmarks (Allen, Kirasic, Siegel, & Herman, 1979). They are, however, able to navigate the route well when given helpful landmarks. These results indicate that the ability to navigate a route successfully is related to the ability to attend to relevant landmarks and not be distracted by other salient objects. Verbal information about landmarks at decision points has proven to be most informative when a route is followed (Denis, Pazzaglia, Cornoldi, & Bertolo, 1999), suggesting that attention to and selection of informative landmarks is crucial to successful route navigation.

In sum, while certain environmental features may be learned automatically, the evidence indicates that acquiring route and survey knowledge depends on the intention to learn or the orienting task and, by implication, the deployment of attention. Rather than progressing through a regular sequence of place, route, and survey knowledge, the type of spatial knowledge that is acquired depends on the task demands during learning. Landmarks, landmark–boundary relations, and, to some extent, sequences of landmarks appear to be acquired incidentally, regardless of the task. In contrast, the selection of informative landmarks, place–action associations, and spatial relations among landmarks appears to depend on tasks that direct active attention to the corresponding environmental properties. The fact that metric survey knowledge also depends on the presence of idiothetic information during learning may explain the failure to find reliable effects of attention in desktop VR (see the Idiothetic Information During Walking section). Thus, the present findings indicate that the control of attention, in combination with idiothetic information, is an important component of active exploration.

There are still many open questions involving attention and spatial learning. Attention may interact with the other components of active exploration by, for example, modulating the contribution of idiothetic information or playing a mediating role for cognitive decision making. The implications of such possible interactions have yet to be studied. In addition, the limits of spatial attention have not been investigated. It may be possible to learn multiple aspects of the environment when directed to attend to both route and survey information. On the other hand, there may be a limit to attentional capacity that leads to depressed performance on both.

That attention influences the acquisition of route and survey knowledge implies that the relevant spatial information is encoded in working memory. We therefore turn next to the role of working memory in spatial learning.

Working memory and spatial learning

Attention appears to contribute to the encoding of certain aspects of the environment, but it remains to be seen how that encoding takes place. Some environmental information can be encoded without a major investment of attention, such as landmarks and landmark sequences, but other information may be difficult to encode even with full attentional resources, such as metric spatial relations. Thus, in this section, we discuss the role that particular components of working memory play in encoding different types of spatial information. Working memory may be considered a part of active learning, especially when active manipulation or transformation of the spatial information is required. Working memory also affects how and where attention is allocated. As we saw with attention, working memory appears to contribute to spatial learning in a variety of ways, depending on the component of working memory involved, the particular spatial information, and whether the information is actively transformed or is simply maintained.

The interference paradigm

The main experimental framework in the literature on working memory is an interference paradigm, in which distractor tasks designed to interfere with specific working memory processes are used to investigate how different spatial properties are encoded. It is thus important to distinguish two factors: (1) the aspect of the environment that is to be encoded and (2) the type of working memory process involved. The former refers to the information that is to be acquired by the observer, such as landmark information, route knowledge, or survey knowledge. The latter refers to whether that information is encoded via verbal, visual, or spatial working memory, or some combination thereof. Distractor tasks are designed to interfere with one or more of these functional processes during the learning phase. The resulting knowledge of the environment is probed during the test phase, although distractors can also be used to interfere with retrieval of information at test. The disruption of one type of encoding may thus impair the acquisition of a particular environmental property but not others, revealing something about how they are encoded. For example, a spatial interference task may inhibit the encoding of survey knowledge without disrupting the acquisition of route knowledge. In this section, our aim is to identify such relationships between types of working memory and forms of environmental knowledge.

We should be clear at the outset that the understanding of working memory continues to develop, and we are not committed to a particular framework. Our goal is merely to use current theory to see whether it will yield insights into spatial learning. Working memory is typically broken down into multiple functional subunits (Baddeley, 2003; Logie, 1995). These are thought to include verbal and visual-spatial working memory, where the latter includes visual and spatial components. In addition, the spatial component is often divided into sequential and simultaneous processes, which appear to be independent of each other (Pazzaglia & Cornoldi, 1999). Researchers often test visual-spatial working memory using the Corsi block test (Pazzaglia & Cornoldi, 1999; Zimmer, 2008). Beginning with a random layout of blocks, the experimenter points to a sequence of blocks, and the participant must then repeat the sequence. This task contains a high degree of sequential information; the participant must not only tap the appropriate blocks, but also do so in the correct order. Thus, this particular test of visual-spatial abilities involves both spatial and sequential aspects of working memory. Verbal working memory might also play a role in acquiring spatial information if the participant encodes a route using verbal directions, for example, or if spatial information is presented in the form of text. A verbal interference task may probe the degree to which an observer verbally encodes spatial information.
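
A minimal sketch of the Corsi logic (ours; block indices stand in for positions in the physical layout) makes the all-or-none, order-sensitive scoring explicit:

```python
import random

def corsi_sequence(n_blocks=9, span=5, seed=None):
    """One Corsi-style trial: the experimenter's tapped sequence, given as
    indices into a fixed random layout of blocks."""
    return random.Random(seed).sample(range(n_blocks), span)

def corsi_correct(target, response):
    """All-or-none scoring: the right blocks AND the right order."""
    return response == target

target = corsi_sequence(seed=1)
print(corsi_correct(target, target))        # True: same blocks, same order
print(corsi_correct(target, target[::-1]))  # False: same blocks, wrong order
```

Because reproducing the right blocks in the wrong order fails the trial, the test loads sequential as well as spatial memory, which is exactly the property discussed above.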

As is discussed in the Incidental and Intentional Learning of Spatial Information section, secondary tasks do not appear to interfere with the encoding of certain types of information, such as places or landmarks. On the other hand, both verbal and visual-spatial interference tasks disrupt the encoding of route information, including assigning landmarks to the correct route and putting them in sequential order, as well as learning spatial relationships (Albert et al., 1999; Anooshian & Seibert, 1996), and may also disrupt path integration (Tcheang et al., 2011). Similarly, a verbal shadowing task impairs selecting scenes from a route, making relative distance judgments, and verifying the route on a map (Allen & Willenborg, 1998). These results indicate that people use some sort of verbal strategy to help encode route information when passively watching a video or slides. However, it is less clear whether this is the case when they are actively exploring the environment.

Encoding spatial texts

An important limitation of the literature on working memory in navigation is that the majority of research has used spatial descriptions as stimuli. Although such studies may illuminate learning from directions or other verbal descriptions, they are less informative about ordinary spatial learning from exposure to the environment. Just as with desktop VR, the spatial text paradigm is likely to be supplanted by more immersive studies as they become available. However, given the dearth of working memory research in which participants are immersed in a real or virtual environment, studies using spatial text currently provide some of the only evidence on working memory and spatial learning. One point of contact is that both spatial text and route learning tap into the sequential aspects of spatial working memory. Most spatial descriptions proceed through a route and avoid cumbersome descriptions of distance and orientation relationships. Similarly, much immersive spatial learning is achieved by traversing routes from place to place, so spatial descriptions may bear some similarity to route learning.

Visual-spatial working memory appears to be key for learning spatial texts (De Beni, Pazzaglia, Gyselinck, & Meneghetti, 2005; Gyselinck, De Beni, Pazzaglia, Meneghetti, & Mondoloni, 2007; Gyselinck, Meneghetti, De Beni, & Pazzaglia, 2009), especially during encoding (Pazzaglia, De Beni, & Meneghetti, 2007). Concurrent verbal tasks disrupt the learning of spatial texts, but concurrent spatial tasks disrupt it more (Pazzaglia et al., 2007). Concurrent spatial tasks interfere during both encoding and retrieval (Pazzaglia et al., 2007), whereas tasks designed to interfere with central executive processing interfere with encoding only (Brunyé & Taylor, 2008). Thus, it appears likely that verbal and executive functions are involved in encoding spatial memories from texts but that visual-spatial working memory plays a larger role in both encoding and retrieving spatial descriptions.

Deeper understanding of the relationship between working memory and spatial learning comes from investigating the subunits of visual-spatial working memory. In order to distinguish the various components, Pazzaglia and Cornoldi (1999) created four different interference tasks, designed to probe verbal, visual, spatial-sequential, or spatial-simultaneous aspects of working memory during encoding of four types of texts. They found that spatial-sequential working memory contributes the most to learning sequential information, while verbal encoding plays a role in learning spatial-simultaneous information, possibly indicating that participants verbalized the simultaneous information. Pazzaglia and Cornoldi then investigated how the same three visual-spatial distractors interfered with the encoding of texts that emphasized route, survey, or visual knowledge of the environment. The authors expected that, if such information is encoded via separable subsystems of visual-spatial memory, each interference task would selectively disrupt the corresponding type of spatial information. The sequential task did, indeed, impair recall of the route description, but it also interfered with the survey and visual texts, perhaps because verbal descriptions are inherently sequential: Spatial information in the texts was presented serially. Spatial-sequential interference thus disrupted the encoding of both route and survey information from text, whereas the spatial-simultaneous task did not interfere with any of the spatial texts. However, this pattern may again reflect the sequential nature of texts, or the spatial-simultaneous task may simply have been too easy to produce comparable interference.

The results for maps complement those for texts. Coluccia, Bosco, and Brandimonte (2007) asked participants to perform two types of interference tasks while studying a map and then to draw a sketchmap of locations and roads. They found that tapping a spatial pattern interfered with learning both route and survey knowledge, while a verbal interference task did not affect either one. However, as we will see, other evidence suggests that verbal tasks do interfere somewhat with acquiring route knowledge in the real world when the route is experienced sequentially (Garden, Cornoldi, & Logie, 2002). Displaying the spatial layout simultaneously allows participants to encode survey and route information via spatial-simultaneous working memory, whereas presenting visual or textual information sequentially invokes verbal and spatial-sequential working memory to encode route and survey knowledge.

In sum, there appear to be multiple components of working memory involved in encoding spatial information from textual descriptions. First, verbal working memory appears to play a role in encoding route information that is presented sequentially in text. Second, spatial working memory is also involved when spatial information is described in text. However, on the basis of this evidence, it is difficult to conclude that spatial-sequential working memory is normally invoked when encoding route and survey knowledge, because textual stimuli are inherently sequential. For the same reason, one cannot infer that spatial-simultaneous working memory is normally uninvolved in the acquisition of route and survey knowledge; it is clearly involved when such information is presented simultaneously in the form of a visual map.

Working memory during immersive spatial learning

Let us turn to the few studies that have investigated working memory during “eye-level wayfinding,” or immersive spatial learning in a real or virtual environment. As was previously observed with slide and video sequences (Albert et al., 1999; Allen & Willenborg, 1998; Anooshian & Seibert, 1996), these studies confirm that verbal encoding plays a role in the acquisition of route knowledge. Participants who learn a route by viewing a display of a virtual environment (Meilinger, Knauff, & Bülthoff, 2008) or by walking through an actual town (Garden et al., 2002) while performing a secondary lexical decision task make errors on subsequent attempts to follow the same route. Ordinal information for a route might be verbally encoded as a series of place names with associated left or right turns. Given the ample research on dual coding of information, it may not be surprising that a route may be encoded verbally as well as visuospatially (Meilinger et al., 2008).

Regarding spatial interference with route learning, at first glance the results for immersive learning seem to contradict those for spatial texts. Pazzaglia and Cornoldi (1999) reported that a sequential spatial task interfered with route learning from text, whereas a simultaneous spatial task did not. In contrast, both Garden et al. (2002) and Meilinger et al. (2008) found that a spatial distractor interfered with route learning in an immersive environment. However, Garden et al.’s interference task called for participants to tap a spatial pattern in a particular order; this can be considered a sequential spatial task and, so, is consistent with Pazzaglia and Cornoldi’s results. But the auditory interference task used by Meilinger et al. required participants to identify the direction from which a sound was coming—a simultaneous spatial task, which nonetheless interfered with route learning. This apparent inconsistency may be reconciled in the following way. The auditory spatial task required participants to respond to tones to their left, right, or front by pressing one of three buttons, and thus the spatial direction of the tone might have been verbally encoded. Given that verbal distractors interfere with route learning, this distractor task may also have done so. This general pattern of results points to a role for both verbal working memory and spatial-sequential working memory in the encoding of route knowledge.
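
The pattern can be laid out schematically. The tally below is our reading of the studies discussed in this section (illustrative only, not a meta-analysis):

```python
# (working-memory process loaded by the distractor, knowledge probed) -> disrupted?
findings = {
    ("verbal", "route from video"): True,             # Albert et al. (1999)
    ("verbal", "route, walked or VR"): True,          # Garden et al. (2002); Meilinger et al. (2008)
    ("spatial-sequential", "route from text"): True,  # Pazzaglia & Cornoldi (1999)
    ("spatial-sequential", "route, walked"): True,    # Garden et al. (2002)
    ("spatial-simultaneous", "route from text"): False,  # Pazzaglia & Cornoldi (1999)
    ("verbal", "route/survey from map"): False,       # Coluccia et al. (2007)
}

def implicated(process):
    """A process is implicated in whatever its distractor selectively disrupts."""
    return [probe for (p, probe), hit in findings.items() if p == process and hit]

print(implicated("verbal"))
print(implicated("spatial-sequential"))
```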

Mental manipulation of spatial information

Returning to the theme of active and passive spatial learning, a distinction has recently been introduced between active and passive working memory tasks (Bosco, Longoni, & Vecchi, 2004). Passive tasks involve memorizing spatial information, while active tasks require manipulation or transformation of that information. An example of a simultaneous active task is one in which participants must solve a jigsaw puzzle by reporting the number of the correct piece without actually moving the pieces, thus requiring mental rotation and comparison. On the other hand, the Corsi block task, in which participants must repeat a sequence of blocks, is a sequential passive task. Bosco et al. found that performance on both of these tasks correlates with the ability to learn landmark, route, and survey knowledge from studying a map, as measured, respectively, by landmark recognition tasks, route recognition and wayfinding tasks, and map completion and distance judgments. This correlation with both active and passive tasks held especially for men, whereas active tasks were the better predictors of women’s spatial learning. In addition, when an environment was learned from a map, both active and passive simultaneous working memory abilities were related to survey knowledge of landmark locations and road connections (Coluccia et al., 2007). This result is not surprising, however, because a map provides simultaneous information about the layout. Thus, active manipulation in working memory does not appear to make a strong contribution to active spatial learning, but it does deserve further investigation in light of the observed gender difference.

The tasks considered so far have been designed to interfere with elements of working memory to test their role in spatial learning. Alternatively, one might approach the same question by investigating whether specific active working memory tasks facilitate aspects of spatial learning. For example, instructions conducive to mental imagery, such as imagining oneself in the environment described in a spatial text, have been shown to improve performance on a sentence verification task more than does just repeating the previous sentence in the text (Gyselinck et al., 2007; Gyselinck et al., 2009). This finding seems to be consistent with enhancement by active, as opposed to passive, working memory tasks.

Few studies have directly tested whether learning is enhanced by active manipulation of spatial knowledge. It is known that giving people advance information, such as a map or the route they will encounter, improves learning (e.g., Magliano et al., 1995). Münzer, Zimmer, Schwalm, Baus, and Aslan (2006) found that participants who actively used a map to follow a route in a real environment were better at retracing the route and placing landmarks than were those who viewed the route on a map and then received visual or auditory instructions about which way to turn. The first group had to interact with the environment to figure out the turns, requiring them to manipulate spatial information in working memory. This type of mental manipulation might contribute to an active advantage in spatial learning. Active mental manipulation may also interact with the other cognitive components of active navigation that we outlined in the introduction: making navigational decisions and allocation of attention.

In sum, the existing evidence suggests that different elements of working memory may be involved in particular aspects of spatial learning. A consistent result is that verbal working memory seems to play a role in encoding route information, whether it is presented via text, slide sequences, passive VR, or walking with idiothetic information. Similarly, spatial-sequential working memory also appears to contribute to the encoding of route knowledge from both text and VR displays. However, the relationship between the components of working memory and the acquisition of survey knowledge is unknown, due to a dearth of pertinent research. Most existing experiments are based on spatial descriptions that present survey information sequentially and thus, not surprisingly, invoke spatial-sequential working memory; analogously, spatial-simultaneous working memory is invoked when survey knowledge is encoded from a map. Systematic exploration of working memory components in survey learning during walking in immersive environments is needed. Finally, the distinction between active manipulation and passive storage of spatial information in working memory opens up a potential avenue for research. Initial results suggest that mental manipulation of spatial information may contribute to active learning of both route and survey knowledge, but more work with immersive spatial learning is called for.

Conclusions and future directions

We began with Appleyard’s (1970) original observation that bus drivers acquire better survey knowledge of a city than do their passengers, who acquire only route knowledge. This intuition raised a number of questions about active and passive contributions to spatial learning, to which, despite the limitations of the existing literature, we can offer some preliminary answers.

First, consistent with our first hypothesis, idiothetic information contributes to spatial updating in small-scale environments, to path integration in large-scale environments, and to spatial learning of survey knowledge. It may require a sufficiently complex path or repeated exposure for idiothetic information to reveal its effect, and several studies did not control for field of view and free head movements. Nevertheless, a core set of findings demonstrates an influence of idiothetic information on spatial learning. Motor and proprioceptive information, and perhaps vestibular information, appear to play a role, although their relative contributions remain to be determined. This conclusion is consistent with the theoretical claim that survey knowledge is derived from information about metric distances and turn angles along a traversed route—the sort of information provided by idiothetic systems. There is preliminary evidence that motor and proprioceptive information also contribute to route knowledge, perhaps by better specifying the action (turn angle and distance) in each place–action association. The role of idiothetic information in acquiring weaker topological and ordinal knowledge remains to be investigated.

Second, seemingly at variance with our second hypothesis, there is little evidence that making decisions about one’s path during exploration, in the absence of idiothetic information, contributes to spatial learning. Any effects of choosing a route, as opposed to following a prescribed route, are small, unreliable, and vulnerable to minor procedural differences. However, research on this topic has tested the acquisition of survey knowledge only in desktop VR, so it is not surprising that performance is poor. Thus, the hypothesis that decision making is sufficient for route learning remains to be tested. It also remains possible that decision making contributes to survey learning in combination with idiothetic information, but this question must be investigated during walking in real or virtual environments.

Third, consistent with our third hypothesis, the allocation of attention to relevant environmental properties clearly contributes to the acquisition of route and survey knowledge. Whereas landmarks, their relation to boundaries, and possibly landmark sequences appear to be encoded incidentally, landmark–action associations and spatial relations among landmarks require the intention to learn, implicating attention. Directing attention to place–action associations facilitates route learning at the expense of survey learning, whereas directing attention to configural relations facilitates survey knowledge but may not impact route knowledge. It is important to note that research in desktop VR has produced little evidence that attention to layout, as opposed to objects, influences survey learning. The absence of idiothetic information in desktop VR may have masked the contribution of attention, however, so that contribution must be tested when idiothetic information is present.

Finally, spatial learning is influenced by the information that is encoded in working memory. The interference paradigm has provided evidence that verbal and spatial-sequential working memory are involved in route learning, regardless of whether the mode of presentation of route information is verbal or eye-level visual. In contrast, spatial-simultaneous working memory is implicated in the encoding of survey knowledge from visual maps, but otherwise the working memory components involved in survey learning are unknown. Further research based on ambulatory environments, rather than spatial texts, is needed for progress in this area. In addition, some promising results suggest that active manipulation of spatial information in working memory may enhance spatial learning.

In sum, there appears to be a reliable active advantage in spatial learning, although it is task dependent. Walking humans acquire place, landmark, route, and/or survey knowledge, depending on their goals, and can modulate their learning by attending to and encoding different environmental properties. A complex navigation task may tap into some subset of this knowledge, depending on the goals of the navigator. Such spatial learning involves many cognitive processes that interact to varying degrees, depending on the task, the deployment of attention and working memory, and the active mental manipulation of spatial information.

However, many questions remain about the components of active learning in spatial navigation. The tentative conclusions presented here must be tested in a more rigorous and systematic fashion in ambulatory environments with full idiothetic information. In particular, it is critical to determine whether influences of decision making and attention are dependent on the presence of idiothetic information during learning. Moreover, the underlying neural bases of active and passive spatial learning are relatively unexplored. Although there is a large body of work on the neural correlates of landmark, route, and survey learning, which we review in a companion article (Chrastil, 2011), there is little research that directly addresses the correlates of active and passive learning in humans. One major obstacle is that physical movement is severely restricted by most neuroimaging techniques, whereas we argue that work in full ambulatory environments is needed to understand the contributions of decision making and attention. Imaging studies may still inform our understanding of nonmotor active learning, but their limitations must also be acknowledged.

Despite these gaps, the groundwork has been laid for a better understanding of active and passive spatial learning. Idiothetic information during active walking is important for the acquisition of metric survey knowledge. Active attention selects landmark sequence, route, or survey information to be encoded in the relevant units of working memory. For a more comprehensive picture of spatial learning, the systems in this interconnected network must be considered in relation to one another as they work together in complex navigation tasks.