1 Introduction

Throughout history, humans have relied on technology to help us remember information. From cave paintings, clay tablets, and papyrus to modern paper, audio, and video, we have used technology to encode and recall information. This paper addresses the question of whether virtual environments could be the next step in our quest for better tools to help us memorize and recall information. Virtual reality displays, in contrast to traditional displays, can combine visually immersive spatial representations of data with our vestibular and proprioceptive senses. The technique of memory palaces provides a natural spatial mnemonic to assist in recall. Since classical times, people have used memory palaces (method of loci), by taking advantage of the brain’s ability to spatially organize thoughts and concepts (Julian 1976; Roediger 1979; Knauff 2013). In a memory palace, one mentally navigates an imagined structure to recall information (Yates 1992; Harman 2001). Even the Roman orator Cicero is believed to have used the memory palace technique by visualizing his speeches and poems as spatial locations within the auditorium he was in (Yates 1992; Godwin-Jones 2010). Spatial intelligence has been associated with a heightened sense of situational awareness and of relationships in one’s own surroundings (Mayer et al. 2001; Gardner 2006).

Research in cognitive psychology has shown that recall is superior in the same environment in which the learning took place (Godden and Baddeley 1975). Such findings of context-dependent memory have interesting implications for virtual environments that have not yet been fully explored. Imagine, for instance, a victim of a street aggression being asked to recall the appearance details of their assailant. Virtual environments that mirror the scene of the crime could provide superior assistance in recall by placing the victim back into such an environment.

In this paper, we present the results of a user study that examined if virtual memory palaces could assist in superior recall of faces and their spatial locations aided by the context-dependent immersion afforded by a head-tracked head-mounted display (HMD condition) as compared to using a traditional desktop display with a mouse-based interaction (desktop condition). To explore this question, we designed an experiment where participants were asked to recall specific information in the two environments: the HMD condition and the desktop condition. We created the virtual memory palaces prior to the start of the study. Our hypotheses are as follows:

  • Hypothesis 1: The participant memory recall accuracy will be higher in the HMD condition as compared to the desktop condition due to the increased immersion.

  • Hypothesis 2: Participants will have higher confidence in their answers in the HMD condition as compared to the desktop condition.

The experiment was a within-subject, \(2 \times 2 \times 2\) Latin-square design, ensuring all the different combinations of variables and factors were accounted for. The experimental results of our study support both hypotheses.

2 Related work

Memory palaces have been used since the classical times to aid recall by using spatial mappings and environmental attributes. Figure 1 shows a depiction of a memory palace attributed to Giulio Camillo in 1511. The idea was to map words or phrases onto a mental model of an environment (in this case an amphitheater) and then recall those phrases by mentally visualizing that part of the environment.

Fig. 1

Giulio Camillo’s depiction of a memory palace (1511 AD). Memory palaces like this have been used since the classical times as a spatial mnemonic

An important component of the memory palace technique is the subjective experience of being virtually present in the palace, even when one is physically elsewhere. This notion of presence has long been considered central to virtual environments, for evaluation of their effectiveness as well as their quality (Skarbez et al. 2017). More precisely, Slater (2009) developed the idea of place illusion (PI), referring to the aspects of presence “constrained by the sensorimotor contingencies afforded by the virtual reality system.” Sensorimotor contingencies are those actions which are used in the process of perceiving the virtual world, such as moving the head and eyes to change gaze direction or seeing around occluding objects to gain an understanding of the space (O’Regan and Noë 2001). Slater (2009) therefore concluded that establishing presence or “being there” for lower-order immersive systems such as desktops is not feasible. In contrast, the sensorimotor contingencies of walking and looking around facilitated by head-mounted displays contribute to their higher-order immersion and to establishing presence.

Recent research in cognitive psychology (Repetto et al. 2016) suggests that the mind is inherently embodied. The way we create and recall mental constructs is influenced by the way we perceive and move (Barsalou 2008; Shapiro 2010). The memory system that encodes, stores, recognizes, embodies, and recalls spatial information about the environment is called spatial memory (Madl et al. 2015). Several studies have found that embodied navigation and memory are closely connected  (Leutgeb et al. 2005; Buzsáki and Moser 2013). Madl et al. (2015) state that there are several different types of brain mechanisms involved in processing spatial representations in the brain. Grid cells in the entorhinal cortex, used for path integration, are activated by changes in movement direction and speed (Moser et al. 2008; Burgess 2008). Head-direction cells activate in the medial parietal cortex when the head points in a given direction, providing information on viewing direction (Baumann and Mattingley 2010). Border cells and boundary vector cells in the subiculum and entorhinal cortex activate in close proximity to environment boundaries, depending on head direction (Burgess 2008; Lever et al. 2009). Lastly, place cells in the hippocampus activate in specific spatial locations, independent of orientation, providing an internal representation of the environment (Ekstrom et al. 2003; Hartley et al. 2014). It is believed that place cell fields arise from groups of grid and boundary cells which activate for different spatial scales and environmental geometry to provide a sense of location (Barry et al. 2006; Kim et al. 2011). In addition, these hippocampal cells also provide information about place–object associations, associating place cell representations of specific locations with the representations of specific objects in recognition memory (Brown and Aggleton 2001; Hok et al. 2005). 
This leads us to the possibility that a spatial virtual memory palace, experienced in an immersive virtual environment, could enhance learning and recall by leveraging the integration of vestibular and proprioceptive inputs (overall sense of body position, movement, and acceleration) (Hartley et al. 2014).

2.1 Memory palaces on a desktop monitor

Legge et al. (2012) compared the traditional method of loci, which uses a mental environment, against a 3D graphics desktop environment. In this study, the subjects were divided into three groups. The first group was instructed to use a mental location or scene, the second group was instructed to use a 3D graphics scene, and the third (control) group was not told about any mnemonic device. The subjects in the three groups were given 10–11 uncorrelated words and asked to memorize the words with their mnemonic device, if any. The users then recalled the words serially. This study found that the users who used a graphics desktop environment as the basis for their method of loci performed better than those using a mental scene of their choice, and those who were not instructed on a memory strategy did not perform as well as those who were instructed to use the memory strategy. Fassbender and Heiden (2006) compared users’ ability to recall a list of 10 words when using a desktop memory palace with their ability when simply memorizing the word list. The authors created a navigable 3D castle with 4 sections and 10 objects, where each object has a visual and audible component, with the idea that a user will associate a word with that object. First, each user was given 10 words to memorize and was then asked to recall as many as they could after a 2-min distraction task. Next, the 3D castle was explained and shown to each user on a desktop. After being given time to learn the associations between the words, images, and audio, the users were evaluated on their ability to recall the words in the 3D castle on the desktop. The study found no significant difference between the two conditions in the users’ ability to recall the words immediately after a 2-min break, but after one week there was a 25% difference in recall in favor of the 3D graphics desktop memory palace environment condition.
The above studies show that compared to a purely mental mnemonic, a graphics desktop setup is better in assisting retention and recall.

Both of these studies have been carried out on desktops and not in immersive HMDs. In our study, we compare the performance of users on a desktop compared with an immersive HMD.

2.2 Memory palaces on multiple displays

The efficacy of varying immersion levels by changing the field of view has also been studied in the context of procedural training (Bowman and McMahan 2007). Sowndararajan et al. (2008) compared subject performance for a simple and complex procedural task (involving a different number of steps and interactions), but with two different fields of view—one with a laptop and the other with a large rear-projected L-shaped display. The study had participants trained on two procedures, and the performance with the two levels of immersion was compared. The study found that higher levels of immersion (in this case, field of view) were more effective in learning complex procedures that reference spatial locations. In addition, there was no statistical difference in performance for the simple task for the different levels of immersion. Ragan et al. (2010) carried out a user study in which participants were asked to memorize and recall the sequence of placement of virtual objects on a grid shown on three rear-projected screens (one front and two side screens). The participants were divided into multiple groups that performed the task with different fields of view and fields of regard. The field of view is the size of the visual field seen in one instant, while the field of regard is the total size of the visual field that can be seen by a user (Bowman and McMahan 2007). Both are measured in degrees of visual angle. Ragan et al. found that higher field of view and field of regard produced a statistically significant performance improvement.

The above studies examined the effectiveness of memory recall of objects, their locations, and the sequence of placement actions, in a limited field of view and field of regard in monoscopic display environments with multiple monitors. The field of regard in these studies did not surround the viewer completely. In our study, we wanted to examine the effectiveness of stereoscopic, spherical field of regard afforded by modern HMDs compared to a desktop for memory recall of objects and their spatial locations.

2.3 Search and recall in head-mounted displays

Pausch et al. (1997) studied whether immersion in a virtual environment using a HMD aids in searching and detection of information. For their study, they created a virtual room with letters distributed on walls, ceiling, and floor. A user was placed in the center of this room and was asked whether a set of letters was present or not. The test was conducted using a HMD and a traditional display with a mouse and keyboard. They found that when the search target was present, the HMD and the traditional display had no statistically significant difference in performance. However, when the target was absent, the users were able to confirm its absence faster in the HMD than on the traditional display. In addition, the users that used the HMD first and then moved to a traditional desktop had better performance than those who used the desktop first and then the HMD. This suggests a positive transfer effect from the HMD to a desktop. Our user study is highly influenced by the study of Pausch et al. (1997), but in our study, users perform recall rather than search.

Ruddle et al. (1999) compared user navigation time and relative straight-line distance accuracy (amount of wasteful navigational movement) between a HMD and a traditional desktop. Users were asked to learn the layout of two virtual buildings: one using a HMD, and the other using a desktop. After familiarizing themselves with a building, each user was placed in its lobby and told to go to each of five named rooms and then return to the lobby. They found that the users wearing the HMD had faster navigation times and less wasteful movement and were able to estimate distances more accurately, compared to those using a desktop.

Mania et al. (2003) examined accuracy and confidence levels associated with recall and cognitive awareness in a room filled with objects such as pyramids, spheres, and cubes. Participants were exposed to one of the following scenarios: (a) a virtual room using a HMD, (b) a rendered room on a desktop, or (c) a real room experienced through glasses designed to restrict the field of view to \(30^{\circ }\) to match that of the HMD and desktop. All the four walls of the room were distinct. After 3 min of exposure, the participants were given a paper containing a representation of the room which included numbered positions of objects in the various locations. The participants were asked to recall which objects were present and where they were located in the room, and to give a confidence and awareness state with each answer. The study evaluated the participants immediately after the exposure and then again after one week. The study found that immediately after the exposure the participants had the most accurate recall in the real-world scene and were slightly less accurate and confident in the HMD and least accurate and confident on the desktop. After one week, the overall scores and confidence levels dropped consistently across the board, with the viewing condition having no effect on the relative reduction in performance. In this inspirational study, the participants only experienced one display. In our study, the participants were exposed to both the desktop and the HMD. This makes it possible to compare recall for the same user across the two display modalities. Further, to use the context provided by immersion, the participants in our study were asked to recall the information while viewing the same virtual scenes on the same display, rather than recording their answers on a representation of the scene on paper.

Harman et al. (2017) explored immersive virtual environments for memory recall by having participants take on the role of a passenger boarding an airplane in a virtual airport. After the experience, the participants were asked about the tasks they performed. The participants who experienced the virtual airport in a HMD had more accurate recall than those who used the desktop. In this study, each participant used either a HMD or a desktop. Also, the evaluation of the memory recall was done outside of the visual experience, through a questionnaire. In our study, participants not only experience the virtual environment in both the HMD and the desktop, but are also asked to recall in the same environment in which they experienced the information.

2.4 Embodied interaction and recall

Virtual walk-throughs have been one of the earliest applications of virtual worlds (Brooks Jr et al. 1992). Brooks (1999) studied whether active participants had superior recall of the layout of a 3D virtual house on a desktop compared to passive participants. Active participants controlled camera navigation via a joystick, while passive participants observed the navigation. They found that active participants had a superior environment layout recall compared to those who were passive. However, they also found that there was no statistically significant difference between the recall or recognition of objects (such as furniture or entrances and exits of a room) or their positions within the environment between the active and passive participants. This suggests that memory was only enhanced for those aspects of the environment that were interacted with directly—particularly the environment which was navigated.

Richardson et al. (1999) had users learn the layout of a complex building through either 2D maps, physically walking through the real building, or through a 3D virtual representation of that building built using the Doom II engine and shown on a desktop. The study found that when the building was a single floor, the real-world and virtual-environment-trained users had comparable results. However, when the building had two floors, relative view orientation during learning and testing mattered. If the participants were in the same orientation that they had used during learning, they were able to navigate the environment just as well as those who were physically in the environment. However, participants were susceptible to disorientation if their starting-out views were different between their training and testing. The authors concluded that training in the virtual and real-world environments likely used similar cognitive mechanisms.

Wraga et al. (2004) compared the effectiveness of vestibular and proprioceptive rotations in assisting recall by having participants recall on which of the four walls an object was located relative to their orientation before and after rotation. Participants were placed in a virtual room with four distinctly colored alcoves on four walls and given time to learn and recognize the alcoves. Participants would then rotate, either using the HMD accelerometer or a joystick, to find a certain object on one of the alcoves as described by the tester. Once the user was looking at that object on one of the alcoves, their view would be frozen and the tester would ask the participant to state where a particular (different) alcove was relative to their orientation. They found that users in a HMD were better able to keep track of the objects by rotating their heads as compared to using a joystick. In another experiment, the authors also found that users in a HMD who controlled their bearing in a virtual world by actively rotating in a swivel chair were better able to keep track of an object than those who were being rotated by a tester. In our study, we expect vestibular and proprioceptive inputs to improve performance in the HMD. We study how well people can recall information regardless of their orientation. In addition, our objects are distributed in more than four unique locations.

Perrault et al. (2015) leveraged the method of loci technique by allowing participants to link gestural commands, which would control some system, to physical objects within a real room. They compared their interaction technique to a mid-air swipe menu which relies on directional swiping gestures. Their idea was to leverage spatial, object, and semantic memory to help users learn and recall a large number of gestures and commands. In a home environment, participants were shown a command (or stimulus) on a television and then performed a motion that a Microsoft Kinect would track and record as representing that command. For the mid-air swipe, the participant would perform a 2-segment marking menu gesture. For the physical loci, the participant would simply point at an object in the environment that they wanted associated with the command, such as a chair or poster. Once the gestures and physical loci were trained, the participants went into the recall phase. In this phase, a command would be presented on the screen and the user had to quickly and accurately perform the corresponding gesture. The system would then show whether the participant performed the correct gesture or pointed at the correct loci object that they originally assigned for that command. The authors found that users, when using their physical loci technique, had superior command recall and were more robust compared to the more traditional mid-air swipe menu.

3 Method

A memory palace is a spatial mnemonic technique where information is associated with different aspects of the imagined environment, such as people, objects, or rooms, to assist in their recall (Yates 1992; Harman 2001). The goal of our user study was to examine whether a virtual memory palace, experienced immersively in a head-tracked stereoscopic HMD, can assist in recall better than a mouse-based interaction on a traditional, non-immersive, monoscopic desktop display. Previous work has examined the role of spatial organization, immersion, and interaction in assisting recall.

This study is different from the previous work in several ways. First, we are focusing on spatial memory using a 3D model of a virtual memory palace, rather than relying on other forms of memory (such as temporal/episodic). Second, both the training and testing (recall) phases take place within the same virtual memory palace. Third, participants used both the desktop and HMD displays, which allows us to compare each participant’s recall across displays. Lastly, the content used in previous studies was either abstract, verbal, textual, visually simplistic, low in diversity, or time based, whereas our study uses faces, with unique and diverse characteristics.

3.1 Participants

Our user study for this research was carried out under IRB ID 751321-1 approved on August 7, 2015, by the University of Maryland College Park IRB. In this study, we recruited 40 participants, 30 male and 10 female, from our campus and surrounding community. Each participant had normal or corrected-to-normal vision (self-reported). The study session for each participant lasted around 45 min.

3.2 Materials

For this study, we used a traditional desktop with a 30-inch (76.2 cm) diagonal monitor and an Oculus DK2 HMD. The rendering for the desktop was configured to match that of the Oculus with a resolution of \(1920 \times 1080\) pixels (across the two eyes) with a rendering field of view (FOV) of \(100^{\circ }\). In order to give the desktop display the same field of view as the HMD, the participants were positioned with their heads 10 inches (25.4 cm) away from the monitor. The software used to render the 3D environments on both the desktop and HMD was identical and was designed in-house using C++ and OpenGL-accelerated rendering. The rendering was designed to replicate a realistic looking environment as closely as possible, incorporating realistic lighting, shadows, and textures. The models (the medieval town and palace) were purchased through the 3D modeling distribution Web site TurboSquid (3DMarko 2011, 2014).
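The relationship between monitor size, viewing distance, and field of view can be checked with a short calculation. The sketch below is illustrative only: the monitor's aspect ratio is not stated in the text, so the 16:10 value is an assumption, and the function names are hypothetical.

```python
import math


def screen_width_from_diagonal(diagonal: float, aspect_w: float, aspect_h: float) -> float:
    """Width of a flat screen given its diagonal and aspect ratio (same unit as diagonal)."""
    return diagonal * aspect_w / math.hypot(aspect_w, aspect_h)


def horizontal_fov_deg(screen_width: float, viewing_distance: float) -> float:
    """Horizontal visual angle, in degrees, of a flat screen viewed head-on from its center."""
    return math.degrees(2.0 * math.atan((screen_width / 2.0) / viewing_distance))


# Values from the text: 30-inch diagonal, head 10 inches from the monitor.
# Assumption (not from the text): a 16:10 aspect ratio.
width_in = screen_width_from_diagonal(30.0, 16.0, 10.0)
fov_deg = horizontal_fov_deg(width_in, 10.0)
print(f"screen width = {width_in:.1f} in, horizontal FOV = {fov_deg:.1f} degrees")
```

Under this assumed aspect ratio, the result lands close to the \(100^{\circ }\) rendering field of view; a 16:9 panel would give a slightly larger angle.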

3.3 Design

The participants were shown two scenes, on two display conditions (head-tracked HMD and a mouse-based interaction desktop), and two sets of faces (within-subject design), all treated as independent variables, with the measured accuracy of recall as the dependent variable. The two scenes (virtual memory palaces) consist of pre-constructed palace and medieval town environments filled with faces. We decided to use faces given the previous work (Harris 1980; McCabe 2015) showing the effectiveness of memory palaces in helping users recall face-name pairs. We used faces as the objects to be memorized and carefully partitioned them into two sets of roughly equal familiarity. We quantified the familiarity of the faces using Google Trends data over the four months preceding the study. The faces are shown in “Appendix” (at the very end of the paper) in Figs. 11 and 12, and the Google Trends statistics are presented in Tables 1 and 2. There was no statistically significant difference between the two sets of Google Trends data: \(p = 0.45 > 0.05\).

The faces in the palace and medieval town were hand positioned for each environment, before the start of the study, and remained consistent throughout the study. We distributed the faces at varying distances from the users’ location (see Fig. 2) so that they surrounded and faced the user. Since we used perspective projection, the sizes of the faces varied. However, the distribution of the angular resolution of the faces across the two sets/environments was not statistically different, with \(p = 0.44 > 0.05\) (see Table 3 in “Appendix”).

Users were allowed to freely rotate their view but not translate. This effectively simulated a stereoscopic spherical panoramic image with the participant at its center. Our motivation behind this study design decision was that if even this limited level of immersion could show an improvement in recall, it could lead to a better-informed exploration of how greater levels of immersion relate to varying levels of recall.
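A rotation-only viewpoint of this kind can be sketched as a camera whose position is fixed and whose orientation is the only mutable state. The study software was written in C++ and OpenGL; the Python below is a minimal illustration, not the authors' implementation. The class name, the yaw/pitch parameterization, and the convention that the camera looks down the negative z-axis at zero rotation are all assumptions.

```python
import math


def view_direction(yaw_deg: float, pitch_deg: float) -> tuple:
    """Unit forward vector for a given yaw and pitch.

    Assumed convention: yaw 0 and pitch 0 looks along -z, positive yaw turns
    toward +x, positive pitch looks up toward +y.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.sin(yaw) * math.cos(pitch),
        math.sin(pitch),
        -math.cos(yaw) * math.cos(pitch),
    )


class RotationOnlyCamera:
    """Hypothetical camera that can look around but never translates."""

    def __init__(self) -> None:
        self.position = (0.0, 0.0, 0.0)  # fixed: no method modifies it
        self.yaw_deg = 0.0
        self.pitch_deg = 0.0

    def rotate(self, d_yaw_deg: float, d_pitch_deg: float) -> None:
        # Yaw wraps around; pitch is clamped so the view cannot flip over.
        self.yaw_deg = (self.yaw_deg + d_yaw_deg) % 360.0
        self.pitch_deg = max(-90.0, min(90.0, self.pitch_deg + d_pitch_deg))

    def forward(self) -> tuple:
        return view_direction(self.yaw_deg, self.pitch_deg)


camera = RotationOnlyCamera()
camera.rotate(90.0, 0.0)  # mouse drag or head turn to the right
print(camera.position, camera.forward())
```

In such a sketch, the desktop and HMD conditions would differ only in what drives `rotate`: mouse deltas in one case, head-tracking data in the other.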

3.4 Procedure

First, each participant familiarized themselves with all 42 faces and their names used in the study. The participants received a randomly permuted collection of printouts, each containing a face-name pair used in the study. Participants were given as much time as they needed, until they stated that they were comfortable with the faces. In general, participants did not spend more than 5 min on this familiarization.

Next, each participant was told about the training and testing procedure, including how many faces were going to be in each scene (21), how much time they had to view the faces (5 min), how the breaks would work, that the faces would be replaced with numbers in the recall phase, and that they were to give a name and confidence for their recalled faces for each numbered position. In almost every case, we recorded the answer as the name explicitly recalled by the participant. However, in rare, exceptional circumstances, when the participants gave an extremely detailed and unambiguous description of the face (“fat, wore a wig, was King of France, and is not Napoleon” for King Louis), we marked it correct. Next, each participant was placed either in front of a desktop monitor with a mouse or inside a head-tracked stereoscopic HMD. They were given as much time as they desired to get comfortable, looking around the scene without numbers or faces. The users rotated the scene on a desktop monitor with a mouse, and in the HMD setup they rotated their head and body, but no further navigation was possible.

Fig. 2

Locations of faces and numbers in the virtual memory palaces used in our user study: (a) an ornate palace and (b) a medieval town. Note that this is not the view the participants had during the experiment, and these pictures are used to convey the distribution of the face locations. The participants would have been placed in the middle of these scenes surrounded by the faces as shown in Fig. 3

Once each participant was comfortable with the setup and the controls, a set of 21 faces was added to the 3D scene and distributed around the entire space as shown in Fig. 2. We used two such scenes, a palace and a medieval town, shown in Fig. 3. The faces were divided into two consistent sets used for the whole study; if a face appeared in one set (or scene) for a given participant, it would not be shown again in the second set or scene.

To cover all possible treatments of the \(2 \times 2 \times 2\) Latin-square design, each participant was tested in both scenes, both display conditions (HMD and desktop), and both sets of faces, with their relative ordering counterbalanced across participants. The 21 faces within the scene were presented to the participants all at once, and the participants were able to view and memorize the faces in any order of their choosing. The faces were deterministically placed in the same order for all participants. However, since the participants were free to look in any direction, the order of presentation of faces was self-determined. Each participant was given 5 min to memorize the faces and their locations within the scene. After the 5-min period, the display went blank and each participant was given a 2-min break in which they were asked a series of questions. Questions we asked included how each participant learned about the study, what their profession/major was, and what were their general hobbies or interests. In the second half of the study, during the break for the alternative display, we asked how often a participant used a computer, what their previous experience was with VR, and their general impressions of VR. We consistently asked these questions of each participant, but did not record the responses.

The reasons for these study design decisions are rooted in foundational research in psychology on memory. From the seminal work by Miller (1956), we learn that the working memory (Baddeley and Hitch 1974) can only retain \(7 \pm 2\) items. According to Atkinson and Shiffrin (1968), the information in the short-term memory decays and is lost within a period of 15–30 s. We feel confident that having participants recall 21 faces after a 2-min break will engage their long-term memory.

Fig. 3

The two virtual memory palace scenes used in our user study: (a) an ornate palace and (b) a medieval town, as seen from the view of the participants

Fig. 4

Virtual memory palace: recall phase

After the 2-min break, the scene would reappear on the display with numbers having replaced the faces, as shown in Fig. 4. Each participant was then asked to recall, in any order, which face had been at each numbered location. During this recall phase, each participant could look around and explore the scene just as they did in the training phase, using the mouse on the desktop or rotating their head-tracked HMD. Each participant had up to 5 min to recall the names of all the faces in the scene. Once the participant was confident in all their answers, or the 5-min period had passed, the testing phase ended. After a break, each participant was placed in the other display that they had not previously tested with. The process was then repeated with a different scene and a different set of 21 faces to avoid information overlap from the previous test.

For each numbered location in the scene, the participants verbally recalled the name of the face at that location, as well as a confidence rating for their answer, ranging from 1 to 10, with 10 being certain. If a participant had no answer for a location, it was given a score of 0. The results were hand-recorded by the study administrator, keeping track of the number, name, user confidence, and any changes in a previously given answer.
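The hand-recorded results described above can be represented and scored with a small data structure. The following is a sketch under stated assumptions rather than the authors' tooling: the `Answer` record, the `score_trial` function, and the example names are hypothetical, and the score of 0 for an unanswered location is read here as a confidence of 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Answer:
    location: int          # numbered position in the scene
    name: Optional[str]    # recalled name, or None if the participant skipped
    confidence: int        # 1-10 as stated; 0 for a skip (assumed reading)


def score_trial(answers: List[Answer], key: Dict[int, str]) -> Dict[str, float]:
    """Count correct answers, errors, and skips for one scene."""
    by_location = {a.location: a for a in answers}
    correct = errors = skips = 0
    confidence_total = 0
    for location, true_name in key.items():
        answer = by_location.get(location)
        if answer is None or answer.name is None:
            skips += 1  # no answer recorded: contributes confidence 0
            continue
        confidence_total += answer.confidence
        if answer.name == true_name:
            correct += 1
        else:
            errors += 1
    n = len(key)
    return {
        "correct": correct,
        "errors": errors,
        "skips": skips,
        "accuracy_pct": 100.0 * correct / n,
        "mean_confidence": confidence_total / n,
    }


# Tiny made-up example with 4 locations instead of 21.
key = {1: "Name A", 2: "Name B", 3: "Name C", 4: "Name D"}
answers = [Answer(1, "Name A", 9), Answer(2, "Name X", 4), Answer(3, None, 0)]
print(score_trial(answers, key))
```

With 21 locations per scene, each correct answer is worth about 4.76 percentage points of accuracy, so 19 correct answers correspond to 90.48%.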

To mitigate any learning behavior from the first trial to the second, we employed a within-subject trial structure, using a 2 (HMD-condition to desktop-condition vs desktop-condition to HMD-condition) × 2 (Scene 1 vs Scene 2) × 2 (Face Set 1 vs Face Set 2) Latin-square design. By alternating between the displays shown first (2), the scenes (2), and the faces (2), we expect to mitigate any confounding effects. At the end, each participant was tested on the two display conditions, desktop and HMD, on two different scenes, and with two different sets of 21 faces. We note that participants could have used personal mnemonics to help remember the locations and ordering of faces. However, since we evaluated recall for each participant over a desktop and a HMD, their performance should be counterbalanced between the two display conditions.
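The eight orderings implied by this design can be enumerated directly. The sketch below is an illustration only: the text does not say how participants were allocated to orderings, so the round-robin assignment and the `schedule` function are assumptions.

```python
from collections import Counter
from itertools import product

# Each factor has two possible orderings across the two trials.
DISPLAY_ORDERS = [("HMD", "desktop"), ("desktop", "HMD")]
SCENE_ORDERS = [("palace", "town"), ("town", "palace")]
FACE_SET_ORDERS = [("face set 1", "face set 2"), ("face set 2", "face set 1")]

# 2 x 2 x 2 = 8 distinct treatment orderings.
CONDITIONS = list(product(DISPLAY_ORDERS, SCENE_ORDERS, FACE_SET_ORDERS))


def schedule(participant_index: int) -> list:
    """Two-trial schedule for one participant (hypothetical round-robin assignment)."""
    displays, scenes, face_sets = CONDITIONS[participant_index % len(CONDITIONS)]
    return [
        {"trial": t + 1, "display": displays[t], "scene": scenes[t], "faces": face_sets[t]}
        for t in range(2)
    ]


# With 40 participants, round-robin assignment fills each ordering equally often.
counts = Counter(participant % len(CONDITIONS) for participant in range(40))
print(len(CONDITIONS), sorted(counts.values()))
print(schedule(0))
```

Because 40 is a multiple of 8, an assignment like this would place the same number of participants in every ordering, which is what lets order, scene, and face-set effects average out.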

4 Results

Our hypothesis is that a virtual memory palace experienced in an immersive head-tracked HMD (the HMD condition) will lead to a more accurate recall than on a mouse-controlled desktop display (the desktop condition). In addition, we hypothesized that participants should be more confident in their answers in the headset and make fewer mistakes or errors in recall. Our null hypothesis is that there is no statistical difference between the accuracy and confidence of results between the HMD and desktop conditions and that there is no statistical difference in the ordering of the display conditions.

We confirmed using a four-way mixed ANOVA that there were no statistically significant effects on recall due to the scenes (palace and town) \(F(1,79)=0.27, p > 0.05\), the two sets of 21 faces \(F(1,79)=0.27, p> 0.05\), or the ordering of display conditions (HMD followed by desktop vs desktop followed by HMD) \(F(1,79) = 1.93, p > 0.05\). We found that there was a statistically significant effect for the display condition (HMD vs desktop) with \(F(1,79)=4.6\) and \(p<0.05\). This means participants were able to recall better in the HMD condition as compared to the desktop condition, permitting us to reject the null hypothesis.

4.1 Task performance

The overall average recall performance of participants in the HMD condition was 8.8 percentage points higher than in the desktop condition, with a mean recall accuracy of 84.05% for the HMD condition and 75.24% for the desktop condition. Using a paired t test with Bonferroni–Holm correction, we calculated \(p = 0.0017 < 0.05\), which shows that this difference was statistically significant. In Fig. 5, we present the overall performance of the users in the HMD condition as compared to the desktop condition.
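For readers unfamiliar with the paired comparison used here, the following is a minimal sketch of how a paired t statistic is computed from per-participant scores. The function name `paired_t_statistic` and the five score pairs are hypothetical and are not data from the study; converting the statistic to a p value additionally requires the Student t distribution, which is omitted here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-participant
    differences between the two conditions.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical numbers of correctly recalled faces (out of 21) per participant.
hmd_scores = [18, 20, 17, 21, 19]
desktop_scores = [16, 19, 15, 18, 18]

t_value, dof = paired_t_statistic(hmd_scores, desktop_scores)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

Pairing each participant's two scores removes between-participant variability from the comparison, which is the main reason a within-subject design calls for a paired rather than an independent-samples test.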

Fig. 5

The overall average recall performance of participants in the HMD condition was 8.8 percentage points higher than in the desktop condition. The median recall accuracy percentage for HMD was 90.48% and for desktop display was 78.57%. The figure shows the first and third quartiles for each display modality

4.2 Errors and skips

The recall accuracy measures the number of correct answers. In addition, we kept track of when participants in our user study made an error in recall (i.e., gave an incorrect answer) or skipped answering (i.e., did not provide an answer). We show the percentile distribution of the average number of erroneous answers per participant for each display modality in Fig. 6. Participants in the HMD condition made on average fewer errors than those in the desktop condition. The total number of errors across the 40 participants was 33 out of 840 answers in the HMD condition and 56 out of 840 in the desktop condition. The difference in the number of incorrect answers was statistically significant, as shown by a paired t test with Bonferroni–Holm correction resulting in \(p=0.0195 < 0.05\).
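The error totals above can be converted into rates with simple arithmetic. The sketch below uses only the figures reported in the text (40 participants, 21 faces per condition, 33 and 56 errors); the function name `error_summary` is hypothetical.

```python
PARTICIPANTS = 40
FACES_PER_CONDITION = 21
# 40 participants x 21 faces = 840 answers per display condition.
TOTAL_ANSWERS = PARTICIPANTS * FACES_PER_CONDITION

# Error totals as reported in the text.
ERRORS = {"HMD": 33, "desktop": 56}


def error_summary(errors, total_answers, participants):
    """Return the error rate (%) and mean errors per participant by condition."""
    return {
        condition: {
            "rate_percent": 100.0 * count / total_answers,
            "per_participant": count / participants,
        }
        for condition, count in errors.items()
    }


summary = error_summary(ERRORS, TOTAL_ANSWERS, PARTICIPANTS)
for condition, stats in summary.items():
    print(
        f"{condition}: {stats['rate_percent']:.2f}% of answers wrong, "
        f"{stats['per_participant']:.3f} errors per participant"
    )
```

This corresponds to roughly 3.9% of answers being incorrect in the HMD condition versus roughly 6.7% in the desktop condition.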

Fig. 6

The distribution of incorrect answers for each display modality showing the median, first, and third quartiles

Figure 7 shows that the number of faces for which participants skipped an answer was higher in the desktop condition than in the HMD condition. This difference was statistically significant using a paired t test with Bonferroni–Holm correction, with \(p=0.0062 < 0.05\), which reinforces that participants in the HMD condition had better recall than those in the desktop condition.
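The Bonferroni–Holm correction applied to the paired tests in this section can be sketched as follows. This is an illustration of the general step-down procedure under stated assumptions, not the authors' analysis script: the function name `holm_adjust` and the three raw p values are hypothetical, and the paper does not state which tests were grouped into one family.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values in the original order.

    The k-th smallest raw p value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be no smaller than the previous adjusted value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p values for three comparisons
# (for example accuracy, errors, and skips).
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    verdict = "significant" if adj < 0.05 else "not significant"
    print(f"raw p = {raw:.3f}, Holm-adjusted p = {adj:.3f} ({verdict})")
```

Compared with the plain Bonferroni correction, which multiplies every p value by the number of tests, the Holm procedure is never less powerful while still controlling the family-wise error rate.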

Fig. 7

The distribution of faces skipped during recall for each display modality showing the median, first, and third quartiles

4.3 Confidence

Previous work by Mania and Chalmers (2001) and Mania et al. (2003) examined user confidence together with recall accuracy. Following this approach allows us to study not only the objective recall accuracy but also the subjective certainty of the user answers. For each answer, we asked each participant to indicate how certain they were of its correctness on a scale of 1–10, with 10 being certain. The confidence scores aggregated across all 40 participants and all 42 faces that each studied are shown in Fig. 8.

Fig. 8

The overall confidence scores of participants in the HMD condition and the desktop condition. Each participant gave a confidence score between 1 and 10 for each face they recalled. Those in the HMD condition are slightly more confident about their answers than those in the desktop condition

From Fig. 8, we can see that users were slightly more confident in the HMD condition than in the desktop condition. The average confidence values for the HMD and desktop conditions were 9.4 and 9.1, respectively, ignoring skips. For the highest confidence level, a confidence score equal to 10, there was a statistically significant difference between the number of correct answers given in the HMD and the desktop conditions, with \(p = 0.009 < 0.05\) using a Chi-square test, and with \(p=0.022 < 0.05\) when Yates' continuity correction was included. However, confidence is not always an indication of correctness. We wanted to see whether the HMD condition was giving a false sense of confidence. Figure 9 shows the number of errors given in each display based on the confidence of participant answers.
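The Chi-square comparison reported above, with and without Yates' continuity correction, can be sketched for a 2 × 2 table as follows. The counts in `table` are hypothetical and do not come from the study, the function name `chi_square_2x2` is an assumption, and the layout of the table (display condition by correct versus not correct) is only one plausible reading of the test described in the text.

```python
import math


def chi_square_2x2(table, yates=False):
    """Return (statistic, p_value) for a 2 x 2 contingency table.

    With yates=True, 0.5 is subtracted from each |observed - expected|
    (clipped at zero) before squaring. The p value uses the closed form
    for one degree of freedom: p = erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if min(row_totals + col_totals) == 0:
        raise ValueError("every row and column needs a non-zero total")
    statistic = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            statistic += diff * diff / expected
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows are display conditions, columns are
# (correct, not correct) answers at the highest confidence level.
table = [[30, 10], [20, 20]]

stat_plain, p_plain = chi_square_2x2(table)
stat_yates, p_yates = chi_square_2x2(table, yates=True)
print(f"uncorrected: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.3f}, p = {p_yates:.4f}")
```

The continuity correction always shrinks the statistic and therefore yields a larger, more conservative p value, which matches the direction of the two p values reported in the text.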

Fig. 9

The number of errors made for each display condition for various confidence levels

The results in Fig. 9 show that users were less error-prone in the HMD condition and that their confidence was better grounded in recall accuracy than in the desktop condition. In general, for a given confidence level, participants were more often correct in the HMD condition than in the desktop condition.

4.4 Ordering effect

In our study, we alternated the order in which participants were exposed to the displays. Figure 10 shows the accuracy when using the desktop first followed by the HMD versus using the HMD first and then the desktop.

Fig. 10

The performance of participants going from a desktop to an HMD and from an HMD to a desktop, showing the median, first, and third quartiles

Users started with roughly the same performance (accuracy) on the desktop and the HMD (desktop-1 and HMD-1 in Fig. 10), but their performance changed when they moved to the other display. When users went from a desktop to an HMD, their performance generally improved. However, when users went from an HMD to a desktop, their performance surprisingly decreased. When comparing the participants' first trials, desktop-1 and HMD-1, the distributions of recall scores were not significantly different, with \(p = 0.62 > 0.05\), but the distributions for the second trials, HMD-2 and desktop-2, were significantly different, with \(p = 0.025 < 0.05\).

5 Discussion

We next report some observations based on a questionnaire the participants filled out after the study. All our participants were expert desktop users, but almost none had experienced an HMD before. We believe that if there were any implicit advantage, it would lie with the desktop, given the participants' overall familiarity with it. Although we gave the participants enough time to get comfortable in the HMD before we began the study, we observed that many were not fully accustomed to the HMD, even though they performed better in it. We asked each participant which display they preferred for the given task of recall. We explicitly stated that their decision should not be based on the novelty or “coolness” of the display or the experience. All but two of the 40 participants stated they preferred the HMD for this task. They further stated that they felt more immersed in the scene and so were more focused on the task. In addition, a majority of the users (70%) reported that the HMD afforded them a superior sense of spatial awareness, which they claimed was important to their success. Approximately a third mentioned that they actively used the virtual memory palace setup by associating the information relative to their own body. This ability to associate information with the spatial context around the body adds to the benefit of increased immersion afforded by the HMD.

We note the interesting results we obtained with the display ordering. When participants started with the desktop and then used the HMD, we observed a significant improvement as compared to starting with the HMD and then using the desktop. A possible explanation is that those who used the HMD first benefited from the HMD’s superior immersion, which they lost when they transferred to the desktop. In contrast, users who started on the desktop may have invested greater effort to memorize the information, so that when they transferred to the HMD, they not only kept their dedication but also gained from the improved immersion.

5.1 Study limitations

In general, it is a difficult design decision to balance the goals of experimental control and ecological validity. In our study, we placed the faces for a particular face set in the same locations for all participants. However, since the participants were free to look in any direction, the order of presentation of faces was self-determined. We could have restricted the participants to look at the faces in a predetermined order. However, we allowed the participants to look around freely, so that the results would achieve greater ecological validity. Randomization of faces could have led to unintended consequences; having the Dalai Lama’s face next to Abraham Lincoln’s in one instantiation could alter its memorability, as could the opportune positioning of the Dalai Lama on a roof-top background. To avoid such inter-object semantic saliency confounds, we decided to preserve the same ordering of faces for all participants that viewed the scene with a given set of faces. We recognize that not randomizing the stimuli in a within-subject design could introduce a bias. To make sure that this did not result in any significant effects, we carried out a four-way mixed ANOVA (reported at the beginning of Sect. 4) and we did not find any statistically significant effects on recall due to the scenes, face sets, or the ordering of the display conditions. Previous research, such as Loomis et al. (1999), points out the trade-offs between experimental control and ecological validity for virtual environments. Parsons (2015) persuasively argues for designing virtual environment studies that strike a balance between naturalistic observation and the need for exacting control over variables.

The modality of interactive exploration of the virtual environment differed between the two conditions (head tracking versus mouse tracking). Thus, differences in recall performance may be partly explained by this difference in interaction modality. Our study did not attempt to distinguish the role of proprioceptive and vestibular information from that of visual stimuli, but examined them in the respective contexts of the immersive HMD and desktop display conditions. It will be interesting to examine the relative advantage of different interaction modalities with the same display modality in future user studies.

5.2 Conclusions

We found that the use of virtual memory palaces in the HMD condition improves recall accuracy when compared to a traditional desktop condition. We had 40 participants memorize and recall faces on two display–interaction modalities for two virtual memory palaces, with two different sets of faces. The HMD condition showed an improvement of 8.8 percentage points in recall accuracy over the desktop condition, and this improvement was statistically significant. This suggests an exciting opportunity for the role of immersive virtual environments in assisting in recall. Given the results of our user study, we believe that virtual memory palaces offer us a fascinating insight into how we may be able to organize and structure large information spaces and navigate them in ways that assist in superior recall.

One of the strengths of virtual reality is the experience of presence through immersion that it provides (Sanchez-Vives and Slater 2005; Skarbez et al. 2017). If memory recall could be enhanced through immersively experiencing the environment in which the information was learned, it would suggest that virtual environments could serve as a valuable tool for various facets of retrospective cognizance, including retention and recall.

5.3 Future work

Our study provides a tantalizing glimpse into what may lie ahead in virtual-environment-based tools to enhance human memory. The next steps will be to identify and characterize what elements of virtual memory palaces are most effective in eliciting a superior information recall. At present, we have only studied the effect of in-place stereoscopic immersion, in which the participants were allowed to freely rotate their viewpoint but not translate. It will be valuable to study how the addition of translation impacts information recall in a virtual memory palace.

Other directions for future studies could include elements in the architecture of the virtual memory palaces such as their design, the visual saliency of the model's structure (Kim et al. 2010), their type, and the various kinds of layouts and distributions of content that could help with recall. Another interesting direction would be to allow people to build their own virtual memory palaces, manipulate and organize the content on their own, and then ask them to recall that information. If their active participation in the organization of the data in virtual memory palaces makes a meaningful difference, that finding could be useful in designing interaction-based virtual environments that could one day support far superior information management and recall tools than those currently available to us. Yet another direction of research could be to compare elements of virtual memory palaces that are highly personal with those that could be used by larger groups. Much as textbooks and videos are used today for knowledge dissemination, it could be possible for virtual memory palaces to be used one day for effective transfer of mnemonic devices among humans in virtual environments.