Introduction

In the last few years, Virtual Reality (VR) has undergone remarkable technological advancements and, as it becomes easier to handle, is accessible to a growing audience. It has thus also found its way into experimental psychology as a valuable tool to study basic cognitive and emotional processes under realistic conditions (for an overview, see Cipresso et al., 2018). Foremost, the application of VR might improve the ecological validity of psychological science, as it enables researchers to study the aforementioned processes under the multimodal and complex conditions typical of real-life situations while maintaining strict experimental control (Parsons, 2015; Smith, 2019).

Emotional and cognitive processes have developed over millions of years (Darwin & Bynum, 2009) in a complex environment and are specifically adapted to it. Nevertheless, over the last decades experimental psychology has predominantly reduced the investigation of these processes to a rudimentary laboratory environment, albeit for well-founded reasons: the maxim of experimental control makes it possible in the first place to isolate distinct psychological processes and to identify their neural correlates and substrates. Yet science also claims to unravel the processes underlying complex naturalistic events. To capture prominent aspects of real-life psychological functioning, videos are often used alongside picture stimuli. They capture defining aspects of real-life experiences, including a multimodal stream of information that dynamically evolves and varies over time (Samide et al., 2019).

Although psychological science has gone to great lengths to recreate authentic experiences within the laboratory, the induction of emotion in particular is at odds with how emotions are induced in real life. A further concern originating from the conventional mode of presentation is that it relies on participants recalling or reliving emotional memories in order to induce affect (Harmon-Jones, 2019), rather than the situation itself triggering emotions. Emotions and affect serve as information for cognitive heuristics applied in the making of judgments (Schwarz, 2000). However, an image presented on a computer screen does not convey current environmental information; instead, it serves as a reminder of a previously perceived emotion. For example, seeing a picture of a person standing on a cliff might trigger memories and/or feelings associated with fear of heights. Such retrieved emotions have a different informative value than those intrinsic to the situation: Both watching a horror movie and actually being in an abandoned house elicit fear. However, in contrast to the real-life situation, movie-induced fear does not inform cognition by promoting corresponding flight behavior (other than turning off the TV) and can even be entertaining. Although both types of fear are overlapping concepts, they exhibit unique features, some of which cannot be studied within conventional laboratory setups. The unmediated nature of an experience (a multimodal conscious representation) is what distinguishes it from merely seeing a stimulus (a two-dimensional visual impression) (Kisker et al., 2020).

Virtual reality has been shown to promote realistic behavior (Kisker et al., 2019a) informed by appropriate emotional reactions that are triggered by, and adapted to, the environment itself. As VR users are shielded by a head-mounted display (HMD) against all sensory input other than the virtual environment, an immersive experience is created (Felnhofer et al., 2015; Slater & Wilbur, 1997). This in turn facilitates a strong feeling of being physically present in the scene, commonly referred to as the sense of presence (Nilsson et al., 2016; Slater & Wilbur, 1997). Consequently, emotional reactions are adapted to the entire VR environment and not to a stimulus presented on a screen in an otherwise neutral laboratory environment.

Immersive experiences not only modulate emotional processes (Diemer et al., 2015; Gorini et al., 2010), they also impact cognitive processes, including memory (Schöne et al., 2019; Smith, 2019) and attention (Iriarte et al., 2012; Urech et al., 2015). A recent publication replicated Simons and Chabris's Invisible Gorilla paradigm in VR (Schöne et al., 2020). While under conventional conditions about 70% of the participants missed the gorilla, the effect was diminished to 30% in the VR condition using the very same video material. Although the effect was not completely abolished, sustained inattentional blindness plays a much smaller role under realistic conditions than classical laboratory settings would suggest. The attentional processes might be modulated by physical vicinity and/or self-relevance.

The latter seems to be one of the most prominent features of VR (Schöne et al., 2020), almost inevitably resulting from immersion and presence, and should generally be further considered and systematically manipulated (Samide et al., 2019). VR experiments easily allow for this kind of manipulation. For example, being the victim in a VR scene of domestic violence, as opposed to being a passive bystander, enhances the sensation of fear, helplessness, and vulnerability (Gonzalez-Liencres et al., 2020). Most importantly, a first-person perspective, that is, being the subject of a scene, leads to taking the scene personally and is associated with elevated behavioral and physiological reactions. Notably, the simulation of immediate physical danger evokes appropriate behavioral and cognitive coping strategies even if the situation is fantasy-based, for example, a zombie attack. Hence, the brain emits danger signals that trigger reactions appropriate to a real situation (Lin, 2017). Under conventional laboratory conditions, self-relevance can often only be achieved indirectly, for example by associating stimuli with an external monetary reward (Deci, 1971) or an aversive sound (Riesel et al., 2012).

Creating Virtual Environments

To the best of our knowledge, no coherent VR database (i.e., material of one type and source) is available, which means that researchers have to create VR environments individually. The creation of such material is time-consuming and requires a high degree of expertise. Furthermore, individually created VR settings might not be comparable in terms of realism, presence, and hence the emotions they elicit. It is therefore of utmost importance for researchers to select appropriate and, foremost, controlled stimuli for inducing a specific emotional state when investigating emotions and cognitive processes (Marchewka et al., 2014).

In principle, VR environments can be created in two different ways: Computer-generated environments are constructed with game engines like Unity or Unreal. Their major advantage is absolute creative freedom and versatility, as anything imaginable can be simulated in VR. Using additional hardware, those environments can potentially incorporate real-life interaction (e.g., hand tracking). However, creating a realistic and responsive environment requires a high level of technical skill, a hurdle that does not need to be overcome when using VR cameras. VR cameras record the whole surrounding environment (360°/panoramic), more advanced models even in 3D (3D/360°). 3D/360°-VR videos can be experienced, just like computer-generated environments, through head-mounted displays (HMDs). VR videos are therefore an easy-to-use and cost-efficient way to enter immersive virtual environments. Although interaction within VR videos is mostly limited to the tracking and translation of head movements, the translated head tilts and shifts resemble the real-life exploration of a real environment. In particular, 3D/360° videos create the impression that objects, people, or animals could be touched or might touch the observer in return. Those experiences come with a high sense of presence (Breves & Heber, 2020; Chirico et al., 2018; Rupp et al., 2019). The photorealism and naturalistic character of these videos give rise to realistic behavior (see also Higuera-Trujillo et al., 2017).

Library for Universal Virtual Reality Experiments – luVRe

Up to the present day, we have recorded 450 videos across 69 themes using an Insta 360° VR camera (Insta360, Shenzhen, China). Each theme comprises several videos, which in part vary greatly. For example, eleven videos of various animals are assigned to the theme zoo. The database contains 3D/360° videos with a length of 30 seconds (plus extended versions in some cases) at 4K resolution and 60 fps, as well as 3D/360° pictures at 8K resolution where feasible. To avoid motion sickness, most videos were recorded using a tripod. We assumed a standard body height of 175 cm and a corresponding lens height of 163 cm. Preliminary tests revealed no perceived height incongruencies when viewers were taller or shorter than our standard lens height, or even seated (see also Rothe et al., 2018).

Paralleling previous databases (Bradley et al., 2001; Dan-Glauser & Scherer, 2011; Li et al., 2017), luVRe comprises everyday life scenes as well as extraordinary encounters of varying arousal and valence. Among them are calming nature scenes (a jetty by a lake, beach, forest), neutral scenes (hotel rooms, a farm), tourist attractions/cities (Amsterdam, New York, London, Hamburg, Vienna), and interesting places (restaurants, museums, a decommissioned Soviet submarine). Most importantly, the database contains stimuli aimed at eliciting strong emotional and motivational reactions, enabling researchers to study the dynamic unfolding of complex affective reactions under realistic conditions. We filmed rather aversive scenes, such as a visit to the dentist, impressions from an emergency room, an alarm in an atomic shelter, a funeral parlor, a surgery, and police training for a hostage situation. To cover a broad spectrum of emotional reactions, we also included highly appetitive scenes like male strippers, show cooking, playing puppies, or getting a beer at a bar.

Current Studies and Hypotheses

The aim of this publication is to provide evidence that 3D-VR videos are a valuable tool for psychological science. To this end, we investigated the emotional/motivational and cognitive processes associated with immersive VR experiences as opposed to conventional laboratory setups. Specifically, we presented the very same stimulus material in both domains in order to identify the mechanisms uniquely associated with either of them. Study No. 1 investigates the electrophysiological correlates of approach and withdrawal motivation by means of frontal alpha asymmetries (FAAs). We hypothesized that the unmediated experience of VR would facilitate a categorical shift in motivational processing resembling real-life processes. Study No. 2 investigates the depth of mnemonic processing in relation to the immersiveness under which the memory trace is formed. Under still unknown conditions, VR experiences seem to propagate the formation of autobiographical memories (see 'Study 2'). Those memories are retrieved with greater accuracy, constituting the VR memory superiority effect. We hypothesized that higher retrieval rates for VR experiences, as opposed to a conventional laboratory setting, would result from real-life mnemonic processing of luVRe videos. Replicating this prominent effect in VR is thus a benchmark for the legitimacy of luVRe and the applied experimental design.

Considering the size of the database and the psychophysical exertion that longer VR sessions impose on participants, only a representative sample of luVRe was subjected to testing in both studies. Furthermore, to ensure that the participants experienced the virtual simulation as they would under real-life conditions, a dedicated experimental task was omitted.

Study 1: Frontal Alpha Asymmetries in Virtual Environments

Conventional experimental setups investigating emotional and motivational processes in response to pictorial stimuli oftentimes require participants to rate stimuli subsequent to their presentation (Bradley et al., 2001). Although this well-established approach might be suitable for 2D picture presentation, we hypothesized that it would not fully capture the immersive effect of VR. Two commonly used approaches to 2D scene rating seem applicable: Either stimulus and rating scene are presented simultaneously on one screen, or the rating occurs subsequent to the stimulus presentation. Technically speaking, both methods can be, and have partly been, used in VR paradigms (Li et al., 2017), but they impose some inherent constraints on the paradigm. A rating overlay superimposed on the scene could reliably measure affect exactly when it occurs, but would be at odds with the goal of VR research to create a sensory impression mimicking real-life experiences. Alternatively, a subsequent rating in a neutral VR space would be feasible. However, the participant would perceive a complete change of scenery between stimulus exposure and rating, whereas the standard setup takes place within the same temporal-spatial reference frame. This approach neglects both the nature of a 2D compared to a 3D experimental setup and the meaning conveyed by the stimuli within such a frame. For example, the image of an attacking animal does not pose a threat but is a token for a similar real-life experience (see introduction). Accordingly, motivational and emotional reactions to pictorial stimuli can depend heavily on an individual's experiences and might be weak, as pictorial stimuli alone sometimes do not suffice to elicit appropriate emotional tendencies (Harmon-Jones & Gable, 2018). Conversely, the immersive nature of VR leads to an entirely different experience for the participant.
Although it can be assumed that participants in VR are aware that they are experiencing a sophisticated simulation (Kisker et al., 2019a; Lin, 2017), a rating of a VR scene is indeed a rating of an experience, not of a token (Kisker et al., 2020).

Consequently, real-time measurements of affective processing in response to stimulus exposure might provide more meaningful insights into the effects of immersiveness on motivation and emotion. The experimental approach of study No. 1 leverages the affordances of immersive VR experiences. Foremost, the three-dimensionality of VR constitutes the feeling of presence in, and self-relevance of, the virtual environment and events. Moreover, given that VR promotes stronger emotional reactions than 2D setups (e.g., Gorini et al., 2010) and based on pilot studies (see footnote 1), we hypothesized that presence, physicality (3D; see Kisker et al., 2019a), and emotional immediacy facilitate strong motivational tendencies surfacing on an electrophysiological level.

Motivation and affect are intrinsically intertwined. Organisms tend to approach rewards and to perform beneficial operations that fulfill their needs or achieve positive goals. Conversely, they withdraw from any undesired outcome or punishment (for a recent review, see Harmon-Jones, 2019; Harmon-Jones & Gable, 2018). These motivational tendencies are reflected on an electrophysiological level by alpha-band oscillations (8–13 Hz) measured over frontal scalp areas, called frontal alpha asymmetries (FAAs). Specifically, the relative difference in alpha power between left-hemispheric and homologous right-hemispheric electrodes is believed to reflect either approach motivation (relative reduction over left areas) or withdrawal motivation (relative reduction over right areas). However, aside from this motivational model, other processes have been associated with two further models. The conflation of affect and motivational processes initially spawned a model associating a relative left-sided reduction of alpha power not with approach motivation but with positive affect, and a relative right-sided reduction not with withdrawal but with negative affect (Harmon-Jones & Gable, 2018). This valence model of frontal alpha asymmetries (Davidson & Fox, 1982), however, does not account for the fact that anger, an emotion we would consider to be of negative valence, also leads to a relative left-sided reduction of alpha power. Relating motivational directions to FAAs, however, does not seem to provide a final conclusion either, as evidence highlighting the role of cognitive control over affect is emerging. Even neutral stimuli can elicit FAAs as strong as those evoked by high-approach positive pictures (erotic pictures), implying that the alpha asymmetry dynamics might actually mirror top-down inhibitory executive processes regulating the generation of affect (Schöne et al., 2016).
Hewig (2018) further argues that FAAs might reflect intention, consisting of a cognitive component, that is, the mental representation of the intended effect, and an affective-motivational component, the feeling of being determined to act. Most recent research implies that engaging in effortful control of emotion also accounts for the generation of FAAs (Lacey et al., 2020). Taken together, FAAs seem to be a meaningful starting point for the real-time exploration of the dynamic interplay between affect, motivation, and executive control in a real-life environment and thus for conceptually validating immersive VR videos as a suitable tool. Due to the explorative nature of the experiment, no dedicated hypothesis was defined. Rather, the aim was to identify and explore meaningful differences between VR and 2D presentation on the item level. The novelty of the method and the resulting limited publication base complicate the prediction of whether and which specific differences might occur. The aforementioned evidence was obtained under laboratory conditions; hence, to which extent it might translate to immersive VR conditions is unclear. However, if the FAAs do not differ significantly between conditions, it can be concluded that the cognitive and emotional mechanisms deployed in both conditions are very much alike and that VR does not exhibit unique features beyond those of a conventional computer setup. In contrast, different FAAs for the two conditions would make a case for VR as a tool, as it could be concluded that the immersiveness of VR gives rise to more realistic cognitive and emotional functioning.

Methods Study 1

Participants

Forty-one students from Osnabrück University gave informed consent and participated in the study in exchange for 15€ or partial course credit. All participants were screened for psychological, neurological, and cardiovascular disorders. They had normal or corrected-to-normal sight; in the latter case, only people with contact lenses were admitted. One participant was excluded during anamnesis due to the intake of centrally acting medication. Two further participants were excluded from the analysis as they experienced the screen door effect (SDE), a visual artifact letting the viewer see distinct pixels or lines. The SDE limits immersion, as it reduces visual quality (Cho et al., 2017). Thus, a sample size of n = 19 per group remained (3D/360° group: Mage = 21.26, SDage = 2.54, 15 female, 17 right-handed; 2D group: Mage = 23.26, SDage = 2.60, 15 female, 14 right-handed). The study was conducted in accordance with the Declaration of Helsinki and was approved by the local Ethics Committee of Osnabrück University.

Stimulus Material

Fifteen exemplary videos from the Library for Universal Virtual Reality Experiments (luVRe) were selected on the basis of their affective value. The classification of stimuli was based on Lang and Bradley's (2010) description of appetitive and aversive stimuli: Stimuli related to nutrition, reproduction, joy, and caregiving were classified as positive. Stimuli that posed threats, like kidnapping and emergency room scenes, were classified as negative. Stimuli were classified as neutral if they did not contain any special events or (inter-)actions, such as the exterior view of plain buildings and empty rooms (see Fig. 1 for examples). Since we could not rely on preliminary affective ratings, the videos were assigned to the respective affective dimension when the project members agreed on the emotional reaction to be expected (see Figs. 1, 2 and Table 1).

Fig. 1

Exemplary stimuli. Note. Screenshots are taken from six of the 15 videos used as stimulus material, depicting the Emergency Room video (a), the Horses video (b), the Hotel Room video (c), the Bunker video (d), the Bar video (e), and the Planetarium video (f). The slightly distorted display of the screenshots results from being captured with a conventional video player instead of a 360° compatible program. During the experiment, the videos were displayed without distortion

Fig. 2

Timing of a trial. Note. Each trial started with a 20 s resting phase, followed by a one-second fixation. Each video was presented for 60 s. Each trial took 81 s. The slightly distorted display of the screenshots results from being captured with a conventional video player instead of a 360° compatible program. During the experiment, the videos were displayed without distortion

Table 1 Content and classification of the 15 exemplary videos that were used as stimuli from luVRe

Each video was 60 s long. Twenty randomized sequences of the 15 videos were generated, subject to the constraint that no more than two videos of the same valence followed each other in presentation (Bradley et al., 2001).
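Such a constrained randomization can be implemented with simple rejection sampling. The sketch below is a minimal illustration, not our actual randomization script; the video indices, the assumed 5/5/5 valence split, and all names are hypothetical:

```python
import random

# Hypothetical valence labels for 15 videos: 5 negative, 5 neutral, 5 positive
VALENCES = ["neg"] * 5 + ["neu"] * 5 + ["pos"] * 5
VIDEOS = list(range(15))

def valid(sequence, valences):
    """True if no more than two consecutive videos share the same valence."""
    labels = [valences[v] for v in sequence]
    return all(not (labels[i] == labels[i + 1] == labels[i + 2])
               for i in range(len(labels) - 2))

def make_sequences(n_sequences=20, seed=0):
    """Rejection sampling: shuffle until the constraint is satisfied."""
    rng = random.Random(seed)
    sequences = []
    while len(sequences) < n_sequences:
        candidate = VIDEOS[:]
        rng.shuffle(candidate)
        if valid(candidate, VALENCES):
            sequences.append(candidate)
    return sequences

sequences = make_sequences()
```

With a balanced valence split, valid permutations are frequent, so rejection sampling terminates quickly; each of the 20 sequences can then be assigned to one participant.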

Procedure

Both 3D/360° videos and 2D videos were presented using the HTC Vive VR system to control for possible confounding factors that could result from wearing the VR system, such as electromagnetic interference or physical pressure on the electrodes (see Fig. 3). For the 3D/360° condition, the videos were presented as fully immersive VR-3D/360° videos. In order to create a similar experience in the 2D condition, deprived only of the immersive three-dimensional nature of VR, the videos were projected onto a large frameless virtual screen within the VR environment. Thus, the proportions of the depicted objects on the retina were maintained, and peripheral vision was stimulated likewise. Hence, one group experienced the videos as VR-3D/360° videos, while the other group viewed them in 2D, albeit wearing an HMD. Each of the 20 video sequences was presented to one participant. The participants were allowed to make slight horizontal and vertical head movements to explore the scene but were encouraged not to do so too intensely or abruptly.

Fig. 3

Combination of VR HMD and EEG. Note. The VR equipment included the immersive HTC Vive HMD, headphones for stereoscopic sound, and the HTC Vive tracking stations for real-time head-tracking. The VR equipment was carefully arranged atop the 128-electrode EEG

To acclimate participants to the VR setup prior to the experiment, especially to the HMD, they spent 60 s in a neutral virtual room, followed by a visual ten-second countdown announcing that the experiment was about to begin. The total presentation time for the 15 trials was approximately 22 min. Each trial started with a 20 s resting phase in the plain room with white walls, followed by a one-second fixation (a red cross appearing on the white wall). Then, one of the fifteen videos was presented for 60 s (see Fig. 2).

Subjective Measures

After the video presentation, the sense of presence was measured using the German version of the Igroup Presence Questionnaire (IPQ; Schubert et al., 2001), and participants were asked about prior VR experiences and motion sickness during and after the experiment.

Electrophysiological Recording and Preprocessing

During the presentation of the 15 trials, an electroencephalogram (EEG) with 128 electrodes was recorded, attached in accordance with the international 10–20 system. The ActiveTwo amplifier system from BioSemi (Amsterdam, Netherlands) was used. The sampling rate was 512 Hz, the bandwidth (3 dB) 104 Hz. Additionally, a horizontal electrooculogram (hEOG) and a vertical electrooculogram (vEOG) were recorded, and a common mode sense (CMS) and a driven right leg (DRL) electrode were applied. The EEG was recorded on the investigators' computer using ActiView702 Lores.

The data processing followed the recommended standard procedure for FAA analysis (see Lacey et al., 2020; Smith et al., 2017): The data were segmented into epochs from −1 s to 60 s relative to the onset of each video. Afterward, the EEG data were baseline corrected (500 ms before stimulus onset) and filtered between 0.1 Hz and 24 Hz. The chosen low-pass cutoff of 24 Hz prevents interference from the 50/60 Hz mains power. Each electrode was detrended separately. The data were squared and logarithmized. A window size of one second was defined and shifted with a step size of 0.1 s. A Hamming window was applied, and the fast Fourier transform (FFT) was calculated. The alpha band from 8 Hz to 12 Hz was extracted. Due to the robustness of this methodological approach, no trial was excluded from further analysis, nor was any data omitted. For each video, a grand mean including all participants of the same group was calculated and averaged over selected time windows (see below: statistical analysis). For the calculation of the frontal alpha asymmetry score (FAA score), electrode F4 was subtracted from electrode F3 (logarithmized left alpha power minus logarithmized right alpha power).
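The core of such a pipeline (a sliding Hamming-tapered FFT, alpha-band extraction, and the F3 minus F4 asymmetry score) could be sketched as follows. This is an illustrative simplification, not our verbatim processing chain: filtering, baseline correction, and detrending are omitted, and the exact ordering of the squaring and logarithmizing steps is an assumption:

```python
import numpy as np

FS = 512                # sampling rate (Hz)
WIN, STEP = 1.0, 0.1    # window size and step size (s)
ALPHA_BAND = (8.0, 12.0)

def alpha_power(signal, fs=FS):
    """Sliding-window estimate of logarithmized alpha power.

    For each 1-s window (shifted by 0.1 s), apply a Hamming taper,
    compute the FFT, and average power over the 8-12 Hz band.
    """
    n, hop = int(WIN * fs), int(STEP * fs)
    taper = np.hamming(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= ALPHA_BAND[0]) & (freqs <= ALPHA_BAND[1])
    powers = [
        (np.abs(np.fft.rfft(signal[i:i + n] * taper)) ** 2)[band].mean()
        for i in range(0, len(signal) - n + 1, hop)
    ]
    return np.log(np.asarray(powers))

def faa_score(f3_signal, f4_signal):
    """FAA per window: log left (F3) minus log right (F4) alpha power."""
    return alpha_power(f3_signal) - alpha_power(f4_signal)
```

Applied to a 60-s epoch sampled at 512 Hz, this yields one FAA value per 0.1-s step, i.e., the per-window time course that the running t-test described below operates on.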

Statistical Analysis

Questionnaires

Prior VR experience and experience of motion sickness were recorded as categorical variables and analyzed using Pearson's chi-square test. For the analysis of the sense of presence, the IPQ scales General Presence, Spatial Presence, Involvement, and Realness were calculated. As General Presence was not normally distributed (p < 0.05), Mann-Whitney U tests were performed, and Cronbach's alpha was calculated.
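These tests map directly onto standard routines. The sketch below illustrates the analysis with hypothetical counts and scores (none of the numbers are our data); Cronbach's alpha is computed from its classical variance-based definition:

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

def cronbach_alpha(items):
    """Classical Cronbach's alpha for an (n_participants, n_items) matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Chi-square test on a 2x2 contingency table
# (group x prior VR experience; hypothetical counts)
table = np.array([[10, 9],
                  [8, 11]])
chi2, p_chi, dof, expected = chi2_contingency(table)

# Mann-Whitney U test on hypothetical presence scores of two groups
scores_vr = [4.5, 4.0, 5.0, 4.8, 3.9]
scores_pc = [3.5, 3.8, 4.0, 3.2, 3.9]
u_stat, p_u = mannwhitneyu(scores_vr, scores_pc, alternative="two-sided")
```

Note that `chi2_contingency` applies Yates' continuity correction by default for 2x2 tables; whether that matches the reported statistics is not specified here.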

EEG-Data

In line with our exploratory approach, and to account for the temporal dynamics of the video material and hence the unfolding of affective processes, we identified relevant time windows before averaging over subjects by means of a running t-test in the time domain. To identify the most prominent motivational differences, the FAAs of both groups were tested against each other in this manner. The approach described below was deemed necessary because significant events are unevenly distributed over the 60-s timeline of each video. As a criterion for reliable significance, the shortest eligible time window subject to further analysis consisted of ten consecutive significant data points and was thus one second long (albeit the vast majority were several dozen seconds long). For the sake of simplicity and clarity, the time windows thus selected were averaged and further analyzed by separate t-tests, which are reported below.
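The window-selection step can be sketched as follows. This is a minimal illustration under assumed array shapes, not our analysis code; `min_len=10` corresponds to one second at the 0.1-s step size:

```python
import numpy as np
from scipy.stats import ttest_ind

def significant_windows(group_a, group_b, min_len=10, alpha=0.05):
    """Running t-test between two groups of FAA time courses.

    group_a, group_b: arrays of shape (n_subjects, n_timepoints).
    Returns (start, end) index pairs of runs of at least `min_len`
    consecutive time points with p < alpha.
    """
    _, p = ttest_ind(group_a, group_b, axis=0)
    significant = p < alpha
    windows, start = [], None
    for i, sig in enumerate(significant):
        if sig and start is None:
            start = i                      # a run of significance begins
        elif not sig and start is not None:
            if i - start >= min_len:       # keep runs of sufficient length
                windows.append((start, i))
            start = None
    if start is not None and len(significant) - start >= min_len:
        windows.append((start, len(significant)))
    return windows
```

The FAA values within each returned window would then be averaged and compared between groups with a separate independent-samples t-test, as reported in the results.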

Results Study 1

Subjective Measures

Participants of the two groups differed neither with respect to their prior experience with HMDs (χ2(1) = 0.100, p = .752) nor regarding motion sickness (during video presentation: χ2(1) = 0.000, p = 1.0; after video presentation: χ2(1) = 1.026, p = .311).

The 3D/360° group reported higher sensations of general and spatial presence (General Presence: U = 117.50, z = −1.916, p = .033, MVR = 4.47, MPC = 3.74; Spatial Presence: U = 86.50, z = −2.577, p = .005, MVR = 4.35, MPC = 3.50). However, both groups did not differ with respect to Involvement (U = 170.00, z = −0.31, p = .494, MVR = 3.80, MPC = 3.81) and Realness (U = 135.50, z = −1.322, p = .095, MVR = 4.18, MPC = 3.84). Cronbach’s α was good for Spatial Presence (α = .794) and Involvement (α = .719), but poor for Realness (α = .564). Cronbach’s α could not be calculated for the one-item scale General Presence.

Dependent Measures

As expected, the FAA results for the respective videos differed considerably depending on the type of presentation (3D/360° vs. 2D). Importantly, the FAA scores of the two groups indicated opposite motivational tendencies for 14 of the 15 videos. Only for the emergency room video did the motivational tendencies of both groups point in the same direction, yet they still differed significantly in intensity: in the 3D/360° group, the higher FAA score implied a significantly stronger avoidance motivation than in the 2D group. The FAA scores and respective statistics per video are given in Table 2 and visualized in the corresponding Fig. 4. The latency of the time windows ranged from one to 38 s (Mlatency range = 10.78 s).

Table 2 Results of the t-test for independent samples comparing the mean FAA-scores of the respective time windows between both groups per video. The time window is given in seconds after stimulus onset
Fig. 4

FAA scores for luVRe videos. Note. Mean FAA score per video for the selected time windows for significant events, grouped according to the categorization as negative (a), neutral (b), and positive (c). Significant events within a video included, for example, the onset of weeping (bloody bathroom video), the entry of a person in an ABC suit (bunker video), the serving of a beverage (bar video), and a horse nudging the viewer with its nostrils (horse video). According to conventional interpretation, positive FAA scores reflect withdrawal motivation, whereas negative FAA scores reflect approach motivation. Significant differences between the two groups are marked (* p < .05). The error bars depict the standard error of the mean

Discussion Study 1

The aim of study No. 1 was to assess the emotional and motivational responses to videos from luVRe under immersive VR conditions as compared to a more conventional 2D condition. Notably, the VR videos did not cause considerably more motion sickness than the 2D presentation. Most importantly, the 3D/360° group reported a higher sense of general as well as spatial presence. Hence, watching a video through an HMD does not per se lead to an enhanced feeling of being in the scene. Rather, the three-dimensionality is to be considered the decisive factor. Being surrounded by a 3D/360° environment shields against any external sensory input beyond the virtual environment and thus underlies immersion and presence (e.g., Slater & Wilbur, 1997). It is noteworthy that this shielding effect does not result from the HMD physically blocking external visual and acoustic cues, but from the physicality of the presented environment. What might further contribute to a strong feeling of presence is that processing 3D environments is the brain's natural mode of operation. Although it could be assumed that reduced sensory impressions facilitate processing, both simple and complex tasks lead to higher cognitive load under 2D conditions as compared to 3D conditions (Dan & Reiner, 2017). From the point of view of evolutionary psychology, the brain has evolved in a complex environment and is thus adapted to process the environment in which the organism is physically present. This more realistic processing style could situate the participants in the VR environment and thus constitute presence. The fact that involvement and realness were equally high in both groups can be attributed to the fact that we employed a passive viewing paradigm with photorealistic stimuli.

The FAA results shed new light on emotional and motivational processes, as well as their regulation, in realistic as opposed to conventional environments. For the sake of clarity, we would like to emphasize that we are not predominantly interested in investigating FAAs per se, but in assessing differences in processing style between VR and conventional laboratory conditions (see introduction). Our intention was to provide a methodological and factual starting point for further in-depth research, along with an impulse to reconsider prevailing theories. A complete account of the observed effects is beyond the scope of this paper. That being said, it is evident that the way the same stimulus material is processed differs fundamentally between the two conditions, most strikingly for videos previously categorized as negative. The following discussion will therefore focus on these stimuli in order to exemplify some core concepts and ideas about the benefits of VR applications in this field of research.

All negative videos in the 2D condition elicit an FAA that would be considered to index a tendency toward withdrawal or negative affect, whereas four out of five FAAs in the 3D condition point in the opposite direction. In the emergency room scene, depicting a resuscitation, the withdrawal motivation is more pronounced in 3D than in the 2D condition. The opposite pattern holds for the dentist scene: Participants in the 2D group exhibit a strong withdrawal tendency when the dental drill starts spinning, compared to a small to neutral approach tendency in the 3D group. Explanations of this pattern are rather speculative but continue the thread of the latest theories. The emergency scene might be interesting to watch in 2D, when one is outside the spatio-temporal reference frame of the scene. In a VR environment, however, the participant stands next to two medical workers trying to resuscitate a person and is ultimately confronted with real death, which may, for example, evoke fear. By promoting the firm belief of being within the spatio-temporal reference frame of the scene, VR experiences reduce both physical and mental shielding from the occurring events. Thus, the meta-awareness that the virtual environment cannot affect oneself, or be affected in return, diminishes (Kisker et al., 2019b; Pan & Hamilton, 2018). Consequently, events within the virtual environment and their implications become highly self-relevant as they immediately affect the user, altering emotional and motivational responses as compared to mere on-screen experiences (Kisker et al., 2019b; Schöne et al., 2019).

Whereas emotion seems to be the dominating factor in the emergency scene, emotion regulation could play an essential role in the dentist scene: Being reminded of a visit to the dentist, as in the 2D condition, leads to withdrawal and negative affect; after all, the visit itself is not associated with pleasantness. However, to get through such a visit in real life, withdrawal motivation has to be sufficiently downregulated or suppressed. FAAs in the 3D condition thus might be subject to, or reflect, regulation and intention (Hewig, 2018; Lacey et al., 2020).

This example in particular, together with the other results from the negative category, shows the importance of realistic stimuli. They facilitate the investigation of real-world cognition and illustrate the conceptual differences between presenting stimulus material as a 2D reminder as opposed to a virtual experience. In contrast to the 2D group, the 3D group reacts to all other negative stimuli with approach and positive affect, according to the FAA. Despite the negative content of the scenes, participants might form the intention to flee from the atomic shelter or to fight the attacker in the kidnapping scene. Basically, this argumentation follows the idea of Harmon-Jones and Allen’s seminal study (1998), showing that anger, a negative emotion, can lead to an approach-related FAA. Our study shows that similar results can be obtained under realistic conditions and, most importantly, that scientific concepts about emotion, motivation, and their regulation can be extended and adapted to a more complex and realistic picture of emotional experiences.

To summarize, the VR condition yielded completely different results than the conventional laboratory condition, indicating that conventional paradigms do not translate into the virtual domain without loss. This can mainly be attributed to the VR environment providing a believable environment to which motivational and emotional reactions are adapted. The immersiveness of VR constitutes a feeling of being in the scene; the impression of an explorable and touchable environment engages different processes than conventional laboratory conditions do. The apparent deviation from laboratory results calls for further in-depth investigation of said processes in order to draw a more appropriate picture of realistic cognitive and emotional mechanisms.

Study 2. Remembering Virtual Experiences

Among the versatile applications of VR as a tool in psychological research, memory research could benefit particularly. Conventionally, memory studies employ a design that resembles a cue-indexing approach rather than an investigation of fully-grown memories: Participants are presented with pictorial stimuli on a computer, commonly dozens of unrelated items, and have to recall them in the course of the experiment. While this and related methods serve the purpose of identifying the core mechanisms of memory, they do not grasp the complex nature of real-life memory traces. Memory traces are multimodal constructs. They incorporate a scene or broader context (Barrett & Kensinger, 2010) together with an event, sensorimotor information (Kelly et al., 2007; Wilson, 2001), and emotional connotations (Erk et al., 2003; Paulmann & Pell, 2011). Our very functioning depends on recalling past events along with their spatial and temporal context (Conway, 2005; Conway et al., 2004; Greenwald, 1980; Schöne et al., 2018) as well as on creating semantic abstractions of oneself in a particular scene, constituting autobiographical memory and personal semantics (Klein & Loftus, 1993). Approaches putting autobiographical aspects of memory at the center of scientific work pay tribute to this fact (e.g., Cabeza et al., 2004; Daselaar et al., 2008; Greenberg et al., 2005; McDermott et al., 2009).

Going one step further, VR enables memory researchers not only to passively present stimulus material but to fathom the integration of sensorimotor and memory functions under, or close to, real-life conditions (Kelly et al., 2007; Schultheis & Rizzo, 2001). In particular, VR allows studying all aspects of memory traces with unprecedented sensitivity and accuracy. Terms such as “contextual information” and “object” in VR actually refer to a complex spatial reference frame and a three-dimensional object within it. The egocentric perspective, along with a feeling of presence, adds to the sensorimotor stream and, as outlined in the first study, to the affective content. The result is an associative engram resembling the key features of real-life mnemonic structures and functioning (Kisker et al., 2019b; Schöne et al., 2019).

Current VR studies focus on objects, meaning that participants are asked to recall objects they previously encountered in a virtual environment (Kisker et al., 2019b; Krokos et al., 2019; Ouellet et al., 2018; Sauzéon et al., 2012). Previous studies on memory in VR have found an enhanced retrieval rate under more realistic conditions (Ernstsen et al., 2019; Harman et al., 2017; Krokos et al., 2019; Schöne et al., 2019; Smith, 2019), although it should be noted that the effect does not always occur (Kisker et al., 2019b; LaFortune & Macuga, 2016; Lorenz et al., 2018). That is remarkable, as the vividness of VR should uniformly amplify the relevance of information extracted from the surroundings. This seems to hold true especially as the feeling of being in the scene creates self-relevance (see study 1). Interestingly, exogenous self-relevant information is preferentially processed (Schöne et al., 2018) and, as part of autobiographical memory, reliably accessed and retrieved, which may explain the enhanced memory effect for VR stimuli. A standardized dataset might thus help to shed light on the differences in mnemonic processing of the multimodal stream of exogenous and endogenous information.

The current study aimed to replicate the memory superiority effect for VR stimuli from the luVRe database, as opposed to a conventional 2D presentation, in order to validate them. To that end, we selected a set of thirty videos from luVRe we deemed interesting, partially overlapping with the first study. Decisive for the design were the standards of conventional mnemonic experiments, namely randomized presentation of stimulus material and subsequent recall (Sauzéon et al., 2012). We thus employed a paradigm in which participants watched videos in randomized order on a monitor, as in any other video study, as opposed to the same stimulus material in a VR condition. To the best of our knowledge, this is one of the first VR memory studies (see also Kisker et al., 2020) presenting a high number of multifaceted scene stimuli in one experiment.

The rationale behind this conservative approach was that the conventional 2D condition should replicate conventional memory effects as well as possible, providing a benchmark for the effects under immersive conditions. We investigated the free recall of scenes as well as the recall of details by means of cued recall. Should the VR setup outperform the conventional 2D setup on, figuratively speaking, its own ground, the study would make a case for a memory superiority effect under controlled VR conditions, and not only in a single immersive VR environment as often used in VR memory studies (see, e.g., Ouellet et al., 2018).

Methods Study 2

Participants

Sixty-eight participants were recruited from Osnabrück University. The study was conducted in accordance with the Helsinki Declaration and approved by the local ethics committee of Osnabrück University. Participants gave their informed written consent and were screened for psychological and neurological disorders. All had normal or corrected-to-normal vision. The participants were randomly assigned either to the VR condition (3D/360° videos) or to the conventional PC condition (PC, 2D-360° videos). Eight participants were excluded from the analysis due to incorrect procedures or technical problems during stimulus presentation. A sample size of N = 60 remained (VR: nVR = 30, Mage = 22.63, SDage = 2.79, 23 female, 26 right-handed; PC: nPC = 30, Mage = 20.77, SDage = 1.65, 25 female, 29 right-handed). Participants received partial course credits.

Stimuli and Procedure

Thirty 3D/360° videos from the Library for Universal Virtual Reality Experiments (luVRe) were used as stimuli. Each video was presented for ten seconds, resulting in a total presentation time of five minutes. Each participant saw all of the thirty videos but in different orders: To avoid position and sequence effects, five randomized orders of the thirty videos were generated.

Participants of the VR condition wore an HTC Vive head-mounted display (HMD), which allows for a 3D/360° view, head-tracking, and stereoscopic sound. Participants of the PC condition were seated in front of a 24″ monitor at 80 cm distance (visual angle: 2 × 18.33°). They could look around the video using the arrow keys. For both conditions, a basic video (cf. planetarium video, study 1), which was not part of the stimuli set, was presented to ensure sharp sight and to become familiar with looking around in the video either with the headset or the arrow keys. All subjects wore headphones for sound.

Immediately after the video presentation, the sense of presence was measured using the German version of the Igroup presence questionnaire (IPQ, Schubert et al., 2001), followed by a modified Taylor complex figure test (Taylor, 1969) as a distraction task. Afterward, an unannounced memory test was performed: Participants were asked to freely recall the video scenes they had seen and name them by key features (free recall, e.g., “I remember standing on a motorway bridge”, cf. Fig 5f). If the participants did not recall another scene for thirty seconds, the free recall was finished. Subsequently, detailed questions were asked about the scenes that had previously been recalled (cued recall). Accordingly, no detailed questions were asked about those scenes that were not recalled during free recall. For example, one video showed a kart race. If the subjects recalled this scene during free recall, they were asked during cued recall what colors the boundary of the karting track had (cf. Fig 5b & Table 3).

Fig. 5
figure 5

Exemplary stimuli. Note. The screenshots were taken from six of the 30 videos used as stimulus material, depicting a surgery room (a), a go-kart race (b), pole dancing (c), a cowshed (d), Times Square, NY (e), and a motorway bridge (f). The slightly distorted display of the screenshots results from their being captured with a conventional video player instead of a 360°-compatible program. During the experiment, the videos were displayed without distortion

Table 3 Content of the 30 exemplary videos from the luVRe that were used as stimuli

Statistical Analysis

For analysis of the sense of presence, the IPQ subscales General Presence, Spatial Presence, Involvement, and Realness were calculated. Cronbach’s α was calculated per scale except for the one-item-scale General Presence. Group differences were analyzed using the one-tailed Mann-Whitney-U-test as normal distribution was not given for the subscale General Presence (p < 0.05), and the effect size r was calculated. The effect size estimate r is a correlation coefficient as an alternative to Cohen’s d for non-parametric tests (small effect: r ≥ .10; medium effect: r ≥ .30; large effect: r ≥ .50).
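The effect size r described above can be derived from the normal approximation of the Mann-Whitney U statistic as r = |z| / √N. As a minimal sketch, the following Python snippet (scipy/numpy) illustrates this computation on hypothetical rating data; the data, group sizes, and variable names are purely illustrative and not the study's actual values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical IPQ General Presence ratings for two groups of n = 30 each
vr = rng.integers(3, 8, 30).astype(float)
pc = rng.integers(1, 6, 30).astype(float)

# One-tailed Mann-Whitney U test (directed hypothesis: VR > PC)
u, p = stats.mannwhitneyu(vr, pc, alternative="greater")

# Normal approximation of U (simplified: no tie/continuity correction)
n1, n2 = len(vr), len(pc)
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u

# Effect size r = |z| / sqrt(N), interpreted like a correlation coefficient
r = abs(z) / np.sqrt(n1 + n2)
```

With r in hand, the conventional benchmarks (.10, .30, .50) can be applied directly.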

Memory performance regarding free recall was calculated as the quotient of the number of remembered scenes and the total number of presented scenes (free recall performance = recalled scenes/30). Memory performance regarding cued recall was calculated as the quotient of the number of correctly answered detailed questions and the total number of questions asked, which was equivalent to the number of recalled scenes (cued recall performance = correct answers/recalled scenes). Group differences were analyzed using a one-tailed unpaired t-test due to the directed hypothesis of replicating the memory superiority effect for VR stimuli. Cohen’s d was calculated as an estimate of effect size.
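These computations can be sketched in a few lines of Python; the recall counts below are hypothetical placeholders, not the study's data, and Cohen's d is computed with the pooled standard deviation, one common convention:

```python
import numpy as np
from scipy import stats

N_SCENES = 30
rng = np.random.default_rng(1)

# Hypothetical per-participant counts of freely recalled scenes (n = 30 each)
recalled_vr = rng.integers(10, 25, 30)
recalled_pc = rng.integers(8, 22, 30)

# Free recall performance = recalled scenes / total presented scenes
perf_vr = recalled_vr / N_SCENES
perf_pc = recalled_pc / N_SCENES

# One-tailed unpaired t-test (directed hypothesis: VR > PC)
t, p = stats.ttest_ind(perf_vr, perf_pc, alternative="greater")

# Cohen's d using the pooled standard deviation
n1, n2 = len(perf_vr), len(perf_pc)
sp = np.sqrt(((n1 - 1) * perf_vr.var(ddof=1)
              + (n2 - 1) * perf_pc.var(ddof=1)) / (n1 + n2 - 2))
d = (perf_vr.mean() - perf_pc.mean()) / sp
```

The cued recall quotient follows the same pattern, with correct answers divided by the number of scenes each participant recalled.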

Additionally, the recall rate in percent was calculated for each individual video regarding the whole group, the VR group, and the PC group. The Chi-square test was used to determine whether individual videos were remembered more frequently by one group or the other.
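The per-video comparison amounts to a Chi-square test on a 2 × 2 contingency table (group × recalled/not recalled). A sketch with hypothetical counts, assuming n = 30 per group as in the study:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table for a single video:
# rows = group (VR, PC), columns = (recalled, not recalled)
table = np.array([[24, 6],    # VR: 24 of 30 participants recalled the video
                  [15, 15]])  # PC: 15 of 30 participants recalled it

# Chi-square test of independence (Yates correction applied by default for 2x2)
chi2, p, dof, expected = chi2_contingency(table)

# Recall rate per group in percent
rates = table[:, 0] / table.sum(axis=1) * 100  # [80.0, 50.0] for these counts
```

Repeating this per video yields the group-wise recall rates and significance tests reported in Table 5.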

Results Study 2

Presence

As expected, the VR condition elicited a stronger sensation of feeling present in the virtual environment as compared to the PC condition, reflected in all IPQ scales (general presence: U = 240.50, z = −3.17, p = .001, r = 0.41; spatial presence: U = 205.00, z = −3.63, p < .001, r = 0.47; involvement: U = 213.00, z = −3.51, p < .001, r = 0.45; realness: U = 335.00, z = −1.71, p = .044, r = 0.22). Cronbach’s α indicates acceptable to good reliability for all scales (all α > 0.73; see Table 4).

Table 4 Descriptive statistics and reliability of the IPQ presence scales for both groups

Memory Performance

Participants of the VR condition freely recalled 55.6% of the scenes and hence approximately 6% more than participants of the PC condition, who remembered approximately 49% (t(58) = 1.98, p = .026, d = 0.50, MVR = 0.556, MPC = 0.491; see Fig. 6). However, the groups did not differ regarding cued recall (t(58) = −1.35, p = .09, d = 0.35, MVR = 0.595, MPC = 0.64). Interestingly, participants of the VR condition answered proportionally fewer questions correctly (59.5%) than participants of the PC condition (64.4%; see Fig. 6).

Fig. 6
figure 6

Relative memory performance in free and cued recall separately for both groups. Note. The error bars depict the standard error of the mean. Significant differences are marked (* p < 0.05)

The recall rate per individual video ranged from 16.7% to 91.7% for the total group (VR: 16.7% to 90%; PC: 16.7% to 93.3%; see Table 5). Notably, the most frequently remembered videos were: puppies, mixed martial arts, karate fight, trauma room, surgery room, horse riding, and cowshed (all recall rates >70% for the whole group, see Table 5). Four videos were significantly more frequently recalled by the VR group as compared to the PC group (surgery room, horse riding, cowshed, break dancing, see Table 5). However, no further significant differences between both groups were found on the level of the individual videos. Descriptively, the recall rate was higher in the VR group for 17 of the videos and equal in both groups for seven videos.

Table 5 Recall rate in percent and Chi-square test per video

Discussion Study 2

The aim of study No. 2 was to replicate the memory superiority effect of VR experiences and shed new light on previously inconsistent results (Kisker et al., 2019b; Krokos et al., 2019; Schöne et al., 2019; Smith, 2019). As expected, the participants in the VR condition reported a higher sense of presence, which presumably is the most important factor underlying the VR memory superiority effect also obtained for free recall (see Sauzéon et al., 2012). Both effects are important milestones toward a broader application of VR tools in experimental psychology. The rapid mode of serial video presentation, aligned with conventional paradigms, yielded the same effects as previous VR studies investigating mnemonic mechanisms in a single environment. This is remarkable as the mode of presentation in this study might have a diminishing effect on the factors constituting the unique VR experience: The timed changes of place participants underwent are inconsistent with the laws of physics and make users repeatedly aware that they are in a VR simulation.

The cued recall did not yield significant results, meaning that neither group exhibited a higher recall rate for scenic details. However, an effect could have been expected: Specifically, the feeling of being present in the scene might facilitate memory for objects of interest within reach. Our experiment, however, did not fully leverage the affordances of VR, as the questions for the cued recall were rather unspecific and did not follow a clear structure, which might account for the lack of an effect. Alternatively, the free recall task could have diminished the cued recall effect via retrieval-induced forgetting (Ciranni & Shimamura, 1999): Successful recall of a scene could impair the recall of its details in the second recall. As the cued recall followed the free recall in both conditions, participants in both groups were likewise affected. Future studies might omit the first stage in order to investigate scene detail knowledge. A field of research that could benefit from luVRe is research on autobiographical memory. Immersive media especially seem to aid free recall (Ernstsen et al., 2019; Harman et al., 2017; Ventura et al., 2019), as it is the mode of retrieval of autobiographical memory and commonly used for research on unique, lifelike events (Oedekoven et al., 2017).

Conclusion

Virtual reality could be a valuable extension of the toolbox of experimental psychology. Two experiments have provided evidence that cognitive-affective psychological science might benefit from the use of VR paradigms. Study No. 1 showed that presence, and especially three-dimensionality, fundamentally alters motivational processing and the perceived valence of stimuli. The driving factors underlying these effects have yet to be determined. However, study No. 1 provides first evidence that, when the same material is presented under realistic as opposed to two-dimensional conditions, the brain exhibits different functional properties. Study No. 2 replicated the memory superiority of virtual reality over on-screen presentation under the aggravated conditions of a fast-paced, ever-changing stimulus set. This rather conventional mode of presentation has proven capable of facilitating the formation of autobiographical memories (e.g., Schöne et al., 2019; Kisker et al., 2020). Virtual experiences with material from the luVRe database are thus processed by the same mnemonic mechanisms as real-life experiences, making a case for their realness and their suitability for VR-based research aiming at reproducing real-life scenarios.

In conclusion, VR combines the best of two worlds: Firstly, the enhanced realism of immersive simulations facilitates more naturalistic processing. As the brain has evolved under three-dimensional conditions throughout its evolutionary history, testing the brain’s normal mode of operation significantly enhances the ecological validity of psychological science. Secondly, by using luVRe’s stimuli, high experimental control, the key feature of laboratory conditions, is preserved. Most importantly, luVRe is easy to use as it does not require extensive technical skills, which normally are the bottleneck for VR experiments. The videos can be arranged, e.g., in sequential order using video editing software and displayed with respective video players on any kind of VR headset. Moreover, they are particularly high in realism due to their photo-realistic appearance. A programmed environment, by contrast, is merely a visual replica of a real-life scene and thus can easily be recognized as a simulation. In contrast to luVRe, such simulations come with the disadvantage that the facial expressions and gestures of people are difficult to reproduce accurately. Hence, whereas real-life experiments suffer from limited reproducibility, VR experiments maintain a high degree of realism while being easily replicable.

Application of VR and PC Experiments

It should be noted that the advantages and benefits of VR experiments, as well as the presented results, which partially contradict the prevailing doctrine, neither render previous results and interpretations invalid nor will the strictly controlled laboratory ever become obsolete. Previous methodologies are irreplaceable as they allow researchers to isolate emotional and cognitive mechanisms. The conventional laboratory is thus vital when it comes to the development of models of psychological processes. However, we suggest that models and mechanisms should be put to the test under more realistic conditions to explore whether, or to what extent, they change their mode of operation. Putting them in concert with other processes might show what role they actually play in everyday functioning.

Download, Legal Considerations, and Safety Warnings

The first version of the luVRe database is exclusively accessible for researchers at degree-granting institutions for non-profit (psychological) research. All persons depicted in the database gave their informed consent for a publication solely for scientific purposes. For further information and download, please visit https://www.psycho.uni-osnabrueck.de/fachgebiete/allgemeine_psychologie_i/luvre.html. The database is consistently growing; suggestions are welcome and can be sent, like any other inquiries regarding luVRe or technical questions, to luvre@uni-osnabrueck.de. Initial subjective ratings of valence, arousal, and motivation from pilot tests are available for some videos and are provided alongside the video material catalog.

Most of the scenes in the database are real footage and are not staged. Our first experiences indicate that videos presented under immersive VR conditions elicit much stronger emotional reactions than the same videos presented under conventional screen conditions. Consequently, we strongly advise proceeding with caution when setting up VR experiments and, whenever feasible, thoroughly screening participants, i.a., for (subclinical) emotional trauma or other vulnerabilities. Researchers should be aware that their participants could be confronted with injured persons and dead bodies, or be screamed at by armed police officers pointing a gun at their head (for a comprehensive account of ethical considerations regarding VR, see Parsons (2019)).