1 Introduction

The aim of virtual reality (VR) studies is often determined by the target population. Research has been conducted on adult users for therapeutic and training purposes (Fan and Wen 2019; Lele 2013; Levac et al. 2019), but also with fundamental objectives in mind, e.g., impact on memory (Smith 2019), link between emotion and sense of presence (e.g., Gromer et al. 2019). By contrast, VR research in children has generally been guided by various therapeutic objectives, e.g., pain distraction tool (Hoffman et al. 2000; Won et al. 2017; Gates et al. 2020); skill acquisition tool (Roussou et al. 2006; Shema-Shiratzky et al. 2019; Schwebel et al. 2008); and acquisition of social or behavioral skills by children with autism spectrum disorder (see Bellani et al. 2011, for a review). Although the use of VR has become increasingly widespread in these clinical areas, few studies have been conducted to investigate the emotional, motivational, and cognitive functions of typically developing children in VR. Some authors have explored the developmental implications of the use of VR (Bailey and Bailenson 2017; Subrahmanyam 2009). For example, Bailey and Bailenson (2017) discussed the impact of the developing executive functions on the way children experience VR and pointed out that there is still a lack of experimental studies systematically comparing adults’ and children’s behaviors in VR contexts.

1.1 Immersion, sense of presence, and memory

Virtual reality encompasses 3D computer-simulated environments and entities and includes behavioral interfaces that allow these environments and entities to interact with each other and with a user in a situation of pseudo-natural immersion (Fuchs 2018). In such virtual environments, the user can develop the feeling of “being there,” also called sense of presence, which is intimately linked to but distinct from immersion (Slater 2003). The concept of presence has been defined in various ways in the literature (see Schuemie et al. 2001 for a review). According to Lombard and Ditton (1997), one consensual approach to presence is to define it in terms of a perceptual illusion of non-mediation. This means that the user’s sensory, cognitive, and affective systems continuously respond to what is presented in the virtual environment, and that the user does not perceive or acknowledge the existence of the medium that broadcasts the virtual environment and allows communication with it. Despite this, presence remains a multicomponent concept (e.g., self-presence, social presence, physical presence) (Biocca 1997; Heeter 1992; Lee 2004). Immersion refers to the objective capacity of the hardware to create an enveloping virtual environment (Slater 2003) and has an impact on the virtual experience. In the present study, we adopted Slater’s (2003) perspective by reserving the term immersion for what the technology objectively delivers and the term presence for “the human reaction to immersion.”

The quality of immersion results from the combination of various factors, such as field of view (the extent of the visible world), head tracking, or visual fidelity (the realism and details of visual information), which is thought to be a central element in immersion (Smith 2019). However, too much visual stimulation (from a larger field of view or more visual details) might be responsible for negative effects and discomfort due to increased eye strain on the part of the user (i.e., cybersickness; for a review, see Rebenitsch and Owen 2016) or a sense of unease when people are asked to look at quasi-human characters (a phenomenon referred to as the uncanny valley). Although visual fidelity modifies immersion directly, this does not mean that it will necessarily impact the sense of presence or the memory of virtual experiences.

In fact, contradictory results have been found in adults with regard to the relationship between immersion and sense of presence, with some studies reporting that immersion has no impact (Cummings and Bailenson 2016; North and North 2016; Yildirim et al. 2019) and others that it has an impact on the sense of presence (Cadet and Chainay 2020; North and North 2016). Interestingly, Yildirim et al. (2019) suggested that the inclusion of questions directly linked to immersion (e.g., questions on realism) in the questionnaires used to evaluate sense of presence may determine whether an impact of immersion on sense of presence is observed. In our previous study with young adults (Cadet and Chainay 2020), we demonstrated a higher sense of presence in the HMD than the computer-screen condition. However, the sense of presence was not affected by the visual quality of our virtual environments (e.g., 3D assets).

To our knowledge, no study has investigated changes in the impact of immersion on sense of presence during development. This, however, represents a critical issue for the creation of virtual environments that are suitably adapted for children for clinical, therapeutic, or advertising purposes. Interestingly, van Schaik et al. (2004) showed that, in a mixed-reality environment (e.g., when participants wear an HMD that allows them to see virtual and real elements simultaneously), sense of presence is negatively correlated with age. A possible indication concerning the specific impact of immersion on sense of presence in children could come from the way children experience virtual media compared to adults. Richert et al. (2011) have pointed out that children experience what is presented via virtual media as real (even under weak immersion conditions—in this case television).

Some authors have explained a greater sense of presence in children than in adults in terms of the late maturation of certain neural circuits involved in the control and emergence of the sense of presence (Baumgartner et al. 2008; Jäncke et al. 2009). In their fMRI study, Baumgartner et al. (2008) found that, in young adults (Mage = 26.2), the right dorsolateral prefrontal cortex reduced the sense of presence through down-regulation of the activation of the egocentric dorsal visual stream. In children (Mage = 8.7), these areas were not activated by the virtual experience, probably because they are not yet fully mature at that age, as has been revealed by voxel-based morphometry, with children exhibiting higher gray matter density and volume than adults. These results corroborate a previous EEG study by Baumgartner et al. (2006) that also showed increased spatial presence in children compared to adolescents, associated with decreased activation of prefrontal regions. In summary, children are more likely to feel present in VR and are likely to experience virtual media as more real than adults.

Concerning the impact of immersion on memory, it has been shown that different characteristics of immersion (e.g., field of view, device type, quality of the 3D assets used) may or may not enhance memory of virtual experiences in adults (Baños et al. 2008; Bowman and McMahan 2007; Cadet and Chainay 2020; Smith 2019). However, here again, less is known about the link between immersion and memory in children. Clinical applications have shown that VR can be used as a tool for the ecological assessment of episodic memory in children and young adults by allowing participants to navigate in a virtual city with a joystick and a desktop computer and assessing the quantity of correctly recalled information (Picard et al. 2017). However, this type of experiment cannot clarify the differences between adults and children in terms of the relationship between memory and immersion as the level of immersion was not directly manipulated in these experiments.

Some studies have investigated the link between sense of presence and memory in adults by correlating memory performance with the scores on sense-of-presence questionnaires and have found that better memory performance was associated with a greater sense of presence (Davis et al. 1999; Lin et al. 2002; Makowski et al. 2017). To our knowledge, only a few studies have investigated the influence of both immersion and sense of presence on memory in adults in the same experiment (Cadet and Chainay 2020) and no study has done so in children. Studies of this kind are necessary in order to better understand the influence of these two interrelated factors on memory performance in VR.

1.2 Emotions, memory, and sense of presence

The impact of emotion on memory, most frequently revealed by better memory performance for emotional stimuli than for neutral ones, is a widely acknowledged phenomenon in the literature on adult memory (Kensinger and Schacter 2016, for a review). Even though the effect of emotion on memory (EEM) has been less well studied in children, several studies have nevertheless also described EEM in this population (e.g., Cordon et al. 2013; Massol et al. 2020). However, there is still a lack of consensus given that other studies have reported no EEM in children (e.g., Hamann and Stevens 2014 for a review; Leventon et al. 2014).

In the context of virtual reality experiences, there is a lack of precise knowledge about the effects of emotion on memory performance in both adults and children. Makowski et al. (2017) showed that memory performance in adults is strongly correlated with emotional experience and the sense of presence but not the level of immersion. These authors asked their participants to complete a self-assessed questionnaire concerning emotion, sense of presence, and memory for a movie viewed in 2D or 3D. However, the authors did not systematically manipulate emotion and the manipulation of immersion consisted solely in manipulating the viewing condition (2D or 3D). In our previous study with young adults, we showed that negative and positive stimuli encountered during the virtual experience were recalled better than neutral ones, and this independently of the visual quality of the stimuli and visual environment (Cadet and Chainay 2020). However, further studies are necessary in order to explore these effects, and especially the effects on children's memory.

Concerning the sense of presence, some studies in adults have shown that emotions are able to modulate sense of presence (Baños et al. 2004; Cadet and Chainay 2020; Gromer et al. 2019; Riva et al. 2007), while other studies have failed to observe such a relationship (Felnhofer et al. 2015). This intricate relationship between emotion and sense of presence has been discussed in the literature, and it seems that the discrepant results could be explained in part by the way emotion was manipulated in the experiments (Bouchard et al. 2008; Felnhofer et al. 2014; Slater 2003).

In fact, in the context of VR, emotion has most frequently been manipulated by means of a mood induction procedure (Baños et al. 2004; Felnhofer et al. 2015; Riva et al. 2007) or through exposure to situations eliciting specific emotions (e.g., fear, Gromer et al. 2019), and the results have shown that emotional conditions induced a better sense of presence than neutral ones even under low immersion conditions. In these studies, emotion itself was either not assessed (Baños et al. 2004) or was assessed using a post-test self-reported evaluation (Cadet and Chainay 2020; Felnhofer et al. 2015; Gromer et al. 2019; Riva et al. 2007) together with online psychophysiological measures such as heart rate and skin conductance (Felnhofer et al. 2015; Gromer et al. 2019). The best approach seems to be to combine both self-assessed questionnaires (subjective evaluation) and physiological measurements (objective evaluation), thus permitting a better evaluation of the user's emotional state. The specificity of the HMD hardware, such as the presence of an eye tracker and multiple head position sensors, allows several psychophysiological measurements, such as pupillary dilation, fixation time measurements, and head movements, to be included without adding any external equipment. Pupillary dilation has been validated as a suitable way of measuring emotional arousal in various studies which either have (Chen et al. 2017) or have not (Bradley et al. 2008; Sirois and Brisson 2014; Partala and Surakka 2003) used VR technology. Fixation time is considered to be a good indicator of visual attention during a task (see Hoang Duc et al. 2008 for a review). Head position could also be an indicator of approach or avoidance behavior toward the presented stimuli, given that positive stimuli are thought to induce approach behaviors, while negative ones are believed to trigger avoidance behaviors (Eerland et al. 2012; Phaf et al. 2014). 
However, negative stimuli could also trigger approach behaviors due, perhaps, to an aggressive defense mechanism involving the attraction of attention toward the negative stimuli (e.g., Hillman et al. 2004). Thus, the interpretation of this type of result in terms of approach or avoidance behavior should be considered with caution. The standard deviation of head rotation has been used to measure environment exploration by participants (Li et al. 2017; Slater et al. 1998; Won et al. 2017). Li et al. (2017) and Won et al. (2017) also found a correlation between the participant’s emotional experience and these movement measurements.

In children, VR is often used as a tool for eliciting specific emotions, especially in therapeutic contexts (e.g., fear of the phobic object in exposure therapy for phobia, Maskey et al. 2014, 2019). Only a few studies have examined in a systematic and more fundamental way how emotion modulates sense of presence in VR in this population (Baumgartner et al. 2006, 2008). According to Baumgartner et al., the same relation between emotion and presence as in adults exists in children, although this suggestion needs to be further investigated given the paucity of studies on this subject. Moreover, the strength of the relationship and the link with immersion have not yet been investigated in children.

1.3 Aim of the present study

Research on adults has demonstrated that memory performances in VR may be impacted by at least three interconnected factors: immersion, sense of presence, and emotion. However, the precise role of these factors is not yet fully understood. One of the reasons may be that immersion and emotion are rarely manipulated together in studies exploring this question. In addition, measurements have been performed, at least to our knowledge, for only one of the three factors at any given time, i.e., memory performances, emotion, or sense of presence, but not for all three simultaneously. Moreover, studies have often used only self-reported measures (e.g., questionnaires) in order to assess sense of presence or emotional responses. Thus, more complete studies are required in order to understand the respective roles of immersion, emotion, and presence in adult memory performance and this is even more true in the case of children, as VR has almost exclusively been used as a tool in various types of therapy and only a few studies have focused on the above influences.

Thus, the aim of this study was to investigate the respective impact of each of these factors on memory performances in VR in both adults and children. This could lead to better theoretical knowledge of the interrelation between immersion, emotion, sense of presence, and memory. On a practical level, this could also provide evidence in support of recommendations concerning the design of virtual experiences for children, with a view to enhancing memory retention of the virtual content as well as protecting children from overwhelming emotional experiences. To perform this investigation, we applied the protocol with HMD we had previously used with young adults (Cadet and Chainay 2020) to a child population and then compared the previous results obtained with young adults with the children’s results. In addition, in the present study, we also analyzed the physiological and behavioral responses of both the adults and children. In this protocol, we modulated immersion by modifying the visual quality of the 3D assets (high-quality vs. low-quality) and modulated emotion by presenting stimuli previously evaluated as positive, negative, or neutral. We predicted that emotional enhancement of memory would be observed in both adults and children, but that it would be modulated by immersion and sense of presence, especially in the children. More specifically, we expected stronger EEM under high-quality immersion conditions and in environments eliciting a greater sense of presence. In particular, we expected to observe this in the children as they should experience a greater sense of presence than the adults due to the late maturation of the neural circuits responsible for the control and emergence of the sense of presence.

2 Methods

2.1 Participants

Eighty-eight volunteers, divided into two age groups, took part in the experiment: forty children (Mage = 11.63 years, range = 10–14 years; 20 girls and 20 boys) and forty-eight young adults, all students at Lyon 2 University (Mage = 20.65 years, range = 18–28 years; 24 women and 24 men). Each participant performed the experiment under one of the two conditions: high 3D asset quality or low 3D asset quality. Thus, there were four experimental groups: children in high 3D asset quality, children in low 3D asset quality, young adults in high 3D asset quality, and young adults in low 3D asset quality.

All the children were middle-school pupils from Lyon. All the participants (and/or their legal representatives) gave their written informed consent and declared having good hearing and vision. One girl from the child group withdrew before the end of the experiment due to a cold she had contracted earlier. Her results were removed from the analysis. Exclusion criteria were epilepsy, mental retardation, the use of mood-regulating drugs, psychiatric or neurological disease of which the participant was aware. Participants were informed that they could discontinue the experiment at any time if they experienced any discomfort (e.g., cybersickness) or for any other reason without needing to provide any justification.

2.2 Materials and stimuli

2.2.1 Apparatus

For the HMD condition, we used an HTC Vive headset. The spatial resolution was 1080 × 1200 pixels per eye and the refresh rate was 90 Hz. The HMD was connected to a PC running Microsoft Windows 10, with an Intel® Core™ i7-7700HQ CPU with 2 × 3.60 GHz, 2 × 32 GB DDR4 RAM, and a GeForce® GTX 1080. The cross-platform game engine Unity was used to create this experiment and sounds were presented through a Bose OE2i headset.

In order to measure pupillary dilation and fixation time during the experiment, we added the Pupil Labs add-on for eye tracking to our HTC Vive headset. The HTC Vive HMD features an accelerometer, a gyroscope, and a laser position sensor, which allowed us to track head movement and head position (see Supplementary Material for a detailed description of the recording and processing of these data).

2.2.2 Stimuli

Fifty-eight 3D assets split into 12 sets of stimuli were used in this experiment. Six of the 12 sets were displayed in a virtual island environment: 3 sets of high-arousal negative stimuli (2 sets of 5 and 1 set of 4) and 3 sets of neutral stimuli (2 sets of 5 and 1 set of 4). The other 6 sets were presented in a virtual city environment (2 sets of 5 medium-arousal negative stimuli, 2 sets of 5 neutral stimuli, and 2 sets of 5 medium-arousal positive stimuli). Table 1 shows the mean valence and arousal scores for each set. These stimuli were selected from a set of 110 3D assets pre-tested by 31 students (Mage = 24.9, range = 20–28, women N = 15, men N = 16) at Lyon 2 University. During the pre-test, the participants were asked to rate arousal (−4 low arousal to +4 high arousal) and valence (−4 negative to +4 positive) using the self-assessment manikin (SAM test, Bradley and Lang 1994). For the selection of stimuli, we used the following criteria: for valence (mean scores per stimulus: min = −3; max = 2.7), stimuli with a mean score under −0.6 were considered negative, those with a score between −0.6 and +1.3 neutral, and those with a score over +1.3 positive; for arousal (mean scores per stimulus: min = −3.1; max = 2.9), a mean score under −2 was considered low arousal, a score between −2 and 0.3 medium arousal, and a score over 0.3 high arousal. For more details, see Cadet and Chainay (2020). As the same protocol and therefore the same stimuli were used with pre-adolescents as with adults, we planned to evaluate, a posteriori, the valence and arousal of the 3D assets with the pre-adolescents. However, due to the Covid-19 pandemic and the associated health restrictions, we were not able to carry out these evaluations.
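
The selection criteria above amount to two threshold rules, which can be sketched as follows. This is only an illustrative sketch: the function names, the example ratings, and the treatment of scores falling exactly on a threshold (assigned here to the middle category) are assumptions, not details reported for the pre-test.

```python
# Thresholds taken from the selection criteria in the text
# (mean SAM ratings on a -4..+4 scale).
VALENCE_NEGATIVE_BELOW = -0.6
VALENCE_POSITIVE_ABOVE = 1.3
AROUSAL_LOW_BELOW = -2.0
AROUSAL_HIGH_ABOVE = 0.3


def classify_valence(mean_valence: float) -> str:
    """Label a stimulus as negative, neutral, or positive from its mean valence."""
    if mean_valence < VALENCE_NEGATIVE_BELOW:
        return "negative"
    if mean_valence > VALENCE_POSITIVE_ABOVE:
        return "positive"
    # Assumption: scores exactly on a threshold count as neutral.
    return "neutral"


def classify_arousal(mean_arousal: float) -> str:
    """Label a stimulus as low, medium, or high arousal from its mean arousal."""
    if mean_arousal < AROUSAL_LOW_BELOW:
        return "low"
    if mean_arousal > AROUSAL_HIGH_ABOVE:
        return "high"
    # Assumption: scores exactly on a threshold count as medium.
    return "medium"


# Hypothetical (valence, arousal) means, for illustration only; not study data.
example_ratings = {
    "asset_a": (-2.4, 1.1),
    "asset_b": (0.2, -0.5),
    "asset_c": (2.0, 0.0),
}
labels = {
    name: (classify_valence(v), classify_arousal(a))
    for name, (v, a) in example_ratings.items()
}
```

With these hypothetical ratings, `asset_a` would be labeled as a high-arousal negative stimulus, the category used for the Island sets.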

Table 1 Mean score for each set of stimuli present in places 1 to 6 in the Island and the City VE

To manipulate image quality, we created low-quality versions of the fifty-eight 3D assets. We used the Autodesk Maya software to remove between 30 and 50% of the polygons from each 3D asset (while keeping it recognizable), and the poorest texture and shading settings were chosen in the Unity editor for assets and environments (for an example, see Fig. 1).

Fig. 1

Example of neutral stimulus (a pig) as a low- and high-quality 3D asset

2.2.3 Measures

Prior to their participation, all the participants completed the French versions of the Hospital Anxiety and Depression Scale (HADS, Zigmond and Snaith 1983), the Positive Affect Negative Affect Scale, which measures mood (PANAS for adults, Watson et al. 1988; PANAS-C for children, Laurent et al. 1999), and the Immersive Tendency Questionnaire (ITQ, Witmer and Singer 1998). HADS has previously been used with child populations (e.g., Hawley 2003), but ITQ has not. However, we did not find any equivalent of this last questionnaire for children and we therefore reworded it for this group in order to improve understanding. We also asked if they had prior experience with video games and VR using a 5-point scale (from “never” to “every day”).

During the experiment, we used a 9-point illustrated scale to evaluate valence (from “very unpleasant” to “very pleasant”) and arousal (from “weakly arousing” to “very arousing”) (SAM—Self Assessment Manikin test, Bradley and Lang 1994) reported for each visited place. We also used a 7-point scale (from “not at all” to “absolutely”) to rate different aspects of presence: general sense of presence (How present did you feel in the environment?), the sense of presence in terms of the ability to visit the environment (How much did you have the impression that you could visit the environment?), the sense of presence in terms of the ability to touch things (How much did you feel that you could touch things?), the participants' attentional focus (How focused did you feel on the task you were asked to perform?), and their motivation (How motivated did you feel to complete the task you were asked to perform?). We selected these five questions because they corresponded to the subscales usually included in the extended questionnaires measuring presence, but did not include any question that could be linked directly to immersion characteristics.

During the experiment, we also recorded the minimum and maximum head position on the z axis (front-behind), head rotation on the x, y and z axes (respectively, pitch, yaw and roll), pupillary dilation from the first to the fifth second, and fixation time (see Supplementary Material). After each environment (the City or the Island), we also measured memory for the presented stimuli using a free-recall task and the sense of presence by means of a French version of the Independent Television Commission Sense of Presence Inventory (ITC-SOPI, Lessiter et al. 2001), which consists of four factors: sense of physical space, engagement, ecological validity, and negative effects.
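
As a rough illustration of how the head-tracking measures listed above could be summarized for one visited place, consider the sketch below. The actual recording and processing pipeline is described in the Supplementary Material; the `HeadSample` structure, the units, and the plain averaging of angles (which assumes the rotations never wrap around ±180°) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HeadSample:
    """One hypothetical tracking sample."""
    z: float      # front-back head position (assumed unit: metres)
    pitch: float  # rotation about the x axis (assumed unit: degrees)
    yaw: float    # rotation about the y axis
    roll: float   # rotation about the z axis


def summarize_head_samples(samples):
    """Return min/max z position and mean pitch, yaw, and roll."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "z_min": min(s.z for s in samples),
        "z_max": max(s.z for s in samples),
        # Plain means: valid only if angles do not wrap around (assumption).
        "pitch_mean": mean(s.pitch for s in samples),
        "yaw_mean": mean(s.yaw for s in samples),
        "roll_mean": mean(s.roll for s in samples),
    }
```

A small minimum z relative to the starting position could then be read as leaning back and a large maximum z as leaning forward, subject to the caution about approach and avoidance interpretations noted in the Introduction.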

2.2.4 Procedure

The participant’s task was to visit 2 environments: the City and the Island, each containing 6 places to which participants could be teleported. The participants remained sitting during the experiment and no movements other than those that can be performed in the sitting position were possible. Head and hand tracking were ensured by the HMD and the two controllers. The order of the visits to the environments and places was left up to the participants in order to accentuate interaction during the experiment and avoid total passivity. However, the participants were asked to visit both environments and they had to explore the 6 places in the first environment visited before moving on to the second one (see Fig. 2). For information, 63% of the participants chose the City first and 37% the Island. In addition, stimuli were presented one by one in a random order and always in the 180° of space in front of the participants. To ensure that the participants did not miss a stimulus, a spatialized sound appeared before the stimulus and they were asked to point to each stimulus before it disappeared.

Fig. 2

Protocol for the presentation of the two virtual environments (city and island) and the six different places in each environment. At the bottom on the left, the sequence of presentation of each stimulus in one virtual environment or place (e.g., the desert)

After visiting each place, the participants had to answer a number of questions that were displayed on screen and also presented orally. These questions were previously explained one by one in order to ensure that they were understood. First, the participants were asked to rate their self-reported valence and arousal during the visit to the place. They were then asked to rate their sense of presence by means of a 5-question questionnaire (see “Measures” section). The time taken to respond to these five questions and the following oral instructions about how to select the next place was longer than two minutes. This might have prevented emotion carry-over effects between two successive places.

Once the participants had visited all 6 places of the first environment, the HMD was removed, a French version of the ITC-SOPI questionnaire (Lessiter et al. 2001) was administered and the participants were asked to recall all the stimuli that they had seen in the 6 places. Responses were written down by the investigator. If there was an ambiguity about a recalled item, details were asked for at the end of the recall; if these details were insufficient to identify the item, it was counted as a false response. The same procedure was then reproduced for the environment that still had to be visited.

2.3 Data processing and statistical analysis

All information on data processing and recordings is given in the Supplementary Material. Distinct statistical analyses for the Island and City were performed on the correct free recall scores (item memory for virtual stimuli), the mean sense of presence scores (score on the 5 questions for each place), and emotional valence and arousal (SAM test). Each mean was obtained based on the scores for places presenting stimuli of the same emotional valence (Island—negative and neutral; City—positive, negative and neutral). Preliminary analyses were performed to check for sphericity (Mauchly’s test) and homogeneity of variance. If a violation was found, corrected scores (Greenhouse–Geisser) were used. For each measurement, we first performed repeated measures ANOVAs following the factorial design (emotion) × (age) × (quality)—2 × 2 × 2 for the Island and 3 × 2 × 2 for the City—with Emotion (Island—negative vs. neutral; City—positive vs. negative vs. neutral) as within-subject factor and quality of the 3D asset (high quality vs. low quality) and age (Adults vs. Children) as between-subjects factors. If necessary, planned comparisons were performed to allow a better understanding of the significant effects. Bonferroni’s correction was used for multiple comparisons. Additionally, a 2 × 2 × 2 ANOVA (environment) × (quality) × (age) was conducted on the ITC-SOPI scores, with environment (Island vs. City) as within-subject factor and quality (high quality vs low quality) and age (Adults vs Children) as between-subjects factors. Correlation and regression analyses were also performed to determine which of the subjective and objective measurements of emotion and presence could explain memory performance.
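
For readers less familiar with the Bonferroni correction mentioned above, a minimal sketch is given here. It assumes the common formulation in which each raw p value is multiplied by the number of comparisons and capped at 1; the function name and the example p values are hypothetical, and the statistical software used for the reported analyses may implement the correction differently.

```python
def bonferroni_adjust(p_values):
    """Multiply each p value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values from three pairwise comparisons (not study data).
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

In this hypothetical case only the first comparison remains significant at the 0.05 level after correction.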

The results are presented in the following order: memory performance with ANOVA, correlation and regression analyses, and ANOVAs on self-reported measurements of sense of presence and emotion (arousal and valence). The ANOVA for fixation duration is presented in the Supplementary Material.

The analysis concerning adults only has already been published in the Results section of our previous paper (Cadet and Chainay 2020).

3 Results

3.1 Questionnaires completed before the experiment

A one-way ANOVA was performed to compare anxiety (HADS-A), depression (HADS-D), positive and negative affect (PANAS-PA and PANAS-NA, respectively), prior experience with video games and VR, and immersive tendencies (IT) between the four experimental groups. No significant difference was found between age groups for: HADS-A (F(3, 45.3) = 1.716, p = 0.177), PANAS-PA (F(3, 44.2) = 0.176, p = 0.912), PANAS-NA (F(3, 45.6) = 1.183, p = 0.327), IT (F(3, 43.2) = 0.600, p = 0.618). We did, however, observe a significant difference in terms of: depression measured on HADS-D (F(3, 45.5) = 5.857, p = 0.002), with children having a significantly lower HADS-D score than adults; prior exposure to VR (F(3, 39.5) = 24.593, p < 0.001), with children being more exposed to VR than adults; and prior exposure to video games (F(3, 39.5) = 24.593, p < 0.001), with adults being more exposed to video games than children. The children also completed two subtests (Matrix Reasoning and Similarities) of the French version of the Wechsler Intelligence Scale for Children (WISC V). All children were in the normal performance range in both subtests.

3.2 Recall performances

3.2.1 ANOVA

3.2.1.1 Island

A significant effect of emotion (F(1, 83) = 57.45, p < 0.001, partial η2 = 0.41) was observed, with higher recall of the negative stimuli (M = 8.67, SE = 0.21) than the neutral stimuli (M = 6.52, SE = 0.23). The effect of age was also significant (F(1, 83) = 15.49, p < 0.001, partial η2 = 0.16), with higher recall for adults (M = 8.23, SE = 0.25) than for children (M = 6.96, SE = 0.24). There were no other significant effects or interactions.

3.2.1.2 City

A significant effect of emotion (F(2, 166) = 14.41, p < 0.001, partial η2 = 0.15) was observed, with higher recall of negative (M = 5.12, SE = 0.20) and positive stimuli (M = 5.27, SE = 0.18) than neutral stimuli (M = 4.05, SE = 0.21). The effect of age was also significant (F(1, 83) = 31.07, p < 0.001, partial η2 = 0.27), with higher recall for adults (M = 5.44, SE = 0.15) than for children (M = 4.21, SE = 0.16). The effect of quality was also significant (F(1, 83) = 4.06, p = 0.047, partial η2 = 0.05), with higher recall of high-quality stimuli (M = 5.04, SE = 0.16) than of low-quality stimuli (M = 4.60, SE = 0.17). More importantly, the interaction between emotion and age was significant (F(2, 166) = 4.03, p = 0.020, partial η2 = 0.05) (see Fig. 3). The recall of negative (M = 4.85, SE = 0.30) and positive stimuli (M = 4.73, SE = 0.24) was higher than that of neutral stimuli (M = 3.06, SE = 0.20; respectively t(166) = 4.83, p < 0.001 and t(166) = 4.49, p < 0.001) in the children only. Additionally, adults recalled more neutral (M = 5.05, SE = 0.28) and positive stimuli (M = 5.82, SE = 0.25) than children (respectively, M = 3.06, SE = 0.20, t(248) = 5.49, p < 0.001; M = 4.73, SE = 0.24, t(248) = 3.00, p = 0.044). There were no other significant main effects or interactions.
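
The effect sizes reported here can be related to the F statistics through the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to two of the reported emotion effects; the function name is hypothetical and the snippet is offered only as a reading aid, not as the procedure used to produce the reported values.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Emotion effect in the City: F(2, 166) = 14.41
city_emotion = round(partial_eta_squared(14.41, 2, 166), 2)   # 0.15
# Emotion effect in the Island: F(1, 83) = 57.45
island_emotion = round(partial_eta_squared(57.45, 1, 83), 2)  # 0.41
```

Both values match the partial η2 reported for these effects.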

Fig. 3

Mean recall performance score for the city (min = 0, max = 10), depending on age (adults and children) and emotion (negative, neutral, and positive)

3.2.2 Correlation analyses

Correlation analyses were performed between the recall scores and the mean self-reported valence and arousal scores, sense of presence (SoP), minimum and maximum head position on Z axis, mean head rotation on the x, y and z axes (respectively, pitch, yaw and roll), mean pupillary dilation, and mean fixation time. We also performed a correlation analysis between various measurements of emotion and sense of presence (see Supplementary Material).
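
For reference, the Pearson coefficient used in these analyses can be computed as in the minimal sketch below. The function name and the example values are hypothetical, and the sketch returns only r; the p values reported in this section would additionally require a significance test, which is omitted here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical recall and arousal scores (not study data).
recall = [4, 6, 5, 8, 7]
arousal = [2, 5, 4, 8, 6]
r = pearson_r(recall, arousal)
```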

3.2.2.1 Island

For the adults, Pearson’s test showed a significant correlation only between recall performance and the arousal (r = −0.182, p = 0.002) and valence (r = 0.148, p = 0.012) scores. For the children, there was a significant correlation between recall and arousal score (r = 0.211, p = 0.001). No other significant correlations were found for the recall performances.

3.2.2.2 City

Independently of age group, Pearson’s test showed a significant correlation between recall performance and arousal score (r = 0.107, p = 0.015). No significant correlation was found for the adult or child groups analyzed separately.

3.2.3 Regression analysis

First, we report the correlations between the different predictors (means of self-evaluated arousal, valence, and sense of presence (SoP), minimum and maximum head position on the Z axis, pitch, yaw and roll, pupillary dilation from the first to the fifth second, and fixation time), followed by a regression analysis.

3.2.4 Correlations between predictors

3.2.4.1 Island

The analysis performed independently of the group factor did not show any significant correlations between predictors. In the analyses carried out separately for adults and children, Pearson’s test showed several significant correlations between the predictors (see Table 2).

Table 2 Results of the analysis of correlations between the predictors in the Island environment for adults and children

3.2.4.2 City

The analysis performed independently of the group factor did not show any significant correlations between predictors. In the analysis performed separately for adults and children, Pearson’s test showed several significant correlations between predictors (see Table 3).

Table 3 Results of the analysis of correlations between predictors in the city environment for adults and children

3.2.5 Regression analysis

The stepwise regression analysis was performed separately for the Island and City environments. For each environment, we first included the data from all participants and then analyzed the data of each group (adults and children) separately. The percentage of correct recall per VE was entered in the models as a dependent variable and self-reported arousal and valence scores, fixation time, sense of presence, pupillary dilation (from the first to the fifth second), minimum and maximum head position (on Z axis) were entered as predictors.
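To make the logic of a stepwise procedure concrete, the sketch below implements a simple forward selection in plain Python. It is an illustration under stated assumptions and not the procedure used in the study: predictors are added according to a hypothetical minimum gain in R² (`min_gain`), whereas statistical packages typically use p-value entry and removal criteria, and all names and data are invented for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit of y on an intercept plus predictors."""
    n = len(y)
    columns = [[1.0] * n] + [list(col) for col in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(len(columns))) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(candidates, y, min_gain=0.02):
    """Greedily add the predictor that most increases R^2; stop when the gain < min_gain."""
    selected, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {name: r_squared([candidates[s] for s in selected] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2


# Hypothetical data (NOT from the study): two predictors and percentage recall
data = {
    "arousal": [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 1.0, 0.0],
    "presence": [18.0, 22.0, 20.0, 25.0, 21.0, 24.0, 19.0, 23.0],
}
recall_pct = [40.0, 50.0, 45.0, 60.0, 65.0, 70.0, 55.0, 50.0]
print(forward_stepwise(data, recall_pct))
```

A greedy procedure of this kind retains whichever of several correlated predictors happens to explain slightly more variance, which is one reason why the correlations between predictors are reported before the regression results.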

3.2.5.1 Island

In the analysis including all participants, no predictors explained a significant proportion of variance in the recall task. When separate analyses were performed, a significant proportion of the variance in adults was explained by self-reported arousal and the SoP score, whereas for children, it was explained by self-reported arousal and the mean pupillary dilation during the second second (see Table 4).

Table 4 Results of stepwise regressions for all device conditions, adults, and children for the recall task

3.2.5.2 City

In the joint analysis, the minimum head position on the Z axis and the self-reported arousal explained a significant proportion of variance in the recall task (see Table 4). In the separate analysis, no predictors explained a significant proportion of variance in the recall task, either for adults or for children.

3.3 Self-reported sense of presence

3.3.1 SoP—evaluation of places

3.3.1.1 Island

The ANOVA revealed a significant effect of emotion (F(1,83) = 16.51, p < 0.001, partial η2 = 0.17), with higher SoP for the negative (M = 22.47, SE = 0.48) than the neutral places (M = 21.28, SE = 0.54). Effect of age was also significant (F(1, 83) = 9.60, p = 0.003, partial η2 = 0.10), with higher SoP for children (M = 23.38, SE = 0.73) than for adults (M = 20.38, SE = 0.66). No other significant main effects or interactions were observed.

3.3.1.2 City

There were no significant main effects or interactions.

3.3.2 ITC-SOPI—evaluation of environments

Significant effects of environment were found for the spatial presence and engagement factors (respectively: F(1, 83) = 23.03, p < 0.001, partial η2 = 0.22; F(1, 83) = 29.60, p < 0.001, partial η2 = 0.26), with higher scores for the Island than the City. Effect of age was also significant for spatial presence (F(1, 83) = 7.07, p = 0.009, partial η2 = 0.08), engagement (F(1, 83) = 4.02, p = 0.048, partial η2 = 0.05), and ecological validity/naturalness (F(1, 83) = 5.22, p = 0.025, partial η2 = 0.06), with higher scores for children than for adults. A significant effect of quality was found for the negative effects factor (F(1, 83) = 9.85, p = 0.002, partial η2 = 0.11), with higher scores for the low-quality than for the high-quality condition. For ecological validity/naturalness, the interactions between environment and age and between environment and quality were significant, while for negative effects, the interaction between age and quality was also significant (see the Supplementary Material for detailed descriptions of these interactions and planned comparisons).

3.4 Self-reported emotion

3.4.1 Self-reported arousal

3.4.1.1 Island

A significant effect of emotion was observed (F(1, 83) = 105.65, p < 0.001, partial η2 = 0.56), with places with negative stimuli (M = 1.24, SE = 0.16) rated as more arousing than places with neutral ones (M = −0.21, SE = 0.20). A significant effect of age (F(1, 83) = 10.84, p = 0.001, partial η2 = 0.12) was observed, with children (M = 1.06, SE = 0.24) rating places as more arousing than adults (M = −0.02, SE = 0.22). The interactions between emotion and quality (F(1, 83) = 4.84, p = 0.031, partial η2 = 0.06) and between emotion and age (F(1, 83) = 7.43, p = 0.008, partial η2 = 0.08) were significant. More importantly, the interaction between emotion, age and quality was significant (F(1, 83) = 5.22, p = 0.025, partial η2 = 0.06) (see Fig. 4a). To better understand this interaction, the data were analyzed separately for adults and children. For children, a significant effect of emotion was found (F(1, 37) = 77.17, p < 0.001, partial η2 = 0.68), with places with negative stimuli (M = 1.97, SE = 0.25) being rated as more arousing than places with neutral ones (M = 0.13, SE = 0.25). For adults, the interaction between emotion and quality (F(1, 46) = 11.16, p = 0.002, partial η2 = 0.20) was significant. The planned comparisons revealed that, in the case of high-quality but not low-quality 3D assets, places with negative stimuli (M = 1.19, SE = 0.30) were rated as more arousing than places with neutral stimuli (M = −0.50, SE = 0.38, t(46) = 6.34, p < 0.001). There were no other significant main effects or interactions.

Fig. 4 Mean self-reported arousal score for the Island (panel A) and for the City (panel B) (min = −4, weakly arousing; max = 4, very arousing), depending on age (adults and children), emotion (negative and neutral) and 3D asset quality (HQ and LQ). Mean self-reported valence score for the Island (panel C) and the City (panel D) (min = −4, very unpleasant; max = 4, very pleasant), depending on age (adults and children), 3D asset quality (HQ and LQ) and emotion (negative and neutral)

3.4.1.2 City

A significant effect of emotion (F(2, 166) = 29.51, p < 0.001, partial η2 = 0.26) was observed, with places with negative stimuli (M = 0.26, SE = 0.18) being rated as more arousing than places with neutral stimuli (M = −0.95, SE = 0.21, t(166) = 7.62, p < 0.001) and those with positive stimuli (M = −0.21, SE = 0.20, t(166) = 2.99, p = 0.010). Places with positive stimuli were rated as more arousing than places with neutral stimuli (t(166) = 4.64, p < 0.001). The interaction between emotion and age was also significant (F(2, 166) = 4.48, p = 0.013, partial η2 = 0.05) but no significant differences between adults and children were revealed. More importantly, the interaction between emotion, age, and quality was significant (F(2, 166) = 10.12, p < 0.001, partial η2 = 0.11) (see Fig. 4b). To better understand this interaction, the data for adults and children were analyzed separately. For adults, most importantly, the interaction between emotion and quality (F(2, 92) = 9.94, p < 0.001, partial η2 = 0.18) was significant. This was explained by the difference observed on high-quality 3D assets only, for which places with negative (M = 0.21, SE = 0.33, t(92) = 5.96, p < 0.001) and positive stimuli (M = −0.35, SE = 0.36) were rated as more arousing than places with neutral stimuli (M = −1.67, SE = 0.33, t(92) = 4.17, p = 0.001). Moreover, places with neutral stimuli were rated as more arousing in low-quality 3D assets (M = 0.29, SE = 0.43) than in high-quality 3D assets. For children, the effect of emotion was significant (F(2, 74) = 24.68, p < 0.001, partial η2 = 0.40), with places with negative stimuli (M = 0.34, SE = 0.38) being rated as more arousing than places with positive stimuli (M = −0.62, SE = 0.42, t(74) = 4.27, p < 0.001) and places with neutral stimuli (M = −1.21, SE = 0.44, t(74) = 6.97, p < 0.001), and places with positive stimuli being rated as more arousing than places with neutral stimuli (t(74) = 2.70, p = 0.026). 
There were no other significant main effects or interactions.

3.4.2 Self-reported valence

3.4.2.1 Island

A significant effect of emotion (F(1, 83) = 74.37, p < 0.001, partial η2 = 0.47) was observed, with places with negative stimuli being rated as more negative (M = 0.01, SE = 0.18) than places with neutral stimuli (M = 1.43, SE = 0.12). A significant effect of age (F(1, 83) = 4.68, p = 0.033, partial η2 = 0.05) was observed, with children generally rating places as less negative (M = 1.00, SE = 0.19) than adults (M = 0.44, SE = 0.17). The interaction between emotion and quality (F(1, 83) = 11.73, p < 0.001, partial η2 = 0.12) was significant. More importantly, the interaction between emotion, age, and quality was significant (F(1, 83) = 16.29, p < 0.001, partial η2 = 0.16) (see Fig. 4c). To better understand this interaction, the data were analyzed separately for adults and children. A significant effect of emotion was found for children (F(1, 37) = 37.20, p < 0.001, partial η2 = 0.50), with places with neutral stimuli (M = 1.79, SE = 0.34) being rated as less negative than places with negative stimuli (M = 0.20, SE = 0.31). For adults, most importantly, the interaction between emotion and quality was significant (F(1, 46) = 34.56, p < 0.001, partial η2 = 0.43). Planned comparisons revealed that, in the high-quality but not in the low-quality 3D assets condition, places with neutral stimuli (M = 1.49, SE = 0.17) were rated as less negative than places with negative stimuli (M = −1.00, SE = 0.33, t(46) = 8.41, p < 0.001). Moreover, places with negative stimuli were rated as more negative in the high-quality than the low-quality 3D assets condition (M = 0.62, SE = 0.23). The difference between the high- and low-quality 3D assets condition was not significant for neutral stimuli. There were no other significant effects or interactions.

3.4.2.2 City

A significant effect of emotion (F(2, 166) = 45.83, p < 0.001, partial η2 = 0.36) was observed, with places with positive stimuli (M = 1.84, SE = 0.14) being rated as more positive than places with neutral stimuli (M = 1.21, SE = 0.15, t(166) = 3.63, p = 0.001) and those with negative stimuli (M = 0.19, SE = 0.17, t(166) = 9.49, p < 0.001). Places with neutral stimuli were also evaluated as less negative than places with negative stimuli (t(166) = 5.86, p < 0.001). A significant effect of age (F(1, 83) = 20.25, p < 0.001, partial η2 = 0.20) was observed, with children generally rating places as more positive (M = 1.59, SE = 0.17) than adults (M = 0.57, SE = 0.15). Most interestingly, the interaction between emotion, age, and quality was also significant (F(2, 166) = 5.13, p = 0.007, partial η2 = 0.06) (see Fig. 4d). To better understand this interaction, the data were analyzed separately for adults and children. For adults, most importantly, the interaction between emotion and quality (F(2, 92) = 12.0, p < 0.001, partial η2 = 0.21) was significant. This interaction was explained by the fact that in the high-quality condition, the adults rated the places with positive stimuli (M = 2.08, SE = 0.25) as being more positive than those with neutral (M = 0.79, SE = 0.26, t(92) = 3.94, p = 0.002) or negative stimuli (M = −0.52, SE = 0.30, t(92) = 7.94, p < 0.001) but did not do so in the low-quality condition. In the high-quality condition, places with neutral stimuli were also rated as less negative than those with negative stimuli (t(92) = 4.00, p = 0.002). Additionally, places with positive stimuli were rated as more positive under high-quality conditions than they were under low-quality conditions (M = 0.48, SE = 0.25, t(121.4) = 4.20, p < 0.001).
For children, a significant effect of emotion was observed (F(2, 74) = 25.94, p < 0.001, partial η2 = 0.41), with places with neutral stimuli (M = 1.81, SE = 0.23) being rated as less negative than places with negative stimuli (M = 0.58, SE = 0.27, t(74) = 4.77, p < 0.001), and places with positive stimuli being rated (M = 2.40, SE = 0.22) as more positive than places with negative stimuli (t(74) = 7.06, p < 0.001). There were no other significant main effects or interactions.

4 Discussion

The aim of this study was to investigate memory of a virtual experience in children and adults and to better specify the factors underlying memory performance in each population. The practical goal was to explore whether manipulating the quality of 3D assets could protect participants from an overwhelming emotional experience, while preserving their memory performances and sense of presence in the virtual content. To do this, we manipulated factors thought to be responsible for memory performance in VR, such as the quality of the 3D asset (immersion) and the emotional nature of the stimuli presented in virtual places. We then measured the physiological and motor responses of our participants to these stimuli. We also asked them to evaluate their sense of presence, arousal, and valence after having seen different places characterized by negative, positive, or neutral stimuli. In order to interpret the effects of these factors on memory performance, we will first discuss our results on the sense of presence, arousal, and valence evaluations, while also including the results of our physiological and motor measures. Secondly, we will discuss the memory performance results.

4.1 Sense of presence: emotion and age effects vary according to virtual environment

For each of the two environments (Island and City), the participants completed a self-assessed questionnaire to evaluate presence after visiting each of the 6 places. At the end of the visit to each environment, they also gave a global evaluation by means of the ITC-SOPI. We hypothesized that sense of presence would be stronger for the children than the adults, stronger for the high-quality than for the low-quality 3D assets, and stronger for the emotional than the neutral places.

As expected, the children reported a greater sense of presence than the adults when presence was measured globally by means of the ITC-SOPI at the end of the visit to each environment (City and Island). However, the more fine-grained measurements carried out after each place qualify this result, given that an age difference was observed only in the Island environment. Thus, our results partly corroborate certain previous findings. For example, Baumgartner et al. (2006) found that children experienced a greater sense of presence than older participants when they were exposed to a roller-coaster scenario on a computer screen. The authors suggested that young children (mean age 9.2) are more likely to feel present in a virtual environment than older adolescents (mean age 15.8) due to a lack of maturation of the prefrontal lobe, meaning that they do not yet have the ability to control and monitor virtual experiences. To our knowledge, our results show for the first time that the age difference observed by Baumgartner et al. (2006) without an HMD is also found when an HMD is used, and that it extends from a comparison between children and adolescents to a comparison between rather older children (mean age 11.6) and adults (mean age 20.7). Finally, it should be noted that these differences between adults and children might depend on the environment. Mikropoulos and Strouboulis (2004) have suggested that adults and children do not refer to the same criteria to respond to questionnaires about presence. They compared the results they had obtained with children with those obtained by Usoh et al. (1999) with adults and observed that criteria such as input device or previous experience with the media impacted sense of presence in children but not in adults.
Our data may suggest that there are other factors that interact with age and modulate sense of presence differently, for example participants’ familiarity with the environment. In fact, the stimuli presented in the City environment were probably more familiar to children than those presented in the Island environment, potentially explaining why the children felt more present than the adults in the Island but not in the City environment. The less familiar stimuli and environment could have attracted more attention, caused more excitement, and thus increased sense of presence. This suggestion is supported by the fact that, on average, the children evaluated the negative stimuli as more arousing in the Island than in the City environment. However, as we did not measure our participants’ familiarity with either the stimuli or with the environments, this suggestion needs further investigation.

As expected, and regardless of age, sense of presence was higher in places presenting emotional stimuli, but only, and again unexpectedly, in the Island environment. One explanation for this difference between the two environments might be the fact that the negative stimuli presented in the places in the Island environment were, in general, more arousing than the negative and positive stimuli presented in the City. In fact, it has been suggested that arousal is closely related to presence (Lombard and Ditton 1997), and several studies with adults have shown that an increase in sense of presence is correlated with an increase in physiological and subjective arousal (Schneider et al. 2004). Seth et al. (2012) even suggested that emotional content is sufficient to create sense of presence. Our correlation analysis also suggests a strong link between presence and self-evaluated arousal and valence: the participants felt more present in the places that they evaluated as more arousing and valenced (i.e., negatively or positively). To the best of our knowledge, our data provide the first demonstration that the link between presence and emotion is also observed in children. Moreover, in the City, the presented stimuli were mostly manufactured, non-animated, and more familiar than those presented in the Island environment. Motion has been reported to increase arousal (Detenber et al. 1998). As most of the stimuli in the City environment were non-animated, it is possible that they triggered less arousal in general and, consequently, there was less difference in the sense of presence experienced by the participants after viewing places with emotional and moderately arousing stimuli, on the one hand, and those with non-emotional stimuli, on the other. Thus, our results suggest that less arousing, non-animated, and more familiar stimuli generate less presence than more arousing, animated, and less familiar stimuli in both children and adults.

Contrary to our expectations, the quality of the 3D assets had no impact on sense of presence, either in the children or the adults. This result questions the existence of a cast-iron link between immersion and presence (Witmer and Singer 1998) and provides support for the demonstration by Bowman and McMahan (2007) that the level of immersion does not necessarily predict presence. These authors suggested that the same level of immersion could induce different levels of presence depending on the user and on his/her state of mind or recent history. In our study, the quality of the 3D assets was not a relevant factor for creating a sense of presence in a VR experiment with HMD, even when the mood, affective state, and personal tendency to immersion were controlled. It could be interesting for future investigations to explore more systematically the potential, internal, user-specific factors that might impact the link between immersion and presence.

4.2 Arousal and valence measurements: a complex interaction between immersion, emotional nature of the stimuli, and age

We presented negative, positive, and neutral stimuli in different places (one kind in each place) in the Island or the City environments and expected that the participants’ ratings of the places on the self-assessment manikin (Bradley and Lang 1994) would depend on the emotional nature of the stimuli.

As expected, the arousal and valence evaluations were affected by the emotional nature of the stimuli, but also by the age of the participants and the 3D quality of the assets. The children evaluated virtual places as more positive (in both environments) and arousing (only in the Island environment) than the adults. These results are in line with previous findings (Cordon et al. 2013) showing greater sensitivity to arousing content in children. Concerning the evaluation of virtual places as being more positive, this could be due to what Visch et al. (2010) called artefact emotions (i.e., the emotion felt in response to the medium itself, such as fascination or enjoyment). It is possible that the children were more likely to confuse the emotion generated by the content of the fictional world with the artefact emotions created by HMD-based exposure to 3D virtual content. Indeed, they might have rated negative and arousing content in the Island environment, such as a spider, as positive because they enjoyed being afraid of it during the experiment, as they often reported orally to the experimenter. With regard to the evaluation of arousal in the City, it is interesting that there were no significant differences in arousal ratings between the children and adults. Previous findings on non-VR content such as human faces (Vesker et al. 2018) or pictures (Cordon et al. 2013) have shown that children rate positive faces and pictures as more arousing than adults. It is possible that our positive stimuli in the City environment were less suitable for revealing such a difference in children than those used by Cordon et al. (2013).

Concerning the impact of immersion on emotion, the children's valence and arousal evaluations were not modulated by the quality of the 3D assets, unlike those of the adults. Indeed, the most striking result of the emotional evaluation was that, for the adults only, the low-quality 3D assets condition eliminated the differences between the positive, negative, and neutral stimuli in terms of the subjective evaluation of arousal and valence. In this condition, the adults evaluated all the places in the City and Island environments as being similarly arousing and valenced, irrespective of the emotional nature of the stimuli presented in these places. These results suggest that a low immersion condition could in some way have decreased the impact of the intrinsic characteristics of the emotional stimuli on the subjective feeling of arousal and valence induced in the adults by these stimuli. Slater and Wilbur (1997) suggested that the salience of emotional stimuli can be modulated by several parameters, including vividness, which corresponds to visual salience, and that this modulation will have an impact on participants' emotional experience. A study by Visch et al. (2010), for example, supports this suggestion by observing that the judgment of intensity of different emotions experienced while viewing a movie was higher when the movie was viewed in a more immersive (CAVE) than in a less immersive (3D) condition. In our experiment, the vividness of the emotional stimuli was certainly lower in the low-quality 3D assets condition, and this is probably the reason why our young adults did not experience differences in their feelings of arousal and valence while viewing places with negative, positive, and neutral stimuli. However, this was not observed in the children.
As mentioned previously, Baumgartner et al. (2008) examined the difference between high-presence and low-presence situations in terms of the subjective evaluation of immersion, valence and arousal, on the one hand, and brain area activation, on the other, in an fMRI study of children and adults. They observed that higher arousal and valence scores were reported in both the children and adults in the high-presence condition (compared to low-presence condition) whereas, interestingly, rating differences were greater in the adults for arousal and in the children for valence. The authors argued that in children, but not in adults, concurrent activation of areas involved in affective processing (e.g., amygdala and insula) and other areas important for the egocentric processing of the visual environment explains why children are more susceptible to the arousing impact of visual stimuli than adults. It is therefore possible that, as a result of this particular sensitivity to arousing content, the emotional experience was preserved in children even under low immersion conditions.

Taken together, our results suggest that children have higher emotional sensitivity than adults during a VR experience. This draws attention to the fact that, unlike in the case of adults, using poor-quality 3D assets could hardly be considered to be a potential way of manipulating the VR environment in order to protect children against strong emotional experiences in VR.

4.3 Memory of a virtual experience

4.3.1 Impacts of age, immersion, and emotion on memory

As previous studies have demonstrated an emotional enhancement of memory (EEM) in adults and children (Cadet and Chainay 2020; Hamann and Stevens 2014; Kensinger and Schacter 2016; Massol et al. 2020; Stenson et al. 2019), we expected memory performances to be better for emotional stimuli than for neutral ones. We also hypothesized that immersion (in our case, the 3D asset quality) would enhance EEM in a virtual context and that memory performance would correlate with the level of presence.

As predicted, the adults recalled more stimuli than the children, irrespective of environment. These results are in accordance with other studies that have compared memory performance in children and adults by means of recall tasks, although not in VR (e.g., Massol et al. 2020). Recall was also higher for high-quality 3D assets than for low-quality assets, but only in the City environment, thus partly confirming our predictions. The results observed in the City environment confirm what was reported by Wallet et al. (2011): participants who had encoded a route in a city under the high-quality image condition performed better in memory tasks concerning this route. These authors pointed out that the impact of visual quality on memory depends on the information needed to complete the memory task. It is thus possible that visual details were more important for the recall of City stimuli than of those on the Island. These discrepant results are, to some extent, consistent with the literature given that a link between visual fidelity and memory has not always been found (e.g., Mania et al. 2005). Moreover, as explained above, the stimuli in the City were manufactured and non-animated, unlike the case of the Island where the stimuli consisted mostly of wild animals. It is possible that the encoding of visual details is more important for the retrieval process in the case of manufactured and non-animated stimuli than for animated stimuli. Thus, the nature of the stimuli in VR could modulate the impact of visual quality on memory performance. Our discrepant results in two VR environments make it clear that the results cannot simply be generalized to all VR contexts and that precautions must be taken concerning the manipulation of factors such as visual quality when creating virtual experiences, as it is possible that they may impact the participant's memory of the experience.

More importantly, EEM was observed, with higher recall scores being obtained for emotional stimuli (negative and positive) than for neutral ones in both the Island and City environments and for both adults and children. We did not, however, observe any modulation of EEM by the quality of the 3D assets. It is possible that this absence of significant modulation was due to the fact that the manipulation was not sufficiently salient to impact the cognitive processes involved in EEM, such as the attention engaged during the processing of the stimuli. Attention is thought to play an important role in EEM (Talmi and McGarry 2012) and also in sense of presence in VR, as suggested by Witmer and Singer (1998) and Schuemie et al. (2001), in the sense that a participant who allocates more attention is more involved in VR and is thus more present.

Subsequent analyses showed that EEM was more constant in the children than in the adults, since it was present in both environments for the former and only in the Island environment for the latter. In accordance with the suggestion formulated by Baumgartner et al. (2008), it is possible that the children in our study were more sensitive to emotional arousal than the adults and that the moderate arousal caused by the emotional stimuli in the City environment was therefore sufficiently intense to enhance memory in children but not in adults, who need more intense stimuli before feeling aroused. This proposal is supported by the results concerning the subjective evaluation of arousal, which was higher in the children than the adults, although it must be noted that the evaluation was performed globally for each place and not for individual items. The crucial role of arousal in EEM has been widely accepted in the literature on adult populations (Kensinger and Corkin 2004; Kensinger and Schacter 2016). Our study provides some evidence concerning the possibility that EEM may emerge in children under some circumstances, and even for stimuli that are only very moderately arousing. According to the attention mediation theory (Talmi et al. 2008), EEM could be due to the automatic capture of attention by emotional stimuli. Our fixation time results for the Island (see Supplementary Material) showed that the children visually explored the neutral stimuli for longer periods than the negative ones. The adults looked equally long at the neutral and negative stimuli, and they looked longer at the negative stimuli than the children did. The neutral stimuli were equally explored by both the children and adults. This suggests, first, that the adults’ attention was not captured more by the negative than the neutral stimuli and, second, that the children avoided looking at the negative stimuli. Despite these results, an EEM was found independently of age.
These results suggest that, irrespective of the time spent looking at the stimuli, emotional stimuli were still better remembered than neutral ones. It seems possible that the emotional reaction to the stimuli and its impact on memory did not require a very long exploration time. Indeed, Detenber et al. (1998) observed physiological responses to emotional pictures between 0.5 and 4 s after stimulus onset. Moreover, Sharot and Phelps (2004) showed an EEM for an arousing word even if presented in peripheral vision. Our negative stimuli might therefore have continued to arouse our participants even if they were not present in the center of the visual field. To conclude, our results suggest that enhancement of memory for negative stimuli may occur even after a short exploration of these stimuli.

4.3.2 What might predict memory in VR?

To examine which factors might predict memory performance in VR, we conducted a regression analysis using self-reported scores for valence, arousal, and sense of presence as well as physiological and behavioral measures such as pupillary dilation, fixation time for the stimuli, minimum and maximum head position toward the stimulus, and standard deviations of pitch, yaw, and roll. In line with previous studies (Baños et al. 2004; Makowski et al. 2017; Riva et al. 2007; Västfjäll 2003), our results showed that arousal, valence, and sense of presence were strongly correlated with each other for both the adults and children. Similarly, we found that pupillary dilation, minimum and maximum head position, and standard deviations of pitch, yaw, and roll were correlated with each other. Because correlated predictors make it difficult to isolate the unique contribution of each variable, our regression analyses should be interpreted with caution.
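The concern about correlated predictors can be illustrated with a variance inflation factor (VIF) check. The following is a minimal sketch on synthetic data, not the analysis performed in this study: the function name, the variable names, the simulated values, and the use of VIF itself are assumptions made for illustration only.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are observations).

    Each predictor is regressed on all the others (plus an intercept);
    VIF = 1 / (1 - R^2) of that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic illustration (hypothetical values, not the study's data)
rng = np.random.default_rng(0)
n = 200
arousal = rng.normal(size=n)
presence = 0.9 * arousal + 0.3 * rng.normal(size=n)  # strongly tied to arousal
fixation = rng.normal(size=n)                        # unrelated to the others
X = np.column_stack([arousal, presence, fixation])
vifs = variance_inflation_factors(X)
print(dict(zip(["arousal", "presence", "fixation"], vifs.round(2))))
```

In this simulated case, the two correlated predictors receive high VIF values while the independent one stays close to 1, which is the pattern that would make individual regression coefficients hard to interpret.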

Overall, the regression analysis showed that emotional arousal (self-evaluated or measured by pupillary dilation) was the best predictor of memory performance in VR. This result is consistent with previous findings showing a strong link between memory and arousal (Cahill and McGaugh 1998; Dolcos and Cabeza 2002; Roozendaal and McGaugh 2011). For the adults, sense of presence was also a predictor of memory performance in the Island environment, in addition to self-evaluated arousal. Our results with adults are consistent with the view that a link between presence and memory performance exists but is not always found (for a review, see Smith 2019), since we did not observe it systematically. Makowski et al. (2017), for example, also obtained results suggesting that the relationship between memory performance and sense of presence is complex: they observed that factual memory, though not temporal order memory, correlated positively with sense of presence in adults as rated after exposure to a 2D or 3D movie in a theater. Unlike these studies, our experiment made use of an HMD, which is known to create greater visual immersion than a projection screen (Bowman and McMahan 2007). Thus, one explanation for the fact that presence was not always a predictor of memory in our study could be that the link between presence and memory does not remain stable across different levels of immersion.

For the children, the main predictor of memory performance was arousal (physiological and self-assessed). This result seems consistent with the findings of Quas and Lench (2007) on the relationship between arousal at encoding and children’s memory performance one week after having watched video clips eliciting fear. The authors found that a higher heart rate at encoding (corresponding to higher arousal) was related to fewer incorrect responses to direct questions about video details. The authors suggested that the video eliciting fear enhanced attention and led to more efficient encoding of details and consequently to better memory performance. Interestingly, Quas et al. (2004) found that after a shorter delay between encoding and retrieval, more similar to the interval used in our study, high arousal had a negative effect on memory. They suggested that the children in their study were similarly aroused during retrieval and encoding, thus leading to a reduced ability to focus adequately. In our study, it is possible that performing the recall task without the HMD, thus in a different context from encoding, prevented the children from being as strongly aroused as they were during encoding.

In summary, variations in memory performance in our study were explained by arousal, in combination with presence in the case of the adults, but by arousal alone for the children. In order to clarify these results, further research is necessary to investigate the respective roles of presence and emotional arousal in memory performance in children.

5 Limitations

The most important limitation of this study is that our 3D assets were pretested for valence and arousal only by young adults. The reason is that our experiments were first designed for an adult population and the stimuli were pretested for this population (see Cadet and Chainay 2020). We planned to conduct an a posteriori evaluation of the assets with pre-adolescents, but the COVID-19 pandemic and the associated health restrictions prevented us from doing so. Thus, our results and their interpretation should be treated with caution. However, many studies in the literature on the effects of emotion on memory in children have also used stimuli pretested only by adults and have then conducted an a posteriori evaluation of these stimuli. Such evaluations have shown that children’s ratings are very similar to those produced by adults (e.g., Cadet and Chainay 2021; Davidson et al. 2006; Hajcak et al. 2009). Thus, we believe it likely that the assessments of pre-adolescents would not be fundamentally different from those of young adults.

Another limitation is that the order of the visits to the environments and places was left up to the participants in order to counterbalance order effects between participants. Thus, at the individual level, it is possible that recency and primacy effects in memory affected our data. It is also possible that the subjective experience of the first environment influenced that of the second, in that it might have induced expectations or anxiety about the content of the virtual environments. A further limitation is that we did not measure the participants’ familiarity with the virtual content, and this factor may therefore have influenced the evaluation of the sense of presence. Finally, the children had significantly lower HADS-D scores than the adults. Depression may impact emotion regulation (Joormann and Gotlib 2010) and thus, in our experiment, the self-evaluations of emotions. However, a supplementary ANOVA including the HADS-D scores as a covariate (see Supplementary Material) did not show any interaction between this factor and the other factors in the analysis and did not change the significance of the results. Thus, in our study, the difference between the adults’ and children’s self-evaluations of emotion does not seem to be related to the level of depression.

6 Conclusion

Concerning memory, our study demonstrated the presence of EEM in the context of VR. We also observed that memory performance in VR appears to be explained by self-evaluated arousal and sense of presence in adults but only by self-evaluated arousal in children. Concerning the arousal, valence, and presence ratings: unlike in the adults, the quality of the 3D assets had no impact on emotional evaluation in the children; sense of presence was stronger for the children than the adults and was correlated with subjective emotional experience for both the adults and children.

More globally, our findings show that children are more likely to feel aroused and present in virtual environments than adults and that, unlike in the case of adults, lower visual fidelity does not protect children from the emotional content. These results should be considered with regard to future VR applications targeting children in the educational or recreational fields.