The study’s primary aims were to use VR to induce and socially modulate mimicry in neurotypical participants and to explore any differences in ASD. Participants mimicked the kinematics of the avatars’ movements despite being told only to copy the goal of the observed action. Autistic participants tended to mimic but did so to a lesser extent. In neither group, however, was mimicry modulated by the social engagement of the avatar. Possible reasons for this are discussed in further detail below.
A Novel Paradigm for Inducing Mimicry in VR
The results demonstrate that VR avatars can be used to induce mimicry in both neurotypical and autistic participants. Despite being told to point to the same targets the avatar pointed to, participants were nevertheless sensitive to the kinematics of the observed action, rather than just the action goal. For example, on trials where the avatar moved with a high trajectory between the targets, participants also tended to move with a higher trajectory compared to trials where the avatar moved with a low trajectory. This supports previous kinematics studies, such as that by Wild et al. (
2012), in which participants copied the vertical and horizontal amplitude of observed actions despite being given goal-orientated instructions. Previous studies investigating mimicry within a VR setting had only explored reaction time measures of mimicry, such as a stimulus response compatibility paradigm (Pan and Hamilton
2015). The present study extends this work by demonstrating that participants mimic the kinematics of avatars’ movements. More generally, the present study adds to the growing number of studies which highlight the feasibility of VR in the ecologically valid study of human social interaction (Bohil et al.
2011; Georgescu et al.
2014). Our VR paradigm also has the potential to be used in combination with neuroimaging methods, such as functional near infrared spectroscopy, to elucidate the neural underpinnings of mimicry and how these might be different in ASD.
Reduced Mimicry in ASD
Both the neurotypical and ASD groups mimicked the avatars' movements, yet autistic participants did so to a lesser extent. This supports previous work demonstrating that autistic individuals can, and under certain conditions do, spontaneously mimic (Cook and Bird
2012; Grecucci et al.
2013) but there is a reduced propensity to do so (Edwards
2014). Most studies demonstrating a reduced propensity to mimic in ASD investigated children (e.g. Jiménez et al.
2014), and those conducted with adolescents or adults have focused on facial mimicry (Hertzig et al.
1989; McIntosh et al.
2006). Thus, the current study extends this work by showing that this reduced propensity to mimic in ASD continues into adulthood, is not restricted to spontaneous facial mimicry, and, most interestingly, occurs in a VR environment. Importantly, the groups did not differ in terms of their ability to copy the goal of the action (i.e. emulation) as there were no significant differences between the groups in the proportion of trials in which participants pointed to the incorrect targets. Again, this finding is supported by previous work showing intact emulation in ASD (Edwards
2014). Together, these findings support Hamilton’s (
2008) proposal of intact emulation yet differences in mimicry in ASD. Finally, the finding that mimicry differences in ASD occur when interacting with VR avatars has important practical and clinical implications for VR training programmes, and, potentially, VR diagnostic tools (Scassellati
2007). It suggests that the behaviours autistic individuals display in everyday life also occur when interacting with and responding to VR avatars, although limitations of our current VR approach are discussed below.
Unmodulated Mimicry: Co-Presence and Social Cues
Mimicry was not modulated by how socially engaged the avatar was in either neurotypical or autistic participants. This is at odds with STORM and a series of previous studies which demonstrated that social cues, such as eye-contact (Forbes et al.
2016), pro-social priming (Cook and Bird
2012) and emotional facial expressions (Grecucci et al.
2013), modulate mimicry in neurotypical participants; yet, this modulation is reduced in ASD. There are several possible reasons as to why the social manipulation did not modulate mimicry in the current study. Wang and Hamilton (
2012) proposed that the effect of eye-contact on mimicry is mediated by an audience effect, whereby the enhancement occurs when participants feel the observer is maintaining social engagement with them throughout the response period. In the current study, the socially engaged avatar gave participants eye-contact throughout their response period, so it is unclear why mimicry was not enhanced. One possible reason could be the lack of co-presence with the VR avatars; mean co-presence scores were low. If participants felt the avatars were unrealistic, this may have nullified the impact of any social manipulation and caused the low co-presence scores. The avatars' hand movements were motion captured and thus based on those of a human. This may account for the reliable mimicry effect, as participants are likely to have regarded these movements as realistic. However, the avatars' head movements and facial expressions, such as the socially engaged avatar's smile, were key-frame animated. Although participants' qualitative experiences of the avatars were not collected in the current study, in previous VR studies participants have reported that the avatars "were slightly robotic without facial expression which lessened impact" (Pan et al.
2016, p. 11). Moreover, Moser et al. (
2007) have highlighted differences in neural activation, such as reduced activation of the fusiform gyrus, when viewing an avatar with emotional facial expressions compared to a human face displaying the same expressions. Thus, the present limitations of the VR, especially with regard to realistic facial expression, may have accounted for the lack of co-presence and the lack of social modulation in the present study.
The 2D nature of our VR environment may also have contributed to the low co-presence scores. Although the physical world of the participant continued into the virtual world on the screen in front of them, there was a tangible divide between the physical world of the participant and the virtual world of the avatar. Schultze (
2010, p. 439) has highlighted how "one key-determinant of co-presence is … to jointly manipulate shared space and shared objects." Therefore, the current paradigm may benefit from being implemented in a fully immersive VR setting, for example using a head-mounted display (HMD), such as the Oculus Rift or HTC Vive. This would allow participants to be embodied (i.e. have their own avatar) and to share the virtual space with the avatar; for example, both avatar and participant could point to the same virtual targets. However, studies using such an approach typically position the virtual targets in mid-air without a table, and the kinematics of movements to such targets might differ. Implementing our paradigm safely and effectively using an HMD with a physical table is technically challenging: a failure to embody participants accurately within a fully immersive HMD risks participants injuring their fingers on the table in front of them when pointing to the targets.
There was some level of interaction between the avatar and participant in the current study. For example, the avatar did not start her turn until the participant had returned to the resting pad, and, after the engaged avatar had finished her turn, she oriented to a motion tracker attached to each participant's forehead, thereby giving a sense of eye contact. Despite these advantages over simple video stimuli, participants were still watching animations on a screen in front of them. Reader and Holmes (
2015) directly compared real life and video stimuli during an imitation task and found reduced object-directed imitation accuracy with the use of video stimuli. Furthermore, reduced activation of human motor cortex has been found when observing motor acts in videos compared to live movements (Järveläinen et al.
2001). Again, the use of a fully immersive, 3D environment, or, the use of real-life interaction partners may result in the social modulation of mimicry within the current paradigm.
Unmodulated Mimicry: Timing and Task Demands
In studies investigating social modulators of mimicry within a stimulus–response compatibility paradigm, there is usually a small time window between the social manipulation, the observed action and the subsequent response. For example, in Forbes et al. (
2016) the delay between the social manipulation and observed action was either 200 or 800 ms. Participants were then required to respond as soon as they saw the actor’s hand move in the video. Similarly, in Grecucci et al. (
2013) the facial expression was presented for 500 ms, participants then observed the moving hand for 1105 ms before being required to respond. Finally, in Pan and Hamilton (
2015; Experiment 2) the interaction between form (avatar vs. ball) and congruency (i.e. mimicry) was only found on reaction times to tap the first, but not the last, drum in the sequence. Together these studies support the view that for certain social manipulations the delay between action observation and performance needs to be minimised in order for the social manipulation to modulate mimicry. Future studies investigating social modulators of mimicry within the present paradigm may benefit from comparing the kinematics of movements to the first target.
The relatively high task demands in the current study may also have contributed to the lack of social modulation. Error rates in stimulus–response compatibility paradigms are typically less than 0.1 % (e.g. Wang et al.
2010; Bird et al.
2007). In Pan and Hamilton’s (
2015) task mean error rates were between 1.2 and 1.5 %. In the present study the error rate was approximately double this for the neurotypical participants (2.6 %). The lower error rate in Pan and Hamilton (
2015) is likely due to the lower memory demands of their task: the required drum sequence was displayed on a virtual tablet in front of the avatar, whereas in the current study participants had to memorise the correct three-target sequence. Thus, the higher task demands in the present study may have nullified any potential social modulation of mimicry. Finally, it is also possible that lower task demands would enhance mimicry, as this could increase participants' ability to process the motion of the avatar's movements (Rees et al.
1997). Future studies could reduce the task demands by having participants point to fewer targets.