Introduction
Successful interaction with other people can critically rely on vocal information. The voice conveys the speech message, provides information about who a person is (voice identity) and also about the speaker’s emotional state (for review see Belin et al.
2004). Expressing emotions by voice is an evolutionarily conserved process (Darwin
2009; Talkington et al.
2013; Vettin and Todt
2005) and the correct interpretation of emotional calls from conspecifics can be critical for survival (Manser
2001; Ordonez-Gomez et al.
2015; Seyfarth et al.
1980). The perception of vocal emotion (i.e. the emotional information conveyed in a speaker’s voice) in humans relies on the analysis of specific acoustic features of the voice, such as the fundamental frequency (F0; i.e. the lowest frequency within the speech signal) or sound intensity (Fairbanks and Pronovost
1938; Gold et al.
2012; Quam and Swingley
2012). The fundamental frequency is perceived as vocal pitch (i.e. the perceptual correlate of F0), and sound intensity is perceived as loudness.
There is evidence that people with autism spectrum disorder (ASD) have difficulties in recognising emotions and mental states from vocal speech (Globerson et al.
2015; Golan et al.
2007; Philip et al.
2010; Rosenblau et al.
2017; Rutherford et al.
2002; for review see Lartseva et al.
2015; but see Jones et al.
2011; Xavier et al.
2015).
It is currently unclear why people with ASD have difficulties with recognising vocal emotion. On the one hand, the difficulties might be based on a perceptual processing deficit, i.e. a deficit in perceiving voice acoustic features, such as impaired pitch perception. On the other hand, the difficulties might be due to higher-level social cognition difficulties. This latter view was supported by a recent study (Globerson et al.
2015), which is to our knowledge the only previous study that investigated the relation between abilities for perception of acoustic features and for vocal emotion recognition in people with ASD. The authors found that vocal emotion recognition was impaired in adults with high-functioning ASD, but that pitch discrimination (i.e. the ability to detect differences in pitch) for sounds was intact and positively correlated with vocal emotion recognition abilities. The authors concluded that the vocal emotion recognition deficit in people with ASD is associated with higher-level cross-modal emotion difficulties and difficulties of social cognition and that auditory perceptual abilities help to compensate for these higher-level emotion recognition difficulties (Globerson et al.
2015). However, in that study pitch discrimination was tested with non-vocal sounds (i.e. pure sine-wave tones), and there is recent evidence that adults with high-functioning ASD have deficits in pitch discrimination for vocal sounds (i.e. speech including vowels and words) rather than for non-vocal sounds (Jiang et al. 2015; Schelinski et al. 2017). This finding reopens the possibility that the difficulties with vocal emotion recognition in people with ASD are based on perceptual difficulties. We here hypothesised a relation between vocal emotion processing and pitch discrimination in vocal sounds. Such a finding would be in line with the view that altered sensory processing in people with ASD might contribute critically to non-social (e.g. Pellicano
2013) as well as social symptoms associated with ASD (e.g. Baum et al.
2015; Robertson and Baron-Cohen
2017). Although sensory dysfunctions are now also an integral part of the DSM-5, sensory contributions to ASD symptomatology and impairments in higher social cognition have been poorly characterised, and research has often focused on hyper- and hypo-sensory processing, which usually refers to an enhanced ability to perceive sensory stimuli or an absent or reduced response to sensory input (for reviews see e.g. Pellicano
2013; Robertson and Baron-Cohen
2017).
To test our hypothesis, we investigated vocal emotion recognition and vocal and non-vocal pitch perception in a group of adults with high-functioning ASD and a matched comparison group of typically developing participants. We additionally included a test on vocal timbre discrimination to investigate whether vocal emotion processing would be related more specifically to the perception of vocal pitch or more generally to the perception of voice acoustic features, such as vocal timbre (i.e. the property that distinguishes two sounds of identical pitch, intensity, duration and location; see e.g. Griffiths and Warren
2004).
Difficulties in emotion recognition are associated with reduced social functioning (e.g. Couture et al.
2006; Garcia-Villamisar et al.
2010). For example, difficulties in emotion recognition have been associated with lower social adaptive behaviour in people with ASD (Garcia-Villamisar et al.
2010). Investigating vocal emotion recognition in people with ASD is important because it will enhance the understanding of the underlying mechanisms of difficulties in socially relevant auditory processing. This better understanding might contribute to the identification of diagnostically relevant features as well as to more informed counselling and therapy strategies for emotion recognition difficulties in people with ASD.
Discussion
Our study confirmed the hypothesis of a relation between vocal emotion processing abilities and pitch discrimination abilities in vocal sounds. There were three key findings. First, vocal emotion recognition abilities correlated with vocal pitch perception abilities in adults with typical development. There was no such significant correlation in adults with high-functioning ASD. However, the correlation coefficients did not differ significantly between the two groups. Second, the ASD group performed worse than the comparison group in tests on vocal emotion recognition and on vocal pitch perception. There were no significant group differences in non-vocal pitch perception as assessed by the MBEA (Peretz et al. 2003, 2008) and no significant group differences in vocal timbre perception. Third, lower vocal emotion recognition abilities were associated with higher levels of autism-spectrum-related traits in people with typical development and showed a trend towards an association with higher symptom severity in people with ASD.
Our findings are in line with the view that sensory processing differences in people with ASD might be critically contributing to difficulties in social functioning (Baum et al.
2015; Dakin and Frith
2005; Happe and Frith
2006; Pellicano and Burr
2012; Robertson and Baron-Cohen
2017). Differences in sensory processing, such as hypo- and hyper-sensitivity to sensory input, are part of the core symptoms of ASD (APA
2013). Previous studies mainly focused on hyper- and hypo-sensory processing, which usually refers to an enhanced ability to perceive sensory stimuli or an absent or reduced response to sensory input (for reviews see e.g. Pellicano
2013; Robertson and Baron-Cohen
2017). Other sensory processing difficulties might also be fundamentally contributing to difficulties in higher-level social cognition (for review see Baum et al.
2015). For example, previous behavioural and neuroimaging results on voice identity processing in people with ASD converge on the view that difficulties in perceiving and processing acoustic voice features might at least partly explain difficulties in voice identity perception (Schelinski et al.
2016,
2017). Our current results now provide first indications that the vocal emotion recognition difficulties of people with ASD might also be at least partly perceptual in nature. This is a novel view on the difficulties people with ASD have with vocal emotion recognition, as previous studies tended to focus on a dysfunction at a higher cognitive level (Globerson et al.
2015; Golan et al.
2007; Philip et al.
2010; Rutherford et al.
2002).
Our findings are in agreement with a previous study (Globerson et al.
2015) in that we found no significant group differences in non-vocal pitch perception abilities together with impaired vocal emotion recognition abilities in people with ASD. Critically, however, vocal pitch perception impairments were present together with vocal emotion processing difficulties in people with ASD. We speculate that people with typical development use vocal pitch information to perform vocal emotion recognition tests and that this is reflected in the correlation between vocal pitch processing and vocal emotion recognition abilities in the comparison group. That there was no such significant correlation in the ASD group might indicate that vocal pitch information is not available for recognition of vocal emotion to the same extent. As the strength of the correlation between vocal pitch perception and vocal emotion recognition abilities did not differ significantly between the groups, this assumption remains speculative and needs to be validated in larger samples. However, our findings are important because they complement previous studies by providing evidence that difficulties in vocal emotion recognition in people with ASD might be due to impairments on the perceptual level and not only due to modality-independent social cognitive impairments as suggested previously (Globerson et al.
2015).
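The comparison of correlation strength between two independent groups discussed above can be illustrated with Fisher's r-to-z transformation. This is a minimal sketch of one common approach, not necessarily the test used in this study; the function name `fisher_z_compare` and all correlation values and sample sizes below are hypothetical and chosen only to show how sample size affects the outcome.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided test for the difference between two independent Pearson correlations.

    Returns the z statistic and the two-sided p-value under the standard normal.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 observations")
    # Fisher transformation: atanh(r) is approximately normal with variance 1 / (n - 3)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical values: the same pair of correlations in small and in large groups
z_small, p_small = fisher_z_compare(0.6, 16, 0.2, 16)
z_large, p_large = fisher_z_compare(0.6, 100, 0.2, 100)
print(f"small groups: z = {z_small:.2f}, p = {p_small:.3f}")
print(f"large groups: z = {z_large:.2f}, p = {p_large:.4f}")
```

With these illustrative numbers, a correlation that is sizeable in one group and weak in the other does not yield a significant difference in small groups, whereas the same coefficients would in larger groups. This is why a non-significant group difference in correlation strength is best followed up in larger samples.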
A previous study has indicated that people with ASD might use non-vocal pitch processing abilities as a compensatory mechanism to perform vocal emotion recognition (Globerson et al.
2015). The absence of significant group differences in a standard test on non-vocal pitch perception abilities in our study is in agreement with such a suggestion. We did not, however, find a correlation between non-vocal pitch perception and vocal emotion recognition. This difference between the study by Globerson et al. (2015) and our study might be explained by the use of different procedures to assess non-vocal pitch perception, i.e. an adaptive tracking procedure to determine individual thresholds in non-vocal pitch perception (Globerson et al. 2015) in contrast to recognition accuracy in a fixed set of stimuli in our study. Using an adaptive tracking procedure likely provides more sensitive results.
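The contrast between an adaptive tracking procedure and accuracy on a fixed stimulus set can be sketched as follows. This is an illustrative simulation only: the two-down/one-up rule, the step size, the semitone unit, and the idealised listener who answers correctly exactly when the pitch difference reaches their threshold are all assumptions made for the example, not the procedures or parameters of either study.

```python
def staircase_threshold(true_threshold, start=4.0, step=0.25,
                        n_reversals=8, max_trials=500):
    """Estimate a discrimination threshold with a two-down/one-up staircase.

    The simulated listener is idealised (an assumption): a trial is correct
    whenever the pitch difference `delta` is at least `true_threshold`.
    """
    delta = start
    streak = 0        # consecutive correct responses at the current level
    direction = 0     # -1 while stepping down, +1 while stepping up
    reversals = []
    for _ in range(max_trials):
        correct = delta >= true_threshold
        if correct:
            streak += 1
            if streak == 2:                 # two correct in a row: make it harder
                streak = 0
                if direction == 1:
                    reversals.append(delta)
                direction = -1
                delta = max(delta - step, step)
        else:                               # one error: make it easier
            streak = 0
            if direction == -1:
                reversals.append(delta)
            direction = 1
            delta += step
        if len(reversals) == n_reversals:
            break
    if not reversals:                       # threshold lies at or below the floor
        return delta
    return sum(reversals) / len(reversals)


def fixed_set_accuracy(true_threshold, differences):
    """Expected proportion correct on a fixed set; chance (0.5) below threshold."""
    scores = [1.0 if d >= true_threshold else 0.5 for d in differences]
    return sum(scores) / len(scores)


# Hypothetical listeners with thresholds of 0.5 and 1.0 semitones
FIXED_SET = (1.0, 2.0, 3.0)   # assumed fixed differences of at least one semitone
for threshold in (0.5, 1.0):
    print(threshold,
          staircase_threshold(threshold),
          fixed_set_accuracy(threshold, FIXED_SET))
```

In this sketch both hypothetical listeners reach ceiling on the fixed set, because every fixed difference is at or above their threshold, while the staircase converges on different values for each of them. Individual differences of this kind are what a correlation with vocal emotion recognition would need in order to emerge.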
A prominent view on auditory processing in people with ASD suggests that difficulties in acoustic processing are more present for vocal stimuli (i.e. speech) as compared to non-vocal stimuli (i.e. non-speech) (e.g. see O’Connor
2012). In line with this assumption, our ASD group had difficulties in vocal emotion and vocal pitch perception, whereas the perception of non-vocal pitch (i.e. musical pitch assessed by the MBEA) was not significantly different between the groups. However, there are previous study results from adults with high-functioning ASD which contrast with this assumption by showing: (i) Impairments in voice identity recognition that are dissociable from intact speech recognition abilities (Schelinski et al.
2016,
2014); (ii) Typical brain response to vocal sounds as compared to non-vocal sounds in voice-sensitive brain regions (Schelinski et al.
2016); and (iii) Intact vocal timbre perception (Bonnel et al.
2010) that is dissociable from difficulties in vocal pitch perception (see Table
4 and Schelinski et al.
2017). These results suggest that voice processing difficulties in people with high-functioning ASD do not cover all aspects of voice processing; they affect vocal pitch, vocal emotion and voice identity processing, but not to the same extent vocal timbre processing and vocal speech perception.
Previous studies showed that the expression (e.g. Nadig and Shaw
2012; for review see Fusaroli et al.
2017) and the perception of pitch can be altered in people with ASD (for review e.g. see O’Connor
2012). The ASD group showed significantly less accurate perception of vocal pitch than the comparison group whereas there were no significant group differences in non-vocal pitch perception (also see Schelinski et al.
2017). Our results on pitch perception are in line with previous evidence that non-vocal pitch perception (i.e. for pure and complex tones) is on the neurotypical level or even enhanced in people with ASD (e.g. Bonnel et al.
2003; Foxton et al.
2003; Globerson et al.
2015; Jones et al.
2009). With regard to vocal pitch perception previous results are less consistent (see e.g. Jarvinen-Pasley and Heaton
2007; Jiang et al.
2015). There are several factors that could explain the discrepancy between the findings, such as differences in the sample characteristics (e.g. differences in age or type of ASD diagnosis) and task design (e.g. differences in task difficulty and instruction or in the size of the pitch differences). Typical or even enhanced pitch processing in people with ASD has been related to a processing style which is characterised by enhanced or detailed perception of low-level perceptual information (enhanced perceptual functioning theory; Mottron et al.
2006) that can be associated with a weak ability to integrate elements into a coherent percept (weak central coherence theory; Happe and Frith
2006; for review see Haesen et al.
2011). While our results on vocal perception are difficult to explain by enhanced perception of low-level information, they are in line with the latter view and previous findings on voice identity perception (Schelinski et al.
2016,
2017) suggesting that difficulties in voice perception in people with high-functioning ASD might be related to difficulties in analysing and integrating complex acoustic voice features into a coherent voice percept.
Our results are in line with studies showing that in people with typical development vocal pitch information is essential for differentiating and recognising vocal emotion (e.g. Fairbanks and Pronovost
1938; Gold et al.
2012; Quam and Swingley
2012; Scherer et al.
1991). In the majority of these studies, the importance of vocal pitch in processing vocal emotion was shown by investigating how the perception of different emotions is influenced by different pitch characteristics of the vocal emotion stimulus material used in these studies. Here, we used an additional test on vocal pitch perception with independent stimulus material and provide first evidence that in people with typical development the ability to recognise vocal emotion is directly associated with the ability to perceive vocal pitch.
Previous studies showed that vocal emotion recognition difficulties are correlated with higher extents of autism spectrum traits as assessed by the AQ across people with typical development and people with ASD (Golan et al.
2006,
2007). However, it remained unclear whether such an association also holds when considering both groups separately. The present results indicated that vocal emotion recognition abilities were associated with AQ scores only within the comparison group. In line with previous study results (Rosenblau et al.
2017) within the ASD group, our results indicated a trend that vocal emotion recognition abilities were associated with symptom severity as assessed by the ADOS.
There are several possible confounds, mainly arising from task differences, which we discuss in the following. For example, we assume that the difference in performance between vocal and non-vocal pitch perception in people with ASD is unlikely to be due to task differences, as both tasks included complex sounds, i.e. vowels in the vocal pitch discrimination test and sounds from different instruments in the non-vocal pitch perception test. It is further unlikely that this dissociation in our study is due to differences in task difficulty, as there were no group differences for the vocal timbre discrimination test, which had exactly the same design as the vocal pitch discrimination test and differed only in the task instruction. Critically, task differences might nevertheless matter: using an adaptive tracking procedure with pitch differences of less than one semitone, providing feedback after each response and conducting the test in the lab, as in the vocal pitch perception test, might provide more sensitive results than using a limited set of stimuli with pitch differences of at least one semitone, as in the non-vocal pitch perception test, which was conducted online at home. We assume that this does not affect between-group effects, as both groups performed the tasks under the same conditions. However, the systematic investigation of vocal and non-vocal pitch perception in people with ASD remains a subject for future study. There are several other factors which might contribute to our results, such as verbal abilities, listener’s gender or the complexity of the presented emotions. For example, there is evidence that verbal abilities are associated with vocal emotion recognition abilities, although findings are not consistent (for review see Lartseva et al.
2015). We assume that difficulties in vocal emotion recognition in our ASD sample cannot be explained by verbal abilities, as groups were matched on verbal IQ and the same ASD group additionally showed intact speech recognition abilities and speech-sensitive brain responses comparable to those of the comparison group (Schelinski et al.
2016). Listener’s gender might be another critical variable which contributes to processing differences in emotion recognition (e.g. Rosenblau et al.
2017; Wacker et al.
2017). For example, a previous functional magnetic resonance imaging (fMRI) study showed differences in processing complex as compared to basic emotions in male and female participants (Rosenblau et al.
2017). Given the low number of females in our study, we cannot draw inferences about gender differences in the correlation between vocal emotion recognition and vocal pitch discrimination. Further, we assume that the successful processing of complex emotions (e.g. pride, guilt), which requires a greater extent of socio-cognitive skills, might at least partially rely on different mechanisms than those we suggested for basic emotions (Alba-Ferrara et al.
2011; Rosenblau et al.
2017; Zinck and Newen
2008). The processing of vocal non-speech sounds (e.g. cry, laugh), which has been shown to be intact in people with ASD (Jones et al. 2011; Xavier et al. 2015), might also at least partially rely on different mechanisms. Additionally, one might assume that our study results are at least partly explainable by attention deficits within the ASD group. To control for possible attention differences between the ASD group and the comparison group, both groups were matched on attention using the d2 test of attention, i.e. there were no significant group differences in concentration performance as operationalised in this test. The d2 test, however, relates to external visual stimuli, and the ASD group might differ in the ability to attend to auditory stimuli. Nevertheless, we find it unlikely that a deficit in auditory attention can explain our results: we found comparable results between the ASD and the comparison group in tasks on working memory which required auditory attention and concentration, e.g. when recalling a series of numbers and letters which were read aloud by the experimenter (Wechsler
1997; Table
1). Additionally, there was a significant interaction in tasks with the same design and task demands (i.e. an interaction between vocal timbre and vocal pitch discrimination; Schelinski et al.
2017). Groups were also matched on performance IQ (Wechsler
1997); however, there was a larger variation of performance IQ scores within the ASD group. Pairwise matching with regard to performance IQ might additionally enhance comparability between the two groups.
We additionally tested whether the recognition accuracy in the vocal emotion recognition test was influenced by the level of emotional intensity or the frequency range of the stimulus material. Our results indicate that the overall worse performance in vocal emotion recognition in the ASD group was independent of the emotional intensity and frequency range of the stimuli used in the present study. This is in contrast to a previous study, in which vocal emotion recognition in people with ASD was mainly impaired for emotions that were difficult to recognise (low emotional intensity) and less impaired for emotion stimuli that were easy to recognise (high emotional intensity) (Globerson et al.
2015).
Behavioural data can provide evidence about possible underlying neuronal mechanisms. A previous study showed that the same sample of adults with high-functioning ASD as reported here had a dysfunctional response of the right posterior superior temporal sulcus and gyrus (STS/G) to voice identity as compared to speech recognition (Schelinski et al.
2016; Supplementary Fig. 2). This region is in close proximity to posterior STS/G regions which preferably respond to vocal sounds including vocal speech and non-speech sounds (Belin et al.
2000), voice identity and vocal emotion processing in people with typical development (for meta-analyses see Blank et al.
2014; Frühholz and Grandjean
2013; Supplementary Fig. 2). Further, the posterior STS/G has been associated with sensitivity to acoustic aspects of the voice in vocal emotion (Frühholz et al.
2012) and voice identity perception (Andics et al.
2010; von Kriegstein et al.
2010; Warren et al.
2006). Thus, we speculate that difficulties in vocal emotion and voice identity recognition in people with high-functioning ASD might have a common origin in altered functioning of the posterior STS/G. However, the few studies that have so far investigated the brain representation of vocal emotion perception in people with ASD (Eigsti et al.
2012; Gebauer et al.
2014; Hesling et al.
2010; Rosenblau et al.
2017; Wang et al.
2007) do not provide clear evidence for altered functioning of the right posterior STS/G. Another candidate region for explaining difficulties in vocal pitch processing and potentially also vocal emotion recognition in people with ASD might be antero-lateral Heschl’s gyrus, because pitch processing is classically associated with this region (e.g. Kreitewolf et al.
2014; Patterson et al.
2002; Puschmann et al.
2010; for review see Griffiths and Hall
2012). However, it is currently unclear whether parts of antero-lateral Heschl’s gyrus specifically respond to vocal pitch. An explanation for the finding of vocal pitch processing deficits together with intact non-vocal pitch processing abilities in people with ASD at the level of antero-lateral Heschl’s gyrus is therefore highly speculative.
Conclusion and Outlook
Difficulties in emotion recognition are socially restricting (Couture et al.
2006; Garcia-Villamisar et al.
2010) and associated with social difficulties in people with ASD (Boraston et al.
2007). Perceptual impairments might contribute significantly to difficulties in social cognition (Baum et al.
2015; Gold et al.
2012). In humans, the ability to adapt behaviour in accordance with the perceived vocal emotion in conspecifics develops early in infancy (Mumme et al.
1996; Vaish and Striano
2004; Walker-Andrews and Grolnick
1983; for review see Grossmann
2010). This suggests an important role of vocal emotion recognition in the development of social cognition. In people with ASD, difficulties in perceiving basic acoustic features, such as vocal pitch, likely contribute to the development of difficulties in higher-level social cognition, such as vocal emotion and voice identity perception. Together with other findings (Baum et al.
2015; Dakin and Frith
2005; Pellicano and Burr
2012; Schelinski et al.
2016,
2017), our results indicate that the investigation of lower-level sensory processing in people with ASD is important, as such differences potentially underlie difficulties in higher-level social cognition. Furthermore, the perception of lower-level sensory features might be a useful tool for the early diagnosis of ASD.