Prosody is a suprasegmental feature of speech that conveys pragmatic, affective, or grammatical information via changes in the frequency, intensity, and duration of spoken utterances (McAlpine et al.,
2014; McCann & Peppé,
2003; Paul et al.,
2005). It plays an important role in speech communication and social interaction (Xu,
2005). While the acquisition of prosody starts from infancy (Levitt,
1993) and lays the foundations for children’s sociopragmatic development (Hübscher & Prieto,
2019), atypical prosody can become a barrier to everyday linguistic and social functioning, as seen in autism spectrum disorder (ASD) (Lloyd-Fox et al.,
2013; McCann & Peppé,
2003; Paul et al.,
2005).
ASD is a neurodevelopmental disorder associated with deficits in social communication and interaction as well as restricted and repetitive behaviours and interests (American Psychiatric Association,
2013). Prosodic deficits have been frequently observed in ASD across a variety of perception and production tasks (Diehl & Paul,
2012; Nakai et al.,
2014; Peppé et al.,
2007; Shriberg et al.,
2011; Tager‐Flusberg et al.,
2005). They can occur even in highly verbal individuals with ASD and tend to be lifelong even when other areas of language, such as semantics and syntax, improve (McCann & Peppé,
2003). Among the different areas of prosody, the ability to recognise and differentiate the rising intonation of questions and the falling intonation of statements is an important aspect of conversational and linguistic competence (Dahan,
2015; Xie et al.,
2021), and the literature in ASD has produced mixed findings (Chevallier et al.,
2009; Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; Jiang et al.,
2015; McCann & Peppé,
2003; McCann et al.,
2007; Paul et al.,
2005; Peppé et al.,
2007). The current study investigated this issue by examining the roles of response bias, stimulus type, age, and pitch discrimination thresholds in the perception and production of statement-question intonation in ASD.
Perception of Statement-Question Intonation and Response Bias in ASD
Several studies used the same test battery, PEPS-C (Profiling Elements of Prosodic Systems-Children) (Peppé & McCann,
2003), to examine discrimination (e.g. same vs. different) and identification (e.g. question vs. statement) of statements and questions in ASD (Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; McCann et al.,
2007; Peppé et al.,
2007). Within this battery, statement-question identification is assessed using a turn-end task with single words, e.g. “Carrots.” vs. “Carrots?”. Statement-question discrimination is assessed within a short-item discrimination task, which contains the laryngographic sounds (devoid of meaning) of the statement-question pairs, as well as those of the liking-disliking pairs, e.g. “tea” pronounced as though the speakers like it or dislike it, from the affect subtask in PEPS-C. Thus, the identification and discrimination tasks are unmatched in stimulus type (speech vs. laryngographic sounds) and in the number of relevant stimuli (only statements or questions are used in the identification task, whereas both statement-question and liking-disliking pairs are included in the discrimination task) in these studies (Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; McCann et al.,
2007; Peppé et al.,
2007). Results from these studies suggest that individuals with ASD are unimpaired in statement-question identification compared to typically developing (TD) peers (Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; McCann et al.,
2007; Peppé et al.,
2007). However, impaired discrimination between statements and questions was observed in one sample of participants (31 ASD vs. 72 TD participants) (McCann et al.,
2007; Peppé et al.,
2007), while a different sample showed intact discrimination (21 ASD vs. 21 TD participants) (Järvinen-Pasley et al.,
2008a). In summary, studies using PEPS-C suggest intact statement-question identification in ASD, but the results on statement-question discrimination remain unclear, owing partly to limitations of the design and partly to mixed results across studies (Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; McCann et al.,
2007; Peppé et al.,
2007).
Using prosodic tasks other than PEPS-C (e.g. sentence stimuli from Patel et al.,
1998), several studies also reported intact statement-question identification beyond single-word stimuli in ASD (Chevallier et al.,
2009; Järvinen-Pasley et al.,
2008a; Paul et al.,
2005). However, using disyllabic phrases from Jiang et al. (
2010), Jiang et al. (
2015) revealed impaired identification and discrimination of statement-question intonation in Mandarin speakers with ASD. While the different results between Jiang et al. (
2015) and other studies (Chevallier et al.,
2009; Järvinen-Pasley et al.,
2008a; Paul et al.,
2005) may be attributed to the different language or cultural backgrounds of the participants (Mandarin Chinese versus English), the discrepancy may also be due to differences in task difficulty across these studies. Indeed, participants from both the ASD and TD groups performed at ceiling in Paul et al. (
2005), which used the stimuli from Patel et al. (
1998). In that stimulus set, large pitch contrasts exist between the statements and questions (Patel et al.,
1998), and research has shown that even individuals with congenital amusia, a neurodevelopmental disorder of pitch processing, can perform as well as TD controls on both identification and discrimination of these statements/questions (Ayotte et al.,
2002; Patel et al.,
2008). Addressing the issue with ceiling performance in the literature, Liu et al. (
2010) designed and created a new set of ecologically valid stimuli with relatively subtle pitch contrasts between statements and questions, and revealed prosodic deficits in congenital amusia. Thus, using stimuli from Liu et al. (
2010), the current study aimed to examine whether English-speaking individuals with ASD would also show impaired statement-question identification and discrimination when task difficulty is increased.
In addition to identification/discrimination accuracy rates, it has been suggested that participants’ response patterns should also be scrutinised in order to detect possible response biases in ASD (Järvinen-Pasley et al.,
2008a). Specifically, Peppé et al. (
2007) observed that while children with ASD performed as well as controls in terms of judgement accuracy in statement-question identification, they were biased towards judging questions as statements. In this study, 12.9% of the ASD participants and 2.7% of the control participants judged all the questions as statements, showing a declarative bias, although this percentage difference did not reach statistical significance (Peppé et al.,
2007). For discrimination, impaired performance in ASD was mainly driven by false alarms, i.e. judging the same items as different (Peppé et al.,
2007). To investigate the declarative bias in ASD further, Järvinen-Pasley et al. (
2008a) examined another sample of participants and included the identification task in Patel et al. (
1998) in addition to the turn-end task in PEPS-C. While no significant group difference in response patterns was observed for the turn-end task in PEPS-C, a significant declarative bias was observed among 50% of participants with ASD (in comparison to 10% of controls) for the identification task from Patel et al. (
1998) (Järvinen-Pasley et al.,
2008a). However, no response bias emerged among Mandarin speakers with ASD for either identification or discrimination in Jiang et al. (
2015), although significantly lower accuracy rates were observed in ASD compared to TD. Thus, among the studies that examined response biases in ASD, mixed findings have been presented, with some studies indicating a response bias based on either statistics or simply percentage comparison (Järvinen-Pasley et al.,
2008a; McCann et al.,
2007; Peppé et al.,
2007), whereas others reported no response bias (Jiang et al.,
2015), depending on the tasks and samples.
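As a minimal illustration of how a response bias can be separated from sensitivity, the sketch below computes the signal detection measures d′ (sensitivity) and c (criterion) from a participant's identification counts. It is not the analysis used in any of the studies cited above. Treating "question" as the signal category, the log-linear correction for extreme rates, and the function name sdt_measures are all assumptions made for illustration.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one participant.

    Assumed convention (illustrative only): "question" is the signal.
      hits               = questions labelled as questions
      misses             = questions labelled as statements
      false_alarms       = statements labelled as questions
      correct_rejections = statements labelled as statements
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    # Log-linear correction keeps both rates strictly between 0 and 1,
    # so the z-transform stays finite even for all-or-none responders.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical participant who labels every item as a statement.
d_prime, criterion = sdt_measures(hits=0, misses=10,
                                  false_alarms=0, correct_rejections=10)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

Under this convention, a positive criterion indicates reluctance to respond "question", that is, a declarative bias, whereas d′ near zero indicates that the two categories were not distinguished at all. The same function could be applied to a discrimination task by treating "different" as the signal, in which case a negative criterion would reflect a tendency to judge same pairs as different.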
In summary, despite much research (Chevallier et al.,
2009; Filipe et al.,
2014; Järvinen-Pasley et al.,
2008a; Jiang et al.,
2015; McCann & Peppé,
2003; McCann et al.,
2007; Paul et al.,
2005; Peppé et al.,
2007), it remains unclear whether ASD is associated with deficits in identification and/or discrimination of statements and questions, and whether response biases are driving the observed accuracy rates. These questions need to be addressed, as the answers have implications for the prosody phenotypes of ASD. As mentioned earlier, due to the limitations of the design in PEPS-C (Peppé & McCann,
2003), the short-item discrimination task contains not only statement-question pairs but also liking-disliking pairs, and in laryngographic sounds rather than in natural speech. Thus, one cannot make inferences about the ability to discriminate statements from questions in everyday language from this task. However, if it is indeed the case that ASD is associated with intact identification but impaired discrimination as reported in Peppé et al. (
2007), this dissociation between identification and discrimination may be interpreted as a special feature related to ASD phenotypes (Peppé et al.,
2007). In other studies, by contrast, identification and discrimination have patterned together, with both intact (Järvinen-Pasley et al.,
2008a) or both impaired (Jiang et al.,
2015). To further clarify this issue and to help understand the phenotypes of ASD, the current study employed both identification and discrimination tasks from Liu et al. (
2010) to investigate response patterns and the relationship between statement-question identification and discrimination in ASD.
Production of Statement-Question Intonation in ASD
In contrast to the mixed findings reported in perception studies, evidence from production studies has consistently suggested atypical intonation production in ASD (Filipe et al.,
2014; Fusaroli et al.,
2017; McCann & Peppé,
2003; McCann et al.,
2007). Specifically, statement responses of individuals with ASD were more likely to be judged as questions or ambiguous than those of controls (McCann et al.,
2007; Peppé et al.,
2007). In addition, utterances by individuals with ASD were much less likely to be judged as normal or natural than those of controls (Filipe et al.,
2014). These ratings were either given by the experimenter (“tester”) (McCann et al.,
2007; Peppé et al.,
2007) or by independent adult participants (Filipe et al.,
2014). Although informative, subjective ratings do not reveal what aspects of intonation production were atypical in ASD (e.g. pitch, duration, and intensity). In studies using objective acoustic measures, individuals with ASD showed significantly greater pitch range, mean pitch, and maximum pitch than controls for both statements and questions (Filipe et al.,
2014), as well as increased and inappropriate use of pitch accents and difficulty in producing high-frequency boundary tones (Fosnot & Jun,
1999). These findings were supported by Fusaroli et al. (
2017), who systematically reviewed the literature quantifying acoustic patterns in ASD and identified significant differences in pitch production (e.g. pitch range and mean pitch) between individuals with ASD and controls, while finding no significant differences in other acoustic features (e.g. intensity, duration). In sum, the atypical production of intonation in ASD seems to be related to a salient acoustic parameter—pitch (DePape et al.,
2012; Fusaroli et al.,
2017).
However, previous studies on intonation production in ASD (Filipe et al.,
2014; Fosnot & Jun,
1999; McCann et al.,
2007; Peppé et al.,
2007) have not conducted acoustic analysis to verify the acoustic realisation of pitch direction in statements and questions in ASD. Acoustic measures are important because question and statement intonation are heavily dependent upon pitch direction, with rising tones representing questions and falling tones representing statements (Cruttenden,
1997; Lieberman,
1960). Misuse of pitch itself can cause not only atypical intonation production but also misperception of statements and questions. Indeed, studies on congenital amusia have demonstrated the importance of acoustic analysis in quantifying pitch realisation in production, when examining the relationship between production and perception (Hutchins & Peretz,
2012; Liu et al.,
2010; Loui et al.,
2008). However, it remains to be determined whether intonation production and perception abilities are related or dissociated among individuals with ASD. The current study addressed this issue by including an intonation imitation task and using acoustic measures to assess pitch direction of the final words in the produced statements and questions (Liu et al.,
2010).
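To make the notion of an acoustic measure of pitch direction concrete, the sketch below classifies the final word of an utterance as rising or falling from its fundamental frequency (F0) contour. It is a simplified illustration rather than the procedure of the current study or of Liu et al. (2010). The use of the first and last voiced F0 samples, the semitone conversion as the glide measure, and the function names are assumptions.

```python
import math


def glide_semitones(f0_start_hz, f0_end_hz):
    """Size of the pitch change from start to end, in semitones.

    Positive values indicate a rise; negative values indicate a fall.
    """
    return 12 * math.log2(f0_end_hz / f0_start_hz)


def pitch_direction(f0_contour_hz):
    """Classify a final-word F0 contour as rising, falling, or level.

    f0_contour_hz: F0 samples in Hz; unvoiced frames may be given as
    0 or None and are ignored (an assumed input convention).
    Returns (label, glide_in_semitones).
    """
    voiced = [f for f in f0_contour_hz if f and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 samples")

    glide = glide_semitones(voiced[0], voiced[-1])
    if glide > 0:
        label = "rising"
    elif glide < 0:
        label = "falling"
    else:
        label = "level"
    return label, glide


# Hypothetical contours for an imitated question and statement.
print(pitch_direction([180.0, 190.0, None, 240.0]))
print(pitch_direction([200.0, 0.0, 180.0, 150.0]))
```

Expressing the glide in semitones rather than in Hz makes the measure comparable across speakers with different pitch ranges, which matters when children, adolescents, and adults are compared.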
Perception of Pitch in Speech Versus Music in ASD
As in speech, pitch is also used extensively in music to convey meaning and emotion (Patel,
2008). It has been intensely debated whether pitch processing is domain-specific or domain-general across speech and music (Mantell & Pfordresher,
2013; Patel,
2008; Peretz & Coltheart,
2003). In particular, Peretz and Coltheart (
2003) proposed that pitch information within a musical context is processed by a tonal encoding module which is absent in spoken pitch processing. Other researchers, however, argued for shared systems underlying the processing of information across both domains (Koelsch,
2011; Koelsch & Siebel,
2005; Patel,
2008; Sammler et al.,
2009). Comparing intonation perception with melodic contour perception, Jiang et al. (
2015) observed enhanced/intact melodic contour identification/discrimination but impaired statement-question identification and discrimination in Mandarin speakers with ASD. This finding suggested pitch processing deficits specific to the speech domain in ASD (Jiang et al.,
2015). However, other studies indicated enhanced identification of pitch contours (e.g. rising, falling, falling-rising, rising-falling) across speech and musical stimuli (Järvinen-Pasley et al.,
2008b), as well as superior discrimination of pitch patterns across speech-speech and speech-music stimulus pairs in ASD versus TD (Järvinen-Pasley & Heaton,
2007). Therefore, further research is warranted to clarify the domain specificity or generality of pitch processing in ASD. To our knowledge, no studies have yet compared pitch perception in ASD using speech and musical stimuli that are matched in global pitch contours derived from statement-question intonation. The present study aimed to fill this gap by investigating whether individuals with ASD would process intonation embedded in speech and musical stimuli differently, using the musical analogues of the statement-question discrimination task in Liu et al. (
2010).
The Development of Prosodic Abilities and its Relationship With Pitch Sensitivity
Studies of prosodic development in TD children suggest that there are significant improvements in the perception and production of statement-question intonation between ages 5 and 11 (Wells et al.,
2004). As children grow older, pitch becomes the primary cue for the statement-question contrast compared to intensity and duration in production (Patel & Grigos,
2006). While 4-year-olds used lengthened duration of the final syllable rather than a rising pitch contour to signify questions, 7-year-olds used multiple acoustic cues (including pitch, intensity and duration) and 11-year-olds used pitch cues predominantly to differentiate statements from questions (Patel & Grigos,
2006). Given that language delay and impairment are prevalent among children and youth with ASD (Kwok et al.,
2015), it may be the case that the development of prosodic skills is also delayed in ASD. Lyons et al. (
2014) investigated developmental changes in four prosodic functions, including the perception and production of statement-question intonation, stress, phrasing, and affect, in “language-normal” and “language-impaired” preadolescents (9–12 years old) and adolescents (13–17 years old) with and without ASD. The results suggested that TD preadolescents performed as well as TD adolescents on statement-question identification and production; because of this ceiling performance, no developmental improvement was observed among TD participants. The same pattern of results was also seen in “language-normal” ASD preadolescents and adolescents, who performed similarly to the TD groups on both identification and production of statements and questions. For the “language-impaired” ASD groups, however, significant age-related improvement was observed for identification, but not for production, of statements and questions. That is, while impaired statement-question identification was observed among “language-impaired” ASD preadolescents but not adolescents, impaired statement-question production persisted across both “language-impaired” ASD preadolescents and adolescents. Thus, there are developmental delays in the perception and production of statements and questions among “language-impaired” individuals with ASD (Lyons et al.,
2014).
In addition to the close relationship with language abilities (Lyons et al.,
2014), prosodic skills also correlate significantly with pitch processing abilities (Liu et al.,
2010,
2012; Vuvan et al.,
2015). In typical development, there are age-related improvements in the ability to discriminate the direction of pitch changes between ages 6–11 (Fancourt et al.,
2013). However, it has been reported that individuals with ASD show enhanced pitch discrimination early in development, and that this ability is maintained across children, adolescents and adults and does not correlate with receptive vocabulary (Mayer et al.,
2016). By contrast, controls show significant gains in pitch discrimination performance across development, which also correlates significantly with receptive vocabulary scores (Mayer et al.,
2016). This raises the questions of whether and how pitch processing abilities influence intonation perception and production in individuals with ASD, and whether age plays a role in these abilities across the lifespan. The current study addressed these questions by examining the development of statement-question perception and production across children, adolescents and adults with and without ASD, as well as its relationship with pitch direction discrimination thresholds.
Present Study
In the current study, we matched ASD and TD children, adolescents and adults for age, sex, nonverbal IQ, receptive vocabulary, and verbal and nonverbal short-term memory. Focusing on the prosodic feature of statement-question intonation and the acoustic parameter of pitch, we examined intonation processing in ASD and TD from the perspectives of task condition (discrimination, identification, imitation), response bias, stimulus type (speech, music), developmental change, and association with pitch thresholds. We asked whether individuals with ASD differed from controls in their ability to discriminate, identify, and imitate statement-question intonation, whether individuals with ASD showed response biases in the discrimination and identification tasks, and whether performance on intonation perception and production was related to pitch direction discrimination thresholds. We also examined whether individuals with ASD would perform better on musical pitch processing than on linguistic pitch processing, comparing discrimination of natural speech and its musical analogues. Finally, we examined the effect of age on pitch and intonation perception and production for both ASD and control groups.
Based on previous findings, we predicted that: (a) participants with ASD would show impaired performance compared to controls in intonation discrimination and identification tasks, and they would show response biases towards judging the same pairs as different and identifying questions as statements; (b) participants with ASD would show poorer performance on the imitation task compared with controls; (c) participants with ASD would perform better on the musical condition than the speech condition in the discrimination task; (d) across both groups, performance on intonation processing would be associated with pitch direction discrimination thresholds; and (e) participants with ASD would show different developmental trajectories for pitch and intonation processing compared with controls.