Temporal synchrony and audiovisual integration of speech and object stimuli in autism

https://doi.org/10.1016/j.rasd.2017.04.001

Highlights

  • We measured sensitivity to audiovisual asynchrony for speech and object stimuli.

  • Controls showed similar sensitivity to the two carefully matched stimulus types.

  • Adolescents with autism showed higher sensitivity to asynchrony in audiovisual speech.

  • Higher sensitivity to asynchronous speech was associated with autism severity.

Abstract

Background

Individuals with Autism Spectrum Disorders (ASD) have been shown to have multisensory integration deficits, which may lead to problems perceiving complex, multisensory environments. For example, understanding audiovisual speech requires integration of visual information from the lips and face with auditory information from the voice, and audiovisual speech integration deficits can lead to impaired understanding and comprehension. While there is strong evidence for an audiovisual speech integration impairment in ASD, it is unclear whether this impairment is due to low-level perceptual processes that affect all types of audiovisual integration or whether it is specific to speech processing.

Method

Here, we measure audiovisual integration of basic speech (i.e., consonant-vowel utterances) and object stimuli (i.e., a bouncing ball) in adolescents with ASD and well-matched controls. We calculate a temporal window of integration (TWI) using each individual’s ability to identify which of two videos (one temporally aligned and one misaligned) matches auditory stimuli. The TWI measures tolerance for temporal asynchrony between the auditory and visual streams, and is an important feature of audiovisual perception.
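As a concrete illustration of the TWI estimate described above, the sketch below fits a two-alternative forced-choice psychometric function (proportion of correct video identifications as a function of audiovisual asynchrony) and reads off a threshold asynchrony. The Weibull form, the grid-search fit, and the hypothetical observer data are all illustrative assumptions, not the authors' actual analysis pipeline.

```python
import numpy as np

def fit_twi(soas_ms, prop_correct,
            thresholds=np.arange(20, 801, 5), slopes=(1.0, 1.5, 2.0, 3.0)):
    """Fit a 2AFC Weibull psychometric function,
    p(|SOA|) = 0.5 + 0.5 * (1 - exp(-(|SOA|/t)^k)),
    by grid-search least squares over threshold t and slope k.
    Returns the threshold t (ms): a smaller value means the observer
    detects misalignment at smaller asynchronies (less tolerance)."""
    x = np.abs(np.asarray(soas_ms, dtype=float))
    p = np.asarray(prop_correct, dtype=float)
    best_err, best_t = np.inf, None
    for t in thresholds:
        for k in slopes:
            pred = 0.5 + 0.5 * (1.0 - np.exp(-(x / t) ** k))
            err = np.sum((pred - p) ** 2)
            if err < best_err:
                best_err, best_t = err, t
    return best_t

# Hypothetical observer: near chance (0.5) when both videos look aligned,
# near ceiling once the misaligned video is easy to reject.
soas = [0, 100, 200, 300, 400, 500]
acc = [0.50, 0.55, 0.70, 0.85, 0.95, 0.98]
twi = fit_twi(soas, acc)
```

In this framing, a narrower fitted threshold corresponds to less tolerance of asynchrony, i.e., higher sensitivity to audiovisual misalignment.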

Results

While controls showed similar tolerance of asynchrony for the simple speech and object stimuli, individuals with ASD did not. Specifically, individuals with ASD showed less tolerance of asynchrony for speech stimuli compared to object stimuli. In individuals with ASD, decreased tolerance for asynchrony in speech stimuli was associated with higher ratings of autism symptom severity.

Conclusions

These results suggest that audiovisual perception in ASD may vary for speech and object stimuli beyond what can be accounted for by stimulus complexity.

Introduction

Multisensory integration is a ubiquitous process by which information combined from multiple sensory streams leads to faster, more robust, and more accurate perception of the world around us. For example, when listening to a person speak, watching their mouth move leads to a faster auditory brain response (van Wassenhove, Grant, & Poeppel, 2005), louder-sounding speech (MacLeod & Summerfield, 1987), and increased understanding of both words (Schwartz, Berthommier, & Savariaux, 2004) and ideas (Arnold & Hill, 2001; Reisberg, McLean, & Goldfield, 1987). Some aspects of audiovisual integration are already developed and robust within the first year of life at both neural (Hyde, Jones, Porter, & Flom, 2010) and behavioral levels (Lewkowicz, 2010). In fact, audiovisual integration in infancy predicts later language and communication abilities (Kushnerenko et al., 2013), which is evidence for the importance of audiovisual integration in early language development. While audiovisual integration is present and important early in life, it continues to mature through childhood and adolescence (Dick, Solodkin, & Small, 2010; Knowland, Mercure, Karmiloff-Smith, Dick, & Thomas, 2014; Lewkowicz & Flom, 2014).

Mounting evidence supports an audiovisual integration impairment in ASD, although signs of this impairment may be subtle. For example, individuals with autism show a reduced McGurk effect (Mongillo et al., 2008; Woynaroski et al., 2013), which persists even when their looking time to faces is the same as in controls (Irwin, Tornatore, Brancazio, & Whalen, 2011). Individuals with autism also show reduced audiovisual benefit when listening to speech in background noise (Smith & Bennetto, 2007), although this may improve with age (Foxe et al., 2013). These differences are reflected at the neurophysiological level as shown in studies using both electroencephalography (Brandwein et al., 2013; Brandwein et al., 2014; Megnin et al., 2012) and fMRI (Hubbard et al., 2012). Notably, studies using the non-social "sound induced flash illusion" (Shams, Kamitani, & Shimojo, 2000) have shown intact integration of nonsocial stimuli in autism (i.e., presence of the illusion; Foss-Feig et al., 2010; van der Smagt, van Engeland, & Kemner, 2007).

Temporal synchronization of auditory and visual cues is a feature of audiovisual integration that plays a significant role in the degree of audiovisual benefit. Reducing audiovisual temporal synchrony can reduce both illusory audiovisual effects (Stevenson, Zemtsov, & Wallace, 2012) and audiovisual reaction time benefits (Diederich & Colonius, 2004). For any construct capturing audiovisual integration, there is thought to be a window of audiovisual asynchronies within which the dependent variable increases probabilistically, frequently referred to as the Temporal Window of Integration (TWI; Spence & Squire, 2003; van Wassenhove, Grant, & Poeppel, 2007). Recently, the effects of temporal synchrony on audiovisual integration in autism have been measured across multiple studies (de Boer-Schellekens, Eussen, & Vroomen, 2013; de Boer-Schellekens, Keetels, Eussen, & Vroomen, 2013; Foss-Feig et al., 2010; Grossman, Schneps, & Tager-Flusberg, 2009; Kwakye, Foss-Feig, Cascio, Stone, & Wallace, 2011; Stevenson et al., 2014; Woynaroski et al., 2013). Findings have included a larger temporal window of integration in autism across multiple stimulus types (de Boer-Schellekens, Eussen et al., 2013; de Boer-Schellekens, Keetels et al., 2013; Foss-Feig et al., 2010; Kwakye et al., 2011; Woynaroski et al., 2013), differentially larger temporal windows for speech stimuli only (Stevenson et al., 2014), or no difference in temporal windows (Grossman et al., 2009). Importantly, TWI can be affected by stimulus type (e.g., speech, object), stimulus complexity (e.g., complex human speech vs. consonant-vowel utterances), and choice of dependent variable (e.g., presence of illusions, temporal order judgment, or simultaneity judgment), which may explain these discrepancies (Stevenson & Wallace, 2013; van Eijk, Kohlrausch, Juola, & van de Par, 2008; Vatakis, Navarra, Soto-Faraco, & Spence, 2008; Vatakis & Spence, 2006, 2010).
While it is known that social and nonsocial stimuli are associated with different TWIs, and that individuals with autism tend to show less impairment in nonsocial audiovisual integration paradigms, only one study has directly compared social and nonsocial temporal synchrony perception in autism (Stevenson et al., 2014). Thus, while differences in multisensory integration for social versus nonsocial stimuli are a hallmark of the audiovisual integration literature in autism, we know little about the role of temporal synchrony detection in this pattern.

Most audiovisual integration tasks by their nature require some degree of verbal processing. Specifically, they may require the subject to report on their perception, stating (1) whether the sound or the video appeared first (temporal order judgment), (2) whether the audio and video were in sync (simultaneity judgment), (3) what speech sound they heard (McGurk effect), or (4) how many beeps they heard (flash-beep illusion). While responding with single words is not likely to be particularly taxing for the individuals with autism participating in these studies (i.e., those without intellectual disability), reducing verbal task demands is preferable, particularly if those studies are to be extended to individuals with more significant verbal deficits.

While there is evidence that minimizing verbal demands still leads to group differences in audiovisual integration tasks in autism (Collignon et al., 2013), it is unknown whether this is the case for temporal synchrony tasks. Beyond verbal processes related to task demands, many investigations of temporal synchrony detection and its role in audiovisual integration in autism use speech stimuli that are both more visually and auditorily complex than those used for nonspeech conditions. Because more complex stimuli are generally associated with larger temporal windows of integration (Vatakis & Spence, 2006, 2010), this could lead to finding larger TWIs in autism that are related to the complexity of the stimuli rather than their social nature (Williams, Minshew, & Goldstein, 2015). Here, we use speech and object-based audiovisual stimuli designed to minimize differences in complexity (i.e., in timing, pitch, etc.) paired with a nonverbal temporal synchrony task to determine the role of stimulus type in audiovisual integration in autism. Specifically, we used a paradigm based on detection versus discrimination of temporal synchrony, and used speech and object stimuli shown in a pilot study to have similar TWIs. The task in the present study requires participants to use a physical response (i.e., pushing arrow buttons rather than responding verbally) to select which of two videos matches auditory stimuli in both speech and object conditions. Selecting the matching video parallels the essential process of using visual cues to locate a speaker in a noisy room (i.e., the Audiovisual Cocktail Party Effect; Zion Golumbic, Cogan, Schroeder, & Poeppel, 2013).
While similar designs have shown sensitivity to audiovisual temporal synchrony in autism and typical development (Baart, Vroomen, Shaw, & Bortfeld, 2014; Bebko, Weiss, Demark, & Gomez, 2006; Patten, Watson, & Baranek, 2014), the present design incorporates subtly varying audiovisual asynchronies in order to measure the TWI for responses to both speech and object stimuli.

Methods

This study was approved by the University of Rochester Institutional Review Board. All subjects provided informed consent, with parents additionally providing consent for subjects 17 years of age and younger.

Results

There was a significant interaction between Condition (Speech, Object) and Group (Autism, Control) (F = 6.7, p = .01, η²partial = .15). This interaction was characterized by a differentially smaller TWI for speech compared to object stimuli in autism in comparison to the control group (see Fig. 3). Pairwise comparisons showed a difference in TWI for speech and object stimuli in autism (t = −2.68, p = .02, d = .67) but not in controls (t = 1.11, p = .28, d = .32). An independent-samples t-test showed a trend toward
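The reported effect size relates to the F statistic through the standard identity η²partial = F·df₁ / (F·df₁ + df₂). Below is a minimal sketch of that identity; the 1-df interaction term and the error df of 38 are assumptions chosen purely for illustration, since the actual degrees of freedom are not given in this snippet.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Standard identity converting an F statistic and its degrees of
    freedom into partial eta squared."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)

# With F = 6.7 as reported, and assumed df = (1, 38), the identity yields
# a value of roughly .15, consistent with the reported effect size.
eta = partial_eta_squared(6.7, 1, 38)
```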

Discussion

In this study, we investigate temporal synchrony detection in a sample of children and adolescents with autism and well-matched controls using both speech and object-based stimuli in a task designed to simulate the process of locating a talking person. Individuals with autism showed a differentially smaller temporal window of integration (TWI) for the speech vs. object condition in comparison to typical controls. In spite of creating speech and object stimuli designed to minimize differences in

Implications

Audiovisual speech integration deficits in autism likely lead to real-life impairments in hearing and understanding human speech. There are studies showing that training in particular aspects of audiovisual integration (e.g., lipreading (Williams, Massaro, Peel, Bosseler, & Suddendorf, 2004), temporal synchrony detection (Powers, Hillock, & Wallace, 2009)) might improve audiovisual speech integration for individuals with ASD. The present study implies that features of audiovisual speech

Acknowledgements

This study was funded by an NIDCD Individual Predoctoral NRSA (F31 DC010769), with additional funding through the following grants: NIDCD R01 DC009439, NIDCD R21 DC011094. We thank Rafael Klorman for support with statistical analyses and Ashley Wilson for her assistance in stimulus development. We are also grateful to all of the families who invested their time to contribute to this research.

References (62)

  • Achenbach, T., et al. (2001). Child behavior checklist for ages 6–18.

  • Arnold, P., et al. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology.

  • Bebko, J.M., et al. (2006). Discrimination of temporal synchrony in intermodal events by children with autism and children with developmental disabilities without autism. Journal of Child Psychology and Psychiatry and Allied Disciplines.

  • Boersma, P., et al. (2009). Praat: Doing phonetics by computer.

  • Brandwein, A.B., et al. (2013). The development of multisensory integration in high-functioning autism: High-density electrical mapping and psychophysical measures reveal impairments in the processing of audiovisual inputs. Cerebral Cortex.

  • Brandwein, A., et al. (2014). Neurophysiological indices of atypical auditory processing and multisensory integration are associated with symptom severity in autism. Journal of Autism and Developmental Disorders.

  • Constantino, J., et al. (2005). Social responsiveness scale (SRS).

  • Diederich, A., et al. (2004). Bimodal and trimodal multisensory enhancement: Effects of stimulus onset and intensity on reaction time. Perception and Psychophysics.

  • Foss-Feig, J., et al. (2010). An extended multisensory temporal binding window in autism spectrum disorders. Experimental Brain Research.

  • Foxe, J.J., et al. (2013). Severe multisensory speech integration deficits in high-functioning school-aged children with autism spectrum disorder (ASD) and their resolution during early adolescence. Cerebral Cortex.

  • Fujisaki, W., et al. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience.

  • Grossman, R.B., et al. (2009). Slipped lips: Onset asynchrony detection of auditory-visual language in autism? Journal of Child Psychology and Psychiatry.

  • Hubbard, A.L., et al. (2012). Altered integration of speech and gesture in children with autism spectrum disorders. Brain and Behavior.

  • Hyde, D.C., et al. (2010). Visual stimulation enhances auditory processing in 3-month-old infants and adults. Developmental Psychobiology.

  • Irwin, J.R., et al. (2011). Can children with autism spectrum disorders hear a speaking face? Child Development.

  • Kikuchi, Y., et al. (2011). Atypical disengagement from faces and its modulation by the control of eye fixation in children with autism spectrum disorder. Journal of Autism and Developmental Disorders.

  • Knowland, V.C.P., et al. (2014). Audio-visual speech perception: A developmental ERP investigation. Developmental Science.

  • Kushnerenko, E., et al. (2013). Brain responses and looking behavior during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life. Frontiers in Psychology.

  • Kwakye, L.D., et al. (2011). Altered auditory and multisensory temporal processing in autism spectrum disorders. Frontiers in Integrative Neuroscience.

  • Lewkowicz, D.J., et al. (2014). The audiovisual temporal binding window narrows in early childhood. Child Development.

  • Lewkowicz, D.J. (2010). Infant perception of audio-visual speech synchrony. Developmental Psychology.