NeuroImage

Volume 43, Issue 2, 1 November 2008, Pages 379-387

Look who's talking: The deployment of visuo-spatial attention during multisensory speech processing under noisy environmental conditions

https://doi.org/10.1016/j.neuroimage.2008.06.046

Abstract

In a crowded scene we can effectively focus our attention on a specific speaker while largely ignoring sensory inputs from other speakers. How attended speech inputs are extracted from similar competing information has been studied primarily in the auditory domain. Here we examined the deployment of visuo-spatial attention in multiple speaker scenarios. Steady-state visual evoked potentials (SSVEP) were monitored as a real-time index of visual attention towards three competing speakers. Participants were instructed to detect a target syllable uttered by the center speaker and to ignore syllables from two flanking speakers. The study incorporated interference trials (syllables from all three speakers), no-interference trials (syllable from the center speaker only), and periods without speech stimulation in which static faces were presented. An enhancement of the flanking-speaker-induced SSVEP was found 70–220 ms after sound onset over left temporal scalp during interference trials. This enhancement was negatively correlated with participants' behavioral performance: those who showed the largest enhancements had the worst speech recognition performance. Additionally, poorly performing participants exhibited an enhanced flanking-speaker-induced SSVEP over visual scalp during periods without speech stimulation. The present study provides neurophysiologic evidence that the deployment of visuo-spatial attention to flanking speakers interferes with the recognition of multisensory speech signals under noisy environmental conditions.
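
The SSVEP measure relies on frequency tagging: each speaker's video stream flickers at its own rate, so the amplitude of the EEG spectrum at that rate indexes visual attention towards that speaker. The sketch below shows one common way such an amplitude can be extracted from a single channel; the sampling rate, tagging frequency, and simulated data are illustrative assumptions, not the study's actual parameters.

```python
import numpy as np

def ssvep_amplitude(eeg, fs, tag_freq):
    """Spectral amplitude of a single EEG channel at a given tagging frequency.

    eeg      : 1-D array, one EEG channel segment
    fs       : sampling rate in Hz (illustrative value below)
    tag_freq : flicker rate assigned to one speaker's video stream
    """
    n = len(eeg)
    windowed = eeg * np.hanning(n)                        # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed)) * 2.0 / n    # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - tag_freq))]  # bin closest to the tag frequency

# Illustrative use with simulated data: a 2-s segment at 500 Hz containing a 15 Hz component.
fs = 500
t = np.arange(0, 2.0, 1.0 / fs)
eeg = 2.0 * np.sin(2 * np.pi * 15 * t) + 0.5 * np.random.randn(t.size)
print(ssvep_amplitude(eeg, fs, tag_freq=15.0))
```

Comparing such amplitudes for the flanking-speaker tag across conditions and electrodes, per participant, is the kind of contrast the reported SSVEP enhancement refers to.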

Section snippets

Participants

Seventeen neurologically normal paid volunteers participated in the study. Four participants were excluded from the analysis on the basis of extensive eye movement artifacts. The remaining thirteen participants (all right-handed, mean age 25.5 y, range 19–35 y, 6 females) reported normal hearing and had normal or corrected-to-normal vision. The Institutional Review Board of the Nathan Kline Institute for Psychiatric Research approved the experimental procedures, and each subject provided written informed consent.

Behavioral results

An analysis of variance (ANOVA) on reaction times (RTs), time-locked to the onset of lip movements in the multisensory task, showed significantly shorter RTs in the no-interference (689 ms) compared to the interference trials (728 ms; F(1,12) = 16.22, p < .002). In addition, there was a higher hit rate (HR) in the no-interference (93%) compared to interference trials (76%; F(1,12) = 11.36, p < .007), whereas the false alarm (FA) rate did not differ significantly between no-interference (0.5%) and interference trials
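
For the within-subject contrasts reported here, an F(1,12) from a two-level repeated-measures ANOVA is mathematically equivalent to the square of a paired t statistic over the 13 participants. A minimal sketch of that equivalence, using made-up per-participant mean RTs rather than the study's data:

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant mean RTs (ms) for 13 participants; illustrative values only.
rt_no_interference = np.array([650, 700, 670, 690, 710, 680, 660, 705, 695, 675, 685, 700, 690], float)
rt_interference    = np.array([700, 740, 705, 720, 755, 715, 700, 745, 735, 710, 725, 740, 730], float)

# With two within-subject levels, a repeated-measures ANOVA reduces to a paired t-test: F(1, n-1) = t^2.
t_stat, p_val = stats.ttest_rel(rt_interference, rt_no_interference)
print(f"F(1,{rt_interference.size - 1}) = {t_stat**2:.2f}, p = {p_val:.4f}")
```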

Discussion

In this study we examined the effects of visuo-spatial attention on multisensory speech processing in a multiple speaker scenario. By monitoring SSVEP as a real-time index of the allocation of visual attention, we observed that the deployment of attention towards the visual inputs from the flanking speakers interferes with speech recognition performance. We discuss these findings in detail below.

Summary and conclusion

The present study provides, to our knowledge, the first neurophysiologic evidence for the important role of visuo-spatial attention in speech recognition during multiple speaker interference conditions. Participants who showed stronger responses to the visual inputs from flanking speakers showed greater disruption of speech recognition performance than those with weaker responses to these inputs. This raises the question of the general nature of visual information processing in multiple speaker scenarios.

Acknowledgments

This work was supported by a grant from the U.S. National Institute of Mental Health (MH65350) to Dr. J.J. Foxe. Dr. D. Senkowski received support from a NARSAD young investigator award and the German Research Foundation (SE 1859/1-1). We would like to thank Dr. Simon Kelly and Dr. Alexander Maye for discussions of the data and analysis, Marina Shpaner and Jennifer Montesi for their technical assistance, and three anonymous reviewers for their helpful comments during revision of this article.

Cited by (44)

    • Neural correlates of multisensory enhancement in audiovisual narrative speech perception: A fMRI investigation

      2022, NeuroImage
      Citation Excerpt:

      Despite the fact that most of us are generally poor lip readers (Tye-Murray et al., 2007), the enhancing effects of visual speech can be dramatic, rendering mostly indecipherable vocalizations clearly audible (Ross, Saint-Amour, Leavitt, Javitt, et al., 2007; Sumby, 1954). This well-known “principle of inverse effectiveness” (Meredith & Stein, 1986; Stein et al., 1988; Stein & Meredith, 1993) holds that multisensory enhancement generally increases with the degradation of the unisensory signals and has been shown across species (Stein et al., 1993) and experimental approaches (Sumby, 1954; van de Rijt et al., 2019) (James, 2012; Ross, Saint-Amour, Leavitt, Javitt, et al., 2007; Stevenson et al., 2012). In the human brain the effect of congruent visual information can be observed at the neural level where low frequency neural activity phase locks to the temporal envelope of speech (Zion Golumbic et al., 2013) and has been shown to be enhanced in degraded auditory speech conditions (Crosse et al., 2016).

    • The interactions of multisensory integration with endogenous and exogenous attention

      2016, Neuroscience and Biobehavioral Reviews
      Citation Excerpt:

      The superior colliculus (SC) is part of the midbrain and contains a large number of multisensory neurons that play an important role in the integration of information from the somatosensory, visual and auditory modalities (Fairhall and Macaluso, 2009; Meredith and Stein, 1996; Wallace et al., 1998). The superior temporal sulcus (STS), which is an association cortex, mediates multisensory benefits at the level of object recognition (Werner and Noppeney, 2010b), especially for biologically relevant stimuli from different modalities; such stimuli include speech (Senkowski et al., 2008), faces/voices (Ghazanfar et al., 2005), and real-life objects (Beauchamp et al., 2004; Werner and Noppeney, 2010a). Posterior parietal regions such as the superior parietal lobule (SPL) and intraparietal sulcus (IPS) can mediate behavioral multisensory facilitation effects (Molholm et al., 2006; Werner and Noppeney, 2010a) through anticipatory motor control (Krause et al., 2012b).

    • Early visual and auditory processing rely on modality-specific attentional resources

      2013, NeuroImage
      Citation Excerpt:

      However, our exploratory analyses link attentional modulation of brain activity to behavioral performance, which supports an active suppression mechanism (at least in the visual case): the group of participants that showed stronger reduction of visual processing during attention to auditory input performed better in the auditory task. Senkowski et al. (2008) reported a similar link between SSR amplitude modulation and behavioral performance in a multisensory attention task. In their study, participants had to attend to a frequency-tagged audiovisual speaker presentation (i.e. a flickering video of a face uttering syllables) that was flanked by two other frequency-tagged speakers.
