Automatic audiovisual integration in speech perception

Gentilucci, Maurizio; Cattaneo, Luigi

doi:10.1007/s00221-005-0008-z

Research article
Published: 21 July 2005

Volume 167, pages 66–75, (2005)

Experimental Brain Research

Maurizio Gentilucci¹ &
Luigi Cattaneo¹

Abstract

Two experiments tested whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech, and whether this audiovisual integration is based on cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of the phonemic string) whose mouth, in contrast, mimed the pronunciation of a different string (visual presentation). In a control experiment, participants read the same strings presented as printed letters. This condition was used to analyze voice patterns and lip kinematics while controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e., when the articulatory mouth gestures were congruent with the emitted string of phonemes, the voice spectrum and the lip kinematics varied according to the pronounced string of phonemes. In the McGurk paradigm, the participants were unaware of the incongruence between the visual and acoustical stimuli. Acoustical analysis of the participants’ spoken responses showed three distinct patterns: fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, repetition of the string of phonemes corresponding to the mouth gestures mimed by the actor. However, analysis of the latter two response types showed that the second formant (F2) of the participants’ voice spectra always differed from the value recorded in the congruent audiovisual presentation: it approached the F2 value of the string of phonemes presented in the other modality, which was apparently ignored. When participants repeated the acoustically presented string of phonemes, their lip kinematics were influenced by the observed lip movements mimed by the actor, but only when pronouncing a labial consonant.
The data are discussed in support of the hypothesis that features of both the visual and acoustical inputs always contribute to the representation of a string of phonemes, and that cross-modal integration occurs by extracting the mouth articulation features specific to the pronunciation of that string of phonemes.

Acknowledgements

We wish to thank Paola Santunione and Andrea Candiani for their help in carrying out the experiments and Dr. Cinzia Di Dio for comments on the manuscript. The work was supported by a grant from MIUR (Ministero dell’Istruzione, dell’Università e della Ricerca) to M.G.

Author information

Authors and Affiliations

Dipartimento di Neuroscienze, Università di Parma, Via Volturno 39, 43100, Parma, Italy
Maurizio Gentilucci & Luigi Cattaneo

Corresponding author

Correspondence to Maurizio Gentilucci.

About this article

Cite this article

Gentilucci, M., Cattaneo, L. Automatic audiovisual integration in speech perception. Exp Brain Res 167, 66–75 (2005). https://doi.org/10.1007/s00221-005-0008-z

Received: 13 September 2004
Accepted: 30 March 2005
Published: 21 July 2005
Issue Date: November 2005
DOI: https://doi.org/10.1007/s00221-005-0008-z