Abstract
We conducted three experiments to examine how gaze behavior and fixation influence audiovisual speech perception in a task that required subjects to report the speech sound they perceived during the presentation of congruent and incongruent (McGurk) audiovisual stimuli. Experiment 1 showed that the subjects’ natural gaze behavior rarely involved fixations beyond the oral and ocular regions of the talker’s face and that these fixations did not predict the likelihood of perceiving the McGurk effect. Experiments 2 and 3 showed that manipulating the subjects’ gaze fixations within the talker’s face did not substantially influence audiovisual speech perception and that the McGurk effect was significantly lessened only when gaze was displaced more than 10°–20° from the talker’s mouth. Nevertheless, the effect persisted under such eccentric viewing conditions and became negligible only when the subject’s gaze was directed 60° eccentrically. These findings demonstrate that analysis of the high spatial frequency information afforded by direct oral foveation is not necessary for the successful processing of visual speech information.
The Canadian Institutes of Health Research (M.P.), the Natural Sciences and Engineering Research Council of Canada, and the National Institutes of Health (K.G.M.) supported this work.
Paré, M., Richler, R.C., ten Hove, M. et al. Gaze behavior in audiovisual speech perception: The influence of ocular fixations on the McGurk effect. Perception & Psychophysics 65, 553–567 (2003). https://doi.org/10.3758/BF03194582