Abstract
Research has shown that auditory speech recognition is influenced by the appearance of a talker’s face, but the actual nature of this visual information has yet to be established. Here, we report three experiments that investigated visual and audiovisual speech recognition using color, gray-scale, and point-light talking faces (which allowed comparison with the influence of isolated kinematic information). Auditory and visual forms of the syllables /ba/, /bi/, /ga/, /gi/, /va/, and /vi/ were used to produce auditory, visual, congruent, and incongruent audiovisual speech stimuli. Visual speech identification and visual influences on identifying the auditory components of congruent and incongruent audiovisual speech were identical for color and gray-scale faces and were much greater than for point-light faces. These results indicate that luminance, rather than color, underlies visual and audiovisual speech perception and that this information is more than the kinematic information provided by point-light faces. Implications for processing visual and audiovisual speech are discussed.
The authorship of this article is alphabetical and reflects the equal contributions to this work by each author.
Cite this article
Jordan, T. R., McCotter, M. V., & Thomas, S. M. (2000). Visual and audiovisual speech perception with color and gray-scale facial images. Perception & Psychophysics, 62, 1394–1404. https://doi.org/10.3758/BF03212141