Cognition (Elsevier)

Volume 124, Issue 2, August 2012, Pages 194-200

Sounds exaggerate visual shape

https://doi.org/10.1016/j.cognition.2012.04.009

Abstract

While perceiving speech, people see mouth shapes that are systematically associated with sounds. In particular, a vertically stretched mouth produces a /woo/ sound, whereas a horizontally stretched mouth produces a /wee/ sound. We demonstrate that hearing these speech sounds alters how we see aspect ratio, a basic visual feature that contributes to perception of 3D space, objects and faces. Hearing a /woo/ sound increases the apparent vertical elongation of a shape, whereas hearing a /wee/ sound increases the apparent horizontal elongation. We further demonstrate that these sounds influence aspect ratio coding. Viewing and adapting to a tall (or flat) shape makes a subsequently presented symmetric shape appear flat (or tall). These aspect ratio aftereffects are enhanced when associated speech sounds are presented during the adaptation period, suggesting that the sounds influence visual population coding of aspect ratio. Taken together, these results extend previous demonstrations that visual information constrains auditory perception by showing the converse: speech sounds influence visual perception of a basic geometric feature.

Highlights

► Speech sounds cross-modally exaggerate visual perception of associated shapes.
► Speech sounds enhance visual adaptation to associated shapes.
► Cross-modal speech experience influences population coding of a basic visual feature.

Introduction

Mouth shapes are systematically associated with sounds due to the anatomy of vocalization (e.g., Liberman and Mattingly, 1985, Sapir, 1929, Yehia et al., 1998). Experiencing these crossmodal associations may lead to neural connectivity or multimodal tuning for visual processing of mouth shapes and auditory processing of speech sounds (Nath and Beauchamp, 2011, Wilson, 2002). Indeed, patches of temporal cortex are activated more strongly by combinations of faces and voices than by either alone (Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004), and viewing silent lip movements activates the auditory cortex (Calvert et al., 1997). Behaviorally, presenting a talking face influences speech perception in infants (Kuhl & Meltzoff, 1982) and improves speech recognition in adults (von Kriegstein et al., 2008). In a classic study, McGurk and MacDonald (1976) demonstrated that auditory perception of a phoneme was altered by a concurrently presented face pronouncing a different phoneme.

Because audition is usually regarded as the primary modality for speech perception, speech-related auditory–visual interactions have been evaluated in terms of how looking at the mouth influences hearing of speech. Here we investigated the converse. Does hearing speech sounds alter how we see shapes? We examined the perception of aspect ratio (horizontal or vertical elongation) for two reasons. First, aspect ratio is a fundamental visual feature that is population-coded in the ventral visual pathway (see Suzuki (2005) for a review), contributing to perception of 3D space, objects and faces (e.g., Biederman, 2001, Knill, 1998a, Knill, 1998b, Young and Yamane, 1992). Second, horizontal and vertical mouth elongations are ubiquitous in speech production, with a horizontally elongated mouth typically producing a /wee/ sound and a vertically elongated mouth typically producing a /woo/ sound. We thus used flat (horizontally elongated) and tall (vertically elongated) ellipses as the visual stimuli and /wee/ and /woo/ sounds as the auditory stimuli. We used simple ellipses rather than images of mouths so that observers would be unaware of the relationship between the sounds and aspect ratios. This was important because we wanted to test the hypothesis that consistent auditory–visual coincidences during speech perception give rise to general auditory–visual associations that influence visual perception at the level of basic shape coding. If the experience of looking at mouth shapes while listening to speech establishes associations between auditory representations of phonemes and visual representations of associated shapes, hearing a /wee/ sound may make a flat ellipse appear even flatter and hearing a /woo/ sound may make a tall ellipse appear even taller (Fig. 1a).

Section snippets

Experiment 1: Speech sounds exaggerate appearances of associated visual aspect ratios

We examined perception of a briefly flashed ellipse in three conditions. In the consistent-sound condition, an ellipse was presented with a consistent speech sound (a flat ellipse with a /wee/ sound or a tall ellipse with a /woo/ sound). In the inconsistent-sound condition, an ellipse was presented with an inconsistent speech sound (a flat ellipse with a /woo/ sound or a tall ellipse with a /wee/ sound). In the control (environmental-sound) condition, an ellipse was presented with an environmental sound.

Experiment 2: Speech sounds influence the population coding of aspect ratio

We have demonstrated that speech sounds associated with tall and flat mouth shapes implicitly (i.e., with no explicit awareness of the auditory–visual associations) exaggerate visual aspect ratios of simple ellipses. A potential mechanism of this crossmodal effect is that hearing speech sounds enhances responses of visual neurons tuned to the associated aspect ratios. To psychophysically evaluate this hypothesis, we investigated the speech sounds’ influences on aspect-ratio aftereffects: after adapting to a tall (or flat) shape, a subsequently presented symmetric shape appears flat (or tall).
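The population-coding logic behind this prediction can be illustrated with a toy simulation. In the sketch below (all parameters — the tuning width, adapter value, and gain-suppression strengths — are illustrative assumptions, not values fitted in the paper), a bank of units with Gaussian tuning over log aspect ratio encodes shape; adaptation suppresses the gain of units tuned near the adapter, which repels the decoded aspect ratio of a symmetric test shape away from the adapter. If a consistent sound deepens that suppression, the aftereffect grows, which is the pattern the experiment tests for.

```python
import numpy as np

def population_decode(stim, prefs, gains, sigma=0.5):
    """Gaussian tuning over log aspect ratio; population-vector readout."""
    resp = gains * np.exp(-(stim - prefs) ** 2 / (2 * sigma ** 2))
    return np.sum(resp * prefs) / np.sum(resp)

# Preferred log aspect ratios of the model units (0 = symmetric circle).
prefs = np.linspace(-2.0, 2.0, 81)
baseline = np.ones_like(prefs)

ADAPTER = 0.8  # a "tall" adapting shape (positive log aspect ratio)

def adapt_gains(strength):
    """Adaptation suppresses gain of units tuned near the adapter."""
    return baseline - strength * np.exp(-(ADAPTER - prefs) ** 2 / (2 * 0.5 ** 2))

# Decode a symmetric test shape (log aspect ratio 0) after adaptation.
silent = population_decode(0.0, prefs, adapt_gains(0.30))
# Hypothetical assumption: a consistent /woo/ sound deepens the suppression.
with_sound = population_decode(0.0, prefs, adapt_gains(0.45))

# Both decoded values are negative (the symmetric test appears flat),
# and the shift is larger when the sound accompanied adaptation.
print(silent, with_sound)
```

Because the numerator of the population-vector readout loses responses from units preferring tall shapes, the decoded value is repelled below zero; the stronger suppression simply scales that repulsion, matching the enhanced-aftereffect prediction qualitatively.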

Acknowledgments

This research was supported by National Institutes of Health Grants R01 EY018197-02S1, EY018197, EY021184, and T32 EY007043, and National Science Foundation Grant BCS 0643191.

References (41)

  • J. Wolfe, Short test flashes produce large tilt aftereffects, Vision Research (1984)
  • H. Yehia et al., Quantitative association of vocal tract and facial behavior, Speech Communication (1998)
  • M.S. Beauchamp et al., Unraveling multisensory integration: Patchy organization within human STS multisensory cortex, Nature Neuroscience (2004)
  • I.H. Bernstein et al., Effects of some variations in auditory input upon visual choice reaction time, Journal of Experimental Psychology (1971)
  • I. Biederman, Recognizing depth-rotated objects: A review of recent research and theory, Spatial Vision (2001)
  • D.H. Brainard, The psychophysics toolbox, Spatial Vision (1997)
  • G.A. Calvert et al., Activation of auditory cortex during silent lip reading, Science (1997)
  • L. Fadiga et al., Speech listening specifically modulates the excitability of tongue muscles: A TMS study, European Journal of Neuroscience (2002)
  • B. Galantucci et al., The motor theory of speech reviewed, Psychonomic Bulletin & Review (2006)
  • A. Gallace et al., Multisensory synesthetic interactions in the speeded classification of visual size, Perception & Psychophysics (2006)