Sounds exaggerate visual shape
Highlights
► Speech sounds cross-modally exaggerate visual perception of associated shapes. ► Speech sounds enhance visual adaptation to associated shapes. ► Cross-modal speech experience influences population coding of a basic visual feature.
Introduction
Mouth shapes are systematically associated with sounds due to the anatomy of vocalization (e.g., Liberman and Mattingly, 1985, Sapir, 1929, Yehia et al., 1998). Experiencing these crossmodal associations may lead to neural connectivity or multimodal tuning linking visual processing of mouth shapes with auditory processing of speech sounds (Nath and Beauchamp, 2011, Wilson, 2002). Indeed, patches of temporal cortex are activated more strongly by combinations of faces and voices than by either alone (Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004), and silent lip reading activates auditory cortex (Calvert et al., 1997). Behaviorally, presenting a talking face influences speech perception in infants (Kuhl & Meltzoff, 1982) and improves speech recognition in adults (von Kriegstein et al., 2008). In a classic study, McGurk and MacDonald (1976) demonstrated that auditory perception of a phoneme was altered by a concurrently presented face pronouncing a different phoneme.
Because audition is usually regarded as the primary modality for speech perception, speech-related auditory–visual interactions have typically been evaluated in terms of how looking at the mouth influences how speech is heard. Here we investigated the converse: does hearing speech sounds alter how we see shapes? We examined the perception of aspect ratio (horizontal or vertical elongation) for two reasons. First, aspect ratio is a fundamental visual feature that is population-coded in the ventral visual pathway (see Suzuki (2005) for a review), contributing to the perception of 3D space, objects, and faces (e.g., Biederman, 2001, Knill, 1998a, Knill, 1998b, Young and Yamane, 1992). Second, horizontal and vertical mouth elongations are ubiquitous in speech production: a horizontally elongated mouth typically produces a /wee/ sound, and a vertically elongated mouth typically produces a /woo/ sound. We thus used flat (horizontally elongated) and tall (vertically elongated) ellipses as the visual stimuli and /wee/ and /woo/ sounds as the auditory stimuli. We used simple ellipses rather than images of mouths so that observers would remain unaware of the relationship between the sounds and the aspect ratios. This was important because we wanted to test the hypothesis that consistent auditory–visual coincidences during speech perception build general auditory–visual associations that influence visual perception at the level of basic shape coding. If the experience of looking at mouth shapes while listening to speech establishes associations between auditory representations of phonemes and visual representations of the associated shapes, hearing a /wee/ sound may make a flat ellipse appear even flatter, and hearing a /woo/ sound may make a tall ellipse appear even taller (Fig. 1a).
Section snippets
Experiment 1: Speech sounds exaggerate appearances of associated visual aspect ratios
We examined perception of a briefly flashed ellipse in three conditions. In the consistent-sound condition, an ellipse was presented with a consistent speech sound (a flat ellipse with a /wee/ sound or a tall ellipse with a /woo/ sound). In the inconsistent-sound condition, an ellipse was presented with an inconsistent speech sound (a flat ellipse with a /woo/ sound or a tall ellipse with a /wee/ sound). In the control (environmental-sound) condition, an ellipse was presented with an environmental sound unrelated to speech.
Experiment 2: Speech sounds influence the population coding of aspect ratio
We have demonstrated that speech sounds associated with tall and flat mouth shapes implicitly (i.e., with no explicit awareness of the auditory–visual associations) exaggerate the perceived aspect ratios of simple ellipses. A potential mechanism for this crossmodal effect is that hearing a speech sound enhances the responses of visual neurons tuned to the associated aspect ratio. To evaluate this hypothesis psychophysically, we investigated the speech sounds' influence on aspect-ratio aftereffects: when a shape of a given aspect ratio is viewed for a sustained period, a subsequently flashed shape appears elongated in the opposite direction.
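The gain-modulation account can be made concrete with a toy population-code simulation. The sketch below is purely illustrative and not drawn from the study: tuning widths, gain values, and the population-vector decoder are all assumptions. It shows how a sound-driven gain boost for neurons preferring tall shapes shifts the decoded aspect ratio of a tall ellipse toward further elongation, i.e., the exaggeration effect.

```python
import numpy as np

# Toy population code for aspect ratio on a log axis:
# negative = flat (horizontal), positive = tall (vertical).
# All parameters below are illustrative assumptions.
prefs = np.linspace(-1.0, 1.0, 41)   # preferred log aspect ratios
sigma = 0.35                          # tuning-curve width

def responses(stim, gain=None):
    """Gaussian tuning; optional per-neuron multiplicative gain."""
    r = np.exp(-(stim - prefs) ** 2 / (2 * sigma ** 2))
    return r if gain is None else r * gain

def decode(r):
    """Population-vector readout: response-weighted mean preference."""
    return np.sum(r * prefs) / np.sum(r)

stim = 0.3  # a moderately tall ellipse

# Model a /woo/ sound as a gain boost for tall-preferring neurons.
woo_gain = 1.0 + 0.3 * (prefs > 0)

neutral = decode(responses(stim))
with_woo = decode(responses(stim, woo_gain))
assert with_woo > neutral  # readout shifts toward taller: exaggeration
```

Under the same assumptions, reducing (rather than boosting) the responses of the boosted subpopulation after prolonged viewing would shift the readout the opposite way, which is the logic behind using aftereffects as a psychophysical probe of the underlying population response.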
Acknowledgments
This research was supported by National Institutes of Health Grants R01 EY018197-02S1, EY018197, EY021184, and T32 EY007043, and National Science Foundation Grant BCS 0643191.
References
- Surface orientation from texture: Ideal observers, generic observers and the information content of texture cues. Vision Research (1998)
- Discriminating surface slant from texture: Comparing human and ideal observers. Vision Research (1998)
- Distracted and confused? Selective attention under load. Trends in Cognitive Sciences (2005)
- The motor theory of speech perception revised. Cognition (1985)
- Shape discrimination and the judgment of perfect symmetry: Dissociation of shape from size. Vision Research (1992)
- Multisensory contributions to low-level, ‘unisensory’ processing. Current Opinion in Neurobiology (2005)
- Listening to talking faces: Motor cortical activation during speech perception. NeuroImage (2005)
- Auditory–visual crossmodal integration in perception of face gender. Current Biology (2007)
- Attentional selection of overlapped shapes: A study using brief aftereffects. Vision Research (2003)
- Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia (2003)
- Short test flashes produce large tilt aftereffects. Vision Research
- Quantitative association of vocal tract and facial behavior. Speech Communication (1998)
- Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience (2004)
- Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology
- Recognizing depth-rotated objects: A review of recent research and theory. Spatial Vision (2001)
- The psychophysics toolbox. Spatial Vision (1997)
- Activation of auditory cortex during silent lip reading. Science (1997)
- Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience
- The motor theory of speech perception reviewed. Psychonomic Bulletin & Review
- Multisensory synesthetic interactions in the speeded classification of visual size. Perception & Psychophysics
Cited by (27)
- What makes a shape “baba”? The shape features prioritized in sound–shape correspondence change with development. Journal of Experimental Child Psychology (2019). Citation excerpt: “Whereas the bouba-kiki effect was first established in Spanish speakers (Köhler 1929, 1947), it has been found in other spoken languages as well (e.g., Nielsen & Rendall, 2011; Ramachandran & Hubbard, 2003; Westbury, 2005) and in individuals without written language (e.g., Bremner et al., 2013). Sound–shape correspondence is also robust across testing paradigms such as label matching (e.g., Köhler, 1929), speeded classification (e.g., Lupyan & Casasanto, 2015; Westbury, 2005), word generation (e.g., Nielsen & Rendall, 2013), and adaptation paradigms (Sweeny, Guzman-Martinez, Ortega, Grabowecky, & Suzuki, 2012). Despite the robustness of the bouba-kiki effect across languages and methods, several factors are known to influence sound–shape correspondence and its development.”
- Audiovisual crossmodal correspondences: Behavioral consequences and neural underpinnings. Multisensory Perception: From Laboratory to Clinic (2019)
- Role of embodiment and presence in human perception of robots’ facial cues. International Journal of Human-Computer Studies (2018)
- Crossmodal synesthetic congruency improves visual timing in dyslexic children. Research in Developmental Disabilities (2016)
- Vision of tongue movements bias auditory speech perception. Neuropsychologia (2014). Citation excerpt: “In fact, there is no statistical audiovisual association between speech sounds and visible tongue shapes. Furthermore, there is no perceptual similarity between visible mouth configurations and tongue movement (as it was the case for Sweeny et al., 2012). On the other hand, if the visual presentation of tongue movements induces a significant bias on auditory perception then, the sensorimotor hypothesis is more likely to be true.”
- Mental imagery changes multisensory perception. Current Biology (2013)