Abstract
All primate vocal signals are produced through the coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Fitch & Hauser, 1995; Ghazanfar & Rendall, 2008). As a result, humans and other primates share a remarkable number of similarities in their vocal signaling, ranging from production mechanisms and signal structure to the inextricable link between vision and audition. In human speech, the signal, across all languages and contexts, is amplitude modulated, consisting of a rhythm that ranges from 2 to 7 Hz (Drullman, 1995; Greenberg et al., 2003; Chandrasekaran et al., 2009), roughly matching the timescale for syllable production. Temporal modulation in similar frequency ranges also seems to be a common feature of several nonhuman primate vocalizations. For example, common marmoset (Callithrix jacchus) twitter calls are modulated in the 5–9 Hz range (Wang et al., 1995); squirrel monkey (Saimiri sciureus) vocalizations in the 6–10 Hz range (Godey et al., 2005); and cotton-top tamarins (Saguinus oedipus), macaques (Macaca spp.), and chimpanzees (Pan troglodytes) all seem to have vocalizations modulated in the 3–8 Hz range (Cohen et al., 2007).
References
Abry, C., Lallouache, M. T., & Cathiard, M. A. (1996). How can coarticulation models account for speech sensitivity in audio-visual desynchronization? In D. Stork & M. Henneke (Eds.), Speechreading by humans and machines: Models, systems and applications (pp. 247–255). Berlin: Springer-Verlag.
Barnes, C. L., & Pandya, D. N. (1992). Efferent cortical connections of multimodal cortex of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology, 318(2), 222–244.
Barraclough, N. E., Xiao, D., Baker, C. I., Oram, M. W., & Perrett, D. I. (2005). Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience, 17(3), 377–391.
Baylis, G. C., Rolls, E. T., & Leonard, C. M. (1987). Functional subdivisions of the temporal lobe neocortex. The Journal of Neuroscience, 7(2), 330–342.
Bernstein, L. E., Auer, E. T., & Takayanagi, S. (2004). Auditory speech detection in noise enhanced by lipreading. Speech Communication, 44(1–4), 5–18.
Besle, J., Fort, A., Delpuech, C., & Giard, M.-H. (2004). Bimodal speech: Early suppressive visual effects in human auditory cortex. European Journal of Neuroscience, 20(8), 2225–2234.
Bruce, C., Desimone, R., & Gross, C. G. (1981). Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology, 46(2), 369–384.
Buzsaki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304, 1926–1929.
Cappe, C., & Barone, P. (2005). Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience, 22(11), 2886–2902.
Chandrasekaran, C., & Ghazanfar, A. A. (2009). Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology, 101(2), 773–788.
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5(7), e1000436.
Chandrasekaran, C., Lemus, L., Trubanova, A., Gondan, M., & Ghazanfar, A. A. (2011). Monkeys and humans share a common computation for face/voice integration. PLoS Computational Biology, 7, e1002165.
Cohen, Y. E., Theunissen, F., Russ, B. E., & Gill, P. (2007). Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology, 97(2), 1470–1484.
Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron, 57(1), 11–23.
Drullman, R. (1995). Temporal envelope and fine structure cues for speech intelligibility. The Journal of the Acoustical Society of America, 97(1), 585–592.
Drullman, R., Festen, J. M., & Plomp, R. (1994). Effect of reducing slow temporal modulations on speech reception. The Journal of the Acoustical Society of America, 95(5 Pt 1), 2670–2680.
Ettlinger, G., & Wilson, W. A. (1990). Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research, 40, 169–192.
Evans, T. A., Howell, S., & Westergaard, G. C. (2005). Auditory-visual cross-modal perception of communicative stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology: Animal Behavior Processes, 31, 399–406.
Fitch, W. T. (1997). Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. The Journal of the Acoustical Society of America, 102(2), 1213–1222.
Fitch, W. T., & Hauser, M. (1995). Vocal production in nonhuman primates: Acoustics, physiology, and functional constraints on “honest” advertisement. American Journal of Primatology, 37(3), 191–219.
Fu, K. M. G., Shah, A. S., O’Connell, M. N., McGinnis, T., Eckholdt, H., Lakatos, P., et al. (2004). Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. Journal of Neurophysiology, 92(6), 3522–3531.
Ghazanfar, A. A., & Chandrasekaran, C. F. (2007). Paving the way forward: Integrating the senses through phase-resetting of cortical oscillations. Neuron, 53(2), 162–164.
Ghazanfar, A. A., & Logothetis, N. K. (2003). Facial expressions linked to monkey calls. Nature, 423(6943), 937–938.
Ghazanfar, A. A., & Rendall, D. (2008). Evolution of human vocal production. Current Biology, 18(11), R457–R460.
Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10(6), 278–285.
Ghazanfar, A. A., Smith-Rohrberg, D., & Hauser, M. D. (2001). The role of temporal cues in rhesus monkey vocal recognition: Orienting asymmetries to reversed calls. Brain, Behavior and Evolution, 58(3), 163–172.
Ghazanfar, A. A., Smith-Rohrberg, D., Pollen, A. A., & Hauser, M. D. (2002). Temporal cues in the perception of long calls by tamarins. Animal Behaviour, 64, 427–438.
Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., & Logothetis, N. K. (2005). Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. The Journal of Neuroscience, 25(20), 5004–5012.
Ghazanfar, A. A., Nielsen, K., & Logothetis, N. K. (2006). Eye movements of monkey observers viewing vocalizing conspecifics. Cognition, 101(3), 515–529.
Ghazanfar, A. A., Turesson, H. K., Maier, J. X., van Dinther, R., Patterson, R. D., & Logothetis, N. K. (2007). Vocal-tract resonances as indexical cues in rhesus monkeys. Current Biology, 17(5), 425–430.
Ghazanfar, A. A., Chandrasekaran, C., & Logothetis, N. K. (2008). Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. The Journal of Neuroscience, 28(17), 4457–4469.
Ghazanfar, A. A., Chandrasekaran, C., & Morrill, R. J. (2010). Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31(10), 1807–1817.
Godey, B. T., Atencio, C. A., Bonham, B. H., Schreiner, C. E., & Cheung, S. W. (2005). Functional organization of squirrel monkey primary auditory cortex: Responses to frequency-modulation sweeps. Journal of Neurophysiology, 94(2), 1299–1311.
Gothard, K. M., Battaglia, F. P., Erickson, C. A., Spitler, K. M., & Amaral, D. G. (2007). Neural responses to facial expression and face identity in the monkey amygdala. Journal of Neurophysiology, 97(2), 1671–1683.
Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties of spontaneous speech—a syllable-centric perspective. Journal of Phonetics, 31(3–4), 465–485.
Hauser, M. D., & Andersson, K. (1994). Left hemisphere dominance for processing vocalizations in adult, but not infant, rhesus monkeys: Field experiments. Proceedings of the National Academy of Sciences of the USA, 91(9), 3946–3948.
Hauser, M. D., & Ybarra, M. S. (1994). The role of lip configuration in monkey vocalizations—experiments using xylocaine as a nerve block. Brain and Language, 46(2), 232–244.
Hauser, M. D., Evans, C. S., & Marler, P. (1993). The role of articulation in the production of rhesus-monkey, Macaca mulatta, vocalizations. Animal Behaviour, 45(3), 423–433.
Hauser, M. D., Agnetta, B., & Perez, C. (1998). Orienting asymmetries in rhesus monkeys: The effect of time-domain changes on acoustic perception. Animal Behaviour, 56(1), 41–47.
Helfer, K. S., & Freyman, R. L. (2005). The role of visual speech cues in reducing energetic and informational masking. The Journal of the Acoustical Society of America, 117(2), 842–849.
Hikosaka, K., Iwai, E., Saito, H., & Tanaka, K. (1988). Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology, 60(5), 1615–1637.
Jordan, K. E., Brannon, E. M., Logothetis, N. K., & Ghazanfar, A. A. (2005). Monkeys match the number of voices they hear to the number of faces they see. Current Biology, 15(11), 1034–1038.
Kayser, C., & Logothetis, N. K. (2009). Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience, 3, 7. doi: 10.3389/neuro.07.007.2009.
Kayser, C., Petkov, C. I., & Logothetis, N. K. (2008). Visual modulation of neurons in auditory cortex. Cerebral Cortex, 18, 1560–1574.
Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002). Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry, 59, 809–816.
Kuraoka, K., & Nakamura, K. (2007). Responses of single neurons in monkey amygdala to facial and vocal emotions. Journal of Neurophysiology, 97(2), 1379–1387.
Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.
Lakatos, P., Chen, C.-M., O’Connell, M. N., Mills, A., & Schroeder, C. E. (2007). Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron, 53(2), 279–292.
Lansing, C. R., & McConkie, G. W. (2003). Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception & Psychophysics, 65(4), 536–552.
Logothetis, N. K. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philosophical Transactions of the Royal Society B: Biological Sciences, 357(1424), 1003–1037.
MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21(4), 499–511.
Munhall, K. G., & Tohkura, Y. (1998). Audiovisual gating and the time course of speech perception. The Journal of the Acoustical Society of America, 104(1), 530–539.
Munhall, K. G., & Vatikiotis-Bateson, E. (2004). Spatial and temporal constraints on audiovisual speech perception. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes (pp. 177–188). Cambridge, MA: MIT Press.
Okun, M., Naim, A., & Lampl, I. (2010). The subthreshold relation between cortical local field potential and neuronal firing unveiled by intracellular recordings in awake rats. The Journal of Neuroscience, 30(12), 4440–4448.
Otero-Millan, J., Troncoso, X. G., Macknik, S. L., Serrano-Pedraza, I., & Martinez-Conde, S. (2008). Saccades and microsaccades during visual fixation, exploration, and search: Foundations for a common saccadic generator. Journal of Vision, 8(14), 1–18.
Parr, L. A. (2004). Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition. Animal Cognition, 7(3), 171–178.
Partan, S. R. (2002). Single and multichannel facial composition: Facial expressions and vocalizations of rhesus macaques (Macaca mulatta). Behaviour, 139, 993–1027.
Pevsner, J. (2002). Leonardo da Vinci’s contributions to neuroscience. Trends in Neurosciences, 25, 217–220.
Puce, A., & Perrett, D. I. (2003). Electrophysiology and brain imaging of biological motion. Philosophical Transactions of the Royal Society B: Biological Sciences, 358(1431), 435–445.
Rajkai, C., Lakatos, P., & Schroeder, C. E. (2008). Visual fixation-related neuronal activity in the auditory cortex of monkeys. Paper presented at the Society for Neuroscience, Washington, DC, November 15–19. Abstract 770.2
Redican, W. K. (1975). Facial expressions in nonhuman primates. In L. A. Rosenblum (Ed.), Primate behavior: Developments in field and laboratory research (pp. 103–194). New York: Academic Press.
Romanski, L. M., Averbeck, B. B., & Diltz, M. (2005). Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology, 93(2), 734–747.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17(5), 1147–1153.
Scalaidhe, S. P., Albright, T. D., Rodman, H. R., & Gross, C. G. (1995). Effects of superior temporal polysensory area lesions on eye movements in the macaque monkey. Journal of Neurophysiology, 73(1), 1–19.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 12(3), 106–113.
Schwartz, J.-L., Berthommier, F., & Savariaux, C. (2004). Seeing to hear better: Evidence for early audio-visual interactions in speech identification. Cognition, 93(2), B69–B78.
Seltzer, B., & Pandya, D. N. (1994). Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology, 343(3), 445–463.
Senkowski, D., Schneider, T. R., Foxe, J. J., & Engel, A. K. (2008). Crossmodal binding through neural coherence: Implications for multisensory processing. Trends in Neurosciences, 31(8), 401–409.
Shannon, R. V., Zeng, F.-G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270(5234), 303–304.
Shepherd, S. V., Steckenfinger, S. A., Hasson, U., & Ghazanfar, A. A. (2010). Human-monkey gaze correlations reveal convergent and divergent patterns of movie viewing. Current Biology, 20, 649–656.
Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416(6876), 87–90.
Sugihara, T., Diltz, M. D., Averbeck, B. B., & Romanski, L. M. (2006). Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. The Journal of Neuroscience, 26(43), 11138–11147.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212–215.
Teufel, C., Ghazanfar, A. A., & Fischer, J. (2010). On the relationship between lateralized brain function and orienting asymmetries. Behavioral Neuroscience, 124, 437–445.
van der Horst, R., Leeuw, A. R., & Dreschler, W. A. (1999). Importance of temporal-envelope cues in consonant recognition. The Journal of the Acoustical Society of America, 105(3), 1801–1809.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the USA, 102(4), 1181–1186.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45(3), 598–607.
Vatikiotis-Bateson, E., Eigsti, I. M., Yano, S., & Munhall, K. G. (1998). Eye movement of perceivers during audiovisual speech perception. Perception & Psychophysics, 60(6), 926–940.
Wang, X., Merzenich, M. M., Beitel, R., & Schreiner, C. E. (1995). Representation of a species-specific vocalization in the primary auditory cortex of the common marmoset: Temporal and spectral characteristics. Journal of Neurophysiology, 74(6), 2685–2706.
Werner-Reiss, U., Kelly, K. A., Trause, A. S., Underhill, A. M., & Groh, J. M. (2003). Eye position affects activity in primary auditory cortex of primates. Current Biology, 13(7), 554–562.
Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion and speech acoustics. Journal of Phonetics, 30(3), 555–568.
Acknowledgments
The authors gratefully acknowledge the scientific contributions of, and numerous discussions with, Ipek Kulahci, Joost Maier, Darshana Narayanan, Stephen Shepherd, and Daniel Takahashi. This work was supported by NIH R01NS054898, an NSF BCS-0547760 CAREER Award, and the James S. McDonnell Scholar Award.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Ghazanfar, A.A., Chandrasekaran, C. (2013). The Influence of Vision on Auditory Communication in Primates. In: Cohen, Y., Popper, A., Fay, R. (eds) Neural Correlates of Auditory Cognition. Springer Handbook of Auditory Research, vol 45. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-2350-8_7
DOI: https://doi.org/10.1007/978-1-4614-2350-8_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-2349-2
Online ISBN: 978-1-4614-2350-8
eBook Packages: Biomedical and Life Sciences (R0)