Cognition

Volume 80, Issues 1–2, June 2001, Pages 97–126

Auditory and visual objects

https://doi.org/10.1016/S0010-0277(00)00155-4

Abstract

Notions of objecthood have traditionally been cast in visuocentric terminology. As a result, theories of auditory and cross-modal perception have focused more on the differences between modalities than on the similarities. In this paper we re-examine the concept of an object in a way that overcomes the limitations of the traditional perspective. We propose a new, cross-modal conception of objecthood which focuses on the similarities between modalities instead of the differences. Further, we propose that the auditory system might consist of two parallel streams of processing (the ‘what’ and ‘where’ subsystems) in a manner analogous to current conceptions of the visual system. We suggest that the ‘what’ subsystems in each modality are concerned with objecthood. Finally, we present evidence for – and elaborate on – the hypothesis that the auditory ‘where’ subsystem is in the service of the visual-motor ‘where’ subsystem.

Introduction

In this article we argue for the concept of an auditory object. Although some have found such a concept so strange that they avoid the term altogether in favor of ‘auditory event’ (Blauert, 1997, p. 2), we are convinced that it is both a useful and important concept. To clarify it, we offer a distinction between an auditory ‘what’ subsystem and an auditory ‘where’ subsystem (in a manner analogous to Milner & Goodale, 1995), and argue that the ‘what’ subsystem forms auditory objects, and that the ‘where’ subsystem is in the service of vision.

The bias against the idea of auditory objecthood is embedded in folk ontology. Language itself may lead us to believe that objects are visible by definition. For example, according to the Oxford English Dictionary, object means “Something placed before the eyes, or presented to the sight or other sense; an individual thing seen or perceived, or that may be seen or perceived; a material thing” (Object, 1993). The etymology of the word object explains the visuocentric connotation of the word: it derives from the Latin ob-, ‘before’ or ‘toward’, and iacere, ‘to throw’. It used to mean, “Something ‘thrown’ or put in the way, so as to interrupt or obstruct the course of a person or thing; an obstacle, a hindrance” (Object, 1993). Indeed, most visible things are obstacles or a hindrance to sight; they prevent you from seeing something that lies behind them because they are opaque.

In this paper we will deviate from our everyday notion of object in order to extend it to audition. We will do this by finding a different criterion for objecthood, one that does not rely on the notion of opacity. We must do this because the notion of opacity simply does not apply to auditory perception. Material things can of course be opaque to sound (Beranek, 1988, Chapter 3). But we do not listen to material things, we listen to vibrating things – audible sources. One sound source does not in general prevent you from hearing another: many natural sounds, especially biological ones, are composed of a fundamental frequency and discrete harmonics – i.e. they are sparse, like fences. Furthermore, masking is rare in nature because the masking sound must be considerably louder than the masked one (e.g. it takes the sound of a waterfall or thunder to mask our voices).
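To make the notion of spectral sparseness concrete, the following minimal sketch (ours, not part of any study cited here; the 200 Hz fundamental, ten harmonics, and 1/n amplitude roll-off are illustrative assumptions) synthesizes a harmonic complex and shows that its energy sits only at discrete multiples of the fundamental, leaving the rest of the spectrum free for other sources.

```python
# Minimal sketch of why harmonic sounds are spectrally "sparse, like fences":
# a voiced sound can be approximated as a fundamental plus discrete harmonics,
# so its energy occupies only narrow "pickets" in the spectrum.
import numpy as np

SAMPLE_RATE = 44100          # samples per second
DURATION = 1.0               # seconds
F0 = 200.0                   # fundamental frequency in Hz (illustrative)
N_HARMONICS = 10             # number of discrete harmonics (illustrative)

t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
# Harmonic complex: energy only at integer multiples of F0, with a 1/n roll-off.
signal = sum((1.0 / n) * np.sin(2 * np.pi * n * F0 * t)
             for n in range(1, N_HARMONICS + 1))

# The magnitude spectrum shows isolated peaks at 200, 400, ..., 2000 Hz,
# separated by empty regions through which a second source could be heard.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
print(np.round(freqs[spectrum > 0.05 * spectrum.max()]))
```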

Although one sound can mask another, Bregman (1990), in his discussion of the auditory continuity illusion, shows that audible sources do not offer a natural analog to opacity. The auditory continuity illusion is created when one deletes part of a signal and replaces it with a louder sound: the signal is perceived to continue uninterrupted ‘behind’ the sound. Bregman compares this illusion with the visual experience of continuity behind an occluder (Fig. 1): “Let us designate the interrupted sound or visual surface as A, and consider it to be divided into A1 and A2 by B, the interrupting entity… [In vision one] object's surface must end exactly where the other begins and the contours of A must reach dead ends where they visually meet the outline of B. In the auditory modality, the evidence for the continuity occurs in the properties of B itself as well as in A1 and A2; B must give rise to a set of neural properties that contain those of the missing part of A. In vision, on the other hand, if objects are opaque, there is no hint of the properties of A in the visual region occupied by B” (p. 383).
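The construction of such a stimulus is simple in outline. The sketch below is not the authors' or Bregman's actual stimuli; the tone frequency, gap duration, and levels are illustrative assumptions. It deletes the middle of a pure tone and fills the gap with louder broadband noise, the standard recipe for producing perceived continuity ‘behind’ the interrupting sound.

```python
# Minimal sketch of a continuity-illusion stimulus (illustrative parameters).
import numpy as np

SAMPLE_RATE = 44100
TONE_FREQ = 1000.0                       # Hz (illustrative)
TONE_DUR, GAP_DUR = 1.0, 0.2             # seconds (illustrative)

t = np.arange(int(SAMPLE_RATE * TONE_DUR)) / SAMPLE_RATE
tone = 0.2 * np.sin(2 * np.pi * TONE_FREQ * t)     # the signal A

# Delete the middle 200 ms, splitting A into A1 and A2 around a silent gap.
gap_len = int(SAMPLE_RATE * GAP_DUR)
start = (len(tone) - gap_len) // 2
interrupted = tone.copy()
interrupted[start:start + gap_len] = 0.0

# Fill the deleted segment with a louder broadband noise burst B.
rng = np.random.default_rng(0)
illusion = interrupted.copy()
illusion[start:start + gap_len] = 0.8 * rng.standard_normal(gap_len)

# 'interrupted' is heard as a tone with a silent gap; 'illusion' tends to be
# heard as an unbroken tone continuing behind the noise, because B contains
# the energy at the tone frequency that the missing part of A would have
# produced.
```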

We pointed out earlier that we do not listen to material things, but to audible sources. The auditory system is generally concerned with sources of sound (such as speech or music), not with surfaces that reflect the sound (Bregman, 1990, pp. 36–38). In a series of experiments, Watkins (Watkins, 1991, Watkins, 1998, Watkins, 1999, Watkins and Makin, 1996) has explored how the auditory system compensates for the distortion of spectral envelope (the major determinant of the perceived identity of many sounds) caused by factors such as room reverberation.

For the visual system just the opposite is true: it is generally concerned with surfaces of objects, not with the sources that illuminate them. As Mollon (1995) points out (giving credit to Monge, 1789):

our visual system is built to recognise … permanent properties of objects, their spectral reflectances, … not … the spectral flux … (pp. 148–149).

These differences are summarized in Table 1.

For these reasons, we believe that to understand auditory objects we will have to rethink certain commonly accepted analogies between visual and auditory perception. In particular, we will show that both modalities are endowed with ‘what’ and ‘where’ subsystems, but that the relation between these four subsystems is complex. Obviously it is the ‘what’ subsystem of each modality that deals with objects, and so we will devote considerable attention to the auditory ‘what’ subsystem. But before we do, we must attend to the evidence connecting the auditory ‘where’ subsystem and the visuomotor orienting subsystem. We will claim that auditory localization is in the service of visual localization. This assertion is one of the cornerstones of our argument that space is not central to the formation of auditory objects.

Section snippets

Auditory ‘where’ in the service of visual ‘where’

When two auditory sources appear to come from different spatial locations, shouldn't we say that they constitute different auditory objects, as do Wightman and Jenison (1995, pp. 371–372)? We prefer not to, because we believe that auditory localization is in the service of visual orienting, a hypothesis first formulated at the turn of the twentieth century by Angell: auditory “localisation occurs in the space world of vision–touch–movement… Most persons seem to make their localisation of sounds…

‘What’ subsystems: objects, grouping, figure-ground, and edges

A perceptual object is that which is susceptible to figure-ground segregation. This definition will allow us to develop a useful concept of auditory object. A critic who defines figure-ground segregation as a process applied to objects might claim that our definition is circular. But we believe that the benefit of the new definition outweighs the cost of abandoning the definition of figure-ground segregation in terms of objects. We believe that the process of grouping and most forms of feature…

Evidence for two auditory subsystems

The idea of a parallel between the two visual subsystems and two auditory subsystems is gaining favor (Cisek & Turgeon, 1999). Unfortunately, the evidence for a separation of streams in the auditory system is scattered in the literature and may not be sufficiently strong to be conclusive. We turn first to behavioral evidence, and then present neurophysiological evidence.

It is possible to create an auditory illusion in which the ‘what’ of a stimulus is perceived correctly, but the ‘where’ is…

Relations between ‘what’ and ‘where’ in vision and audition

We have suggested that the auditory ‘where’ subsystem is probably in the service of the visuomotor subsystem. The visual ‘where’ subsystem provides us with spatial information about the world in egocentric terms (Milner & Goodale, 1995). We believe the same to be true about the auditory ‘where’ subsystem. In other words, both the visual and the auditory ‘where’ subsystems may be thought of as being in the service of action.

It is harder to describe the relation between the visual and the…

Overview

In summary, consider Fig. 14. On the left side of the diagram we have set out the characteristics of audition, and on the right we have done so for vision. Each of the modalities is represented by two pathways, one labeled ‘what’ and the other labeled ‘where’. We should stress that we are using the term ‘where’ as shorthand for the sense of Milner and Goodale (1995), i.e. a subsystem that maintains spatial information in egocentric coordinates for the purpose of controlling action. That is why…

Conclusion

The human cortex contains 10¹⁰ neurons. Up to half of these may be involved in visual function (Palmer, 1999, p. 24); the auditory system is much smaller. This seems to confirm that reality unfolds in space and time and that understanding is visual. But we believe that the main source of resistance to a non-visuocentric view of perception is the ‘Knowing is Seeing’ metaphor. According to Lakoff and Johnson (1999, Table 4.1, pp. 53–54) this metaphor (summarized in Table 2) is a tool all of us…

Acknowledgements

We wish to thank B.J. Scholl and J. Mehler for their superb editorial work on this paper. We are also grateful to those who contributed in various ways to this paper: A. Bregman, R.S. Bolia, C. Spence, S. Handel, C.L. Krumhansl, J.G. Neuhoff, B. Repp, M. Turgéon, and A.J. Watkins. Our work is supported by NEI grant No. R01 EY 12926-06.

References (97)

  • E.H. Adelson et al. The plenoptic function and the elements of early vision.
  • L. Aitkin. Coding for auditory space.
  • R.A. Andersen et al. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience (1997).
  • J.R. Angell.
  • C. Arias et al. Echolocation: a study of auditory functioning in blind and sighted subjects. Journal of Visual Impairment & Blindness (1993).
  • K.A. Arnell et al. The attentional blink across stimulus modalities: evidence for central processing limitations. Journal of Experimental Psychology: Human Perception and Performance (1999).
  • E. Aronson et al. Space perception in early infancy: perception within a common auditory-visual space. Science (1971).
  • D.H. Ashmead et al. Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development (1999).
  • D.H. Ashmead et al. Echolocation reconsidered: using spatial variations in the ambient sound field to guide locomotion. Journal of Visual Impairment & Blindness (1998).
  • L.L. Beranek. Acoustical measurements (1988).
  • P. Bertelson et al. Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review (1998).
  • J. Blauert. Spatial hearing: the psychophysics of human sound localization (1997).
  • M.S. Brainard et al. Experience-dependent plasticity in the inferior colliculus: a site for visual calibration of the neural representation of auditory space in the barn owl. Journal of Neuroscience (1993).
  • A. Bregman. Auditory scene analysis: the perceptual organization of sound (1990).
  • A.S. Bregman et al. Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology (1971).
  • D.E. Broadbent. Perception and communication (1958).
  • R. Brochard et al. Perceptual organization of complex auditory sequences: effects of number of simultaneous subsequences and frequency separation. Journal of Experimental Psychology: Human Perception and Performance (1999).
  • H.A. Buchtel et al. Spatial attentional shifts: implications for the role of polysensory mechanisms. Neuropsychologia (1988).
  • C. Cherry. On human communication (1959).
  • P. Cisek et al. ‘Binding through the fovea’, a tale of perception in the service of action. Psyche (1999).
  • R.K. Clifton. The development of spatial hearing in human infants.
  • J.F. Culling et al. Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. Journal of the Acoustical Society of America (1995).
  • C.J. Darwin et al. Auditory objects of attention: the role of interaural time differences. Journal of Experimental Psychology: Human Perception and Performance (1999).
  • D. Deutsch et al. Separate “what” and “where” decision mechanisms in processing a dichotic tonal sequence. Journal of Experimental Psychology: Human Perception and Performance (1976).
  • J. Duncan et al. Restricted attentional capacity within but not between sensory modalities. Nature (1997).
  • Ehrenfels, C. von (1988). On ‘gestalt qualities’. In B. Smith (Ed.), Foundations of gestalt theory (pp. 82–117)....
  • J.J. Gibson. The ecological approach to visual perception (1979).
  • J. Goldring et al. Combined eye-head gaze shifts to visual and auditory targets in humans. Experimental Brain Research (1996).
  • H. Goldstein. Classical mechanics (1980).
  • E.R. Hafter. Binaural adaptation and the effectiveness of a stimulus beyond its onset.
  • S. Handel. Space is to time as vision is to audition: seductive but misleading. Journal of Experimental Psychology: Human Perception and Performance (1988).
  • S. Handel. Listening: an introduction to the perception of auditory events (1989).
  • Helmholtz, H. L. F. (1954). On the sensations of tone as a physiological basis for the theory of music (2nd ed., A. J....
  • M.F. Jay et al. Auditory receptive fields in primate superior colliculus shift with changes in eye position. Nature (1984).
  • M.F. Jay et al. Localization of auditory and visual targets for the initialization of saccadic eye movements.
  • G. Johansson et al. Toward a new theory of vision. Studies in wide-angle space perception. Ecological Psychology (1989).
  • G. Kanizsa. Organization in vision: essays on Gestalt perception (1979).
  • Kant, I. (1996). Critique of pure reason (W. S. Pluhar, Trans.). Indianapolis, IN: Hackett. (Original work published...