Auditory and visual objects
Introduction
In this article we argue for the concept of an auditory object. Although some have found such a concept so strange that they avoid the term altogether in favor of ‘auditory event’ (Blauert, 1997, p. 2), we are convinced that it is both a useful and important concept. To clarify it, we offer a distinction between an auditory ‘what’ subsystem and an auditory ‘where’ subsystem (in a manner analogous to Milner & Goodale, 1995), and argue that the ‘what’ subsystem forms auditory objects, and that the ‘where’ subsystem is in the service of vision.
The bias against the idea of auditory objecthood is embedded in folk ontology. Language itself may lead us to believe that objects are visible by definition. For example, according to the Oxford English Dictionary, object means “Something placed before the eyes, or presented to the sight or other sense; an individual thing seen or perceived, or that may be seen or perceived; a material thing” (Object, 1993). The etymology of the word object explains the visuocentric connotation of the word: it derives from the Latin ob-, ‘before’ or ‘toward’, and iacere, ‘to throw’. It used to mean, “Something ‘thrown’ or put in the way, so as to interrupt or obstruct the course of a person or thing; an obstacle, a hindrance” (Object, 1993). Indeed, most visible things are obstacles or a hindrance to sight; they prevent you from seeing something that lies behind them because they are opaque.
In this paper we will deviate from our everyday notion of object in order to extend it to audition. We will do this by finding a different criterion for objecthood, one that does not rely on the notion of opacity. We must do this because the notion of opacity simply does not apply to auditory perception. Material things can of course be opaque to sound (Beranek, 1988, Chapter 3). But we do not listen to material things, we listen to vibrating things – audible sources. One sound source does not in general prevent you from hearing another: many natural sounds, especially biological ones, are composed of a fundamental frequency and discrete harmonics – i.e. they are sparse, like fences. Furthermore, masking is rare in nature because the masking sound must be considerably louder than the masked one (e.g. it takes the sound of a waterfall or thunder to mask our voices).
Although one sound can mask another, Bregman (1990), in his discussion of the auditory continuity illusion, shows that audible sources do not offer a natural analog to opacity. The auditory continuity illusion is created when one deletes part of a signal and replaces it with a louder sound: the signal is perceived to continue uninterrupted ‘behind’ the sound. Bregman compares this illusion with the visual experience of continuity behind an occluder (Fig. 1): “Let us designate the interrupted sound or visual surface as A, and consider it to be divided into A1 and A2 by B, the interrupting entity… [In vision one] object's surface must end exactly where the other begins and the contours of A must reach dead ends where they visually meet the outline of B. In the auditory modality, the evidence for the continuity occurs in the properties of B itself as well as in A1 and A2; B must give rise to a set of neural properties that contain those of the missing part of A. In vision, on the other hand, if objects are opaque, there is no hint of the properties of A in the visual region occupied by B” (p. 383).
We pointed out earlier that we do not listen to material things, but to audible sources. The auditory system is generally concerned with sources of sound (such as speech or music), not with surfaces that reflect the sound (Bregman, 1990, pp. 36–38). In a series of experiments, Watkins (Watkins, 1991, 1998, 1999; Watkins & Makin, 1996) has explored how the auditory system compensates for the distortion of spectral envelope (the major determinant of the perceived identity of many sounds) caused by factors such as room reverberation.
For the visual system just the opposite is true: it is generally concerned with surfaces of objects, not with the sources that illuminate them. As Mollon (1995) points out (giving credit to Monge, 1789):
“our visual system is built to recognise … permanent properties of objects, their spectral reflectances, … not … the spectral flux …” (pp. 148–149).
These differences are summarized in Table 1.
For these reasons, we believe that to understand auditory objects we will have to rethink certain commonly accepted analogies between visual and auditory perception. In particular, we will show that both modalities are endowed with ‘what’ and ‘where’ subsystems, but that the relation between these four subsystems is complex. Obviously it is the ‘what’ subsystem of each modality that deals with objects, and so we will devote considerable attention to the auditory ‘what’ subsystem. But before we do, we must attend to the evidence connecting the auditory ‘where’ subsystem and the visuomotor orienting subsystem. We will claim that auditory localization is in the service of visual localization. This assertion is one of the cornerstones of our argument that space is not central to the formation of auditory objects.
Auditory ‘where’ in the service of visual ‘where’
When two auditory sources appear to come from different spatial locations, shouldn't we say that they constitute different auditory objects, as do Wightman and Jenison (1995, pp. 371–372)? We prefer not to, because we believe that auditory localization is in the service of visual orienting, a hypothesis first formulated at the turn of the twentieth century by Angell: auditory “localisation occurs in the space world of vision–touch–movement… Most persons seem to make their localisation of sounds
‘What’ subsystems: objects, grouping, figure-ground, and edges
A perceptual object is that which is susceptible to figure-ground segregation. This definition will allow us to develop a useful concept of auditory object. A critic who defines figure-ground segregation as a process applied to objects might claim that our definition is circular. But we believe that the benefit of the new definition outweighs the cost of abandoning the definition of figure-ground segregation in terms of objects. We believe that the process of grouping and most forms of feature
Evidence for two auditory subsystems
The idea of a parallel between the two visual subsystems and two auditory subsystems is gaining favor (Cisek & Turgeon, 1999). Unfortunately, the evidence for a separation of streams in the auditory system is scattered in the literature and may not be sufficiently strong to be conclusive. We turn first to behavioral evidence, and then present neurophysiological evidence.
It is possible to create an auditory illusion in which the ‘what’ of a stimulus is perceived correctly, but the ‘where’ is
Relations between ‘what’ and ‘where’ in vision and audition
We have suggested that the auditory ‘where’ subsystem is probably in the service of the visuomotor subsystem. The visual ‘where’ subsystem provides us with spatial information about the world in egocentric terms (Milner & Goodale, 1995). We believe the same to be true about the auditory ‘where’ subsystem. In other words, both the visual and the auditory ‘where’ subsystems may be thought of as being in the service of action.
It is harder to describe the relation between the visual and the
Overview
In summary, consider Fig. 14. On the left side of the diagram we have set out the characteristics of audition, and on the right we have done so for vision. Each of the modalities is represented by two pathways, one labeled ‘what’ and the other labeled ‘where’. We should stress that we are using the term ‘where’ as shorthand for the sense of Milner and Goodale (1995), i.e. a subsystem that maintains spatial information in egocentric coordinates for the purpose of controlling action. That is why
Conclusion
The human cortex contains 10¹⁰ neurons. Up to half of these may be involved in visual function (Palmer, 1999, p. 24); the auditory system is much smaller. This seems to confirm that reality unfolds in space and time and that understanding is visual. But we believe that the main source of resistance to a non-visuocentric view of perception is the ‘Knowing is Seeing’ metaphor. According to Lakoff and Johnson (1999, Table 4.1, pp. 53–54) this metaphor (summarized in Table 2) is a tool all of us
Acknowledgements
We wish to thank B.J. Scholl and J. Mehler for their superb editorial work on this paper. We are also grateful to those who contributed in various ways to this paper: A. Bregman, R.S. Bolia, C. Spence, S. Handel, C.L. Krumhansl, J.G. Neuhoff, B. Repp, M. Turgéon, and A.J. Watkins. Our work is supported by NEI grant No. R01 EY 12926-06.
References (97)
- et al. The functional anatomy of visuo-tactile integration in man: a study using PET. Neuropsychologia (2000)
- et al. Evidence from functional magnetic resonance imaging of crossmodal binding in human heteromodal cortex. Current Biology (2000)
- et al. Auditory agnosia and spatial deficits following left hemispheric lesions: evidence for distinct processing pathways. Neuropsychologia (2000)
- et al. Spatial characteristics of visual-auditory summation in human saccades. Vision Research (1998)
- et al. On the lawfulness of grouping by proximity. Cognitive Psychology (1998)
- Cortical processing of complex sounds. Current Opinion in Neurobiology (1998)
- et al. Does auditory attention shift in the direction of an upcoming saccade? Neuropsychologia (1999)
- et al. A feature-integration theory of attention. Cognitive Psychology (1980)
- et al. Auditory spatial layout
- et al. Inhibition of return: effects of attentional cuing on eye movement latencies. Journal of Experimental Psychology: Human Perception and Performance (1994)
- The plenoptic function and the elements of early vision
- Coding for auditory space
- Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience
- Echolocation: a study of auditory functioning in blind and sighted subjects. Journal of Visual Impairment & Blindness
- The attentional blink across stimulus modalities: evidence for central processing limitations. Journal of Experimental Psychology: Human Perception and Performance
- Space perception in early infancy: perception within a common auditory-visual space. Science
- Auditory perception of walls via spectral variations in the ambient sound field. Journal of Rehabilitation Research and Development
- Echolocation reconsidered: using spatial variations in the ambient sound field to guide locomotion. Journal of Visual Impairment & Blindness
- Acoustical measurements
- Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review
- Spatial hearing: the psychophysics of human sound localization
- Experience-dependent plasticity in the inferior colliculus: a site for visual calibration of the neural representation of auditory space in the barn owl. Journal of Neuroscience
- Auditory scene analysis: the perceptual organization of sound
- Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology
- Perception and communication
- Perceptual organization of complex auditory sequences: effects of number of simultaneous subsequences and frequency separation. Journal of Experimental Psychology: Human Perception and Performance
- Spatial attentional shifts: implications for the role of polysensory mechanisms. Neuropsychologia
- On human communication
- ‘Binding through the fovea’, a tale of perception in the service of action. Psyche
- The development of spatial hearing in human infants
- Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. Journal of the Acoustical Society of America
- Auditory objects of attention: the role of interaural time differences. Journal of Experimental Psychology: Human Perception and Performance
- Separate “what” and “where” decision mechanisms in processing a dichotic tonal sequence. Journal of Experimental Psychology: Human Perception and Performance
- Restricted attentional capacity within but not between sensory modalities. Nature
- The ecological approach to visual perception
- Combined eye-head gaze shifts to visual and auditory targets in humans. Experimental Brain Research
- Classical mechanics
- Binaural adaptation and the effectiveness of a stimulus beyond its onset
- Space is to time as vision is to audition: seductive but misleading. Journal of Experimental Psychology: Human Perception and Performance
- Listening: an introduction to the perception of auditory events
- Auditory receptive fields in primate superior colliculus shift with changes in eye position. Nature
- Localization of auditory and visual targets for the initialization of saccadic eye movements
- Toward a new theory of vision. Studies in wide-angle space perception. Ecological Psychology
- Organization in vision: essays on Gestalt perception