Abstract
Recognizing a natural object requires one to pool information from various sensory modalities, and to ignore information from competing objects. That the same semantic knowledge can be accessed through different modalities makes it possible to explore the retrieval of supramodal object concepts. Here, object-recognition processes were investigated by manipulating the relationships between sensory modalities, specifically the semantic content and the spatial alignment of the auditory and visual information. Experiments were run in a realistic virtual environment. Participants were asked to react as fast as possible to a target object presented in the visual and/or the auditory modality and to inhibit a distractor object (go/no-go task). Spatial alignment had no effect on object-recognition time. The only spatial effect observed was a stimulus–response compatibility between the auditory stimulus and the hand position. Reaction times were significantly shorter for semantically congruent bimodal stimuli than would be predicted by independent processing of the auditory and visual target information. Interestingly, this bimodal facilitation effect was twice as large as that found in previous studies that also used information-rich stimuli. An interference effect (i.e. longer reaction times to semantically incongruent stimuli than to the corresponding unimodal stimulus) was observed only when the distractor was auditory. When the distractor was visual, semantic incongruence did not interfere with object recognition. Our results show that immersive displays with large visual stimuli can produce large multimodal integration effects, and reveal a possible asymmetry in the attentional filtering of irrelevant auditory and visual information.
Notes
A third alternative has been proposed by Mordkoff and Yantis (1991), showing that inter-stimulus contingencies could, in some cases, entirely explain the violation of the race model, thus challenging the conclusion of an integration of the sensory channels in the presence of these contingencies.
To be able to compare our results with previous studies, all the analyses were also performed on the initial non-transformed distribution.
For these analyses, as for all the other ones, the ANOVA on the non-transformed distribution gave similar results.
References
Alais D, Morrone C, Burr D (2006) Separate attentional resources for vision and audition. Proc Biol Sci 273:1339–1345
Bedford FL (2001) Toward a general law of numerical/object identity. Cahiers de Psychologie Cognitive/Curr Psychol Cogn 20:113–176
Bedford F (2004) Analysis of a constraint on perception, cognition, and development: one object, one place, one time. J Exp Psychol Hum Percept Perform 30:907–912
Bertelson P, Vroomen J, Wiegeraad G, de Gelder B (1994) Exploring the relation between McGurk interference and ventriloquism. In: International conference on spoken language processing, Yokohama, Japan, pp 556–562
Calvert GA, Thesen T (2004) Multisensory integration: methodological approaches and emerging principles in the human brain. J Physiol Paris 98:191–205
Calvert GA, Brammer MJ, Iversen SD (1998) Crossmodal identification. Trends Cogn Sci 2:247–253
Caramazza A, Hillis AE, Rapp BC, Romani C (1990) The multiple semantic hypothesis: multiple confusions? Cogn Neuropsychol 7:161–189
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
Frassinetti F, Bolognini N, Làdavas E (2002) Enhancement of visual perception by crossmodal visuo-auditory interaction. Exp Brain Res 147:332–343
Frens MA, Van Opstal AJ, Van der Willigen RF (1995) Spatial and temporal factors determine auditory–visual interactions in human saccadic eye movements. Percept Psychophys 57:802–816
Giard MH, Peronnet F (1999) Auditory–visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci 11:473–490
Giray M, Ulrich R (1993) Motor coactivation revealed by response force in divided and focused attention. J Exp Psychol Hum Percept Perform 19:1278–1291
Gondan M, Niederhaus B, Rösler F, Röder B (2005) Multisensory processing in the redundant-target effect: a behavioral and event-related potential study. Percept Psychophys 67:713–726
Grice GR, Canham L (1990) Redundancy phenomena are affected by response requirements. Percept Psychophys 48:209–213
Grice GR, Gwynne JW (1987) Dependence of target redundancy effects on noise conditions and number of targets. Percept Psychophys 42:29–36
Grice GR, Reed JM (1992) What makes targets redundant? Percept Psychophys 51:437–442
Grice GR, Canham L, Gwynne JW (1984) Absence of a redundant-signals effect in a reaction time task with divided attention. Percept Psychophys 36:565–570
Harrington LK, Peck CK (1998) Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Exp Brain Res 122:247–252
Hershenson M (1962) Reaction time as a measure of intersensory facilitation. J Exp Psychol 63:289–293
Holmes NP, Spence C (2005) Multisensory integration: space, time and superadditivity. Curr Biol 15:R762–R764
Hughes HC, Reuter-Lorenz PA, Nozawa G, Fendrich R (1994) Visual–auditory interactions in sensorimotor processing: saccades versus manual responses. J Exp Psychol Hum Percept Perform 20:131–153
Kinchla RA (1974) Detecting target elements in multielement displays: a confusability model. Percept Psychophys 15:149–158
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT (2004) Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res 158:405–414
Lehmann S, Murray MM (2005) The role of multisensory memories in unisensory object discrimination. Brain Res Cogn Brain Res 24:326–334
Lu CH, Proctor RW (1995) The influence of irrelevant location information on performance: a review of the Simon and spatial Stroop effects. Psychon Bull Rev 2:174–207
Luce RD (1986) Response times: their role in inferring elementary mental organization. Oxford University Press, New York
Martin A (2007) The representation of object concepts in the brain. Annu Rev Psychol 58:25–45
Miller J (1982) Divided attention: evidence for coactivation with redundant signals. Cogn Psychol 14:247–279
Miller J (1986) Timecourse of coactivation in bimodal divided attention. Percept Psychophys 40:331–343
Miller J (1991) Channel interaction and the redundant-targets effect in bimodal divided attention. J Exp Psychol Hum Percept Perform 17:160–169
Moeck T, Bonneel N, Tsingos N, Drettakis G, Viaud-Delmon I, Alloza D (2007) Progressive perceptual audio rendering of complex scenes. In: ACM SIGGRAPH symposium on interactive 3D graphics and games
Molholm S, Ritter W, Javitt DC, Foxe JJ (2004) Multisensory visual–auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex 14:452–465
Mordkoff JT, Yantis S (1991) An interactive race model of divided attention. J Exp Psychol Hum Percept Perform 17:520–538
Murray MM, Molholm S, Michel CM, Heslenfeld DJ, Ritter W, Javitt DC, Schroeder CE, Foxe JJ (2005) Grabbing your ear: rapid auditory–somatosensory multisensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cereb Cortex 15:963–974
Patterson K, Nestor PJ, Rogers TT (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci 8:976–987
Raab DH (1962) Statistical facilitation of simple reaction times. Trans N Y Acad Sci 24:574–590
Radeau M, Bertelson P (1977) Adaptation to auditory–visual discordance and ventriloquism in semirealistic situations. Percept Psychophys 22:137–146
Radeau M, Bertelson P (1978) Cognitive factors and adaptation to auditory–visual discordance. Percept Psychophys 23:341–343
Riddoch MJ, Humphreys GW, Coltheart M, Funnell E (1988) Semantic systems or system? Neuropsychological evidence re-examined. Cogn Neuropsychol 5:3–25
Savazzi S, Marzi CA (2008) Does the redundant signal effect occur at an early visual stage? Exp Brain Res 184:275–281
Schmitt M, Postma A, de Haan E (2000) Interactions between exogenous auditory and visual spatial attention. Q J Exp Psychol A 53:105–130
Schröger E, Widmann A (1998) Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology 35:755–759
Schwarz W (1996) Further tests of the interactive race model of divided attention: the effects of negative bias and varying stimulus-onset asynchronies. Psychol Res 58:233–245
Simon JR, Craft JL (1970) Effects of an irrelevant auditory stimulus on visual choice reaction time. J Exp Psychol 86:272–274
Simon JR, Sly PE, Vilapakkam S (1981) Effect of compatibility of S–R mapping on reactions toward the stimulus source. Acta Psychol 47:63–81
Smith EL, Grabowecky M, Suzuki S (2007) Auditory–visual crossmodal integration in perception of face gender. Curr Biol 17:1680–1685
Stein BE, Meredith MA (1993) The merging of the senses. MIT Press, Cambridge
Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cogn Neurosci 8:497–506
Teder-Sälejärvi WA, Di Russo F, McDonald JJ, Hillyard SA (2005) Effects of spatial congruity on audio–visual multimodal integration. J Cogn Neurosci 17:1396–1409
Ulrich R, Miller J (1993) Information processing models generating lognormally distributed reaction times. J Math Psychol 37:513–525
Ulrich R, Miller J, Schröter H (2007) Testing the race model inequality: an algorithm and computer programs. Behav Res Methods 39:291–302
Yuval-Greenberg S, Deouell LY (2007) What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J Neurosci 27:1090–1096
Zampini M, Torresan D, Spence C, Murray MM (2007) Auditory–somatosensory multisensory interactions in front and rear space. Neuropsychologia 45:1869–1877
Zorzi M, Umiltà C (1995) A computational model of the Simon effect. Psychol Res 58:193–205
Acknowledgments
We thank Khoa-Van Nguyen, Olivier Warusfel, George Drettakis, and Grace Leslie for their help. We are grateful to Shihab Shamma, Daniel Pressnitzer, Laurence Harris and two anonymous reviewers for useful comments on a previous version of this manuscript. This research was supported by the EU IST FP6 Open FET project CROSSMOD: “Crossmodal Perceptual Interaction and Rendering” IST-04891.
Appendix
An additional experiment was performed because the results of the main experiment could not establish whether the shorter RTs to semantically congruent stimuli than to semantically incongruent stimuli were due to semantic congruence or simply to redundancy of information. In this new experiment, the target stimuli were the sound of a frog (A+f) and the image of a phone (V+p). Participants had to respond to A+f, to V+p, or to both presented simultaneously (A+fV+p). In this case, the redundant target condition was also a semantically incongruent stimulus, whereas the non-redundant bimodal target conditions were semantically congruent stimuli. If the redundant-signals effect (RSE) observed in the main experiment was related to the semantic congruence between the auditory and visual parts of the stimulus, there should be no bimodal integration for the incongruent stimuli (redundant targets) in this control experiment. In addition, if semantically congruent trials benefited from crossmodal integration, mean RTs in semantically congruent trials (non-redundant targets) should be shorter than mean RTs in semantically incongruent trials (redundant targets).
Eleven volunteers (5 women; mean age 30.9 ± 8 years; all but one right-handed) participated in the experiment. All were naïve with respect to the purpose of the experiment. None of them reported having hearing problems, and all reported normal or corrected-to-normal vision. All participants provided informed consent. Apparatus and stimuli were exactly the same as in the main experiment. The procedure was also highly similar, except for the definition of the go and no-go conditions. There were five go conditions: auditory frog alone (A+f), visual phone alone (V+p), auditory frog with a visual phone (A+fV+p), auditory frog with a visual frog (A+fV−f), and auditory phone with a visual phone (A−pV+p). The A+fV+p condition was the only redundant target; the other four conditions were non-redundant targets. The no-go conditions were an auditory phone alone (A−p), a visual frog alone (V−f), and an auditory phone with a visual frog (A−pV−f). Each go condition was presented 48 times and each no-go condition 20 times. In this additional experiment there were no inter-stimulus contingencies; thus, any potential RSE could not be attributed to such contingencies. The entire experiment for each participant consisted of 300 stimuli, of which 240 (80%) were task-relevant (go responses). Statistical analyses were similar to those performed in the main experiment (log-transformation and ANOVA on the mean ln(RTs)), except that we did not remove RTs greater than 1,000 ms, because the difficulty of the task led to RTs of around 650 ms on average.
A nonparametric repeated-measures ANOVA (Friedman’s test) revealed a significant effect of condition (A−, V−, A−V−i) on the percentage of false alarms (χ²(2) = 9.5; P < 0.01). The percentage of false alarms was higher for the bimodal stimulus A−pV−f (39.5 ± 6.7%) than for either unimodal stimulus (21.4 ± 4.4% for A−p and 21.8 ± 4.9% for V−f). Only 0.9 ± 0.1% of responses were misses. Overall, the larger number of false alarms compared to the main experiment (around three times more) could reflect the difficulty of the task (attending to two different objects at the same time).
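The Friedman test used above can be sketched in a few lines. The per-participant false-alarm percentages below are hypothetical (the per-subject data are not reported here), the function name is our own, and ties are not corrected for; this is an illustration of the statistic, not the analysis actually run.

```python
# Minimal sketch of Friedman's chi-square statistic for k repeated
# conditions measured on n subjects (no tie correction).

def friedman_chi2(data):
    """data: one row per subject, each row a list of k condition scores.
    Returns Friedman's chi-square statistic (df = k - 1)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the k conditions within this subject (1 = smallest score).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical false-alarm percentages for 4 subjects in 3 conditions
# (auditory alone, visual alone, bimodal); the bimodal column is highest,
# mirroring the pattern reported above.
fa = [
    [20, 22, 40],
    [18, 25, 38],
    [24, 20, 42],
    [21, 19, 37],
]
print(friedman_chi2(fa))  # compare against chi-square with df = 2
```

The statistic is then referred to a chi-square distribution with k − 1 degrees of freedom; in practice a library routine with tie correction (e.g. `scipy.stats.friedmanchisquare`) would be used.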
RTs for this additional experiment are shown in Fig. 5. The distribution of the residuals of the ANOVA did not differ from a normal distribution (Kolmogorov–Smirnov test: d = 0.09; N = 55; P > 0.2). Overall, RTs in this additional experiment were much longer than those of the main experiment (more than 600 ms here, compared to around 350 ms in the main experiment), which confirms the difficulty of the task. To identify between-condition differences in mean ln(RTs), a repeated-measures ANOVA was conducted with the five conditions as a within-subjects factor (A+f, V+p, A+fV+p, A+fV−f, A−pV+p). It revealed a significant main effect of condition (F(4,60) = 6.14; ε = 0.8; P < 0.001). Post hoc Tukey HSD tests showed that this effect was due to a difference between A+f and A+fV+p (P < 0.001) and a difference between A+fV+p and A+fV−f (P < 0.004). Importantly, there was no significant difference between the faster of the two unimodal conditions (here, V+p) and the redundant target (A+fV+p) (P = 0.5). In other words, we observed no bimodal facilitation effect for the redundant target (a semantically incongruent stimulus). A null effect is, of course, difficult to interpret unambiguously; however, this result strongly suggests that semantic incongruence between redundant stimuli prevents any redundant-target facilitation effect.
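The race model inequality underlying these RSE analyses (Miller 1982) states that, under independent processing, the bimodal RT distribution can never exceed the sum of the two unimodal distributions: G_AV(t) ≤ G_A(t) + G_V(t) for all t. The sketch below illustrates the idea with synthetic RT samples and our own function names; it is a simplified illustration, not the tested algorithm of Ulrich et al. (2007), which also covers quantile estimation and statistical evaluation across participants.

```python
# Illustration of Miller's (1982) race model inequality with
# synthetic (made-up) RT samples.

def ecdf(sample, t):
    """Empirical cumulative distribution of a list of RTs at time t."""
    return sum(rt <= t for rt in sample) / len(sample)

def race_model_violations(rt_a, rt_v, rt_av, times):
    """Return the time points at which the bimodal CDF exceeds the sum
    of the unimodal CDFs, i.e. where the independent-race bound fails."""
    return [t for t in times
            if ecdf(rt_av, t) > ecdf(rt_a, t) + ecdf(rt_v, t)]

# Synthetic RTs (ms): bimodal responses faster than either unimodal one,
# as expected when the two channels coactivate.
rt_a = [320, 340, 360, 380, 400]   # auditory alone
rt_v = [330, 350, 370, 390, 410]   # visual alone
rt_av = [250, 260, 270, 280, 300]  # bimodal

print(race_model_violations(rt_a, rt_v, rt_av, range(240, 420, 10)))
```

Any non-empty result at the fast end of the distribution is evidence against a simple race between independent channels and in favor of coactivation.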
Cite this article
Suied, C., Bonneel, N. & Viaud-Delmon, I. Integration of auditory and visual information in the recognition of realistic objects. Exp Brain Res 194, 91–102 (2009). https://doi.org/10.1007/s00221-008-1672-6