Integration of auditory and visual information in the recognition of realistic objects

  • Research Article
  • Published in: Experimental Brain Research

Abstract

Recognizing a natural object requires one to pool information from various sensory modalities, and to ignore information from competing objects. That the same semantic knowledge can be accessed through different modalities makes it possible to explore the retrieval of supramodal object concepts. Here, object-recognition processes were investigated by manipulating the relationships between the sensory modalities: specifically, the semantic content and the spatial alignment of the auditory and visual information. Experiments were run in a realistic virtual environment. Participants were asked to react as fast as possible to a target object presented in the visual and/or the auditory modality, and to inhibit responses to a distractor object (go/no-go task). Spatial alignment had no effect on object-recognition time; the only spatial effect observed was a stimulus–response compatibility between the auditory stimulus and the hand position. Reaction times were significantly shorter for semantically congruent bimodal stimuli than would be predicted by independent processing of the auditory and visual target information. Interestingly, this bimodal facilitation effect was twice as large as that found in previous studies that also used information-rich stimuli. An interference effect (i.e. longer reaction times to semantically incongruent stimuli than to the corresponding unimodal stimulus) was observed only when the distractor was auditory; when the distractor was visual, semantic incongruence did not interfere with object recognition. Our results show that immersive displays with large visual stimuli may produce large multimodal integration effects, and they reveal a possible asymmetry in the attentional filtering of irrelevant auditory and visual information.


Notes

  1. A third alternative has been proposed by Mordkoff and Yantis (1991), who showed that inter-stimulus contingencies could, in some cases, entirely explain the violation of the race model, thus challenging the conclusion that the sensory channels are integrated when such contingencies are present (a minimal test sketch follows these notes).

  2. To be able to compare our results with previous studies, all the analyses were also performed on the initial non-transformed distribution.

  3. For these analyses, as for all the others, the ANOVA on the non-transformed distribution gave similar results.
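
For concreteness, here is a minimal sketch of how a violation of the race model (Raab 1962; Miller 1982) is typically detected from RT distributions. This is not the authors' analysis code: the probe grid, sample sizes, and synthetic lognormal RTs (in the spirit of Ulrich and Miller 1993) are illustrative assumptions. Miller's inequality states that, under a pure race of independent channels, P(RT_AV ≤ t) ≤ P(RT_A ≤ t) + P(RT_V ≤ t) for every t; bimodal RTs fast enough to exceed this bound imply coactivation.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical cumulative distribution of an RT sample, evaluated at times t."""
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, t, side="right") / sample.size

def race_model_violations(rt_a, rt_v, rt_av, n_points=100):
    """Return the probe times at which P(RT_AV <= t) exceeds the
    race-model bound min(1, P(RT_A <= t) + P(RT_V <= t))."""
    lo = min(np.min(rt_a), np.min(rt_v), np.min(rt_av))
    hi = max(np.max(rt_a), np.max(rt_v), np.max(rt_av))
    t = np.linspace(lo, hi, n_points)
    bound = np.minimum(ecdf(rt_a, t) + ecdf(rt_v, t), 1.0)
    return t[ecdf(rt_av, t) > bound]

# Illustrative synthetic RTs in ms (not experimental data)
rng = np.random.default_rng(0)
rt_a = rng.lognormal(mean=np.log(380), sigma=0.15, size=200)
rt_v = rng.lognormal(mean=np.log(360), sigma=0.15, size=200)
rt_av = rng.lognormal(mean=np.log(300), sigma=0.15, size=200)
violations = race_model_violations(rt_a, rt_v, rt_av)
print(f"race-model bound exceeded at {violations.size} of 100 probe times")
```

Mordkoff and Yantis's point, in these terms, is that correlations between the channels (inter-stimulus contingencies) can push the bimodal CDF past this bound even without integration, so such contingencies must be ruled out before a violation is read as coactivation.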

References

  • Alais D, Morrone C, Burr D (2006) Separate attentional resources for vision and audition. Proc Biol Sci 273:1339–1345

  • Bedford FL (2001) Toward a general law of numerical/object identity. Cahiers de Psychologie Cognitive/Curr Psychol Cogn 20:113–176

  • Bedford F (2004) Analysis of a constraint on perception, cognition, and development: one object, one place, one time. J Exp Psychol Hum Percept Perform 30:907–912

  • Bertelson P, Vroomen J, Wiegeraad G, de Gelder B (1994) Exploring the relation between McGurk interference and ventriloquism. In: International conference on spoken language processing, Yokohama, Japan, pp 556–562

  • Calvert GA, Thesen T (2004) Multisensory integration: methodological approaches and emerging principles in the human brain. J Physiol Paris 98:191–205

  • Calvert GA, Brammer MJ, Iversen SD (1998) Crossmodal identification. Trends Cogn Sci 2:247–253

  • Caramazza A, Hillis AE, Rapp BC, Romani C (1990) The multiple semantics hypothesis: multiple confusions? Cogn Neuropsychol 7:161–189

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale

  • Frassinetti F, Bolognini N, Làdavas E (2002) Enhancement of visual perception by crossmodal visuo-auditory interaction. Exp Brain Res 147:332–343

  • Frens MA, Van Opstal AJ, Van der Willigen RF (1995) Spatial and temporal factors determine auditory–visual interactions in human saccadic eye movements. Percept Psychophys 57:802–816

  • Giard MH, Peronnet F (1999) Auditory–visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci 11:473–490

  • Giray M, Ulrich R (1993) Motor coactivation revealed by response force in divided and focused attention. J Exp Psychol Hum Percept Perform 19:1278–1291

  • Gondan M, Niederhaus B, Rösler F, Röder B (2005) Multisensory processing in the redundant-target effect: a behavioral and event-related potential study. Percept Psychophys 67:713–726

  • Grice GR, Canham L (1990) Redundancy phenomena are affected by response requirements. Percept Psychophys 48:209–213

  • Grice GR, Gwynne JW (1987) Dependence of target redundancy effects on noise conditions and number of targets. Percept Psychophys 42:29–36

  • Grice GR, Reed JM (1992) What makes targets redundant? Percept Psychophys 51:437–442

  • Grice GR, Canham L, Gwynne JW (1984) Absence of a redundant-signals effect in a reaction time task with divided attention. Percept Psychophys 36:565–570

  • Harrington LK, Peck CK (1998) Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Exp Brain Res 122:247–252

  • Hershenson M (1962) Reaction time as a measure of intersensory facilitation. J Exp Psychol 63:289–293

  • Holmes NP, Spence C (2005) Multisensory integration: space, time and superadditivity. Curr Biol 15:R762–R764

  • Hughes HC, Reuter-Lorenz PA, Nozawa G, Fendrich R (1994) Visual–auditory interactions in sensorimotor processing: saccades versus manual responses. J Exp Psychol Hum Percept Perform 20:131–153

  • Kinchla RA (1974) Detecting target elements in multielement displays: a confusability model. Percept Psychophys 15:149–158

  • Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT (2004) Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res 158:405–414

  • Lehmann S, Murray MM (2005) The role of multisensory memories in unisensory object discrimination. Brain Res Cogn Brain Res 24:326–334

  • Lu CH, Proctor RW (1995) The influence of irrelevant location information on performance: a review of the Simon and spatial Stroop effects. Psychon Bull Rev 2:174–207

  • Luce RD (1986) Response times: their role in inferring elementary mental organization. Oxford University Press, New York

  • Martin A (2007) The representation of object concepts in the brain. Annu Rev Psychol 58:25–45

  • Miller J (1982) Divided attention: evidence for coactivation with redundant signals. Cogn Psychol 14:247–279

  • Miller J (1986) Timecourse of coactivation in bimodal divided attention. Percept Psychophys 40:331–343

  • Miller J (1991) Channel interaction and the redundant-targets effect in bimodal divided attention. J Exp Psychol Hum Percept Perform 17:160–169

  • Moeck T, Bonneel N, Tsingos N, Drettakis G, Viaud-Delmon I, Alloza D (2007) Progressive perceptual audio rendering of complex scenes. In: ACM SIGGRAPH symposium on interactive 3D graphics and games

  • Molholm S, Ritter W, Javitt DC, Foxe JJ (2004) Multisensory visual–auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex 14:452–465

  • Mordkoff JT, Yantis S (1991) An interactive race model of divided attention. J Exp Psychol Hum Percept Perform 17:520–538

  • Murray MM, Molholm S, Michel CM, Heslenfeld DJ, Ritter W, Javitt DC, Schroeder CE, Foxe JJ (2005) Grabbing your ear: rapid auditory–somatosensory multisensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cereb Cortex 15:963–974

  • Patterson K, Nestor PJ, Rogers TT (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci 8:976–987

  • Raab DH (1962) Statistical facilitation of simple reaction times. Trans N Y Acad Sci 24:574–590

  • Radeau M, Bertelson P (1977) Adaptation to auditory–visual discordance and ventriloquism in semirealistic situations. Percept Psychophys 22:137–146

  • Radeau M, Bertelson P (1978) Cognitive factors and adaptation to auditory–visual discordance. Percept Psychophys 23:341–343

  • Riddoch MJ, Humphreys GW, Coltheart M, Funnell E (1988) Semantic systems or system? Neuropsychological evidence re-examined. Cogn Neuropsychol 5:3–25

  • Savazzi S, Marzi CA (2008) Does the redundant signal effect occur at an early visual stage? Exp Brain Res 184:275–281

  • Schmitt M, Postma A, de Haan E (2000) Interactions between exogenous auditory and visual spatial attention. Q J Exp Psychol A 53:105–130

  • Schröger E, Widmann A (1998) Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology 35:755–759

  • Schwarz W (1996) Further tests of the interactive race model of divided attention: the effects of negative bias and varying stimulus-onset asynchronies. Psychol Res 58:233–245

  • Simon JR, Craft JL (1970) Effects of an irrelevant auditory stimulus on visual choice reaction time. J Exp Psychol 86:272–274

  • Simon JR, Sly PE, Vilapakkam S (1981) Effect of compatibility of S–R mapping on reactions toward the stimulus source. Acta Psychol 47:63–81

  • Smith EL, Grabowecky M, Suzuki S (2007) Auditory–visual crossmodal integration in perception of face gender. Curr Biol 17:1680–1685

  • Stein BE, Meredith MA (1993) The merging of the senses. MIT Press, Cambridge

  • Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cogn Neurosci 8:497–506

  • Teder-Sälejärvi WA, Di Russo F, McDonald JJ, Hillyard SA (2005) Effects of spatial congruity on audio–visual multimodal integration. J Cogn Neurosci 17:1396–1409

  • Ulrich R, Miller J (1993) Information processing models generating lognormally distributed reaction times. J Math Psychol 37:513–525

  • Ulrich R, Miller J, Schröter H (2007) Testing the race model inequality: an algorithm and computer programs. Behav Res Methods 39:291–302

  • Yuval-Greenberg S, Deouell LY (2007) What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J Neurosci 27:1090–1096

  • Zampini M, Torresan D, Spence C, Murray MM (2007) Auditory–somatosensory multisensory interactions in front and rear space. Neuropsychologia 45:1869–1877

  • Zorzi M, Umiltà C (1995) A computational model of the Simon effect. Psychol Res 58:193–205


Acknowledgments

We thank Khoa-Van Nguyen, Olivier Warusfel, George Drettakis, and Grace Leslie for their help. We are grateful to Shihab Shamma, Daniel Pressnitzer, Laurence Harris and two anonymous reviewers for useful comments on a previous version of this manuscript. This research was supported by the EU IST FP6 Open FET project CROSSMOD: “Crossmodal Perceptual Interaction and Rendering” IST-04891.

Author information

Correspondence to Clara Suied.

Appendix

An additional experiment was performed because we could not conclude, from the results of our main experiment, whether the shorter RTs to semantically congruent stimuli than to semantically incongruent stimuli were due to semantic congruence or simply to redundancy of information. In this new experiment, the target stimuli were the sound of a frog (A+f) and the image of a phone (V+p). Participants had to respond to A+f, to V+p, or to both presented simultaneously (A+fV+p). In this case, the redundant target condition was also a semantically incongruent stimulus, whereas the non-redundant target conditions were semantically congruent stimuli. If the redundant signal effect (RSE) observed in the main experiment was related to the semantic congruence between the auditory and the visual parts of the stimulus, there should be no bimodal integration for the incongruent stimuli (redundant targets) in this control experiment. In addition, if semantically congruent trials benefited from crossmodal integration, mean RTs in semantically congruent trials (non-redundant targets) should be shorter than mean RTs in semantically incongruent trials (redundant targets).

Eleven volunteers (5 women; mean age 30.9 ± 8 years; all but one right-handed) participated in the experiment. All were naïve with respect to the purpose of the experiment. None of them reported hearing problems, and all reported normal or corrected-to-normal vision. All participants provided informed consent. Apparatus and stimuli were exactly the same as in the main experiment. The procedure was also highly similar, except for the definition of the go and no-go conditions. There were five go conditions: auditory frog alone (A+f), visual phone alone (V+p), auditory frog with a visual phone (A+fV+p), auditory frog with a visual frog (A+fV−f), and auditory phone with a visual phone (A−pV+p). The A+fV+p condition was the only redundant target; the other four conditions were non-redundant targets. The no-go conditions were an auditory phone alone (A−p), a visual frog alone (V−f), and an auditory phone with a visual frog (A−pV−f). Each go condition was presented 48 times and each no-go condition 20 times. In this additional experiment there were no inter-stimulus contingencies; thus, any potential RSE could not be attributed to such contingencies. The entire experiment for each participant consisted of 300 stimuli, of which 240 (80%) were task-relevant (go) stimuli. Statistical analyses were similar to those performed in the main experiment (log-transformation and ANOVA on the mean ln(RT)s), except that we did not remove RTs greater than 1,000 ms, because the difficulty of the task led to RTs of the order of 650 ms on average.
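
As a rough illustration of this analysis pipeline, the log-transformation and per-cell averaging might look like the sketch below. This is not the authors' code: the table layout, column names, and synthetic RTs are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative long-format trial table, one row per go trial
# (assumed column names; RTs are synthetic, in ms).
rng = np.random.default_rng(0)
conditions = ["A+f", "V+p", "A+fV+p", "A+fV-f", "A-pV+p"]
trials = pd.DataFrame({
    "participant": np.repeat(np.arange(11), len(conditions) * 48),
    "condition": np.tile(np.repeat(conditions, 48), 11),
    "rt": rng.lognormal(np.log(650), 0.2, 11 * len(conditions) * 48),
})

# Log-transform each RT, then average ln(RT) within each
# participant x condition cell (no 1,000-ms cutoff is applied here).
trials["ln_rt"] = np.log(trials["rt"])
cell_means = (trials.groupby(["participant", "condition"])["ln_rt"]
              .mean().reset_index())

# For display (as in Fig. 5), log-scale means convert back to ms
# via the geometric mean, exp(mean ln RT).
display_ms = np.exp(cell_means.groupby("condition")["ln_rt"].mean())
print(display_ms.round(0))
```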

A nonparametric repeated-measures ANOVA (Friedman's test) revealed a significant effect of condition (A−p, V−f, A−pV−f) on the percentage of false alarms (χ²(2) = 9.5; P < 0.01). The percentage of false alarms was higher for the bimodal stimulus A−pV−f (39.5 ± 6.7%) than for the unimodal stimuli (21.4 ± 4.4% for A−p and 21.8 ± 4.9% for V−f). Misses were rare (0.9 ± 0.1%). Overall, the larger number of false alarms here compared with the main experiment (around three times more) likely reflects the difficulty of attending to two different objects at the same time.
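
For reference, this nonparametric test maps directly onto SciPy's Friedman implementation, as sketched below; the per-participant false-alarm percentages are illustrative placeholders, not the experimental data.

```python
from scipy.stats import friedmanchisquare

# One false-alarm percentage per participant (11) and no-go condition
# (illustrative values only).
fa_a  = [20, 15, 25, 30, 10, 25, 20, 15, 35, 20, 20]   # A-p alone
fa_v  = [25, 20, 15, 30, 20, 15, 25, 20, 30, 15, 25]   # V-f alone
fa_av = [40, 35, 45, 50, 30, 35, 40, 45, 35, 40, 40]   # A-pV-f bimodal

chi2, p = friedmanchisquare(fa_a, fa_v, fa_av)
print(f"chi2(2) = {chi2:.2f}, p = {p:.4f}")
```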

RTs for this additional experiment are shown in Fig. 5. The distribution of the residuals of the ANOVA did not differ from a normal distribution (Kolmogorov–Smirnov test: d = 0.09; N = 55; P > 0.2). Overall, the RTs observed in this additional experiment were much longer than those of the main experiment (more than 600 ms here, compared with around 350 ms in the main experiment), confirming the difficulty of the task. To identify between-condition differences in mean ln(RT)s, a repeated-measures ANOVA was conducted with the five conditions as a within-subjects factor (A+f, V+p, A+fV+p, A+fV−f, A−pV+p). It revealed a significant main effect of condition (F(4,60) = 6.14; ε = 0.8; P < 0.001). Post hoc Tukey HSD tests showed that this effect was due to a difference between A+f and A+fV+p (P < 0.001) and a difference between A+fV+p and A+fV−f (P < 0.004). Importantly, there was no significant difference between the shortest unimodal condition (here, V+p) and the redundant target (A+fV+p) (P = 0.5). In other words, we observed no bimodal facilitation effect for the redundant target (the semantically incongruent stimulus). A null effect is of course difficult to interpret unambiguously; however, this strongly suggests that the semantic incongruence of redundant stimuli prevents any redundant-signal facilitation effect.
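
An equivalent omnibus test can be sketched with statsmodels' AnovaRM, as below; the synthetic cell means stand in for the real per-participant data, and the sphericity correction (ε) and Tukey HSD post hocs reported above would be applied on top of this omnibus result.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic per-participant mean ln(RT)s for the five go conditions
# (illustrative only; in practice, use the cell_means table built in
# the earlier sketch).
rng = np.random.default_rng(1)
conditions = ["A+f", "V+p", "A+fV+p", "A+fV-f", "A-pV+p"]
cell_means = pd.DataFrame([
    {"participant": p, "condition": c,
     "ln_rt": np.log(650) + rng.normal(0, 0.05)}
    for p in range(11) for c in conditions
])

# Repeated-measures ANOVA: the five conditions as within-subject factor.
res = AnovaRM(cell_means, depvar="ln_rt",
              subject="participant", within=["condition"]).fit()
print(res)
```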

Fig. 5

RTs for the semantically incongruent bimodal (redundant target, A+fV+p), semantically congruent bimodal (non-redundant target, A+fV−f and A−pV+p), and unimodal (A+f and V+p) conditions. RTs were first transformed to a log scale and then averaged across all participants; the log scale is converted back to ms for display purposes. The error bars represent one standard error of the mean. There was no bimodal facilitation effect (RTs in the A+fV+p condition were similar to RTs in the shortest unimodal condition, i.e. V+p). RTs in the semantically congruent conditions (A+fV−f and A−pV+p) were not shorter than RTs in the semantically incongruent condition (A+fV+p). The only significant differences observed were due to shorter RTs to the visual target alone than to the auditory target alone


About this article

Cite this article

Suied, C., Bonneel, N. & Viaud-Delmon, I. Integration of auditory and visual information in the recognition of realistic objects. Exp Brain Res 194, 91–102 (2009). https://doi.org/10.1007/s00221-008-1672-6
