Abstract
Recognizing a natural object requires one to pool information from various sensory modalities, and to ignore information from competing objects. That the same semantic knowledge can be accessed through different modalities makes it possible to explore the retrieval of supramodal object concepts. Here, object-recognition processes were investigated by manipulating the relationships between sensory modalities, specifically the semantic content and the spatial alignment of the auditory and visual information. Experiments were run in a realistic virtual environment. Participants were asked to react as fast as possible to a target object presented in the visual and/or the auditory modality and to inhibit a distractor object (go/no-go task). Spatial alignment had no effect on object-recognition time. The only spatial effect observed was a stimulus–response compatibility between the auditory stimulus and the hand position. Reaction times were significantly shorter for semantically congruent bimodal stimuli than would be predicted by independent processing of the auditory and visual target information. Interestingly, this bimodal facilitation effect was twice as large as that found in previous studies that also used information-rich stimuli. An interference effect (i.e. longer reaction times to semantically incongruent stimuli than to the corresponding unimodal stimulus) was observed only when the distractor was auditory. When the distractor was visual, semantic incongruence did not interfere with object recognition. Our results show that immersive displays with large visual stimuli can produce large multimodal integration effects, and reveal a possible asymmetry in the attentional filtering of irrelevant auditory and visual information.
Notes
A third alternative has been proposed by Mordkoff and Yantis (1991), showing that inter-stimulus contingencies could, in some cases, entirely explain the violation of the race model, thus challenging the conclusion of an integration of the sensory channels in the presence of these contingencies.
To be able to compare our results with previous studies, all the analyses were also performed on the initial non-transformed distribution.
For these analyses, as for all the other ones, the ANOVA on the non-transformed distribution gave similar results.
References
Alais D, Morrone C, Burr D (2006) Separate attentional resources for vision and audition. Proc Biol Sci 273:1339–1345
Bedford FL (2001) Toward a general law of numerical/object identity. Cahiers de Psychologie Cognitive/Curr Psychol Cogn 20:113–176
Bedford F (2004) Analysis of a constraint on perception, cognition, and development: one object, one place, one time. J Exp Psychol Hum Percept Perform 30:907–912
Bertelson P, Vroomen J, Wiegeraad G, de Gelder B (1994) Exploring the relation between McGurk interference and ventriloquism. In: International conference on spoken language processing, Yokohama, Japan, pp 556–562
Calvert GA, Thesen T (2004) Multisensory integration: methodological approaches and emerging principles in the human brain. J Physiol Paris 98:191–205
Calvert GA, Brammer MJ, Iversen SD (1998) Crossmodal identification. Trends Cogn Sci 2:247–253
Caramazza A, Hillis AE, Rapp BC, Romani C (1990) The multiple semantic hypothesis: multiple confusions? Cogn Neuropsychol 7:161–189
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
Frassinetti F, Bolognini N, Làdavas E (2002) Enhancement of visual perception by crossmodal visuo-auditory interaction. Exp Brain Res 147:332–343
Frens MA, Van Opstal AJ, Van der Willigen RF (1995) Spatial and temporal factors determine auditory–visual interactions in human saccadic eye movements. Percept Psychophys 57:802–816
Giard MH, Peronnet F (1999) Auditory–visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci 11:473–490
Giray M, Ulrich R (1993) Motor coactivation revealed by response force in divided and focused attention. J Exp Psychol Hum Percept Perform 19:1278–1291
Gondan M, Niederhaus B, Rösler F, Röder B (2005) Multisensory processing in the redundant-target effect: a behavioral and event-related potential study. Percept Psychophys 67:713–726
Grice GR, Canham L (1990) Redundancy phenomena are affected by response requirements. Percept Psychophys 48:209–213
Grice GR, Gwynne JW (1987) Dependence of target redundancy effects on noise conditions and number of targets. Percept Psychophys 42:29–36
Grice GR, Reed JM (1992) What makes targets redundant? Percept Psychophys 51:437–442
Grice GR, Canham L, Gwynne JW (1984) Absence of a redundant-signals effect in a reaction time task with divided attention. Percept Psychophys 36:565–570
Harrington LK, Peck CK (1998) Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Exp Brain Res 122:247–252
Hershenson M (1962) Reaction time as a measure of intersensory facilitation. J Exp Psychol 63:289–293
Holmes NP, Spence C (2005) Multisensory integration: space, time and superadditivity. Curr Biol 15:R762–R764
Hughes HC, Reuter-Lorenz PA, Nozawa G, Fendrich R (1994) Visual–auditory interactions in sensorimotor processing: saccades versus manual responses. J Exp Psychol Hum Percept Perform 20:131–153
Kinchla RA (1974) Detecting target elements in multielement displays: a confusability model. Percept Psychophys 15:149–158
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT (2004) Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res 158:405–414
Lehmann S, Murray MM (2005) The role of multisensory memories in unisensory object discrimination. Brain Res Cogn Brain Res 24:326–334
Lu CH, Proctor RW (1995) The influence of irrelevant location information on performance: a review of the Simon and spatial Stroop effects. Psychon Bull Rev 2:174–207
Luce RD (1986) Response times: their role in inferring elementary mental organization. Oxford University Press, New York
Martin A (2007) The representation of object concepts in the brain. Annu Rev Psychol 58:25–45
Miller J (1982) Divided attention: evidence for coactivation with redundant signals. Cogn Psychol 14:247–279
Miller J (1986) Timecourse of coactivation in bimodal divided attention. Percept Psychophys 40:331–343
Miller J (1991) Channel interaction and the redundant-targets effect in bimodal divided attention. J Exp Psychol Hum Percept Perform 17:160–169
Moeck T, Bonneel N, Tsingos N, Drettakis G, Viaud-Delmon I, Alloza D (2007) Progressive perceptual audio rendering of complex scenes. In: ACM SIGGRAPH symposium on interactive 3D graphics and games
Molholm S, Ritter W, Javitt DC, Foxe JJ (2004) Multisensory visual–auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex 14:452–465
Mordkoff JT, Yantis S (1991) An interactive race model of divided attention. J Exp Psychol Hum Percept Perform 17:520–538
Murray MM, Molholm S, Michel CM, Heslenfeld DJ, Ritter W, Javitt DC, Schroeder CE, Foxe JJ (2005) Grabbing your ear: rapid auditory–somatosensory multisensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cereb Cortex 15:963–974
Patterson K, Nestor PJ, Rogers TT (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci 8:976–987
Raab DH (1962) Statistical facilitation of simple reaction times. Trans N Y Acad Sci 24:574–590
Radeau M, Bertelson P (1977) Adaptation to auditory–visual discordance and ventriloquism in semirealistic situations. Percept Psychophys 22:137–146
Radeau M, Bertelson P (1978) Cognitive factors and adaptation to auditory–visual discordance. Percept Psychophys 23:341–343
Riddoch MJ, Humphreys GW, Coltheart M, Funnell E (1988) Semantic systems or system? Neuropsychological evidence re-examined. Cogn Neuropsychol 5:3–25
Savazzi S, Marzi CA (2008) Does the redundant signal effect occur at an early visual stage? Exp Brain Res 184:275–281
Schmitt M, Postma A, de Haan E (2000) Interactions between exogenous auditory and visual spatial attention. Q J Exp Psychol A 53:105–130
Schröger E, Widmann A (1998) Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology 35:755–759
Schwarz W (1996) Further tests of the interactive race model of divided attention: the effects of negative bias and varying stimulus-onset asynchronies. Psychol Res 58:233–245
Simon JR, Craft JL (1970) Effects of an irrelevant auditory stimulus on visual choice reaction time. J Exp Psychol 86:272–274
Simon JR, Sly PE, Vilapakkam S (1981) Effect of compatibility of S–R mapping on reactions toward the stimulus source. Acta Psychol 47:63–81
Smith EL, Grabowecky M, Suzuki S (2007) Auditory–visual crossmodal integration in perception of face gender. Curr Biol 17:1680–1685
Stein BE, Meredith MA (1993) The merging of the senses. MIT Press, Cambridge
Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cogn Neurosci 8:497–506
Teder-Sälejärvi WA, Di Russo F, McDonald JJ, Hillyard SA (2005) Effects of spatial congruity on audio–visual multimodal integration. J Cogn Neurosci 17:1396–1409
Ulrich R, Miller J (1993) Information processing models generating lognormally distributed reaction times. J Math Psychol 37:513–525
Ulrich R, Miller J, Schröter H (2007) Testing the race model inequality: an algorithm and computer programs. Behav Res Methods 39:291–302
Yuval-Greenberg S, Deouell LY (2007) What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J Neurosci 27:1090–1096
Zampini M, Torresan D, Spence C, Murray MM (2007) Auditory–somatosensory multisensory interactions in front and rear space. Neuropsychologia 45:1869–1877
Zorzi M, Umiltà C (1995) A computational model of the Simon effect. Psychol Res 58:193–205
Acknowledgments
We thank Khoa-Van Nguyen, Olivier Warusfel, George Drettakis, and Grace Leslie for their help. We are grateful to Shihab Shamma, Daniel Pressnitzer, Laurence Harris and two anonymous reviewers for useful comments on a previous version of this manuscript. This research was supported by the EU IST FP6 Open FET project CROSSMOD: “Crossmodal Perceptual Interaction and Rendering” IST-04891.
Appendix
An additional experiment was performed because the results of the main experiment could not establish whether the shorter RTs to semantically congruent stimuli than to semantically incongruent stimuli were due to semantic congruence or simply to redundancy of information. In this new experiment, the target stimuli were the sound of a frog (A+f) and the image of a phone (V+p). Participants had to respond to A+f, to V+p, or to both presented simultaneously (A+fV+p). In this case, the redundant target condition was also a semantically incongruent stimulus, whereas the non-redundant bimodal target conditions were semantically congruent stimuli. If the redundant-signals effect (RSE) observed in the main experiment was related to the semantic congruence between the auditory and visual parts of the stimulus, there should be no bimodal integration for the incongruent stimuli (redundant targets) in this control experiment. In addition, if semantically congruent trials benefited from crossmodal integration, mean RTs in semantically congruent trials (non-redundant targets) should be shorter than mean RTs in semantically incongruent trials (redundant targets).
Eleven volunteers (5 women; mean age 30.9 ± 8 years; all but one right-handed) participated in the experiment. All were naïve with respect to the purpose of the experiment. None of them reported having hearing problems, and all reported normal or corrected-to-normal vision. All participants provided informed consent. Apparatus and stimuli were exactly the same as in the main experiment. The procedure was also highly similar, except for the definition of the go and no-go conditions. There were five go conditions: auditory frog alone (A+f), visual phone alone (V+p), auditory frog with a visual phone (A+fV+p), auditory frog with a visual frog (A+fV−f), and auditory phone with a visual phone (A−pV+p). The A+fV+p condition was the only redundant target; the other four conditions were non-redundant targets. The no-go conditions were an auditory phone alone (A−p), a visual frog alone (V−f), and an auditory phone with a visual frog (A−pV−f). Each go condition was presented 48 times and each no-go condition 20 times. In this additional experiment there were no inter-stimulus contingencies; thus, any potential RSE could not be attributed to such contingencies. The entire experiment for each participant consisted of 300 stimuli, of which 240 (80%) were task-relevant (go responses). Statistical analyses were similar to those performed in the main experiment (log-transformation and ANOVA on the mean ln(RTs)), except that we did not remove RTs greater than 1,000 ms, because the difficulty of the task led to RTs of around 650 ms on average.
A nonparametric repeated-measures ANOVA (Friedman’s test) revealed a significant effect of condition (A−, V−, A−V−i) on the percentage of false alarms (χ²(2) = 9.5; P < 0.01). The percentage of false alarms was higher for the bimodal stimulus A−pV−f (39.5 ± 6.7%) than for either unimodal stimulus (21.4 ± 4.4% for A−p and 21.8 ± 4.9% for V−f). Only 0.9 ± 0.1% of responses were misses. Overall, the larger number of false alarms compared to the main experiment (around three times more) could reflect the difficulty of the task (attending to two different objects at the same time).
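The Friedman test used above can be sketched in a few lines. The per-participant false-alarm percentages below are hypothetical (the per-subject data are not reported here), the function name is our own, and ties are not corrected for; this is an illustration of the statistic, not the analysis actually run.

```python
# Minimal sketch of Friedman's chi-square statistic for k repeated
# conditions measured on n subjects (no tie correction).

def friedman_chi2(data):
    """data: one row per subject, each row a list of k condition scores.
    Returns Friedman's chi-square statistic (df = k - 1)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        # Rank the k conditions within this subject (1 = smallest score).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical false-alarm percentages for 4 subjects in 3 conditions
# (auditory alone, visual alone, bimodal); the bimodal column is highest,
# mirroring the pattern reported above.
fa = [
    [20, 22, 40],
    [18, 25, 38],
    [24, 20, 42],
    [21, 19, 37],
]
print(friedman_chi2(fa))  # compare against chi-square with df = 2
```

The statistic is then referred to a chi-square distribution with k − 1 degrees of freedom; in practice a library routine with tie correction (e.g. `scipy.stats.friedmanchisquare`) would be used.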
RTs for this additional experiment are shown in Fig. 5. The distribution of the residuals of the ANOVA did not differ from a normal distribution (Kolmogorov–Smirnov test: d = 0.09; N = 55; P > 0.2). Overall, RTs in this additional experiment were much longer than those of the main experiment (more than 600 ms here, compared to around 350 ms in the main experiment), which confirms the difficulty of the task. To identify between-condition differences in mean ln(RTs), a repeated-measures ANOVA was conducted with the five conditions as a within-subjects factor (A+f, V+p, A+fV+p, A+fV−f, A−pV+p). It revealed a significant main effect of condition (F(4,60) = 6.14; ε = 0.8; P < 0.001). Post hoc Tukey HSD tests showed that this effect was due to a difference between A+f and A+fV+p (P < 0.001) and a difference between A+fV+p and A+fV−f (P < 0.004). Importantly, there was no significant difference between the faster of the two unimodal conditions (here, V+p) and the redundant target (A+fV+p) (P = 0.5). In other words, we observed no bimodal facilitation effect for the redundant target (a semantically incongruent stimulus). A null effect is, of course, difficult to interpret unambiguously; however, this result strongly suggests that semantic incongruence between redundant stimuli prevents any redundant-target facilitation effect.
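The race model inequality underlying these RSE analyses (Miller 1982) states that, under independent processing, the bimodal RT distribution can never exceed the sum of the two unimodal distributions: G_AV(t) ≤ G_A(t) + G_V(t) for all t. The sketch below illustrates the idea with synthetic RT samples and our own function names; it is a simplified illustration, not the tested algorithm of Ulrich et al. (2007), which also covers quantile estimation and statistical evaluation across participants.

```python
# Illustration of Miller's (1982) race model inequality with
# synthetic (made-up) RT samples.

def ecdf(sample, t):
    """Empirical cumulative distribution of a list of RTs at time t."""
    return sum(rt <= t for rt in sample) / len(sample)

def race_model_violations(rt_a, rt_v, rt_av, times):
    """Return the time points at which the bimodal CDF exceeds the sum
    of the unimodal CDFs, i.e. where the independent-race bound fails."""
    return [t for t in times
            if ecdf(rt_av, t) > ecdf(rt_a, t) + ecdf(rt_v, t)]

# Synthetic RTs (ms): bimodal responses faster than either unimodal one,
# as expected when the two channels coactivate.
rt_a = [320, 340, 360, 380, 400]   # auditory alone
rt_v = [330, 350, 370, 390, 410]   # visual alone
rt_av = [250, 260, 270, 280, 300]  # bimodal

print(race_model_violations(rt_a, rt_v, rt_av, range(240, 420, 10)))
```

Any non-empty result at the fast end of the distribution is evidence against a simple race between independent channels and in favor of coactivation.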
Cite this article
Suied, C., Bonneel, N. & Viaud-Delmon, I. Integration of auditory and visual information in the recognition of realistic objects. Exp Brain Res 194, 91–102 (2009). https://doi.org/10.1007/s00221-008-1672-6