06-08-2024 | Research

Crossmodal semantic congruence guides spontaneous orienting in real-life scenes

Authors: Daria Kvasova, Llucia Coll, Travis Stewart, Salvador Soto-Faraco

Published in: Psychological Research | Issue 7/2024

Abstract

In real-world scenes, objects and events are often interconnected within a rich web of semantic relationships. These semantic links help parse information efficiently and make sense of the sensory environment. It has been shown that, during goal-directed search, hearing the characteristic sound of an everyday object helps find the corresponding object, both in artificial visual search arrays and in naturalistic, real-life video clips. However, whether crossmodal semantic congruence also triggers orienting during spontaneous, non-goal-directed observation is unknown. Here, we investigated whether crossmodal semantic congruence can attract spontaneous, overt visual attention when viewing naturalistic, dynamic scenes. We used eye-tracking whilst participants (N = 45) watched video clips presented alongside sounds of varying semantic relatedness to objects present within the scene. We found that characteristic sounds increased the probability of looking at, the number of fixations on, and the total dwell time on semantically corresponding visual objects, compared to when the same scenes were presented with semantically neutral sounds or with background noise only. Interestingly, hearing object sounds that had no matching object in the scene led to increased visual exploration. These results suggest that crossmodal semantic information influences spontaneous gaze on realistic scenes, and therefore how information is sampled. Our findings extend beyond known effects of object-based crossmodal interactions with simple stimulus arrays and shed new light on the role that audio-visual semantic relationships play in the perception of everyday-life scenarios.
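To make the three gaze measures concrete, the following is a minimal, illustrative Python sketch (not the authors' analysis code) of how they could be computed for a single trial. The `Fixation` record and the static rectangular AOI are simplifying assumptions; AOIs for objects in dynamic video clips would track the object frame by frame.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float         # gaze position in pixels
    y: float
    duration: float  # fixation duration in ms

def aoi_measures(fixations, aoi):
    """Compute the three gaze measures for one trial.

    `aoi` is a rectangle (x_min, y_min, x_max, y_max); a hypothetical
    stand-in for the frame-by-frame object regions used with videos.
    """
    x0, y0, x1, y1 = aoi
    hits = [f for f in fixations if x0 <= f.x <= x1 and y0 <= f.y <= y1]
    return {
        "aoi_looked_at": bool(hits),                     # gaze landed on the object at all
        "n_fixations": len(hits),                        # number of fixations on the object
        "dwell_time_ms": sum(f.duration for f in hits),  # total dwell time on the object
    }

# Example: two of three fixations fall inside the AOI.
trial = [Fixation(100, 120, 180), Fixation(420, 300, 220), Fixation(110, 130, 150)]
print(aoi_measures(trial, aoi=(80, 100, 200, 200)))
# -> {'aoi_looked_at': True, 'n_fixations': 2, 'dwell_time_ms': 330}
```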
Footnotes
1
The reason for slightly anticipating the sound is that semantic information can be accessed from visual stimuli within the first 100 ms (for a review, see Peelen & Kastner, 2014), whereas processing the meaning of a complex naturalistic sound may require more time due to the temporal nature of the information (according to some reviews, approximately 150 ms after onset; Murray & Spierer, 2009). For this reason, the temporal window of audiovisual integration for complex sounds may be asymmetrical (Vatakis & Spence, 2010). Following this logic, we decided to advance sound onsets by 100 ms, as in previous studies (see Vatakis & Spence, 2010, for a review; and Knoeferle et al., 2016, and Kvasova et al., 2019, 2023, for a similar procedure).
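As a minimal sketch of this scheduling logic (the trial timing values below are hypothetical), each sound onset is simply shifted 100 ms earlier than the onset of its corresponding visual object:

```python
AUDIO_LEAD_MS = 100  # sound starts 100 ms before the visual object appears

def sound_onsets(visual_onsets_ms):
    """Advance each sound onset by AUDIO_LEAD_MS relative to the visual onset,
    compensating for the slower access to meaning from complex sounds."""
    return [max(0, t - AUDIO_LEAD_MS) for t in visual_onsets_ms]

print(sound_onsets([2000, 5400, 9100]))  # -> [1900, 5300, 9000]
```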
 
2
Regarding the other variables, this participant's values were well within the group's range. For the congruent, neutral, and no-sound conditions, respectively, the percentage of AOIs looked at was 28% (min–max = [8–86]), 28% ([6–69]), and 11% ([0–67]); the number of fixations was 0.33 ([0.11–2.56]), 0.56 ([0.11–1.28]), and 0.17 ([0–1.25]); and the dwell time was 80 ms ([17–505]), 130 ms ([46–383]), and 49 ms ([0–298]).
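A minimal sketch of this kind of range check (the data layout is hypothetical; the values come from the figures quoted above):

```python
def within_group_range(value, group_values):
    """Flag whether a participant's score lies inside the group's min-max range."""
    return min(group_values) <= value <= max(group_values)

# Dwell time, congruent condition: this participant's 80 ms vs. the
# group's min and max of 17 ms and 505 ms.
print(within_group_range(80, [17, 505]))  # -> True
```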
 
References
Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. American Psychologist, 61(1), 27–41.
Bolognini, N., Frassinetti, F., Serino, A., & Làdavas, E. (2005). 'Acoustical vision' of below threshold stimuli: Interaction among spatially converging audiovisual inputs. Experimental Brain Research, 160(3), 273–282.
Burgess, P. W., Alderman, N., Forbes, C., Costello, A., Coates, L. M. A., Dawson, D. R., & Channon, S. (2006). The case for the development and use of 'ecologically valid' measures of executive function in experimental and clinical neuropsychology. Journal of the International Neuropsychological Society, 12(2), 194–209.
Chen, Y. C., & Spence, C. (2011). Cross-modal semantic priming by naturalistic sounds and spoken words enhances visual sensitivity. Journal of Experimental Psychology: Human Perception and Performance, 37, 1554–1568.
Chen, Z., Zhang, K., Cai, H., Ding, X., Jiang, C., & Chen, Z. (2024). Audio-visual saliency prediction for movie viewing in immersive environments: Dataset and benchmarks. Journal of Visual Communication and Image Representation, 104095.
Coutrot, A., & Guyader, N. (2014). How saliency, faces, and sound influence gaze in dynamic social scenes. Journal of Vision, 14(8), 5.
Coutrot, A., Guyader, N., Ionescu, G., & Caplier, A. (2012). Influence of soundtrack on eye movements during video exploration. Journal of Eye Movement Research, 5(4), 2.
Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66–68.
Foulsham, T., & Sanderson, L. A. (2013). Look who's talking? Sound changes gaze behaviour in a dynamic social scene. Visual Cognition, 21(7), 922–944.
Henderson, J. M., & Hayes, T. R. (2017). Meaning-based guidance of attention in scenes as revealed by meaning maps. Nature Human Behaviour, 1, 743–747.
Hessels, R. S., Kemner, C., van den Boomen, C., & Hooge, I. T. C. (2016). The area-of-interest problem in eyetracking research: A noise-robust solution for face and sparse stimuli. Behavior Research Methods, 48, 1694–1712.
Iordanescu, L., Grabowecky, M., Franconeri, S., Theeuwes, J., & Suzuki, S. (2010). Characteristic sounds make you look at target objects more quickly. Attention, Perception, & Psychophysics, 72(7), 1736–1741.
Iordanescu, L., Guzman-Martinez, E., Grabowecky, M., & Suzuki, S. (2008). Characteristic sounds facilitate visual search. Psychonomic Bulletin & Review, 15(3), 548–554.
Kaiser, D., Stein, T., & Peelen, M. V. (2014). Object grouping based on real-world regularities facilitates perception by reducing competitive interactions in visual cortex. Proceedings of the National Academy of Sciences, 111(30), 11217–11222.
Kayser, C., Körding, K. P., & König, P. (2004). Processing of complex stimuli and natural scenes in the visual cortex. Current Opinion in Neurobiology, 14(4), 468–473.
Kingstone, A., Smilek, D., Ristic, J., Kelland Friesen, C., & Eastwood, J. D. (2003). Attention, researchers! It is time to take a look at the real world. Current Directions in Psychological Science, 12(5), 176–180.
Knoeferle, K. M., Knoeferle, P., Velasco, C., & Spence, C. (2016). Multisensory brand search: How the meaning of sounds guides consumers' visual attention. Journal of Experimental Psychology: Applied, 22(2), 196–210.
Koelewijn, T., Bronkhorst, A., & Theeuwes, J. (2010). Attention and the multiple stages of multisensory integration: A review of audiovisual studies. Acta Psychologica, 134, 372–384.
Kvasova, D., Garcia-Vernet, L., & Soto-Faraco, S. (2019). Characteristic sounds facilitate object search in real-life scenes. Frontiers in Psychology, 10, 2511.
Lunn, J., Sjoblom, A., Soto-Faraco, S., & Forster, S. (2019). Multisensory enhancement of attention depends on whether you are already paying attention. Cognition, 187, 38–49.
Mädebach, A., Wöhner, S., Kieseler, M. L., & Jescheniak, J. D. (2017). Neighing, barking, and drumming horses: Object-related sounds help and hinder picture naming. Journal of Experimental Psychology: Human Perception and Performance. Advance online publication.
Mastroberardino, S., Santangelo, V., & Macaluso, E. (2015). Crossmodal semantic congruence can affect visuo-spatial processing and activity of the fronto-parietal attention networks. Frontiers in Integrative Neuroscience, 9, 45.
McDonald, J. J., Teder-Sälejärvi, W. A., & Hillyard, S. A. (2000). Involuntary orienting to sound improves visual perception. Nature, 407, 906–908.
McDonald, J. J., Teder-Sälejärvi, W. A., & Ward, L. M. (2001). Multisensory integration and crossmodal attention effects in the human brain. Science, 292, 1791.
Min, X., Zhai, G., Zhou, J., Zhang, X. P., Yang, X., & Guan, X. (2020). A multimodal saliency model for videos with high audio-visual correspondence. IEEE Transactions on Image Processing, 29, 3805–3819.
Molholm, S., Ritter, W., Javitt, D. C., & Foxe, J. J. (2004). Multisensory visual-auditory object recognition in humans: A high-density electrical mapping study. Cerebral Cortex, 14(4), 452–465.
Murray, M. M., & Spierer, L. (2009). Auditory spatio-temporal brain dynamics and their consequences for multisensory interactions in humans. Hearing Research, 258, 121–133.
Nardo, D., Santangelo, V., & Macaluso, E. (2014). Spatial orienting in complex audiovisual environments. Human Brain Mapping, 35(4), 1597–1614.
Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. WH Freeman and Company.
Neisser, U. (1982). Memory: What are the important questions? In U. Neisser & I. E. Hyman (Eds.), Memory observed (pp. 3–18). Worth.
Oliva, A. (2005). Gist of the scene. In L. Itti, G. Rees, & J. Tsotsos (Eds.), Neurobiology of attention (pp. 251–257). Academic Press / Elsevier.
Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125–12130.
Peelen, M., & Kastner, S. (2014). Attention in the real world: Toward understanding its neural basis. Trends in Cognitive Sciences, 18(5).
Pesquita, A., Brennan, A. A., Enns, J. T., & Soto-Faraco, S. (2013). Isolating shape from semantics in haptic-visual priming. Experimental Brain Research, 227(3), 311–322.
Quigley, C., Onat, S., Harding, S., Cooke, M., & König, P. (2007). Audio-visual integration during overt visual attention. Journal of Eye Movement Research, 1(2), 4, 1–17.
Song, G., Pellerin, D., & Granjon, L. (2013). Different types of sounds influence gaze differently in videos. Journal of Eye Movement Research, 6(4), 1–13.
Soto-Faraco, S., Kvasova, D., Biau, E., Ikumi, N., Ruzzoli, M., Moris-Fernandez, L., & Torralba, M. (2019). Multisensory interactions in the real world. In M. Chun (Ed.), Cambridge elements of perception. Cambridge University Press.
Spence, C., & Soto-Faraco, S. (2019). Crossmodal attention applied: Lessons for and from driving. In M. Chun (Ed.), Cambridge elements of attention. Cambridge University Press.
Van den Brink, R. L., Cohen, M. X., van der Burg, E., Talsma, D., Vissers, M. E., & Slagter, H. A. (2014). Subcortical, modality-specific pathways contribute to multisensory processing in humans. Cerebral Cortex, 24, 2169–2177.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2008). Pip and pop: Nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance, 34, 1053–1065.
Vatakis, A., & Spence, C. (2010). Audiovisual temporal integration for complex speech, object-action, animal call, and musical stimuli. In M. J. Naumer & J. Kaiser (Eds.), Multisensory object perception in the primate brain (pp. 95–121).
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance.
Wegner-Clemens, K., Malcolm, G. L., & Shomstein, S. (2024). Predicting attentional allocation in real-world environments: The need to investigate crossmodal semantic guidance. Wiley Interdisciplinary Reviews: Cognitive Science, 15(3), e1675. https://doi.org/10.1002/wcs.1675
Wu, C. C., Wick, F. A., & Pomplun, M. (2014). Guidance of visual attention by semantic information in real-world scenes. Frontiers in Psychology, 5, 54.
Xie, J., Liu, Z., Li, G., & Song, Y. (2024). Audio-visual saliency prediction with multisensory perception and integration. Image and Vision Computing, 143, 104955.
Metadata
Title
Crossmodal semantic congruence guides spontaneous orienting in real-life scenes
Authors
Daria Kvasova
Llucia Coll
Travis Stewart
Salvador Soto-Faraco
Publication date
06-08-2024
Publisher
Springer Berlin Heidelberg
Published in
Psychological Research / Issue 7/2024
Print ISSN: 0340-0727
Electronic ISSN: 1430-2772
DOI
https://doi.org/10.1007/s00426-024-02018-8