01-02-2025 | Research

Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision

Authors: Kuinan Hou, Marco Zorzi, Alberto Testolin

Published in: Psychological Research | Issue 1/2025

Abstract

Humans share with many animal species the ability to perceive and approximately represent the number of objects in visual scenes. This ability improves throughout childhood, suggesting that learning and development play a key role in shaping our number sense. This hypothesis is further supported by computational investigations based on deep learning, which have shown that numerosity perception can spontaneously emerge in neural networks that learn the statistical structure of images with a varying number of items. However, neural network models are usually trained using synthetic datasets that might not faithfully reflect the statistical structure of natural environments, and there is also growing interest in using more ecological visual stimuli to investigate numerosity perception in humans. In this work, we exploit recent advances in computer vision algorithms to design and implement an original pipeline that can be used to estimate the distribution of numerosity and non-numerical magnitudes in large-scale datasets containing thousands of real images depicting objects in daily life situations. We show that in natural visual scenes the frequency of appearance of different numerosities follows a power law distribution. Moreover, we show that the correlational structure for numerosity and continuous magnitudes is stable across datasets and scene types (homogeneous vs. heterogeneous object sets). We suggest that considering such an “ecological” pattern of covariance is important to understand the influence of non-numerical visual cues on numerosity judgements.
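
The power-law claim can be made concrete with a small sketch. The code below is an illustration only and is not the authors' pipeline: it assumes per-image object counts are already available (here a hypothetical synthetic list, `synthetic_counts`), and the helper names `numerosity_frequencies` and `fit_power_law` are invented for this example. It estimates the exponent alpha in f(n) ≈ c · n^(-alpha) with an ordinary least-squares fit in log-log space, which is a rough estimator; maximum-likelihood fitting is generally preferred for real data.

```python
import math
from collections import Counter


def numerosity_frequencies(counts):
    """Map each positive numerosity to its relative frequency across images."""
    tally = Counter(c for c in counts if c > 0)
    total = sum(tally.values())
    return {n: k / total for n, k in sorted(tally.items())}


def fit_power_law(freqs):
    """Least-squares fit of log f = log c - alpha * log n; returns (alpha, c)."""
    if len(freqs) < 2:
        raise ValueError("need at least two distinct numerosities")
    xs = [math.log(n) for n in freqs]
    ys = [math.log(f) for f in freqs.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Hypothetical data (an assumption, not from the paper):
# numerosity n occurs 3600 / n**2 times, i.e. an exact power law with alpha = 2.
synthetic_counts = [n for n in range(1, 7) for _ in range(round(3600 / n**2))]

freqs = numerosity_frequencies(synthetic_counts)
alpha, c = fit_power_law(freqs)
print(f"estimated exponent alpha = {alpha:.3f}")
```

With real detections the frequencies would be noisy, so the fitted exponent should be read together with a goodness-of-fit check rather than on its own.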
Metadata
Title
Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision
Authors
Kuinan Hou
Marco Zorzi
Alberto Testolin
Publication date
01-02-2025
Publisher
Springer Berlin Heidelberg
Published in
Psychological Research / Issue 1/2025
Print ISSN: 0340-0727
Electronic ISSN: 1430-2772
DOI
https://doi.org/10.1007/s00426-024-02064-2