Effective connectivity in the neural network underlying coarse-to-fine categorization of visual scenes. A dynamic causal modeling study
Introduction
Natural scenes are complex stimuli that are processed and categorized by the visual system in a remarkably fast and reliable fashion. Supported by convergent data on the functional neuroanatomy of visual pathways (Van Essen & Deyoe, 1995), neurophysiological recordings in primates (De Valois et al., 1982, De Valois et al., 1982, Hupé et al., 2001, Poggio, 1972, Shams and von der Malsburg, 2002; for a review, see Bullier, 2001), and psychophysical results in humans (Ginsburg, 1986, Hughes et al., 1996, Parker et al., 1992, Schyns and Oliva, 1994), several models of visual perception (Bar, 2003, Bar, 2007, Bullier, 2001, Hegdé, 2008, Kauffmann et al., 2014, Schyns and Oliva, 1994) suggest that visual analysis begins with the parallel extraction of different visual elementary features at different spatial frequencies following a predominantly coarse-to-fine default processing sequence. Low spatial frequencies (LSF) in scenes, conveyed by fast magnocellular pathways, provide coarse information about the stimulus (e.g., the global shape and structure of a scene), whereas high spatial frequencies (HSF) in scenes, conveyed more slowly by the parvocellular pathways, provide finer information about the stimulus (e.g., the edges and borders of an object in the scene).
Bullier (2001) proposed an influential integrated model of visual processing based on neurophysiological recordings in primates. According to this model, information (such as LSF in scenes) conveyed rapidly by magnocellular pathways reaches the visual cortex first and then proceeds to high-order areas (parietal, frontal, and temporal areas) to generate a first-pass analysis of visual scenes. Results from these first-pass computations might then be retroinjected through feedback signals into lower-level areas (e.g., primary visual cortex, V1) in time for the arrival of the visual information (such as HSF in scenes) conveyed more slowly by the parvocellular pathway. This retroinjected information is used to guide further processing of parvocellular information in high-order areas of the ventral cortical stream that ultimately mediates recognition (inferotemporal areas). The primary visual cortex may therefore act as an “active blackboard,” integrating computations made by higher-order cortical areas. However, this model did not explicitly address the role of spatial frequencies in visual processing. Bar et al. (2006) investigated the neural correlates and time-course of spatial frequency processing during recognition of objects in humans using drawings of objects as stimuli. Using MEG, they showed firstly that recognition of non-filtered objects elicited activation in the orbitofrontal cortex before activation of the fusiform gyrus in the inferotemporal cortex. They also demonstrated strong synchrony between the orbitofrontal and temporal cortices during object recognition, suggesting important functional interactions between these regions. Using fMRI, they went on to investigate whether activity in the orbitofrontal cortex was driven by a particular spatial frequency band. They presented drawings of objects filtered in either LSF or HSF, and observed activation of the orbitofrontal cortex for LSF stimuli, but not for HSF stimuli.
Overall, their results suggest that the early activation of orbitofrontal cortex observed in MEG was in fact driven by the LSF content. The authors hypothesized a top-down facilitation mechanism, postulating that LSF information is projected rapidly, possibly via the magnocellular pathway of the dorsal cortical stream, from early visual areas to the orbitofrontal cortex, where it activates plausible interpretations of the visual input. Results of these computations are, according to their hypothesis, then projected to the fusiform gyrus, in the inferotemporal cortex, where object recognition is achieved. Kveraga, Boshyan, and Bar (2007) subsequently used dynamic causal modeling (DCM) in an fMRI study to investigate the interaction between the cortical structures involved in object recognition. They used drawings of objects as stimuli that were preferentially processed either by the magnocellular pathway (achromatic and low-luminance contrast drawings) or by the parvocellular pathway (chromatically defined and isoluminant drawings). The study showed that magnocellular-biased stimuli (compared to parvocellular-biased stimuli) elicited stronger activation in the orbitofrontal cortex, whereas parvocellular-biased stimuli induced stronger activation in the fusiform gyrus. They also showed that the processing of magnocellular-biased stimuli resulted in an increase in connectivity strength from the occipital cortex to the orbitofrontal cortex and from the orbitofrontal cortex to the fusiform gyrus, while the processing of parvocellular-biased stimuli resulted in an increase in connectivity strength from the occipital cortex to the fusiform gyrus. These results provided additional support for the existence of a top-down facilitation mechanism (Bar et al., 2006) that may be triggered by magnocellular information, such as LSF, projected early and rapidly to the orbitofrontal cortex.
Overall, data from neurophysiological recordings in primates (Bullier, 2001) and object recognition in humans (Bar, 2007, Bar et al., 2006, Kveraga et al., 2007) have offered useful insights into the neural bases of a predominantly coarse-to-fine default processing of spatial frequencies. Critically, although these studies assumed that coarse-to-fine processing occurs during visual perception, none of them manipulated the order of spatial frequency presentation.
Peyrin et al. (2010) combined fMRI and ERP in the same participants and directly investigated the neural bases of coarse-to-fine processing of scenes. They used sequences of two spatial frequency-filtered scenes in rapid succession as stimuli, with either a coarse-to-fine sequence (LSF scene followed by an HSF scene), or a fine-to-coarse sequence (HSF scene followed by an LSF scene) in order to experimentally “mimic” the sequential processing of spatial frequencies and impose a “coarse-to-fine” versus “fine-to-coarse” processing of scenes. Participants had to decide whether the two scenes belonged to the same category. fMRI results showed firstly that coarse-to-fine sequences elicited greater activation of the prefrontal cortex (in the inferior frontal gyrus), the temporo-parietal areas, and the occipital gyrus compared to fine-to-coarse sequences. ERPs were used to provide the time course of activation in these regions. Critically, the inferior frontal gyrus was activated twice during the coarse-to-fine sequences: during processing of the first image in LSF, and then at the onset of the second image in HSF. Furthermore, specific activation of the occipital cortex for coarse-to-fine sequences was observed during processing of HSF scenes. Importantly, activation of the occipital cortex was only observed when HSF scenes were preceded by LSF scenes. These results therefore added support to Bar et al. (2006) and Bar (2007), and suggest that LSF information may activate the prefrontal cortex rapidly, in order to generate predictions about the visual category of the scene, and trigger top-down influences that subsequently constrain processing of the HSF scene. 
Furthermore, based on the integrated model of visual processing (Bullier, 2001), activation in the occipital cortex was interpreted as corresponding to the cortical site on which higher order areas can exert modulatory influences through feedback signals to guide further processing of parvocellular information (HSF) into higher order areas of the ventral visual stream (fusiform gyrus). However, Peyrin et al. (2010) did not investigate the interaction between the cortical structures and the causal influence (effective connectivity) of the high-order areas (either belonging to the dorsal or ventral streams) on the occipital cortex.
The present fMRI study aimed to further explore the neural bases of coarse-to-fine scene analysis and to determine their interactions, paying particular attention to the role of the occipital cortex. As in Peyrin et al. (2010), we explicitly manipulated the order of spatial frequency presentation. We used sequences composed of six filtered versions of a scene assembled from LSF to HSF, allowing us to impose a coarse-to-fine processing of scenes. We also used reverse fine-to-coarse sequences, in which the same scenes were assembled from HSF to LSF, as control stimuli. Participants performed a categorization task on these stimuli (indoor vs. outdoor). Stimuli were adapted from previous studies (Musel, Chauvin, Guyader, Chokron, & Peyrin, 2012), which have shown more rapid categorization of coarse-to-fine than fine-to-coarse sequences. These studies provided new arguments in favor of a predominantly coarse-to-fine categorization of natural scenes, and suggested that such sequences could be appropriate for the investigation of the neural substrates of coarse-to-fine processing.
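As a rough illustration of how such sequences could be assembled, the sketch below filters one image into six spatial-frequency bands and orders them from coarse to fine or from fine to coarse. It is a minimal sketch, not the stimulus pipeline used in the study: the band limits in `BANDS` (in cycles per image), the hard-edged filter, the random test image, and the function names are all assumptions made for illustration.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency (cycles per image) of every FFT coefficient."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    return np.hypot(fy[:, None], fx[None, :])


def filtered_sequence(image, bands, coarse_to_fine=True):
    """Return one band-pass filtered copy of `image` per (low, high) band.

    Bands are assumed to be listed from the lowest to the highest
    frequencies; the list is reversed for a fine-to-coarse sequence.
    """
    spectrum = np.fft.fft2(image)
    radius = radial_frequency(image.shape)
    frames = []
    for low, high in bands:
        # Ideal (hard-edged) band-pass mask: a simplification chosen here,
        # not necessarily the filter shape used for the actual stimuli.
        mask = (radius >= low) & (radius < high)
        frames.append(np.real(np.fft.ifft2(spectrum * mask)))
    return frames if coarse_to_fine else frames[::-1]


# Hypothetical band limits in cycles per image (not taken from the paper).
BANDS = [(1, 2), (2, 4), (4, 8), (8, 16), (16, 32), (32, 64)]

# Random noise stands in for a grayscale scene photograph.
rng = np.random.default_rng(0)
scene = rng.standard_normal((128, 128))

ctf = filtered_sequence(scene, BANDS, coarse_to_fine=True)   # LSF -> HSF
ftc = filtered_sequence(scene, BANDS, coarse_to_fine=False)  # HSF -> LSF
```

Both sequences contain exactly the same six frames; only their temporal order differs, which is the property that allows the fine-to-coarse sequence to serve as a control.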
We began by identifying the cerebral regions specifically involved in coarse-to-fine processing, compared to fine-to-coarse processing. Based on the previously mentioned studies (Bar et al., 2006, Kveraga et al., 2007, Peyrin et al., 2010), we expected that coarse-to-fine sequences would elicit greater activation in three cortical hubs: the occipital cortex, the prefrontal cortex, and the inferotemporal cortex. We then used DCM (Friston, Harrison, & Penny, 2003) to further explore the dynamic interactions between these regions. DCM is a generic Bayesian framework used to infer directed connectivity between a pre-defined set of cortical regions, based on fMRI time series from these regions. More specifically, DCM makes it possible to estimate, and make inferences about, how one neural system influences another (i.e., effective connectivity) and how this influence is affected by the experimental context. To our knowledge, only one study has used DCM to investigate the effective connectivity between these three cortical hubs in the theoretical context of a top-down facilitation mechanism for visual recognition (Kveraga et al., 2007). Their DCM analysis did not, however, include potential feedback connections from high-order areas to the occipital cortex. In the present study, our DCM analysis was driven by the results obtained by Kveraga et al. (2007). In addition, we explicitly considered potential feedback connections to the occipital cortex during coarse-to-fine processing.
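For readers unfamiliar with DCM, the standard bilinear neuronal state equation of DCM for fMRI (Friston, Harrison, & Penny, 2003) is reproduced below as a reminder of what "effective connectivity" and its modulation by context refer to. The specific models compared in the present study are not described in this excerpt, so the equation is given in its generic form only.

```latex
\dot{z} = \Bigl( A + \sum_{j=1}^{m} u_j \, B^{(j)} \Bigr) z + C u
```

Here $z$ is the vector of neuronal states (one per region), $u$ is the vector of $m$ experimental inputs, $A$ contains the fixed (context-independent) connections between regions, $B^{(j)}$ contains the changes in those connections induced by input $u_j$, and $C$ contains the direct driving effects of the inputs on the regions. A feedback connection from a high-order area to the occipital cortex would correspond to a non-zero entry in the corresponding cell of $A$, and its modulation by a given sequence type to a non-zero entry in the matching $B^{(j)}$.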
Materials and methods
Different analyses of the present data set were reported earlier (Musel et al., 2014).
Behavioral results
Two 2 × 2 analyses of variance (ANOVAs) with Sequence (coarse-to-fine, CtF, and fine-to-coarse, FtC) and Category (outdoor and indoor) as within-subjects factors were conducted on mean error rates (mER) and mean correct reaction times (mRTs). The ANOVA conducted on mER showed no effect of Sequence (Mean ± SD; CtF: 4.02 ± 4.91%; FtC: 6.03 ± 8.84%; F(1,13) = 1.69, p = 0.22), but revealed a main effect of Category (F(1,13) = 7.48, p < 0.05). Participants made more errors when categorizing indoor (5.89 ± 7.67%) than outdoor scenes (4.42 ± 6.63%). No
Discussion
The present fMRI study aimed to investigate the neural bases of the coarse-to-fine processing of scenes and to further explore their interaction. To do this, we used as stimuli dynamic scenes in which spatial frequencies were presented in either a coarse-to-fine or a fine-to-coarse order. This allowed us to experimentally “mimic” the sequential processing of spatial frequencies and impose a “coarse-to-fine” versus “fine-to-coarse” processing sequence of scenes. Participants were asked to perform a
Acknowledgments
This work was supported by the RECOR ANR Grant (ANR-12-JHS2-0002-01 RECOR) and a subvention from “Grenoble Pôle Cognition”. Louise Kauffmann was supported by Région Rhône-Alpes (Cible Grants). We thank the Grenoble MRI facility “IRMaGe” for enabling us to perform the fMRI acquisitions. The Grenoble MRI facility IRMaGe was partly funded by the French program “Investissement d’Avenir” run by the “Agence Nationale pour la Recherche”: Grant “Infrastructure d’Avenir en Biologie Santé”
References (88)
The proactive brain: Using analogies and associations to generate predictions. Trends in Cognitive Sciences (2007)
Connections underlying the synthesis of cognition, memory, and emotion in primate prefrontal cortices. Brain Research Bulletin (2000)
Integrated model of visual processing. Brain Research (2001)
A diffusion tensor imaging tractography atlas for virtual in vivo dissections. Cortex (2008)
Spatial frequency selectivity of cells in macaque visual cortex. Vision Research (1982)
The orientation and direction selectivity of cells in macaque visual cortex. Vision Research (1982)
Depression alters “top-down” visual attention: A dynamic causal modeling comparison between depressed and healthy subjects. NeuroImage (2011)
The parahippocampal place area: Recognition, navigation, or encoding? Neuron (1999)
Dynamic causal modelling. NeuroImage (2003)
Stochastic designs in event-related fMRI. NeuroImage (1999)