Uncovering the neural magnitude and spatio-temporal dynamics of natural image categorization in a fast visual stream
Introduction
Perceptual categorization, the process by which sensory events are differentiated and classified in subgroups, is critical in enabling human interaction with the world. Human faces, which carry ecologically important social information, constitute the most salient class of visual images for understanding perceptual categorization. Indeed, faces can be differentiated from other objects with astounding accuracy and speed (Hershler and Hochstein, 2005, Crouzet et al., 2010, Hershler et al., 2010, Crouzet and Thorpe, 2011, Scheirer et al., 2014). Furthermore, the perception of segmented images of faces is known to elicit a large, widely distributed and partly specific neural response in the human ventral occipito-temporal (VOT) cortex, with a right hemisphere advantage (Sergent et al., 1992, Allison et al., 1994, Allison et al., 1999, Puce et al., 1995, Kanwisher et al., 1997, Weiner and Grill-Spector, 2010, Rossion et al., 2012a, Zhen et al., 2015).
Scalp electroencephalography (EEG), or more rarely magnetoencephalography (MEG), defines the speed and temporal dynamics of face-selective responses in the millisecond range at a system-level of organization. Most significantly, an early response peaking at about 170 ms following stimulus onset (i.e., the N170/VPP complex) differs in amplitude in response to faces compared to other object categories (Jeffreys, 1989, Jeffreys and Tukmachi, 1992, Bötzel et al., 1995, Bentin et al., 1996, Eimer, 2000, Halgren et al., 2000, Rossion et al., 2000, Itier and Taylor, 2004, Rousselet et al., 2008, Ganis et al., 2012; for reviews, Rossion and Jacques, 2011, Rossion, 2014a). Contrary to earlier, potentially spurious differences between faces and objects, this N170 selectivity to faces is not accounted for by Fourier amplitude information, which carries global low-level statistical properties of images (Rossion and Caharel, 2011; see also Tanskanen et al., 2005, Rousselet et al., 2008).
However, crucially, despite natural viewing conditions providing us with continually changing streams of information in complex scenes, categorization of faces, and perceptual categorization in general, have almost exclusively been investigated at the behavioral and neural level with images presented in spatial and temporal isolation.
Spatial isolation refers to face and non-face object stimuli being segmented from their natural backgrounds. In the rare use of natural images (Itier and Taylor, 2004, Rousselet et al., 2004, Rousselet et al., 2007, Hershler and Hochstein, 2005, Hershler et al., 2010, Crouzet et al., 2010, Cauchoix et al., 2014), controlling for low-level statistical properties differing between faces and objects (e.g., Torralba and Oliva, 2003, VanRullen, 2006, Keil, 2008) is particularly challenging, and their contribution to behavioral and neural face-selective responses is difficult to exclude (Itier and Taylor, 2004, VanRullen, 2006, Rousselet et al., 2007, Cerf et al., 2008, Honey et al., 2008, Crouzet and Thorpe, 2011, Cauchoix et al., 2014; but see Hershler and Hochstein, 2006).
Temporal isolation of the stimuli of interest is the norm in behavioral and neural studies of perceptual categorization and refers to the stimuli being presented as unique events separated by long and often variable stimulus onset asynchronies (SOAs). Alternatively, a train of stimuli with brief SOAs is sometimes used in neuroimaging (i.e., a block design), but the responses to the individual stimuli are lumped into a global brain response. Moreover, in EEG/MEG studies, unmasked faces and non-face objects are typically presented for a long stimulus duration (e.g., Bötzel et al., 1995: 3–4 s; Rossion et al., 2000: 500 ms; Crouzet et al., 2010: 400 ms; Ganis et al., 2012: 800 ms; Carlson et al., 2013: 533 ms; Cauchoix et al., 2014: 300–600 ms; Cichy et al., 2014: 500 ms), with SOAs of at least 1–2 s. Thus, object/face categorization may appear to be a prolonged process (Cichy et al., 2014, Mur and Kriegeskorte, 2014) merely because of this long and uninterrupted stimulus duration: in reality, while a single glance suffices for categorization of faces (Crouzet et al., 2010), the duration of category-selective processes from this brief encounter with the stimulus, in context, remains completely unknown.
An alternative stimulus presentation mode has been offered by rapid serial visual presentation (RSVP), which has been used with natural stimuli presented in such rapid succession that they are backward- and forward-masked and may be visible for only a single glance; this technique has been employed to investigate the contributions of memory and attention processes to behavioral image recognition over time (Potter and Levy, 1969, Potter, 2012, Potter et al., 2014). However, to derive behavioral performance (i.e., detection) from RSVP, a limited number of stimuli are presented in each sequence (i.e., fewer than 20 in the previously cited studies), and the rapidity of within-category stimulus presentation (e.g., as fast as 13 ms per stimulus in Potter et al. (2014)) limits the availability of temporal information in response to a stimulus category at a neural system-level.
Here, we provide the first comprehensive report of the magnitude, onset, and duration, or more generally the temporal dynamics, of the differential neural response between natural images of faces and other object categories viewed at a single glance within a rapid visual presentation stream. The approach that we use is termed Fast Periodic Visual Stimulation (FPVS; Rossion, 2014b), in which stimuli are presented in a fast periodic stimulation stream (here, at 12.5 Hz, i.e., one stimulus every 80 ms) while EEG is recorded. Similarly to RSVP, natural and highly variable images are forward- and backward-masked and are visible only long enough to be seen in a single fixation. However, since a neural response to a selected image category (i.e., faces) is investigated here rather than an explicit behavioral response, we are able to present long stimulation sequences (2 min sequences, each containing about 1500 images, i.e., 120 s × 12.5 Hz) and to periodically embed face images within the sequence at a lower rate, for instance every five items (i.e., 400 ms). Thus, we build on a recently introduced FPVS-EEG paradigm to measure high-level perceptual categorization in the human adult (Rossion et al., 2015, Jacques et al., 2016b; Jonas et al., 2016) and infant (de Heering and Rossion, 2015) brain.
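The timing figures in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' stimulation code: the function name `build_sequence` and the choice to place the face as the last item of each group of five are assumptions made for illustration only.

```python
from fractions import Fraction

# Values taken from the text; exact fractions avoid floating-point rounding.
BASE_RATE_HZ = Fraction(25, 2)   # 12.5 Hz base stimulation rate
SEQUENCE_S = 120                 # one 2-min stimulation sequence
FACE_EVERY_N = 5                 # the text's example: a face every five items


def build_sequence(n_items, face_every_n):
    """Label each item 'face' or 'object'.

    Assumption: the face is the last item of each group of `face_every_n`;
    the text does not say which position within the group is used.
    """
    return ["face" if (i + 1) % face_every_n == 0 else "object"
            for i in range(n_items)]


soa_ms = 1000 / BASE_RATE_HZ                 # time between image onsets
n_items = int(SEQUENCE_S * BASE_RATE_HZ)     # images per sequence
face_rate_hz = BASE_RATE_HZ / FACE_EVERY_N   # face stimulation frequency
face_soa_ms = soa_ms * FACE_EVERY_N          # time between face onsets
sequence = build_sequence(n_items, FACE_EVERY_N)

print(f"image SOA: {float(soa_ms)} ms, images per sequence: {n_items}")
print(f"face rate: {float(face_rate_hz)} Hz, face SOA: {float(face_soa_ms)} ms")
print(f"faces per sequence: {sequence.count('face')}")
```

With one face every five items, the face stimulation frequency is the base rate divided by five; other face SOAs would follow by changing `FACE_EVERY_N`.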
Since face stimuli are presented periodically in this paradigm as a proportion of rapidly presented images from various non-face object categories, two distinct response types emerge in the EEG recording: 1) general visual responses synchronized with the base presentation rate of object stimuli (here 12.5 Hz) and 2) face-selective responses, representing differential responses to faces in contrast to non-face objects, present at the slower face stimulation rate (Fig. 1). In these conditions, note that a response to faces would not emerge were it identical to the response to non-face objects: thus, the response at the face stimulation rate inherently represents the differential response to faces, eliminating the need for post-hoc subtraction across conditions (Rossion et al., 2015, Jacques et al., 2016b; Jonas et al., 2016; see also Liu-Shuang et al. (2014)). Moreover, the response to faces in this paradigm reflects not only discrimination (since faces are contrasted to numerous object categories, about 250 variable object stimuli in total), but also generalization (i.e. invariance) across face exemplars (about 50 different face stimuli are used, varying in background, identity, expression, size, viewpoint and lighting conditions), and thus truly reflects face categorization (see Fig. 1).
Additionally, the embedded periodic presentation of a natural image category within the periodic base stimulation stream allows the contribution of low-level image features to the face-selective response to be restricted with minimal artificial stimulus standardization: putative amplitude spectrum differences that may vary across face and non-face images on average, but which do not vary consistently within the face stimulus set, are not present periodically and so are not captured at the face presentation rate. The variance within the face stimulus set is put in competition with the variability of a large number of natural stimuli in the non-face stimulus set: changes of local contrast, luminance and spatial frequency that occur at every stimulation cycle project to the 12.5 Hz base stimulus presentation rate. Finally, low-level visual cues which might vary systematically (i.e., periodically) at the slower face stimulation rate are reduced by the variability within the natural face stimulus set. Thus, electrophysiological activity at the face-stimulation rate reflects high-level face-selective responses that are absent when the amplitude spectrum is preserved, i.e., for periodically presented phase-scrambled face stimuli vs. phase-scrambled non-face object stimuli (Rossion et al., 2015, de Heering and Rossion, 2015).
An important advantage of an FPVS-EEG categorization paradigm is that it enables analysis of the EEG data in both the frequency and time domains, each providing its unique advantages. The periodicity of the stimulus presentation can be exploited in the EEG frequency domain, which captures periodic responses exactly at the frequency (or frequencies) of stimulation. Such periodic responses, typically referred to as “Steady-State Visual Evoked Potentials” (SSVEPs, Regan, 1966, 1989; Norcia et al., 2015), are known for their objective localization and extremely high signal-to-noise ratio (SNR) in the frequency domain, and will be utilized here for face-selective response quantification. Moreover, the spatio-temporal dynamics of the face-selective response may be observed in the time domain: given the relatively low stimulation frequencies of face stimuli afforded by their spaced placement within the relatively fast presentation stream, FPVS-EEG is able to provide a rich description of information flow in response to faces in the time domain (e.g., Dzhelyova and Rossion, 2014, Rossion et al., 2015, Jacques et al., 2016b).
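To illustrate why periodic responses can be read out exactly at the stimulation frequencies, the toy example below builds a noise-free synthetic signal containing one component at the base rate and one at a face rate, then measures the amplitude at individual frequencies with a single-bin discrete Fourier transform. It is a sketch under stated assumptions, not the analysis pipeline of the study: the sampling rate, epoch length, component amplitudes, the 2.5 Hz face rate and the helper `amplitude_at` are all hypothetical.

```python
import cmath
import math

FS_HZ = 250.0      # hypothetical sampling rate
DURATION_S = 8.0   # hypothetical epoch; holds whole cycles of every tested frequency
BASE_HZ = 12.5     # base stimulation rate (from the text)
FACE_HZ = 2.5      # example face rate: base rate / 5


def amplitude_at(signal, fs_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at `freq_hz`.

    Computes a single DFT bin; the factor 2/n converts the bin magnitude
    to the amplitude of a real sinusoid.
    """
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n


n_samples = int(FS_HZ * DURATION_S)
times = [k / FS_HZ for k in range(n_samples)]

# Synthetic "EEG": a general visual response at the base rate plus a
# smaller differential response at the face rate (amplitudes are arbitrary).
signal = [1.0 * math.sin(2 * math.pi * BASE_HZ * t)
          + 0.5 * math.sin(2 * math.pi * FACE_HZ * t)
          for t in times]

base_amp = amplitude_at(signal, FS_HZ, BASE_HZ)
face_amp = amplitude_at(signal, FS_HZ, FACE_HZ)
off_amp = amplitude_at(signal, FS_HZ, 3.0)   # a frequency with no stimulation

print(base_amp, face_amp, off_amp)
```

Because the epoch contains an integer number of cycles of each tested frequency, the energy of each component falls in a single frequency bin and the unstimulated frequency stays near zero. Real recordings add broadband noise, which is why responses are usually evaluated relative to neighboring frequency bins.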
In summary, the specific goals of the present study were to exploit the FPVS-EEG paradigm to determine for the first time the magnitude of comprehensive face-categorization responses in a rapid visual stream of non-face objects, as well as to define their exact onset, duration and spatio-temporal pattern. These goals were achieved by 1) modifying the base stimulus presentation rate (i.e., 12.5 Hz here vs. 5.88 Hz previously) to severely constrain stimulus duration and segregate in time and space (i.e., scalp topography) face-selective responses from this fast base rate response (Alonso-Prieto et al., 2013); 2) quantifying multi-harmonic face-selective responses at a group and individual level across five manipulations of temporal distance between face stimuli, i.e., face SOAs, in the rapid visual stimulation stream; and 3) comparing a typically used sinusoidal contrast modulation stimulation mode to an abrupt (i.e., squarewave) stimulation (Experiment 2) in order to determine the exact onset, propagation and temporal dynamics of face-selective responses in a rapid stimulation stream.
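The difference between the two stimulation modes named in goal 3 can be pictured as two contrast profiles over one 80 ms base cycle. The sketch below is illustrative only: the raised-cosine form of the sinusoidal profile, the 50% duty cycle of the squarewave profile, and both function names are assumptions, since the stimulation parameters are not given in this section.

```python
import math

CYCLE_MS = 80.0   # one base cycle at 12.5 Hz (from the text)


def sinusoidal_contrast(t_ms, cycle_ms=CYCLE_MS):
    """Contrast in [0, 1] rising gradually from 0 to 1 and back within a cycle."""
    phase = (t_ms % cycle_ms) / cycle_ms
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))


def squarewave_contrast(t_ms, cycle_ms=CYCLE_MS, duty=0.5):
    """Contrast that switches abruptly: full for `duty` of the cycle, then zero.

    Assumption: the 50% duty cycle is a placeholder, not a reported value.
    """
    return 1.0 if (t_ms % cycle_ms) < duty * cycle_ms else 0.0


for t in (0, 10, 20, 40, 60):
    print(t, round(sinusoidal_contrast(t), 3), squarewave_contrast(t))
```

With the sinusoidal profile the image reaches full contrast only at mid-cycle, so the moment at which it becomes visible is less sharply defined than with an abrupt onset, which is what makes the squarewave mode attractive for timing the onset of a response.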
Participants
Sixteen healthy participants (age range 19–25 years, 8 female), from whom no data were rejected, were tested individually in a single EEG recording session for Experiment 1. All participants reported normal or corrected-to-normal vision and all were right-handed according to an adapted Edinburgh Handedness Inventory measurement (Oldfield, 1971). Participants were recruited from a university campus and received monetary compensation for their time. Signed informed consent was given by all
The face-categorization response is distributed across frequency-characterized harmonics
For each condition, the grand-averaged frequency-domain amplitude spectrum revealed a significant response at the fundamental face stimulation frequency (F) that was maximal over the right occipito-temporal ROI (Fig. 2A). Additional harmonic frequency face-categorization responses (i.e., 2F, 3F, etc.) also emerged clearly: between 4 and 14 significant harmonic responses were identified for each condition (including the fundamental, i.e., first, harmonic; Table 1).
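The harmonic structure described here follows directly from the stimulation frequencies. As a hedged illustration, the sketch below lists the first harmonics of an example face frequency and skips those that coincide with the base rate or its multiples. The function name `face_harmonics`, the example F = 2.5 Hz, and the exclusion of coinciding harmonics are assumptions for illustration; this section does not state how such harmonics were treated.

```python
from fractions import Fraction


def face_harmonics(face_hz, base_hz, n_harmonics):
    """Return k*F for k = 1..n_harmonics, skipping multiples of the base rate.

    Assumption: harmonics shared with the base rate are set aside because
    they cannot be attributed to the face-selective response alone.
    """
    harmonics = []
    for k in range(1, n_harmonics + 1):
        freq = k * face_hz
        if freq % base_hz != 0:
            harmonics.append(freq)
    return harmonics


BASE_HZ = Fraction(25, 2)   # 12.5 Hz base rate (from the text)
FACE_HZ = Fraction(5, 2)    # example: one face every five items

# 14 is the upper end of the range of significant harmonics quoted in the text.
selected = face_harmonics(FACE_HZ, BASE_HZ, 14)
print([float(f) for f in selected])
```

Exact fractions are used so that the test for coincidence with the base rate does not depend on floating-point rounding.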
Discussion
In the following sections we will discuss: 4.1) multi-harmonic frequency-domain response quantification; 4.2) the magnitude of the face-selective response; 4.3) the 100-ms onset of the face-selective response; 4.4) the 420-ms duration of this response; 4.5) the spatio-temporal dynamics of the face-selective response; and 4.6) cyclical and acyclical electrophysiological responses; we will finish with the Summary and Perspectives.
Summary and perspectives
In measuring differential responses evoked by briefly presented natural images of faces inserted periodically in streams of natural object images, this study provides the first comprehensive report of the magnitude (at the group and individual levels), onset, duration, and spatio-temporal dynamics of category-selective responses in a rapid and continuously changing visual stream of stimulation.
Validating a multi-harmonic frequency domain response quantification, we report the magnitude of a
Acknowledgments
This work was supported by the European Research Council (ERC; grant number facessvep 284025 to BR), an “Action de Recherche Concertee” grant (ARC; 13/18-053) and the Belgian National Foundation for Scientific Research (FNRS; grant number FC7159 to TR). The authors have no conflicts of interest to report.
References (121)
- et al.
Ultra-rapid categorisation of natural scenes does not rely on colour cues: a study in monkeys and humans
Vis. Res. (2000)
Event-related brain potentials distinguish processing stages involved in face perception and recognition
Clin. Neurophysiol. (2000)
- et al.
The N170, not the P1, indexes the earliest time for categorical perception of faces, regardless of interstimulus variance
NeuroImage (2012)
- et al.
Temporal frequency tuning of cortical face-sensitive areas for individual face perception
NeuroImage (2014)
- et al.
The dynamic allocation of attention to emotion: Simultaneous and independent evidence from the late positive potential and steady state visual evoked potentials
Biol. Psychol. (2013)
- et al.
Frequency-domain analysis of fast oddball responses to visual stimuli: A feasibility study
Int. J. Psychophysiol. (2009)
- et al.
At first sight: a high-level pop-out effect for faces
Vis. Res. (2005)
- et al.
With a careful look: Still no low-level confound to face pop-out
Vis. Res. (2006)
- et al.
Early electrophysiological responses to multiple face orientations correlate with individual discrimination performance in humans
NeuroImage (2007)
- et al.
Corresponding ECoG and fMRI category-selective signals in human ventral temporal cortex
Neuropsychologia (2016)
A single glance at natural face images generates larger and qualitatively different category-selective spatio-temporal signatures than other ecologically-relevant categories in the human brain
NeuroImage
Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex
Neuron
An objective index of individual face discrimination in the right occipito-temporal cortex by means of fast periodic visual stimulation
Neuropsychologia
Nonparametric statistical testing of EEG- and MEG-data
J. Neurosci. Methods
Across-trial averaging of event-related EEG responses and beyond
Magn. Reson. Imaging
Is rapid adaptation paradigm too rapid? Implications for face and object processing
NeuroImage
Measurement of spatial contrast sensitivity with the swept contrast VEP
Vis. Res.
The timing of face selectivity and attentional modulation in visual processing
Neuroscience
The assessment and analysis of handedness: the Edinburgh inventory
Neuropsychologia
Some early uses of evoked brain responses in investigations of human visual function
Vis. Res.
Does physical interstimulus variance account for early electrophysiological face sensitive responses in the human brain? Ten lessons on the N170
NeuroImage
ERP evidence for the speed of face categorization in the human brain: disentangling the contribution of low-level visual cues from face perception
Vis. Res.
A steady-state visual evoked potential approach to individual face perception: effect of inversion, contrast-reversal and temporal dynamics
NeuroImage
Defining face perception areas in the human brain: a large-scale factorial fMRI face localizer analysis
Brain Cogn.
Understanding face perception by means of human electrophysiology
Trends Cognit. Sci.
Human extrastriate visual cortex and the perception of faces, words, numbers, and colors
Cereb. Cortex
Electrophysiological studies of human face perception I: potentials generated in occipitotemporal cortex by face and non-face stimuli
Cereb. Cortex
The 6 Hz fundamental stimulation frequency rate for individual face discrimination in the right occipito-temporal cortex
Neuropsychologia
Cue-invariant networks for figure and background processing in human visual cortex
J. Neurosci.
Selective dissociation between core and extended regions of the face processing network in congenital prosopagnosia
Cereb. Cortex
Hierarchical processing of face viewpoint in human visual cortex
J. Neurosci.
Functional subdivisions of the temporal lobe neocortex
J. Neurosci.
Electrophysiological studies of face perception in humans
J. Cognit. Neurosci.
A robust and representative lower bound on object processing speed in humans
Eur. J. Neurosci.
View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex
Cereb. Cortex
Scalp topography and analysis of intracranial sources of face-evoked potentials
Exp. Brain Res.
Orientation-specific cortical responses develop in early infancy
Nature
Cerebral lateralization of face-sensitive areas in left-handers: only the FFA does not get it right
Cortex
Representational dynamics of object vision: the first 1000 ms
J. Vis.
The neural dynamics of face detection in the wild revealed by MVPA
J. Neurosci.
Predicting human gaze using low-level saliency combined with face detection
Resolving human object recognition in space and time
Nat. Neurosci.
Fast saccades toward faces: face detection in just 100 ms
J. Vis.
Low level cues and ultra-fast face detection
Front. Psychol.
Spatial and object-based attention modulates broadband high-frequency responses across the human visual cortical hierarchy
J. Neurosci.
Rapid categorization of natural face images in the infant right hemisphere
eLife
Supra-additive contribution of shape and surface information to individual face discrimination as revealed by fast periodic visual stimulation
J. Vis.
A revised neural framework for face processing
Annu. Rev. Vis. Sci.
Neural representations of personally familiar and unfamiliar faces in the anterior inferior temporal cortex of monkeys
PLoS One
Does the face-specific N170 component reflect the activity of a specialized eye processor?
NeuroReport