Elsevier

Neuropsychologia

Volume 91, October 2016, Pages 9-28

Uncovering the neural magnitude and spatio-temporal dynamics of natural image categorization in a fast visual stream

https://doi.org/10.1016/j.neuropsychologia.2016.07.028

Highlights

  • Magnitude of a face-selective response is about 4 µV from frequency-domain analysis.

  • Onset of the face-selective response is about 100 ms post-stimulus presentation.

  • Duration of this high-level face-selective response is about 420 ms.

  • Spatial and temporal response progression across four successive time-windows.

  • Empirical evidence for the summation of multi-harmonic frequency responses.

Abstract

Perceptual categorization occurs rapidly under natural viewing conditions. Yet, the neural spatio-temporal dynamics of category-selective processes to single-glanced, natural (i.e., unsegmented) images in a rapidly changing presentation stream remain unknown. We presented human observers with natural images of objects at a fast periodic rate of 12.5 Hz, i.e., every 80 ms. Images of faces were inserted every 3, 5, 7, 9, or 11 stimuli, defining stimulus-onset-asynchronies (SOAs) between 240 and 880 ms, i.e., presentation frequencies (Fs) between 4.17 and 1.14 Hz. Robust face-selective responses were objectively identified and quantified at F and its harmonics (2F, 3F, etc.) for every condition in the electroencephalogram (EEG). The summed-harmonic face-selective response was significantly reduced by 25% at the lowest face SOA, i.e., 240 ms between two faces, but remained stable from 400 ms SOA onward. This high-level, right-lateralized face-selective response emerged at about 100 ms post-stimulus onset and progressed spatially throughout four successive time-windows (i.e., P1-face, N1-face, P2-face, P3-face) from posterior to anterior occipito-temporal electrode sites. The total duration of a category-selective response to a briefly presented face stimulus in a rapid sequence of objects was estimated to be 420 ms. Uncovering the neural spatio-temporal dynamics of category-selectivity in a rapid stream of natural images goes well beyond previous evidence obtained from spatially and temporally isolated stimuli, opening an avenue for understanding human vision and its relationship to categorization behavior.

Introduction

Perceptual categorization, the process by which sensory events are differentiated and classified in subgroups, is critical in enabling human interaction with the world. Human faces, which carry ecologically important social information, constitute the most salient class of visual images for understanding perceptual categorization. Indeed, faces can be differentiated from other objects with astounding accuracy and speed (Hershler and Hochstein, 2005, Crouzet et al., 2010, Hershler et al., 2010, Crouzet and Thorpe, 2011, Scheirer et al., 2014). Furthermore, the perception of segmented images of faces is known to elicit a large, widely distributed and partly specific neural response in the human ventral occipito-temporal (VOT) cortex, with a right hemisphere advantage (Sergent et al., 1992, Allison et al., 1994, Allison et al., 1999, Puce et al., 1995, Kanwisher et al., 1997, Weiner and Grill-Spector, 2010, Rossion et al., 2012a, Zhen et al., 2015).

Scalp electroencephalography (EEG), or more rarely magnetoencephalography (MEG), defines the speed and temporal dynamics of face-selective responses in the millisecond range at a system-level of organization. Most significantly, an early response peaking at about 170 ms following stimulus onset (i.e., the N170/VPP complex) differs in amplitude in response to faces compared to other object categories (Jeffreys, 1989, Jeffreys and Tukmachi, 1992, Bötzel et al., 1995, Bentin et al., 1996, Eimer, 2000, Halgren et al., 2000, Rossion et al., 2000, Itier and Taylor, 2004, Rousselet et al., 2008, Ganis et al., 2012; for reviews, Rossion and Jacques, 2011, Rossion, 2014a). Contrary to earlier, potentially spurious differences between faces and objects, this N170 selectivity to faces is not accounted for by Fourier amplitude information, which carries global low-level statistical properties of images (Rossion and Caharel, 2011; see also Tanskanen et al., 2005, Rousselet et al., 2008).

However, crucially, despite natural viewing conditions providing us with continually changing streams of information in complex scenes, categorization of faces, and perceptual categorization in general, have almost exclusively been investigated at the behavioral and neural level with images presented in spatial and temporal isolation.

Spatial isolation refers to face and non-face object stimuli being segmented from their natural backgrounds. In the rare use of natural images (Itier and Taylor, 2004, Rousselet et al., 2004, Rousselet et al., 2007, Hershler and Hochstein, 2005, Hershler et al., 2010, Crouzet et al., 2010, Cauchoix et al., 2014), controlling for low-level statistical properties differing between faces and objects (e.g., Torralba and Oliva, 2003, VanRullen, 2006, Keil, 2008) is particularly challenging, and their contribution to behavioral and neural face-selective responses is difficult to exclude (Itier and Taylor, 2004, VanRullen, 2006, Rousselet et al., 2007, Cerf et al., 2008, Honey et al., 2008, Crouzet and Thorpe, 2011, Cauchoix et al., 2014; but see Hershler and Hochstein, 2006).

Temporal isolation of the stimuli of interest is the norm in behavioral and neural studies of perceptual categorization and refers to the stimuli being presented as unique events separated by long and often variable stimulus onset asynchronies (SOAs). Alternatively, a train of stimuli with brief SOAs is sometimes used in neuroimaging (i.e., a block design), but the responses to the individual stimuli are lumped into a global brain response. Moreover, in EEG/MEG studies, unmasked faces and nonface objects are typically presented for a long stimulus duration (e.g., Bötzel et al., 1995: 3–4 s; Rossion et al., 2000: 500 ms; Crouzet et al., 2010: 400 ms; Ganis et al., 2012: 800 ms; Carlson et al., 2013: 533 ms; Cauchoix et al., 2014: 300–600 ms; Cichy et al., 2014: 500 ms) and SOAs of 1–2 s at least. Thus, object/face categorization may appear to be a prolonged process (Cichy et al., 2014, Mur and Kriegeskorte, 2014) merely because of this long and uninterrupted stimulus duration: in reality, while a single glance suffices for categorization of faces (Crouzet et al., 2010), the duration of category-selective processes from this brief encounter with the stimulus, in context, remains completely unknown.

An alternative stimulus presentation mode has been offered by rapid serial visual presentation (RSVP), which has been used with natural stimuli presented in such rapid succession that they are backward- and forward-masked and may be visible for only a single glance; this technique has been employed to investigate the contributions of memory and attention processes to behavioral image recognition over time (Potter and Levy, 1969, Potter, 2012, Potter et al., 2014). However, to derive behavioral performance (i.e., detection) from RSVP, a limited number of stimuli are presented in each sequence (i.e., fewer than 20 in the previously cited studies), and the rapidity of within-category stimulus presentation (e.g., as fast as 13 ms per stimulus in Potter et al. (2014)) limits the availability of temporal information in response to a stimulus category at a neural system-level.

Here, we provide the first comprehensive report of the magnitude, onset, and duration, or more generally the temporal dynamics, of the differential neural response between natural images of faces and other object categories viewed at a single glance within a rapid visual presentation stream. The approach that we use is termed Fast Periodic Visual Stimulation (FPVS; Rossion, 2014b), in which stimuli are presented in a fast periodic stimulation stream (here, at 12.5 Hz, i.e., one stimulus every 80 ms) while EEG is recorded. Similarly to RSVP, natural and highly variable images are forward- and backward-masked and are visible only long enough to be seen in a single fixation. However, since a neural response to a selected image category (i.e., faces) is investigated here rather than an explicit behavioral response, we are able to present long stimulation sequences (2 min sequences, each containing about 1500 images, i.e., 120 s × 12.5 Hz) and to periodically embed face images within the sequence at a lower rate, for instance every five items (i.e., 400 ms). Thus, we build on a recently introduced FPVS-EEG paradigm to measure high-level perceptual categorization in the human adult (Rossion et al., 2015, Jacques et al., 2016b; Jonas et al., 2016) and infant (de Heering and Rossion, 2015) brain.
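
The timing arithmetic of the paradigm can be checked with a minimal Python sketch. The base rate, the face insertion intervals, and the 2 min sequence length come from the text above; the function and constant names are hypothetical and chosen only for illustration.

```python
# Timing arithmetic for the periodic stream described in the text.
# All names below are illustrative, not taken from the original study.

BASE_RATE_HZ = 12.5                 # base presentation rate (from the text)
FACE_INTERVALS = (3, 5, 7, 9, 11)   # a face is inserted every n-th stimulus
SEQUENCE_S = 120                    # one 2 min stimulation sequence


def stimulus_duration_ms(base_rate_hz=BASE_RATE_HZ):
    """Duration of one stimulation cycle in milliseconds."""
    return 1000.0 / base_rate_hz


def face_soa_ms(face_every, base_rate_hz=BASE_RATE_HZ):
    """Onset-to-onset interval between two successive faces."""
    return face_every * stimulus_duration_ms(base_rate_hz)


def face_frequency_hz(face_every, base_rate_hz=BASE_RATE_HZ):
    """Face presentation frequency F for a face every n-th stimulus."""
    return base_rate_hz / face_every


def images_per_sequence(seconds=SEQUENCE_S, base_rate_hz=BASE_RATE_HZ):
    """Number of images shown in one sequence."""
    return round(seconds * base_rate_hz)


if __name__ == "__main__":
    for n in FACE_INTERVALS:
        print(n, face_soa_ms(n), round(face_frequency_hz(n), 2))
    print(images_per_sequence())
```

For a face every five stimuli, this gives an SOA of 400 ms and F = 2.5 Hz, and a 120 s sequence at 12.5 Hz contains 1500 images.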

Since face stimuli are presented periodically in this paradigm as a proportion of rapidly presented images from various non-face object categories, two distinct response types emerge in the EEG recording: 1) general visual responses synchronized with the base presentation rate of object stimuli (here 12.5 Hz) and 2) face-selective responses, representing differential responses to faces in contrast to non-face objects, present at the slower face stimulation rate (Fig. 1). In these conditions, note that a response to faces would not emerge were it identical to the response to non-face objects: thus, the response at the face stimulation rate inherently represents the differential response to faces, eliminating the need for post-hoc subtraction across conditions (Rossion et al., 2015, Jacques et al., 2016b; Jonas et al., 2016; see also Liu-Shuang et al., 2014). Moreover, the response to faces in this paradigm reflects not only discrimination (since faces are contrasted to numerous object categories, about 250 variable object stimuli in total), but also generalization (i.e., invariance) across face exemplars (about 50 different face stimuli are used, varying in background, identity, expression, size, viewpoint and lighting conditions), and thus truly reflects face categorization (see Fig. 1).
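
The logic of this paragraph (no signal at the face rate unless faces evoke a different response from objects) can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of EEG: each stimulus is assumed to evoke an identical raised-cosine pulse, faces differ only by a gain factor, and the 100 Hz sampling rate, the pulse shape, and all function names are hypothetical.

```python
import cmath
import math

FS_HZ = 100.0          # assumed sampling rate of the toy signal
SAMPLES_PER_STIM = 8   # 80 ms per stimulus at 100 Hz, i.e., a 12.5 Hz base rate


def toy_signal(n_stimuli, face_every, face_gain):
    """One raised-cosine pulse per stimulus; every `face_every`-th pulse
    (the 'face') is scaled by `face_gain`, all others by 1.0."""
    signal = []
    for k in range(n_stimuli):
        gain = face_gain if k % face_every == 0 else 1.0
        for j in range(SAMPLES_PER_STIM):
            pulse = 0.5 * (1.0 - math.cos(2.0 * math.pi * j / SAMPLES_PER_STIM))
            signal.append(gain * pulse)
    return signal


def amplitude_at(signal, freq_hz, fs_hz=FS_HZ):
    """Single-frequency DFT amplitude. It is exact only when the signal
    spans an integer number of cycles of `freq_hz`."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / fs_hz)
              for t, x in enumerate(signal))
    return 2.0 * abs(acc) / n


if __name__ == "__main__":
    # 125 stimuli = 10 s: 125 base cycles and 25 face cycles (a face every 5).
    same = toy_signal(125, face_every=5, face_gain=1.0)
    different = toy_signal(125, face_every=5, face_gain=1.5)
    for name, sig in (("identical", same), ("differential", different)):
        print(name, amplitude_at(sig, 2.5), amplitude_at(sig, 12.5))
```

When the face gain equals 1.0, the toy signal is periodic at 12.5 Hz only and the amplitude at 2.5 Hz is zero up to floating-point error; any gain other than 1.0 produces a component at 2.5 Hz, which is the sense in which the response at the face rate is inherently differential.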

Additionally, the embedded periodic presentation of a natural image category within the periodic base stimulation stream allows the contribution of low-level image features to the face-selective response to be restricted with minimal artificial stimulus standardization: putative amplitude spectrum differences that may vary across face and non-face images on average, but which do not vary consistently within the face stimulus set, are not present periodically and so are not captured at the face presentation rate. The variance within the face stimulus set is put in competition with the variability of a large number of natural stimuli in the non-face stimulus set: changes of local contrast, luminance and spatial frequency that occur at every stimulation cycle project to the 12.5 Hz base stimulus presentation rate. Finally, low-level visual cues which might vary systematically (i.e., periodically) at the slower face stimulation rate are reduced by the variability within the natural face stimulus set. Thus, electrophysiological activity at the face-stimulation rate reflects high-level face-selective responses that are absent when the amplitude spectrum is preserved, i.e., for periodically presented phase-scrambled face stimuli vs. phase-scrambled non-face object stimuli (Rossion et al., 2015, de Heering and Rossion, 2015).

An important advantage of an FPVS-EEG categorization paradigm is that it enables the review of the EEG data in both the frequency and time domains, each with its own advantages. The periodicity of the stimulus presentation can be exploited in the EEG frequency domain, which captures periodic responses exactly at the frequency (or frequencies) of stimulation. Such periodic responses, typically referred to as “Steady-State Visual Evoked Potentials” (SSVEPs; Regan, 1966, 1989; Norcia et al., 2015), are known for their objective localization and extremely high signal-to-noise ratio (SNR) in the frequency domain, and will be utilized here for face-selective response quantification. Moreover, the spatio-temporal dynamics of the face-selective response may be observed in the time domain: given the relatively low stimulation frequencies of face stimuli afforded by their spaced placement within the relatively fast presentation stream, FPVS-EEG is able to provide a rich description of information flow in response to faces in the time domain (e.g., Dzhelyova and Rossion, 2014, Rossion et al., 2015, Jacques et al., 2016b).
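
One common way to express frequency-domain SNR is the amplitude at the frequency bin of interest divided by the mean amplitude of neighboring bins. The sketch below assumes that definition; the text above does not specify how SNR was computed, and the number of neighboring bins, the number of skipped adjacent bins, and the function name are hypothetical.

```python
def snr_at_bin(amplitudes, target, n_neighbors=10, skip=1):
    """Amplitude at `target` divided by the mean amplitude of up to
    `n_neighbors` bins on each side, skipping the `skip` bins directly
    adjacent to the target. All parameter values are illustrative."""
    below = range(target - skip - n_neighbors, target - skip)
    above = range(target + skip + 1, target + skip + 1 + n_neighbors)
    neighbors = [amplitudes[i] for i in list(below) + list(above)
                 if 0 <= i < len(amplitudes)]
    if not neighbors:
        raise ValueError("no neighboring bins available")
    noise = sum(neighbors) / len(neighbors)
    return amplitudes[target] / noise


if __name__ == "__main__":
    # Flat noise floor of 1.0 with one peak of 5.0 at bin 20.
    spectrum = [1.0] * 41
    spectrum[20] = 5.0
    print(snr_at_bin(spectrum, 20))
```

Because a periodic response is confined to a known bin while broadband noise spreads across all bins, this ratio grows with finer frequency resolution, which is one reason long stimulation sequences are useful.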

In summary, the specific goals of the present study were to exploit the FPVS-EEG paradigm to determine for the first time the magnitude of comprehensive face-categorization responses in a rapid visual stream of non-face objects, as well as to define their exact onset, duration and spatio-temporal pattern. These goals were achieved by 1) modifying the base stimulus presentation rate (i.e., 12.5 Hz here vs. 5.88 Hz previously) to severely constrain stimulus duration and segregate in time and space (i.e., scalp topography) face-selective responses from this fast base rate response (Alonso-Prieto et al., 2013); 2) quantifying multi-harmonic face-selective responses at a group and individual level across five manipulations of temporal distance between face stimuli, i.e., face SOAs, in the rapid visual stimulation stream; and 3) comparing a typically used sinusoidal contrast modulation stimulation mode to an abrupt (i.e., squarewave) stimulation (Experiment 2) in order to determine the exact onset, propagation and temporal dynamics of face-selective responses in a rapid stimulation stream.

Participants

Sixteen healthy participants (age range 19–25 years, 8 female), from whom no data was rejected, were tested individually in a single EEG recording session for Experiment 1. All participants reported normal or corrected to normal vision and all were right-handed according to an adapted Edinburgh Handedness Inventory measurement (Oldfield, 1971). Participants were recruited from a university campus and received monetary compensation for their time. Signed informed consent was given by all

The face-categorization response is distributed across frequency-characterized harmonics

A significant response maximal over the right occipito-temporal ROI at the fundamental face stimulation frequency (F) for each condition was revealed in the grand-averaged frequency-domain amplitude spectrum (Fig. 2A). Additional harmonic frequency face-categorization responses (i.e., 2F, 3F, etc.) also emerged clearly: between 4 and 14 significant harmonic responses were identified for each condition (including the fundamental, i.e., first, harmonic; Table 1).
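
Summing a response across harmonics can be sketched as follows. Since F is the base rate divided by the face interval n, every n-th harmonic of F coincides with a harmonic of the base rate; the sketch skips those harmonics, which is an assumption made here for illustration and not a procedure stated in this excerpt. The cutoff `max_hz`, the function names, and the example amplitudes are likewise hypothetical.

```python
def face_harmonic_indices(face_every, base_rate_hz, max_hz):
    """Harmonic numbers k of F = base_rate_hz / face_every with k * F <= max_hz,
    skipping k divisible by `face_every` (assumption: those harmonics coincide
    with base-rate harmonics and so mix both response types)."""
    indices = []
    k = 1
    # k * F <= max_hz, rearranged to avoid a division inside the loop.
    while k * base_rate_hz <= max_hz * face_every:
        if k % face_every != 0:
            indices.append(k)
        k += 1
    return indices


def summed_harmonic_amplitude(amplitude_by_harmonic, indices):
    """Sum of the amplitudes at the selected harmonic numbers."""
    return sum(amplitude_by_harmonic[k] for k in indices)


if __name__ == "__main__":
    # Face every 5 stimuli at 12.5 Hz: F = 2.5 Hz; harmonics up to 20 Hz.
    ks = face_harmonic_indices(5, 12.5, 20.0)
    made_up = {k: 0.5 for k in range(1, 9)}   # placeholder amplitudes, not data
    print(ks, summed_harmonic_amplitude(made_up, ks))
```

With a face every five stimuli, the fifth harmonic of F falls at 12.5 Hz, the base rate itself, so it is left out of the selected set.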

Discussion

In the following sections we will discuss: 4.1) multi-harmonic frequency-domain response quantification; 4.2) the magnitude of the face-selective response; 4.3) the 100-ms onset of the face-selective response; 4.4) the 420-ms duration of this response; 4.5) the spatio-temporal dynamics of the face-selective response; and 4.6) cyclical and acyclical electrophysiological responses; we will finish with the Summary and Perspectives.

Summary and perspectives

In measuring differential responses evoked by briefly presented natural images of faces inserted periodically in streams of natural object images, this study provides the first comprehensive report of the magnitude (at the group and individual levels), onset, duration, and spatio-temporal dynamics of category-selective responses in a rapid and continuously changing visual stream of stimulation.

Validating a multi-harmonic frequency domain response quantification, we report the magnitude of a

Acknowledgments

This work was supported by the European Research Council (ERC; grant number facessvep 284025 to BR), an “Action de Recherche Concertée” grant (ARC; 13/18-053) and the Belgian National Foundation for Scientific Research (FNRS; grant number FC7159 to TR). The authors have no conflict of interests to report.

References (121)

  • C. Jacques et al.

    A single glance at natural face images generates larger and qualitatively different category-selective spatio-temporal signatures than other ecologically-relevant categories in the human brain

    NeuroImage

    (2016)
  • H. Liu et al.

    Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex

    Neuron

    (2009)
  • J. Liu-Shuang et al.

    An objective index of individual face discrimination in the right occipito-temporal cortex by means of fast periodic visual stimulation

    Neuropsychologia

    (2014)
  • E. Maris et al.

    Nonparametric statistical testing of EEG- and MEG-data

    J. Neurosci. Methods

    (2007)
  • A. Mouraux et al.

    Across-trial averaging of event-related EEG responses and beyond

    Magn. Reson. Imaging

    (2008)
  • D. Nemrodov et al.

    Is rapid adaptation paradigm too rapid? Implications for face and object processing

    NeuroImage

    (2012)
  • A.M. Norcia et al.

    Measurement of spatial contrast sensitivity with the swept contrast VEP

    Vis. Res.

    (1989)
  • Y. Okazaki et al.

    The timing of face selectivity and attentional modulation in visual processing

    Neuroscience

    (2008)
  • R.C. Oldfield

    The assessment and analysis of handedness: the Edinburgh inventory

    Neuropsychologia

    (1971)
  • D. Regan

    Some early uses of evoked brain responses in investigations of human visual function

    Vis. Res.

    (2009)
  • B. Rossion et al.

    Does physical interstimulus variance account for early electrophysiological face sensitive responses in the human brain? Ten lessons on the N170

    NeuroImage

    (2008)
  • B. Rossion et al.

    ERP evidence for the speed of face categorization in the human brain: disentangling the contribution of low-level visual cues from face perception

    Vis. Res.

    (2011)
  • B. Rossion et al.

    A steady-state visual evoked potential approach to individual face perception: effect of inversion, contrast-reversal and temporal dynamics

    NeuroImage

    (2012)
  • B. Rossion et al.

    Defining face perception areas in the human brain: a large-scale factorial fMRI face localizer analysis

    Brain Cogn.

    (2012)
  • B. Rossion

    Understanding face perception by means of human electrophysiology

    Trends Cognit. Sci.

    (2014)
  • T. Allison et al.

    Human extrastriate visual cortex and the perception of faces, words, numbers, and colors

    Cereb. Cortex

    (1994)
  • T. Allison et al.

    Electrophysiological studies of human face perception I: potentials generated in occipitotemporal cortex by face and non-face stimuli

    Cereb. Cortex

    (1999)
  • E.A. Alonso-Prieto et al.

    The 6 Hz fundamental frequency rate for individual face discrimination in the right occipito-temporal cortex

    Neuropsychologia

    (2013)
  • L.G. Appelbaum et al.

    Cue-invariant networks for figure and background processing in human visual cortex

    J. Neurosci.

    (2006)
  • G. Avidan et al.

    Selective dissociation between core and extended regions of the face processing network in congenital prosopagnosia

    Cereb. Cortex

    (2013)
  • V. Axelrod et al.

    Hierarchical processing of face viewpoint in human visual cortex

    J. Neurosci.

    (2012)
  • G.C. Baylis et al.

    Functional subdivisions of the temporal lobe neocortex

    J. Neurosci.

    (1987)
  • S. Bentin et al.

    Electrophysiological studies of face perception in humans

    J. Cognit. Neurosci.

    (1996)
  • M.M. Bieniek et al.

    A robust and representative lower bound on object processing speed in humans

    Eur. J. Neurosci.

    (2015)
  • M.C. Booth et al.

    View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex

    Cereb. Cortex

    (1998)
  • K. Bötzel et al.

    Scalp topography and analysis of intracranial sources of face-evoked potentials

    Exp. Brain Res.

    (1995)
  • O.J. Braddick et al.

    Orientation-specific cortical responses develop in early infancy

    Nature

    (1986)
  • H. Bukowski et al.

    Cerebral lateralization of face-sensitive areas in left-handers: only the FFA does not get it right

    Cortex

    (2013)
  • T. Carlson et al.

    Representational dynamics of object vision: the first 1000 ms

    J. Vis.

    (2013)
  • M. Cauchoix et al.

    The neural dynamics of face detection in the wild revealed by MVPA

    J. Neurosci.

    (2014)
  • M. Cerf et al.

    Predicting human gaze using low-level saliency combined with face detection

  • R.M. Cichy et al.

    Resolving human object recognition in space and time

    Nat. Neurosci.

    (2014)
  • S.M. Crouzet et al.

    Fast saccades toward faces: face detection in just 100 ms

    J. Vis.

    (2010)
  • S.M. Crouzet et al.

    Low level cues and ultra-fast face detection

    Front. Psychol.

    (2011)
  • I. Davidesco et al.

    Spatial and object-based attention modulates broadband high-frequency responses across the human visual cortical hierarchy

    J. Neurosci.

    (2013)
  • A. de Heering et al.

    Rapid categorization of natural face images in the infant right hemisphere

    eLife

    (2015)
  • M. Dzhelyova et al.

    Supra-additive contribution of shape and surface information to individual face discrimination as revealed by fast periodic visual stimulation

    J. Vis.

    (2014)
  • B. Duchaine et al.

    A revised neural framework for face processing

    Annu. Rev. Vis. Sci.

    (2015)
  • S. Eifuku et al.

    Neural representations of personally familiar and unfamiliar faces in the anterior inferior temporal cortex of monkeys

    PLoS One

    (2011)
  • M. Eimer

    Does the face-specific N170 component reflect the activity of a specialized eye processor?

    NeuroReport

    (1998)