
NeuroImage

Volume 180, Part A, 15 October 2018, Pages 301-311

Decoding spoken phonemes from sensorimotor cortex with high-density ECoG grids

https://doi.org/10.1016/j.neuroimage.2017.10.011

Highlights

  • Discrete, spoken phonemes can be classified with high performance from sensorimotor cortex, even before voice onset.

  • Rapid sequences of activity patterns in sensorimotor cortex reflect sequences of muscle contractions during spoken phonemes.

  • Decoding spoken phonemes benefits from inclusion of the temporal evolution of high-frequency band power.

  • Decoding spoken phonemes benefits from sampling from the whole inferior sensorimotor region, with electrodes spaced 4 mm apart or less.

Abstract

For people who cannot communicate due to severe paralysis or involuntary movements, technology that decodes intended speech from the brain may offer an alternative means of communication. If decoding proves feasible, intracranial Brain-Computer Interface systems can be developed that translate decoded speech into computer-generated speech or into instructions for controlling assistive devices. Recent advances suggest that such decoding may be feasible from sensorimotor cortex, but it is not clear how this challenge is best approached. One approach is to identify and discriminate elements of spoken language, such as phonemes. We investigated the feasibility of decoding four spoken phonemes from the sensorimotor face area, using electrocorticographic signals obtained with high-density electrode grids. Several decoding algorithms, including spatiotemporal matched filters, spatial matched filters and support vector machines, were compared. Phonemes could be classified correctly at a level of over 75% with spatiotemporal matched filters. Support vector machine analysis reached a similar level, but spatial matched filters yielded significantly lower scores. The most informative electrodes were clustered along the central sulcus. The highest scores were achieved with time windows centered around voice onset time, but a 500 ms window before voice onset could also be classified significantly above chance. The results suggest that phoneme production involves a sequence of robust and reproducible activity patterns on the cortical surface. Importantly, decoding requires the inclusion of temporal information to capture the rapid shifts of these patterns associated with the contraction of articulator muscle groups during production of a phoneme. The high classification scores are likely enabled by the use of high-density grids and by the use of discrete phonemes. Implications for use in Brain-Computer Interfaces are discussed.

Introduction

To function in life, it is critical to be able to communicate. Spoken and written language, as well as non-verbal expressions, allow people to interact socially. Expression of language in particular is crucial for communicating one's needs, ideas and opinions. People who are completely unable to express themselves are essentially excluded from society at every level (Bruno et al., 2011, Laureys et al., 2005, Rousseau et al., 2015). Although their numbers may be small, their predicament warrants research into ways to restore communication abilities (Chaudhary et al., 2016, Wolpaw et al., 2002). Disorders leading to severe communication disability include conditions causing total paralysis resulting from trauma, stroke and neurodegenerative diseases (Locked-In Syndrome) (Lulé et al., 2009), and loss of muscle coordination due to trauma or developmental disorders such as Cerebral Palsy. When some muscle control is preserved (however minimal), Assistive Technologies (AT) are available to maximally utilize intentional movements. When no control is preserved, there are no technologies available to meet these patients' need for communication. In recent years, attempts to achieve communication by means of a Brain-Computer Interface (BCI) have increased, leading to promising avenues (Farwell and Donchin, 1988, Gallegos-Ayala et al., 2014, Kennedy and Bakay, 1998, McCane et al., 2015, Sellers et al., 2010, Sellers et al., 2014) but not yet to a standard treatment for communication loss. Recently, however, a first case was presented in which a locked-in, late-stage ALS patient could successfully use a Brain-Computer Interface to communicate in daily life without requiring the presence of an expert (Vansteensel et al., 2016). The system was fully implanted and allowed the patient to generate signals, obtained from electrodes placed directly on the motor cortex, to select items in spelling software. Non-invasive BCI solutions, using scalp EEG and the 'P300 speller', have also yielded encouraging results (Farwell and Donchin, 1988, Kleih et al., 2011, McCane et al., 2015, Sellers et al., 2010), but these require considerable skill from caregivers to attach the scalp electrodes and initiate the system. The systems that currently work in select patients provide a coarse but reliable means of communication, and do so by decoding specific events from the brain. They are, however, a far cry from restoring communication to a level where the user can interact with others in real time. Nevertheless, a first step has been made on the road to restoring communication by extracting information from the cerebral cortex, encouraging further development.

Application of decoding algorithms, if conducted appropriately, can also reveal the mechanisms by which the human brain translates neuronal activity into perceptions and actions (Brunner et al., 2015, Sadtler et al., 2014). As such, the fact that many of the associated cortical regions exhibit a topographical representation encourages the notion that different percepts or actions are associated with different topographical distributions of activity. This has been investigated notably in primary cortices (V1, A1, S1 and, to a lesser degree, M1), and has yielded successful identification of stimulus features by means of classifying the stimulus-induced cortical activity patterns (Bleichner et al., 2016, Branco et al., 2016, Formisano et al., 2008, Kay et al., 2008, Polimeni et al., 2010). The fact that cortical activity patterns map onto specific stimulus features supports the notion that topography reflects an orderly distribution of specific functions along the cortex, with each function being associated with one or more specific neuronal ensembles (or cortical columns) (Hubel and Wiesel, 1959, Markram, 2008, Mountcastle, 1997). Although such ensembles can be modulated in response amplitude by selective attention and/or predictive mechanisms (Andersson et al., 2013, Brefczynski and DeYoe, 1999, Miall and Wolpert, 1996), and can be subject to an attention-driven shift in the exact mapping onto sensory space (Klein et al., 2014), the fact that activity patterns identify stimulus features reproducibly and robustly indicates a certain degree of segregation of neuronal ensembles and of the sensory space they code for.

Several approaches have been adopted in attempts to decode cortical activity to restore a means of communication. For EEG signals, detection of brain states has been utilized to select icons on a computer screen, by identifying a specific sensory input sequence emanating from a particular icon (visual or auditory pulse sequences that differ for each icon) (Fazel-Rezai et al., 2012). The recorded neural response to the sequence (which constitutes an amplified representation thereof) reveals which icon the person is attending to. Decoding is then tightly coupled to deliberate sensory input. Decoding internally generated actions is currently most feasible from sensorimotor cortex. With EEG, the decline in amplitude of the mu rhythm (8–12 Hz; event-related desynchronization) that accompanies attempted or actual movement (McFarland et al., 2000, Pfurtscheller and Neuper, 1997) can also be used as a brain-state detector of an intentional act. Detection is here often translated into selection of an icon during a sequential icon-scanning scheme ('switch scanning') or a unidirectional cursor movement. Neither EEG method is of much use for exploiting the fine topographical organization of the cortex. With intracranial EEG, or electrocorticography (ECoG), topographical patterns can be probed (Crone et al., 1998, Jacobs and Kahana, 2010, Miller et al., 2012). ECoG decoding approaches utilize the distribution of functionally coherent regions, as is the case in the motor cortex (Bleichner et al., 2016, Bouchard and Chang, 2014, Miller et al., 2009, Schalk and Leuthardt, 2011) or visual cortex (Andersson et al., 2011). Language regions and networks may not provide adequate points of reference for decoding elements of speech, since they do not exhibit a coherent topographical map (Kellis et al., 2010, Pei et al., 2011b), as seems to be the case for associative cortex in general (although some topography has been reported, such as in Harvey et al. (2013)). Decoding (attempted) language production, however, is not constrained to language regions. The final stage of language production depends heavily on the sensorimotor cortex, which generates the motor commands for speaking and, for that matter, sign language (Bleichner et al., 2016, Bleichner et al., 2015, Crone et al., 2001). Given that both motor (Bleichner et al., 2016, Kellis et al., 2010, Siero et al., 2014) and somatosensory cortex (Branco et al., 2016, Sanchez-Panchuelo et al., 2012) exhibit quite detailed topographies, and that speaking involves rapid sequential patterns of muscle contractions in the face and vocal tract, the sensorimotor cortex should conceptually provide rich and coherent spatial and temporal information about what a person wants to say (Bouchard et al., 2013). Interestingly, and crucially for BCI research, it has been shown that the sensorimotor activity patterns generated by complex hand gestures (representing letters of the American Sign Language alphabet) are also generated when arm amputees attempt to make these gestures (Lotze et al., 2001, Raffin et al., 2012, Roux et al., 2003, Bruurmijn et al., 2017). This finding suggests that actual and attempted motor acts may yield equally decodable cortical information, and that research on cortical representations of speech is therefore directly relevant for application in BCI technology for paralyzed people.
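To make the mu-rhythm 'brain switch' concrete, the following is a minimal sketch of the mechanism described above: a drop in 8–12 Hz band power relative to a resting baseline is treated as an intentional act. The window length, the Welch power estimate and the threshold are illustrative assumptions, not a specific published implementation.

```python
# Minimal sketch of a mu-rhythm (8-12 Hz) "brain switch": event-related
# desynchronization is detected as a drop in band power below a resting
# baseline. All parameter values here are illustrative assumptions.
import numpy as np
from scipy.signal import welch

def mu_power(eeg_window, fs, band=(8.0, 12.0)):
    """Mean 8-12 Hz power of a single-channel EEG window (Welch estimate)."""
    freqs, psd = welch(eeg_window, fs=fs, nperseg=min(len(eeg_window), int(fs)))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return psd[in_band].mean()

def erd_switch(eeg_window, baseline_power, fs, ratio=0.5):
    """Fire the switch when mu power drops below `ratio` x baseline power."""
    return mu_power(eeg_window, fs) < ratio * baseline_power

# Example with synthetic data: a resting baseline, then a 1 s test window.
fs = 256
rest = np.random.randn(10 * fs)   # hypothetical resting EEG
baseline = mu_power(rest, fs)
window = np.random.randn(fs)      # hypothetical current 1 s window
print(erd_switch(window, baseline, fs))
```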

In this study, we tested the hypothesis that even the smallest elements of speech, phonemes, provide decodable information from sensorimotor cortex for classification. This hypothesis relies on two assumptions. First, that the cortical topographical representation of speech utterances such as phonemes maps onto the constellation of muscles or muscle groups required to produce the sound. Second, that since speech involves rapid sequential schemes of muscle contractions even for phonemes, including the temporal dimension in the decoding algorithms should contribute significantly to phoneme classification (Bouchard et al., 2013, Jiang et al., 2016).

We report on a study of decoding phoneme production from sensorimotor cortex in five patients implanted with high-density electrocorticography (ECoG) electrode grids. All patients had grids implanted for source localization of their seizures for subsequent surgical treatment of medically intractable epilepsy. In three patients, these grids were part of the clinical grid implantation plan, and in two patients the grid was placed as an addition to the clinical plan, for research purposes. All procedures were approved by the Medical Ethical Board of the hospital and were in accordance with the Declaration of Helsinki of 2013. The ECoG grids over the sensorimotor face area had a high density of electrodes (3–4 mm center-to-center), allowing for detailed investigation of the topographical representation of phoneme production. For decoding we focused on high-frequency broadband signal power (HFB, 65–125 Hz) (Crone et al., 1998), since this feature of the electrophysiological signal carries the most detailed, neuronal firing rate-related information (Bleichner et al., 2016, Miller et al., 2009, Siero et al., 2014). It is thought to reflect the activity of neuronal ensembles more accurately than other signal features (Manning et al., 2009, Miller et al., 2009, Ray and Maunsell, 2011). This electrode density has been shown to yield independent HFB signals at adjacent electrodes, and thus to provide rich information about the underlying cortical topography (Muller et al., 2016, Siero et al., 2014).
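As a concrete illustration, the sketch below shows one common way to obtain an HFB power envelope: band-pass filtering to 65–125 Hz followed by the Hilbert-transform amplitude. The sampling rate, filter order, zero-phase filtering and referencing are illustrative assumptions, not necessarily the exact pipeline used in this study.

```python
# Sketch: extracting high-frequency broadband (HFB, 65-125 Hz) power from ECoG
# via band-pass filtering plus the Hilbert envelope. Filter design, sampling
# rate and referencing are illustrative assumptions, not the study's pipeline.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def hfb_power(ecog, fs, band=(65.0, 125.0), order=4):
    """HFB power envelope per channel.

    ecog : (n_channels, n_samples) array of referenced ECoG voltages
    fs   : sampling rate in Hz
    """
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, ecog, axis=1)       # zero-phase band-pass
    envelope = np.abs(hilbert(filtered, axis=1))  # analytic amplitude
    return envelope ** 2                          # instantaneous band power

# Example: 64 hypothetical channels, 10 s of data at 512 Hz.
fs = 512
ecog = np.random.randn(64, 10 * fs)
power = hfb_power(ecog, fs)   # shape (64, 5120)
```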

Section snippets

Subjects & data acquisition

ECoG signal was collected from five intractable epilepsy patients (Table 1) who had grids implanted subdurally over the inferior sensorimotor cortex on their right (subjects R1 and R2) or left (subjects L1, L2, and L3) hemisphere (depending on the probable location of the source of seizures). We refer to these grids as high density (HD) ECoG grids due to their high electrode density (3–4 mm center-to-center). Grids were obtained from Ad-tech Medical and PMT Corporation. Electrodes had an

Spoken phoneme ECoG classification

The main finding of our analysis was that the 5 classes (4 spoken phoneme classes plus rest) could be classified with STMF analysis with a mean accuracy of 75.5% (sd 6.5%), at a mean empirically determined chance level of 26.4% (Table 2, Fig. 4). Given that one condition may dominate the classification of rest versus active, we also calculated classification scores for all phonemes combined versus rest, and for the 4 phonemes without rest (Table 2). This revealed that active versus rest trials
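For intuition, the following is a minimal sketch of a correlation-based spatiotemporal matched filter (one standard reading of 'matched filter') and of an empirically determined chance level obtained by shuffling training labels. The function names, trial format and permutation scheme are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a correlation-based spatiotemporal matched filter (STMF) classifier
# and an empirical chance level via label permutation. Trial format and all
# names are illustrative assumptions, not the authors' implementation.
import numpy as np

def stmf_templates(trials, labels):
    """Mean spatiotemporal HFB pattern (channels x time) per class."""
    return {c: trials[labels == c].mean(axis=0) for c in np.unique(labels)}

def stmf_classify(trial, templates):
    """Assign the class whose template correlates best with the trial."""
    scores = {c: np.corrcoef(trial.ravel(), tmpl.ravel())[0, 1]
              for c, tmpl in templates.items()}
    return max(scores, key=scores.get)

def accuracy(train_X, train_y, test_X, test_y):
    templates = stmf_templates(train_X, train_y)
    predictions = np.array([stmf_classify(x, templates) for x in test_X])
    return float(np.mean(predictions == test_y))

def empirical_chance(train_X, train_y, test_X, test_y, n_perm=1000, seed=0):
    """Chance level: mean accuracy after repeatedly shuffling training labels."""
    rng = np.random.default_rng(seed)
    return float(np.mean([accuracy(train_X, rng.permutation(train_y),
                                   test_X, test_y) for _ in range(n_perm)]))

# A purely *spatial* matched filter would first average each trial over time,
# e.g. trials.mean(axis=-1), discarding the temporal order of activity; this is
# the comparison on which the temporal hypothesis in the Introduction turns.
```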

Discussion

We addressed the hypothesis that elementary components of speech production, phonemes, engage the sensorimotor cortex in a decodable fashion. To this end, we conducted research in epilepsy patients with implanted HD electrode grids placed on the sensorimotor face area, and asked them to perform a phoneme production task. The cortical spatiotemporal activity patterns generated during this task proved to be highly reproducible and phoneme-specific, as evidenced by a high 5-class classification

Conclusion

A set of four phonemes could be classified with an accuracy that encourages further research on decoding speech from neuronal spatiotemporal activity patterns. The findings support and build upon reports that high-density grids on sensorimotor cortex improve decoding, and that inclusion of the fine-grained temporal evolution of brain signals captures the rapid sequence of articulatory muscle-group contractions employed in phoneme production. Whether these findings translate to decoding of attempted speech

Acknowledgements

This research was funded by the ERC-Advanced ‘iConnect’ project (grant 320708). We thank Frans Leijten, Cyrille Ferrier, Geertjan Huiskamp, and Tineke Gebbink for their help in collecting data, Peter Gosselaar and Peter van Rijen for implanting the electrodes, as well as the technicians, the staff of the clinical neurophysiology department and the subjects for their time and effort. We also thank the members of the UMC Utrecht ECoG research team (Elmar Pels, Mariana Branco) for data collection.

References (77)

  • L.M. McCane et al. P300-based brain-computer interface (BCI) event-related potentials (ERPs): people with amyotrophic lateral sclerosis (ALS) vs. age-matched controls. Clin. Neurophysiol. (2015)
  • R.C. Miall et al. Forward models for physiological motor control. Neural Netw. (1996)
  • X. Pei et al. Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage (2011)
  • G. Pfurtscheller et al. Motor imagery activates primary sensorimotor area in humans. Neurosci. Lett. (1997)
  • J.R. Polimeni et al. Laminar analysis of 7 T BOLD using an imposed spatial activation pattern in human V1. NeuroImage (2010)
  • J.C.W. Siero et al. BOLD matches neuronal activity at the mm scale: a combined 7T fMRI and ECoG study in human sensorimotor cortex. NeuroImage (2014)
  • J.R. Wolpaw et al. Brain-computer interfaces for communication and control. Clin. Neurophysiol. (2002)
  • P. Andersson et al. Real-time decoding of brain responses to visuospatial attention using 7T fMRI. PLoS One (2011)
  • P. Andersson et al. Navigation of a telepresence robot via covert visuospatial attention and real-time fMRI. Brain Topogr. (2013)
  • T. Blakely et al. Localization and classification of phonemes using high spatial resolution electrocorticography (ECoG) grids. Conf. Proc. IEEE Eng. Med. Biol. Soc. (2008)
  • M.G. Bleichner et al. Give me a sign: decoding four complex hand gestures based on high-density ECoG. Brain Struct. Funct. (2016)
  • M.G. Bleichner et al. Classification of mouth movements using 7 T fMRI. J. Neural Eng. (2015)
  • K.E. Bouchard et al. Neural decoding of spoken vowels from human sensory-motor cortex with high-density electrocorticography (2014)
  • K.E. Bouchard et al. Functional organization of human sensorimotor cortex for speech articulation. Nature (2013)
  • M.P. Branco et al. Decoding hand gestures from primary somatosensory cortex using high-density ECoG. NeuroImage (2016)
  • J.A. Brefczynski et al. A physiological correlate of the “spotlight” of visual attention. Nat. Neurosci. (1999)
  • J.S. Brumberg et al. Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Front. Neurosci. (2011)
  • C. Brunner et al. BNCI Horizon 2020: towards a roadmap for the BCI community. Brain-Comput. Interfaces (2015)
  • M.-A. Bruno et al. A survey on self-assessed well-being in a cohort of chronic locked-in syndrome patients: happy majority, miserable minority. BMJ Open (2011)
  • L.C.M. Bruurmijn et al. Preservation of hand movement representation in the sensorimotor areas of amputees. Brain (2017)
  • U. Chaudhary et al. Brain-computer interfaces for communication and rehabilitation. Nat. Rev. Neurol. (2016)
  • N.E. Crone et al. Electrocorticographic gamma activity during word production in spoken and sign language. Neurology (2001)
  • N.E. Crone et al. Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. II. Event-related synchronization in the gamma band. Brain (1998)
  • R. Fazel-Rezai et al. P300 brain computer interface: current challenges and emerging trends. Front. Neuroeng. (2012)
  • E. Formisano et al. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science (2008)
  • G. Gallegos-Ayala et al. Brain communication in a completely locked-in patient using bedside near-infrared spectroscopy. Neurology (2014)
  • F.H. Guenther et al. A wireless brain-machine interface for real-time speech synthesis. PLoS One (2009)
  • B.M. Harvey et al. Topographic representation of numerosity in the human parietal cortex. Science (2013)