Decoding spoken phonemes from sensorimotor cortex with high-density ECoG grids
Introduction
To function in life, it is critical to be able to communicate. Spoken and written language, as well as non-verbal expressions, allow people to interact socially. Expression of language in particular is crucial for communicating one's needs, ideas and opinions. People who are completely unable to express themselves are essentially excluded from society at every level (Bruno et al., 2011, Laureys et al., 2005, Rousseau et al., 2015). Although their numbers may be small, their predicament warrants research into ways to restore communication abilities (Chaudhary et al., 2016, Wolpaw et al., 2002). Disorders causing severe communication disability include total paralysis resulting from trauma, stroke or neurodegenerative disease (Locked-In Syndrome) (Lulé et al., 2009), and loss of muscle coordination due to trauma or developmental disorders such as Cerebral Palsy. When some muscle control is preserved (however minimal), Assistive Technologies (AT) are available to maximally utilize intentional movements. When no control is preserved, there are no technologies available to meet the patients' need for communication. In recent years attempts to achieve communication by means of a Brain-Computer Interface have increased, leading to promising avenues (Farwell and Donchin, 1988, Gallegos-Ayala et al., 2014, Kennedy and Bakay, 1998, McCane et al., 2015, Sellers et al., 2010, Sellers et al., 2014) but not yet to standard treatment for communication loss. Recently, however, a first case was presented in which a Locked-In, late-stage ALS patient could successfully use a Brain-Computer Interface to communicate in daily life without requiring the presence of an expert (Vansteensel et al., 2016). The system was fully implanted, and allowed the patient to generate signals, obtained from electrodes directly on the motor cortex, to select items in spelling software.
Non-invasive BCI solutions, using scalp EEG and the ‘P300 speller’, have also yielded encouraging results (Farwell and Donchin, 1988, Kleih et al., 2011, McCane et al., 2015, Sellers et al., 2010), but these require considerable skill from caregivers to attach the scalp electrodes and initiate the system. The systems that currently work in select patients provide a coarse, but reliable, means to communicate, and do so by decoding specific events from the brain. They are, however, a far cry from restoring communication to a level where the user can interact with others in real-time. Nevertheless, a first step has been made on the road to restoring communication by extracting information from the cerebral cortex, encouraging further development.
Application of decoding algorithms, if conducted appropriately, can also reveal the mechanism by which the human brain translates neuronal activity to perceptions and actions (Brunner et al., 2015, Sadtler et al., 2014). As such, the fact that many of the associated cortical regions exhibit a topographical representation encourages the notion that different percepts or actions are associated with different topographical distributions of activity. This has been investigated notably in primary cortices (V1, A1, S1 and, to a lesser degree, M1), and has yielded successful identification of stimulus features by means of classifying the stimulus-induced cortical activity patterns (Bleichner et al., 2016, Branco et al., 2016, Formisano et al., 2008, Kay et al., 2008, Polimeni et al., 2010). The fact that cortical activity patterns map onto specific stimulus features supports the notion that topography reflects an orderly distribution of specific functions along the cortex, with each function being associated with one or more specific neuronal ensembles (or cortical columns) (Hubel and Wiesel, 1959, Markram, 2008, Mountcastle, 1997). Although such ensembles can be modulated in terms of response amplitude by selective attention and/or predictive mechanisms (Andersson et al., 2013, Brefczynski and DeYoe, 1999, Miall and Wolpert, 1996), and can be subject to an attention-driven shift in the exact mapping onto sensory space (Klein et al., 2014), the fact that activity patterns identify stimulus features reproducibly and robustly indicates a certain degree of segregation of neuronal ensembles and the sensory space they code for.
Several approaches have been adopted in attempts to decode cortical activity to restore a means of communication. For EEG signals, detection of brain states has been utilized to select icons on a computer screen, by identifying a specific sensory input sequence emanating from that particular icon (visual or auditory pulse sequences which differ for each icon) (Fazel-Rezai et al., 2012). The recorded neural response to the sequence (which constitutes an amplified representation thereof) reveals which icon the person is attending to. Decoding is then tightly coupled to deliberate sensory input. Decoding internally generated actions is currently most feasible from sensorimotor cortex. With EEG, the decline in amplitude of the mu rhythm (8–12 Hz, event-related desynchronization) that accompanies attempted or actual movement (McFarland et al., 2000, Pfurtscheller and Neuper, 1997) can also be used as a brain-state detector of an intentional act. Detection is here often translated to selection of an icon during a sequential icon scanning scheme (‘switch scanning’) or a unidirectional cursor movement. Neither EEG method is of much use for exploiting the fine topographical organization of the cortex. With intracranial EEG, or electrocorticography (ECoG), topographical patterns can be probed (Crone et al., 1998, Jacobs and Kahana, 2010, Miller et al., 2012). ECoG decoding approaches utilize the topographical distribution of functionally coherent regions, as is the case in the motor cortex (Bleichner et al., 2016, Bouchard and Chang, 2014, Miller et al., 2009, Schalk and Leuthardt, 2011) or visual cortex (Andersson et al., 2011). Language regions and networks may not provide adequate points of reference for decoding elements of speech since they do not exhibit a coherent topographical map (Kellis et al., 2010, Pei et al., 2011b), as seems to be the case for associative cortex in general (although some topography has been reported, such as in Harvey et al., 2013).
Decoding (attempted) language production, however, is not constrained to language regions. The final stage of language production heavily depends on the sensorimotor cortex, which generates the motor commands for speaking and, for that matter, sign language (Bleichner et al., 2016, Bleichner et al., 2015, Crone et al., 2001). Given that both motor (Bleichner et al., 2016, Kellis et al., 2010, Siero et al., 2014) and somatosensory cortex (Branco et al., 2016, Sanchez-Panchuelo et al., 2012) exhibit quite detailed topographies, and that speaking involves rapid sequential patterns of muscle contractions in the face and vocal tract, the sensorimotor cortex should conceptually provide rich and coherent spatial and temporal information about what a person wants to say (Bouchard et al., 2013). Interestingly, and crucially for BCI research, studies have shown that the sensorimotor activity patterns generated by complex hand gestures (representing letters of the American Sign Language alphabet) are also generated by attempts to make these gestures in arm amputees (Lotze et al., 2001, Raffin et al., 2012, Roux et al., 2003, Bruurmijn et al., 2017). This finding suggests that actual and attempted motor acts may yield equally decodable cortical information, and that research on cortical representations of speech is therefore directly relevant for application in BCI technology for paralyzed people.
In this study, we tested the hypothesis that even the smallest elements of speech, phonemes, provide decodable information from sensorimotor cortex for classification. This hypothesis relies on two assumptions. First, that the cortical topographical representation of speech utterances such as phonemes maps onto the constellation of muscles or muscle groups required to produce the sound. Second, that since speech involves rapid sequential schemes of muscle contractions even for phonemes, including time in the decoding algorithms should contribute significantly to phoneme classification (Bouchard et al., 2013, Jiang et al., 2016).
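The second assumption amounts to a choice of feature representation: averaging activity over the trial discards the articulatory sequence, whereas binning it in time preserves it. As an illustrative sketch (function names and bin count are hypothetical, not from the paper), the two representations could be built as follows:

```python
import numpy as np

def spatial_features(trial):
    """Time-averaged features: one mean activity value per electrode.

    trial: array (n_samples, n_channels) of e.g. HFB power.
    """
    return trial.mean(axis=0)

def spatiotemporal_features(trial, n_bins=10):
    """Features that keep the temporal evolution: mean activity per
    electrode per time bin, flattened into one vector.
    """
    n_samples, n_channels = trial.shape
    edges = np.linspace(0, n_samples, n_bins + 1, dtype=int)
    binned = np.stack([trial[s:e].mean(axis=0)
                       for s, e in zip(edges[:-1], edges[1:])])
    return binned.ravel()  # length n_bins * n_channels
```

With `n_bins=1` the spatiotemporal vector reduces to the purely spatial one, so the hypothesis can be tested simply by comparing classification accuracy as a function of the number of time bins.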
We report on a study on decoding of phoneme production from sensorimotor cortex in five patients implanted with high-density electrocorticography (ECoG) electrode grids. All patients had grids implanted for source localization of their seizures for subsequent surgical treatment of medically intractable epilepsy. In three patients, these grids were part of the clinical grid implantation plan, and in two patients the grid was placed as an addition to the clinical plan, for research purposes. All procedures were approved by the Medical Ethical Board of the hospital, and were in accordance with the Declaration of Helsinki of 2013. The ECoG grids over the sensorimotor face area had a high density of electrodes (3–4 mm center to center), allowing for detailed investigation of the topographical representation of phoneme production. For decoding we focused on high-frequency broadband signal power (HFB, 65–125 Hz) (Crone et al., 1998) since this feature of the electrophysiological signal contains the most detailed and neuronal firing rate-related information (Bleichner et al., 2016, Miller et al., 2009, Siero et al., 2014). It is thought to most accurately reflect activity of neuronal ensembles, compared to other signal features (Manning et al., 2009, Miller et al., 2009, Ray and Maunsell, 2011). This electrode density has been shown to yield independent HFB signals between adjacent electrodes, and thus to provide rich information about the underlying cortical topography (Muller et al., 2016, Siero et al., 2014).
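The paper does not specify its HFB extraction pipeline in this excerpt; a common approach, shown here as a minimal sketch, is to band-pass each channel in the 65–125 Hz range and take the squared Hilbert amplitude envelope as the power estimate (filter order and zero-phase filtering are illustrative choices):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def hfb_power(ecog, fs, band=(65.0, 125.0), order=4):
    """Estimate high-frequency broadband (HFB) power per channel.

    ecog: array (n_samples, n_channels); fs: sampling rate in Hz.
    Band-passes to `band`, then returns the squared Hilbert
    amplitude envelope as a time-resolved power estimate.
    """
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, ecog, axis=0)       # zero-phase band-pass
    envelope = np.abs(hilbert(filtered, axis=0))  # analytic amplitude
    return envelope ** 2
```

The resulting per-electrode power traces are what a spatiotemporal classifier would operate on after trial segmentation and baseline normalization.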
Section snippets
Subjects & data acquisition
ECoG signal was collected from five intractable epilepsy patients (Table 1) who had grids implanted subdurally over the inferior sensorimotor cortex on their right (subjects R1 and R2) or left (subjects L1, L2, and L3) hemisphere (depending on the probable location of the source of seizures). We refer to these grids as high density (HD) ECoG grids due to their high electrode density (3–4 mm center-to-center). Grids were obtained from Ad-tech Medical and PMT Corporation. Electrodes had an
Spoken phoneme ECoG classification
The main finding of our analysis was that the 5 classes (4 spoken phoneme classes plus rest) could be classified with STMF analysis, with a mean accuracy of 75.5% (sd 6.5%), at a mean empirically determined chance level of 26.4% (Table 2, Fig. 4). Given that one condition may dominate the classification of rest versus active, we also calculated classification scores for phonemes combined versus rest, and for the 4 phonemes without rest (Table 2). This revealed that active versus rest trials
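The STMF details are not given in this snippet; as an illustrative sketch of the two ingredients named above, a correlation-based template matcher over flattened spatiotemporal feature vectors and an empirically determined chance level from permuted training labels could look like this (all function names and parameters are hypothetical):

```python
import numpy as np

def matched_filter_classify(train_X, train_y, test_X):
    """Template (matched-filter-style) classification.

    train_X, test_X: (n_trials, n_features) spatiotemporal vectors.
    Each class template is the mean training pattern; a test trial is
    assigned to the template it correlates with most strongly.
    """
    classes = np.unique(train_y)
    templates = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    # Pearson correlation between each test trial and each template
    tz = templates - templates.mean(axis=1, keepdims=True)
    tz /= np.linalg.norm(tz, axis=1, keepdims=True)
    xz = test_X - test_X.mean(axis=1, keepdims=True)
    xz /= np.linalg.norm(xz, axis=1, keepdims=True)
    return classes[np.argmax(xz @ tz.T, axis=1)]

def empirical_chance(train_X, train_y, test_X, test_y, n_perm=1000, seed=0):
    """Chance level: mean accuracy after shuffling training labels."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_perm):
        pred = matched_filter_classify(train_X, rng.permutation(train_y), test_X)
        scores.append(np.mean(pred == test_y))
    return float(np.mean(scores))
```

An empirical chance level obtained this way can exceed the nominal 1/n-classes rate when class counts are unbalanced (e.g. the 26.4% reported above for 5 classes), which is why it is preferred over the theoretical value.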
Discussion
We addressed the hypothesis that elementary components of speech production, phonemes, engage the sensorimotor cortex in a decodable fashion. To this end, we conducted research in epilepsy patients with implanted HD electrode grids placed on the sensorimotor face area, and asked them to perform a phoneme production task. The cortical spatiotemporal activity patterns generated during this task proved to be highly reproducible and phoneme-specific, as evidenced by a high 5-class classification
Conclusion
A set of four phonemes could be classified with an accuracy that encourages further research on decoding speech from neuronal spatiotemporal activity patterns. The findings support and build upon reports that high-density grids on sensorimotor cortex improve decoding, and that inclusion of the fine-grained temporal evolution of brain signals captures the rapid sequence of articulatory muscle groups employed in phoneme production. Whether these findings translate to decoding of attempted speech
Acknowledgements
This research was funded by the ERC-Advanced ‘iConnect’ project (grant 320708). We thank Frans Leijten, Cyrille Ferrier, Geertjan Huiskamp, and Tineke Gebbink for their help in collecting data, Peter Gosselaar and Peter van Rijen for implanting the electrodes, as well as the technicians, the staff of the clinical neurophysiology department and the subjects for their time and effort. We also thank the members of the UMC Utrecht ECoG research team (Elmar Pels, Mariana Branco) for data collection.
References (77)
- Brain–computer interfaces for speech communication. Speech Commun. (2010)
- Fourier-, Hilbert- and wavelet-based signal analysis: are they really different approaches? J. Neurosci. Methods (2004)
- Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. (1988)
- Automated electrocorticographic electrode localization on individually rendered brain surfaces. J. Neurosci. Methods (2010)
- Direct brain recordings fuel advances in cognitive electrophysiology. Trends Cogn. Sci. (2010)
- Multi-scale analysis of neural activity in humans: implications for micro-scale electrocorticography. Clin. Neurophysiol. (2016)
- Out of the frying pan into the fire–the P300-based BCI faces real-world challenges. Prog. Brain Res. (2011)
- Attraction of position preference by spatial attention throughout human visual cortex. Neuron (2014)
- Life can be worth living in locked-in syndrome. Prog. Brain Res. (2009)
- Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods (2007)
- P300-based brain-computer interface (BCI) event-related potentials (ERPs): people with amyotrophic lateral sclerosis (ALS) vs. age-matched controls. Clin. Neurophysiol.
- Forward models for physiological motor control. Neural Netw.
- Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage
- Motor imagery activates primary sensorimotor area in humans. Neurosci. Lett.
- Laminar analysis of 7 T BOLD using an imposed spatial activation pattern in human V1. NeuroImage
- BOLD matches neuronal activity at the mm scale: a combined 7T fMRI and ECoG study in human sensorimotor cortex. NeuroImage
- Brain-computer interfaces for communication and control. Clin. Neurophysiol.
- Real-time decoding of brain responses to visuospatial attention using 7T fMRI. PLoS One
- Navigation of a telepresence robot via covert visuospatial attention and real-time fMRI. Brain Topogr.
- Localization and classification of phonemes using high spatial resolution electrocorticography (ECoG) grids. Conf. Proc. IEEE Eng. Med. Biol. Soc.
- Give me a sign: decoding four complex hand gestures based on high-density ECoG. Brain Struct. Funct.
- Classification of mouth movements using 7 T fMRI. J. Neural Eng.
- Neural decoding of spoken vowels from human sensory-motor cortex with high-density electrocorticography.
- Functional organization of human sensorimotor cortex for speech articulation. Nature
- Decoding hand gestures from primary somatosensory cortex using high-density ECoG. NeuroImage
- A physiological correlate of the “spotlight” of visual attention. Nat. Neurosci.
- Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Front. Neurosci.
- BNCI Horizon 2020: towards a roadmap for the BCI community. Brain-Comput. Interfaces
- A survey on self-assessed well-being in a cohort of chronic locked-in syndrome patients: happy majority, miserable minority. BMJ Open
- Preservation of hand movement representation in the sensorimotor areas of amputees. Brain
- Brain-computer interfaces for communication and rehabilitation. Nat. Rev. Neurol.
- Electrocorticographic gamma activity during word production in spoken and sign language. Neurology
- Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. II. Event-related synchronization in the gamma band. Brain
- P300 brain computer interface: current challenges and emerging trends. Front. Neuroeng.
- “Who” is saying “what”? Brain-based decoding of human voice and speech. Science
- Brain communication in a completely locked-in patient using bedside near-infrared spectroscopy. Neurology
- A wireless brain-machine interface for real-time speech synthesis. PLoS One
- Topographic representation of numerosity in the human parietal cortex. Science