Elsevier

NeuroImage

Volume 32, Issue 2, 15 August 2006, Pages 821-841
An fMRI investigation of syllable sequence production

https://doi.org/10.1016/j.neuroimage.2006.04.173

Abstract

Fluent speech comprises sequences that are composed from a finite alphabet of learned words, syllables, and phonemes. The sequencing of discrete motor behaviors has received much attention in the motor control literature, but relatively little has been focused directly on speech production. In this paper, we investigate the cortical and subcortical regions involved in organizing and enacting sequences of simple speech sounds. Sparse event-triggered functional magnetic resonance imaging (fMRI) was used to measure responses to preparation and overt production of non-lexical three-syllable utterances, parameterized by two factors: syllable complexity and sequence complexity. The comparison of overt production trials to preparation-only trials revealed a network related to the initiation of a speech plan, control of the articulators, and to hearing one's own voice. This network included the primary motor and somatosensory cortices, auditory cortical areas, supplementary motor area (SMA), the precentral gyrus of the insula, and portions of the thalamus, basal ganglia, and cerebellum. Additional stimulus complexity led to increased engagement of the basic speech network and recruitment of additional areas known to be involved in sequencing non-speech motor acts. In particular, the left hemisphere inferior frontal sulcus and posterior parietal cortex, and bilateral regions at the junction of the anterior insula and frontal operculum, the SMA and pre-SMA, the basal ganglia, anterior thalamus, and the cerebellum showed increased activity for more complex stimuli. We hypothesize mechanistic roles for the extended speech production network in the organization and execution of sequences of speech sounds.

Introduction

Fluent speech requires a robust serial ordering mechanism to combine a finite set of discrete learned phonological units (such as phonemes or syllables) into larger meaningful expressions of words and sentences. Lashley (1951) posed the problem of serial order in behavior, asking how the brain organizes and executes smooth, temporally integrated behaviors such as speech and rhythmic motor control. His proposal for the “priming of expressive units,” or parallel, co-temporal activation of the items in a behavioral sequence prior to execution, has been supported in studies of speech production by bountiful data related to linguistic performance errors (e.g., MacKay, 1970; Fromkin, 1980; Gordon and Meyer, 1987), by reaction time experiments (e.g., Klapp, 2003), and by the demonstration of anticipatory and perseveratory co-articulation (e.g., Ohman, 1966; Hardcastle and Hewlett, 1999).

The problem of serial order in speech production can be considered at multiple levels. Phonemes, for example, might be manipulated to form syllables and words, where each phonemic token is learned and stored with corresponding auditory and/or orosensory consequences (see, for example, the DIVA model of speech production; Guenther, 1995; Guenther et al., 2006, which provides a computational account for how such tokens can be learned and produced). Data also suggest that syllable or word-sized tokens can be learned such that they may be efficiently executed as single motor chunks, forming a mental syllabary (Levelt and Wheeldon, 1994; Levelt et al., 1999; Cholin et al., 2006); these larger chunks might then serve as manipulable tokens for speech sequence planning.

In addition to organizing sequences of planned sounds within a memory buffer, speech production requires a mechanism to initiate or release items to the motor apparatus at precise times. Speakers can typically produce up to six to nine syllables (20 to 30 segments) per second, which is faster than any other form of discrete motor behavior (Kent, 2000). A system that coordinates the timed release of each discrete item in the planned sequence of speech is, therefore, of critical importance to fluent performance.

While the formulation of spoken language plans has been widely studied at a conceptual level (see, e.g., Levelt, 1989; Levelt et al., 1999), relatively little is known about the neural representations of those plans or about the cortical and subcortical machinery that guides the serial production of speech. Clinical studies have suggested that damage to the anterior insula or neighboring inferior frontal areas (Dronkers, 1996; Hillis et al., 2005; Tanji et al., 2001), supplementary motor area (Jonas, 1981; Jonas, 1987; Ziegler et al., 1997; Pai, 1999), basal ganglia (Pickett et al., 1998; Ho et al., 1998), or cerebellum (Riva, 1998; Silveri et al., 1998) may lead to deficits in sequencing and/or initiation of speech plans. Such deficits appear in various aphasias and apraxia of speech (AOS). Literal or phonemic paraphasias, in which “well-formed sounds or syllables are substituted or transposed in an otherwise recognizable target word” (Goodglass, 1993), exist in many aphasic patients including Broca's and (most commonly) conduction aphasics. AOS, a speech-motor condition, has been attributed to damage to the left precentral gyrus of the insula (Dronkers, 1996), as well as the inferior frontal gyrus, subcortical structures, or posterior temporal/parietal regions (Hillis et al., 2005; Peach and Tonkovich, 2004; Duffy, 1995). Ziegler (2002) presents an excellent review of theoretical models of AOS.

Only a small portion of the large functional neuroimaging literature related to speech and language has dealt with overt speech production. Within that body, very few studies have explicitly addressed sequencing demands during overt speech. Riecker et al. (2000b) examined brain activations evoked by repetitive production of stimuli of varying complexity: consonant vowel syllables (CVs), CCCVs, CVCVCV non-word sequences, and CVCVCV words. This study found that production of no stimulus type (compared to a resting baseline condition) significantly activated the SMA or insula; activation was instead largely restricted to the primary sensorimotor areas. Only CCCV production led to significant activation of the cerebellum. Production of the multi-syllabic items led to a more limited and lateralized expanse of activation in the banks of the central sulcus than did production of single syllables.

Shuster and Lemieux (2005) compared production (both overt and covert) of multi-syllabic and mono-syllabic words following the presentation of an auditory exemplar. For overt speech, additional activation was found in the left inferior parietal lobe, inferior frontal gyrus, and precentral gyrus for multi-syllabic versus mono-syllabic words. Mono-syllabic words resulted in greater activation of the left middle frontal gyrus (BA46). The results for covert speech were somewhat dissimilar; for example, in covert speech, there was greater activation of the left middle frontal gyrus for multi-syllabic words and greater activation in the left precentral gyrus for mono-syllabic words. A consistent finding was that multi-syllabic words caused additional activation in left inferior parietal areas (BA40), a region the authors suggest is involved in speech programming. In comparing the results of this study to those of Riecker et al. (2000b), it is difficult to develop a consistent account for the effects of sequential complexity on the speech production system.

In the present experiment, we sought to clarify how the speech system organizes and produces sequences of speech sounds. While the DIVA model of speech production makes predictions about brain activations in the executive speech motor system (Guenther et al., 2006; Guenther, in press), it does not address brain regions likely to be responsible for sequence planning. Based on clinical observations and studies of other non-speech sequential motor control tasks, we expected added stimulus complexity to elicit additional responses in a network of brain regions outside of the primary sensorimotor areas (and other regions treated by the DIVA model), including the prefrontal cortex, basal ganglia, anterior insula, supplementary motor area, and cerebellum. Blood oxygenation level-dependent (BOLD) functional magnetic resonance imaging (fMRI; see Ogawa et al., 1990; Belliveau et al., 1991; Kwong et al., 1992) was used to measure responses to speech sequences of varying complexity at both the sub- and suprasyllabic levels and in both preparatory and overt speech production tasks. We employed an “event-triggered” design with GO and NOGO trials that offered many benefits over previous methods (see Discussion). We discuss the results in terms of the necessary mechanisms for sequencing and initiation in fluent speech production.

Subjects

Thirteen right-handed native English speakers (ages 22–50 years, mean 28.7 years, six females) with no history of neurological, speech, language, or hearing disorders participated. Written informed consent was obtained according to the Boston University Institutional Review Board and the Massachusetts General Hospital Human Research Committee.

Experimental protocol

Tasks consisted of preparing to produce (NOGO trials) and overtly producing (GO trials) three-syllable sequences. The linguistic content of the stimuli

Acoustic analysis

The mean acoustic duration and between-subject standard deviation (in ms) for utterances of each stimulus type were as follows: S_seq, S_syl: 993 ± 215; C_seq, S_syl: 1006 ± 186; S_seq, C_syl: 1195 ± 209; C_seq, C_syl: 1332 ± 155. The difference between S_seq, S_syl and C_seq, S_syl was not significant. All other pair-wise differences were significant (P < 0.05).
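
The reported means form a 2 × 2 table (sequence complexity by syllable complexity), so the size of each factor's effect on utterance duration can be read off with simple contrasts. The sketch below is illustrative only: it assumes that the prefixes S_ and C_ denote simple and complex levels, the function name `factorial_contrasts` is hypothetical, and the contrasts are descriptive differences of group means that do not reproduce the significance tests reported above.

```python
# Mean utterance durations (ms) as reported in the Acoustic analysis.
# Keys are (sequence level, syllable level); reading "S" as simple and
# "C" as complex is an assumption made for this illustration.
durations_ms = {
    ("S", "S"): 993,
    ("C", "S"): 1006,
    ("S", "C"): 1195,
    ("C", "C"): 1332,
}


def factorial_contrasts(cells):
    """Return main effects and the interaction for a 2 x 2 table of means."""
    # Main effect of sequence complexity: complex minus simple sequences,
    # averaged over both syllable levels.
    sequence = (cells[("C", "S")] + cells[("C", "C")]) / 2 \
        - (cells[("S", "S")] + cells[("S", "C")]) / 2
    # Main effect of syllable complexity: complex minus simple syllables,
    # averaged over both sequence levels.
    syllable = (cells[("S", "C")] + cells[("C", "C")]) / 2 \
        - (cells[("S", "S")] + cells[("C", "S")]) / 2
    # Interaction: the sequence effect with complex syllables minus the
    # sequence effect with simple syllables.
    interaction = (cells[("C", "C")] - cells[("S", "C")]) \
        - (cells[("C", "S")] - cells[("S", "S")])
    return {"sequence": sequence, "syllable": syllable, "interaction": interaction}


contrasts = factorial_contrasts(durations_ms)
for name, value in contrasts.items():
    print(f"{name}: {value:.1f} ms")
```

On these means, the syllable contrast is several times larger than the sequence contrast, and the sequence effect is concentrated in the complex-syllable cells, which is consistent with the only non-significant pair-wise difference being the one between the two simple-syllable conditions.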

Basic speech production network

Production of each of the stimulus types was individually contrasted with the baseline condition. Group results showed regions of

Discussion

In this study, we sought to better understand the neural substrates for planning and producing sequences of simple speech sounds, a faculty that is ubiquitous in normal discourse. This topic has received relatively little attention in the neuroimaging literature to date, with most studies of language production focusing on aspects of word generation and production (reviewed in Indefrey and Levelt, 2000; Turkeltaub et al., 2002) or on other aspects of verbal output such as speaking rate (

Acknowledgments

This research was supported by the National Institute on Deafness and Other Communication Disorders (R01 DC02852, F. Guenther PI). Imaging was performed at the Athinoula A. Martinos Center for Biomedical Imaging; this work was made possible by grants from the National Center for Research Resources (P41RR14075) and the MIND Institute. The authors would like to thank Daniel Bullock, Satrajit Ghosh, Jason Tourville, Alfonso Nieto-Castanon, Julie Goodman, and Larry Wald for their assistance with

References (152)

  • A. Dale et al.

    Cortical surface-based analysis: I. Segmentation and surface reconstruction

    NeuroImage

    (1999)
  • M. D'Esposito et al.

    Functional MRI studies of spatial and nonspatial working memory

    Cogn. Brain Res.

    (1998)
  • G.I. de Zubicaray et al.

    Cerebral regions associated with verbal response initiation, suppression and strategy use

    Neuropsychologia

    (2000)
  • F.C. Donders

    Over de snelheid van psychische processen (On the speed of mental processes)

    Acta Psychol.

    (1969)
  • J.A. Fiez

    Neuroimaging studies of speech: an overview of techniques and methodological approaches

    J. Commun. Disord.

    (2001)
  • B. Fischl et al.

    Cortical surface-based analysis: II. Inflation, flattening, and a surface-based coordinate system

    NeuroImage

    (1999)
  • K.J. Friston et al.

    Conjunction revisited

    NeuroImage

    (2005)
  • C.R. Genovese et al.

    Thresholding of statistical maps in functional neuroimaging using the false discovery rate

    NeuroImage

    (2002)
  • V.L. Gracco et al.

    Imaging speech production using fMRI

    NeuroImage

    (2005)
  • F.H. Guenther et al.

    Neural modeling and imaging of the cortical interactions underlying syllable production

    Brain Lang.

    (2006)
  • P. Gupta et al.

    Serial position effects in nonword repetition

    J. Mem. Lang.

    (2005)
  • M. Habib et al.

    Mutism and auditory agnosia due to bilateral insular damage — role of the insula in human communication

    Neuropsychologia

    (1995)
  • S. Hayasaka et al.

    Combining voxel intensity and cluster extent with permutation test framework

    NeuroImage

    (2004)
  • R.N. Henson et al.

    Recoding, storage, rehearsal and grouping in verbal short-term memory: an fMRI study

    Neuropsychologia

    (2000)
  • A.K. Ho et al.

    Sequence heterogeneity in parkinsonian speech

    Brain Lang.

    (1998)
  • S. Jonas

    The supplementary motor region and speech emission

    J. Commun. Disord.

    (1981)
  • U. Jürgens

    The efferent and afferent connections of the supplementary motor area

    Brain Res.

    (1984)
  • J.G. Kerns et al.

    Prefrontal cortex guides context-appropriate responding during language production

    Neuron

    (2004)
  • M.P. Kirschen et al.

    Load- and practice-dependent increases in cerebro-cerebellar activation in verbal working memory: an fMRI study

    NeuroImage

    (2005)
  • J.D. Kropotov et al.

    Selection of actions in the basal ganglia-thalamocortical circuits: review and model

    Int. J. Psychophysiol.

    (1999)
  • H.C. Leiner et al.

    Cognitive and language functions of the human cerebellum

    Trends Neurosci.

    (1993)
  • W.J. Levelt et al.

    Do speakers have access to a mental syllabary?

    Cognition

    (1994)
  • X. Lu et al.

    Anticipatory activity in primary motor cortex codes memorized movement sequences

    Neuron

    (2005)
  • D.G. MacKay

    Spoonerisms: the structure of errors in the serial order of speech

    Neuropsychologia

    (1970)
  • F.A. Middleton et al.

    Basal ganglia and cerebellar loops: motor and cognitive circuits

    Brain Res. Rev.

    (2000)
  • J.W. Mink

    The basal ganglia: focused selection and inhibition of competing motor programs

    Prog. Neurobiol.

    (1996)
  • J.W. Mink et al.

    Basal ganglia intrinsic circuits and their role in behavior

    Curr. Opin. Neurobiol.

    (1993)
  • K.G. Munhall

    Functional imaging during speech production

    Acta Psychol.

    (2001)
  • S. Abrahams et al.

    Functional magnetic resonance imaging of verbal fluency and confrontation naming using compressed image acquisition to permit overt responses

    Hum. Brain Mapp.

    (2003)
  • H. Ackermann et al.

    Speech rate and rhythm in cerebellar dysarthria: an acoustic analysis of syllabic timing

    Folia Phoniatr. Logop.

    (1994)
  • H. Ackermann et al.

    Speech deficits in ischaemic cerebellar lesions

    J. Neurol.

    (1992)
  • H. Ackermann et al.

    Temporal organization of “internal speech” as a basis for cerebellar modulation of cognitive functions

    Behav. Cogn. Neurosci. Rev.

    (2004)
  • G.E. Alexander et al.

    Parallel organization of functionally segregated circuits linking basal ganglia and cortex

    Annu. Rev. Neurosci.

    (1986)
  • B.B. Averbeck et al.

    Parallel processing of serial movements in prefrontal cortex

    Proc. Natl. Acad. Sci.

    (2002)
  • B.B. Averbeck et al.

    Neural activity in prefrontal cortex during copying geometrical shapes: I. Single cells encode shape, sequence, and metric parameters

    Exp. Brain Res.

    (2003)
  • E. Awh et al.

    Dissociation of storage and rehearsal in verbal working memory

    Psychol. Sci.

    (1996)
  • A.D. Baddeley

    Working Memory

    (1986)
  • J.W. Belliveau et al.

    Functional mapping of the human visual cortex by magnetic resonance imaging

    Science

    (1991)
  • Y. Benjamini et al.

    Controlling the false discovery rate: a practical and powerful approach to multiple testing

    J. R. Stat. Soc., Ser. B Methodol.

    (1995)
  • R.M. Birn et al.

    Magnetic field changes in the human brain due to swallowing or speaking

    Magn. Reson. Med.

    (1998)