Phonotactic cues for segmentation of fluent speech by infants
Introduction
Much of the speech input to which infants are exposed consists of continuous strings of sounds (Aslin, 1993; van de Weijer, 1998; Woodward & Aslin, 1990). Therefore, a crucial challenge for language learners is to divide the speech stream correctly into small storable chunks, namely words. There is growing evidence that infants are remarkably proficient at exploiting regularities in the sound patterns of their native language to guide this segmentation process. The regularities investigated so far include the typical stress pattern of words (i.e. prosody), the phonetic realization of phonemes as a function of their position in words (i.e. allophonic variation), the probability of speech units' contiguous occurrence (i.e. distributional regularities), and the probability of the contiguous occurrence of particular sequences of phonemes (i.e. phonotactics). The present paper focuses on segmentation based on sensitivity to phonotactic cues.
Sometime during the latter half of their first year, infants develop sensitivity to how stress, phones, and phonemes typically pattern within words. For instance, American 9-month-olds show a preference for spoken stimuli that conform to the dominant strong–weak prosodic pattern of bisyllabic English words (e.g. Jusczyk, Cutler, & Redanz, 1993). They also seem to rely on such prosodic regularity to parse the input. Jusczyk, Houston, and Newsome (1999) observed that 7.5-month-olds could spot a familiarized strong–weak word like ‘doctor’ in a fluent speech passage, but did not detect a comparable weak–strong word like ‘guitar’ in passages after familiarization with it. Critically, though, they tended to segment strong syllables plus consistently recurring weak syllables as units when these occurred together in passages. For instance, they responded to ‘taris’ when the strong syllable of ‘guitar’ was consistently followed by ‘is’ in a passage. Together, these findings suggest that, in its earliest stages, English learners' speech segmentation is based on trochaic footing.
Some allophonic variations also correlate with word boundaries (Bolinger & Gerstman, 1957; Church, 1987; Lehiste, 1960; Umeda & Coker, 1974). For example, the phoneme /t/ is aspirated when it begins a word (e.g. ‘top’ [tʰop]) but not when it appears in non-initial positions (e.g. ‘stop’ [stop]). Two-month-old infants discriminate such allophonic differences, e.g. the allophonic variants of /t/ and /r/ in ‘night rates’ (Hohne & Jusczyk, 1994). Moreover, Jusczyk, Hohne, and Baumann (1999) found that sensitivity to allophonic variations contributes to the parsing of longer passages by 10.5 months of age. Infants familiarized with either ‘nitrates’ or ‘night rates’ showed a preference for a subsequent passage containing the appropriate version, an indication that they are sensitive to how allophonic distinctions typically align with word boundaries.
Yet, prosodic regularities and allophonic variations do not constitute absolute word boundary cues in English. Lexical stress is predominantly word-initial, but many words bear stress on other syllables. Likewise, only a limited number of allophonic variations contribute to the computation of word boundaries. Thus, speech segmentation is essentially a heuristic process whose chances of success increase when cues are combined (e.g. Christiansen et al., 1998; Jusczyk, 1999; Mattys et al., 1999; Morgan, 1996; Morgan & Saffran, 1995; Saffran et al., 1996). Infants are proficient at detecting a variety of regularities in the speech signal even when no explicit speech cues to word boundaries are present. For example, 8-month-old infants exposed to a continuous stream of concatenated CV syllables notice regularities in the syllables' arrangement after only 2 min of exposure (Saffran, Aslin, & Newport, 1996). They discriminate between test strings of high transitional-probability syllables (i.e. syllables that always occur contiguously during familiarization) and strings of low transitional-probability syllables (i.e. syllables that occur in a less systematic order). This finding suggests that infants can use the transitional probabilities between adjacent syllables as an indicator of word boundaries, with low probabilities associated with boundaries between words.
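The segmentation strategy implied by these transitional-probability findings can be sketched computationally. The sketch below is purely illustrative, not a model from any of the cited studies: the syllable stream, the made-up ‘words’, and the boundary threshold are invented for the example. A word boundary is posited wherever the transitional probability P(next syllable | current syllable) dips below the threshold.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Insert a word boundary wherever the transitional probability dips below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy stream: three invented 'words' (bidaku, padoti, golabu) concatenated
# without pauses. Within-word transitional probabilities are 1.0; the
# probabilities spanning word boundaries are lower.
stream = ("bi da ku pa do ti go la bu pa do ti bi da ku "
          "go la bu bi da ku pa do ti go la bu").split()
print(segment(stream, threshold=0.9))
# → ['bidaku', 'padoti', 'golabu', 'padoti', 'bidaku', 'golabu',
#    'bidaku', 'padoti', 'golabu']
```

With a deterministic toy lexicon the threshold choice is trivial; in the infant experiments the contrast was likewise between syllable pairs of high versus low transitional probability, not an explicit numeric cutoff.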
A type of distributional regularity that has recently received a great deal of attention is probabilistic phonotactics (e.g. Gimson, 1980). Probabilistic phonotactics concern the frequency with which phonemes tend to occur next to each other in natural speech sequences, ranging from never to very frequently. For instance, in English, a sequence like [zt] is never found within words; it is phonotactically ‘illegal’. A sequence like [sd], though rare, is occasionally found within words (e.g. ‘disdain’), whereas [st] is very frequent (e.g. ‘stop’, ‘listing’, ‘best’, etc.). Adult speakers are sensitive to such phonotactic regularities (e.g. Massaro & Cohen, 1983; Pitt & McQueen, 1998); they respond to words or non-words faster if these consist of high- rather than low-frequency sequences of phonemes (Auer, 1993; Vitevitch & Luce, 1999; Vitevitch et al., 1997). Adolescents and young children show similar patterns of phonotactic sensitivity (Brown & Hildum, 1956; Messer, 1967; Pertz & Bever, 1975). Three- to 4-year-old children judge nonsense words made of high-frequency phonemic sequences as being more likely words than matched strings containing rare (but legal) sequences. They also pronounce the former more accurately than the latter (Messer, 1967).
The origin of phonotactic sensitivity can be found in infancy. By 9 months, infants have accumulated enough information about words to exhibit a preference for phonotactically well-formed speech strings. Jusczyk, Friederici, Wessels, Svenkerud, and Jusczyk (1993) observed that 9-month-old American infants listened longer to a list of words with phonemic sequences legal in English but illegal in Dutch than to words with sequences legal in Dutch but illegal in English. Dutch infants showed the opposite pattern of preference. By comparison, 6-month-olds listened equally to words with legal or illegal phonotactic sequences. The same developmental pattern holds when non-words are contrasted not on the phonotactic legality of their components but on their phonotactic probability. Jusczyk, Luce, and Charles-Luce (1994) found that 9-month-olds, but not 6-month-olds, listened significantly longer to monosyllabic non-words containing high-probability phonotactic sequences (e.g. ‘chun’) than to ones containing low-probability phonotactic sequences (e.g. ‘yush’).
The goal of the present study is to examine whether infants can exploit their sensitivity to probabilistic phonotactics to segment words from fluent speech (see McQueen, 1998, for evidence that adult listeners do). To use such information in word segmentation, infants must not only respond to phonotactic well-formedness but also be sensitive to how sequences of phonemes usually align with word boundaries (see Church, 1987, for an early proposal about the computational efficiency of phonotactics in word segmentation). Partially supporting this possibility, Friederici and Wessels (1993) observed that Dutch 9-month-old infants preferred monosyllables with a consonant cluster in a permissible position over matched stimuli with the same cluster in an impermissible position. For instance, in Dutch, [br] is a typical word onset cluster but is impermissible at word offset, whereas [rt] is typical at word offset but impermissible at word onset. Nine-month-olds, but not 4.5- or 6-month-olds, listened longer to monosyllables with the test clusters in permissible positions (e.g. [bref] or [murt]) than to ones with the same clusters in impermissible positions (e.g. [febr] or [rtum]). Thus, by 9 months, infants not only discriminate legal/frequent from illegal/rare phonemic patterns but also take into account the positions of such sequences within words.
More directly, Mattys et al. (1999) tested 9-month-olds' sensitivity to how consonant clusters are typically distributed with respect to word boundaries. They devised two types of lists of bisyllabic CVC·CVC non-words (the dot indicates a syllable boundary). The stimuli in one list type (the ‘within-word’ stimuli) contained a C·C cluster that is frequently found within English words but infrequently found across words in fluent speech. The other list type (the ‘between-word’ stimuli) was composed of matched non-words containing a minimally changed C·C cluster with the opposite phonotactic pattern: the cluster was frequent across words and infrequent within words. The within- and between-word frequencies of the test clusters were obtained from the Bernstein (1982) corpus of child-directed speech. The stimuli, which differed only in the relative frequency of their C·C clusters, yielded several notable findings. Nine-month-olds listened longer to the within-word stimuli than the between-word stimuli when the stimuli were stressed on the first syllable. However, they listened longer to the between-word stimuli when (a) the stimuli were stressed on the second syllable or (b) the stimuli were stressed on the first syllable and included a 500-ms silent pause between the two syllables. These results were interpreted as evidence that 9-month-olds are sensitive to how sequences of phonemes typically align with word boundaries and to how this phonotactic sensitivity relates to the tendency to perceive stress as word-initial (Jusczyk, Houston, & Newsome, 1999). Nevertheless, although these results show that 9-month-olds' perceptual preferences are consistent with speech segmentation based on phonotactic regularities, they do not directly demonstrate that infants actually use this sensitivity to segment words from fluent speech.
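The within-word versus between-word frequencies on which such stimuli rely can be estimated from any word-segmented corpus. The sketch below is a toy illustration, not the Bernstein (1982) counts: the miniature corpus, the orthographic stand-in for phonemic transcription, and the function names are all invented for the example. It counts each two-segment sequence's word-internal occurrences and its occurrences straddling adjacent word boundaries, then expresses how often the sequence signals a boundary.

```python
from collections import Counter

def cluster_stats(utterances):
    """Count each two-segment sequence's within-word vs. between-word
    occurrences in a list of whitespace-segmented utterances
    (each character stands in for one phoneme)."""
    within, between = Counter(), Counter()
    for utt in utterances:
        words = utt.split()
        for w in words:                      # pairs inside a word
            for a, b in zip(w, w[1:]):
                within[a + b] += 1
        for w1, w2 in zip(words, words[1:]): # pairs spanning a word boundary
            between[w1[-1] + w2[0]] += 1
    return within, between

def boundary_likelihood(cluster, within, between):
    """Fraction of the cluster's occurrences that straddle a word boundary."""
    total = within[cluster] + between[cluster]
    return between[cluster] / total if total else 0.0

# Hypothetical miniature corpus
corpus = ["best stop list", "fast dog", "big dog stops"]
within, between = cluster_stats(corpus)
print(boundary_likelihood("st", within, between))  # word-internal only here → 0.0
print(boundary_likelihood("ts", within, between))  # boundary-spanning only here → 1.0
```

In this toy corpus ‘st’ patterns like a within-word cluster and ‘ts’ like a between-word cluster, mirroring the contrast the Mattys et al. (1999) stimuli were built on.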
Accordingly, the present study was designed to explore whether English-learning 9-month-olds use phonotactic cues in on-line word segmentation.
Experiment 1
If infants rely on phonotactic regularities to segment speech into words, they should notice the presence of a word embedded in a fluent speech passage more easily if the phonotactic patterns at the word's edges set it apart from, rather than blend it into, the neighboring words. In this first experiment, 9-month-olds were familiarized with two passages. In one passage, a CVC target word occurred in contexts in which the surrounding words provided good phonotactic word boundary cues. In the other …
Experiment 2
The hypothesis tested in Experiment 2 is that a biphone with low within-word probability and high between-word probability is an efficient word onset cue (Brent & Cartwright, 1996; Cairns et al., 1997; Gaygen & Luce, submitted; Mattys et al., 1999). Segmentation of a test word with an ‘only-onset’ phonotactic cue would suggest that phonotactic segmentation can proceed efficiently on a left-to-right basis, with phonotactic information used to interpret later-occurring information as speech …
Experiment 3
In this experiment, we test the hypothesis that a cluster of consonants with low within-word probability and high between-word probability facilitates the extraction of the word preceding the cluster. This possibility is of particular interest because segmentation from offset would imply that some form of retroactive processing from the word's offset boundary is used to recover the test word. Such an operation is, in essence, more complex than segmentation from the onset, and it presumably implies …
General discussion
The results of these three experiments provide strong evidence that 9-month-olds can use their sensitivity to probabilistic phonotactics (Mattys et al., 1999) to segment words from fluent speech. In all experiments, infants exhibited a preference for a word (or a non-word) that previously occurred in a fluent speech passage, provided that at least one boundary of that word was phonotactically cued. Phonotactic cues to word boundaries are based on the characteristics of the consonant–consonant …
Acknowledgements
This work was supported by a research grant from NICHD (#15795) and a Research Scientist Award from NIMH (#01490) to P.W.J. We wish to thank Elizabeth Johnson and Ann Marie Jusczyk for their constructive comments on this manuscript.
References
- Brent, M. R., & Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition.
- Cairns, P., Shillcock, R., Chater, N., & Levy, J. (1997). Bootstrapping word boundaries: a bottom-up corpus-based approach to speech segmentation. Cognitive Psychology.
- Church, K. W. (1987). Phonological parsing and lexical retrieval. Cognition.
- Echols, C. H., Crowhurst, M. J., & Childers, J. B. (1997). Perception of rhythmic units in speech by infants and adults. Journal of Memory and Language.
- Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development.
- Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in Cognitive Sciences.
- Jusczyk, P. W., & Aslin, R. N. (1995). Infants' detection of sound patterns of words in fluent speech. Cognitive Psychology.
- Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language.
- Jusczyk, P. W., Goodman, M. B., & Baumann, A. (1999). Nine-month-olds' attention to sound similarities in syllables. Journal of Memory and Language.
- Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology.
- Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language.
- Kemler Nelson, D. G., Jusczyk, P. W., Mandel, D. R., Myers, J., Turk, A., & Gerken, L. (1995). The headturn preference procedure for testing auditory perception. Infant Behavior and Development.
- Lalonde, C. E., & Werker, J. F. (1995). Cognitive influences on cross-language speech perception in infants. Infant Behavior and Development.
- Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology.
- Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology.
- McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology.
- McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language.
- Messer, S. (1967). Implicit phonology in children. Journal of Verbal Learning and Verbal Behavior.
- Morgan, J. L. (1996). A rhythmic bias in preverbal speech segmentation. Journal of Memory and Language.
- Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology.
- Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language.
- Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: the role of distributional cues. Journal of Memory and Language.
- Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language.
- Umeda, N., & Coker, C. H. (1974). Allophonic variations in American English. Journal of Phonetics.
- Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language.
- Segmentation of fluent speech into words: learning models and the role of maternal input.
- Bolinger, D. L., & Gerstman, L. J. (1957). Disjuncture as a cue to constraints. Word.
- Brown, R. W., & Hildum, D. C. (1956). Expectancy and the perception of syllables. Language.
- Principles of English stress.
- Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes.