Phonotactic cues for segmentation of fluent speech by infants
Introduction
Much of the speech input to which infants are exposed consists of continuous strings of sounds (Aslin, 1993; van de Weijer, 1998; Woodward & Aslin, 1990). Therefore, a crucial challenge for language learners is to divide the speech stream correctly into small storable chunks, namely words. There is growing evidence that infants are remarkably proficient at exploiting regularities in the sound patterns of their native language to guide this segmentation process. The regularities investigated so far include the typical stress pattern of words (i.e. prosody), the phonetic realization of phonemes as a function of their position in words (i.e. allophonic variation), the probability of speech units' contiguous occurrence (i.e. distributional regularities), and the probability of the contiguous occurrence of particular sequences of phonemes (i.e. phonotactics). The present paper focuses on segmentation based on sensitivity to phonotactic cues.
Sometime during the latter half of their first year, infants develop sensitivity to how stress, phones, and phonemes typically pattern within words. For instance, American 9-month-olds show a preference for spoken stimuli that conform to the dominant strong–weak prosodic pattern of bisyllabic English words (e.g. Jusczyk, Cutler, & Redanz, 1993). They also seem to rely on such prosodic regularity to parse the input. Jusczyk, Houston, and Newsome (1999) observed that 7.5-month-olds could spot a familiarized strong–weak word like ‘doctor’ in a fluent speech passage, but did not detect a comparable weak–strong word like ‘guitar’ in passages after familiarization with it. Critically, though, they tended to segment strong syllables plus consistently recurring weak syllables as units when these occurred together in passages. For instance, they responded to ‘taris’ when the strong syllable of ‘guitar’ was consistently followed by ‘is’ in a passage. Together, these findings suggest that, in its earliest stages, English learners' speech segmentation is based on trochaic footing.
Some allophonic variations also correlate with word boundaries (Bolinger & Gerstman, 1957; Church, 1987; Lehiste, 1960; Umeda & Coker, 1974). For example, the phoneme /t/ is aspirated when it begins a word (e.g. ‘top’ [tʰop]) but not when it appears in non-initial positions (e.g. ‘stop’ [stop]). Two-month-old infants discriminate such allophonic differences, e.g. the allophonic variants of /t/ and /r/ in ‘night rates’ (Hohne & Jusczyk, 1994). Moreover, Jusczyk, Hohne, and Baumann (1999) found that sensitivity to allophonic variations contributes to the parsing of longer passages by 10.5 months of age. Infants familiarized with either ‘nitrates’ or ‘night rates’ showed a preference for a subsequent passage containing the appropriate version, an indication that they are sensitive to how allophonic distinctions typically align with word boundaries.
Yet, prosodic regularities and allophonic variations do not constitute absolute word boundary cues in English. Lexical stress is predominantly word-initial, but many words bear stress on other syllables. Likewise, only a limited number of allophonic variations contribute to the computation of word boundaries. Thus, speech segmentation is essentially a heuristic process whose chances of success increase when cues are combined (e.g. Christiansen et al., 1998; Jusczyk, 1999; Mattys et al., 1999; Morgan, 1996; Morgan & Saffran, 1995; Saffran et al., 1996). Infants are proficient at detecting a variety of regularities in the speech signal even when no explicit speech cues to word boundaries are present. For example, 8-month-old infants exposed to a continuous stream of concatenated CV syllables notice regularities in the syllables' arrangement after only 2 min of exposure (Saffran, Aslin, & Newport, 1996). They discriminate between test strings of high transitional-probability syllables (i.e. syllables that always occur contiguously during familiarization) and strings of low transitional-probability syllables (i.e. syllables that occur in a less systematic order). This finding suggests that infants can use the transitional probabilities between adjacent syllables as an indicator of word boundaries, with low probabilities associated with boundaries between words.
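The segmentation strategy implied by these transitional-probability findings can be sketched computationally. The sketch below is purely illustrative, not a model from any of the cited studies: the syllable stream, the made-up ‘words’, and the boundary threshold are invented for the example. A word boundary is posited wherever the transitional probability P(next syllable | current syllable) dips below the threshold.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Insert a word boundary wherever the transitional probability dips below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy stream: three invented 'words' (bidaku, padoti, golabu) concatenated
# without pauses. Within-word transitional probabilities are 1.0; the
# probabilities spanning word boundaries are lower.
stream = ("bi da ku pa do ti go la bu pa do ti bi da ku "
          "go la bu bi da ku pa do ti go la bu").split()
print(segment(stream, threshold=0.9))
# → ['bidaku', 'padoti', 'golabu', 'padoti', 'bidaku', 'golabu',
#    'bidaku', 'padoti', 'golabu']
```

With a deterministic toy lexicon the threshold choice is trivial; in the infant experiments the contrast was likewise between syllable pairs of high versus low transitional probability, not an explicit numeric cutoff.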
A type of distributional regularity that has recently received a great deal of attention is probabilistic phonotactics (e.g. Gimson, 1980). Probabilistic phonotactics concern the frequency with which phonemes tend to occur next to each other in natural speech sequences, ranging from never to very frequently. For instance, in English, a sequence like [zt] is never found within words; it is phonotactically ‘illegal’. A sequence like [sd], though rare, is occasionally found within words (e.g. ‘disdain’), whereas [st] is very frequent (e.g. ‘stop’, ‘listing’, ‘best’, etc.). Adult speakers are sensitive to such phonotactic regularities (e.g. Massaro & Cohen, 1983; Pitt & McQueen, 1998); they respond to words or non-words faster if these consist of high- rather than low-frequency sequences of phonemes (Auer, 1993; Vitevitch & Luce, 1999; Vitevitch et al., 1997). Adolescents and young children show similar patterns of phonotactic sensitivity (Brown & Hildum, 1956; Messer, 1967; Pertz & Bever, 1975). Three- to 4-year-old children judge nonsense words made of high-frequency phonemic sequences as being more likely words than matched strings containing rare (but legal) sequences. They also pronounce the former more accurately than the latter (Messer, 1967).
The origin of phonotactic sensitivity can be found in infancy. By 9 months, infants have accumulated enough information about words to exhibit a preference for phonotactically well-formed speech strings. Jusczyk, Friederici, Wessels, Svenkerud, and Jusczyk (1993) observed that 9-month-old American infants listened longer to a list of words with phonemic sequences legal in English but illegal in Dutch than to words with sequences legal in Dutch but illegal in English. Dutch infants showed the opposite pattern of preference. By comparison, 6-month-olds listened equally to words with legal or illegal phonotactic sequences. The same developmental pattern holds when non-words are contrasted not on the phonotactic legality of their components but on their phonotactic probability. Jusczyk, Luce, and Charles-Luce (1994) found that 9-month-olds, but not 6-month-olds, listened significantly longer to monosyllabic non-words containing high-probability phonotactic sequences (e.g. ‘chun’) than to ones containing low-probability phonotactic sequences (e.g. ‘yush’).
The goal of the present study is to examine whether infants can exploit their sensitivity to probabilistic phonotactics to segment words from fluent speech (see McQueen, 1998, for evidence that adult listeners do). To use such information in word segmentation, infants must not only respond to phonotactic well-formedness but also be sensitive to how sequences of phonemes usually align with word boundaries (see Church, 1987, for an early proposal about the computational efficiency of phonotactics in word segmentation). Partially supporting this possibility, Friederici and Wessels (1993) observed that Dutch 9-month-old infants preferred monosyllables with a consonant cluster in a permissible position over matched stimuli with the same cluster in an impermissible position. For instance, in Dutch, [br] is a typical word onset cluster but is impermissible at word offset, whereas [rt] is typical at word offset but impermissible at word onset. Nine-month-olds, but not 4.5- or 6-month-olds, listened longer to monosyllables with the test clusters in permissible positions (e.g. [bref] or [murt]) than to ones with the same clusters in impermissible positions (e.g. [febr] or [rtum]). Thus, by 9 months, infants not only discriminate legal/frequent from illegal/rare phonemic patterns but also take into account the positions of such sequences within words.
More directly, Mattys et al. (1999) tested 9-month-olds' sensitivity to how consonant clusters are typically distributed with respect to word boundaries. They devised two types of lists of bisyllabic CVC·CVC non-words (the dot indicates a syllable boundary). The stimuli in one list type (the ‘within-word’ stimuli) contained a C·C cluster that is frequently found within English words but infrequently found across words in fluent speech. The other list type (the ‘between-word’ stimuli) was composed of matched non-words containing a minimally changed C·C cluster with the opposite phonotactic pattern: the cluster was frequent across words and infrequent within words. The within- and between-word frequencies of the test clusters were obtained from the Bernstein (1982) corpus of child-directed speech. The stimuli, which differed only in the relative frequency of their C·C clusters, yielded several notable findings. Nine-month-olds listened longer to the within-word stimuli than the between-word stimuli when the stimuli were stressed on the first syllable. However, they listened longer to the between-word stimuli when (a) the stimuli were stressed on the second syllable or (b) the stimuli were stressed on the first syllable and included a 500-ms silent pause between the two syllables. These results were interpreted as evidence that 9-month-olds are sensitive to how sequences of phonemes typically align with word boundaries and to how this phonotactic sensitivity relates to the tendency to perceive stress as word-initial (Jusczyk, Houston, & Newsome, 1999). Nevertheless, although these results show that 9-month-olds' perceptual preferences are consistent with speech segmentation based on phonotactic regularities, they do not directly demonstrate that infants actually use this sensitivity to segment words from fluent speech.
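The within-word versus between-word frequencies on which such stimuli rely can be estimated from any word-segmented corpus. The sketch below is a toy illustration, not the Bernstein (1982) counts: the miniature corpus, the orthographic stand-in for phonemic transcription, and the function names are all invented for the example. It counts each two-segment sequence's word-internal occurrences and its occurrences straddling adjacent word boundaries, then expresses how often the sequence signals a boundary.

```python
from collections import Counter

def cluster_stats(utterances):
    """Count each two-segment sequence's within-word vs. between-word
    occurrences in a list of whitespace-segmented utterances
    (each character stands in for one phoneme)."""
    within, between = Counter(), Counter()
    for utt in utterances:
        words = utt.split()
        for w in words:                      # pairs inside a word
            for a, b in zip(w, w[1:]):
                within[a + b] += 1
        for w1, w2 in zip(words, words[1:]): # pairs spanning a word boundary
            between[w1[-1] + w2[0]] += 1
    return within, between

def boundary_likelihood(cluster, within, between):
    """Fraction of the cluster's occurrences that straddle a word boundary."""
    total = within[cluster] + between[cluster]
    return between[cluster] / total if total else 0.0

# Hypothetical miniature corpus
corpus = ["best stop list", "fast dog", "big dog stops"]
within, between = cluster_stats(corpus)
print(boundary_likelihood("st", within, between))  # word-internal only here → 0.0
print(boundary_likelihood("ts", within, between))  # boundary-spanning only here → 1.0
```

In this toy corpus ‘st’ patterns like a within-word cluster and ‘ts’ like a between-word cluster, mirroring the contrast the Mattys et al. (1999) stimuli were built on.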
Accordingly, the present study was designed to explore whether English-learning 9-month-olds use phonotactic cues in on-line word segmentation.
Experiment 1
If infants rely on phonotactic regularities to segment speech into words, they should notice the presence of a word embedded in a fluent speech passage more easily if the phonotactic patterns at the word's edges set it apart from, rather than blend it into, the neighboring words. In this first experiment, 9-month-olds were familiarized with two passages. In one passage, a CVC target word occurred in contexts in which the surrounding words provided good phonotactic word boundary cues. In the other …
Experiment 2
The hypothesis tested in Experiment 2 is that a biphone with low within-word probability and high between-word probability is an efficient word onset cue (Brent & Cartwright, 1996; Cairns et al., 1997; Gaygen & Luce, submitted; Mattys et al., 1999). Segmentation of a test word with an ‘only-onset’ phonotactic cue would suggest that phonotactic segmentation can proceed efficiently on a left-to-right basis, with phonotactic information used to interpret later-occurring information as speech …
Experiment 3
In this experiment, we test the hypothesis that a cluster of consonants with low within-word probability and high between-word probability facilitates the extraction of the word preceding the cluster. This possibility is of particular interest because segmentation from offset would imply that some form of retroactive processing from the word's offset boundary is used to recover the test word. Such an operation is, in essence, more complex than segmentation from the onset, and it presumably implies …
General discussion
The results of these three experiments provide strong evidence that 9-month-olds can use their sensitivity to probabilistic phonotactics (Mattys et al., 1999) to segment words from fluent speech. In all experiments, infants exhibited a preference for a word (or a non-word) that previously occurred in a fluent speech passage, provided that at least one boundary of that word was phonotactically cued. Phonotactic cues to word boundaries are based on the characteristics of the consonant–consonant …
Acknowledgements
This work was supported by a research grant from NICHD (#15795) and a Research Scientist Award from NIMH (#01490) to P.W.J. We wish to thank Elizabeth Johnson and Ann Marie Jusczyk for their constructive comments on this manuscript.
References
- Brent, M. R., & Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition.
- Cairns, P., Shillcock, R., Chater, N., & Levy, J. (1997). Bootstrapping word boundaries: a bottom-up corpus-based approach to speech segmentation. Cognitive Psychology.
- Church, K. W. (1987). Phonological parsing and lexical retrieval. Cognition.
- Echols, C. H., Crowhurst, M. J., & Childers, J. B. (1997). Perception of rhythmic units in speech by infants and adults. Journal of Memory and Language.
- Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development.
- Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in Cognitive Sciences.
- Jusczyk, P. W., & Aslin, R. N. (1995). Infants' detection of sound patterns of words in fluent speech. Cognitive Psychology.
- Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V. Y., & Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language.
- Jusczyk, P. W., Goodman, M. B., & Baumann, A. (1999). Nine-month-olds' attention to sound similarities in syllables. Journal of Memory and Language.
- Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology.
- Jusczyk, P. W., Luce, P. A., & Charles-Luce, J. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language.
- Kemler Nelson, D. G., Jusczyk, P. W., Mandel, D. R., Myers, J., Turk, A., & Gerken, L. (1995). The headturn preference procedure for testing auditory perception. Infant Behavior and Development.
- Lalonde, C. E., & Werker, J. F. (1995). Cognitive influences on cross-language speech perception in infants. Infant Behavior and Development.
- Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology.
- Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology.
- McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology.
- McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language.
- Messer, S. (1967). Implicit phonology in children. Journal of Verbal Learning and Verbal Behavior.
- Morgan, J. L. (1996). A rhythmic bias in preverbal speech segmentation. Journal of Memory and Language.
- Norris, D., McQueen, J. M., Cutler, A., & Butterfield, S. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology.
- Pitt, M. A., & McQueen, J. M. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language.
- Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: the role of distributional cues. Journal of Memory and Language.
- Suomi, K., McQueen, J. M., & Cutler, A. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language.
- Umeda, N., & Coker, C. H. (1974). Allophonic variations in American English. Journal of Phonetics.
- Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language.
- Segmentation of fluent speech into words: learning models and the role of maternal input.
- Bolinger, D. L., & Gerstman, L. J. (1957). Disjuncture as a cue to constraints. Word.
- Brown, R. W., & Hildum, D. C. (1956). Expectancy and the perception of syllables. Language.
- Principles of English stress.
- Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes.