
Cognition

Volume 78, Issue 2, February 2001, Pages 91-121

Phonotactic cues for segmentation of fluent speech by infants

https://doi.org/10.1016/S0010-0277(00)00109-8

Abstract

There is growing evidence that infants become sensitive to the probabilistic phonotactics of their ambient language sometime during the second half of their first year. The present study investigates whether 9-month-olds make use of phonotactic cues to segment words from fluent speech. Using the Headturn Preference Procedure, we found that infants listened to a CVC stimulus longer when the stimulus previously appeared in a sentential context with good phonotactic cues than when it appeared in one without such cues. The goodness of the phonotactic cues was estimated from the frequency with which the C·C clusters at the onset and offset of a CVC test stimulus (i.e. C·CVC·C) are found within and between words in child-directed speech, with high between-word probability associated with good cues to word boundaries. A similar segmentation result emerged when good phonotactic cues occurred only at the onset (i.e. C·CVC) or the offset (i.e. CVC·C) of the target words in the utterances. Together, the results suggest that 9-month-olds use probabilistic phonotactics to segment speech into words and that high-probability between-word clusters are interpreted as both word onsets and word offsets.

Introduction

Much of the speech input to which infants are exposed consists of continuous strings of sounds (Aslin, 1993, van de Weijer, 1998, Woodward and Aslin, 1990). Therefore, a crucial challenge for language learners is to divide the speech stream correctly into small storable chunks, namely, words. There is growing evidence that infants are remarkably proficient at exploiting regularities in the sound patterns of their native language to guide this segmentation process. Some of the regularities that have been investigated so far concern the typical stress pattern of words (i.e. prosody), the phonetic realization of phonemes as a function of their position in words (i.e. allophonic variations), the probability of speech units' contiguous occurrence (i.e. distributional regularities), and the probability of the contiguous occurrence of particular sequences of phonemes (i.e. phonotactics). The present paper focuses on segmentation based on sensitivity to phonotactic cues.

Sometime during the latter half of their first year, infants develop sensitivity to how stress, phones, and phonemes typically pattern within words. For instance, American 9-month-olds show a preference for spoken stimuli that conform to the dominant strong–weak prosodic pattern of bisyllabic English words (e.g. Jusczyk, Cutler, & Redanz, 1993). They also seem to rely on such prosodic regularity to parse the input. Jusczyk, Houston, and Newsome (1999) observed that 7.5-month-olds could spot a familiarized strong–weak word like ‘doctor’ in a fluent speech passage, but they did not detect a comparable weak–strong word like ‘guitar’ in passages after familiarization with it. Critically though, they tended to segment strong syllables plus consistently recurring weak syllables as units when they occurred together in passages. For instance, they responded to ‘taris’ when the strong syllable of ‘guitar’ was consistently followed by ‘is’ in a passage. Together, these findings suggest that in its earliest stages English-learners' speech segmentation is based on trochaic footing.

Some allophonic variations also correlate with word boundaries (Bolinger and Gerstman, 1957, Church, 1987, Lehiste, 1960, Umeda and Coker, 1974). For example, the phoneme /t/ is aspirated when it begins a word (e.g. ‘top’ [tʰop]) but not when it appears in non-initial positions (e.g. ‘stop’ [stop]). Two-month-old infants discriminate such allophonic differences, e.g. the allophonic variants of /t/ and /r/ in ‘night rates’ (Hohne & Jusczyk, 1994). Moreover, Jusczyk, Hohne, and Baumann (1999) found that sensitivity to allophonic variations contributes to the parsing of longer passages by 10.5 months of age. Infants familiarized with either ‘nitrates’ or ‘night rates’ showed a preference for a subsequent passage containing the appropriate version, an indication that they are sensitive to how allophonic distinctions typically align with word boundaries.

Yet, prosodic regularities and allophonic variations do not constitute absolute word boundary cues in English. Lexical stress is predominantly word-initial but many words bear stress on other syllables. Likewise, only a limited number of allophonic variations contribute to the computation of word boundaries. Thus, speech segmentation is essentially a heuristic process whose chances of success increase when cues are combined (e.g. Christiansen et al., 1998, Jusczyk, 1999, Mattys et al., 1999, Morgan, 1996, Morgan and Saffran, 1995, Saffran et al., 1996). Infants are proficient at detecting a variety of regularities in the speech signal even when no explicit speech cues to word boundaries are present. For example, 8-month-old infants exposed to a continuous stream of concatenated CV syllables notice regularities in the syllables' arrangement after only 2 min of exposure (Saffran, Aslin, & Newport, 1996). They discriminate between test strings of high transitional-probability syllables (i.e. syllables that always occur contiguously during familiarization) and strings of low transitional-probability syllables (i.e. syllables that occur in a less systematic order). This finding suggests that infants can use the transitional probabilities between adjacent syllables as an indicator of word boundaries, with low probabilities associated with boundaries between words.
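
To make the transitional-probability idea concrete, the sketch below (our own illustration, not part of the original studies; the syllable stream and the 0.75 threshold are purely hypothetical) computes P(next syllable | current syllable) over adjacent pairs and posits a word boundary wherever that probability dips:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(B | A) = count(A followed by B) / count(A as a non-final syllable)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def posit_boundaries(syllables, threshold=0.75):
    """Posit a word boundary before syllable i+1 when P(syll[i+1] | syll[i]) is low."""
    tp = transitional_probabilities(syllables)
    return [i + 1 for i in range(len(syllables) - 1)
            if tp[(syllables[i], syllables[i + 1])] < threshold]

# Continuous stream built from three hypothetical 'words' (bi-da-ku, pa-do-ti,
# go-la-bu) concatenated in varying order, in the spirit of Saffran et al. (1996).
stream = ["bi", "da", "ku", "pa", "do", "ti", "go", "la", "bu",
          "pa", "do", "ti", "bi", "da", "ku", "go", "la", "bu"]
print(posit_boundaries(stream))
# -> [3, 6, 12, 15]: within-word transitions have probability 1.0, while most
#    transitions that straddle a word juncture are lower and get marked.
```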

A type of distributional regularity that has recently received a great deal of attention is probabilistic phonotactics (e.g. Gimson, 1980). Probabilistic phonotactics concern the frequency with which phonemes tend to occur next to each other in natural speech sequences, ranging from never to very frequently. For instance, in English, a sequence like [zt] is never found inside of words; it is phonotactically ‘illegal’. A sequence like [sd], though rare, is occasionally found within words (e.g. ‘disdain’) whereas [st] is very frequent (e.g. ‘stop’, ‘listing’, ‘best’, etc.). Adult speakers are sensitive to such phonotactic regularities (e.g. Massaro and Cohen, 1983, Pitt and McQueen, 1998); they respond to words or non-words faster if these consist of high- rather than low-frequency sequences of phonemes (Auer, 1993, Vitevitch and Luce, 1999, Vitevitch et al., 1997). Adolescents and young children show similar patterns of phonotactic sensitivity (Brown and Hildum, 1956, Messer, 1967, Pertz and Bever, 1975). Three- to 4-year-old children judge nonsense words made of high-frequency phonemic sequences as being more likely words than matched strings containing rare (but legal) sequences. They also pronounce the former more accurately than the latter (Messer, 1967).
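
The kind of count that underlies such probabilities can be illustrated in a few lines of code. The sketch below is our own toy example, not the corpus analysis used in the studies cited; the miniature lexicon, its coarse transcriptions, and its token frequencies are invented.

```python
from collections import Counter

# Toy phonemically transcribed lexicon with token frequencies (illustrative only).
lexicon = {
    ("s", "t", "o", "p"): 120,            # 'stop'
    ("l", "i", "s", "t"): 80,             # 'list'
    ("b", "e", "s", "t"): 95,             # 'best'
    ("d", "i", "s", "d", "e", "n"): 3,    # 'disdain'
}

def within_word_biphone_counts(lexicon):
    """Frequency-weighted counts of adjacent phoneme pairs inside word tokens."""
    counts = Counter()
    for phones, freq in lexicon.items():
        for pair in zip(phones, phones[1:]):
            counts[pair] += freq
    return counts

counts = within_word_biphone_counts(lexicon)
print(counts[("s", "t")])  # 295: [st] is frequent within words
print(counts[("s", "d")])  # 3:   [sd] is rare but attested
print(counts[("z", "t")])  # 0:   [zt] never occurs word-internally ('illegal')
```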

The origin of phonotactic sensitivity can be found in infancy. By 9 months, infants have accumulated enough information about words to exhibit a preference for phonotactically well-formed speech strings. Jusczyk, Friederici, Wessels, Svenkerud, and Jusczyk (1993) observed that 9-month-old American infants listened longer to a list of words with phonemic sequences legal in English but illegal in Dutch than to words with sequences legal in Dutch but illegal in English. Dutch infants showed the opposite pattern of preference. By comparison, 6-month-olds listened equally to words with legal or illegal phonotactic sequences. The same age breakdown holds when non-words are contrasted not in the phonotactic legality of their components but in their phonotactic probability. Jusczyk, Luce, and Charles-Luce (1994) found that 9-month-olds, but not 6-month-olds, listened significantly longer to monosyllabic non-words containing high-probability phonotactic sequences (e.g. ‘chun’) than to ones containing low-probability phonotactic sequences (e.g. ‘yush’).

The goal of the present study is to examine whether infants can exploit their sensitivity to probabilistic phonotactics to segment words from fluent speech (see McQueen, 1998, for evidence that adult listeners do). To use such information in word segmentation, in addition to responding to phonotactic well-formedness, infants must be sensitive to how sequences of phonemes usually align with word boundaries (see Church, 1987, for an early proposal about the computational efficiency of phonotactics in word segmentation). Partially supporting this possibility, Friederici and Wessels (1993) observed that Dutch 9-month-old infants preferred monosyllables with a consonant cluster in a permissible position over matched stimuli with the same cluster in an impermissible position. For instance, in Dutch, [br] is a typical word onset cluster, but is impermissible at word offset, whereas [rt] is typical at word offset but is impermissible at word onset. Nine-month-olds, but not 4.5- or 6-month-olds, listened longer to monosyllables with the test clusters in permissible positions (e.g. [bref] or [murt]) than to the ones with the same clusters in impermissible positions (e.g. [febr] or [rtum]). Thus, by 9 months, infants not only discriminate legal/frequent from illegal/rare phonemic patterns but also take into account the positions of such sequences within words.

More directly, Mattys et al. (1999) tested 9-month-olds' sensitivity to how CC consonant clusters are typically distributed with respect to word boundaries. They devised two types of lists of bisyllabic CVC·CVC non-words (a dot indicates a syllabic boundary). The stimuli in one list type (the ‘within-word’ stimuli) contained a C·C cluster that is frequently found within English words but infrequently found across words in fluent speech (e.g. ‘moftuth’, [mof·tʌθ]). The other list type (the ‘between-word’ stimuli) was composed of matched non-words containing a minimally changed C·C cluster with the opposite phonotactic pattern: the cluster was frequent across words and infrequent within words (e.g. ‘mofhuth’, [mof·hʌθ]). The within- and between-word frequencies of the test clusters were obtained from the Bernstein (1982) corpus of child-directed speech. The stimuli, which differed only in the relative frequency of their C·C clusters, yielded several notable findings. Nine-month-olds listened longer to the within-word stimuli than the between-word stimuli when the stimuli were stressed on the first syllable. However, they listened longer to the between-word stimuli when (a) the stimuli were stressed on the second syllable or (b) the stimuli were stressed on the first syllable and included a 500-ms silent pause between the two syllables. These results were interpreted as evidence that 9-month-olds are sensitive to how sequences of phonemes typically align with word boundaries and to how this phonotactic sensitivity relates to the tendency to perceive stress as word-initial (Jusczyk, Houston, & Newsome, 1999). Nevertheless, although these results show that 9-month-olds' perceptual preferences are consistent with speech segmentation based on phonotactic regularities, they do not directly demonstrate that infants actually use this sensitivity to segment words from fluent speech. Accordingly, the present study was designed to explore whether English-learning 9-month-olds use phonotactic cues in on-line word segmentation.
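
The general logic of estimating within-word versus between-word cluster frequencies from a word-segmented corpus can be sketched as follows. This is our illustration only; the phoneme inventory, the letter-based transcriptions, and the tiny ‘corpus’ are invented and far cruder than the Bernstein (1982) materials.

```python
from collections import Counter

CONSONANTS = set("pbtdkgfvszmnlrh")  # coarse, illustrative inventory

def cluster_counts(utterances):
    """Count consonant pairs occurring inside words vs. straddling word boundaries."""
    within, between = Counter(), Counter()
    for words in utterances:                    # one utterance = list of transcribed words
        for word in words:
            for a, b in zip(word, word[1:]):    # word-internal adjacent phonemes
                if a in CONSONANTS and b in CONSONANTS:
                    within[(a, b)] += 1
        for w1, w2 in zip(words, words[1:]):    # last phoneme of w1, first phoneme of w2
            a, b = w1[-1], w2[0]
            if a in CONSONANTS and b in CONSONANTS:
                between[(a, b)] += 1
    return within, between

# Tiny made-up sample; letters stand in for phonemes, the word strings are invented.
corpus = [
    ["si", "ofta", "gom"],     # [ft] occurs inside 'ofta'
    ["mof", "hut"],            # [fh] straddles the boundary 'mof'|'hut'
    ["lu", "mifta"],           # another word-internal [ft]
    ["nef", "hib", "da"],      # another cross-word [fh]
]

within, between = cluster_counts(corpus)
print(within[("f", "t")], between[("f", "t")])  # 2 0 -> [ft] patterns as word-internal
print(within[("f", "h")], between[("f", "h")])  # 0 2 -> [fh] patterns as a word juncture
```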

Section snippets

Experiment 1

If infants rely on phonotactic regularities to segment speech into words, they should notice the presence of a word embedded in a fluent speech passage more easily if the phonotactic patterns at the word's edges set it apart from rather than blend it into the neighboring words. In this first experiment, 9-month-olds were familiarized with two passages. In one passage, a CVC target word occurred in contexts in which the surrounding words provided good phonotactic word boundary cues. In the other

Experiment 2

The hypothesis tested in Experiment 2 is that a biphone with low within-word probability and high between-word probability is an efficient word onset cue (Brent and Cartwright, 1996, Cairns et al., 1997, Gaygen and Luce, submitted, Mattys et al., 1999). Segmentation of a test word with an ‘only-onset’ phonotactic cue would suggest that phonotactic segmentation can proceed efficiently on a left-to-right basis, with phonotactic information used to interpret later-occurring information as speech
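
As a rough illustration of what such left-to-right use of onset cues might look like computationally, the sketch below (our own, not the authors' model; the biphone probability tables and the example string are hypothetical) posits a word onset wherever the current biphone is more frequent across words than within them:

```python
def posit_onsets(phones, within_prob, between_prob):
    """Scan left to right; posit a word onset before phones[i+1] whenever the
    biphone (phones[i], phones[i+1]) is more likely across a word boundary
    than inside a word. Probability tables are assumed to come from counts
    over child-directed speech (here, hypothetical values)."""
    onsets = [0]  # an utterance-initial phone is always a word onset
    for i in range(len(phones) - 1):
        pair = (phones[i], phones[i + 1])
        if between_prob.get(pair, 0.0) > within_prob.get(pair, 0.0):
            onsets.append(i + 1)
    return onsets

# Hypothetical biphone statistics: [fh] is rare inside words but common across them.
within_prob  = {("f", "t"): 0.020, ("f", "h"): 0.001}
between_prob = {("f", "t"): 0.004, ("f", "h"): 0.015}

# 'mofhut': a boundary is posited before 'h', yielding 'mof' + 'hut'.
print(posit_onsets(list("mofhut"), within_prob, between_prob))  # -> [0, 3]
```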

Experiment 3

In this experiment, we test the hypothesis that a cluster of consonants with low within-word probability and high between-word probability facilitates the extraction of the word preceding the cluster. This possibility is of particular interest because segmentation from offset would imply that some form of retroactive processing from the word's offset boundary is used to recover the test word. Such operation is, in essence, more complex than segmentation from the onset and it presumably implies

General discussion

The results of these three experiments provide strong evidence that 9-month-olds can use their sensitivity to probabilistic phonotactics (Mattys et al., 1999) to segment words from fluent speech. In all experiments, infants exhibited a preference for a word (or a non-word) that previously occurred in a fluent speech passage, provided that at least one boundary of that word was phonotactically cued. Phonotactic cues to word boundaries are based on the characteristics of the consonant–consonant

Acknowledgements

This work was supported by a research grant from NICHD (#15795) and a Research Scientist Award from NIMH (#01490) to P.W.J. We wish to thank Elizabeth Johnson and Ann Marie Jusczyk for their constructive comments on this manuscript.

References (66)

  • P.W. Jusczyk et al. (1994). Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language.
  • D.G. Kemler Nelson et al. (1995). The headturn preference procedure for testing auditory perception. Infant Behavior and Development.
  • C.E. Lalonde et al. (1995). Cognitive influences on cross-language speech perception in infants. Infant Behavior and Development.
  • W.D. Marslen-Wilson et al. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology.
  • S.L. Mattys et al. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology.
  • J.L. McClelland et al. (1986). The TRACE model of speech perception. Cognitive Psychology.
  • J.M. McQueen (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language.
  • S. Messer (1967). Implicit phonology in children. Journal of Verbal Learning and Verbal Behavior.
  • J.L. Morgan (1996). A rhythmic bias in preverbal speech segmentation. Journal of Memory and Language.
  • D. Norris et al. (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology.
  • M.A. Pitt et al. (1998). Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language.
  • J.R. Saffran et al. (1996). Word segmentation: the role of distributional cues. Journal of Memory and Language.
  • K. Suomi et al. (1997). Vowel harmony and speech segmentation in Finnish. Journal of Memory and Language.
  • N. Umeda et al. (1974). Allophonic variations in American English. Journal of Phonetics.
  • M.S. Vitevitch et al. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language.
  • R.N. Aslin. Segmentation of fluent speech into words: learning models and the role of maternal input.
  • R.N. Aslin (1999, April). Utterance-final bias in word recognition by 8-month-olds. Paper presented at the biennial…
  • E.T. Auer (1993). Dynamic processing in spoken word recognition: the influence of paradigmatic and syntactic states…
  • N. Bernstein (1982). Acoustic study of mothers' speech to language-learning children: an analysis of vowel…
  • D.L. Bolinger et al. (1957). Disjuncture as a cue to constraints. Word.
  • R.W. Brown et al. (1956). Expectancy and the perception of syllables. Language.
  • L. Burzio (1994). Principles of English stress.
  • M.H. Christiansen et al. (1998). Learning to segment speech using multiple cues: a connectionist model. Language and Cognitive Processes.