Elsevier

Cognition

Volume 103, Issue 1, April 2007, Pages 147-162

Brief article
Infant-directed speech supports phonetic category learning in English and Japanese

https://doi.org/10.1016/j.cognition.2006.03.006

Abstract

Across the first year of life, infants show decreased sensitivity to phonetic differences not used in the native language [Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant Behaviour and Development, 7, 49–63]. In an artificial language learning manipulation, Maye, Werker, and Gerken [Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3), B101–B111] found that infants change their speech sound categories as a function of the distributional properties of the input. For such a distributional learning mechanism to be functional, however, it is essential that the input speech contain distributional cues to support such perceptual learning. To test this, we recorded Japanese and English mothers teaching words to their infants. Acoustic analyses revealed language-specific differences in the distributions of the cues used by mothers (or cues present in the input) to distinguish the vowels. The robust availability of these cues in maternal speech adds support to the hypothesis that distributional learning is an important mechanism whereby infants establish native language phonetic categories.

Introduction

One of the defining characteristics of language is its productivity. Units can be combined and recombined to allow for the creation of new words, phrases, and sentences. Languages differ in their phoneme repertoires, the sets of consonant and vowel sounds that distinguish meaning, and in the rules for how phonemes can be combined to create words and morphemes. It has been known for nearly 40 years that speakers of different languages represent and discriminate best those phonetic differences that are used phonemically (to contrast meaning) in their native language (Abramson & Lisker, 1970), but the means by which native phonetic categories are established have still not been fully explained.

Important advances were made in understanding this problem over 30 years ago when it was demonstrated that very young infants discriminate not only native, but also non-native phonetic differences (Eimas et al., 1971, Streeter, 1976), suggesting that sensitivity to the phonetic detail used to distinguish adult phonemic categories may be part of the initial perceptual apparatus. And, as first shown over 20 years ago (Werker & Tees, 1984), initial perceptual sensitivities change across the first year of life, resulting in diminished sensitivity to phonetic differences that are not used phonemically in the native language (see Best and McRoberts, 2003, Saffran et al., 2006), and enhanced sensitivity to native distinctions (see Kuhl et al., 2006, Polka et al., 2001).

Several models were proposed to explain the underlying mechanisms that allow infants to tune speech sound categories so rapidly. Early models, following the structural–functional linguistics tradition (e.g., Jakobson, 1949, Trubetskoy, 1969), assumed that the tuning of native categories emerges only after the establishment of contrastive words in the lexicon (e.g., Werker & Pegg, 1992). More recent perceptual learning models posited various similarity metrics to account for the change from broad-based to language-specific phonetic perception [e.g., the “Perceptual Assimilation Model” (PAM) Best and McRoberts, 2003; the “Native Language Magnet Model” (NLM) Kuhl, 1993]. The missing piece in supporting a perceptual learning model was an explication and demonstration of an actual learning mechanism that could account for these changes.

In 2002, Maye, Werker, and Gerken provided evidence that distributional learning might underlie the rapid tuning to the categories of the native language. Using an artificial language learning manipulation, they exposed two groups of infants aged 6–8 months to all steps of an 8-step continuum of [da] to [ta] speech syllables. One group heard more instances of the two center points in the continuum, steps 4 and 5, corresponding to a unimodal frequency distribution (as might be experienced in a language without the [da]/[ta] distinction). The other group heard more instances of steps 2 and 7, corresponding to a bimodal frequency distribution (as might be experienced by infants being raised in a language with this distinction). Both groups heard equal numbers of the remaining stimuli. Following 2.3 min of familiarization, infants in the bimodal but not the unimodal group showed evidence of discriminating step 1 from step 8. Maye and Weiss (2003) replicated the distributional learning finding with two new sets of speech contrasts, and more recently, Yoshida, Pons, and Werker (2006) have replicated it with a non-native distinction. Together, these results indicate that distributional learning could be a mechanism that allows for speech sound category restructuring in the first year of life, prior to the establishment of a lexicon.
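The contrast between the two familiarization conditions can be made concrete with a small sketch. The per-step counts below are hypothetical, chosen only so that steps 1, 3, 6 and 8 occur equally often in both conditions and both totals match; they are not the presentation frequencies used by Maye et al. (2002). The helper `count_modes` is likewise a hypothetical illustration that simply counts peaks in a frequency histogram, treating a flat top as a single peak.

```python
def count_modes(freqs):
    """Count peaks in a token-frequency histogram; a flat top counts as one peak."""
    # Collapse runs of equal neighbours so a plateau (e.g. steps 4 and 5) is one point.
    collapsed = [f for i, f in enumerate(freqs) if i == 0 or f != freqs[i - 1]]
    padded = [float("-inf")] + collapsed + [float("-inf")]
    # A peak is strictly higher than both of its neighbours.
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical token counts for continuum steps 1..8 (NOT the study's values).
unimodal = [1, 2, 3, 4, 4, 3, 2, 1]  # extra tokens at the centre, steps 4 and 5
bimodal = [1, 4, 3, 2, 2, 3, 4, 1]   # extra tokens at steps 2 and 7

print(count_modes(unimodal), count_modes(bimodal))  # 1 2
print(sum(unimodal) == sum(bimodal))                # True
```

A learner that posits one category per peak would group the whole continuum together after unimodal exposure but split it after bimodal exposure, which is the pattern the discrimination results suggest. Infants presumably do something softer than literal peak counting; the sketch only illustrates what "unimodal" and "bimodal" input mean.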

While laboratory-based artificial language learning studies constitute proof in principle that a particular learning mechanism is available, infants will not be able to apply this learning mechanism unless the speech they hear contains the relevant distributional regularities. Thus, an essential step is to examine the speech that infants actually hear, to see whether the frequency distributions of the acoustic/phonetic cues required by this learning mechanism do indeed exist in the input.

In this study we analyze durational and spectral cues of vowels from Japanese and English maternal speech. In Canadian English, vowels differ primarily in vowel color, whose acoustic correlates are the frequencies of the formants, i.e., spectral differences. In Japanese there are only five vowels that differ in color, but every vowel has two forms, a long and a short form. Although likely once a dominant cue in English as well (Lehiste, 1970, Port, 1981), the historic length difference that still exists in some tense/lax vowel pairs has been superseded by a color difference in the English vowel space. A comparison of input speech for vowel pairs that differ in length in Japanese and primarily in vowel color in English thus allows a test of the hypothesis that there are distributional characteristics in the input that support native category learning. Because both English infants (Cooper and Aslin, 1990, Fernald, 1985) and Japanese infants (Hayashi, Tamekawa, & Kiritani, 2001) show a preference for listening to infant-directed over adult-directed speech, the strongest evidence would be provided by a study of infant-directed speech.
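One way to quantify "differ in length" versus "differ in color" is a standardized distance between the two members of a vowel pair along each cue. The sketch below uses Cohen's d with a pooled standard deviation; this measure, the function name `separation`, and the token values are illustrative assumptions, not the analysis or data reported in this article, and a single formant value (F2) stands in for vowel color.

```python
from math import sqrt
from statistics import mean, stdev


def separation(a, b):
    """Standardized mean difference (Cohen's d, pooled sample SD) between two token sets."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return abs(mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical tokens for one short/long vowel pair (NOT measurements from this study).
# F2 is an assumed stand-in for "vowel color".
short_vowel = {"duration_ms": [90, 100, 110], "f2_hz": [1900, 2000, 2100]}
long_vowel = {"duration_ms": [190, 200, 210], "f2_hz": [1950, 2050, 2150]}

for cue in ("duration_ms", "f2_hz"):
    print(cue, round(separation(short_vowel[cue], long_vowel[cue]), 2))
# duration_ms 10.0
# f2_hz 0.5
```

For Japanese-like input one would expect the duration value to be large and the spectral value small; for English-like input, the reverse.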

Vowels are more variable than consonants. Many factors influence vowel duration, including the voicing of the surrounding consonants, emphatic stress, focus, position in an utterance, and affect (for an overview see Erickson, 2000). The spectral differences that cue vowel color distinctions are also influenced by many factors, including pitch height and degree of pitch change (see Lieberman and Blumstein, 1988, Trainor and Desjardins, 2002 for a discussion) and speaking rate (Lindblom, 1963). In infant-directed speech, both vowel duration and spectral properties are affected by high pitch and highly affective modulation in English (e.g., Fernald, 1985) and Japanese (Hayashi et al., 2001). Vowel duration is much longer in infant-directed than in adult-directed speech (e.g., Andruski and Kuhl, 1996, Fernald and Simon, 1984, Kuhl et al., 1997), raising the very real possibility that the critical distributional cues to support distinctive categories in the domain of duration may be quite different in the input. Although it has been shown that the articulatory configurations used to distinguish vowel color differences in English are exaggerated in infant-directed over adult-directed speech (Andruski & Kuhl, 1996), the higher overall pitch could nonetheless lead to varying availability of the spectral cues distinguishing vowels.

Here we ask whether the crucial distributional information is present in infant-directed speech to allow infants to modify initial sensitivities and establish native language vowel categories. We compare two languages, Japanese and English, on two very similar vowel pairs. In adult speech both vowel pairs are cued only by duration in Japanese, whereas in English both are cued spectrally, with duration as a less predictive, but most likely still available, secondary cue. If there are sufficient distributional cues in input speech to allow infants to tune their perceptual systems to the phonetic categories of the native language using distributional learning, then the following predictions must be upheld: (1) there should be two significantly distinct distributions of vowel length but not vowel color in the two members of each vowel pair as produced by the Japanese mothers, and (2) there should be two distinct distributions of vowel color in the two members of each vowel pair produced by the English mothers, and the distribution of vowel length should not be as distinct as it is for Japanese. Moreover, these differences should be apparent not only when the categories are already given; the characteristics of maternal input on their own should also yield language-specific categories. Specifically, (3) the input speech of Japanese mothers should better predict two categories for each vowel pair on the basis of vowel length than does the input speech of English mothers, and (4) the input speech of English mothers should better predict two categories for each vowel pair on the basis of vowel color than does the input speech of Japanese mothers.
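Predictions (3) and (4) ask whether the categories can be recovered from the input without labels. A common way to pose that question, assumed here rather than taken from the article, is to fit one- and two-component Gaussian mixtures to the tokens along a single cue and compare them with the Bayesian information criterion (BIC), where lower is better. The sketch implements the two-component fit with plain expectation-maximization (EM) so that it runs with the standard library alone. The function names, the variance floor of 1 ms², and the durations (evenly spaced normal quantiles) are all hypothetical; none of them come from the study.

```python
import math
from statistics import NormalDist, mean, pvariance


def normal_logpdf(x, mu, var):
    """Log density of a univariate Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_one(xs):
    """ML fit of a single Gaussian; returns (log-likelihood, number of parameters)."""
    mu, var = mean(xs), pvariance(xs)
    return sum(normal_logpdf(x, mu, var) for x in xs), 2


def fit_two(xs, iters=200, var_floor=1.0):
    """Two-component Gaussian mixture fitted by EM; returns (log-likelihood, n_params)."""
    n, s = len(xs), sorted(xs)
    mu = [s[n // 4], s[(3 * n) // 4]]  # initialise the means at the quartiles
    var = [pvariance(xs)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each token.
        resp = []
        for x in xs:
            lp = [math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in range(2)]
            total = logsumexp(lp)
            resp.append([math.exp(v - total) for v in lp])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an (almost) empty component unchanged
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, var_floor)
        total_w = sum(w)
        w = [wk / total_w for wk in w]  # keep the weights summing to 1
    ll = sum(
        logsumexp([math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in range(2)])
        for x in xs
    )
    return ll, 5  # two means, two variances, one free mixing weight


def bic(loglik, n_params, n):
    return n_params * math.log(n) - 2 * loglik


def prefers_two_categories(xs):
    """True if BIC favours a two-category over a one-category account of the tokens."""
    n = len(xs)
    ll1, k1 = fit_one(xs)
    ll2, k2 = fit_two(xs)
    return bic(ll2, k2, n) < bic(ll1, k1, n)


def quantile_sample(mu, sd, n):
    """Deterministic stand-in for tokens: evenly spaced quantiles of N(mu, sd)."""
    dist = NormalDist(mu, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical vowel durations in ms (NOT data from this study).
bimodal = quantile_sample(100, 10, 50) + quantile_sample(200, 10, 50)
unimodal = quantile_sample(150, 20, 100)
print(prefers_two_categories(bimodal))   # True
print(prefers_two_categories(unimodal))  # False
```

Applied separately to duration and to a spectral measure, the same comparison would express predictions (3) and (4): the two-category model should win more often on duration for Japanese input and on vowel color for English input. A library mixture model could replace the hand-written EM; it is written out here only to keep the sketch dependency-free.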

Section snippets

Participants

The study was conducted at the Infant Studies Centre at the University of British Columbia, Vancouver, Canada, and at the NTT Communication Science Laboratories, Keihanna, Japan. A total of 30 mothers (20 Canadian-English and 10 Japanese) and their 12-month-old infants participated in the study.

The Japanese infants were able to sit through the full version of the study. The Canadian-English infants were much less compliant and were only able to complete half of the study; thus, twice as many

Results

The sentences were classified as “Reading” (read sentences) or “Spontaneous” speech (description of the visual scene). From the Japanese recordings, 52–64 tokens per mother were analyzed from the Reading sentences and 30–65 tokens per mother from the Spontaneous speech. From the English recordings, 25–36 tokens per mother were analyzed from the Reading sentences. The Spontaneous speech was analyzed from 19 mothers (one mother did not produce any nonsense words spontaneously), with

Discussion

The goal of this study was to determine if, in the face of all the variation present in infant-directed speech, there are sufficient cues in the input to support distributional learning of native language phonetic categories. In our study of Japanese and English mothers teaching new words to their infants, we found clear and consistent language-specific cues. The vowel pairs /E–ee/ and /I–ii/ each differed more in length in the speech of Japanese mothers, whereas each differed more in color (as

Acknowledgements

We thank Noshin Lalji-Samji for designing the picture books, Jeremy Biesanz for statistical advice, Eli Puterman for help performing the analyses, and Jay McClelland for feedback and comments. We are grateful to all the mothers and infants who participated in this research.

References (41)

  • M.L. Erickson (2000). Simultaneous effects on vowel duration in American English: A covariance structure modeling approach. Journal of the Acoustical Society of America.
  • A. Fernald. Human maternal vocalizations to infants as biologically relevant signals: An evolutionary perspective.
  • A. Fernald et al. (1984). Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology.
  • A. Fernald et al. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language.
  • A. Hayashi et al. (2001). Developmental change in auditory preferences for speech stimuli in Japanese infants. Journal of Speech, Language and Hearing Research.
  • R. Jakobson (1949). On the identification of phonemic entities. Travaux du Cercle Linguistique de Copenhague, 5,...
  • P.K. Kuhl. Innate predispositions and the effects of experience in speech perception: The native language magnet theory.
  • P.K. Kuhl et al. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science.
  • P.K. Kuhl et al. (2006). Infants show a facilitation effect for native language perception between 6 and 12 months. Developmental Science.
  • P. Ladefoged (1993). A course in phonetics.

    This manuscript was accepted under the editorship of Jacques Mehler.
