Brief article

Infant-directed speech supports phonetic category learning in English and Japanese☆
Introduction
One of the defining characteristics of language is its productivity. Units can be combined and recombined to allow for the creation of new words, phrases, and sentences. Languages differ in their phoneme repertoires, the sets of consonant and vowel sounds that distinguish meaning, and in the rules for how phonemes can be combined to create words and morphemes. It has been known for nearly 40 years that speakers of different languages represent and discriminate best those phonetic differences that are used phonemically (to contrast meaning) in their native language (Abramson & Lisker, 1970), but the means by which native phonetic categories are established have still not been fully explained.
Important advances were made in understanding this problem over 30 years ago when it was demonstrated that very young infants discriminate not only native, but also non-native phonetic differences (Eimas et al., 1971, Streeter, 1976), suggesting that sensitivity to the phonetic detail used to distinguish adult phonemic categories may be part of the initial perceptual apparatus. And, as first shown over 20 years ago (Werker & Tees, 1984), initial perceptual sensitivities change across the first year of life, resulting in diminished sensitivity to phonetic differences that are not used phonemically in the native language (see Best and McRoberts, 2003, Saffran et al., 2006), and enhanced sensitivity to native distinctions (see Kuhl et al., 2006, Polka et al., 2001).
Several models were proposed to explain the underlying mechanisms that allow infants to tune speech sound categories so rapidly. Early models, following the structural–functional linguistics tradition (e.g., Jakobson, 1949, Trubetskoy, 1969), assumed that the tuning of native categories emerges only after the establishment of contrastive words in the lexicon (e.g., Werker & Pegg, 1992). More recent perceptual learning models posited various similarity metrics to account for the change from broad-based to language-specific phonetic perception, e.g., the “Perceptual Assimilation Model” (PAM; Best & McRoberts, 2003) and the “Native Language Magnet” model (NLM; Kuhl, 1993). The missing piece in supporting a perceptual learning model was an explication and demonstration of an actual learning mechanism that could account for these changes.
In 2002, Maye, Werker, and Gerken provided evidence that distributional learning might underlie the rapid tuning to the categories of the native language. Using an artificial language learning manipulation, two groups of infants aged 6–8 months were exposed to all steps of an 8-step continuum of [da] to [ta] speech syllables. One group heard more instances of the two center points in the continuum, steps 4 and 5, corresponding to a unimodal frequency distribution (as might be experienced in a language without the [da]/[ta] distinction). The other group of infants heard more instances of steps 2 and 7, corresponding to a bimodal frequency distribution (as might be experienced by infants being raised in a language with this distinction). Both groups heard equal numbers of the remaining stimuli. Following 2.3 min of familiarization, infants in the bimodal but not the unimodal group showed evidence of discriminating steps 1 from 8. Maye and Weiss (2003) replicated the distributional learning finding with two new sets of speech contrasts, and more recently, Yoshida, Pons, and Werker (2006) have replicated it with a non-native distinction. Together, these results indicate that distributional learning could be a mechanism that allows for speech sound category restructuring in the first year of life, prior to the establishment of a lexicon.
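The logic of the distributional learning manipulation can be sketched numerically. The token frequencies below are hypothetical, not the exact counts used by Maye, Werker, and Gerken (2002); the sketch simply shows that an unsupervised two-category split accounts for far more of the variance in a bimodal exposure set than in a unimodal one.

```python
# Illustrative sketch of the distributional-learning logic.
# Frequencies are hypothetical, NOT the counts from Maye et al. (2002).

continuum = list(range(1, 9))  # the 8 steps of the [da]-[ta] continuum

unimodal = [1, 2, 3, 4, 4, 3, 2, 1]  # most tokens from central steps 4 and 5
bimodal  = [1, 4, 3, 2, 2, 3, 4, 1]  # most tokens from steps 2 and 7

def tokens(freqs):
    """Expand per-step presentation frequencies into a flat list of tokens."""
    return [step for step, f in zip(continuum, freqs) for _ in range(f)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def two_category_gain(freqs):
    """Proportion of variance explained by splitting the continuum
    into two categories at its midpoint (0 = no gain, 1 = perfect)."""
    ts = tokens(freqs)
    low = [t for t in ts if t <= 4]
    high = [t for t in ts if t > 4]
    pooled = variance(ts)
    split = (len(low) * variance(low) + len(high) * variance(high)) / len(ts)
    return 1 - split / pooled

print(round(two_category_gain(unimodal), 2))  # 0.69
print(round(two_category_gain(bimodal), 2))   # 0.81
```

A learner sensitive to this difference in fit could posit two categories for the bimodal input but only one for the unimodal input, which is the behavioral pattern the infants showed.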
While laboratory-based artificial language learning studies constitute proof in principle that a particular learning mechanism is available, infants will not be able to apply this learning mechanism unless the speech they hear contains such well-defined distributional regularities. Thus, an essential step is to examine the characteristics of the input speech that infants hear, to determine whether the frequency distributions of the relevant acoustic/phonetic cues required for this learning mechanism do indeed exist in the input.
In this study we analyze durational and spectral cues of vowels from Japanese and English maternal speech. In Canadian English, the vowels differ primarily in vowel color. The acoustic correlates of vowel color are seen in the frequency of the formants; i.e. spectral differences. In Japanese there are only five vowels that differ in color, but every vowel has two forms, a long and a short form. Although likely once a dominant cue in English as well (Lehiste, 1970, Port, 1981), the historic length difference that still exists in some tense/lax vowel pairs has been superseded by a color difference in the English vowel space. A comparison of input speech for vowel pairs that differ in length in Japanese, and primarily in vowel color in English would thus allow a test of the hypothesis that there are distributional characteristics in the input that support native category learning. Because both English infants (Cooper and Aslin, 1990, Fernald, 1985) and Japanese infants (Hayashi, Tamekawa, & Kiritani, 2001) show a preference for listening to infant-directed over adult-directed speech, the strongest evidence would be provided by a study of infant-directed speech.
Vowels are more variable than consonants. Many factors influence vowel duration, including the voicing of the surrounding consonants, emphatic stress, focus, position in an utterance, and affect (for an overview see Erickson, 2000). The spectral differences that cue vowel color distinctions are also influenced by many factors, including pitch height and degree of pitch change (see Lieberman and Blumstein, 1988, Trainor and Desjardins, 2002 for a discussion) and speaking rate (Lindblom, 1963). In infant-directed speech, both vowel duration and spectral properties are affected by high pitch and highly affective modulation in English (e.g., Fernald, 1985) and Japanese (Hayashi et al., 2001). Vowel duration is much longer in infant-directed than in adult-directed speech (e.g., Andruski and Kuhl, 1996, Fernald and Simon, 1984, Kuhl et al., 1997), raising the very real possibility that the critical distributional cues to support distinctive categories in the domain of duration may be quite different in the input. Although it has been shown that the articulatory configurations used to distinguish vowel color differences in English are exaggerated in infant-directed over adult-directed speech (Andruski & Kuhl, 1996), the higher overall pitch could nonetheless lead to varying availability of the spectral cues distinguishing vowels.
Here we ask if the crucial distributional information is present in infant-directed speech to allow infants to modify initial sensitivities and establish native language vowel categories. We compare two languages, Japanese and English, on two very similar vowel pairs. In adult speech both vowel pairs are cued only by duration in Japanese, whereas in English both are cued spectrally, with duration as a less predictive, but most likely still available, secondary cue. If there are sufficient distributional cues in input speech to allow infants to tune their perceptual systems to the phonetic categories of the native language using distributional learning, then the following predictions must be upheld: (1) there should be two significantly distinct distributions of vowel length but not vowel color in the two members of each vowel pair as produced by the Japanese mothers, and (2) there should be two distinct distributions of vowel color in the two members of each vowel pair produced by the English mothers, and the distribution of vowel length should not be as distinct as it is for Japanese. Moreover, these differences should be apparent not only when the categories are already given, but the characteristics of maternal input – on their own – should yield language-specific categories. Specifically, (3) the input speech of Japanese mothers should better predict two categories for each vowel pair on the basis of vowel length than will the input speech of English mothers and (4) the input speech of English mothers should better predict two categories for each vowel pair on the basis of vowel color than will the input speech of Japanese mothers.
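Predictions (3) and (4) can be illustrated with a toy unsupervised analysis of a single cue. The duration values below are synthetic and purely illustrative, not measurements from this study: the "Japanese-like" set has a genuine short/long duration split, while the "English-like" set overlaps heavily in duration, so a two-category clustering of duration alone should recover well-separated categories only in the first case.

```python
import random
random.seed(0)

# Synthetic vowel durations in ms (illustrative values only, NOT data
# from the study). Japanese-like input: distinct short vs long vowels.
# English-like input: the vowel pair overlaps heavily in duration.
japanese = [random.gauss(100, 15) for _ in range(100)] + \
           [random.gauss(200, 25) for _ in range(100)]
english  = [random.gauss(140, 30) for _ in range(100)] + \
           [random.gauss(160, 30) for _ in range(100)]

def kmeans_1d(xs, iters=50):
    """Two-cluster k-means on one-dimensional data."""
    c1, c2 = min(xs), max(xs)
    for _ in range(iters):
        a = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        b = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(a) / len(a), sum(b) / len(b)
    return a, b

def separability(xs):
    """d'-like index: distance between the recovered cluster means
    relative to their pooled standard deviation."""
    a, b = kmeans_1d(xs)
    def sd(ys):
        m = sum(ys) / len(ys)
        return (sum((y - m) ** 2 for y in ys) / len(ys)) ** 0.5
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    pooled = ((sd(a) ** 2 + sd(b) ** 2) / 2) ** 0.5
    return abs(ma - mb) / pooled

print(separability(japanese))  # high: duration alone predicts two categories
print(separability(english))   # lower: the recovered split is less distinct
```

Under prediction (3), running such a clustering on duration measured from real maternal input should yield the higher separability for Japanese; under prediction (4), clustering on a spectral cue should reverse the pattern.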
Participants
The study was conducted at the Infant Studies Centre at the University of British Columbia, Vancouver, Canada and in the NTT Communication Science Laboratories, Keihanna, Japan. A total of 30 mothers (20 Canadian-English and 10 Japanese) and their 12-month-old infants participated in the study.
Japanese infants were able to sit through the full version of the study. The Canadian-English infants were much less compliant, and were only able to complete half of the study; thus, twice as many
Results
The sentences were classified as “Reading” (read sentences) or “Spontaneous” speech (description of the visual scene). From the Japanese recordings, a range of 52–64 tokens from each mother from the Reading, and a range of 30–65 tokens from the Spontaneous speech were analyzed. For the English recordings, a range of 25–36 tokens from each mother were analyzed from the Reading. The Spontaneous speech was analyzed from 19 mothers (one mother did not produce any nonsense words spontaneously), with
Discussion
The goal of this study was to determine if, in the face of all the variation present in infant-directed speech, there are sufficient cues in the input to support distributional learning of native language phonetic categories. In our study of Japanese and English mothers teaching new words to their infants, we found clear and consistent language-specific cues. The vowel pairs /E–ee/ and /I–ii/ each differed more in length in the speech of Japanese mothers, whereas each differed more in color (as
Acknowledgements
We thank Noshin Lalji-Samji for designing the picture books, Jeremy Biesanz for statistical advice, Eli Puterman for help performing the analyses, and Jay McClelland for feedback and comments. We are grateful to all the mothers and infants who participated in this research.
References

- Abramson, A. S., & Lisker, L. (1970). Discriminability along the voicing continuum: Cross-language tests. In...
- Andruski, J., & Kuhl, P. K. (1996). The acoustic structure of vowels in mothers’ speech to infants and children. In...
- Best, C. T., & McRoberts, G. W. (2003). Infant perception of non-native consonant contrasts that adults assimilate in different ways. Language and Speech.
- Boersma, P., & Weenink, D. (2004). Praat: doing phonetics by computer (Version 4.3.02) [Computer program]. Retrieved...
- Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed speech within the first month after birth. Child Development.
- Cues to post-vocalic voicing in mother-child speech (1984). Journal of Phonetics.
- Eimas, P. D., et al. (1971). Speech perception in infants. Science.
- Erickson, D. (2000). Simultaneous effects on vowel duration in American English: A covariance structure modeling approach. Journal of the Acoustical Society of America.
- Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behaviour and Development.
- Fernald, A. Human maternal vocalizations to infants as biologically relevant signals: An evolutionary perspective.
- Fernald, A., & Simon, T. (1984). Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology.
- Fernald, A., et al. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language.
- Hayashi, A., Tamekawa, Y., & Kiritani, S. (2001). Developmental change in auditory preferences for speech stimuli in Japanese infants. Journal of Speech, Language and Hearing Research.
- Kuhl, P. K. (1993). Innate predispositions and the effects of experience in speech perception: The native language magnet theory.
- Kuhl, P. K., et al. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science.
- Kuhl, P. K., et al. (2006). Infants show a facilitation effect for native language perception between 6 and 12 months. Developmental Science.
- Ladefoged, P. A course in phonetics.
- Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition.
- Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behaviour and Development.
☆ This manuscript was accepted under the editorship of Jacques Mehler.