Elsevier

Speech Communication

Volume 41, Issue 1, August 2003, Pages 233-243

Perception and acquisition of linguistic rhythm by infants

https://doi.org/10.1016/S0167-6393(02)00106-1

Abstract

In the present paper, we address the issue of the emergence in infancy of speech segmentation procedures that were found to be specific to rhythmic classes of languages in adulthood. These metrical procedures, which segment fluent speech into its constitutive word sequence, are crucial for the acquisition by infants of the words of their native language. We first present a prosodic bootstrapping proposal according to which the acquisition of these metrical segmentation procedures would be based on an early sensitivity to rhythm (and rhythmic classes). We then review several series of experiments that have studied infants’ ability to discriminate languages between birth and 5 months, in an attempt to specify their sensitivity to rhythm and the involvement of rhythm perception in the acquisition of these segmentation procedures. The results presented here establish infants’ sensitivity to rhythmic classes (from birth onwards). They further show an evolution of infants’ language discriminations between birth and 5 months which, though not inconsistent with our proposal, nevertheless calls for more studies on the possible involvement of rhythm in the acquisition of the metrical segmentation procedures.

Introduction

The focus of the present paper is on the issue of the segmentation of fluent speech into words, and more particularly on the development of speech segmentation procedures during infancy. For adults, speech segmentation involves language-specific phonological procedures (see below, and Cutler, McQueen and Norris, this volume) that allow for the retrieval of the acoustic sound patterns of words from fluent speech, and the connecting of these sound patterns to the lexical representations stored in the lexicon. Hence for adults, this task could be facilitated by the lexicon itself, making it a problem of word recognition as well as (or rather than) a problem of word segmentation. However, the segmentation task has to be different for the infant who, starting with no lexicon and no language-specific phonological knowledge, has to discover, rather than recognize, the words in the input and learn the speech segmentation procedure appropriate to the language to be learnt. In the following, we present several studies that have recently investigated the issue of word segmentation in early infancy. These studies have first established that speech segmentation emerges between 6 and 7.5 months of age in English-learning American infants, hence months before the onset of lexical comprehension and production (Jusczyk and Aslin, 1995). They have then set out to specify the kind of information that infants rely on to postulate word boundaries in fluent speech.

Some of these studies have established that young American infants are sensitive to different kinds of potential word-boundary markers that are language-specific: allophonic information (the fact that the distribution of allophones within words is position-dependent), phonotactic information (the fact that some but not all phonemic sequences are legal at the lexical level), and prosodic/metrical information (the fact that lexical stress is predominantly word initial in English). A sensitivity to allophonic differences was found in infants as young as 2 months of age (Hohne and Jusczyk, 1994), as attested by their ability to discriminate between pairs such as “nitrate” and “night rate”. Infants have also been found to become sensitive to phonotactic properties of their native language between 6 and 9 months of age (Friederici and Wessels, 1993; Jusczyk et al., 1993b, Jusczyk et al., 1994; Mattys et al., 2001). This is shown by the emergence of a preference for legal or frequent sequences of phonemes in their native language with respect to illegal or infrequent ones (e.g., the frequent and infrequent non-word sequences “chun” and “yush”). Last but not least, a preference for words with the predominant English strong–weak stress pattern (e.g., “porter”) over less frequent weak–strong words (e.g., “report”) also emerges between these two ages (Jusczyk et al., 1993a; Turk et al., 1995), revealing the emergence of a sensitivity to native word stress patterns.

Other studies have investigated whether infants, once sensitive to these language-specific markers, actually use them to infer word-like units. First, some studies showed that although 9-month-old infants use allophonic cues to locate familiar words in fluent speech only when they are guided by distributional cues, 10.5-month-olds can rely on allophonic cues alone (Jusczyk et al., 1999a). Moreover, a recent study has shown that providing infants with word-boundary phonotactic information helps them extract familiarized words from fluent speech (Mattys and Jusczyk, 2001).

Finally, it was shown that 9- to 10.5-month-old American infants rely on the typical stress pattern of English to group syllables into word-like units (Morgan and Saffran, 1995) and to remember familiar sequences of syllables (Echols et al., 1997). The ability to retrieve familiar words from fluent speech was also found to depend on their stress pattern. Indeed, Jusczyk et al. (1999b) found that infants begin segmenting strong–weak nouns (e.g., “doctor” and “candle”) from fluent speech at 7.5 months, but begin segmenting weak–strong nouns (e.g., “guitar” and “beret”) only at 10.5 months. These authors suggested that this processing advantage for strong–weak words might result from the specification of the predominant stress pattern of English, and the emergence by 7.5 months of a segmentation procedure that is appropriate for English and based on its metrical properties. Following this procedure, infants would place a word boundary before every strong syllable in the speech stream, allowing for the detection of strong–weak but not weak–strong words. Note that this procedure is language-specific, as it would not be appropriate for the acquisition of French, in which the metrical structure is different. Later, that is, between 7.5 and 10.5 months, infants would become sensitive to the (language-general) distributional properties of syllables in the speech stream, which would then allow them to segment weak–strong words by grouping these two syllables together. Further support for the idea that infants can use statistical regularities in the order of syllables forming a continuous sequence to build cohesive word-like units comes from a study by Saffran et al. (1996) on 8-month-old infants.
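The strong-syllable procedure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the speech stream has already been divided into syllables labelled as strong or weak; the function name, the (text, is_strong) representation and the toy syllable stream are hypothetical and are not taken from the studies cited.

```python
from typing import List, Tuple

# Hypothetical representation: (syllable text, True if the syllable is strong).
Syllable = Tuple[str, bool]


def segment_at_strong_syllables(syllables: List[Syllable]) -> List[List[str]]:
    """Posit a boundary before every strong syllable.

    Weak syllables are attached to the unit opened by the most recent
    strong syllable; a stream-initial weak syllable opens its own unit.
    """
    units: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not units:
            units.append([text])      # boundary placed before this syllable
        else:
            units[-1].append(text)    # weak syllable joins the current unit
    return units


# Toy stream for "doctor guitar candle" (strong syllables in capitals).
stream = [
    ("DOC", True), ("tor", False),
    ("gui", False), ("TAR", True),
    ("CAN", True), ("dle", False),
]
print(segment_at_strong_syllables(stream))
```

On this toy stream the sketch yields the units DOC-tor-gui, TAR and CAN-dle. The onsets of the strong–weak words “doctor” and “candle” coincide with posited boundaries, whereas “guitar” is split before its strong second syllable and its weak first syllable is attached to the preceding unit, which is the kind of asymmetry between strong–weak and weak–strong words described above.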

Hence, the studies above provide good evidence that the very onset of word learning/segmentation is under the influence of language-specific segmentation procedures, particularly a procedure based on the prosodic/metrical properties of the native language (the strong–weak segmentation procedure for English). At this point, and in spite of its importance, this finding is problematic unless we explain how infants come to specify these phonological properties of their native language. Indeed, if we are trying to explain how infants come to start segmenting fluent speech, we cannot say that they use some ‘knowledge’ of the fact that words are stressed initially in their language unless we can explain how they discovered that fact independently of segmentation: a classical bootstrapping problem. In this paper, we propose the existence of such an independent mechanism for the acquisition of the metrical segmentation procedure. This proposal was initially inspired by data from the adult speech segmentation literature and the phonetic literature, which we review in the following. Then, we propose a prosodic bootstrapping account of the acquisition of the metrical segmentation procedures, and then turn to recent data that we have started to gather in support of this proposal.

Several studies have looked at the way adults segment fluent speech. These studies indicated that speech segmentation is influenced by the metrical system of the native language, such that adults speaking French, English and Japanese use different metrical segmentation procedures. It further appeared that each procedure is based on the metrical unit characteristic of a particular language. Hence, the syllable appeared to be the unit of segmentation used by French-speaking (Mehler et al., 1981), Spanish- and Catalan-speaking (Sebastián-Gallés et al., 1992), and Portuguese-speaking (Morais et al., 1989) adults. However, the segmentation procedures used in English (Cutler et al., 1986; Cutler and Norris, 1988) and Dutch (Vroomen et al., 1996) are apparently guided by information about typical word stress patterns, which involve an alternation of strong and weak syllables. A third pattern was found for Japanese adults, who appeared to rely on the mora (Otake et al., 1993). Finally, adults’ segmentation procedures appear to be deeply embedded in their language-specific competence, and acquired at a very young age. This is attested by the fact that the metrical procedure they use is determined by their native language rather than by the language they are listening to: once they have mastered a particular language, adults rely on procedures appropriate to that language even when listening to a foreign language (Cutler et al., 1986; Otake et al., 1993). Moreover, it has also been shown that even very proficient bilinguals are dominant in one of their languages, and have developed specialized metrical segmentation procedures in only one of their languages (Cutler et al., 1992).

It has been suggested that each of the different types of metrical segmentation procedures is optimally adapted to the processing of a particular rhythmic class of languages (Cutler and Mehler, 1993; Otake et al., 1993; see also Sebastián-Gallés et al., 1992; Vroomen et al., 1996), even if minor processing differences can be found within a class. This proposal is based on a three-way classification of languages according to their predominant rhythmic structure (Abercrombie, 1967; Pike, 1945). By this classification, most Germanic languages (e.g., English, Dutch, German) have a rhythm based on the stress unit (i.e., the foot), most Romance languages (e.g., French, Italian, Spanish) have a rhythm based on the syllable, while languages such as Japanese have a mora-based rhythm. Note that these rhythmic units are hierarchically related at the phonological level, as feet are made up of syllables that are made up of morae.

The view that the rhythmic properties of a language shape adults’ processing procedures has influenced views of how infants develop efficient procedures for segmenting native language utterances. Mehler et al. (1996) have proposed that the emergence of metrical segmentation procedures rests on an early sensitivity to prosody, and more precisely on linguistic rhythm specified at a non-segmented level. Note that this proposal converges with other recent proposals regarding infants’ early sensitivity to prosodic information, and the importance of prosody in early language acquisition (Fernald and Kuhl, 1987; Jusczyk, 1997; Jusczyk et al., 1993a, Jusczyk et al., 1993b; Jusczyk and Thompson, 1978; Karzon and Nicholas, 1989; Kuhl and Miller, 1982; Morse, 1972; Nazzi et al., 1998a, Nazzi et al., 1998b; Spring and Dale, 1977).

More specifically, our proposal is that infants’ sensitivity to rhythm at the utterance/supra-segmental level will allow them to specify the type of rhythm of their native language, and develop the procedure appropriate to its segmentation. Hence, the emergence of the metrical segmentation procedures is not based on infants learning that words in their native language have a specific stress pattern (which requires knowledge of at least some words), but on infants specifying the type of rhythm of their native language. This acquisition scenario works because there is a relation between rhythm at the sentence level and meter at the lexical level, so that the metrical units segmented by the infants will more or less correspond to words. We further hypothesize that infants will discover the correspondence between metrical units and lexical units at some point between the ages of 7.5 months (onset of speech segmentation) and 10 to 12 months (acquisition of the first words).

To validate the above (prosodic bootstrapping) proposal, three points need to be established. First, we need to demonstrate that there are acoustic correlates to the rhythmic classes. Such evidence has been reported in several recent investigations (Arvaniti, 1994; Den Os, 1988; Fant et al., 1991; Nazzi, 1997; Shafer et al., 1999; Ramus et al., 1999). Second, we need to determine that infants are sensitive to rhythmic class information, and, third, that this sensitivity plays a role in the acquisition of the segmentation procedures. The rest of this paper reviews research on infants’ sensitivity to, and acquisition of, rhythmic class information. This research has used infants’ ability to discriminate between languages to explore how they perceive linguistic rhythm and acquire the rhythmic properties of their native language.
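As an illustration of what an acoustic correlate of rhythmic class might look like, the sketch below computes interval-based measures of the kind proposed by Ramus et al. (1999): the percentage of utterance duration that is vocalic (%V) and the standard deviations of consonantal and vocalic interval durations (ΔC, ΔV). The text above does not name these measures, so their use here is an assumption; the function name, the ("V"/"C", duration) input format and the choice of the population standard deviation are likewise hypothetical, and the durations are invented for illustration rather than measured.

```python
from statistics import pstdev
from typing import Dict, List, Tuple

# Hypothetical input format: ("V" or "C", interval duration in seconds).
Interval = Tuple[str, float]


def rhythm_measures(intervals: List[Interval]) -> Dict[str, float]:
    """Compute %V, delta-C and delta-V over one segmented utterance.

    Assumes at least one vocalic and one consonantal interval.
    Population standard deviation is an assumption of this sketch.
    """
    vocalic = [d for kind, d in intervals if kind == "V"]
    consonantal = [d for kind, d in intervals if kind == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_V": 100.0 * sum(vocalic) / total,
        "delta_C": pstdev(consonantal),
        "delta_V": pstdev(vocalic),
    }


# Invented durations, for illustration only (not data from any study).
toy_utterance = [("C", 0.08), ("V", 0.10), ("C", 0.20), ("V", 0.06), ("C", 0.05), ("V", 0.12)]
print(rhythm_measures(toy_utterance))
```

Measures of this kind operate on the alternation of vocalic and consonantal stretches and require no knowledge of word boundaries, which is why they are compatible with a sensitivity to rhythm at the utterance level.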

Language discrimination at birth

Studies exploring newborns’ ability to discriminate between languages first contrasted their to-be-native language to a foreign language. Some of the experiments in Mehler et al. (1988) explored the ability of French newborns to discriminate utterances drawn from languages of different rhythmic classes. Those infants discriminated utterances in French from others in Russian, but did not seem to discriminate English from Italian. This led to the conclusion that early language discrimination was

Acquisition of the rhythmic properties of the native language

Following the pattern of language discrimination found in their newborn study, Nazzi et al. (1998a) proposed a more precise prosodic bootstrapping account of the acquisition of the metrical procedures found to be used by adults. According to their proposal, called the rhythmic class acquisition hypothesis, infants’ initial sensitivity to rhythmic classes would allow them to specify the common rhythmic properties of their native rhythmic class, from which they would develop its associated

Conclusion and future directions

In this paper, we have presented a developmental proposal for the acquisition of the metrical speech segmentation procedures used in adulthood. Given the bootstrapping issue that these procedures, which operate at the lexical level, need to be learnt at a level independent of the word, we have proposed that these procedures could develop from an early sensitivity to and acquisition of the rhythmic properties of speech at the utterance/suprasegmental level. Based on data from the adult and

Acknowledgments

We would like to thank the co-authors of the different papers cited here, for their crucial contributions to our theoretical and experimental work. We would also like to thank the babies that participated in the studies and their families, and, of course, Bert Schouten for organizing and inviting us to the Nature of Speech Perception workshop where both authors presented their work. FR was supported by the Délégation Générale pour l’Armement and a Marie Curie fellowship from the European

References (55)

  • P.W. Jusczyk et al. Infants’ sensitivity to phonotactic patterns in the native language. Journal of Memory and Language (1994)
  • S.L. Mattys et al. Phonotactic cues for segmentation of fluent speech by infants. Cognition (2001)
  • J. Mehler et al. The syllable’s role in speech segmentation. Journal of Verbal Learning and Verbal Behavior (1981)
  • C. Moon et al. Two-day-olds prefer their native language. Infant Behavior and Development (1993)
  • P.A. Morse. The discrimination of speech and nonspeech stimuli in early infancy. Journal of Experimental Child Psychology (1972)
  • T. Nazzi et al. Discrimination of pitch contours by neonates. Infant Behavior and Development (1998)
  • T. Nazzi et al. Language discrimination by English learning 5-month-olds: effects of rhythm and familiarity. Journal of Memory and Language (2000)
  • T. Otake et al. Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language (1993)
  • N. Sebastián-Gallés et al. Contrasting syllabic effects in Catalan and Spanish. Journal of Memory and Language (1992)
  • J.F. Werker et al. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development (1984)
  • D. Abercrombie. Elements of General Phonetics (1967)
  • A. Christophe et al. Is Dutch native English? Linguistic analysis by 2-month-olds. Developmental Science (1998)
  • A. Cutler et al. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance (1988)
  • G. Dehaene-Lambertz et al. Faster orientation latencies toward native language in two-month-old infants. Language and Speech (1998)
  • E. Den Os. Rhythm and Tempo of Dutch and Italian: A Contrastive Study (1988)
  • T. Dutoit et al. The MBROLA project: towards a set of high-quality speech synthesizers free of use for non-commercial purposes
  • A. Fernald et al. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development (1987)