Intersensory redundancy promotes infant detection of prosody in infant-directed speech

https://doi.org/10.1016/j.jecp.2019.02.008

Abstract

Prosody, or the intonation contours of speech, conveys emotion and intention to the listener and provides infants with an early basis for detecting meaning in speech. Infant-directed speech (IDS) is characterized by exaggerated prosody, slower tempo, and elongated pauses, all amodal properties detectable across the face and voice. Although speech is an audiovisual event, it has been studied primarily as a unimodal auditory stream without the synchronized dynamic face of the speaker. According to the intersensory redundancy hypothesis, redundancy across the senses facilitates perceptual learning of amodal information, including prosody. We predicted that young infants who are still learning to discriminate and categorize prosodic information would detect prosodic changes better in the presence of intersensory redundancy (i.e., synchronous audiovisual speech) than in its absence (i.e., unimodal auditory or asynchronous audiovisual speech). To test this hypothesis, 72 4-month-old infants were habituated to recordings of women reciting passages in IDS with prosody conveying either approval or prohibition and then were tested with recordings of a novel passage with either a change or no change in prosody. Infants who received bimodal synchronous stimulation exhibited significant visual recovery to the novel passage with a change in prosody, but not to a novel passage with no change in prosody. Infants in the unimodal auditory and bimodal asynchronous conditions did not exhibit visual recovery in either condition. Results support the hypothesis that intersensory redundancy facilitates detection and abstraction of invariant prosody across changes in linguistic content and likely serves as an early foundation for the detection of meaning in fluent speech.

Introduction

To break into language learning, infants are faced with the challenge of parsing what they hear from a continuous speech stream into discriminable units (i.e., words). Infant-directed speech (IDS), also known as motherese, provides the naïve perceiver with valuable information in the form of frequent and elongated pauses, slower tempo, pitch changes (i.e., higher pitch and wider pitch range), and more prosodic repetition (Fernald, 1984, Fernald, 1989, Ladd et al., 1985, Newport et al., 1977). These exaggerated prosodic features (or intonation contours) characterizing IDS provide opportunities for infants to begin to parse the speech stream and perceive meaning in speech (Morgan, 1996, Nazzi et al., 2000, Soderstrom, 2007, Spinelli et al., 2017). In IDS, emotional expressions are also exaggerated, making it easier to accurately detect affective information in the face (Juslin and Laukka, 2001, Ladd et al., 1985). Furthermore, caregivers use IDS to elicit infant attention, communicate meaning, and maintain social interactions (Bryant and Barrett, 2007, Fernald, 1984, Spinelli et al., 2017, Trainor et al., 2000). Decades of research indicate that infants benefit significantly from adults’ use of IDS. These studies demonstrate not only that infants prefer to listen to IDS over adult-directed speech (ADS; e.g., Cooper and Aslin, 1990, Fernald, 1985) but also that the unique prosodic patterns found in IDS promote better outcomes during infancy and childhood, including attention, language learning, and discrimination of emotions or affective information (Saint-Georges et al., 2013, Santarcangelo and Dyer, 1988, Spinelli et al., 2017, Werker and McLeod, 1989).
The affective intent of speech is linked to specific acoustic profiles (e.g., happiness is characterized by a slower rate of speech and wider expansion of pitch range; anger is characterized by a short sharp tone and narrow pitch range; Fernald, 1993, Juslin and Laukka, 2003, Sakkalou and Gattis, 2012, Scherer, 1986, Scherer, 2003). The coordination of affective and acoustic information is exaggerated in IDS. Prosodies conveying approval and praise (e.g., “Good baby!”) are characterized by exaggerated rise–fall pitch contours and sustained volume intensity, whereas prohibition and warning prosodies (e.g., “No, don’t touch!”) are characterized by low pitch, high intensity, and short staccato contours (Fernald and Kuhl, 1987, Fernald, 1989). Adults (both with and without experience with infants) are able to identify the communicative intent of a speaker using only prosodic information in bids of approval and prohibition (Fernald, 1989). These results highlight the important role discrimination of prosodic characteristics plays in conveying communicative intent and affect to the listener.

Given the importance of perceiving prosody for learning language, as well as the consistent use of IDS within and across cultures by caregivers and non-caregiving adults (Fernald et al., 1989), the current study examined the conditions that promote infant detection of changes in prosody. Prosody and affect discrimination have typically been studied as vocal expressions (e.g., Moore et al., 1997, Soderstrom, 2007, Spence and Moore, 2003, Trainor et al., 2000). However, speech is a multisensory event, providing coordinated and synchronized changes across the face, voice, and gesture for amodal properties specifying prosodic information (Bahrick and Lickliter, 2002, Bahrick and Lickliter, 2014, Gibson, 1969). Amodal information is information that can be conveyed across more than one sense modality, including timing (such as rhythm, tempo, and duration) and intensity patterns that specify affect and communicative intent in audiovisual speech. Similarly, emotion has been characterized as a multicomponent process across feeling, physiology, and expression, with expression reflected in the face, voice, and gesture (Johnstone and Scherer, 2000, Scherer, 2003). Thus, prosody signifying approval versus prohibition is available not only as a vocal signal but also through correlated changes in the movements of the face (e.g., rhythm, tempo, duration, and intensity changes) as well as through the rising and falling pitch of the voice synchronized with rising and falling movements of the cheeks, forehead, and eyebrows.

The intersensory redundancy hypothesis posits that information presented in temporal synchrony and redundantly across sensory modalities (e.g., auditory, visual) facilitates attention and perceptual learning about amodal information, particularly in young infants (Bahrick and Lickliter, 2000, Bahrick and Lickliter, 2002, Bahrick and Lickliter, 2014, Bahrick et al., 2002). Prosodic patterns characterizing approval versus prohibition are conveyed by synchronized changes in the tempo, rhythm, and duration of speech, amodal properties detectable across both the face and voice. Research has demonstrated that young infants are skilled at detecting these amodal temporal properties. For example, infants detect changes in the tempo (Bahrick et al., 2002) and rhythm (Bahrick & Lickliter, 2000) of an audiovisual event more easily and earlier in development when the audible and visible information is presented together in synchrony (e.g., a toy hammer tapping a particular rhythm) rather than when it is presented in just one sense modality alone (auditory or visual) or out of synchrony. Thus, we expected that the face–voice synchrony provided by audiovisual speech would facilitate the early detection of prosodic changes.

The characteristic prosody found in IDS has several important contributions to infant attention and learning (Colombo, Frick, Ryther, Coldren, & Mitchell, 1995). Researchers have posited that the function of IDS is threefold: to regulate infant attention, to highlight the structure of language in adult speech for language-learning children, and to help infants interpret incoming affective information from others (Cooper et al., 1997, Fernald, 1989, Grieser and Kuhl, 1988, Singh et al., 2002). In support of these claims, research examining the benefits of the prosody found in IDS has shown that it (a) aids in the promotion or maintenance of infant attention to faces, voices, and eye gaze (Kaplan et al., 1995, Saint-Georges et al., 2013, Senju and Csibra, 2008, Spinelli et al., 2017) as well as to language (Fernald and Mazzie, 1991, Ramírez-Esparza et al., 2014, Werker and McLeod, 1989); (b) highlights the syntactic or grammatical structure of language (Fernald and Mazzie, 1991, Ramírez-Esparza et al., 2014, Werker and McLeod, 1989) and the lexical meaning of individual words (Golinkoff and Alioto, 1995, Ma et al., 2011, Song et al., 2010), consequently leading to improved language outcomes (Ramírez-Esparza et al., 2014); and (c) helps infants to interpret affective information and discriminate between emotions conveyed in faces and voices (Fernald, 1989).

Some have argued that it is the emotion or emotional expressiveness of IDS prosody that sets it apart from ADS (Singh et al., 2002, Trainor et al., 2000). Trainor et al. (2000) examined acoustic samples of both IDS and ADS and contended that reported differences between IDS and ADS emerge as a result of the differences in emotional expression conveyed in each speech register, with more widespread and varied emotion conveyed in IDS and more inhibited expression of emotion conveyed in ADS. Singh et al. (2002) also suggested that the greater affect in IDS as compared with ADS contributes to infant preferences for IDS over ADS. In their study, they held affective information constant while presenting unimodal IDS and ADS samples and found that 6-month-olds do not show a significant preference for either speech register. These findings highlight the unique and important role that affective information in speech plays in infant attention to IDS. They also point to the need for further research examining how infants detect changes in prosody that conveys affective information such as that conveying approval and prohibition.

Even young infants are keen perceivers of affect and prosody. Infants show early preferences for prosodic contours that contain positive affect, such as approval and comfort, over those that contain negative affect, such as prohibition (Fernald, 1993, Papoušek et al., 1990). By 4 months of age, infants show preferences for IDS conveying approval over IDS conveying disapproval (Papoušek et al., 1990). Infants also show more positive affect themselves (e.g., smiling) for IDS conveying approval when compared with IDS conveying prohibition (Fernald, 1993). This was the case across 5-month-olds learning multiple languages, suggesting a cross-cultural preference for positive affect in IDS. In two separate publications, Spence and colleagues (Moore et al., 1997, Spence and Moore, 2003) examined 6-month-olds’ ability to discriminate and categorize affective prosody. Using an infant-controlled familiarization–test paradigm, infants were familiarized with a set of IDS utterances in prosodies specifying either approval or comfort and then were presented with a novel instance of either the familiar prosody (control group; e.g., if familiarized with comfort utterances, they received a novel comfort utterance) or the novel prosody (experimental group; e.g., if familiarized with comfort utterances, they received an approval utterance). In one set of studies, Moore et al. (1997) found that 6-month-olds from the experimental group could form categories of affective prosody when they used low-pass filtered utterances, in which the linguistic content of the utterances had been masked but the prosodic features of the utterances, such as pitch, rhythm, and intensity, were preserved and attenuated. In a follow-up study, Spence and Moore (2003) showed that 6-month-olds in the experimental group, but not 4-month-olds, could discriminate and categorize approval and comfort utterances even when utterances were unfiltered, containing the full range of frequencies that naturally occur in IDS.
These studies show that by 5 or 6 months of age, infants detect differences in affective prosody, including approval and prohibition. However, one commonality across the studies reviewed above is that infants were presented with prosody in IDS while viewing either no visual information or static nonaffective visual information such as a checkerboard pattern. Thus, these studies leave open the question of whether at a younger age infants could detect changes in prosody in audiovisual speech if the speech samples were accompanied by the dynamically moving face of the speaker, providing intersensory redundancy, as is typical in the natural environment.

Multimodal presentation has been shown to promote infant detection of affect in faces and voices. Caron, Caron, and MacLean (1988) found that 5-month-olds, but not 4-month-olds, could discriminate the emotional expressions of happiness and sadness when presented in a multimodal context. A study by Walker-Andrews and Grolnick (1983) suggests that 5-month-old infants can reliably discriminate between happy and sad affective utterances but appear to do so only in conditions where facial expressions accompany the vocal expressions. Walker-Andrews and Lennon (1991) also found evidence that 5-month-olds can discriminate changes in the vocal expressions of happy and angry affects. Infants detected a change in vocal affect, but only when the soundtrack was accompanied by a face and not when it was accompanied by a checkerboard. These studies raise the question of what exactly it is about multimodal presentations that facilitates infant detection of affect and prosody.

Research generated by the intersensory redundancy hypothesis indicates that it is the redundancy across synchronous facial and vocal information that facilitates detection of affect. By comparing detection of affect in the presence of intersensory redundancy (synchronous audiovisual speech) versus the absence of intersensory redundancy (asynchronous audiovisual speech; unimodal auditory speech; unimodal visual speech), Flom and Bahrick (2007) demonstrated the critical role of intersensory redundancy in bootstrapping infant detection of affect in audiovisual speech. At 4 months of age infants discriminated affective information (e.g., happy, sad, angry) in synchronous audiovisual speech, at 5 months they discriminated the affect in auditory speech, and only by 7 months did they discriminate the affect in unimodal visual speech. Affect was not discriminated in asynchronous audiovisual speech, demonstrating that temporal synchrony between the audio and visual information was necessary for infant discrimination. Thus, similar to findings from studies of infant detection of rhythm (Bahrick & Lickliter, 2000) and tempo (Bahrick et al., 2002), intersensory redundancy provided by audiovisual synchrony is necessary for promoting discrimination early in infancy. We therefore predicted that the same would hold for infant detection of prosodic information at 4 months of age.

The current study was designed to assess whether intersensory redundancy facilitates infants’ ability to abstract prosodic information specifying approval versus prohibition. We examined whether infants detected a change in prosody, from approval to prohibition or from prohibition to approval, in conditions where intersensory redundancy (i.e., temporal synchrony) was present versus absent. Intersensory redundancy is present during synchronous audiovisual speech but is absent in asynchronous audiovisual speech and unimodal auditory speech. Using an infant-controlled habituation paradigm, we asked under which of these three conditions 4-month-olds could detect a change in prosody. If intersensory redundancy bootstraps early detection of prosodic changes, we predicted that infants would detect these changes in the presence, but not in the absence, of intersensory redundancy.

Furthermore, in each condition we assessed whether infants could generalize prosodic information to a new speech passage, similar to the design used by Spence and Moore (2003). If so, this would provide data to suggest that infants could abstract invariant information specifying prosodic information across changes in speech passages. Infants were randomly assigned to condition (bimodal synchronous, unimodal auditory, or bimodal asynchronous) and prosody change test type (change or no change). They were habituated to a passage conveying either approval or prohibition and then were tested with a novel passage conveying either a new (change) prosody or the familiar (no change) prosody. We predicted that 4-month-olds would detect the invariant prosodic information across multiple passages and discriminate a change in prosody when given bimodal synchronous stimulation but not when given unimodal auditory or bimodal asynchronous stimulation. Furthermore, the asynchronous audiovisual condition provides the same amount and type of stimulation as the synchronous audiovisual condition and, thus, serves as a control for a number of possible alternative interpretations of differences between the two conditions, including differential arousal effects of the two prosodies. Thus, any differences between the synchronous and asynchronous conditions could be attributed to intersensory redundancy (i.e., audiovisual temporal synchrony).

Section snippets

Participants

A total of 72 4-month-old infants (M = 125.61 days, SD = 3.96) participated in the current study. Of these, 38 were male and 34 were female. All infants were delivered full-term (>37 gestational weeks) without complications and had Apgar scores of 9 or greater. Regarding race/ethnicity, 59 infants were Hispanic, 9 were non-Hispanic White, and 4 were non-Hispanic Black. Families were either English–Spanish bilingual or monolingual English speakers. An additional 18 infants were tested but were

Planned analyses

To determine whether infants discriminated the prosody change, we calculated visual recovery scores by subtracting mean visual fixation time on post-habituation trials from mean visual fixation time on test trials in each condition. Visual recovery scores significantly greater than zero indicate discrimination. Our primary hypothesis—that at 4 months of age infants would require intersensory redundancy to detect prosody change—was tested in two ways: first, by looking at differences among
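
The visual recovery score described in this section can be illustrated with a minimal Python sketch. The function name `visual_recovery`, the number of trials per phase, and the looking times are hypothetical and chosen only for illustration; they are not data or procedures taken from the study.

```python
from statistics import mean


def visual_recovery(test_looks, posthab_looks):
    """Return mean looking time on test trials minus mean looking time
    on post-habituation trials (both in seconds)."""
    return mean(test_looks) - mean(posthab_looks)


# Hypothetical looking times in seconds for a single infant.
# The number of trials in each phase is an assumption.
posthab_looks = [4.0, 6.0]
test_looks = [9.0, 11.0]

score = visual_recovery(test_looks, posthab_looks)
print(score)  # a positive score means longer looking on test trials
```

A score near zero would indicate no increase in looking on the test trials. Whether a group's mean score is significantly greater than zero requires a statistical test, which this sketch does not perform.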

Discussion

The current study assessed whether intersensory redundancy could facilitate 4-month-old infants’ ability to abstract prosodic information specifying approval versus prohibition in IDS. We predicted that intersensory redundancy provided by naturalistic, synchronous audiovisual speech would facilitate detection of prosody. According to the intersensory redundancy hypothesis, information that is presented redundantly and synchronously across sensory modalities facilitates detection of amodal

Acknowledgments

This work was supported by National Institutes of Health (NIH) Grants K02-HD064943 and RO1-HD053776 awarded to the first author and by an American Psychological Association (APA) PRIME fellowship and NIH/NIGMS (National Institute of General Medical Sciences) Grant R25 GM061347 awarded to the fourth author. We thank Mariana Vaillant-Molina, Ana Bravo, Melissa Argumosa, and Laura Batista for assistance in data collection.

References (57)

  • M. Soderstrom

    Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants

    Developmental Review

    (2007)
  • M. Spinelli et al.

    Does prosody make the difference? A meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes

    Developmental Review

    (2017)
  • A.S. Walker-Andrews et al.

    Discrimination of vocal expressions by young infants

    Infant Behavior and Development

    (1983)
  • A.S. Walker-Andrews et al.

    Infants’ discrimination of vocal expressions: Contributions of auditory and visual information

    Infant Behavior and Development

    (1991)
  • L.E. Bahrick et al.

    Intersensory redundancy facilitates discrimination of tempo in 3-month-old infants

    Developmental Psychobiology

    (2002)
  • L.E. Bahrick et al.

    Intersensory redundancy guides attentional selectivity and perceptual learning in infancy

    Developmental Psychology

    (2000)
  • L.E. Bahrick et al.

    Intersensory redundancy guides early perceptual and cognitive development

  • L.E. Bahrick et al.

    Learning to attend selectively: The dual role of intersensory redundancy

    Current Directions in Psychological Science

    (2014)
  • L.E. Bahrick et al.

    Infant discrimination of faces in naturalistic events: Actions are more salient than faces

    Developmental Psychology

    (2008)
  • G.A. Bryant et al.

    Recognizing intentions in infant-directed speech: Evidence for universals

    Psychological Science

    (2007)
  • A.J. Caron et al.

    Infant discrimination of naturalistic emotional expressions: The role of face and voice

    Child Development

    (1988)
  • J. Colombo et al.

    Infants’ detection of analogs of “motherese” in noise

    Merrill-Palmer Quarterly

    (1995)
  • R.P. Cooper et al.

    Preference for infant-directed speech in the first month after birth

    Child Development

    (1990)
  • A. Fernald

    The perceptual and affective salience of mothers’ speech to infants

  • A. Fernald

    Intonation and communicative intent in mothers’ speech to infants: Is the melody the message?

    Child Development

    (1989)
  • A. Fernald

    Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages

    Child Development

    (1993)
  • A. Fernald et al.

    Prosody and focus in speech to infants and adults

    Developmental Psychology

    (1991)
  • A. Fernald et al.

    A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants

    Journal of Child Language

    (1989)