Elsevier

Journal of Phonetics

Volume 31, Issues 3–4, July–October 2003, Pages 599-611

Meter and speech

https://doi.org/10.1016/j.wocn.2003.08.001

Abstract

Speech is easily produced with regular periodic patterns—as if spoken to a metronome. If we ask what it is that is periodically spaced, the answer is a perceptual ‘beat’ that occurs near the onset of vowels (especially stressed ones). Surprisingly, when periodically produced speech is studied it exhibits attractors at harmonic fractions (especially halves and thirds) of the basic periodicity. It is shown that the Haken–Kelso–Bunz model provides conceptual tools to account for the frequency histogram of acoustic beats in the speech. Why might there be attractors at periodically spaced phase angles? It is hypothesized that there are neural oscillations producing a pulse on every cycle, and that these pulses act as attractors for the beats at the onsets of syllables. Presumably these periodic time locations are generated by the same physiological mechanism as the periodic attentional pulse studied for some years by Jones (Psychol. Rev. 96 (1989) 459; Psychol. Rev. 106 (1999) 119). We propose that neurocognitive oscillators produce periodic pulses that apparently do several things: (1) they attract perceptual attention; (2) they influence the motor system (e.g., when producing speech) by biasing motor timing so that perceptually salient events line up in time close to the neurocognitive pulses. The consequent pattern of integer-ratio timings in music and speech is called meter. Speakers can control the degree to which they allow these metrical vector fields to constrain their timing.

Introduction

As Peter Jusczyk observed, children learn prosodic structure very early in the language acquisition process (Jusczyk, 1997; Mehler et al., 1988). Other data show that prebabbling infants notice deviations from regular timing of perceptual centers or P-centers (which are temporally close to vowel onsets, Fowler, Smith, & Tassinary, 1986). These observations suggest that children may be able to both produce and perceive simple periodic patterns in speech well before producing their first words. Frequently speakers produce speech in a periodic way, sometimes by coupling their speech production to another speaker or to a metronomic pattern, e.g., when chanting or declaiming. This essay reviews some relevant phenomena and proposes general theoretical mechanisms to account for these behaviors. Since the mechanisms are very simple, we might expect them to appear fairly early in the development of speech.

We will review some experimental observations on the kind of event that most often recurs periodically and some properties of periodic speech, then sketch some basic ideas about how periodic patterns in speech might arise.

One of the most important discoveries about periodicity in speech has been known for some time, although its importance for global aspects of speech timing may have been underestimated. George Allen (1972, 1975) showed that if English speakers are asked to align a finger tap with a word, they will line it up close to the onset of the vowel in the stressed syllable of the word. This implies that there is a perceptually salient acoustic event at these time points in speech. Subsequent research on ‘perceptual centers’ or ‘P-centers’ was able to refine the notion of the ‘beat’ associated with prominent syllables by showing that large initial clusters tend to move the P-center temporally to the left (into the consonant cluster, e.g., in skate vs. ate) while final clusters can move the P-center somewhat to the right (into the vowel, as in baa vs. banks; Morton, Marcus, & Frankish, 1976; Marcus, 1981; Pompino-Marschall, 1989). These perturbations, however, are small (5–15 ms) relative to the repetition cycle (typically about 500 ms). Apparently, the beat location can be approximated automatically (Scott, 1993) by measuring the amount of energy in lower frequencies (between 200 and 800 Hz), smoothing sufficiently and then looking for large energy onsets, which are prominently encoded in the auditory nerve (Delgutte & Kiang, 1984). When speakers attempt to produce a series of regular events with their speech, these observations about beats and pulses imply they will regularize the spacing of vowel onsets, especially stressed vowels (at least for English). This clarifies the question of what it is that is periodic, and thus what ‘periodically produced speech’ might mean. And since other aspects of the signal play only a fairly small role, these findings encourage the use of automatic measurement methods that simulate the beat-locating aspects of auditory performance.
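The automatic beat-location recipe described above (band-limit to roughly 200–800 Hz, smooth the energy, look for large energy onsets) can be sketched in a few lines. This is only an illustrative sketch, not the procedure of Scott (1993): the filter type (a single band-pass biquad), the 20 ms smoothing constant, the half-of-peak threshold, the 100 ms minimum gap and all function names are assumptions chosen for the example, and the input is a synthetic train of tone bursts rather than speech.

```python
import math

def bandpass(x, fs, f_lo=200.0, f_hi=800.0):
    """Second-order band-pass (unit gain at the centre frequency); an assumed filter choice."""
    f0 = math.sqrt(f_lo * f_hi)              # geometric centre of the band
    q = f0 / (f_hi - f_lo)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                   # the b1 coefficient is zero
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def energy_envelope(x, fs, tau=0.020):
    """Squared signal smoothed by a one-pole low-pass with time constant tau (seconds)."""
    coeff = math.exp(-1.0 / (fs * tau))
    env, out = 0.0, []
    for v in x:
        env = coeff * env + (1.0 - coeff) * v * v
        out.append(env)
    return out

def beat_onsets(x, fs, rel_threshold=0.5, min_gap=0.100):
    """Times (s) where the smoothed band energy rises through a fraction of its peak."""
    env = energy_envelope(bandpass(x, fs), fs)
    thr = rel_threshold * max(env)
    onsets = []
    for i in range(1, len(env)):
        rising = env[i - 1] < thr <= env[i]
        if rising and (not onsets or i / fs - onsets[-1] >= min_gap):
            onsets.append(i / fs)
    return onsets

def tone_bursts(fs=8000, duration=2.0, starts=(0.25, 0.75, 1.25, 1.75),
                burst=0.2, freq=400.0):
    """Synthetic stand-in for vowel onsets: 400 Hz bursts repeating every 500 ms."""
    x = []
    for n in range(int(fs * duration)):
        t = n / fs
        v = 0.0
        for s in starts:
            if s <= t < s + burst:
                v = math.sin(2.0 * math.pi * freq * (t - s))
        x.append(v)
    return x

onsets = beat_onsets(tone_bursts(), 8000)
print([round(t, 3) for t in onsets])
```

Because the energy is smoothed before thresholding, each detected time lags the true burst onset by an amount on the order of the smoothing constant; for measuring relative phase within a repetition cycle, a constant lag of this kind largely cancels out.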

Aside from the P-center work, there has been other research on simple periodic speech phenomena. The case of subjects cyclically repeating a short phrase has been shown to lead to the harmonic timing effect. A number of studies have shown that when speakers repeat a short piece of text many times, they exhibit a strong preference for locating prominent (e.g., stressed) syllable onsets at simple harmonic fractions of the repetition cycle (Port, Tajima, & Cummins, 1996; Cummins & Port, 1998; Tajima, 1998). For example, Cummins and Port (1998) presented English-speaking subjects with a two-tone metronome pattern. Tone A marked the beginning of each cycle and alternated with tone B that was randomly located at phase angles between 0.20 and 0.80 of the A–A cycle. The subjects’ task was to repeat a phrase like ‘Dig for a duck’ so that the first stressed word lines up with tone A and the final stress lines up with tone B. The location of onset of the final syllable, duck, was measured as a particular phase angle between 0 and 1 (the beginning of the next cycle). The frequency histogram of performance for all speakers is shown in Fig. 1. Although the target phase angles for the onset of the final syllable were distributed uniformly over the interval from 0.20 to 0.80 of the repetition cycle, the speakers did not reproduce anything resembling the flat input distribution but were strongly biased to locate their onsets near just 3 locations in the cycle: 1/3 for all the early phase angle targets, 1/2 for targets near the middle of the cycle and 2/3 for most target phases later than about 0.57. This bias is called the harmonic timing effect because locations like 1/2 and 1/3 would be the phase-zero pulses for (phase-locked) harmonic frequencies of the fundamental.
Similar results have now been observed in a number of experiments and the phenomenon can easily be demonstrated to oneself (by repeating a 4–6 syllable phrase and noticing where the phrase-final stress occurs when the pattern is stably repeated).
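The phase measure used in this task can be made concrete with a small sketch: the onset of the final stressed syllable is expressed as a fraction of the interval between successive cycle onsets, and that fraction is then compared with the harmonic fractions 1/3, 1/2 and 2/3. The function names and the example times are hypothetical, and the nearest-fraction rule is only an illustration: it places the boundary between 1/2 and 2/3 at about 0.58, whereas the speakers' own boundary was reported near 0.57.

```python
from fractions import Fraction

def relative_phase(cycle_start, next_cycle_start, onset):
    """Onset time expressed as a phase between 0 and 1 of the repetition cycle."""
    return (onset - cycle_start) / (next_cycle_start - cycle_start)

def nearest_harmonic_fraction(phase, max_harmonic=3):
    """Closest k/n with 2 <= n <= max_harmonic (for 3: 1/3, 1/2 or 2/3)."""
    candidates = sorted({Fraction(k, n)
                         for n in range(2, max_harmonic + 1)
                         for k in range(1, n)})
    return min(candidates, key=lambda f: abs(float(f) - phase))

# Hypothetical measurement: cycle onsets at 0.0 s and 1.2 s, final stress at 0.42 s.
phase = relative_phase(0.0, 1.2, 0.42)
print(round(phase, 2), nearest_harmonic_fraction(phase))
```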

Although this experiment employed only English speakers, one might expect that other languages should at least have a bias to pay special attention to vowel onsets and to favor low-frequency harmonics whenever nested meters are constructed. There are some data directly comparing English and Japanese in a similar task (Tajima & Port, 2003). The speakers of the two languages adjusted to perturbing influences on timing in language-specific ways, but the data clearly showed that speakers of Japanese were paying attention to the vowel onsets in this task just as much as the English speakers were.

Notice that the speech results demonstrate not merely regularity at the frequency of the metronome, but also at higher frequencies. There is periodicity on two time scales: one at the repetition cycle rate and another either 2 or 3 times faster than the metronome but phase-locked to it (so the phase-zero pulse is actually two simultaneous pulses). What kind of cognitive mechanism could account for these particular constraints on speech timing is the primary issue we are concerned with here.
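The idea of two phase-locked time scales can be illustrated by listing the phase-zero pulses of an oscillator at the repetition rate alongside those of an oscillator running 2 or 3 times faster. This is a minimal sketch under the assumption of ideal, perfectly locked oscillators; the function name is hypothetical, and phases are given in units of the repetition cycle.

```python
from fractions import Fraction

def pulse_phases(harmonic, n_cycles=1):
    """Pulse times, in repetition cycles, of an oscillator pulsing `harmonic` times per cycle."""
    return [Fraction(k, harmonic) for k in range(harmonic * n_cycles)]

fundamental = pulse_phases(1, n_cycles=2)   # pulses at 0 and 1
triple = pulse_phases(3, n_cycles=2)        # pulses at 0, 1/3, 2/3, 1, 4/3, 5/3

# Pulses of the faster oscillator that coincide with the cycle onset,
# and those that fall inside the cycle (the candidate attractor locations).
shared = sorted(set(fundamental) & set(triple))
interior = [p for p in triple if p not in shared]
print(shared, interior)
```

Every pulse of the slower oscillator coincides with a pulse of the faster one, which is the sense in which the phase-zero pulse is two simultaneous pulses; the remaining pulses of the faster oscillator fall at 1/3 and 2/3 (or at 1/2 for the double-rate case).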

These experiments show that when there is periodicity at one level, there may sometimes be periodicity at a harmonic of that frequency. This feature of motor temporal behavior is not restricted to speech, but can be observed in simple limb movements as well.


Non-speech periodic behavior and the HKB model

It may be appropriate to compare the harmonic timing phenomenon to oscillatory finger motion as illustrated in Kelso's finger-wagging task. Kelso (1984) had subjects oscillate one finger on each hand to the left and right. When the phase relationship of the fingers is such that they simultaneously move toward and away from the midline (described as 0 phase), performance is easiest. Most phase relationships between the fingers are very unstable although, at a slow enough tempo, the fingers can

Meter and periodicity

Although linguists and students of poetics often describe meter in terms of serial patterns of strong and weak syllables, the most intuitively natural notion of meter seems to be that of music where it is based on periodic structures in continuous time.

Conclusion

The issue explored in this essay is that, in many situations, speakers will exhibit periodic location of salient events like vowel onsets. For English, this is especially true of syllables with pitch accent or stress. It is proposed that this periodic behavior reflects periodic attractors in relative phase that are generated by one or more internal oscillators producing pulses that are sometimes coupled to external periodicities. These oscillations can be described as neurocognitive because

Acknowledgements

Thanks to Adam Leary for help with figures and to Sarah Hawkins and Noel Nguyen for helpful comments on earlier drafts.

References (41)

  • J. Bertoncini

    Infants' perception of speech units: Primary representational capacities

  • B. Delgutte et al.

    Speech coding in the auditory nerve: I. Vowel-like sounds

    Journal of the Acoustical Society of America

    (1984)
  • D. Eck

    Finding downbeats with a relaxation oscillator

    Psychological Research

    (2002)
  • C.A. Fowler et al.

    Perception of syllable timing by prebabbling infants

    Journal of the Acoustical Society of America

    (1986)
  • L. Gerken

    Prosodic structure in young children's language production

    Language

    (1996)
  • H. Haken et al.

    A theoretical model of phase transitions in human hand movements

    Biological Cybernetics

    (1985)
  • M.R. Jones et al.

    Dynamic attending and responses to time

    Psychological Review

    (1989)
  • P. Jusczyk

    The discovery of spoken language

    (1997)
  • J.A.S. Kelso

    Phase transitions and critical behavior in human bimanual coordination

    American Journal of Physiology

    (1984)
  • S. Kelso

    Dynamic patterns: The self-organization of brain and behavior

    (1995)