Experimental study of affect bursts

https://doi.org/10.1016/S0167-6393(02)00078-X

Abstract

The study described here investigates the perceived emotional content of “affect bursts” for German. Affect bursts are defined as short emotional non-speech expressions. This study shows that affect bursts, presented without context, can convey a clearly identifiable emotional meaning. The influence of the segmental structure on emotion recognition, as opposed to prosody and voice quality, is investigated. Agreement between transcribers is used as an experimental criterion for distinguishing between reflexive raw affect bursts and conventionalised affect emblems. A detailed account of 28 affect burst classes is given, including perceived emotion and recognition rate in listening and reading perception tests as well as a phonetic transcription of segmental structure, voice quality and intonation.

Zusammenfassung

Die hier vorgestellte Studie untersucht den wahrgenommenen emotionalen Gehalt von “Affect Bursts” für das Deutsche. Affect Bursts werden definiert als kurze, emotionale, nichtsprachliche Ausdrücke. Diese Untersuchung zeigt, dass Affect Bursts, ohne Kontext präsentiert, eine klar identifizierbare emotionale Bedeutung vermitteln können. Der Einfluss der segmentellen Struktur auf die Emotionserkennung, gegenüber Prosodie und Stimmqualität, wird untersucht. Übereinstimmung zwischen Transkribierern wird als ein experimentelles Kriterium zur Unterscheidung zwischen reflexiven Rohen Affect Bursts und konventionalisierten Affekt-Emblemen verwendet. Eine detaillierte Beschreibung von 28 Affect Burst Klassen wird gegeben, die wahrgenommene Emotion und Erkennungsrate in Hör- und Lese-Perzeptionstests beinhaltet, sowie eine phonetische Transkription von segmenteller Struktur, Stimmqualität und Intonation.

Résumé

L’étude présentée ici s’interroge sur le contenu émotionnel perçu des “Affect Bursts” pour l’Allemand. Les Affect Bursts sont définis comme expressions courtes, émotionnelles et non-verbales. Cette étude démontre que les Affect Bursts, présentés sans contexte, peuvent transmettre un sens émotionnel clairement identifiable. L’influence de la structure ségmentale sur la reconnaissance des émotions, vis-à-vis de la prosodie et du timbre, est étudiée. Le degré d’accord entre les transcripteurs est utilisé comme critère expérimental pour distinguer entre les Affect Bursts Crus, réflexifs, et les Emblèmes Affectives conventionalisées. Un compte rendu détaillé de 28 classes d’Affect Bursts est présenté, comprenant l’émotion perçue et le taux de reconnaissance dans des tests de perception orale et écrite, ainsi qu’une transcription phonétique de la structure ségmentale, du timbre et de l’intonation.

Introduction

Studying emotional expression in speech is inherently difficult. Problematic issues range from the description of emotion itself (Cowie, 2000; Cowie and Cornelius, 2003), via the collection of emotional speech material (Campbell, 2000), to the appropriate evaluation in perception tests (Cauldwell, 2000). Even the delimitation of the domain under study with respect to neighbouring topics such as expression of attitudes (Wichmann, 2000) and even linguistic prosodic structure (Scherer et al., 1984) is difficult and fuzzy.

This study investigates a phenomenon at the heart of many of these difficulties: the so-called “affect bursts”. After introducing the concept, I will summarise some of the difficult aspects of the research domain in general and how these apply to the phenomenon under study here. The particularities of affect bursts compared to the broader field of speech and emotion are highlighted, and research questions are formulated before the experimental study is described.

The concept of affect bursts was introduced by Scherer (1994). He defines them as “very brief, discrete, nonverbal expressions of affect in both face and voice as triggered by clearly identifiable events” (p. 170). Coined in the context of the psychological literature on emotion expression, the term affect burst overlaps strongly with what might be called “affective interjections” in linguistics. However, the limits of the domains of affective interjections and affect bursts differ. On the one hand, a verbal interjection expressing an emotion (“Heaven!”) would not be considered an affect burst due to its verbal nature. On the other hand, a non-phonemic affect burst like laughter or a rapid intake of breath would probably not be considered an interjection.

The question of the sign status of affect bursts is discussed by Scherer (1994) from the point of view of his push–pull distinction (Scherer, 1988). Push effects are physiological factors (like pain) leading to an expression; pull effects are social rules and expectancies representing culturally shared “targets” for appropriate expressions in a given situation. While both types of effects are always present, one of them may prevail in a given situation. In this line of ideas, Scherer (1994) proposes to make a distinction between ‘raw affect bursts’ on the “push” end of that continuum, and ‘affect emblems’ on the “pull” end of the continuum. Consequently, raw affect bursts are raw, reflexive vocalisations that are expected to be barely conventionalised, thus relatively universal, and show strong inter-individual differences. Affect emblems, on the other hand, are conventionalised symbols, i.e. strongly culture-dependent, showing comparatively few and small individual differences. As raw affect bursts and affect emblems are seen as extreme points on a burst-emblem continuum, all sorts of mixtures are expected to exist.

The term affect burst is used as a general term referring to the entire continuum between raw affect bursts and affect emblems. Fig. 1 illustrates the terms used.

The scientific study of emotions in speech is facing major problems, both methodological and conceptual. In a review of the literature, Scherer (1986) comes to the conclusion that due to these problems, “there has been neither continuity nor cumulativeness in the area of the vocal communication of emotion” (p. 143). A number of problems that seem relevant to the present study are summarised below.

The first challenge in studying the expression of emotional states in speech is the adequate description of these states themselves. Many studies that simply define the states under study using plain emotion words, such as “anger”, “fear”, “sadness”, etc., arrive at contradictory results (Scherer, 1986) due to the ambiguity of the terms employed. For example, for anger, the emotional properties, and consequently the vocal realisations, of “hot anger” and “cold anger” are very different.

In order for research results to be meaningful and interpretable, it thus seems necessary to attempt a more precise description of the emotional states studied (Cowie, 2000). In studies using acted speech, a reasonable way of describing the emotional connotations encoded seems to be the use of frame stories describing the imagined situational context in which an utterance is spoken by the actors (Leinonen et al., 1997; Schröder, 1999).

Another approach, capturing some basic properties of perceived speaker emotion in perception studies, uses emotion dimensions (Cowie et al., 2001; Dietz and Lang, 1999; Pereira, 2000; Schröder et al., 2001). Three dimensions are commonly considered most relevant: arousal (or activation), i.e., the degree of physiological arousal and readiness to take some action; valence (or evaluation), in terms of positive or negative evaluation of some object or event; and control (or power), i.e., how dominant or submissive the speaker is. As a perception-oriented tool, emotion dimension ratings provide a quantifiable description of the listener’s perception of the speaker’s emotional state (see a discussion of speaker-centred and listener-centred descriptions of emotions in (Cowie, 2000)).

Emotion is not an easy phenomenon to grasp, and as a consequence, a clear delimitation of the research domain against neighbouring fields is difficult. While attempts are sometimes made to distinguish more physiological emotions from more cognitive attitudes (Wichmann, 2000), that distinction seems to be gradual rather than strictly categorical. Similarly, the distinction between emotion expression through prosody on the one hand and the linguistic functions of prosody on the other hand cannot be clearly drawn, because of interaction phenomena: linguistically defined intonation contours have been shown to convey emotional meaning depending on sentence type (Scherer et al., 1984). This illustrates the difficulty of drawing a clear boundary between the linguistic and paralinguistic aspects of vocalisations, a theme on which we will see a variation when discussing the question of the word status of affect emblems.

The impossibility of obtaining spontaneous emotional speech in a controlled way is a problem for every study of emotional speech. Some studies give priority to spontaneity, accepting the increased difficulty in describing the emotional states expressed (e.g., Campbell, 2000; Douglas-Cowie et al., 2000). Most studies, however, use acted emotion, an approach justified by Banse and Scherer (1996) with the argument that even in real life, people need to enact emotional expression with a certain amount of voluntary control. The approach is admittedly not without problems: depending on the methodology and the talent of the actors, acted emotional expression can be clearly distinguishable from spontaneously occurring emotion (Schröder et al., 1998). Therefore, a number of studies used experts to evaluate the quality of the acted material in pre-selection procedures (Banse and Scherer, 1996; Leinonen et al., 1997; Schröder, 1999), using in the actual study only what the experts rated as “successful displays”.

Emotion is expressed through multiple channels. These include at least facial expression (Ekman, 1982), verbal content (Scherer and Ceschi, 2000), and the voice, comprising gradual (Banse and Scherer, 1996) and categorical (Mozziconacci, 1998) prosodic parameters, voice quality (Gobl and Nı́ Chasaide, 2000), and articulation precision (Kienast et al., 1999). It is probable that emotion has an effect on other expressive behaviour as well, such as gestures, wording, speaking disfluency, etc.

Many of these channels through which emotion is expressed have been shown to be able to convey emotion on their own, including facial expression, prosody, and voice quality. Other channels may be influenced by the emotion without actually containing sufficient information to allow emotion recognition. This might be the case for articulation precision, gestures and others.

While it is important to establish the contribution of a given channel by studying it in isolation, one question that would merit much more attention is that of the interaction among these channels when they co-occur in a multi-modal and/or situated context. It has been shown that voice quality, F0 range and intonation contour type do not interact when producing a perceived emotional message (Ladd et al., 1985), but that intonation contour type interacts with verbal content (Scherer et al., 1984); that a given utterance conveys a different emotional message when presented in isolation and with situational context (Cauldwell, 2000); and that a coherent facial and vocal display of a given emotion is perceived as more natural than a facial emotion display accompanied by a neutral voice (Stallo, 2000). However, there seem to be few studies investigating the simultaneous display of conflicting messages as can be observed, e.g., in irony.

Affect bursts, although theoretically described in detail, do not seem to have been extensively studied experimentally. Existing descriptions of interjections come from a linguistic background (Ehlich, 1986; Scherer, 1994; Zerling, 1995). However, these studies give definitions and classifications that seem to be based mainly on the authors’ intuitions, and do not give any indications of whether and to what extent the corresponding vocalisations are actually perceived as carrying identifiable emotional meaning.

The problems in studying affect bursts are slightly different from those typically encountered in speech and emotion studies (see Section 1.2). Still, there are useful methods that can be applied in this context. After a discussion of the specificity of the topic under study, the research questions that naturally arise are stated, experimental criteria are formulated, and the methodology adopted is outlined.

Typically studied channels for emotion expression (see Section 1.2.4) are global in the sense that they can co-occur with spoken language, accompanying a spoken utterance. This is different for affect bursts, which by definition are short, delimited events. That different nature of affect bursts is a reason to question whether observations made, e.g., in the domain of speech prosody are generalisable to affect burst prosody, and vice versa.

In affect bursts, the segmental structure itself is expected to carry emotional meaning. In most studies of emotion and speech, some constant verbal content serves as a “carrier” for emotional prosody and voice quality, and the latter are varied. Semantically neutral sentences (e.g., Paeschke and Sendlmeier, 2000) or pseudo-sentences consisting of logatomes (Banse and Scherer, 1996) are often used as carriers, but also single-word utterances such as a name (Leinonen et al., 1997).

Unlike these studies, an investigation of affect bursts will need to consider the segmental structure as an essential part of the affect burst. Along the lines of thought of multi-channel emotion expression (see Section 1.2.4), it then makes sense to ask about the relative contributions of the segmental structure and of prosody and voice quality to emotion recognition.

A further specificity of studying affect bursts stems from the distinction between raw affect bursts and affect emblems proposed by Scherer (see Section 1.1). Experimental criteria will need to be formulated for characterising affect bursts as raw bursts or emblems, while taking into account the non-categorical nature of the distinction.

From the preceding considerations, the following questions arise:

  • (I) Can affect bursts, produced by actors, convey the intended emotional meaning when presented in isolation and audio only?

  • (II) What is the contribution of the segmental structure of affect bursts to emotion recognition?

  • (III) Can a distinction between raw affect bursts and affect emblems be proposed based on experimental criteria?

Scherer’s definition of affect bursts needs to be adapted for the purpose of this experimental study. On the one hand, the facial–vocal interaction and synchronisation that he stresses can be left out, because this study is only concerned with the vocal aspect of affect bursts. On the other hand, the intrinsically fuzzy boundaries of the concept need to be stated as explicitly as possible in order for the definition to be useable as a selection criterion. Therefore, the following working definition was used:

Affect bursts are short, emotional non-speech expressions, comprising both clear non-speech sounds (e.g. laughter) and interjections with a phonemic structure (e.g. “Wow!”), but excluding “verbal” interjections that can occur as a different part of speech (like “Heaven!”, “No!”, etc.).

This definition is meant to delimit the concept of affect bursts as illustrated in Fig. 1. Again, it must be clear that the boundaries are fuzzy, and that a clear delimitation is feasible neither at the physiological end nor at the verbal end. In this respect, it seems difficult to draw the line around the concept of affect bursts in a more clear-cut way than the boundary between paralinguistic and linguistic phenomena in general (see Section 1.2.2).

While Scherer’s distinction between raw affect bursts and affect emblems, summarised above, is formulated in terms of production, it seems to make sense to propose an additional, perception-based criterion that can be used in this experimental study. As a conventionalised symbol, an emblem should correspond to a reference pattern in a listener’s mind. Similar to a lexical entry (a word), that mental pattern comprises an expected form and a meaning. When a given vocalisation of that emblem is matched against the pattern, the expected phonemic form may influence the perception through top-down processing (McQueen and Cutler, 1997), leading to the perception of a more standardised phonemic form. For a raw affect burst, on the other hand, no such reference pattern should exist. Consequently, bottom-up processes would play a more important role in the perception of the phonetic form, leading to more variability in the perceived form, especially for non-expert transcribers.

The following criterion is therefore proposed:

  • (1) Affect emblems, when transcribed, are expected to show less variability between transcribers than raw affect bursts.

In addition, in the mental representation, a meaning (in this case: an emotion) is associated with the phonemic form of the emblem, leading to a second criterion:

  • (2) The emotion recognition from a phonemic transcription should be quite accurate for affect emblems.


However, no prediction can be made about the recognition accuracy of raw affect bursts, which may or may not rely on the segmental structure for conveying emotional meaning.
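Criterion (1) can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it measures variability between transcribers as the mean pairwise normalised edit distance between orthographic transcriptions. This metric, the function names, and the example transcriptions are all hypothetical and are not taken from the study, which may have quantified agreement differently.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def transcription_variability(transcriptions):
    """Mean pairwise edit distance, normalised by the longer string.

    Returns 0.0 when all transcribers agree; larger values mean more
    variability. This is an assumed metric, not the one used in the study.
    """
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b)) or 1
        total += edit_distance(a, b) / longest
    return total / len(pairs)


# Hypothetical transcriptions, invented purely for illustration.
emblem_like = ["igitt", "igitt", "igitt", "igit"]
raw_like = ["hhh", "ach", "uah", "hah"]

emblem_score = transcription_variability(emblem_like)
raw_score = transcription_variability(raw_like)
```

Under this assumed metric, a vocalisation whose transcriptions score close to zero would sit nearer the emblem end of the continuum, and one with a high score nearer the raw end. Since the distinction is not categorical, any threshold between the two would be arbitrary.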

A list of “German” affect bursts was compiled. On the basis of this list, 10 emotion categories were established. Acted realisations of affect bursts intended to express these emotion categories were recorded. The intended connotation of the emotion words was specified through frame stories (see Section 1.2.1). A pre-selection of the n most successful examples was made based on expert ratings (see Section 1.2.3).

Question (I), the question of recognisability, was addressed in a forced-choice perception test where affect bursts are presented in isolation and audio only. In addition, the perceived emotional meaning was assessed using scales representing the three emotion dimensions typically studied (see Section 1.2.1). That information was used to establish the degree of perceived emotional similarity between the different affect bursts. Confusions were interpreted in the light of that information.
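One possible way of deriving emotional similarity from dimension ratings is sketched below. It assumes that each stimulus receives numeric ratings on the three dimensions (arousal, valence, control), that ratings are averaged across listeners, and that similarity is taken as Euclidean distance between the resulting mean profiles. The distance measure, the rating scale, and the function names are assumptions made for illustration; the source does not specify them here.

```python
import math
from statistics import mean


def mean_profile(ratings):
    """Average (arousal, valence, control) ratings across listeners.

    `ratings` is a list of 3-tuples, one per listener.
    """
    return tuple(mean(dim) for dim in zip(*ratings))


def dimension_distance(profile_a, profile_b):
    """Euclidean distance between two mean profiles (smaller = more similar)."""
    return math.dist(profile_a, profile_b)


# Hypothetical ratings on an arbitrary numeric scale, invented for illustration.
ratings_a = [(1, 2, 3), (3, 4, 5)]
ratings_b = [(2, 3, 4), (2, 3, 4)]

profile_a = mean_profile(ratings_a)
profile_b = mean_profile(ratings_b)
distance = dimension_distance(profile_a, profile_b)
```

A confusion between two categories would then be less surprising when the distance between their mean profiles is small.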

Affect bursts were transcribed phonetically and grouped into classes based on segmental phonetic similarity. A detailed account of the perceptual properties of these classes was given.

Question (II), the question of the contribution of segmental structure, was addressed in a written perception test based on orthographic transcriptions.

Question (III), the distinction between raw affect bursts and affect emblems, was addressed by applying the criteria developed under Section 1.3.3 to the orthographic transcriptions.

Collection of a list of affect bursts

While the concept of “raw” affect bursts claims a low degree of conventionality, and thus a relative language-independence, the same is not true for affect emblems that are conventionalised and thus most likely culture- and language-dependent. As any affect burst is considered to be located somewhere on the raw burst-emblem continuum (see Fig. 1), any list of affect bursts should, in a first step, be compiled for a given language.

Therefore, a list of “German” affect bursts was assembled from

Recognition rates

In the listening test, which addressed Question (I), the fundamental question of whether affect bursts can convey emotional meaning, the overall mean recognition rate is 81.1%. The mean recognition rates for the 10 emotions are shown in Table 2.
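As a rough illustration of how such rates can be computed from forced-choice responses, the sketch below counts, for each intended emotion, the share of responses in which the listener chose that emotion. It assumes that the overall rate is the unweighted mean of the per-category rates; the source does not state whether the 81.1% figure was averaged over categories or over all responses. The function name and the example data are hypothetical.

```python
from collections import defaultdict


def recognition_rates(responses):
    """Compute per-category and overall recognition rates.

    `responses` is an iterable of (intended, chosen) emotion labels.
    The overall rate is the unweighted mean over categories (an assumption).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for intended, chosen in responses:
        totals[intended] += 1
        if intended == chosen:
            hits[intended] += 1
    per_category = {cat: hits[cat] / totals[cat] for cat in totals}
    overall = sum(per_category.values()) / len(per_category)
    return per_category, overall


# Hypothetical responses, invented for illustration only.
example = [
    ("admiration", "admiration"),
    ("admiration", "admiration"),
    ("anger", "anger"),
    ("anger", "threat"),
]
per_category, overall = recognition_rates(example)
```

Keeping the (intended, chosen) pairs instead of only the hit counts also makes it possible to tabulate which categories are confused with which.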

Admiration, disgust and relief are recognised from affect bursts with more than 90% accuracy. The least recognised categories are threat and anger with just over 60% accuracy. In the cases where identification was not as intended, it is interesting to look

Recognition

The recognition accuracy for 10 emotions expressed through affect bursts, presented without context and audio only, is very high (81% on average). For many affect bursts, there is very little ambiguity (accuracy > 90%). This suggests that affect bursts are a highly effective means of expressing emotion. The recognition rates are considerably higher than those found for the expression of emotion through prosody and voice quality: On pseudo-sentences consisting of logatomes, Banse and Scherer (1996)

Conclusion

The high overall recognition rate of 81% indicates that affect bursts, when presented audio only and without context, seem to be an effective means of expressing emotion. Moreover, 10 different emotion categories can be distinguished quite reliably. The grouping of individual vocalisations into affect burst classes, on the grounds of phonetic similarity, showed that confusions between emotion categories tend to be due to individual, ambiguous affect burst classes. For all emotions except anger,

Suggestions for future research

The study of affect bursts is at its very beginning. The current study may have opened a pathway, showing experimentally that affect bursts are highly recognisable and can convey a number of different emotions. From here, many questions can be investigated, including:

  • Which emotions are typically expressed through affect bursts?

  • In which contexts (situations, speaking styles, type of utterance, location with respect to speech utterances) do affect bursts naturally occur?


These questions could be

Acknowledgements

Thanks to Ralf Benzmüller for sharing his observations of affect bursts. Thanks to Roddy Cowie, Jürgen Trouvain and three anonymous reviewers for very valuable feedback and suggestions.

References (37)

  • E. André et al. The automated design of believable dialogs for animated presentation teams

  • R. Banse et al. Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. (1996)

  • N. Campbell. Databases of emotional speech

  • R.T. Cauldwell. Where did the anger go? The role of context in interpreting emotion in speech

  • R. Cowie. Describing the emotional states expressed in speech

  • Cowie, R., Cornelius, R.R., 2003. Describing the emotional states that are expressed in speech. Speech Communication 40...

  • R. Cowie et al. What a neural net needs to know about emotion words

  • R. Cowie et al. Emotion recognition in human–computer interaction. IEEE Signal Process. Mag. (2001)

  • Dietz, R.B., Lang, A., 1999. Æffective agents: effects of agent affect on arousal, attention, liking and learning. In:...

  • E. Douglas-Cowie et al. A new emotion database: considerations, sources and scope

  • G. Drosdowski. Duden Herkunftswörterbuch, Etymologie der deutschen Sprache [Duden etymological dictionary of German] (1989)

  • Dutoit, T., Pagel, V., Pierret, N., Bataille, F., van der Vrecken, O., 1996. The MBROLA project: Towards a set of high...

  • K. Ehlich. Interjektionen (1986)

  • P. Ekman. Emotion in the Human Face (1982)

  • C. Gobl et al. Testing affective correlates of voice quality through analysis and resynthesis

  • T. Johnstone et al. Acoustic profiles in prototypical vocal expressions of emotion

  • M. Kienast et al. Articulatory reduction in emotional speech

  • D.R. Ladd et al. Evidence for the independent function of intonation contour type, voice quality, and F0 range in signalling speaker affect. J. Acoust. Soc. Amer. (1985)