Introduction

Social interactions among individuals are based primarily on the correct encoding and decoding of emotions. The literature on emotion processing has traditionally favored the study of facial emotion recognition (Ekman & Friesen, 1976; Ekman, Friesen, & Hager, 2002). However, individuals use multiple cues from different communicative channels (e.g., facial expression, body posture, and speech) to succeed in these processes (Scherer & Scherer, 2011). Moreover, difficulties in integrating multichannel emotional information can lead to impairments in empathic abilities and social cognition (Minzenberg, Poole, & Vinogradov, 2006; Preißler, Dziobek, Ritter, Heekeren, & Roepke, 2010). For this reason, researchers have recently started investigating what is now defined as emotional prosody.

Depending on how they are pronounced, words and utterances can carry different emotional meanings, over and above their semantic content (Banse & Scherer, 1996). Respiration, phonation, and articulation vary with the emotional state of the speaker (Scherer, 1989). Thus, the following acoustic parameters are crucial in the encoding and decoding of different emotions: the level, range, and contour of fundamental frequency (f0, in hertz); voice intensity (in decibels); and temporal phenomena (tempo, duration, pauses). Two reviews of numerous studies concerning the associations between the vocal expression of emotions and specific acoustic profiles (Banse & Scherer, 1996; Juslin & Laukka, 2003) showed that different combinations of speech rate, pitch level and variation, and intensity convey different emotions. These extensive reviews allow for defining prototypical profiles for each emotion. For instance, the acoustic cues that lead to identifying anger are a fast speech rate, a high voice intensity level and variability, and a high pitch level and variability. Happiness shares most of these characteristics, but it is defined by a medium-high voice intensity and lower intensity variability. On the basis of these acoustic features of emotions, several studies have investigated the behavioral (Hawk, van Kleef, Fischer, & van der Schalk, 2009; Paulmann, Pell, & Kotz, 2008) and neural mechanisms (Jessen & Kotz, 2011; Peelen, Atkinson, & Vuilleumier, 2010) underlying prosodic emotion processing. Furthermore, researchers have started to shed light on prosodic emotion processing in clinical populations, considering both brain-injured patients (Dellacherie, Hasboun, Baulac, Belin, & Samson, 2011) and patients with psychiatric conditions (Jones et al., 2011; Minzenberg et al., 2006).

To conduct such research, one needs stimuli with controlled characteristics. Indeed, the development of a set of vocal stimuli has to take several issues into account. First, the acoustic features of emotional prosody are shaped by the linguistic properties of a specific language (Pell, 2001). Furthermore, lexico-semantic cues pertaining to emotions need to be controlled and isolated in order to understand the specific role of prosody in the expression of emotional content. In this regard, some studies have tried to reduce the semantic information of speech through filtering procedures that maintain only the suprasegmental features of vocal stimuli (e.g., Kotz et al., 2003). Other studies have addressed this issue by developing and validating sets of pseudowords or pseudoutterances (Bach et al., 2008; Pell, 2002). Pseudowords are defined as legal nonwords that conform to the orthographic and phonological patterns of a given language. Pseudoutterances are composed of pseudowords embedded in a legal utterance. This method substantially reduces the lexical and semantic properties of the stimuli while preserving language-like qualities as well as phonetic-segmental and suprasegmental features of speech comparable to those of real language.

Researchers have developed sets of stimuli with such characteristics for most Indo-European languages. Burkhardt et al. (2005) developed stimuli in German, Castro and Lima (2010) developed a Portuguese set of pseudoutterances, and Juslin and Laukka (2001) developed a series of Swedish emotional sentences. One of the most impressive databases is the multilanguage (Spanish, German, and English) set of pseudoutterances created by Pell and colleagues (Pell et al., 2009). For non-Indo-European languages, sets of emotional pseudowords/pseudoutterances are available in Arabic and Mandarin Chinese (Liu & Pell, 2012). These studies follow similar procedures, generally consisting of pseudoword/pseudoutterance construction, recording, validation by a pool of participants, and acoustic analyses. However, some of these contributions raise two important methodological issues concerning the sets of stimuli.

The first issue concerns the generation of properly legal pseudowords. Stimuli should follow the rules of the language (language-likeness). For example, they should contain a plausible suffix (e.g., “i” or “o” for Italian pseudowords) or a pronounceable sequence of letters (e.g., “wczo” would look like a word to a Polish speaker but not to an Italian one). As was highlighted by Keuleers and Brysbaert (2010), the predominant procedure for the creation of pseudowords consists of changing one or two letters in a list of legal words and of evaluating the language-likeness on the basis of the researchers’ judgment alone. With the exception of Liu and Pell (2012), who asked participants to rate the language-likeness of a list of pseudoutterances, the elaboration of the stimuli has not been a fully controlled process. The second issue concerns ensuring the neutrality of the pseudowords before pronunciation. Although pseudowords do not exist as words, they could still carry valence. This is especially the case when pseudowords are created by changing one or two letters of an existing word: because of the similarity to the existing word, a pseudoword could carry a similar valence. This issue can be addressed only by means of valence ratings of the pseudowords.

The main objective of this study was to develop a set of Italian pseudowords with angry, happy, and neutral intonations. To our knowledge, validated prosodic stimuli (pseudowords or pseudoutterances) are not yet available in Italian. The few studies addressing the emotional processing of prosodic stimuli have generally used prosodic subtests of emotion recognition batteries (such as the Comprehensive Affect Testing System; Schaffer, Wisniewski, Dahdah, & Froming, 2009), in which actors produce sentences with different emotional tones (Ariatti, Benuzzi, & Nichelli, 2008; Castagna et al., 2013). Although most of the above-mentioned sets of stimuli covered a wide range of basic emotions, we chose to focus on angry, happy, and neutral emotional prosody. Our choice was motivated by two main considerations. First, as in research on facial emotion processing (e.g., Hagenhoff et al., 2013; Kirsh & Mounts, 2007), several neuroimaging and electrophysiological studies on emotional prosody have examined only one positive and one negative emotion (e.g., Kotz & Paulmann, 2007; Kotz et al., 2003; Mitchell, Elliott, Barry, Cruttenden, & Woodruff, 2003; Schirmer, Kotz, & Friederici, 2002). Second, anger and happiness have frequently been used to operationalize negative and positive emotions, particularly when compared with each other or with a neutral condition (e.g., Kotz et al., 2003).

In developing such a set of stimuli, we aimed to address the above-mentioned methodological issues. First, to devise a set of pseudowords formally controlled for their language-likeness, we used software that controls for the subsyllabic structure and transition frequencies of the specific language (Wuggy; Keuleers & Brysbaert, 2010). Although the pseudoword-creation procedure substantially reduces the lexical–semantic meaning of the stimuli, it does not guarantee a neutral valence. We thus conducted a pretest to select the stimuli that were neutral in their written version. Then, a man and a woman recorded the most neutral stimuli in three different emotional tones. Independent judges rated the audio stimuli for their emotional intensity, allowing us to select the most prototypical audio stimuli in each emotional tone. Another sample rated the valence and arousal of each selected audio stimulus. Finally, we examined with statistical analyses whether each emotional set of stimuli was characterized by a specific acoustic profile.

Pseudoword stimulus generation and neutral valence

Method

We used the Wuggy software (Keuleers & Brysbaert, 2010) to generate pseudowords. By means of a dedicated algorithm, and starting from a given list of legal syllabified words, this software generates pseudowords that obey a specific language’s phonotactic constraints and transition frequencies. This procedure guarantees the generation of legal pseudowords that conform to the orthographic and phonological patterns of a given language. Using the Italian submodule of Wuggy, we obtained 150 trisyllabic pseudowords. We selected 100 of these pseudowords on the basis of their ease of pronunciation for Italians. We then aimed to select a set of stimuli with the most neutral content in terms of valence. For this purpose, 30 Italian native-speaker university students (15 men, 15 women; M age = 22.33, SD = 2.72, range 19–27 years) evaluated the 100 pseudowords in their written form on 9-point scales that ranged from 1 (very negative) to 9 (very positive). We created four versions of the evaluation sheet with different random presentation orders of the pseudowords.

Results

We performed a series of one-sample t tests (test value = 5) on the mean valence of each of the 100 written pseudowords and chose the 40 most neutral pseudowords (see Table 3 in the Appendix for the list). The selected set had a mean valence of 4.95 (SD = 0.79), which did not differ from the midpoint of the scale, t(29) = −0.35, p = .731.
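For readers who wish to script this selection step, a minimal sketch is given below. It assumes that the valence ratings are stored in a participants × pseudowords table and uses the absolute deviation of each item's mean from the scale midpoint as the neutrality criterion; the file and column names are hypothetical, and the code is an illustration rather than the exact procedure used here.

```python
# Minimal sketch of the written-valence pretest (hypothetical file/column names).
# Each row is a participant; each column is a pseudoword rated on the 1-9 valence scale.
import pandas as pd
from scipy import stats

ratings = pd.read_csv("written_valence_ratings.csv")
MIDPOINT = 5

rows = []
for pw in ratings.columns:
    t, p = stats.ttest_1samp(ratings[pw], MIDPOINT)   # one-sample t test against the midpoint
    m = ratings[pw].mean()
    rows.append({"pseudoword": pw, "mean": m, "abs_dev": abs(m - MIDPOINT), "t": t, "p": p})

results = pd.DataFrame(rows)
most_neutral = results.nsmallest(40, "abs_dev")        # keep the 40 items closest to the midpoint
print(most_neutral[["pseudoword", "mean", "t", "p"]])
```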

Audio tokens, emotion perception, and prosodic properties

With this study, we had four main objectives. First, we aimed to examine the relationship between emotion perception and the acoustic properties of the audio versions of pseudowords recorded with three different emotional prosodies or tones (i.e., happiness, anger, and neutral). Moreover, on the basis of the independent emotion ratings, we intended to select the best audio token for each of the three emotional tones of the 30 pseudowords. We also wished to examine the perceived emotion, valence, and arousal ratings of each emotion tone set. Finally, and most importantly, we aimed to establish the acoustic profiles of the three emotional categories of the selected audio stimuli.

Method

Two professional actors (a man and a woman) recorded the 40 selected pseudowords. The male actor had a baritone-like voice (mean f0 of all recordings = 144 Hz), and the female actor had a mezzo-soprano-like voice (mean f0 of all recordings = 215 Hz). Each of them recorded 20 randomly selected pseudowords in three different emotional tones (neutral, happy, angry), with five recordings of each (300 tokens per actor). The stimuli were recorded in a sound-insulated booth at the Media Laboratory of the University of Milano-Bicocca, using Pro Tools version 10.3.7 software. A high-quality Neumann TLM 102 microphone was connected to an Apple Macintosh Mac Pro 3.1 QuadCore Intel Xeon computer with a Focusrite Saffire Pro 24 DSP audio interface and an SPL Track One voice channel. Digitization was performed at a 44.1-kHz sampling rate and a 24-bit resolution. The peak amplitudes of all pseudowords were normalized to mitigate gross differences in perceived loudness. Pseudowords conveying the three different emotional meanings were recorded in separate blocks. Two researchers monitored the recording procedure, giving the actors cues about the target emotions. Three judges then selected the 30 audio pseudowords with the best pronunciation (15 for the male actor, 15 for the female actor). Then, for each pseudoword in each of the three emotional tones (neutral, happy, and angry), they retained the three best audio tokens in terms of acoustic quality and expressed emotion.
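For those who want to apply a comparable peak normalization to their own recordings, a minimal sketch is shown below; the file names are hypothetical, and the normalization in the present study was performed within the recording environment rather than with this script.

```python
# Minimal sketch of peak-amplitude normalization (hypothetical file names); the study's own
# normalization was done in the recording/editing environment, not with this script.
import numpy as np
import soundfile as sf

def normalize_peak(in_path, out_path, target_peak=0.9):
    data, sr = sf.read(in_path)              # samples in [-1, 1] and the sampling rate
    peak = np.max(np.abs(data))
    if peak > 0:
        data = data * (target_peak / peak)   # rescale so the loudest sample reaches target_peak
    sf.write(out_path, data, sr)

normalize_peak("andori_angry_take1.wav", "andori_angry_take1_norm.wav")
```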

After this preselection, we gathered the intensity ratings for the three different emotions and the acoustic properties of each of the 270 audio tokens (30 pseudowords × 3 emotional tones × 3 recordings). Forty-six native Italian-speaking university students (23 men, 23 women; M age = 22.38, SD = 3.06, range 19–31 years) rated the intensities of the 270 audio tokens on the happy, neutral, and angry dimensions. Half of the participants first rated all 270 tokens on the happy dimension, then on the neutral dimension, and then on the angry dimension; the other half started with the angry dimension, followed by the neutral and then the happy dimension. Within each dimension, the tokens were presented in a random order. The participants were equipped with headphones and set the volume at a comfortable level at the beginning of the procedure. After listening to each token, participants indicated how angry/happy/neutral the pronunciation of the pseudoword was on 21-point scales from 0 (not at all) to 20 (very much), with 10 indicating an intermediate intensity. The interrater reliability was .99.
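The reported interrater reliability can be computed, for example, as Cronbach's alpha over raters; a minimal sketch with a tokens × raters matrix is given below. The data in the example are placeholders, and alpha is only one of several possible reliability indexes.

```python
# Minimal sketch of an interrater reliability check (Cronbach's alpha over raters).
# 'ratings' is a hypothetical tokens x raters matrix of 0-20 intensity judgments.
import numpy as np

def cronbach_alpha(ratings):
    ratings = np.asarray(ratings, dtype=float)    # shape: (n_tokens, n_raters)
    k = ratings.shape[1]                          # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)      # variance of each rater's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of the summed ratings
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

rng = np.random.default_rng(0)
example = rng.integers(0, 21, size=(270, 46))     # placeholder data: 270 tokens, 46 raters
print(round(cronbach_alpha(example), 2))          # real data would replace the placeholder matrix
```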

For the acoustic analysis, we considered different prosodic indexes by using Praat software (Boersma, 2001). On the basis of previous studies of emotional prosody (for a review, see Juslin & Laukka, 2003), we considered the mean fundamental frequency (mean f0, in hertz), pitch variation (mean f0 excursion, in semitones; note 1), mean intensity (in decibels), and speech rate (duration; note 2).
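For readers who prefer to script these measurements, a minimal sketch using parselmouth, a Python interface to Praat, is shown below; the file name is hypothetical, the original analyses were run in Praat itself, and the pitch variation follows the 12·log2(f0max/f0min) semitone convention.

```python
# Minimal sketch of the four acoustic indexes, scripted through parselmouth (Python interface to Praat).
# The original analyses were run in Praat; the file name and analysis settings are assumptions.
import numpy as np
import parselmouth

snd = parselmouth.Sound("andori_angry_take1.wav")

# Mean fundamental frequency (Hz), over voiced frames only.
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]
mean_f0 = f0.mean()

# Pitch variation as the f0 excursion in semitones: 12 * log2(f0_max / f0_min).
f0_semitones = 12 * np.log2(f0.max() / f0.min())

# Mean intensity (dB, averaged over frames) and total duration (s), the speech-rate index.
mean_intensity = snd.to_intensity().values.mean()
duration = snd.get_total_duration()

print(mean_f0, f0_semitones, mean_intensity, duration)
```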

Results and discussion

Perceived emotion intensity and acoustic properties of all 270 tokens

We first collapsed the ratings of the 46 participants to produce mean ratings for each of the 270 tokens; only these mean ratings were used in our analyses. We then examined the relationships between the mean ratings and the acoustic properties of the 270 stimuli in two steps. We computed the correlations between all measures to determine whether some acoustic indexes were related to the angry, happy, and neutral ratings, respectively. Then, we ran a series of regression analyses to examine which acoustic indexes were most predictive of the three different ratings. For the regression analyses, and because of the dependency between some indexes, we considered only f0, intensity, and duration. We ran these analyses entering the gender of the actor as a covariate, because previous research had demonstrated that men and women may differ in their ability to portray emotions vocally (e.g., Bonebright, Thompson, & Leger, 1996).
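A minimal sketch of this two-step regression, using ordinary least squares with standardized variables, is shown below; the data frame and column names are hypothetical, and the anger rating is used as the example outcome.

```python
# Minimal sketch of the two-step regression (hypothetical data frame and column names).
# Step 1: the three acoustic indexes; step 2: speaker gender added as a covariate.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("token_level_data.csv")   # one row per token: mean ratings, acoustic indexes, gender

# Standardize the continuous variables so the coefficients can be read as betas.
for col in ["anger_rating", "f0", "intensity", "duration"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()

step1 = smf.ols("anger_rating ~ f0 + intensity + duration", data=df).fit()
step2 = smf.ols("anger_rating ~ f0 + intensity + duration + C(gender)", data=df).fit()

print(step1.rsquared, step2.rsquared, step2.rsquared - step1.rsquared)  # R-squared at each step and the change
print(step2.summary())
```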

Perceived anger

As is shown in Table 1, the perception of anger correlated with all indexes except intensity and f0 excursion in semitones. When we entered all three indexes (f0, intensity, and duration) in a regression analysis predicting the mean ratings of perceived anger, the model explained 20 % of the variance. Duration (β = .47, p < .001) and intensity (β = .14, p = .031) were significant predictors, whereas f0 was not (β = .02, p = .832). When we added gender in a second step, the model explained 28.2 % of the variance (ΔR² = .08), F(1, 265) = 28.96, p < .001. Duration (β = .57, p < .001), intensity (β = .27, p < .001), f0 (β = −.27, p = .003), and gender (β = .42, p < .001) were all significant predictors (note 3). Note that when gender was entered alone in a first step, it did not predict perceived anger (p = .408).

Table 1 Correlations between acoustic indexes and mean perceived emotion ratings of the 270 pseudowords

Perceived happiness

Ratings of happiness correlated with all acoustic indexes (see Table 1). When all three indexes were entered in a regression analysis, the model explained 20.1 % of the variance. Duration (β = −.29, p < .001) and f0 (β = .22, p = .005) were significant predictors, whereas intensity (β = −.01, p = .883) was not. When gender was added in a second step, the model explained 40.4 % of the variance (ΔR² = .20), F(1, 265) = 90.24, p < .001. Duration (β = −.45, p < .001), f0 (β = .68, p < .001), intensity (β = −.22, p < .001), and gender (β = −.67, p < .001) were all significant predictors (note 4). When gender was entered alone in a first step, it was not a significant predictor of perceived happiness (p = .934).

Perceived neutrality

Perceived neutrality correlated with all indexes except duration (see Table 1). When all three indexes were entered in a regression analysis, the model explained 34.1 % of the variance. Duration (β = −.40, p < .001), intensity (β = −.25, p < .001), and f0 (β = −.52, p < .001) were significant predictors. When gender was added in a second step, the model explained 48.2 % of the variance (ΔR² = .14), F(1, 265) = 72.27, p < .001. Duration (β = −.27, p < .001), f0 (β = −.90, p < .001), and gender (β = .56, p < .001) were all significant predictors, whereas intensity (β = −.07, p = .211) was no longer significant (note 5). When gender was entered alone in a first step, it did not predict perceived neutrality, although it showed a tendency toward significance (p = .076).

In line with the results of previous studies (Bachorowski & Owren, 1995; Castro & Lima, 2010; Liu & Pell, 2012; Pell, Monetta, et al., 2009), our analyses confirmed the central role of some acoustic parameters in determining the perception of emotions. Notably, the acoustic parameters remained significant predictors when controlling for gender. Regarding pitch variability (f0 semitones, notes 3–5), our results indicate a role of this parameter in predicting happiness but not anger. This result is not in line with the canonical acoustic profile of anger (Juslin & Laukka, 2003). Nonetheless, as was noted by Banse and Scherer (1996), anger can be connoted as being either “hot” or “cold.” In this context, high f0 variability can be considered a specific feature of hot but not of cold anger. We could thus conclude that our stimuli are more representative of cold anger. Taken together, these results underline that speech rate, vocal intensity, and f0 represent essential acoustic cues in conveying emotional valence.

Selection of 30 pseudowords for each emotional tone

Considering the 46 participants’ ratings, we then aimed to select the most representative token of each emotional tone (anger vs. neutrality vs. happiness) for each of the 30 pseudowords. We performed a series of repeated measures analyses of variance (ANOVAs) on the ratings of the three different tokens of each emotion, with post-hoc analyses to choose the most representative one. Then, we performed another series of repeated measures ANOVAs on the ratings of the tokens chosen for each emotion, with post-hoc analyses to ascertain, for example, whether the angry token of “andori” was perceived as significantly more angry than happy or neutral. Of the 30 pseudowords, two tokens of the neutral category pronounced by the man did not show satisfactory neutrality ratings. First, the mean neutrality rating of the selected neutral token of the pseudoword “psilumbo” (M = 7.87, SD = 4.12) did not differ significantly from the mean happiness rating (M = 7.87, SD = 4.12, p = .258). Second, for the selected neutral token of “ervetto,” although the neutrality rating (M = 11.24, SD = 5.87) did differ significantly from the happiness (M = 7.22, SD = 3.39) and anger (M = 7.89, SD = 4.51) ratings, it was weaker than the mean neutrality ratings of the other 13 neutral tokens (M = 14.70).
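A minimal sketch of such a token-selection ANOVA for a single pseudoword and emotion is shown below, using a repeated measures ANOVA on long-format ratings, with Bonferroni-corrected paired t tests standing in for the post-hoc step; the file and column names are hypothetical.

```python
# Minimal sketch of one repeated measures ANOVA comparing the three recorded versions of a pseudoword
# in one emotion (hypothetical long-format data: one row per participant x token version).
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

long_df = pd.read_csv("andori_angry_ratings_long.csv")   # columns: participant, version, rating

anova = AnovaRM(long_df, depvar="rating", subject="participant", within=["version"]).fit()
print(anova)

# Bonferroni-corrected paired t tests as post-hoc comparisons between the three versions.
wide = long_df.pivot(index="participant", columns="version", values="rating")
pairs = [("v1", "v2"), ("v1", "v3"), ("v2", "v3")]
for a, b in pairs:
    t, p = stats.ttest_rel(wide[a], wide[b])
    print(a, b, round(t, 2), min(p * len(pairs), 1.0))    # Bonferroni-adjusted p value
```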

To replace these two pseudowords, we ran an additional study on the ten remaining pseudowords. Thirty-two native Italian-speaking university students (17 men, 15 women; M age = 23.13, SD = 4.05, range 20–40 years) rated the intensity of the 90 tokens pronounced by the same man (10 pseudowords × 3 emotions × 3 recordings) on the happy, neutral, and angry dimensions (interrater reliability, α = .98). On the basis of the mean ratings, we selected the pseudowords “rantaglia” and “zellani.” Please refer to the Appendix, Tables 4 and 5, for the mean emotion ratings of each selected angry, happy, and neutral token. We ended up with a set of 90 tokens composed of 30 pseudowords recorded in each of the three emotional tones (anger, happiness, and neutrality).

Finally, 34 native Italian-speaking university students (17 men, 17 women, M age = 22.26, SD = 1.93, range: 19–26 years) rated each of the final 90 tokens for valence and arousal. After the audio presentation of a token, participants indicated their evaluation of the pronunciation on 21-point scales that ranged from −10 (do not like it at all) to +10 (do like it very much), and then specified the extent to which the pronunciation activated them emotionally on 11-point scales that ranged from 0 (calm) to 10 (aroused/emotional). This procedure was repeated for each token, presented in a random order. The interrater reliabilities for valence and arousal were good, αs = .92 and .91, respectively.

Emotional category and emotional perception, valence, and arousal of the selected 90 tokens

We conducted a series of 3 (emotion category) × 2 (speaker’s gender) ANOVAs on the emotion, valence, and arousal ratings to evaluate differences as a function of the emotional tone of the tokens (neutral, happy, angry; with Bonferroni-corrected post-hoc tests) and of the speaker’s gender. In cases of significant interaction effects, we performed post-hoc analyses. We report all statistics in Table 2.
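A minimal sketch of one of these 3 × 2 analyses, treating each selected token as an observation, is shown below; the data frame and column names are hypothetical, and the valence ratings serve as the example dependent variable.

```python
# Minimal sketch of a 3 (emotion category) x 2 (speaker gender) ANOVA on token-level ratings
# (hypothetical data frame and column names; one row per selected token).
from itertools import combinations

import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

tokens = pd.read_csv("selected_90_tokens.csv")    # columns include emotion, speaker_gender, valence

model = smf.ols("valence ~ C(emotion) * C(speaker_gender)", data=tokens).fit()
print(sm.stats.anova_lm(model, typ=2))            # main effects and the interaction

# Bonferroni-corrected pairwise comparisons between the three emotion categories.
groups = tokens.groupby("emotion")["valence"]
pairs = list(combinations(groups.groups.keys(), 2))
for a, b in pairs:
    t, p = stats.ttest_ind(groups.get_group(a), groups.get_group(b))
    print(a, b, round(t, 2), min(p * len(pairs), 1.0))
```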

Table 2 Emotion, valence, and arousal ratings and acoustic indexes of each emotional category of the tokens (overall and pronounced by a man or a woman)

Emotional tone effects

The angry, happy, and neutral tokens, on average, were perceived as having angrier, happier, and more neutral pronunciations, respectively. Moreover, the pronunciations of the angry tokens were evaluated as being the most negative, the pronunciations of the happy tokens were evaluated as being the most positive, and both of them were judged as being more arousing than the neutral tokens’ pronunciations. Please refer to the Appendix (Tables 4 and 8) for the mean emotion, valence, and arousal ratings for each angry, happy, and neutral token of the selected pseudowords, pronounced by both the male and the female speaker.

Speaker’s gender and Emotional Tone × Speaker’s Gender interaction effects

In terms of emotion perception, the speaker’s gender affected the anger and neutrality ratings. Across tokens, the female pronunciation was judged as being more neutral and less angry than the male pronunciation. We also observed an Emotional Tone × Speaker’s Gender interaction effect on the neutrality and arousal judgments. The neutrality judgments were higher and the arousal judgments lower for neutral tokens than for angry or happy ones, and this pattern was stronger for tokens pronounced by the woman than for those pronounced by the man, suggesting that the male speaker might have had a less neutral intonation than the female speaker. Taken together, these results can be interpreted in light of the literature on gender differences in the perception and expression of emotions. This literature has shown that sadness and fear are both considered stereotypically female emotions, whereas anger is primarily associated with masculinity (Fabes & Martin, 1991). In line with this, in a study on emotion expression and perception by men and women, Bonebright and colleagues (1996) found that male actors were perceived to portray anger better than female actors did.

Emotional category and acoustic properties of the selected 90 tokens

Similar to our analyses in the previous section, we conducted a series of 3 (emotion category) × 2 (speaker’s gender) ANOVAs on each prosodic index with post-hoc analyses in cases of significant interaction effects. We report all statistics in Table 2.

Emotional tone effects

Main effects of emotional tone were present on all of the prosodic indexes. The three emotional sets of tokens were all significantly different in terms of f0 and f0 semitones, with happy tokens showing the highest values, followed by angry and then neutral ones. Moreover, the happy and angry tokens showed significantly higher intensities than did neutral ones. Finally, the angry tokens had the longest durations, followed by the neutral ones, and then the happy ones. We illustrate the acoustic profiles in Fig. 1. Because of the different scales, we standardized all four acoustic measures. Overall, the acoustic profiles of our emotional sets of pseudowords replicate previous findings (Banse & Scherer, 1996; Castro & Lima, 2010; Juslin & Laukka, 2003; Liu & Pell, 2012) in which happy tokens are characterized by the highest f0 values, high intensity, high pitch variability, and a faster speech rate, and angry tokens share pitch-level and intensity-level characteristics with happy tokens. However, in our set, angry tokens showed relatively less pitch variability and a lower speech rate, which contrasts with the typical anger acoustic profile described in the literature (Juslin & Laukka, 2003). This last result is in line with our analyses of the perceived emotion intensity and acoustic properties of all 270 tokens, and corresponds to Banse and Scherer’s description of cold anger, as do the previously discussed results related to f0 variability. These results give further support to the hypothesis that our actors primarily conveyed cold anger in their prosodic expressions.

Fig. 1 Acoustic profiles of the three emotional categories of pseudowords
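The standardization behind these profiles amounts to z-scoring each acoustic index across all tokens and then averaging by emotional category; a minimal sketch is given below (column names are hypothetical).

```python
# Minimal sketch of the standardization behind the acoustic profiles (hypothetical column names):
# z-score each acoustic index across the 90 tokens, then average within each emotional category.
import pandas as pd

tokens = pd.read_csv("selected_90_tokens.csv")
acoustic = ["f0", "f0_semitones", "intensity", "duration"]

z = tokens.copy()
z[acoustic] = (z[acoustic] - z[acoustic].mean()) / z[acoustic].std()
profiles = z.groupby("emotion")[acoustic].mean()
print(profiles)   # one standardized profile per emotional category
```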

Speaker’s gender and Emotional Tone × Speaker’s Gender interaction effects

We found speaker’s gender effects on all acoustic properties (see Table 2), and all of them were qualified by an interaction with the emotional tone of the tokens. The tokens pronounced by the man had a lower f0, weaker intensities, longer durations, and larger f0 excursions in semitones. The interaction effects revealed different patterns of differences between the emotional tones, depending on the speaker. First, a higher f0 for neutral than for angry tokens was observed only for the female speaker, not for the male speaker. Second, for the male speaker, happy tokens had a higher intensity than angry ones, which, in turn, did not differ from the neutral ones; for the female speaker, happy tokens had the same intensity as angry ones, but both were higher in intensity than the neutral tokens. Third, for the male speaker, the duration of the angry tokens was longer than that of the happy and neutral ones, whereas for the female speaker, their duration was equal to that of the neutral tokens but longer than that of the happy tokens. Finally, for the male speaker, angry tokens had smaller f0 excursions in semitones than did happy tokens, but these excursions were equal to those of the neutral tokens; for the female speaker, the semitone excursions of the angry tokens did not differ from those of the happy tokens, but both were larger than those of the neutral tokens. Although some of these differences echo the natural variations in intonation between men and women (e.g., f0 differences; Childers & Wu, 1991; Puts, Doll, & Hill, 2014), others could be due to idiosyncratic differences between the two actors, even though their voice types were quite prototypical in terms of f0.

Conclusion

The main goal of this study was to develop and validate a set of vocal emotional stimuli for research on emotional prosody processing, following specific methodological criteria. First, we sought to generate pseudowords according to the specific phonotactic and distributional constraints of the Italian language. To this end, we relied on the Italian submodule of the Wuggy software (Keuleers & Brysbaert, 2010). The use of a formal, validated criterion for pseudoword generation represents progress over the available literature, since the subjective judgment typically used in previous studies strongly depends on the researchers’ experience with the specific language and with pseudoword research. Another aim was to select only neutral stimuli, on the basis of independent valence ratings. The list of selected pseudowords is available in the Appendix, Table 3. These stimuli could be very useful for constructing lexical decision tasks, in which one needs to be sure that pseudowords are free of lexical–semantic and valence meanings. Finally, analyses based on independent ratings, together with the comparison of the acoustic properties of our dataset with the well-established literature, ensure that the stimuli specifically express the emotions they are intended to convey.

Although this set of stimuli constitutes ready-to-use, controlled material for experimental studies, we would like to note a series of methodological points. First, because we did not give the actors specific cues about how to express each emotion, it seems that they primarily focused on conveying cold anger. As was noted by Banse and Scherer (1996), most studies on prosodic emotion do not consider the difference between hot and cold anger, which could explain some discrepancies. On the basis of the acoustic analyses, our set of angry pseudowords depicts cold rather than hot anger. We would also underline that our material was produced by only two actors, one man and one woman. Therefore, we cannot disentangle gender differences from idiosyncratic ones in some acoustic parameters of emotion expression that emerged in our results. However, this methodological issue should not prevent the use of our stimulus set, for two main reasons: the actors’ voice types were prototypical of their gender, and the acoustic profiles of the selected stimuli they produced corresponded to what is usually observed for male and female voices.

To conclude, our set of prosodic emotion expressions addresses several methodological issues that have affected previous databases. Furthermore, it overcomes the lack of validated stimuli for the Italian language, and it is available to the research community for future use in research involving emotional prosody from different perspectives (behavioral, clinical, neuropsychological), for both Italian samples and cross-language studies.