Abundant evidence illustrates the effects of the affective features of words or text on language processing. In the case of visual word recognition, such effects have been shown for response latencies (Briesemeister, Kuchinke, & Jacobs, 2011b; Huckauf, Heller, & Gouzouli-Mayfrank, 2003; Kousta, Vinson, & Vigliocco, 2009; Võ, Jacobs, & Conrad, 2006), pupil dilation (Kuchinke, Võ, Hofmann, & Jacobs, 2007; Võ et al., 2008), event-related potentials/ERPs (e.g., Conrad, Recio, & Jacobs, 2011; Hofmann, Kuchinke, Tamm, Võ, & Jacobs, 2009; Kayser, Fong, Tenke, & Bruder, 2003; Kissler, Herbert, Peyk, & Junghofer, 2007; Ponz et al., 2013; Schacht & Sommer, 2009; Trauer, Andersen, Kotz, & Müller, 2012; see Citron, 2012, for a review), transcranial magnetic stimulation/TMS (Weigand et al., 2013), or functional magnetic resonance imaging/fMRI (Grimm, Weigand, Kazzer, Jacobs, & Bajbouj, 2012; Hamann & Mao, 2002; Herbert et al., 2009; Kuchinke et al., 2005; Tabert et al., 2001). Regardless of continuously improving methodologies in neuroscience, the validity of such findings hinges on thoroughly balanced stimulus materials with the potential to induce the type of affect or emotion under scrutiny and to control for other potential sources of influence. Only extensive databases including ratings of many words on several emotional dimensions will allow the appropriate selection of stimulus materials capable of providing reliable answers to specific research questions like the existence of differential effects of different emotion dimensions. These are also very useful for theory development, since current computational models of word recognition still do not include emotional dimensions (e.g., Grainger & Jacobs, 1996; Hofmann, Kuchinke, Biemann, Tamm, & Jacobs, 2011; Perry, Ziegler, & Zorzi, 2007).

Thus, meeting a still-growing demand due to an increasing interest in emotion research, various databases containing emotional evaluations have been provided for different types of stimuli, such as pictures (Lang, Bradley, & Cuthbert, 2008) or sounds (Bradley & Lang, 1999b), but also words in English (Bradley & Lang, 1999a; Heise, 2010; Whissell, Fournier, Pelland, Weir, & Makarec, 1986), German (Schröder, 2011; Võ et al., 2009; Võ et al., 2006), Spanish (Redondo, Fraga, Comesaña, & Perea, 2005; Redondo, Fraga, Padrón, & Comesaña, 2007), Portuguese (Soares, Comesaña, Pinheiro, Simões, & Frade, 2012), French (Silva, Montant, Ponz, & Ziegler, 2012), or Finnish (Eilola & Havelka, 2010).

Among these, the Affective Norms for English Words (ANEW; Bradley & Lang, 1999a) represent the best-known affective dictionary, providing normative evaluative ratings for 1,034 words. The adaptation of this database to other languages enables the comparison of empirical findings on emotion processing across different language contexts. Accordingly, the ANEW has already been adapted for Spanish (Redondo et al., 2007) and European Portuguese (Soares et al., 2012). Here, we present an adaptation for German—the native language of three European countries, where a quickly growing amount of research on emotion processing has been published in recent years.

Note that the comparability between available databases providing emotion ratings for German words (Briesemeister, Kuchinke, & Jacobs, 2011a; Kanske & Kotz, 2010; Schröder, 2011; Võ et al., 2009; Võ et al., 2006) and the ANEW corpus is limited. First, the overlap in the respective word materials only amounts to 40 percent of the ANEW—when compared to the recent version of the most extensive German database, the Berlin Affective Wordlist (BAWL; Võ et al., 2009). Second, these German databases do not provide ratings on all dimensions listed in the ANEW, or they do so for—at least potentially—slightly different dimensions: Unlike the ANEW, the BAWL database (Võ et al., 2009; Võ et al., 2006) does not contain ratings of dominance, and the arousal dimension was operationalized in a different way. All of these factors limit potential comparisons of the emotional connotations of specific words or their respective effects on emotion processing in order to contrast German with other languages/cultures (see, e.g., Conrad et al., 2011; Ponz et al., 2013).

In contrast to discrete conceptualizations of emotion (e.g., Panksepp, 1998), for which affective evaluations are available for English (Stevenson, Mikels, & James, 2007) and German (Briesemeister et al., 2011a), a dimensional approach underlies the ANEW corpus and its adaptations in different languages. This position, prominently advocated by Wundt (1896) and Russell (1980), is based on the seminal studies of Osgood, Suci, and Tannenbaum (1957), who applied factor analyses to semantic differential judgments of words. They identified three dimensions, named “evaluation,” “potency,” and “activity,” which account for a major portion of variance. Applied specifically to emotional states, Mehrabian and Russell (1974) later termed these dimensions “pleasure,” “arousal,” and “dominance”—in order of importance, as defined by the amount of variance explained. ANEW characterizes these three dimensions as “pleasure/valence” (ranging from pleasant to unpleasant), “arousal” (ranging from calm to excited), and “dominance/control” (ranging from dominant to dominated) (Bradley & Lang, 1999a).

Besides these variables, imageability is a further dimension important for understanding the relations between emotion and language. Recent studies have demonstrated a significant influence of words’ imageability on language processing (Altarriba & Bauer, 2004; Altarriba, Bauer, & Benvenuto, 1999; Huang, Lee, & Federmeier, 2010; Kanske & Kotz, 2007; Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011). This variable should also provide a close match to the concreteness or abstractness of words—often proposed to determine the processing of emotion content (Vigliocco, Vinson, Druks, Barber, & Cappa, 2011). Accordingly, unlike the ANEW, more recent normative databases, such as the Berlin Affective Word List (BAWL; Võ et al., 2009; Võ et al., 2006), provide imageability ratings together with valence and arousal ratings.

Therefore, our adaptation of the original ANEW corpus to German serves a twofold purpose: First, we wanted to improve the match between normative emotion databases in German and other languages in order to enable future cross-linguistic research on emotion. Second, we also aimed at further developing the general understanding of different emotion dimensions proposed so far across different databases—investigating the mutual relations between these dimensions, after combining them within one, single database.

Our German ANEW database was collected by means of the following steps:

  1. ANEWing the BAWL: The ratings of valence, arousal, and imageability for the German translations of ANEW words contained in the Berlin Affective Word List (Võ et al., 2009) served as a basis to build upon.

  2. We also added ratings of valence, arousal, and imageability for German translations of the ANEW words not contained in the BAWL (Võ et al., 2009).

  3. Finally, we collected dominance ratings for all German translations.

  4. Contrasting alternative but closely related dimensions: We re-collected arousal ratings for all of the German words, this time using scales and instructions closely matching the original ANEW procedure. The arousal ratings for the BAWL corpus were based on a slightly different scale and instructions, to emphasize the bipolar character of this dimension (see Võ et al., 2009; Võ et al., 2006).

  5. We also collected ratings for all words on the dimension of “potency” (Mehrabian & Russell, 1974; Schröder, 2011). Though this dimension is closely related to dominance, potency mainly differs in its independence from the raters’ perspective. A more detailed account of these variables will be presented in the discussion of our methods for creating the database.

  6. Shaping up the database for comfortable use in psycholinguistic research: In addition to these evaluative variables, language statistical measures and other variables of psycholinguistic relevance were added as objective indices for all of the words, in order to make the database most beneficial for further scientific research. These are word frequencies, taken from two recent extensive corpora representing different contexts of language use: the print-based corpus of the Leipzig Wortschatz Projekt (Wortschatz Universität Leipzig, 2013), including over 50 million words, and the SUBTLEX corpus (Brysbaert et al., 2011), with more than 25 million words taken from movie subtitles, together with grammatical class, number of letters, number of syllables, and number of orthographic neighbors.

Method

Study I: ANEWing the BAWL

1. + 2. Expanding the BAWL ratings

The translation of the 1,034 words provided by the ANEW (Bradley & Lang, 1999a) resulted in a total of 1,003 German words, because some of the English words have the same German translation. Ratings for valence, arousal, and imageability for 400 of the words could be retrieved from the BAWL (Võ et al., 2009). Additional ratings for the remaining 603 German words were collected separately for each dimension.

Procedure

Ratings for valence, arousal, and imageability were collected according to the procedure reported in the BAWL (Võ et al., 2009). Note that this procedure involves the use of the Self-Assessment Manikin (SAM; see Fig. 1) only for the arousal dimension, but not for valence.

Fig. 1 Self-Assessment Manikin (SAM): Self-evaluation scales used for the dimensions of arousal and dominance

The SAM, a nonverbal pictorial measure derived from the semantic differential scale developed by Mehrabian and Russell (1974) and adapted by Lang (1980), has often been used to assess the three dimensions of valence, arousal, and dominance. Each of these scales is represented by five figures. To facilitate scale comprehension, SAM pictures are normally accompanied by verbal anchors at the extreme ends of the scale. But note that the capacity of the SAM to adequately represent these constructs or dimensions of emotional content is a matter of debate (see Võ et al., 2009, and Võ et al., 2006, for discussions). This is why, when designing the data collection for the BAWL, we had opted for a slightly different procedure—presented below—which we continued to use now to complete the ratings on valence and arousal for the 1,003 German words.

Valence.

Following the procedure for obtaining BAWL ratings, participants were presented with the verbal anchors positiv (“positive”) and negativ (“negative”), defining the ends of a bipolar scale ranging from −3 to +3, with “neutral” (0) in the center.

Arousal (BAWL).

Unlike valence, the concept of arousal is difficult to represent in a purely verbal way. For this reason, we opted to use SAMs for data collection in both the original BAWL study and the present study (see Fig. 1). But, unlike Bradley and Lang (1999a), we had participants rate arousal on a 5-point scale (rather than a 9-point scale) and only used the SAMs, but not the intervals between them, for the rating.

Probably more important than this difference between a 5- versus a 9-point scale might be the following change concerning the verbal anchors: The anchors for the extreme ends of the arousal dimension, as implemented in the ANEW (Bradley & Lang, 1999a), were calm versus excited. Although the authors’ theoretical account of this scale stressed its bipolar character—ranging from a relaxing extreme to an exciting extreme, with a neutral midpoint—when designing the BAWL ratings (Võ et al., 2006), we felt that these anchors might rather lead participants to interpret arousal as a unipolar and uniformly increasing dimension.

Therefore, when collecting German arousal ratings, we applied the anchors aufregend (“exciting”) and beruhigend, which might best be translated as “calming,” to emphasize an actively relaxing aspect of low arousal, as opposed to the “exciting” opposite end of the scale—understanding arousal as a bipolar concept (see the Appendix).

Imageability.

Imageability was represented in terms of a unipolar, uniformly increasing 7-point scale according to the procedure in BAWL (Võ et al., 2006).

Ratings were collected separately for each dimension: Each participant was presented with a randomized list of words for which ratings had to be given on only one dimension, to avoid transfer effects. To perform the rating, participants were seated in a quiet room. Each list contained 603 items and was rated in a self-paced procedure. Each word was rated on each dimension by at least 20 participants. The words were presented in white letters, together with the rating scale, on a black background on a computer screen. Participants were given written instructions and five practice trials. The same general procedure was applied for all of the following data collections.

Participants

A total of 65 participants took part in the study (36 women, 29 men; the ratio of women to men for any particular rating was no more than 2:1), all of whom were psychology students from the Freie Universität Berlin, aged 18 to 37 years (mean = 24.9, SD = 4.6). All participants (here and for all of the following data assessments) were native speakers of German and had normal or corrected-to-normal vision. Their participation was rewarded with course credit or a small amount of money.

3. Collecting ratings on dominance

Because dominance ratings were not part of the BAWL, we replicated the procedure used by Bradley and Lang (1999a) in order to achieve optimal cross-language comparability.

Procedure

The SAM for dominance ranges from a small-sized (dominated) figure to a large one (in control). Due to repeatedly expressed difficulties of interpretation, more explicit anchors for the dimensions of dominance and potency were used (see the Appendix). Ratings could also be placed in the spaces between the figures, replicating the 9-point scale originally used by Bradley and Lang (1999a).

For all of the data collections reported from now on, a total of 1,003 words had to be rated. The corresponding list of stimuli was randomized and divided into two subsets containing about half of the stimuli. Each participant thus rated a total of 501 or 502 words. Each half was subsequently split, and the resulting parts were alternately assigned to one of two new experimental lists for the next two participants—a procedure that should assure that all words had similar probabilities to co-occur in a given list for the collection of ratings with any other word from the total list.
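The list construction described above can be sketched in a few lines of Python. This is only one possible reading of the procedure, not the software actually used: the function names, the fixed random seed, the placeholder word labels, and the choice to shuffle each half before splitting it are all assumptions made for illustration.

```python
import random


def initial_halves(words, rng):
    """Shuffle the full stimulus list and cut it into two halves."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def next_halves(half_a, half_b, rng):
    """Split each half in two and assign the parts alternately to two new lists."""
    new_a, new_b = [], []
    for half in (half_a, half_b):
        pool = list(half)
        rng.shuffle(pool)  # assumption: membership of each part is random
        cut = len(pool) // 2
        new_a.extend(pool[:cut])   # first part goes to the first new list
        new_b.extend(pool[cut:])   # second part goes to the second new list
    return new_a, new_b


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
words = [f"word_{i:04d}" for i in range(1003)]  # placeholder labels, not the real stimuli

list_a, list_b = initial_halves(words, rng)        # lists for the first two participants
list_c, list_d = next_halves(list_a, list_b, rng)  # lists for the next two participants
```

Under these assumptions every pair of lists still covers all 1,003 words exactly once, with 501 words in one list and 502 in the other, while the words sharing a list change from one pair of participants to the next.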

Participants

A group of 40 students (24 women, 16 men) from the Freie Universität Berlin participated in the study; they were 21 to 33 years of age (mean = 25.1, SD = 4.2).

Results and discussion

To obtain an estimate of the general comparability of our data with the original ANEW values and corresponding databases in Spanish and Portuguese, we computed bivariate correlations between the values obtained in the four different languages on the three dimensions of valence, arousal, and dominance. The substantial correlation coefficients between, for instance, German and English evaluations for valence (r = .90, p < .001), arousal (r = .62, p < .001), and dominance (r = .60, p < .001) support a reasonable general quantitative comparability across the languages (see Table 1).

Table 1 Bivariate correlations of valence, arousal, and dominance between languages

In particular, the very close relationship between the valence ratings in the four languages (all rs > .9) demonstrates that emotion ratings can—in principle—almost perfectly be replicated across different languages/cultures. However, the substantially lower correlations concerning the other two dimensions suggest that these cross-cultural consistencies also encounter some limitations—apparently affecting different dimensions in specific ways.

In the following discussion, we focus on these apparent discrepancies, considering especially the arousal variable and its relation to the—otherwise cross-culturally remarkably stable—concept of valence.

Although, following the classic work by Wundt (1896), the dimensions of valence and arousal have initially been introduced as two independent factors constituting a two-dimensional affective space (Bradley & Lang, 1999a; Russell, 1980), empirical reports on these variables consistently evidence a U- or boomerang-shaped distribution in which the two variables are positively correlated within the domain of positive, but negatively correlated within the domain of negative valence (Bradley, Codispoti, Cuthbert, & Lang, 2001).

For the present 1,003 German words, the distribution in the bidimensional affective space determined by the dimensions of valence and arousal (see Fig. 2) approximately fits the typical boomerang shape reported by Bradley and Lang (1999a). Accordingly, both the typical patterns of a positivity offset and a negativity bias could be replicated (Cacioppo, Gardner, & Berntson, 1997), as is revealed by a positive correlation between valence and arousal for words of positive valence (i.e., above 0, the midpoint of the valence scale; r = .2, p < .001), and a negative correlation with a considerably steeper slope for words of negative valence (r = −.63, p < .001), for the German sample. In general, a comparable pattern involving a positivity offset and a negativity bias is, therefore, observable in all four language versions of the ANEW corpus. But note, also, that the data from the four languages display interesting discrepancies with respect to the relative strengths of the two effects. As we noted above, the correlation between valence and arousal has a much steeper slope in the negative than in the positive valence domain in our German data, which is comparable to what has been found in the Spanish and Portuguese data, but is the opposite of the pattern for English, where arousal increases especially strongly in positive valence (r = .64 in the positive range, r = −.46 in the negative range; see Table 2). Moreover, both the German and Portuguese data involve a much attenuated positive correlation (r < .27) between the two variables concerning the range of positive words, relative to the English and Spanish data (r > .45).
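The range-wise analysis described above (one correlation for the whole set, one for positive and one for negative words) can be illustrated with a short Python sketch. The function names are hypothetical, words rated exactly at the midpoint are left out of both subsets (an assumption, since the text only defines the positive range as "above 0"), and the six toy values are invented solely to produce a V shape; they are not ratings from the database.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlations_by_range(valence, arousal, midpoint=0.0):
    """Correlate valence and arousal for all words and for each valence range."""
    pairs = list(zip(valence, arousal))
    positive = [(v, a) for v, a in pairs if v > midpoint]
    negative = [(v, a) for v, a in pairs if v < midpoint]

    def r(subset):
        return pearson_r([v for v, _ in subset], [a for _, a in subset])

    return {"whole": r(pairs), "positive": r(positive), "negative": r(negative)}


# Toy values only (not data from the study): arousal falls steeply toward the
# midpoint on the negative side and rises more gently on the positive side.
toy_valence = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
toy_arousal = [5.0, 4.0, 3.0, 2.0, 2.5, 3.0]
result = correlations_by_range(toy_valence, toy_arousal)
```

With such an asymmetric V shape the correlation is negative in the negative range, positive in the positive range, and negative overall, which is the qualitative pattern reported for the German ratings.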

Fig. 2 Correlations of valence and arousal for German (r = −.56), English (r = −.05), Spanish (r = −.15), and Portuguese (r = −.49) words

Table 2 Bivariate correlations of valence (VAL) and arousal (ARO), by languages, for the whole, positive, and negative ranges of words

As a consequence, the overall correlations between valence and arousal became progressively more negative, from the English (r = −.05), over the Spanish (r = −.15, with z = −2.69; p < .01 for the comparison to English) and Portuguese (r = −.49, with z = −6.2; p < .001 for the comparison to Spanish) data, and finally to the German data (r = −.56, with z = −1.03; p = .3 for the comparison to Portuguese). This suggests that, at least for the Portuguese and German data, the two dimensions are not orthogonal to one another, challenging the assumption that the dimensions are independent.
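The text does not state which test produced the z values above. One common way to compare two correlation coefficients from independent samples is Fisher's r-to-z transformation, sketched below in Python. The function name is hypothetical and the sample sizes in the usage line are placeholders, so the output is not expected to reproduce the reported statistics.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns the z statistic and its two-tailed p value.
    """
    z1, z2 = atanh(r1), atanh(r2)                # Fisher transformation of each r
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # standard error of the difference
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2.0))                 # two-tailed p under the standard normal
    return z, p


# Placeholder sample sizes (assumption): the actual number of words entering
# each correlation would have to be taken from the respective databases.
z, p = compare_independent_correlations(-0.05, 1000, -0.15, 1000)
```

This version treats the two sets of ratings as independent samples. Because the same word concepts are rated in each language, a test for dependent correlations might be more appropriate, which is another reason to read the sketch as illustrative only.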

However, the interpretation of these apparent differences concerning the internal relations of the two-dimensional space between the original ANEW and our database for German words faces a serious problem: For all emotion ratings of German words that are available so far, the operationalization of the arousal concept has been slightly modified with regard to the original instructions of Bradley and Lang (1999a)—see the Procedure section above for details.

Probably more important than the switch between a 9-point and a 5-point scale could be the more bipolar interpretation of the arousal variable that potentially was suggested to the participants in our German sample by the translation of the verbal anchor “calm” as beruhigend (which might rather be understood as “calming”).

Certainly, these differences in the use of scales make an interpretation of the present results as evidence for cultural differences in the use of the arousal concept difficult.

To overcome this problem, and to test whether the apparent differences in arousal ratings across languages could have arisen from changes in the scales and instructions, we decided to re-collect ratings for our 1,003 words on the arousal dimension, this time perfectly meeting the operationalization and instructions previously used for the ANEW (see Bradley & Lang, 1999a); hereafter, the new set of ratings will be termed ARO (ANEW), as opposed to ARO (BAWL).

Study II: Contrasting alternative but closely related dimensions

4. Arousal (ANEW)

To ensure comparability across languages, ratings on the dimension of arousal were collected by perfectly matching the procedures applied in the ANEW corpus.

Procedure

We used the original 9-point scale with five Self-Assessment Manikins, including spaces between them as optional rating positions, ranging from a relaxed (calm) to an open-eyed, literally exploding (excited) figure, with additional verbal markers being given only during the written instructions. In contrast to the version of the BAWL (Võ et al., 2009), the left verbal anchor of the scale was changed to ruhig (“calm”) instead of beruhigend (“calming”).

Participants

A group of 40 students (22 women, 18 men) from the Freie Universität Berlin participated in the study. Their ages ranged from 18 to 34 (mean = 23.8, SD = 3.6).

Results

The resulting data did not differ considerably from those obtained using the BAWL scale and instructions (Võ et al., 2009; Võ et al., 2006) with a correlation coefficient of r = .88 between the two. Again, for the new data, the correlation between valence and arousal was negative across the whole sample (r = −.58), moderately positive for the positive range (r = .21), and characterized by an especially steep slope for the negative range (r = −.66). We thus conclude that the previously mentioned differences concerning valence–arousal correlations in the ANEW data across languages cannot be attributed to the different scale used in the BAWL, and we will again refer to these apparent cross-cultural differences in arousal ratings during the General Discussion.

5. Potency

We would like to point out that the relation between past operationalizations of dominance versus potency might be more complex than has been previously considered. While both dimensions are rated using an identical pictorial SAM scale (Bradley & Lang, 1994), the important difference between them resides in the perspective that the participant has to adopt toward the rated concept. In the case of dominance, the participant is asked to establish a relation toward the rated object and then to decide whether or not he or she can dominate the object—the central question being, “How dominant do you feel in relation to the word?” (Bradley & Lang, 1994). In the case of potency, the concepts are rated independently of their relation to the participant, who has to evaluate what potency the object might have, as such (e.g., Heise, 2010; Schröder, 2011).

Procedure

Participants were asked in the instructions to rate the word according to its perceived potency, independently from his or her own person (cf. Heise, 2010; Schröder, 2011). The Self-Assessment Manikin scale ranged from a small-sized figure to a large one using a 9-point scale, which was formally comparable to the one used for the dominance ratings.

Participants

A group of 40 students (23 women, 17 men) from the Freie Universität Berlin participated in the study; their ages ranged from 19 to 37 (mean = 24.5 years, SD = 4.3).

Results

The correlation coefficient between the dimensions of dominance and potency in our ratings was r = −.35. This relatively weak correlation clearly suggests that the scales are not simply reversed, but rather that different specific aspects seem to determine the respective ratings on each dimension. Additionally, the correlations with the dimensions of valence (r = .65 for dominance, r = .25 for potency, z = −47.96; p < .001) and arousal (r = −.47 for dominance, r = .64 for potency, z = 66.02; p < .001) differ considerably across both variables (see Table 3a), suggesting that distinct information is captured by the two measures.

Table 3 Bivariate correlations between the dimensions of valence [VAL], arousal [ARO] (ANEW), arousal (BAWL), dominance [DOM], potency [POT], and imageability [IMA], as well as written word frequency from the Leipzig Wortschatz Projekt (log10 Leip; Wortschatz Universität Leipzig, 2013) or spoken word frequency from SUBTLEX (log10 SUBT; Brysbaert et al., 2011)

6. Language statistical measures

Two different measures of word frequency were joined from the print-based corpus of the Leipzig Wortschatz Projekt (Wortschatz Universität Leipzig, 2013), including over 50 million words, and the SUBTLEX corpus (Brysbaert et al., 2011), with more than 25 million words taken from movie subtitles. Finally, grammatical class, number of letters, number of syllables, and number of orthographic neighbors were generated.

Structure of the database

The database contains a word number (WdNum) identifying each word according to the original word number in the ANEW (Bradley & Lang, 1999a), as well as the original English word (E-word) and its German translation (G-word). Next, the means (Mean) and standard deviations (SD) for each evaluative dimension of valence (VAL), arousal BAWL [ARO (BAWL)], arousal ANEW [ARO (ANEW)], dominance (DOM), and potency (POT) are provided, followed by the imageability ratings for the German words (IMA). In addition, the database includes word frequency for written language, taken from the Leipzig Wortschatz Projekt (Wortschatz Universität Leipzig, 2013; freq_Leipzig), and for movie subtitles reflecting the use of spoken language, taken from SUBTLEX (Brysbaert et al., 2011; freq_SUBTLEX), as well as number of letters (#_letters), number of syllables (#_syllables), word class (word_class), and number of orthographic neighbors (Coltheart's N; N_orth). All of the latter variables are based on CELEX (Baayen, Piepenbrock, & Gulikers, 1995). The database can be downloaded as supplemental materials accompanying this article.
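For readers who want to work with the database programmatically, the following Python sketch shows one way a single entry could be represented. The variable labels (WdNum, E-word, VAL, freq_Leipzig, and so on) come from the description above, but the file format and the exact column headers of the supplemental file are not specified here, so keys such as "VAL Mean" and "VAL SD" are assumptions, as are the class and function names. The demonstration row contains dummy zeros and a placeholder word, not actual ratings.

```python
from dataclasses import dataclass

# Dimensions for which the database reports both a mean and a standard deviation.
RATED_DIMENSIONS = ("VAL", "ARO (BAWL)", "ARO (ANEW)", "DOM", "POT")


@dataclass
class Rating:
    mean: float
    sd: float


@dataclass
class Entry:
    wd_num: int          # WdNum: word number from the original ANEW
    e_word: str          # E-word: original English word
    g_word: str          # G-word: German translation
    ratings: dict        # dimension label -> Rating
    ima: float           # IMA: imageability rating
    freq_leipzig: float  # freq_Leipzig: written word frequency
    freq_subtlex: float  # freq_SUBTLEX: subtitle-based word frequency
    n_letters: int       # #_letters
    n_syllables: int     # #_syllables
    word_class: str      # word_class
    n_orth: int          # N_orth: number of orthographic neighbors


def parse_row(row):
    """Build an Entry from a mapping of column header to string value.

    The "<dimension> Mean" / "<dimension> SD" header pattern is an assumption.
    """
    ratings = {
        dim: Rating(float(row[f"{dim} Mean"]), float(row[f"{dim} SD"]))
        for dim in RATED_DIMENSIONS
    }
    return Entry(
        wd_num=int(row["WdNum"]),
        e_word=row["E-word"],
        g_word=row["G-word"],
        ratings=ratings,
        ima=float(row["IMA"]),
        freq_leipzig=float(row["freq_Leipzig"]),
        freq_subtlex=float(row["freq_SUBTLEX"]),
        n_letters=int(row["#_letters"]),
        n_syllables=int(row["#_syllables"]),
        word_class=row["word_class"],
        n_orth=int(row["N_orth"]),
    )


# Dummy row for demonstration only: all numbers are zeros, not real ratings.
dummy_row = {"WdNum": "0", "E-word": "example", "G-word": "Beispiel", "IMA": "0",
             "freq_Leipzig": "0", "freq_SUBTLEX": "0", "#_letters": "8",
             "#_syllables": "2", "word_class": "N", "N_orth": "0"}
for dim in RATED_DIMENSIONS:
    dummy_row[f"{dim} Mean"] = "0"
    dummy_row[f"{dim} SD"] = "0"

entry = parse_row(dummy_row)
```

If the supplemental file is exported to CSV, rows read with `csv.DictReader` could be passed to `parse_row` directly once the header names have been checked against the actual file.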

General discussion

The ANEW (Bradley & Lang, 1999a) represents the best-known affective dictionary for the English language, providing US American ratings on the three dimensions of valence, arousal, and dominance. To facilitate cross-cultural emotion research with verbal materials, we have presented an adaptation for the German language, offering competing operationalizations of the affective measurements, along with additional psycholinguistic variables. For the purpose of a better understanding of the alternative scales measuring similar aspects of affective experience, the different operationalizations are here examined in more detail.

Concerning the two variables of valence and arousal, defining the two-dimensional affective space proposed by some theoretical accounts (Russell, 1980), a closer look at the previously available large-scale databases already makes the assumption of two independent components seem less plausible: The normative databases have classically been characterized by a U- or boomerang-shaped distribution of the two dimensions, and are thus only weakly correlated across the whole range of the valence scale, but closely correlated within the more constrained ranges of either positive or negative valence (see Table 2).

In the case of our German data, even across the whole range of the valence scale, a relatively high negative correlation between the two is given, with arousal decreasing with increasing valence—mostly due to the very tight correlation of the two within the range of negative valence, a pattern consistent across operationalizations of the arousal scale according to both ANEW (Bradley & Lang, 1999a) and BAWL (Võ et al., 2009).

Apparently, though this is rarely emphasized, high linear correlations between valence and arousal seem to be a rather widespread phenomenon. A comparable pattern also appears in the Portuguese sample (Soares et al., 2012), for which we calculated an overall correlation coefficient between the two dimensions of r = −.49. A similar misalignment has also been reported in a cross-cultural comparison of the IAPS pictures (Lang, Bradley, & Cuthbert, 2008) with a Hungarian sample, where the dimensions of valence and arousal correlated at r = −.54 (Deák, Csenski, & Révész, 2010). Ribeiro, Pompéia, and Bueno (2005) found a correlation of r = −.82 between valence and arousal on the basis of evaluations of the IAPS pictures by a Brazilian sample.

As Ribeiro et al. (2005) argued, this might be explained by “the absence of a clearly defined concept of ‘arousal’” (p. 214). Whereas the left extreme of the SAM, in the original study by Bradley and Lang (1999a) anchored as calm, relaxed, might be understood as the “absence of alteration of the participant’s normal state,” this judgment mostly holds true for stimuli that are neutral in valence, as has clearly been the case for the American and Spanish samples.

Our replication of the arousal ratings with the original ANEW instructions and anchors shows that the difficulties in replicating the symmetric U-shape of the two-dimensional affective space reported by Bradley and Lang (1999a) clearly do not result from methodological issues such as rating instructions emphasizing the bipolar character of the scale or differences between a 5- and a 9-point scale.

On the other hand, it needs to be acknowledged that the positive emotional connotations of the English words “exciting” or “excited” might in part explain why a more pronounced positive correlation between valence and arousal for words of a general positive valence is given for the US American data, as compared to the data from other languages in which the translations of these verbal anchors have less positive connotations. However, these potential emotional connotation differences across languages at the level of one single word may also point to a more general cultural bias determining the specific way that arousal ratings evolve in affective databases across languages: Whereas correlations of arousal values from an American sample are quite comparable to those of Spanish subjects, the values from Portuguese subjects seem to correlate higher with those from a German sample, and both the Portuguese and German values differ from those in the American and Spanish samples.

We suggest that these specific differences between the four languages involved match with differences characterizing them in terms of cultural norms concerning the personality construct of extraversion. As has been shown in previous studies (Ramírez-Esparza, Gosling, Benet-Martínez, Potter, & Pennebaker, 2006; Veltkamp, Recio, Jacobs, & Conrad, 2013), the construct of extraversion in bilingual individuals, among other things, is subject to change depending on cultural framing. Bilinguals, for instance, reached higher scores of extraversion when filling out a personality inventory in Spanish than when doing so in German (Veltkamp et al., 2013). No such results are available so far for Portuguese, but we believe that labeling the Portuguese culture as being less extraverted than the Spanish can be considered a “common ground” and the same would hold true when comparing German to American culture. As for the present data, our tentative proposal is that cultures favoring extraversion over introversion—as we believe is the case if we compare English and Spanish, on one side of the extra-/introversion dimension, with German and Portuguese, on the other—tend to assign a higher arousal to positive events, thus making them a focus of overt verbal communication. On the other side, more introverted cultures tend to favor silently enjoying positive events, which, in turn, are perceived as being less arousing.

Relative to valence and arousal, the role of dominance and/or potency in language processing has long been ignored, possibly due to its indeterminate nature. From a formal point of view, the two dimensions simply differ concerning the perspective of the raters: whether they feel more or less “dominant” toward something, or whether they perceive something as being more or less “potent”. If a mere inversion of perspective characterized the relation between the two concepts of potency and dominance, a strong negative correlation between their respective values would be expected. The rather weak negative correlation between the two measures revealed by our data thus clearly contradicts the assumption of such a simple inversion. Bradley and Lang (1994) already suggested that the dominance scale, with its originally envisaged clinical use, places more emphasis on the raters’ self-centered perspective. Therefore, raters of dominance might focus more on their own coping strategies towards an object, as compared to directly rating its potency. As our data show, this has implications for the relations of both variables to the valence and arousal dimensions. On the one hand, the particularly high positive correlation of dominance and valence suggests that more positive affect arises as a function of perceiving a higher internal locus of control with regard to a given stimulus. On the other hand, ratings of potency instead display particularly high positive correlations with arousal. Both phenomena are significantly more pronounced in the case of negative than of positive words, both for dominance correlating with valence (r = .27 for positive words, r = .55 for negative words, z = 26.83; p < .001) and for potency correlating with arousal (ANEW) (r = .59 for positive words, r = .67 for negative words, z = 18.61; p < .001) (see Table 3b and c). This pattern shows that the two scales cannot be considered interchangeable and should not be used as if they were.
The specific contrast observed in our data seems to align well with influential appraisal models of emotion (Scherer, 1999; Scherer, Schorr, & Johnstone, 2001), according to which stimuli or events of emotional relevance initially evoke unconscious reactions of the autonomic nervous system, for instance, in the noradrenergic system. The potential of stimuli to trigger such responses might thus strongly determine ratings of potency, which accordingly appear to be strongly related to arousal ratings, arousal being the dimension of affective space that more or less directly refers to changes in the autonomic nervous system. According to Scherer (1999), these initial reactions to emotional input should then be followed by more conscious evaluation processes involving the evaluation of available coping strategies (presumably assessed by dominance ratings) and leading to a final evaluation of concepts, stimuli, or situations as being pleasant or unpleasant, which would then explain the tight positive correlation between valence and dominance ratings.
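
For readers who wish to run comparable checks on their own word samples, the following is a minimal sketch of one common way to compare two correlation coefficients obtained from independent word sets: the Fisher r-to-z test. The text above does not state which procedure produced the reported z values, so this is an assumption and not a reconstruction of the original analysis. The function name and the sample sizes in the usage lines are hypothetical, and the sketch is not expected to reproduce the z values reported above.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two correlations from independent samples (Fisher r-to-z).

    r1, r2: Pearson correlations; n1, n2: sample sizes (each must exceed 3).
    Returns the z statistic and the two-sided p value.
    """
    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical sample sizes, for illustration only; the correlations are
# the dominance-valence values for negative and positive words.
z, p = fisher_z_difference(0.55, 500, 0.27, 500)
print(f"z = {z:.2f}, p = {p:.3g}")
```

Because the standard error depends only on the two sample sizes, the same difference between correlations yields a larger z for larger word sets, so reported z values should always be read together with the number of items entering each correlation.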

In sum, more research is clearly needed to fully understand the interplay of the various dimensions constituting the affective space within and across different languages (see Conrad et al., 2011, who compared emotional event-related potential effects in response to German and Spanish words). The German adaptation of ANEW will be a further useful and valid tool to support future research on emotion in general, and cross-linguistic cognitive and psychophysiological research in particular. The new database extends the range of the classical ANEW dimensions with the newly added dimension of imageability, a variable that has been shown to influence word processing in a number of experimental studies (Altarriba & Bauer, 2004; Altarriba et al., 1999; Huang, Lee, & Federmeier, 2010; Kanske & Kotz, 2007; Kousta et al., 2011; Vigliocco et al., 2011) and that can be largely or entirely accounted for by two computable measures: the size and density of a word’s context, and the emotional associations of the word (Westbury et al., 2013).