Psychological research often involves stimuli with an affective meaning, such as words, pictures, odors, noises, stories, and films. We use the term affective to refer to an open-ended set of variables that are characteristic of phenomena including emotions, moods, attitudes, evaluation or appraisal, and feelings. These phenomena are related but differ in some respects. Emotions are often thought to consist of changes in multiple components, such as evaluation or appraisal, physiological responses, motor expressions (facial, vocal, gestural), action tendencies, and feelings (subjective experience). Moods are considered to have somewhat fewer components or to have less pronounced values for all of these components. Both emotions and moods are characterized by the variables that characterize their components. Examples of variables characterizing the feeling component of emotions and moods are valence, arousal, and power or dominance (Fontaine, Scherer, Roesch, & Ellsworth, 2007). Variables characterizing the appraisal component of emotions and moods are valence, goal relevance, goal congruence, power or coping potential, agency, novelty, and certainty (Ellsworth & Scherer, 2003). The variable most often mentioned as characterizing attitudes is valence.

Research on affective phenomena can be organized into various lines of research. A first line focuses on the processes involved in the production and perception of affective phenomena. This includes research on the processes involved in (1) the formation, activation, and change of attitudes (e.g., Hofmann, De Houwer, Perugini, Baeyens, & Crombez, 2010) and (2) the components of emotions, such as appraisal (e.g., Moors, 2010; Sander, Grandjean, & Scherer, 2005), action tendencies (e.g., Drake & Myers, 2006), somatic responses (e.g., Bauer, 1998; Lang, Bradley, & Cuthbert, 1998; Phan, Wager, Taylor, & Liberzon, 2002), expressive behavior (e.g., Russell, Bachorowski, & Fernandez-Dols, 2003), and feelings (e.g., Dan Glauser & Scherer, 2008). A second line of research focuses on the relation among different affective phenomena. This line is concerned with the interrelations among the various emotion components (appraisal, action tendencies, expressive behavior, bodily responses, and feelings; Scherer, 2009; Roseman & Evdokas, 2004). A third line of research focuses on the relation between affective and nonaffective phenomena. This includes research on the relations between emotions (or their components), moods, and attitudes, on the one hand, and attention, perception, memory, judgments, and decision making, on the other hand (Compton, 2003; Kensinger, 2004; Laney, Campbell, Heuer, & Reisberg, 2004; Levine & Pizarro, 2004; Williams, Mathews, & MacLeod, 1996; Vohs, Baumeister, & Loewenstein, 2007).

All these lines of research make use of stimuli with an affective meaning. For example, in research on the influence of evaluation on memory, the aim is to manipulate the content of evaluation and to measure its influence on memory. Manipulation of the content of evaluation is accomplished by presenting stimuli that are expected to be evaluated by the participants in a certain way—for example, as positive or negative (e.g., Bower, Gilligan, & Monteiro, 1981). For another example, to examine the influence of evaluations of power on action tendencies, researchers have primed participants with strong and weak words and measured their tendencies to approach and avoid (Smith & Bargh, 2008). Stimulus selection needs to proceed in such a way that researchers can be confident that most participants will evaluate the stimuli in the intended way. The preferred method for stimulus selection is to choose stimuli from previous rating studies.

Such rating studies have been reported for pictures (Lang, Bradley, & Cuthbert, 2008), sounds (Bradley & Lang, 1999b), and words in several languages, including English (Bradley & Lang, 1999a; Brown & Ure, 1969; Stevenson, Mikels, & James, 2007), Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007), French (Bertels, Kolinski, & Morais, 2009; Bonin et al., 2003; Corson & Quistrebert, 2000; Messina, Moré, & Cantraine, 1989; Niedenthal et al., 2004; Syssau & Font, 2005), German (Grühn & Smith, 2008; Hager & Hasselhorn, 1994; Lahl, Göritz, Pietrowsky, & Rosenberg, 2009; Võ et al., 2009), and Finnish (Eilola & Havelka, 2010). In Dutch, a study by Hermans and De Houwer (1994) provided valence ratings and subjective familiarity ratings for 740 Dutch words, of which 370 were adjectives referring to personality traits and 370 were nouns.

The present study provides affective ratings for 4,300 Dutch words. It goes beyond many other word norm studies (in Dutch and other languages) in several respects. We included more words, which belonged to more grammatical categories, and which were tested on more affective variables in more populations. Specifically, the set of 4,300 words consisted mainly of nouns, adjectives, adverbs, and verbs. The words were evaluated on the variables of valence/pleasantness, activity/arousal, and power/dominance. Power/dominance (also sometimes referred to as potency or control) has not often been included in previous word norming studies (but see Bradley & Lang, 1999a), even though it has been identified as an important variable in emotion research, in addition to valence and arousal (e.g., Fontaine et al., 2007; Osgood, Suci, & Tannenbaum, 1957). In addition, we collected ratings of age of acquisition (AoA), so that the stimuli can be controlled for or manipulated on this variable as well. We chose AoA because it has been presented as the fifth most important factor determining word recognition times, after frequency, word length, similarity to other words, and word onset (Kuperman, Stadthagen-Gonzalez, & Brysbaert, in press). Imageability is another variable that is sometimes controlled or tested, but Brysbaert, Lange, and Van Wijnendaele (2000) found that, in Dutch, it explains virtually no variance once the words are controlled for frequency and AoA (see also Cortese & Khanna, 2007). A list of imageability ratings in Dutch can be found in Van Loon-Vervoorn (1989). It includes ratings for about 67 % of the words in the present list.

The ratings were performed by equally sized groups of male and female students from two Belgian (Ghent and Leuven) and two Dutch (Rotterdam and Leiden-Amsterdam) samples, which allowed us to see how region independent they are. A further strength is that in our study, each participant rated the entire set of words for only one variable. This had the advantage that the ratings for one variable (e.g., valence/pleasantness) could not influence or “contaminate” the ratings for another variable (e.g., activity/arousal or power/dominance; cf. Bestgen & Vincze, 2012; Bradley & Lang, 1994).

Method

Participants

Participants were 224 students (112 women, 112 men) recruited from two universities in Flanders (the Dutch-speaking half of Belgium; Ghent University, N = 64; and the University of Leuven, N = 48) and three universities in The Netherlands (Erasmus University Rotterdam, N = 64; Leiden University, N = 41; and the University of Amsterdam, N = 7). The participants from Leiden University and the University of Amsterdam were treated as one sample. The remaining universities each constituted one sample. Each of the samples consisted of an equal number of women and men. Participants at Ghent and Leuven received 50 euros for their help. In Leiden and Amsterdam, they received 20 euros. In Rotterdam, they received course credits. The age of the participants ranged from 17 to 58 years (M = 22.08, SD = 4.49). The ratings were obtained between May 2011 and February 2012. Our participants were students because this is the population typically tested in the studies for which the ratings are meant.

Materials and procedure

We selected 4,300 Dutch words from various sources (De Deyne & Storms, 2008; Fontaine, Poortinga, Setiadi, & Suprapti, 2002; Fontaine et al., 2007; Frijda, Kuipers, & ter Schure, 1989; Hermans & De Houwer, 1994; Keuleers, Diependaele, & Brysbaert, 2010; Osgood et al., 1957; Rouckhout & Schacht, 2000; http://synoniemen.net/). The selection of words was guided by the idea that in addition to neutral words, we needed as many words as possible with a marked value for each of the three affective variables. The set mostly contained nouns, adjectives, adverbs, and verbs. We excluded most interjections, most plurals, diminutives, words that have become obsolete, words with a very low frequency in written language, and words that are uncommon in either region (Flanders/The Netherlands). Of the 740 words of the rating list of Hermans and De Houwer (1994), 715 were included in the present list. This allowed us to examine whether the valence ratings of these 715 words generalized to the present study.

Each participant rated the entire set of 4,300 words for one variable only: valence/pleasantness, activity/arousal, power/dominance, or AoA. In each sample, each affective variable (valence/pleasantness, activity/arousal, power/dominance) was rated by 8 women and 8 men. AoA was rated only at Ghent University and the Erasmus University Rotterdam (in each university by 8 women and 8 men). To reduce possible sequence effects, the order in which words appeared in the list was randomized for each participant separately.

Participants who agreed to take part in the study received an e-mail with an Excel file containing two sheets: The first sheet presented the instructions; the second sheet listed the 4,300 words. Samples of these Excel files for each variable are provided as supplementary materials to this article. Participants in the valence/pleasantness condition were asked to judge the extent to which the words in the study referred to something that is positive/pleasant (“positief/aangenaam”) or negative/unpleasant (“negatief/onaangenaam”), using a 7-point scale (1 = very negative/unpleasant, 2 = fairly negative/unpleasant, 3 = somewhat negative/unpleasant, 4 = neutral, 5 = somewhat positive/pleasant, 6 = fairly positive/pleasant, 7 = very positive/pleasant). To ensure that the participants understood the instructions, we provided the following examples with words that did not appear in the list:

If you think that “atom bomb” has a very negative meaning, please choose 1. If you think that “fantastic” has a very positive meaning, please choose 7. If you think that “sprouts” refers to something that is fairly unpleasant, please choose 2. If you think that “relaxing” refers to something that is fairly pleasant, please choose 6.

Participants in the activity/arousal condition were asked to judge the extent to which the words in the study referred to something that was active/arousing (“actief/opgewonden”) or passive/calm (“passief/kalm”), using a 7-point scale (1 = very passive/calm, 2 = fairly passive/calm, 3 = somewhat passive/calm, 4 = neutral, 5 = somewhat active/aroused, 6 = fairly active/aroused, 7 = very active/aroused). The examples provided for this dimension were the following:

If you think that “hammock” has a fairly passive meaning, please choose 2. If you think that “working” has a fairly active meaning, please choose 6. If you think that “meditating” has a very calm meaning, please choose 1. If you think that “hyperkinetic” has a very aroused meaning, please choose 7.

Participants in the power/dominance condition were asked to judge the extent to which the words in the study referred to something that was weak/submissive (“zwak/onderdanig”) or strong/dominant (“sterk/dominant”), using a 7-point scale (1 = very weak/submissive, 2 = fairly weak/submissive, 3 = somewhat weak/submissive, 4 = neutral, 5 = somewhat strong/dominant, 6 = fairly strong/dominant, 7 = very strong/dominant). The examples provided for this variable were the following:

If you think that “grass stalk” refers to something that is very weak, please choose 1. If you think that “avalanche” refers to something that is very strong, please choose 7. If you think that “servant” has a fairly submissive meaning, please choose 2. If you think that “revenge” has a fairly dominant meaning, please choose 6.

After reading the instructions, the participant opened the second sheet. The 4,300 words were presented in the first column. The participants rated each word by typing a number from 1 to 7 in the second column. After they had typed a number, the meaning of the number appeared in the third column (e.g., when the participant had pressed 2, the message “fairly passive/calm” appeared). When the participant typed a wrong number (outside of the 1–7 range), a red square with the message “wrong code” appeared. Participants were instructed to respond as accurately as possible, but not to think too long. They could type in the letter N when they did not know the word.

The same procedure was used in the AoA condition, except that participants were asked to enter the age at which they thought they had learned the word (Bird, Franklin, & Howard, 2001; Ghyselinck, De Moor, & Brysbaert, 2000). We clarified that this was the age at which they first understood the word when somebody else used it in their presence, even when they did not use the word themselves. The examples given for this variable were the following:

If you think you learned “banana” when you were 3 years old, please fill in 3. If you think you learned “accountant” when you were 11 years old, please fill in 11.

The validity of AoA ratings has been confirmed in studies that obtained a high correlation between AoA ratings and the percentage of words known by children of various ages (e.g., De Moor, Ghyselinck, & Brysbaert, 2000; Morrison, Chappell, & Ellis, 1997). Participants were asked to send the completed file back via e-mail to the experimenter in approximately 2 weeks. Afterward, they were invited to collect the monetary reward or course credits.

Results

Outlier analysis

We conducted the following outlier analysis. First, we discarded all ratings on which participants indicated that the word was unknown to them (1.1 %). We then calculated the mean and SD for each word. Next, we counted for each participant the percentage of words for which their rating deviated by more than 2.5 SDs from the mean. Only 1 participant (who rated AoA) had a high percentage of outliers (30.8 %) and was discarded. The percentage of outliers for the other participants ranged from 0 to 17.5 (M = 1.4, SD = 2.3). We then calculated the mean and SD for each word a second time on the remaining data. Furthermore, we excluded the ratings for one word because it had been typed incorrectly in the Excel files. Finally, there were 42 missing values on a total of 963,200 ratings. All in all, 947,462, or 98.4 %, valid ratings were obtained.
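The per-participant outlier count described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the data layout (a dictionary of participants mapping words to ratings), the function names, the use of the sample SD, and the strict "more than 2.5 SDs" comparison are all assumptions. No cutoff for discarding a participant is applied, because none is stated in the text.

```python
from statistics import mean, stdev

def drop_unknown(ratings):
    """Remove ratings marked "N" (word unknown to the participant)."""
    return {p: {w: x for w, x in words.items() if x != "N"}
            for p, words in ratings.items()}

def outlier_percentages(ratings, threshold=2.5):
    """Return, per participant, the percentage of their ratings that lie
    more than `threshold` SDs from the mean rating of the word.

    `ratings` is a hypothetical layout: {participant: {word: rating}}.
    """
    ratings = drop_unknown(ratings)

    # Gather all ratings given to each word.
    by_word = {}
    for words in ratings.values():
        for word, value in words.items():
            by_word.setdefault(word, []).append(value)

    # Mean and sample SD per word (an SD needs at least two ratings).
    stats = {w: (mean(v), stdev(v)) for w, v in by_word.items() if len(v) >= 2}

    result = {}
    for participant, words in ratings.items():
        rated = [(w, x) for w, x in words.items() if w in stats]
        flagged = sum(
            1 for w, x in rated
            if stats[w][1] > 0 and abs(x - stats[w][0]) > threshold * stats[w][1]
        )
        result[participant] = 100.0 * flagged / len(rated) if rated else 0.0
    return result
```

After excluding a participant, the same per-word statistics would simply be recomputed on the remaining data, as in the second pass described above.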

Ratings of the affective variables and AoA

An Excel file with the raw data is provided as supplementary materials to this article. It contains the 4,300 words in alphabetical order, together with their English translations (based on Google Translate and Van Dale Groot Woordenboek) and the mean values (Ms), standard deviations (SDs), and sample sizes (Ns) for valence/pleasantness (V), activity/arousal (A), power/dominance (P), and age of acquisition (AoA). The file also contains information about word frequency (FR) and number of letters (Let). The frequency scores were taken from the SUBTLEX-NL database (Keuleers, Brysbaert, & New, 2010). The file contains both frequency per million words and log10 of frequency per million words. Forty words in our study did not appear in the SUBTLEX-NL database. Following Brysbaert and New (2009) and Keuleers et al. (2010), we assigned values of freq pm = .02 and log10 = −1.64 to these words, in line with the size of the SUBTLEX-NL corpus (43.8 million words).
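The floor values assigned to the 40 words missing from SUBTLEX-NL can be checked with a short calculation. The sketch below assumes that the floor corresponds to a single occurrence in the 43.8-million-word corpus; the text only says the values are "in line with" the corpus size, so this reading is an interpretation. Under that assumption, the reported log10 value of −1.64 follows from the unrounded frequency (about .0228 per million) rather than from the rounded value of .02.

```python
import math

# Corpus size in millions of words, as stated in the text.
CORPUS_SIZE_MILLIONS = 43.8

# Assumption: the floor equals one occurrence in the whole corpus,
# expressed as a frequency per million words.
floor_per_million = 1 / CORPUS_SIZE_MILLIONS
floor_log10 = math.log10(floor_per_million)

print(round(floor_per_million, 2))  # 0.02
print(round(floor_log10, 2))        # -1.64
```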

The data for the first four variables are split into three columns: the data of the global sample (All), followed by those of the women (Women), and those of the men (Men). Furthermore, there is a column with the percentage of participants (across all ratings) who indicated that they did not know the word. Finally, we added a column in which each word received a code for the most frequent grammatical category (part of speech) to which it belongs: nouns (N), adjectives and adverbs (A), verbs (V), and a small rest category with numerals and interjections (R). Researchers are referred to the SUBTLEX-NL file for more information about the words (Keuleers et al., 2010a; also available online at http://crr.ugent.be/isubtlex/). It may be noted that, in line with most previous research, participants did not receive explicit instructions about ambiguous words. Thus, this ambiguity may be reflected in the rating variability.

Descriptive statistics of the variables are presented in Table 1. Figures 1, 2, 3, and 4 show plots of the means and standard deviations of the ratings (together with the English translations of some outliers) for all dependent variables. The scatterplot for valence/pleasantness (Fig. 1) shows that there are two types of words in the midrange (around the score of 4): (1) words with low SDs upon which participants agree that they are neutral and (2) words with high SDs that elicited both high and low values from different participants (examples are “pugnacious” and “complacent”). Inspection of the scatterplot for arousal/activity (Fig. 2) shows that there is more consensus about the high-arousing and low-arousing words than about the words in the midrange (around the score of 4). The scatterplot for power/dominance (Fig. 3) is somewhat similar to that for valence, but less pronounced. Finally, the scatterplot of AoA (Fig. 4) shows that the SDs increase with increasing means. This suggests that participants learn similar words in the first years of life but show more variability in later years (also recall that participants were not using a Likert scale for this variable).

Table 1 Summary of variables included in the word list with means (Ms), standard deviations (SDs), and range

Fig. 1 Mean valence/pleasantness ratings plotted against the SDs for these ratings for all 4,299 words

Fig. 2 Mean arousal/activity ratings plotted against the SDs for these ratings for all 4,299 words

Fig. 3 Mean power/dominance ratings plotted against the SDs for these ratings for all 4,299 words

Fig. 4 Mean AoA ratings plotted against the SDs for these ratings for all 4,299 words

Reliability

We calculated the split-half reliabilities for each sample separately. Samples were split into halves by using the entrance ranks of the participants (separately for males and females) and making a distinction between the participants with odd and even ranks. For each group, we calculated the mean rating for each word, and we then correlated the means of both groups. As is shown in Table 2, the adjusted correlations using the Spearman–Brown formula were very high, ranging from r = .82 to r = .97. Furthermore, we obtained high correlations of at least r = .82 between the samples. The fact that the correlations between samples were as high as the correlations within samples indicates that the ratings of the words were not subject to strong regional differences, meaning that the average values can be used across the entire Dutch-speaking area (remember that we selected words known in both Flanders and The Netherlands).
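The odd/even split-half procedure with the Spearman–Brown adjustment can be sketched as below. This is an illustrative simplification under stated assumptions: participants are given as a list already ordered by entrance rank, every participant has a rating for every word, and the separate ranking of men and women used in the study is not reproduced. The function names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r):
    """Step up a half-test correlation to full-test length."""
    return 2 * r / (1 + r)

def split_half_reliability(ratings_by_participant):
    """`ratings_by_participant`: list of rating lists (one per participant,
    same word order in each), sorted by entrance rank."""
    odd = ratings_by_participant[0::2]   # ranks 1, 3, 5, ...
    even = ratings_by_participant[1::2]  # ranks 2, 4, 6, ...
    # Mean rating per word within each half.
    odd_means = [mean(col) for col in zip(*odd)]
    even_means = [mean(col) for col in zip(*even)]
    r = pearson(odd_means, even_means)
    return r, spearman_brown(r)
```

The adjustment is needed because each half contains only half of the raters, so the raw correlation underestimates the reliability of the means based on the full sample.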

Table 2 Split-half reliabilities for each variable within and between samples
Table 3 Correlations between the variables

To further test the generalizability of our ratings, we correlated them with ratings from previous studies. For the valence ratings, there were 715 words in common with Hermans and De Houwer (1994). Figure 5 shows a strong linear relationship between the ratings of both studies, r = .96. For AoA, we correlated our ratings with those of Ghyselinck et al. (2000) and Ghyselinck, Custers, and Brysbaert (2003). For the first study, there were 1,307 words in common and a correlation of r = .93; for the second study, there were 710 words in common and a correlation of r = .95.

Fig. 5 Mean valence ratings in the present sample plotted against the mean valence ratings of the sample of Hermans and De Houwer (1994) for all 715 retested words

Correlations between variables

Pearson correlations were calculated between the affective variables, AoA, frequency, and word length (i.e., number of letters) (Table 3). No linear relation was found between valence/pleasantness and activity/arousal, but we did obtain a quadratic relation: After centering the mean ratings of valence/pleasantness, we obtained a positive correlation (r = .29) between the square of the centered valence/pleasantness scores and activity/arousal (Fig. 6). Power/dominance had a positive correlation with valence/pleasantness (r = .27; Fig. 7) and a high positive correlation with activity/arousal (r = .59; Fig. 8). Thus, words rated as more dominant were also rated as more positive and more active.
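The test for a quadratic (U-shaped) relation can be sketched as follows: the valence means are centered, squared, and then correlated with the arousal means. The sketch assumes that centering is done on the mean of the word means rather than on the scale midpoint of 4, which the text does not specify. The function names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def linear_and_quadratic_r(valence_means, arousal_means):
    """Return (linear r, r between squared centered valence and arousal)."""
    centre = mean(valence_means)  # assumption: center on the mean of means
    squared = [(v - centre) ** 2 for v in valence_means]
    return (pearson(valence_means, arousal_means),
            pearson(squared, arousal_means))
```

A positive second value combined with a first value near zero indicates that both very negative and very positive words tend to be rated as more arousing than neutral words.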

Fig. 6 Mean valence/pleasantness ratings plotted against mean arousal/activity ratings

Fig. 7 Mean valence/pleasantness ratings plotted against power/dominance ratings

Fig. 8 Mean arousal/activity ratings plotted against mean power/dominance ratings

AoA correlated negatively with valence/pleasantness (r = −.17) and positively with power/dominance (r = .08), suggesting that words that were learned early in life were rated as more positive and less dominant. No linear relation was found between AoA and activity/arousal (r = .03).

Frequency had a positive correlation with valence/pleasantness (r = .15), activity/arousal (r = .10), and power/dominance (r = .17), indicating that frequent words were rated as more positive, more active, and more dominant.

Word length had a low negative correlation with valence/pleasantness (r = −.08), as well as low positive correlations with activity/arousal (r = .19) and power/dominance (r = .08). This means that longer words were rated as slightly more negative, somewhat more active, and slightly more dominant. AoA had a strong negative correlation with frequency (r = −.60) and a positive correlation with word length (r = .33), indicating that words learned early in life are more frequent and shorter. Frequency and word length (r = −.25) also correlated negatively, which means that more frequent words are shorter. All reported correlations were significant, with p < .001.

Discussion

We collected word norms for 4,300 Dutch words for the affective variables valence/pleasantness, activity/arousal, and power/dominance and for AoA. Ratings for the first three variables were performed with 7-point Likert scales; the AoA ratings reflect the age at which participants thought that they had acquired the words. Ratings were collected at different universities to make sure that they applied to all Dutch-speaking regions. Virtually all words belong to the grammatical categories of nouns, adjectives, adverbs, and verbs. Our study goes beyond previous studies (in Dutch and other languages) in that we obtained ratings on more affective variables, for a larger set of words, covering more grammatical categories, and carried out by more populations.

We observed high split-half reliabilities within samples and equally high correlations between samples, indicating that there is a large agreement among the students within and between the various samples. We also found that the ratings of previous, more limited studies (Ghyselinck et al., 2003; Ghyselinck et al., 2000; Hermans & De Houwer, 1994) generalized to those of the present study. We can therefore conclude that the norms that we obtained are reliable and can be used confidently for the selection of words in affective research.

An exploration of the relations between the affective variables revealed a quadratic relation between valence/pleasantness and activity/arousal. This confirms previous findings of a small but consistent U-shaped relationship between valence and arousal in studies with words (e.g., Bradley & Lang, 1999a; Kanske & Kotz, 2010; Redondo et al., 2007; Võ et al., 2009) and pictures (e.g., Bradley & Lang, 1994; Cuthbert, Bradley, & Lang, 1996).

We also found positive correlations of power/dominance with valence/pleasantness and with activity/arousal. Few previous studies collected ratings for dominance in addition to ratings for valence and arousal, and even fewer studies reported on the relation between dominance and other variables. Studies that did collect ratings for dominance (or related constructs such as potency or control; Bradley & Lang, 1994, 1999a; Grühn & Smith, 2008; see also Keltner, Gruenfeld, & Anderson, 2003) reported positive correlations between dominance and valence. The results are mixed for dominance and arousal: Grühn and Smith (2008; 200 words) reported no correlation (r = −.09, n.s.); an analysis performed by us on the Bradley and Lang (1999a; 1,030 words) data revealed a weak positive correlation (r = .07, p = .021); Bradley and Lang (1994; 21 pictures) reported negative correlations (ranging from r = −.14 to r = −.57). Several factors may have contributed to this divergence. First, participants in our study rated the active and dominant meaning of the stimuli, whereas participants in the other studies rated their own feelings of activity and dominance in response to the stimuli. Thus, a participant may rate a snake as having an active and dominant meaning but his/her own feelings as active and submissive. This may have played less of a role in the Bradley and Lang (1999a) study, because the stimuli were words that referred not only to emotion-eliciting stimuli (like snakes and injuries) but also to emotional states (like fear and anger). Second, the divergent correlations may be due to differences between the samples of words tested (given that each study presented only a subsample of the words). Bradley and Lang (1994), for instance, collected ratings for only 21 pictures. As in all rating studies, the correlations obtained reflect the structure of the specific stimulus set used. Larger stimulus sets are more likely to be representative of the universe of stimuli than are smaller stimulus sets. Third, participants in our study each rated only one affective variable (i.e., between-subjects design), whereas participants in the other studies rated all affective variables (i.e., within-subjects design). Thus, it could be argued that the participants in our study were focused less on the differences between dominance and arousal than were the participants in the other studies. This may explain why we obtained a stronger positive correlation between dominance and arousal than did the other word rating studies (Bradley & Lang, 1999a; Grühn & Smith, 2008).

Several of the other patterns of correlations that we observed are compatible with previous findings as well. That is, other studies confirmed that words learned early in life are more positive (Citron, Weekes, & Ferstl, 2009), more frequent (Citron et al., 2009; Ghyselinck et al., 2000; Morrison et al., 1997; Stadthagen-Gonzalez & Davis, 2006), and shorter (Ferrand et al., 2008), that frequent words are more positive (Grühn & Smith, 2008) and shorter (Ferrand et al., 2008; Grühn & Smith, 2008), and that high-arousing words are longer (Grühn & Smith, 2008). To conclude, we believe that the present study will be a valuable source of information for affective research that makes use of Dutch words.