The many faces of a face: Comparing stills and videos of facial expressions in eight dimensions (SAVE database)

Garrido, Margarida V.; Lopes, Diniz; Prada, Marília; Rodrigues, David; Jerónimo, Rita; Mourão, Rui P.

doi:10.3758/s13428-016-0790-5

The many faces of a face: Comparing stills and videos of facial expressions in eight dimensions (SAVE database)

Published: 29 August 2016

Volume 49, pages 1343–1360, (2017)
Cite this article

Download PDF

Behavior Research Methods Aims and scope Submit manuscript

The many faces of a face: Comparing stills and videos of facial expressions in eight dimensions (SAVE database)

Download PDF

Margarida V. Garrido^1,2,
Diniz Lopes²,
Marília Prada²,
David Rodrigues^2,3,
Rita Jerónimo² &
…
Rui P. Mourão⁴

5377 Accesses
29 Citations
1 Altmetric
Explore all metrics

Abstract

This article presents subjective rating norms for a new set of Stills And Videos of facial Expressions—the SAVE database. Twenty nonprofessional models were filmed while posing in three different facial expressions (smile, neutral, and frown). After each pose, the models completed the PANAS questionnaire, and reported more positive affect after smiling and more negative affect after frowning. From the shooting material, stills and 5 s and 10 s videos were edited (total stimulus set = 180). A different sample of 120 participants evaluated the stimuli for attractiveness, arousal, clarity, genuineness, familiarity, intensity, valence, and similarity. Overall, facial expression had a main effect in all of the evaluated dimensions, with smiling models obtaining the highest ratings. Frowning expressions were perceived as being more arousing, clearer, and more intense, but also as more negative than neutral expressions. Stimulus presentation format only influenced the ratings of attractiveness, familiarity, genuineness, and intensity. The attractiveness and familiarity ratings increased with longer exposure times, whereas genuineness decreased. The ratings in the several dimensions were correlated. The subjective norms of facial stimuli presented in this article have potential applications to the work of researchers in several research domains. From our database, researchers may choose the most adequate stimulus presentation format for a particular experiment, select and manipulate the dimensions of interest, and control for the remaining dimensions. The full stimulus set and descriptive results (means, standard deviations, and confidence intervals) for each stimulus per dimension are provided as supplementary material.

Perceived emotion genuineness: normative ratings for popular facial expression stimuli and the development of perceived-as-genuine and perceived-as-fake sets

Article 07 December 2016

Amy Dawel, Luke Wright, … Elinor McKone

Professional actors demonstrate variability, not stereotypical expressions, when portraying emotional states in photographs

Article Open access 19 August 2021

Tuan Le Mau, Katie Hoemann, … Lisa Feldman Barrett

Padova Emotional Dataset of Facial Expressions (PEDFE): A unique dataset of genuine and posed emotional facial expressions

Article Open access 24 August 2022

A. Miolla, M. Cardaioli & C. Scarpazza

The human face plays a fundamental role in social interaction. The perception of facial attributes of our conspecifics, for instance, seems crucial for evaluating whether a person is approachable or avoidable (Oosterhof & Todorov, 2008). Indeed, many of the inferences, judgments, and decisions we make about other people are based on their physical appearance, namely their facial features (for an extensive review, see Calder, Rhodes, Johnson, & Haxby, 2011).

Faces communicate a variety of information about a person, from gender, ethnic background, and age, to affective states. For example, people form personality impressions from the facial appearance of other individuals, a process often based on rapid, intuitive, and unreflected mechanisms (Ferreira et al., 2012). Evidence for the validity of the information inferred from facial appearance is mixed with some studies suggesting that these inferences can be fairly accurate, and others showing that facial cues are often misinterpreted (Olivola & Todorov, 2010b; Todorov, Olivola, Dotsch, & Mende-Siedlecki, 2015). Whether accurate or not, people do act upon this information with consequential effects in a variety of domains including mate choices, economic decisions, sentencing decisions, and occupational and electoral success (for reviews, see Todorov, 2012; Todorov et al., 2015).

In the specific domain of face recognition, research has been conducted in a wide range of topics, including identity perception (e.g., Grill-Spector & Kanwisher, 2005), emotion recognition (e.g., Calvo & Nummenmaa, 2016; Russell, 1994), gender discrimination, and age recognition (e.g., M. G. Rhodes, 2009; T. Watson, Otsuka, & Clifford, 2015). These topics have been investigated by means of a variety of methods, including behavioral, cognitive, computational, and neuroimaging (see Calder et al., 2011). Moreover, faces have also been used as stimulus materials in a multiplicity of research areas, including emotion (e.g., Ekman & Friesen, 1971), mimicry (e.g., Hess & Fischer, 2013), emotional contagion (e.g., Hess & Blairy, 2001), interpersonal attractiveness (e.g., Olson & Marshuetz, 2005), weight estimation (e.g., T. M. Schneider, Hecht, & Carbon, 2012), affective priming (e.g., Murphy & Zajonc, 1993), impression formation and person memory (e.g., Todorov et al., 2015), communication and intergroup relations (e.g., Van der Schalk, Fischer, et al., 2011), and eyewitness identification (e.g., Lindsay, Mansour, Bertrand, Kalmet, & Melsom, 2011), and in the study of neuro- and psychological disorders such as autism, prosopagnosia, schizophrenia, and mood disorders (e.g., Behrmann, Avidan, Thomas, & Nishimura, 2011).

Considering the importance and extensive use of faces as stimulus materials, the availability of validated sets acquires utmost importance for the scientific community. The variety of these sets is also important, not only in terms of model features (e.g., age, sex, ethnicity, facial expression) and the dimensions included in the validation procedures (e.g., valence, clarity), but also in stimulus formats (e.g., stills, videos). However, despite the fact that in our daily interactions we often perceive people in motion, most of the available databases include static facial images, which may challenge their ecological validity (e.g., Koscinski, 2013; G. Rhodes et al., 2011; Van der Schalk, Hawk, et al., 2011), and even fewer compare the same faces in different formats (i.e., static vs. dynamic).

In this article, we develop and validate a new set of Stills And Videos of facial Expressions—the SAVE database—that provides norms for the same model displaying three facial expressions (frown, neutral, and smile). The motivations to develop these norms were to validate a stimulus set in both still and video formats and across wide range of relevant evaluative dimensions, as well as to contribute to the phenotypic diversity of the models included in these types of databases. These norms will be useful for different experimental paradigms, particularly when the manipulation (and strict control) of the stimulus characteristics and presentation format is required by the varying demands of different researchers. The review of the available databases presented in the subsequent section will further clarify the relevance of the present work.

Facial expressions databases

The available literature already offers a great amount of validated databases of facial expressions (for a review, see Bänziger, Mortillaro, & Scherer, 2012; for an extensive list, see www.face-rec.org/databases/). These databases are highly diverse regarding model characteristics (e.g., human vs. computer-generated, age, ethnicity, nationality, professional actors or amateur volunteers), expressions portrayed (e.g., specific emotions or mental states), stimulus format (stills or videos), and validation procedures (e.g., coding systems, sample characteristics, and evaluative dimensions included).

Most of the reviewed databases include real human models, with a few including morphed human faces (e.g., Max Planck Institute Head Database: Troje & Bülthoff, 1996) and even avatars (e.g., Fabri, Moore, & Hobbs, 2004). We will focus on databases that include real human models. Among these, some include professional actors (NimStim Set of Facial Expressions: Tottenham et al., 2009), but most use lay volunteers that are either extensively trained (e.g., in using explicit guidelines regarding the optimal representation of the intended facial expressions; Ebner, Riediger, & Lindenberger, 2010) or coached by the experimenters (e.g., by encouraging models to imagine situations that would elicit the intended facial expressions; Warsaw Set of Emotional Facial Expression Pictures—WSEFEP: Olszanowski et al., 2015).

Also, the models portrayed in the different databases are highly diverse in terms of age. For example, some databases exclusively include stimuli portraying children, such as the NIMH Child Emotional Faces Picture Set (Egger et al., 2011), the Dartmouth Database of Children’s Faces (Dalrymple, Gomez, & Duchaine, 2013), or the Child Affective Facial Expression set (LoBue & Thrasher, 2015). Yet, most databases include young to middle-aged adults (e.g., the Karolinska Directed Emotional Faces—KDEF: Lundqvist, Flykt, & Öhman, 1998; NimStim: Tottenham et al., 2009; WSEFEP: Olszanowski et al., 2015) or older adults (e.g., FACES: Ebner et al., 2010).

Face databases also vary in the nationality and ethnicity of the models. For instance, they include Argentinian (Argentine Set of Facial Expressions of Emotion: Vaiman, Wagner, Caicedo, & Pereno, 2015), Chinese (Wang & Markham, 1999), Polish (WSEFEP: Olszanowski et al., 2015), or Swedish models (Umeå University Database of Facial Expressions: Samuelsson, Jarnvik, Henningsson, Andersson, & Carlbring, 2012). Regarding the models’ ethnicity, most databases include exclusively (e.g., Radboud Faces Database—RaFD: Langner et al., 2010; FACES: Ebner et al., 2010; KDEF: Lundqvist et al., 1998) or a majority (the McEwan Faces: McEwan et al., 2014; NimStim: Tottenham et al., 2009) of white or European descent models (for exceptions, see, e.g., the Chicago Face Database—CFD [Ma, Correll, & Wittenbrink, 2015] and the Japanese and Caucasian Facial Expression of Emotion—JACFEE [Matsumoto & Ekman, 1988]).

Regarding the facial expressions portrayed by the models, most stimulus sets include at least a subset of the following emotions: anger, disgust, fear, happiness, sadness, surprise, contempt (e.g., Pictures of Facial Affect: Ekman & Friesen, 1976; JACFEE: Matsumoto & Ekman, 1988; RaFD: Langner et al., 2010; FACES: Ebner et al., 2010), and some also include neutral facial expressions (e.g., NimStim: Tottenham et al., 2009). Others further include embarrassment, pride and shame (University of California, Davis, Set of Emotion Expressions: Tracy, Robins, & Schriber, 2009), or kindness and critical facial expressions (e.g., McEwan et al., 2014). Some databases also contain body expressions (e.g., Bochum Emotional Stimulus Set: Thoma, Soria Bauser, & Suchan, 2013; Bodily Expressive Action Stimulus Test: de Gelder & Van den Stock, 2011).

Regarding the validation procedures, a few databases resort to highly trained raters (using facial action units to evaluate the expressions; e.g., Ekman & Friesen, 1977, 1978), whereas others have used samples of untrained volunteers (e.g., CAFE: LoBue & Thrasher, 2015; NimStim: Tottenham et al., 2009).

Most validation studies have only assessed a limited set of dimensions. Indeed, validation procedures usually focus on emotion recognition, either using forced choice tasks (e.g., Vaiman et al., 2015) or rating scales (e.g., agreement with items such as “This person seems to be angry”; Samuelsson et al., 2012). To our knowledge, only a few exceptions go beyond emotion recognition. For example, the CFD (Ma et al., 2015) also includes target categorization measures (age estimation, racial/ethnic categorization, gender identification), and a set of subjective ratings (e.g., threatening, masculine, feminine, baby-faced, attractive, trustworthy, unusual) as well as objective physical facial features (e.g., nose width, lip thickness, face length, distance between pupils). Likewise, the RaFD (Langner et al., 2010) includes measures of intensity, clarity and genuineness of expression as well as overall valence and target attractiveness.

Another distinctive feature in the available face databases is stimulus format. Most databases include static stimuli (i.e., stills or photographs of facial expressions). However, a few video databases have recently been developed and validated (e.g., MAHNOB Laughter Database: Petridis, Martinez, & Pantic, 2013; Cohn–Kanade AU-Coded Facial Expression Database: Kanade, Cohn, & Tian, 2000; Geneva Multimodal Emotion Portrayals Core Set: Bänziger et al., 2012; and the Amsterdam Dynamic Facial Expression Set: Van der Schalk, Hawk, et al., 2011). For example, ADFES includes brief videos (maximum 6.5 s) of North-European (Dutch) and Mediterranean (second- or third-generation migrants of Turkish or Moroccan descent) models displaying joy, anger, sadness, fear, disgust, surprise, contempt, pride, and embarrassment. These videos were evaluated regarding emotion recognition and model ethnicity, but also in other dimensions such as overall valence, arousal as well as perceived directedness, perceived causation of the emotion, liking and approach–avoidance. Another example is the EU-Emotion Stimulus Set (O’Reilly et al., 2016), which includes videos (2–52 s long) of a broader set of 20 emotions/mental states (e.g., afraid, happy, sad, bored, jealous, and sneaky), and also body gestures and contextual social scenes. The videos were assessed regarding emotional display (forced choice task), valence and intensity of the expression, and the arousal felt by the participants upon exposure to a given video.

Static versus dynamic facial expressions

The majority of studies addressing the processing of facial information have predominantly used static facial stimuli. However, the extensive use of these types of stimuli has recently been questioned (e.g., Horstmann & Ansorge, 2009; Roark, Barrett, Spence, Abdi, & O’Toole, 2003). Specifically, the critiques refer to the low ecological validity of static stimuli, namely because they lack in temporal aspects of facial motion that are relevant for the recognition of facial expressions (e.g., Alves, 2013; O’Reilly et al., 2016). Presentation format might thus affect the amount of information that is retrieved from a given stimulus (e.g., Langlois et al., 2000).

Studies comparing static and dynamic facial expressions (with the latter referring to the buildup of a facial expression from a baseline expression to the full-display of the emotion) are scarce. However, the few studies that have compared participants’ ability to recognize expressions evolving through time from static images of full expressions (e.g., Cunningham & Wallraven, 2009; Fiorentini & Viviani, 2011) have reported interesting results. For example, perceivers performed better when pictures of emotional displays were presented in sequence than with a static presentation (Wehrle, Kaiser, Schmidt, & Scherer, 2000). Moreover, some studies have suggested that perceivers exposed to dynamic (vs. static) emotional displays were more physiologically aroused and exhibited greater facial mimicry (for a review, see Rymarczyk, Żurawski, Jankowiak-Siuda, & Szatkowska, 2016).

But these studies have yielded far from consensual results. For example, some studies reported that dynamic stimuli offer processing advantages (e.g., Ambadar, Schooler, & Cohn, 2005; Bould & Morris, 2008; Cunningham & Wallraven, 2009; Wehrle et al., 2000), namely due to the salience of emotional expressions during dynamic stimuli exposure (e.g., Horstmann & Ansorge, 2009; Rubenstein, 2005). Other studies suggested the contrary (e.g., Fiorentini & Viviani, 2011; Katsyri & Sams, 2008). Yet others indicated no difference between the effects of different stimulus presentation formats. For example, Gold and colleagues (2013) observed that the dynamic properties of facial expressions play a very small role in the perceivers’ ability to recognize facial expressions. Also, the results from Hoffmann, Traue, Limbrecht-Ecklundt, Walter, and Kessler (2013) suggest that stimulus presentation format does not influence the overall recognition of emotions, although the recognition of specific emotions (e.g., surprise and fear) seems to benefit from dynamic presentation. In the context of research on facial attractiveness, static versus dynamic stimuli comparisons did not yield differences in the evaluation of target attractiveness (e.g., Koscinski, 2013; G. Rhodes et al., 2011).

However, the disparity of results across studies may derive from confounds between the amount of information conveyed by static versus dynamic stimuli and the perceivers ability to use such information (Gold et al., 2013). The lack of consistency across studies may also result from the nature of the stimulus materials used. Although some studies have used stimuli based on real human models (actors or nonactors), others included avatars or computer edited faces (e.g., Cigna et al., 2015; Gold et al., 2013; Horstmann & Ansorge, 2009). Yet some authors (e.g., Sato, Fujimura, & Suzuki, 2008) suggest that the use of “real people” is more suitable when using dynamic stimuli.

Finally, and to the best of our knowledge, none of these databases included stimuli that matched the same facial expression across formats (i.e., stills and videos), with the facial expression set “on hold” for a fixed period of time in the video format, and compared them across several subjective dimensions. This is exactly the type of stimulus set we are presenting and validating in this article.

Overview

The present article presents a set of standardized stimulus materials of real human faces that combines important features of the available databases and that can be adjustable to specific research demands. Our database includes subjective normative ratings of stills and videos (of 5 and 10 s) of the same model displaying a negative (frown), neutral, and positive (smile) facial expressions, in several relevant dimensions (attractiveness, arousal, clarity, genuineness, familiarity, intensity, valence, similarity).

We had multiple motivations for developing these norms. First, most of the existing sets include emotion recognition as the main dependent variable. Instead, in the present study we were interested in having each face evaluated in several dimensions. The normative ratings in several dimensions allows the selection of subsets of stimuli to manipulate a specific dimension (e.g., valence) while controlling for others (e.g., attractiveness), particularly in studies using faces as stimuli outside the emotion recognition domain.

Second, and as stated above, evidence comparing emotion recognition of dynamic versus static stimuli is mixed. To our knowledge, none of the existing validated sets permits the comparison of the same model displaying the exact same facial expression in different formats (stills vs. videos) across several dimensions. Our set includes these types of stimulus. Importantly, our videos do not depict the buildup of an expression, but present a facial expression set on hold. This allows a direct comparison between stimuli formats. Moreover, we investigate the impact of these stimuli formats in several subjective dimensions.

Third, our database also includes faces with neutral expression (e.g., NimStim: Tottenham et al., 2009) that can be used as a baseline against which the effects of other facial expressions are compared with. For example, the evidence that a positive or negative face prime influences performance (e.g., Murphy & Zajonc, 1993) becomes more convincing when such a baseline is used.

Finally, although previous sets have included models from different nationalities, no published set includes Portuguese models. Note that most European databases were developed and tested in Northern Europe with models that often have phenotypic features (e.g., hair or eye color) that are different from Southern European ones. Moreover, even those databases that included (the so-called) Mediterranean models (Turkish or Moroccan descendants; ADFES: Van der Schalk, Hawk, et al., 2011) may not be suitable as the facial features of people in other Mediterranean countries in Europe (e.g., Spain, Italy, France, Greece) can be phenotypically quite different from, at least, those of Moroccan models.

In the following section, we provide an overview of the dimensions that have been reported in the literature and that were used in the present study to evaluate the faces. The relevance of these dimensions for face evaluation and their associations are also discussed. These dimensions were selected from those that are commonly used to evaluate other types of visual stimuli (e.g., symbols: Prada, Rodrigues, Silva, & Garrido, 2015; pictures [IAPS]: Lang, Bradley, & Cuthbert, 2008), as well as from face databases (e.g., RaFD: Langner et al., 2010; ADFES: Van der Schalk, Hawk, et al., 2011) that go beyond the scope of emotion recognition.

Dimensions of interest

Valence

Valence is defined by the intrinsic attractiveness or aversiveness of a given stimulus (e.g., Frijda, 1986). Not only it is a basic property of emotion experience, but is also a fundamental component of emotional responding (Barrett, 2006). Therefore, emotional valence can modulate the characteristics and intensity of emotional responses (Adolph & Alpers, 2010; Nyklicek, Thayer, & Van Doornen, 1997). This modulation is especially true for facial stimuli (Langner et al., 2010; Russell & Bullock, 1985). For example, eyebrow frowning (produced by contracting the corrugator supercilii) is associated with unpleasant experiences, and raised lip corners (produced by contracting the zygomaticus major) are associated with pleasant ones (for a review, see Colombetti, 2005). The valence of facial stimuli has been assessed in a few validation studies (e.g., Adolph & Alpers, 2010; Langner et al., 2010; McEwan et al., 2014; O’Reilly et al., 2016; Van der Schalk, Hawk, et al., 2011). In the present study, we asked participants to indicate the extent to which the expression displayed by the target was negative–positive (1 = Very negative, 7 = Very positive; e.g., Langner et al., 2010; McEwan et al., 2014; O’Reilly et al., 2016).

Arousal

Arousal emerges as a highly relevant dimension of affect, differentiating states of excitement or high activation from states of calm/relaxed or low activation (Osgood, Suci, & Tannenbaum, 1957). Arousal has been assessed in some of the available normative ratings of facial stimuli (e.g., Adolph & Alpers, 2010; Goeleven et al., 2008; McEwan et al., 2014; Van der Schalk, Hawk, et al., 2011). Some of these studies have already established that arousal interacts with other variables, as it is the case of valence. Indeed, convergent empirical evidence indicates that the higher the positive or negative valence of a stimulus is, the more arousing the stimulus is perceived to be (Backs, da Silva, & Han, 2005; Barrett & Russell, 1998; Ito, Cacioppo, & Lang, 1998; Lang et al., 2008; Libkuman, Otani, Kern, Viger, & Novak, 2007). In the present study, arousal was measured for each stimulus by asking participants to indicate to which extent the expression displayed by the target was relaxed or excited (1 = Very relaxed, 7 = Very excited; e.g., McEwan et al., 2014; Van der Schalk, Hawk, et al., 2011).

Clarity

Clarity refers to the amount and quality of the emotional information available to the perceiver (Ekman, Friesen, & Ellsworth, 1982; Fernandez-Dols, Sierra, & Ruiz-Belda, 1993). Thus, clarity is fundamental to the perception of facial expressions as well as to achieve mutual adjustment between people (e.g., Bach, Buxtorf, Grandjean, & Strik, 2009). Clarity has also been defined as referring to the reliability of the signal that permits a quick, accurate and efficient recognition of a facial expression (e.g., Tracy & Robins, 2008). Clarity is therefore often inferred from the capability for accurately identifying the emotions (Bach et al., 2009). Some studies have shown that clarity judgments depend on whether the expression of a specific emotion is genuine or simulated (Gosselin, Kirouac, & Doré, 2005). Indeed, although accuracy in judging simulated expressions is generally high (Ekman, 1982), other evidence suggests that performance in judging the clarity of genuine emotional expressions is not better than chance (Motley & Camden, 1988; Wagner, MacDonald, & Manstead, 1986). Still other studies have shown that clarity can be relatively independent of genuineness (e.g., Langner et al., 2010). Clarity is also positively related to intensity (e.g., Langner et al., 2010). In the present study, subjective clarity was measured by asking participants to judge the extent to which the facial expression displayed by the target was clear (1 = Very unclear, 7 = Very clear; e.g., Langner et al., 2010).

Intensity

The perceived intensity refers to an estimate of the magnitude of the subjective impact of an emotional event or stimulus, and is probably one of the most noticeable aspects of an emotion (Sonnemans & Frijda, 1994). Higher perceived intensity in a facial expression is likely to improve decoding accuracy, but does not necessarily lead to more intense emotional states (Adolph & Alpers, 2010). However, empirical evidence has been showing that the perception of intensity in emotional expressions is not straightforward. Indeed, Hess, Blairy, and Kleck (1997) showed that high intensity was only perceived in negative facial expressions of male actors and positive facial expressions of female actors. In the present study, intensity was measured by asking participants to rate the weakness or strength of the facial expressions depicted in the stimuli presented (1 = Very weak, 7 = Very strong; e.g., Langner et al., 2010).

Attractiveness

The attractiveness of a face refers to the perceived facial appearance of a given target person (e.g., Koscinski, 2013). Some studies have already established that averageness and symmetry of a face are important characteristics for the face to be perceived as attractive (e.g., G. Rhodes, 2006). Attractive faces are also perceived as more similar (e.g., Miyake & Zuckerman, 1993), more positive (Reis et al., 1990) and more familiar (e.g., Monin, 2003). This dimension has important consequences in different interpersonal processes, such as impression formation (e.g., Eagly, Ashmore, Makhijani, & Longo, 1991), social distance (e.g., Lee, Loewenstein, Ariely, Hong, & Young, 2008), perception of mate quality (G. Rhodes, Halberstadt, & Brajkovich, 2001, 2001) and feelings of attraction (Rodrigues & Lopes, 2016) . Attractiveness is one of the few dimensions in which static images and video presentations have been compared (e.g., G. Rhodes et al., 2011; Rubenstein, 2005; for a review see Koscinski, 2013). Overall, these judgments did not differ according to presentation modality (e.g., Koscinski, 2013). Some of the existing databases include this dimension, although usually models are only evaluated when displaying a neutral expression (e.g., RaFD: Langner et al., 2010; CFD: Ma et al., 2015). In the present study we asked participants to indicate the extent to which they considered the target as attractive (1 = Very unattractive, 7 = Very attractive; Langner et al., 2010; G. Rhodes et al., 2011) across facial expressions and presentation format conditions.

Similarity

Similarity with a target refers to the perception of how similar a given target is to the individual (Byrne, 1997). Several studies have shown that similarity can refer to aspects such as attitudes, values or beliefs, personality traits or attributes such as physical appearance or physical attractiveness (Montoya & Horton, 2013). Research has shown that in the absence of additional objective information about the target, individuals tend to perceive greater similarity to oneself (Hoyle, 1993), an effect that is maintained even after an interaction with the target (Montoya, Horton, & Kirchner, 2008). Presumably, this occurs because perceived similarity helps decrease the uncertainty associated to the target (Ambady, Bernieri, & Richerson, 2000). In the present study, we asked participants to indicate to which extent they perceived the target to be similar to themselves (1 = Not at all, 7 = Very; e.g., Norton, Frost, & Ariely, 2007).

Familiarity

The perceived familiarity with a face refers to the averageness of the physical attributes of a given target, such that the more average or prototypical a face is, the more familiar the face is perceived (e.g., Langlois, Roggman, & Musselman, 1994). Familiarity is highly relevant for person perception because it influences judgments in several other dimensions. For instance, more familiar targets are perceived as more similar to oneself (Moreland & Beach, 1992), elicit more positive feelings in the individual (Garcia-Marques, 1999) and greater muscular activity in accordance with these feelings (e.g., zygomaticus major; Winkielman & Cacioppo, 2001). Likewise, positive stimuli are perceived as more familiar (Garcia-Marques, Mackie, Claypool, & Garcia-Marques, 2004) and when participants contract a specific facial muscle (zygomaticus major) while looking at a stimulus, they perceive the stimuli as more familiar (Phaf & Rotteveel, 2005). In the present study we asked participants to indicate the extent to which they considered the target to be familiar (1 = Not familiar at all, 7 = Very familiar; Kennedy, Hope, & Raz, 2009).

Genuineness

The genuineness of facial expressions refers to the extent to which a given expression is considered a truthful reflection of the emotion the target is experiencing (Livingstone, Choi, & Russo, 2014). This is a highly relevant dimension for social interaction, as targets are perceived differently when portraying a genuine or a simulated emotion. For instance, targets are perceived more positively when depicting a genuine smile (e.g., Duchenne smile, possibly indicating happiness), than when depicting a forced smile (Miles & Johnston, 2007). In the present study, we asked participants to rate how faked or genuine was the facial expression portrayed by the target (1 = Faked, 7 = Genuine; Langner et al., 2010).

The brief literature review presented on the evaluative dimensions selected suggests the relevant role of each one of them (and their interactions) for a comprehensive assessment of facial expressions. In the following sections we present the development of the stimulus materials and subsequently we examine the impact of facial expression as well as the role of stimuli presentation format (i.e., stills vs. videos) on each evaluative dimension. We also present the subjective norms for each stimulus in each of these dimensions and the correlations between dimensions.

Development of the stimulus set

Method

Participants

Twenty white Portuguese students (60 % male; M _age = 21.75 years, SD = 1.97) from different universities located in Lisbon participated in the development of the stimulus set by posing to a camera in three different facial expressions: frown, neutral, and smile. The order of the posed facial expression was counterbalanced. The procedure was conducted in agreement with the Ethics Guidelines issued by the Scientific Commission of the host institution. For their collaboration participants were compensated with a €5 voucher.

Apparatus

We used a JVC Video Camera (Model HDR-CX210E), and participants were filmed with 1,080 × 1,920 pixels HD resolution, a frame rate of 50p, resulting in a Mpeg file HD 422. The participants were lit from the front with a 60 cm diameter China ball with a standard 50 W light bulb, and exposure compensation in the camera was made accordingly. The China ball was placed about 60 cm above the camera at an equal distance between the camera and the participant. The lightening apparatus was used to soften the light distribution in the shooting field as well as to avoid shades in the participant’ faces and to prevent eye frowning. There was also a foldout white reflector on a stand about 60 cm to the right of the participants providing some filling light. The shooting room had an armchair backed against a grey wall, and the camera was placed on a tripod in front of the armchair at an approximate distance of 50 cm. A small bright yellow plastic stick was glued on the top of the camera near the lenses and served as participants’ eye fixation point. Participants posed for the camera for approximately 6 min, 2 min for each facial expression. Further information regarding participants’ preparation for the shooting sessions is provided below.

Procedure

The videos were recorded in late 2014 at the psychology laboratory of Instituto Universitário de Lisboa (ISCTE-IUL). Upon arrival, participants were briefed about the goals of their participation (i.e., to develop a set of visual stimuli, namely of people displaying different facial expressions) and its expected duration (i.e., 30 min). The consent form clearly stated that their collaboration was voluntary and that they could withdraw at any time. By signing the consent form participants also agreed that the resulting images and video recordings databases would be made available in academic journals and could be used as stimulus materials in future studies.

All sessions were individual and room temperature and lighting were kept constant. The experimenter asked the participants to change into a white t-shirt and to remove all accessories (e.g., jewelry, glasses) and makeup. Male participants were not instructed to remove facial hair (to have male faces that are more representative of faces people see every day; see Tottenham et al., 2009). Makeup powder foundation was applied to all participants to even-out skin imperfections and control skin shinning. A professional film editor with experience in directing recorded the participants for approximately 2 min per facial expression. Participants were asked to sit in an upright position in the armchair placed in front of a grey wall, facing the camera and to focus their gaze in the fixation point set above the camera. Participants were asked to keep their mouth closed during shooting to avoid showing their teeth (e.g., Tottenham et al., 2009). During the recordings, to obtain a varied set of facial expressions (i.e., frowning, neutral and smiling) the director referred to some scenarios, or asked participants to imagine or remember situations that would elicit the intended expression (for similar instructions, see Dalrymple et al., 2013; Olszanowski et al., 2015). For example, to obtain a smiling expression the director asked the participants to think about a funny event that they had recently experienced.

After recording each facial expression, participants responded to the Portuguese adaptation of the Positive and Negative Affect Schedule (PANAS: Galinha & Pais-Ribeiro, 2012). This measure, originally developed by D. Watson, Clark, and Tellegen (1988), assesses positive (PA) and negative (NA) affect as independent mood dimensions. Participants were presented with a list of 20 words (half PA, and the remainder NA) that described feelings and emotions and were instructed to rate to what extent they were experiencing each one (e.g., “enthusiastic,” “hostile”) at that moment using a 5-point scale (1 = Very slightly or not at all, 2 = A little, 3 = Moderately, 4 = Quite a bit, 5 = Extremely). At the end of the session participants received compensation, were thanked and debriefed.

Results

NA and PA scores were computed for each participant according to the facial expression condition (sum of the responses, maximum 50) and analyzed in a repeated measures analysis of variance (ANOVA): 3 (Facial Expression: frown, neutral, smile) × 2 (Affect Scale: NA, PA). Both factors were manipulated within participants. The results revealed a main effect of facial expression, F(2, 38) = 13.45, MSE = 202.11, p < .001, η _p ² = .414, and a main effect of the affect scale, F(1, 19) = 65.05, MSE = 3,967.50, p < .001, η _p ² = .774. Importantly, the expected interaction between the two factors was significant, F(2, 38) = 13.90, MSE = 357.98, p < .001, η _p ² = .422 (see Fig. 1).

Planned contrasts revealed, as expected, that frowning led to higher NA reports than smiling, t(19) = 3.38, p = .003, d = 1.55, and than posing with a neutral facial expression, t(19) = 3.20, p = .005, d = 1.49. Smiling led to higher PA reports than frowning, t(19) = 3.69, p = .002, d = 1.69, and than posing with a neutral facial expression, t(19) = 4.86, p < .001, d = 2.23. The reports of NA after posing with a neutral facial expressions did not differ from those obtained in the smiling condition, t(19) = 1.09, p = .289, d = 0.50. Likewise, the reports of PA after posing with neutral facial expression did not differ from those obtained in the frowning condition, t(19) = –1.67, p = .112, d = 0.77.

In sum, the results obtained with the PANAS indicated that posing with a given facial expression actually influenced how participants felt afterward.

Final set of stimuli

As referred above, each participant was filmed on average for 6 min (2 min per facial expression). Videos were then edited from their original format using Final Cut Pro for Mac (Version 7) and were converted into MOV format using Apple ProRes 422, MOS (mute of sound; pixel size 1,920 × 1,080) codec. Afterward, videos were reconverted into MOV format using H.264 codec (pixel size 1,024 × 576) in QuickTime Player for Mac (Version 10.4) to be compatible with E-Prime software.

The authors screened the 2 min of shooting of each facial expression and chose the 10 s timeframe in which the models held the intended expression, paying special attention to the final frame so as to avoid displaying eye-blinking. From the 10 s clips, 5 s clips were selected. These two standard durations were set following previous procedures (e.g., Ambady & Rosenthal, 1993; Koscinski, 2013; G. Rhodes et al., 2011) and allow for validation of facial expressions recognition in short presentations (i.e., in the 5 s clips).

The stills were obtained a posteriori by freezing one frame of the 5 s clips using Final Cut Pro for Mac (Version 7), and were stored in JPG format (pixel size 1,280 × 720) with a sRGB IEC61966-2.1 color profile. Stills were then aligned and balanced for color, brightness, and contrast using Preview for Mac (Version 2.0).

The final set of stimulus materials includes 180 stimuli: 60 stills (20 frown, 20 neutral, and 20 smile), 60 5 s videos (20 frown, 20 neutral and 20 smile) and 60 10 s videos (20 frown, 20 neutral, and 20 smile).

Validation of the database

Method

Participants and design

A sample of 120 white Portuguese students (77.5 % female; M _age = 20.62 years, SD = 3.39) at Instituto Universitário de Lisboa (ISCTE-IUL), volunteered to participate in a laboratory study in exchange for course credit. Participants were not acquainted with the models (as confirmed in the end of the experiment). The design included the following factors: 3 (Presentation Format: stills, 5 s videos, 10 s videos) × 3 (Facial Expression: frown, neutral, smile) × 4 (Stimulus Subsets: A, B, C, D). The last factor was manipulated between participants.

Materials

The entire stimulus set of videos and stills previously developed was used. Examples of the stills are presented in Fig. 2.

Procedure and measures

The participants were invited to collaborate in a study about person perception. The study took place at the psychology laboratory of Instituto Universitário de Lisboa (ISCTE-IUL) and was conducted using the E-Prime software. The procedure was in agreement with the Ethics Guidelines issued by the Scientific Commission of the host institution. Upon arrival, participants were informed about the goals of the study and its expected duration (approximately 20 min), that all the data collected would be treated anonymously, and that they could abandon the study at any time. After giving written consent, participants were asked to provide information regarding their age and sex.

All instructions were presented on the computer screen. Participants were asked to rate each stimulus regarding attractiveness, arousal, clarity, genuineness, familiarity, intensity, valence, and similarity (for the detailed instructions, see Table 1). All responses were given via the keyboard.

Table 1 Item wording and scale anchors for each dimension

Full size table

To prevent fatigue and demotivation, each participant evaluated a subset of 45 stimuli: 15 stills, 15 5 s videos, and 15 10 s videos from the total pool of 180 stimuli. Overall, each stimulus was evaluated by a sample of 30 participants. The subsets were organized such that each participant would not evaluate the same model displaying the same facial expression in a different presentation format.^{Footnote 1}

Within each experimental condition, the presentation order of the stimuli was completely randomized for each participant. Each stimulus was presented individually in the center of the screen (black background). The exposure time to videos was determined by their own duration (i.e., 5 or 10 s) and the exposure time to photographs was 5,000 ms. Following the offset of the stimulus, the evaluative dimensions were presented in a random order (one per screen). The intertrial interval was 500 ms. After completing the task, participants were thanked and debriefed.

Results

In the following sections, we begin by presenting the preliminary data analysis regarding outliers, gender differences and reliability. Then, we present the tests comparing to which extent different facial expressions (i.e., frown, neutral, smile) and stimulus presentation format (stills, 5 s video, and 10 s video) influenced stimulus evaluations on each dimension. Subsequently, we present the associations between dimensions.

For each stimulus, we calculated means, standard deviations and confidence intervals obtained for each dimension. The full stimulus set (stills in .jpg and videos in both .mov and .avi formats), and the corresponding database in Excel format, organized by stimulus code, are provided as supplementary material and can also be obtained upon request to the first author.

Preliminary analysis

All participants responded to the entire set of questions for all the stimuli presented in their respective conditions. Thus, there were no missing cases. Outliers were identified by considering the criterion of 2.5 SDs above or below the mean evaluation of each stimulus in a given dimension. The result of this analysis yielded a residual percentage (0.64 %) of outlier ratings. There was no indication of participants responding systematically in the same way—that is, always using the same value of the scale. Therefore, no responses were excluded.

First we tested the consistency of participants’ ratings in each dimension by comparing two subsamples of equal size (n = 60) randomly selected from the main sample. No significant differences between the subsamples emerged (all ps > .100).

Then we tested whether all the stimulus subsets yielded equivalent results, by analyzing the mean ratings in each dimension in a repeated measures mixed ANOVA: 4 (Stimulus Subsets) × 8 (Evaluative Dimension), with the latter factor manipulated within participants. Given that only a main effect of evaluative dimension emerged, F(2, 238) = 80.10, MSE = 31.39, p < .001, η _p ² = .41, and that both the main effect of stimulus subset and its interaction with evaluative dimensions were nonsignificant, Fs < 1, the subsequently reported analysis will disregard the specific stimulus subsets.

To test for gender differences in the way that participants rated the stimuli, the mean evaluations on each dimension were compared between male and female participants. Overall, no gender differences were found (see Table 2).

Table 2 Evaluations (means and standard deviations) in each dimension for the total sample and for males and females separately, with mean difference tests

Full size table

Finally, we calculated the mean ratings in all dimensions, on the basis of model gender (see Table 3).

Table 3 Evaluations (means and standard deviations) in each dimension according to model gender, along with mean difference tests

Full size table

As is shown in Table 3, stimuli portraying female models were evaluated as being more attractive, more similar, and more genuine than those portraying male models (see Langner et al., 2010). Model gender did not influence ratings in the remaining dimensions.

Impacts of facial expression and stimulus format on evaluative dimensions

The evaluation of each target was examined by computing the mean ratings, per participant, in each dimension for the three types of facial expressions (frown, neutral, and smile) and the three presentation formats (stills, 5 s videos, and 10 s videos). These ratings were analyzed, per dimension, in a repeated measures ANOVA, with Facial Expression and Presentation Format defined as within-participants factors. All means and standard deviations, as well as the results of planned comparisons, are presented in Table 4.

Table 4 Evaluations (means and standard deviations) in each dimension according to facial expressions and stimulus presentation format

Full size table