In general, humans act in a goal-oriented manner; that is, we do not respond in a stimulus-driven or even reflex-like manner, but we select actions to achieve certain intended effects in the environment. Consequently, such actions have to be governed by mental representations of future, to-be-attained effects (i.e., goals), rather than by current stimulation (Hommel, Müsseler, Aschersleben, & Prinz, 2001; Kunde, Elsner, & Kiesel, 2007; Prinz, 1997).

There is a good deal of evidence that such anticipatory codes of action consequences are in fact involved in the generation of motor responses. Perhaps the most intriguing evidence stems from so-called “anticipation effects.” Observations of that kind show that certain features of predictable action consequences, though not yet perceptually present, already affect the motor actions that will produce them (see Rosenbaum & Krist, 1996, for a review). One example of such findings is the so-called action–effect compatibility phenomenon. In general, it is easier to generate a motor action that produces consequences that are compatible with the action in certain respects. For example, it is easier to press a key with the right hand if this keypress predictably flashes a light on the right rather than on the left side of the actor/observer. Likewise, it is easier to forcefully push a button when this action results in a loud rather than in a soft tone (Kunde, 2001). Many such compatibility phenomena have been shown in recent years and in several different domains of motor control, such as musical performance (Keller, Dalla Bella, & Koch, 2010; Keller & Koch, 2008), typing (Rieger, 2007), or speaking (Koch & Kunde, 2002). Such phenomena have been interpreted as evidence for the idea that action effects are imagined during motor planning—an assumption that dates back to William James’ (1890/1981) ideomotor theory of action control.

Most effects that we intend to produce occur in the physical environment, be it to switch on a light, to grasp an object, or to pick an apple from a tree. Yet, our environment does not consist only of inanimate matter. Of course, the most important component of a human’s environment is its social part—that is, other humans. It is an important question whether the mechanisms that are involved in goal-oriented action in general govern our actions directed toward other humans as well. One type of action that may be construed that way is the production of facial expressions, that is, the coordinated contraction of facial muscles. At first glance, this type of motor output appears to escape the constraints of goal-oriented action. After all, facial expressions seem just to “express” our internal emotional states. We laugh because we’re happy, we cry because we’re sad, and so on (Eibl-Eibesfeldt, 1973; Ekman, 1972; Salzen, 1991). However, since the days of Darwin (1872), there has been a debate on the functional role of facial expressions, and it has been argued that facial expressions serve social goals as well, such as to change the states of others in a certain way (Fridlund, 1991, 1994; Frijda, 1988; Keltner & Kring, 1998). A smile might sometimes be used to make another person happy, and we sometimes frown to prevent an unwanted person from approaching us too closely. The most obvious signal of whether our intentions are fulfilled is the facial expression of our counterpart. Whether my smile evoked a good mood in a social partner is signaled by his or her own smiling. Likewise, the most obvious sign that a counterpart understood my frowning correctly is that he or she shows some signs of frowning or other indications of negative affect. In fact, because we cannot observe but only infer the mental states of others, these bodily cues can be considered the most immediate goals of our facial actions themselves.

Even if facial expressions may not always be goal oriented, the feedback from our facial actions is not entirely unpredictable. In the first place, this applies to proprioceptive feedback. Our own smile “feels” like a smile; that is, we feel the corners of our mouth rising, the mouth opening, and so forth. Also, if visual feedback of our facial expression is available, be it from a mirror, a water surface, or a video monitor, that feedback is always compatible. We see ourselves smiling, frowning, and so on. However, in many cases, the facial expressions of counterparts also correspond more or less closely to our own expressions. For example, most often, our own smile is spontaneously responded to by a smile of a social counterpart (Dimberg, Thunberg, & Elmehed, 2000). At the least, it would come as a surprise if it were responded to by a facial expression of sadness. There might be situations in which a smile is meant as a provocation that should be responded to not by a friendly smile but by some signal of fear or anger (Fridlund, 1991, 1994). But we think it is fair to say that these are rare exceptions and that, conversely, the regular effect of facial action in the social environment is one that is compatible with one’s own facial action, at least regarding the general valence of that expression in terms of being positive or negative.

If these considerations are correct, then the question arises as to whether traces of anticipatory effect codes similar to those observed in the generation of other sorts of actions can be observed with facial actions as well. The present study is meant as a first step toward exploring this issue. Specifically, we tested whether facial actions are subject to action–effect compatibility. Participants were to generate a facial expression—namely, smiling or frowning—in response to an arbitrary color cue. In different conditions, these actions predictably produced either compatible or incompatible facial feedback from a virtual counterpart. In conditions with a compatible expression–effect mapping, a smile (detected by above-baseline activity of the musculus zygomaticus major) resulted in the presentation of a smiling face on a computer screen, whereas a frown (detected by above-baseline activity of the musculus corrugator supercilii) produced a frowning face on the screen. In conditions with an incompatible mapping, the opposite was true: A smile produced a frowning face, and a frown produced a smiling face.

We predicted that the generation of facial expressions would be easier (in terms of response time and accuracy) when these actions predictably produced compatible rather than incompatible facial feedback. This would suggest that (a) some cognitive representation of the forthcoming action effect was active during production of the facial expression, and that (b) this anticipatory representation was able to influence the generation of the corresponding motor pattern.

Experiment 1

The basic research paradigm in Experiment 1 resembled that in other studies on action–effect compatibility. Participants were asked to smile or frown as quickly as possible without making errors, by either lifting the corners of the mouth or drawing the eyebrows together, in response to a color stimulus. The electromyographic (EMG)-based detection of these expressions produced the presentation of a face photograph on the computer screen. Each participant went through two different conditions (in balanced order). In the compatible action–effect condition, the detection of a smile fully predictably produced the presentation of a smiling face on the screen, whereas the detection of a frown produced the presentation of a frowning face. In the incompatible action–effect condition, the detection of a smile produced the presentation of a frowning face, whereas the detection of a frown produced the presentation of a smiling face on the screen.

A useful tool for gaining insight into the time course of an experimental effect is a distribution analysis of response times (RTs) (Ratcliff, 1979). This type of analysis tests the extent to which a given effect is present in certain bins of the rank-ordered RTs. Several compatibility effects have a kind of individual signature regarding this temporal dynamic. For example, regular Simon effects with horizontal stimulus–response arrangements decrease with increasing RTs, which may reflect the passive decay (Hommel, 1994) or active suppression (Ridderinkhof, 2002) of task-irrelevant (spatial) stimulus features. In contrast, effect-based compatibility effects typically increase with RT (Kunde, 2001; Paelecke & Kunde, 2007). This might reflect that the anticipation of action effects is a time-consuming process; thus, anticipated action effects have a better chance of affecting responding the more time is available to exert such an impact—that is, the longer the interval between stimulus presentation and response onset. In any case, this time course appears to be a signature of effect-based compatibility. To test whether this signature is also present with the proposed expression–effect compatibility phenomenon, we conducted a distribution analysis of RTs here as well.

Method

Participants

Fifty-six students from the University of Dortmund participated for a payment (5 €) or course credit. In both this and the following experiment, participants were naive with respect to the purpose of the study and classified themselves as having normal (or corrected-to-normal) visual acuity.

Apparatus and stimuli

Participants sat in a dimly lit room in front of a 17-in. color monitor, with an unconstrained viewing distance of approximately 60 cm. The participants were presented with a green- or blue-colored dot (3 cm) in the center of the screen. The task was to contract either the zygomaticus major, by moving the corners of the mouth upward, or the corrugator supercilii, by moving the eyebrows toward each other, depending on stimulus color. Half of the participants responded to a green dot by contracting the zygomaticus and to a blue dot by contracting the corrugator, whereas this mapping was reversed for the other half of the participants. Facial muscle activity was assessed with bipolar recordings from reusable Ag/AgCl electrodes with a contact surface diameter of 5 mm. The electrodes were placed at the zygomaticus and corrugator muscles following the recommendations of Fridlund and Cacioppo (1986). The EMG was recorded and amplified by a Neumüller Messtechnik system (Germany) and sampled at a rate of 1024 Hz by an 11-bit A/D converter.

The individual baseline EMG activity was recorded for 30 s at the outset of the experiment for both muscles. During baseline, participants were instructed to keep their facial muscles relaxed, and the EMG activity was averaged; that is, the baseline activity was defined as the sum of the EMG values from all measuring points divided by the number of measuring points. Afterward, the participants were instructed to contract the two muscles for 15 s, by either pulling their eyebrows together (corrugator) or raising the corners of their mouths (zygomaticus). For both the corrugator and the zygomaticus, the EMG measures from all measuring points in the 15-s interval were summed and divided by the number of measuring points; this was defined as the maximum activity. In the experiment, the RT was the time interval between stimulus onset and the point in time at which the EMG activation of either of the two muscles exceeded the baseline activity plus 80% of the corresponding maximum–baseline difference.
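For illustration, the calibration and response-detection rule just described can be sketched as follows. This is a minimal sketch in Python; the use of rectified EMG samples and all function and variable names are our assumptions for illustration, not details reported for the recording software.

```python
import numpy as np

FS = 1024  # sampling rate in Hz, as reported above


def mean_activity(emg_samples):
    """Average rectified EMG over all measuring points of a calibration phase
    (30-s baseline or 15-s maximum-contraction phase)."""
    return np.mean(np.abs(emg_samples))


def response_threshold(baseline, maximum, criterion=0.80):
    """Baseline activity plus 80% of the maximum-minus-baseline difference."""
    return baseline + criterion * (maximum - baseline)


def detect_rt_ms(emg_trace, threshold, fs=FS):
    """Return the latency (ms after stimulus onset) at which the rectified EMG
    first exceeds the threshold, or None if it never does within the trace."""
    above = np.flatnonzero(np.abs(emg_trace) >= threshold)
    return None if above.size == 0 else above[0] / fs * 1000.0
```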

Immediately after detection of above-threshold activity in either one of the two monitored muscles, a feedback picture was shown in the middle of the computer screen (7.5 × 9 cm) for 500 ms. The feedback consisted of one of 15 smiling faces or one of 15 frowning faces (“Pictures of Facial Affect”; Ekman & Friesen, 1976). In the condition with compatible feedback, the recording of activity of the zygomaticus major triggered the presentation of a smiling face, and the activity of the corrugator supercilii triggered the presentation of a frowning face. With incompatible feedback, this action–feedback mapping was reversed (see Fig. 1). In case of an error, the message “Fehler” (the German word for “error”) was presented for 2,000 ms. The stimulus of the next trial was presented 1,500 ms after the offset of the feedback picture or the error message. Within each category of feedback (smiling vs. frowning), the pictures were selected randomly.
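The mapping between the detected expression and the feedback picture can be summarized in a brief sketch; the picture file names and condition labels below are hypothetical placeholders, not the actual stimulus files.

```python
import random

# Hypothetical file names standing in for the 15 smiling and 15 frowning pictures.
SMILING_FACES = [f"smile_{i:02d}.png" for i in range(1, 16)]
FROWNING_FACES = [f"frown_{i:02d}.png" for i in range(1, 16)]


def feedback_picture(detected_muscle, mapping):
    """Choose a feedback face for the detected expression.

    detected_muscle: "zygomaticus" (smile) or "corrugator" (frown)
    mapping: "compatible" or "incompatible"
    """
    smile_detected = detected_muscle == "zygomaticus"
    show_smiling = smile_detected if mapping == "compatible" else not smile_detected
    pool = SMILING_FACES if show_smiling else FROWNING_FACES
    return random.choice(pool)  # pictures drawn randomly within each category
```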

Fig. 1 Procedure of Experiment 1. Note that the original colors of the stimuli were green and blue

Procedure

Upon arrival, the EMG electrodes were affixed to the corresponding locations on the participants’ faces. Then, the participants ran through the calibration procedure described previously. Each participant ran through a condition with compatible expression feedback and a condition with incompatible feedback. Each of these conditions consisted of 90 trials, and the two conditions were separated by a break of about 5 min. Within each condition, the order of stimuli was random. The order of the compatible and incompatible feedback conditions and the S–R mapping were counterbalanced across participants (14 participants in each combination of S–R mapping and order of feedback compatibility condition).

Results

Responses with RTs below 200 ms or above 2,000 ms were considered outliers and were removed (12.4%). For each participant, the RTs for trials with congruent and incongruent expression faces were rank ordered separately. Then, each RT distribution was divided into five proportional bins, and the mean RTs and error rates within these bins were subjected to ANOVAs with bin (1–5) and expression–effect congruency (congruent vs. incongruent) as repeated measures. The mean RTs from the factorial combination of these factors are shown in Fig. 2 (top panel).
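The binning procedure amounts to computing, per participant and congruency condition, the mean RT within each quintile of the rank-ordered RTs. A minimal sketch of this step is given below; the function name, bin count parameter, and cutoff arguments are illustrative assumptions rather than the original analysis code.

```python
import numpy as np


def quintile_means(rts, n_bins=5, lo=200, hi=2000):
    """Discard outlier RTs, rank-order the remainder, split them into n_bins
    proportional bins, and return the mean RT of each bin."""
    rts = np.asarray(rts, dtype=float)
    rts = np.sort(rts[(rts >= lo) & (rts <= hi)])  # outlier removal as described above
    return [b.mean() for b in np.array_split(rts, n_bins)]


# Computed separately for each participant and congruency condition; the resulting
# bin means then enter a 5 (bin) x 2 (congruency) repeated measures ANOVA.
```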

Fig. 2 Response times (RTs) as a function of RT bin and action–effect congruency in Experiment 1 (top) and Experiment 2 (bottom)

There was a significant increase of RTs with RT bin, F(4, 220) = 192.91; p < .001, which reflects the normal within-participants variability of RTs. More importantly, responding was significantly faster with congruent (371 ms) than with incongruent (391 ms) expression feedback, F(1, 55) = 4.18; p < .05. This effect increased with increasing RT bin, producing a significant interaction of these factors, F(2, 220) = 5.39; p < .01. t tests revealed significant influences of expression–effect congruency from the third bin on (ps < .05, one-tailed).

The error rates within the five RT bins were 15.6%, 12.2%, 9.3%, 7.9%, and 9.7% with incongruent expression effects, and 14.2%, 11.53%, 10.53%, 8.9%, and 9.8% with congruent expression effects. The corresponding ANOVA of error rates revealed a significant effect of bin, F(4, 220) = 8.85; p < .05. No other effect reached significance.

Discussion

The results of Experiment 1 showed that the intentional activation of facial muscles is affected by the congruency of the visual feedback that these facial expressions predictably produce. Activation of the zygomaticus proceeds more quickly if this activation is known to trigger the presentation of a happy (rather than an angry) face, whereas activation of the corrugator proceeds more quickly if this activation is known to trigger the presentation of an angry (rather than a happy) face. Hence, the production of these facial actions is affected by anticipations of their visual consequences. We consider this to be first evidence for a contribution of codes of anticipated action consequences to the generation of facial expressions.

The action and feedback features relevant in the present study conceivably relate to the meaning of the action and its visual consequences. Contraction of the zygomaticus is commonly interpreted as an expression of happiness, whereas contraction of the corrugator is commonly interpreted as an expression of anger. Likewise, the faces that were fed back to the participants conveyed the same meaning. Consequently, the expression–effect compatibility effects demonstrated in Experiment 1 should hinge on the interpretation of the feedback faces as either smiling or frowning. To test this assertion, we presented the effect faces upside down in Experiment 2. It is known that the recognition of faces, as well as the extraction of their emotional content, is much harder for inverted than for normal upright faces, and is possibly mediated by different processes (McKelvie, 1995; Valentine, 1988; Yin, 1969).

Experiment 2

Method

Participants

Sixty students from the University of Dortmund participated. None of them had participated in Experiment 1.

Apparatus, stimuli, and procedure

The apparatus and the procedure were identical to those in Experiment 1. The only exception was that the effect faces were now all presented upside down.

Results and discussion

Responses with RTs below 200 ms or above 2,000 ms were considered outliers and were removed (6.7%). For each participant, the RTs for trials with congruent and incongruent expression faces were rank ordered separately. Then, each RT distribution was divided into five proportional bins, and the mean RTs and error rates within these bins were subjected to ANOVAs with bin and expression–effect congruency as repeated measures. There was a significant effect of bin, F(4, 236) = 208.54; p < .001, but neither an effect of expression–effect congruency nor an interaction of congruency and bin (both Fs < 1; see Fig. 2, bottom panel).

The error rates within the five RT bins were 9.7%, 8.4%, 7.8%, 6.8%, and 5.7% with incongruent expression effects, and 9.4%, 8.1%, 7.7%, 7.3%, and 6.1% with congruent expression effects. The corresponding ANOVA of error rates revealed a significant effect of bin, F(4, 236) = 3.85; p < .05. No other effect reached significance.

A final analysis combined the data of Experiments 1 and 2 and treated experiment as a between-participants factor. In this analysis of RTs, both the main effect of congruency, F(1, 114) = 3.8; p < .07, and the interaction of experiment and congruency, F(1, 114) = 2.60; p < .11, missed conventional levels of significance. Conceivably, the lack of a significant interaction is due to the relatively low power of the between-participants comparison, which for the F test of the Congruency × Experiment interaction amounted to only .358 (given α = .05). Future research that relies on a complete (and more powerful) within-participants manipulation of action–effect congruency and face orientation should confirm the differential impact of upright and inverted faces. Still, the present results already show that, with inverted faces, the expression–effect congruency effect drops to a nonsignificant level.

General discussion

The present study shows for the first time that the production of facial actions is affected by the congruency of anticipatable visual feedback of these actions. The effect as such and its temporal dynamics resemble those obtained with other motor responses and response consequences (see Kunde et al., 2007, for a review). The observation that RTs are affected by response effects that are not yet physically present at the point in time the RT is measured suggests that such effects are imagined during action planning. This conclusion fits well with ideomotor models of action control (Greenwald, 1970; Hommel et al., 2001). According to these models, actors first acquire associations between motor actions and the perceptible consequences of those actions (Elsner & Hommel, 2001). Subsequently, to intentionally produce a certain motor output, these acquired associations are activated in the “opposite” direction, so that the recollection of a certain effect activates the motor pattern with which it has become associated before. Strict versions of such models propose that there is, in fact, no other way to access a motor action than by recollecting its sensory consequences (Mechsner, Kerzel, Knoblich, & Prinz, 2001).

Action–effect compatibility effects of the type observed in the present study can be explained by the mutual priming of codes representing a future action’s proximal (proprioceptive, tactile) and distal (e.g., visual) effects during action planning (Kunde, Koch, & Hoffmann, 2004). According to this account, the code activation threshold at which an effect-associated motor pattern is emitted is reached sooner when certain features of the proximal and distal reafferences of that motor pattern correspond than when they do not. In the present case, the proximal codes of the facial actions conceivably concern the proprioceptive experiences of smiling or frowning, whereas the distal codes concern the visual feedback of these actions (hence the presentation of a smiling or frowning face). Whereas in previous demonstrations of action–effect compatibility, proximal and distal action effects overlapped with regard to more or less abstract features such as spatial location (Kunde, 2001), duration (Kunde, 2003), intensity (Kunde et al., 2004), or verbal meaning (Koch & Kunde, 2002), the proximal and distal effects in the present study overlapped because they belonged to the same facial motor action. It is therefore an interesting question for future research whether similar compatibility effects ensue when the overlap between facial actions and visual feedback becomes more abstract, such as when, for example, the words smiling/frowning or other positive/negative words, rather than photos, are used as action feedback.

Our results suggest that the production of facial expressions involves processes that bring the consequences of these expressions to mind before the consequences actually occur. Thus, facial expressions may not only have the potential to be goal-oriented actions (e.g., Fridlund, 1991, 1994), but may also be mediated by similar processes. This appears to us to be an important observation that, to some extent, challenges theories that consider emotion as the only source of facial action (Izard, 1971). This observation also fits with the general view that apparently “special” actions, such as stimulus-oriented responses in choice reaction tasks (Hommel, 1993) or approach–avoidance responses (Eder & Rothermund, 2008), can be construed as goal-oriented actions on closer inspection. We note, however, that we obtained our effects in a situation in which participants were asked to produce these expressions intentionally. So, it remains to be studied to what extent the spontaneous expression of emotion is affected by anticipatory effect codes as well.

Beyond the motor control processes discussed previously, there might be other reasons why it is harder to smile if another person will not smile back. One such reason might be social mimicry—that is, the tendency to imitate another person’s behavior (e.g., Chartrand & Bargh, 1999). Another person’s smile may less easily induce a smile of our own if we have experienced that this person does not smile back at us. We note that in the present study, participants did not respond to the facial expression of another person. They responded to an arbitrary color stimulus (see Fig. 1). Therefore, accounts that presuppose an imitative (or any other kind of) response to another face run into problems here, simply because there was no stimulus person to respond to. Another reason for reluctant smiling might be fear of rejection or social exclusion (Williams, Forgas, von Hippel, & Zadro, 2005). This may explain hesitations when frowning can be expected as a consequence, because frowning represents negative (punishing) feedback, whereas a smile represents positive (reinforcing) feedback. However, frowning and smiling were exactly as frequent with the compatible as with the incompatible action–effect mapping. Therefore, what counts is not the frequency of reinforcing or punishing facial feedback per se; rather, it is the specific combination of one’s own and the fed-back facial expression, hence the compatibility relationship.

The size of the observed effect in Experiment 1 was numerically not very impressive. There are several reasons why this might have been so. First, action selection was rather simple, and anticipated action effects show up more strongly in performance when action selection is harder—for example, when a choice between more than two facial expressions is required. Second, the feedback faces were, technically speaking, irrelevant; that is, participants could have solved the task without taking the feedback into account. The fact that the compatibility of the feedback nevertheless had an impact on behavior shows that they did not ignore these effects. Still, action–effect compatibility effects of the type shown in the present study increase when action feedback is not only predictable but also actually intended by the participants (Ansorge, 2002; Hommel, 1993). Third, the action feedback was artificial when compared to interactions in everyday life. For the sake of simplicity, we used photographs of faces rather than the facial expressions of a real interaction partner. Conceivably, real facial expressions, which contain all the movement cues that static images lack, might intensify the effect. We consider this a worthwhile question for future research.

To conclude, we have shown in the present study that the generation of facial expressions is affected by the predictable visual consequences of these actions. This suggests that such facial expressions are affected, and are possibly controlled, by the anticipation of their consequences. Facial expressions may thus be construed as goal-oriented actions. We hope that this observation fuels the debate on the functional role of emotion in general, and that of facial expressions in particular.