Revisiting intersubjective action-effect binding: No evidence for social moderators

Riechelmann, Eva; Weller, Lisa; Huestegge, Lynn; Böckler, Anne; Pfister, Roland

doi:10.3758/s13414-019-01715-6

Published: 22 March 2019

Volume 81, pages 1991–2002, (2019)

Attention, Perception, & Psychophysics

Abstract

Effect-based accounts of human action control have recently highlighted the possibility of representing one’s own actions in terms of anticipated changes in the behavior of social interaction partners. In contrast to action effects that pertain to the agent’s body or the agent’s physical environment, social action effects have been proposed to come with peculiarities inherent to their social nature. Here, we revisit the currently most prominent demonstration of such a peculiarity: the role of eye contact for action-effect learning in social contexts (Sato & Itakura, 2013, Cognition, 127, 383–390). In contrast to the previous demonstration of action-effect learning, a conceptual and a direct replication both yielded evidence for the absence of action-effect learning in the proposed design, irrespective of eye contact. Bayesian statistics supported this claim by demonstrating evidence in favor of the null hypothesis of no effect. These results suggest a limited generalizability of the original findings—for example, due to limitations that are inherent in the proposed study design or due to cultural differences.

Anticipating how other people will react to one’s own actions is a fundamental part of action control in social interactions. Another person’s predictable behavior can even be used to represent, select, and control one’s own movements. Such sociomotor actions have recently attracted considerable interest (e.g., Flach, Press, Badets, & Heyes, 2010; Kunde, Lozo, & Neumann, 2011; Müller, 2016; Pfister, Dignath, Hommel, & Kunde, 2013; Weller, Schwarz, Kunde, & Pfister, 2018). The sociomotor framework (Kunde, Weller, & Pfister, 2018) is rooted in ideomotor approaches to action control, which propose that human actions are controlled by anticipating the sensory consequences they typically evoke (Harleß, 1861; Herbart, 1825; James, 1890; see Pfister, 2019; Shin, Proctor, & Capaldi, 2010, for reviews). This idea takes into account that each action inevitably produces a range of sensory effects—for instance, knocking on a table results in a specific sound, a visual image of the respective motion, and proprioceptive sensations of the movement. Ideomotor theory proposes that agents acquire bidirectional associations between a motor action and these sensory effects (i.e., action–effect associations). These associations can be used for action control: In order to perform a specific action, its sensory effects are anticipated, which in turn trigger the associated motor action. There is considerable empirical support for the assumptions of ideomotor theory—for example, from the manual domain (Elsner & Hommel, 2001; Kunde, 2001, 2006; Pfeuffer, Kiesel, & Huestegge, 2016; Pfister & Kunde, 2013; Wolfensteller & Ruge, 2011) and from the oculomotor domain (Herwig & Horstmann, 2011; Huestegge & Kreutzfeldt, 2012; Riechelmann, Pieczykolan, Horstmann, Herwig, & Huestegge, 2017).

The term sociomotor actions specifically refers to situations where the action of an agent not only triggers certain sensory effects in the inanimate environment but also consistently evokes a certain behavior of another person (Kunde et al., 2018). Based on ideomotor theory, it is assumed that the agent can acquire a bidirectional association between his or her action and the sensory effect—that is, the other person’s response (also labeled “intersubjective action–effect binding”; Sato & Itakura, 2013). Anticipating the other person’s behavior then can reactivate the agent’s action. Support for this claim comes from several studies investigating learning and anticipation of social action effects using different experimental designs (e.g., Herwig & Horstmann, 2011; Kunde et al., 2011; Müller, 2016; Müller & Jung, 2018; Pfister et al., 2013; Pfister, Weller, Dignath, & Kunde, 2017; Sato & Itakura, 2013).

Previous work on sociomotor actions has mainly aimed at showing that anticipations of a partner’s behavior are indeed implemented in action control (Kunde et al., 2018; Pfister et al., 2013; see also Wolpert, Doya, & Kawato, 2003, for a related framework). Although the available evidence clearly supports this claim, only a few studies have addressed possible peculiarities of social action effects as compared with effects on the agent’s body or on the agent’s inanimate environment. A notable exception to this rule is a study by Sato and Itakura (2013), who investigated action–effect learning for social action effects as well as social moderator variables in this process. In this study, participants first underwent an acquisition phase in which they repeatedly experienced novel action–effect associations between key presses and a certain mouth gesture of an on-screen face. More precisely, participants could choose between pressing a left or right key on each trial, and their key press consistently triggered a certain change of the on-screen face (e.g., lip protrusion or cheek puffing). In a subsequent test phase, participants had to respond with the same left and right key press to novel, arbitrary targets, and the former effect stimuli (i.e., mouth gestures) were presented as primes before the imperative stimuli. These mouth gestures could either be congruent to the to-be-executed response (i.e., the imperative stimulus required the response that had produced the respective mouth gesture in the acquisition phase), incongruent (i.e., to-be-executed and prime-associated responses did not match), or neutral (i.e., inducing a mouth gesture that had not been experienced in the acquisition phase). The authors argued that if participants had acquired bidirectional associations between the key presses and the effect stimuli in the acquisition phase, presenting the effect stimuli as primes in the test phase should activate the associated response. Therefore, responses should be facilitated when the congruent (vs. incongruent) prime is presented.

The results of Experiment 1 in Sato and Itakura (2013) indeed confirmed that participants responded faster in trials involving congruent (vs. incongruent) primes. However, to validate whether the results of Experiment 1 are driven by genuinely social processes as elicited by eye contact (direct gaze of the face stimuli), the authors conducted Experiment 2 where the on-screen face was looking to the left or right instead of directly looking at the participant. Indeed, Experiment 2 revealed no evidence for action–effect learning with averted gaze. These results suggest that action–effect learning and/or retrieval in sociomotor actions can be modulated by genuinely social variables such as eye contact, which is in line with previous research showing that direct eye contact is a powerful moderator of cognitive processes (Senju & Johnson, 2009). The social significance of direct eye contact is further confirmed by studies showing that faces with direct (vs. averted) gaze capture the attention of the perceiver (Böckler, van der Wel, & Welsh, 2014; Senju & Hasegawa, 2005; Senju, Hasegawa, & Tojo, 2005) and facilitate processing of the observed face (Macrae, Hood, Milne, Rowe, & Mason, 2002; Mason, Hood, & Macrae, 2004).

Although eye contact exerts a strong influence on cognitive processes in a range of different domains, the results of Experiments 1 and 2 in Sato and Itakura’s (2013) study can also be explained along different lines. More precisely, the absence of evidence for action–effect learning in case of averted gaze may also be attributed to a lack of attention on the mouth region, where the critical action-contingent changes occurred. Specifically, the face stimuli with averted gaze might have prompted participants to automatically follow this gaze, drawing attention away from the mouth region (see Frischen, Bayliss, & Tipper, 2007, for a review on gaze cueing). Sato and Itakura (2013) designed face stimuli with closed eyes (Experiment 3) to tackle this alternative explanation. Exploring such alternative explanations seems especially warranted in light of studies that showed an impact of social action effects even in the absence of eye contact (e.g., Flach et al., 2010; Pfister et al., 2013; Weller et al., 2018). Because there was still no evidence for action–effect learning in this setting, the authors concluded that the congruency effect as observed in the direct gaze condition (Experiment 1) was indeed due to the eye contact.

However, one might still consider attentional processes as a potential explanation for the null results of their Experiment 3. We argue that visual attention could have been drawn away from the face, causing a lack of attention on the critical mouth region even in Experiment 3, because a nonengaging interaction partner with closed eyes served as stimulus. Several findings—for example, from neonate studies—support this claim by demonstrating that newborns already spend less time looking at a face photograph with the eyes closed compared with the same face photograph with eyes open (Batki, Baron-Cohen, Wheelwright, Connellan, & Ahluwalia, 2000).

Based on this reasoning, the present study was designed to replicate Sato and Itakura’s (2013) study and to additionally test whether gaze cueing toward the mouth region (thereby drawing attention to the location of the effect) can reinstate a congruency effect even in the absence of direct eye contact. Such a finding would substantially challenge the idea that eye contact represents a necessary prerequisite for sociomotor learning. In Experiment 1a, we conducted a conceptual replication of Sato and Itakura’s (2013) Experiment 1 using photographs with direct gaze as stimuli. Experiment 1b addressed the alternative explanation proposed above by including face stimuli with eyes looking downward in order to guide the participant’s attention to the mouth region of the face stimulus (for evidence of gaze cueing on the vertical axis, see Langton & Bruce, 1999).

We did not observe evidence for any action–effect learning with either direct or averted gaze in these initial experiments. Given that our stimulus material was different from the stimuli used in the original study by Sato and Itakura (2013), we then decided to run a direct replication including the original stimulus material (Experiment 2).

Experiment 1a: Conceptual replication

Experiment 1a represents a conceptual replication of Experiment 1 reported by Sato and Itakura (2013). We made every effort to produce stimulus material that matched the stimuli used in the original study, while adapting the stimuli to the different cultural backgrounds of European rather than Asian participants.

As in the original study, participants first experienced novel action–effect associations in an acquisition phase. More precisely, they were instructed to perform left and right key presses that triggered distinct mouth gestures of the face presented on the screen. Participants were asked to spontaneously select each key press while choosing each option about equally often. In a subsequent test phase, corresponding faces were presented as primes shortly before an imperative stimulus, which required a speeded left or right response. If associations between key press and resulting mouth gesture were learned in the acquisition phase, the primes presented in the test phase should activate the prime-associated response, thereby influencing response initiation. Thus, in congruent trials, where the prime-associated response and the to-be-executed response matched, we expected to observe reduced reaction times (RTs) compared with incongruent trials, where prime-associated and to-be-executed response did not match.

Method

Participants

We recruited 32 participants who received either course credits or monetary compensation for participation. Participants were naïve with respect to the purpose of the experiment and gave written informed consent before completing the study.

In contrast to Sato and Itakura’s (2013) original study, where no participants were excluded due to an unbalanced proportion of key presses during acquisition, we chose to exclude participants from our initial analyses when the distribution of left and right key presses during acquisition deviated from a balanced distribution at a ratio equal to or exceeding 2:1. Data of eight participants had to be excluded due to this criterion. Data of the remaining 24 participants were analyzed (mean age = 24.6 years, age range: 19–34 years, 20 women, no left-handers). A sample size of 24 participants should ensure a high power of 1 − β > .99 to detect the original effect size of Cohen’s $ {d}_z=\frac{t}{\sqrt{n}}=\frac{4.7}{\sqrt{22}}=1.00 $, as reported for Sato and Itakura’s (2013) Experiment 1.
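The exclusion rule and the power claim can be sketched in a few lines. This is a minimal illustration rather than the analysis actually run for the study: the function names are hypothetical, a two-sided test at α = .05 is assumed because the text does not state the test settings, and power is approximated with a normal distribution instead of the noncentral t distribution.

```python
import math
from statistics import NormalDist


def is_unbalanced(n_left, n_right, max_ratio=2.0):
    """True if one key was pressed at least max_ratio times as often as the other."""
    more, less = max(n_left, n_right), min(n_left, n_right)
    if less == 0:
        return True  # one key was never pressed
    return more / less >= max_ratio


def dz_from_t(t, n):
    """Cohen's d_z recovered from a paired-samples t value and sample size."""
    return t / math.sqrt(n)


def approx_power(dz, n, alpha=0.05):
    """Normal approximation (an assumption) to the power of a two-sided paired t test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = dz * math.sqrt(n)  # expected test statistic under the alternative
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


original_dz = dz_from_t(4.7, 22)       # t and n as reported for the original Experiment 1
power = approx_power(original_dz, 24)  # n = 24 analyzed participants
```

Under these assumptions the approximate power exceeds .99, in line with the value stated in the text; an exact calculation would use the noncentral t distribution.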

Apparatus and stimuli

The experiment was programmed using E-Prime 2.0 (Psychology Software Tools Inc., Sharpsburg, PA, USA), and stimuli were presented on a 23-in. TFT-monitor (refresh rate: 60 Hz, spatial resolution: 1,920 × 1,080 pixels). Participants responded on a standard computer keyboard using a left and right response key with their respective left and right index finger.

Stimuli were designed to be maximally comparable with the stimulus set used in Sato and Itakura (2013). We therefore used four color photographs of one forward-facing Caucasian female face (7.6° × 5.7°, height × width), which differed only with respect to the mouth gestures displayed. The mouth gestures were inserted into the same face pictures (using photo-editing software) to maximize control; all gestures corresponded to the variations used in the original study: mouth closed, lip protrusion, tongue protrusion, and cheek puffing. The faces were cropped to an oval shape (see the Appendix for the complete stimulus set). Stimulus material and experimental program are available on the Open Science Framework (https://osf.io/z2dw5/).

Procedure

The procedure of Experiment 1a closely matched the procedure of Sato and Itakura’s (2013) Experiment 1, and mainly differed with respect to the stimulus material used (see the Apparatus and Stimuli section for details). As in the original study, the experiment comprised an acquisition phase and a test phase (see Fig. 1).

Acquisition phase

Each trial of the acquisition phase started with the central presentation of a fixation cross (1,000 ms), which was then substituted by the female face stimulus shown with the mouth closed. Participants were instructed to respond to this neutral target face with a left or right key press using the left or right index finger, respectively. They were further told to spontaneously select each key press, and to press each key about equally often. Each key press was followed by a black screen (presented for 50 ms; for a similar action–effect delay, see Dignath, Pfister, Eder, Kiesel, & Kunde, 2014; Elsner & Hommel, 2001; Hoffmann, Lenhard, Sebald, & Pfister, 2009). After that, the target face reappeared for 300 ms in the form of an action effect where the mouth gesture had changed to either lip protrusion, tongue protrusion, or cheek puffing. Importantly, the change in mouth gesture was dependent on the selected key press: the left key press always triggered a specific effect (e.g., lip protrusion) whereas the right key press always triggered a different effect (e.g., cheek puffing). The assignment of mouth gestures to response keys was constant for each participant and counterbalanced across participants. Thus, each participant was presented with only two of the three mouth gestures. The response–effect mapping was not mentioned to the participants, but it was pointed out that the mouth gestures were completely irrelevant for the task. The next trial started 500 ms after effect offset. The acquisition phase consisted of 300 trials in total. After completing the acquisition phase, participants had the opportunity to take a break before the test phase started.

Note that we implemented several minor changes to the original study in the acquisition procedure. In the original study, the key press triggered a change in mouth gesture after a delay of 50 ms without any interruption by a black screen. Further, the trial timing was slightly different. Whereas our study included a fixation interval of 1,000 ms at the beginning and a black screen (presented for 500 ms) at the end of each trial, the original study did not use any such intervals, so that the target face was presented on the screen throughout the acquisition phase.
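The acquisition trial structure and the key-to-gesture counterbalancing can be written down as plain data. The sketch below is illustrative only: the event labels, the `mapping_for` helper, and the rule of cycling participants through the six possible assignments are assumptions, since the text states only that the assignment was constant within and counterbalanced across participants.

```python
from itertools import permutations

GESTURES = ("lip protrusion", "tongue protrusion", "cheek puffing")

# (event, duration in ms); None means "shown until the key press"
ACQUISITION_TRIAL = [
    ("fixation cross", 1000),
    ("face, mouth closed", None),
    ("black screen", 50),
    ("face with action effect", 300),
    ("black screen before next trial", 500),
]

# All ordered pairs of distinct gestures: 3 * 2 = 6 possible key assignments
KEY_MAPPINGS = [
    {"left": left, "right": right} for left, right in permutations(GESTURES, 2)
]


def mapping_for(participant_index):
    """Hypothetical counterbalancing rule: cycle through the six assignments."""
    return KEY_MAPPINGS[participant_index % len(KEY_MAPPINGS)]


def fixed_duration_ms(trial):
    """Sum of all event durations that do not depend on the participant."""
    return sum(duration for _, duration in trial if duration is not None)
```

With this rule, 24 participants would be spread evenly over the six assignments, and each participant would see only two of the three gestures during acquisition.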

Test phase

In the test phase, the previous effect stimuli served as primes. At the beginning of each trial, one effect prime with lip protrusion, tongue protrusion, or cheek puffing was presented centrally for a duration of 300 ms. Whereas each participant was familiar with two of the effect primes from the preceding acquisition phase, there was always one unfamiliar prime that had not been presented before. Following the presentation of the prime, one of two target stimuli, “⁎” (0.8 × 0.8 cm) or “#” (1.0 × 0.8 cm), was presented at the center of the screen, and participants responded to each target with a key press according to a fixed target–response mapping that was instructed at the beginning of the test phase. The target–response mapping was counterbalanced across participants. Participants were instructed to ignore the prime stimulus and to respond as quickly and accurately as possible to the target stimulus. The test phase comprised 120 trials. As in the original study, 40 of these trials featured congruent primes and 40 featured incongruent primes. In the remaining 40 trials, the neutral stimulus with the mouth gesture not presented during acquisition (i.e., without any possibility to acquire associations to the two response options) served as prime. The next trial started 1,000 ms after the key press.
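The congruency coding of test trials follows directly from the acquisition mapping and the target–response mapping. The following sketch shows one way to build and label a 120-trial list; the function names, the equal split of each condition across the two targets, and the shuffled trial order are assumptions that the text does not specify.

```python
import random


def classify_prime(prime, required_key, key_to_gesture):
    """Label a test trial as congruent, incongruent, or neutral."""
    if prime not in key_to_gesture.values():
        return "neutral"  # gesture never produced during acquisition
    if key_to_gesture[required_key] == prime:
        return "congruent"
    return "incongruent"


def build_test_trials(key_to_gesture, target_to_key, neutral_gesture,
                      per_condition=40, seed=None):
    """Build a shuffled trial list with per_condition trials per prime type."""
    trials = []
    per_target = per_condition // len(target_to_key)  # assumed equal split
    for target, key in target_to_key.items():
        other_key = "right" if key == "left" else "left"
        primes = {
            "congruent": key_to_gesture[key],
            "incongruent": key_to_gesture[other_key],
            "neutral": neutral_gesture,
        }
        for condition, prime in primes.items():
            for _ in range(per_target):
                trials.append({"prime": prime, "target": target,
                               "required_key": key, "condition": condition})
    random.Random(seed).shuffle(trials)
    return trials


# Example mappings for one hypothetical participant
mapping = {"left": "lip protrusion", "right": "cheek puffing"}
targets = {"*": "left", "#": "right"}
trials = build_test_trials(mapping, targets, "tongue protrusion", seed=1)
```

Keeping the labeling in a separate `classify_prime` function makes it possible to check each generated trial against the definition of congruency given in the text.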

Design and analysis

The experiment involved the within-subjects factor congruency (congruent vs. incongruent vs. neutral prime) referring to the congruency of the prime-associated response and to-be-executed response in the test phase.

We conducted two types of analyses. First, we performed the exact same analyses as reported for Experiment 1 in Sato and Itakura (2013), that is, paired-samples t tests to compare error rates and response times between the congruent and incongruent condition while omitting the data of the neutral condition. We report Cohen’s d_z as effect sizes for paired-samples t tests (calculated as $ {d}_z=\frac{t}{\sqrt{n}} $). Second, the analyses were extended to include the neutral condition by performing repeated-measures analyses of variance (ANOVAs) with the factor congruency (congruent vs. incongruent vs. neutral) for error rates and RTs. Because the original Sato and Itakura (2013) study did not report any exclusion criteria due to an unbalanced proportion of key presses during acquisition, we additionally report t tests and ANOVAs with all participants included. Error trials were removed prior to analyzing RTs. For violations of the sphericity assumption, we report Greenhouse–Geisser corrected p values along with original degrees of freedom. As in the original study, all following analyses were performed without outlier correction.
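The paired-samples t statistic and the d_z effect size can be computed from per-participant condition means. A minimal sketch using only the standard library; the function name and the RT values in the usage lines are made up for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Return (t, d_z, df) for a paired-samples comparison of cond_a minus cond_b."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev uses n - 1
    d_z = t / math.sqrt(n)  # equivalent to mean(diffs) / stdev(diffs)
    return t, d_z, n - 1


# Hypothetical mean RTs (ms) of six participants
incongruent_rt = [455, 428, 476, 450, 458, 447]
congruent_rt = [450, 430, 470, 445, 460, 438]
t_value, d_z, df = paired_t(incongruent_rt, congruent_rt)
```

Because d_z is the mean difference divided by the standard deviation of the differences, it can be recovered from any reported paired t value and sample size without access to the raw data.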

Besides traditional null-hypothesis significance testing, we additionally drew on Bayesian statistics for a better interpretation of nonsignificant results. We calculated nondirectional Bayes factors (BF₀₁) using the BayesFactor package Version 0.9.12-2 of the R software environment Version 3.3.2, with a value of 1 as scale parameter for the prior distribution. BF₀₁ was computed as f (data | H₀) / f (data | H₁), with f denoting marginal likelihoods. We interpreted BF₀₁ > 3 as evidence for the null hypothesis and BF₀₁ < 1/3 as evidence for the alternative hypothesis.^{Footnote 1}
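For a paired design, a Bayes factor of this kind can be approximated from the t value and the sample size alone. The sketch below assumes the Jeffreys-Zellner-Siow (JZS) formulation with a Cauchy prior of scale r on the standardized effect size, which is assumed here to correspond to the setting described above; it is not the BayesFactor package's own code, and the function names and the numerical integration scheme are illustrative choices.

```python
import math


def jzs_bf01(t, n, r=1.0, z_max=10.0, steps=2000):
    """Approximate BF01 for a one-sample or paired t test (JZS prior, scale r)."""
    nu = n - 1
    # Marginal likelihood under H0, up to a constant shared with H1
    like_h0 = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

    def integrand(z):
        # Substituting g = 1 / z^2 with z ~ N(0, 1) gives g an
        # inverse chi-square(1) prior, which yields the Cauchy prior on effect size
        shrink = z * z / (z * z + n * r * r)  # equals 1 / (1 + n * r^2 * g)
        density = 2 * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        return density * math.sqrt(shrink) * (1 + shrink * t ** 2 / nu) ** (-(nu + 1) / 2)

    # Composite Simpson rule on [0, z_max]; steps must be even
    h = z_max / steps
    total = integrand(0.0) + integrand(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    like_h1 = total * h / 3
    return like_h0 / like_h1


def interpret(bf01):
    """Apply the thresholds used in the text: BF01 > 3 or BF01 < 1/3."""
    if bf01 > 3:
        return "evidence for H0"
    if bf01 < 1 / 3:
        return "evidence for H1"
    return "inconclusive"
```

Calling `jzs_bf01(0.36, 24)` or `jzs_bf01(1.00, 24)` should land close to the BF₀₁ values reported for the test phase of Experiment 1a, with small deviations expected because the reported t values are rounded.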

Results and discussion

Acquisition phase

On average, participants responded 387 ms (SD = 114 ms) after presentation of the face stimulus. Descriptively, the distribution of left (52.03%) and right (47.97%) key presses was close to the instructed balanced distribution, even though the statistical comparison indicated a small effect for this comparison, t(23) = 2.19, p = .039, d = 0.45. We ensured that all participants included in the analyses had experienced each key-effect mapping in sufficient quantity (see Participants section for the exclusion criterion).

Test phase

The mean error rate was 4.38% (SD = 3.32) for the congruent and 4.17% (SD = 3.19) for the incongruent condition, and a paired-samples t test indicated no significant difference between the two conditions, t(23) = 0.36, p = .723, d = 0.07, BF₀₁ = 5.99. RTs for correct responses did not differ between the congruent (M = 448 ms, SD = 38.8) and the incongruent condition (M = 444 ms, SD = 36.5), t(23) = 1.00, p = .328, d = 0.20, BF₀₁ = 3.97 (see Fig. 2). Similarly, the repeated-measures ANOVA including the neutral condition was nonsignificant for both error rates and RTs (both Fs < 1; see Table 1). When all participants were included in the analyses, there were still no significant differences between conditions, either in the t tests on error rates and RTs (ps ≥ .165) or in the corresponding repeated-measures ANOVAs (both Fs < 1).

Table 1 Mean response times (RTs) and error rates for Experiments 1a, b, and 2

To sum up, the results of Experiment 1a suggest that participants’ behavior was not influenced by the congruency manipulation as implemented in the test phase. Moreover, the analysis of BF₀₁ provided clear evidence for the absence of any congruency effect. The observed pattern of results is at odds with the original observations of Sato and Itakura (2013), especially when considering that the present design should come with high power to detect the previously reported effect size. In the light of these results, we ran Experiment 1b, which featured face stimuli with eyes gazing toward the mouth region. This gaze direction might direct visual attention of the participant to the mouth region of the face (see Langton & Bruce, 1999), eventually boosting the build-up of intersubjective action–effect binding.

Experiment 1b: Modified conceptual replication

In Experiment 1b, we used the same face photographs of the female individual from Experiment 1a, but now the eyes of the on-screen face were always gazing downward instead of looking directly at the participant (see Appendix). By doing so, visual attention of the participant was directed toward the crucial action effect location, that is, the mouth region of the on-screen face (Langton & Bruce, 1999). Again, we expected to observe reduced RTs in congruent trials, where prime-associated and to-be-executed response matched, compared with incongruent trials, where prime-associated and to-be-executed response did not match.