Humans often perform tasks in which multiple people work together toward common goals. Through these joint actions, humans can achieve goals that would be difficult to accomplish when acting alone. Movement planning during joint action is more complex, however, because the co-actors need to coordinate their behaviors. One process thought to facilitate this coordination is action co-representation—a process wherein individuals develop a representation of their co-actor’s action plan (Sebanz & Knoblich, 2009). It is thought that by representing the co-actor’s responses, individuals are able to plan their own movements cohesively with those of their co-actor. Several experimental methods have been used to gain insight into joint action and action co-representation (Ray & Welsh, 2011; van der Wel, Knoblich & Sebanz, 2011; Welsh et al., 2005). One method that is frequently used in this exploration is the joint Simon task (Sebanz, Knoblich & Prinz, 2003).

The joint Simon task was designed to explore how co-actors’ actions influence an individual’s response selection. In Sebanz et al.’s (2003) studies, participants executed left and right buttonpresses in response to green and red rings, respectively (relevant color stimulus dimension). The rings were presented on the index finger of a hand that pointed to the left or right, or was neutral (irrelevant spatial stimulus dimension). Compatible trials were defined as those in which the finger pointed to the same side of space as the response indicated by the ring color (e.g., green ring on a left-pointing finger). On incompatible trials, the finger pointed to the side of space opposite the one indicated by the color (e.g., green ring on a right-pointing finger). Participants completed three main tasks. In the two-choice task, they completed the task alone and were responsible for making responses to both color stimuli. In the individual go–no-go task, they completed the task alone but only responded to one color (e.g., green) by executing only one response (e.g., left). They were to withhold their response when the alternative color (e.g., red) appeared. The novel joint go–no-go task was similar to the individual go–no-go task, in that participants only responded to one color. In this condition, however, the participants performed their go–no-go task alongside a co-actor who responded to the other color (e.g., the participant on the left responded to green, while the participant on the right responded to red). Hence, both colors required responses (as in the two-choice task), but each participant performed the task as they would in the individual go–no-go task.

A key pattern of results emerged from Sebanz et al.’s (2003) experiments. In the two-choice and joint go–no-go conditions, spatial-compatibility effects emerged such that response times (RTs) on compatible trials were shorter than those on incompatible trials. In contrast, no such compatibility effects were observed in the individual go–no-go task. Sebanz et al. argued that the same pattern of results occurred in the two-choice and joint tasks because these tasks elicited functionally similar cognitive processes. They argued that stimulus codes were linked with specific response codes, so that the presentation of a stimulus automatically elicited subthreshold activation of any response coded with it. Because the stimuli in the experiment had multiple dimensions (color and space), each dimension could activate a different response. When the two dimensions corresponded (i.e., both indicated the same response), that response received converging activation, facilitating response selection and decreasing RTs. When the dimensions did not correspond, competing responses were activated and the nontarget response had to be inhibited, increasing RTs. This response facilitation and competition did not occur in the individual go–no-go task because only one response was required and the participants formed only one response code. In contrast, in the two-choice and joint tasks, there were two possible responses, and as a result, the participants formed two response codes. Importantly, participants experienced response activation and competition whether the alternative response was their own (two-choice task) or their partner’s (joint task). Thus, Sebanz et al. argued that the joint Simon effect (JSE) emerged because participants engaged in action co-representation and represented their co-actor’s responses in a way functionally similar to how they represented their own (see also Lam & Chua, 2010; Tsai & Brass, 2007; Tsai, Kuo, Hung & Tzeng, 2008; Welsh, Higgins, Ray & Weeks, 2007).

Despite general support for the action co-representation account, some researchers have argued against this explanation, suggesting instead that the JSE emerges because participants code the location of their response with respect to another salient stimulus in the environment (Dolk et al., 2011; Guagnano, Rusconi, & Umiltà, 2010). The results of a study by Weeks, Proctor, and Beyak (1995) are consistent with this alternative spatial-coding explanation. They found that simply placing an inanimate object beside the responding effector during an individual go–no-go task produced a spatial-compatibility effect. Furthermore, others have reported that when there is no spatial reference, such as in the individual go–no-go task or when the co-actor is outside of peripersonal space, the compatibility effect disappears (Guagnano et al., 2010; Welsh et al., 2007; cf. Tsai et al., 2008; Welsh et al., 2012). Hence, some have argued that the JSE may simply be driven by the presence of an object (e.g., the co-actor or their response), rather than by representations of a co-actor’s actions.

Thus far, this alternative account has not been fully addressed because of a confound between the spatial dimension of the effector and the spatial dimension of the stimuli in the typical joint Simon task: Specifically, the response button and the actor occupy the same side of space. Although this confound is present in typical two-choice and joint Simon studies, Hommel (1993) developed a spatial-compatibility task that addressed this issue by manipulating the actors’ cognitive representations of their actions. Participants sat at a desk with two response buttons (left, right) and two action aftereffect lights (left, right). In the key conditions, pressing one button caused the light in contralateral space to illuminate (e.g., right responses caused left aftereffects). The imperative stimuli were high and low tones presented from speakers in left and right space. Critically, the instructions varied between groups. One group of participants was instructed to “press one of the buttons” in response to stimuli, a response focus (e.g., left press for a high tone). The other group was instructed to “turn on one of the lights” in response to stimuli, an aftereffect focus (e.g., right light [left press] for a high tone). Participants given the response focus demonstrated a typical spatial-compatibility effect (shorter RTs when the stimuli were ipsilateral to the to-be-pressed button), whereas participants given an aftereffect focus demonstrated inverse spatial-compatibility effects (shorter RTs when the stimuli were contralateral to the to-be-pressed button, but ipsilateral to the aftereffect). Thus, by manipulating the feature that actors used to represent their action (response vs. aftereffect), Hommel was able to reverse the direction of the spatial-compatibility effect. These results suggest that actions can be coded by either the effector or the aftereffect and, as such, have been used to support ideomotor (a.k.a. common-coding) approaches to cognition and motor planning (see Hommel, 1993, and Prinz, 1997, for more in-depth discussions). Of greater relevance to the present study, this method disentangles the confound between the spatial dimensions of the task goal and the effectors (or the participants’ bodies).

Thus, the present research adopted the design of Hommel’s (1993) study to contrast the action co-representation and spatial-coding accounts of the JSE by requiring participants to complete an aftereffect-focused task in both two-choice and joint contexts. We used only the aftereffect-focused task because it was the only condition that disentangled the spatial dimensions of the task goal and the effectors. The two-choice task was a conceptual replication of Hommel’s aftereffect-focused task, so participants were instructed to illuminate one of two virtual lights following the presentation of a low- or a high-pitched tone. We anticipated replicating the inverse (aftereffect-based) compatibility effect in this two-choice task. The more theoretically relevant results for the present study were those from the joint task. In the joint task, the participants performed a go–no-go version of the aftereffect-focused task alongside a confederate completing the other half of the task. If action co-representation underlies the JSE and individuals are able to code the goal of their co-actor in addition to their own, then aftereffect-based coding will drive response conflict/facilitation and cause inverse compatibility effects to emerge. Alternatively, if the response-based spatial-coding account is correct, then aftereffect coding will be irrelevant, and participants will code their response with respect to the location of another relevant stimulus (the co-actor’s response). This set of processes would lead to a typical response-based compatibility effect in the joint task.
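The two competing predictions can be made concrete with a small sketch. It assumes only what is stated in the text: each key turns on the light on the opposite side, and trials are labelled compatible or incompatible by the location of the key. The function names are illustrative and are not part of the original study materials.

```python
def opposite(side: str) -> str:
    """Return the other side of space ("left" <-> "right")."""
    return "right" if side == "left" else "left"


def predicted_facilitated(tone_side: str, key_side: str, coding: str) -> bool:
    """Predict whether a trial should be facilitated (faster) under a coding account.

    coding == "response":    the action is coded by the location of the key pressed
                             (the response-based spatial-coding account).
    coding == "aftereffect": the action is coded by the location of the light it
                             turns on, which is always contralateral to the key.
    A trial is facilitated when the tone comes from the side the action is coded on.
    """
    if coding not in ("response", "aftereffect"):
        raise ValueError(f"unknown coding: {coding!r}")
    coded_side = key_side if coding == "response" else opposite(key_side)
    return tone_side == coded_side


def trial_label(tone_side: str, key_side: str) -> str:
    """Label a trial by response (key) location, as in this study."""
    return "compatible" if tone_side == key_side else "incompatible"


if __name__ == "__main__":
    key_side = "right"  # the participant's "3" key in the joint task
    for tone_side in ("left", "right"):
        print(
            f"{trial_label(tone_side, key_side):>12}:",
            "response coding ->", predicted_facilitated(tone_side, key_side, "response"),
            "| aftereffect coding ->", predicted_facilitated(tone_side, key_side, "aftereffect"),
        )
```

Under aftereffect coding the incompatible trial is the facilitated one (an inverse effect); under response coding the conventional pattern returns. The joint task therefore discriminates between the accounts by the sign of the compatibility effect.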

Method

Participants

Twenty participants (18–37 years old; 16 female, 4 male) from the University of Toronto community were recruited. The participants were right-hand dominant, reported normal or corrected-to-normal hearing and vision, and were naïve to the purpose of the experiment. They provided written informed consent before the study and were financially compensated. All procedures complied with the ethical standards regarding the treatment of human participants in research according to the 1964 Declaration of Helsinki and were approved by the University of Toronto Research Ethics Board.

Apparatus and stimuli

A Dell Optiplex 780 computer running E-Prime 2.0 (Psychology Software Tools, Inc., Sharpsburg, PA) was used to control stimulus presentation and record responses. The visual stimuli were presented on a 19-in. Dell LCD display. Auditory stimuli were presented via an Altec Lansing Series 100 speaker system.

The stimulus events in each trial are shown in Fig. 1. Each trial began with a black screen. After 500 ms, a 5-cm × 5-cm white fixation cross was presented in the middle of the screen for 1,000 ms. Next, two 4.5-cm (w) × 9.5-cm (h) virtual lightbulbs were displayed 9.75 cm to the left and right of the fixation cross. After a random foreperiod of 1,000–3,000 ms, an imperative auditory stimulus was delivered. The imperative stimulus was a 100-ms “low” (200-Hz) or “high” (800-Hz) tone presented from either the left or the right speaker. The participants had 1,000 ms to respond to the stimuli by pressing one of two keys on a QWERTY keyboard. They were to press “z” in response to high tones and “3” (on the number pad) in response to low tones. The keyboard was arranged so that the “z” and “3” keys were in line with the left and right edges of the screen, respectively. When a key was pressed, the lightbulb in contralateral space was “illuminated” (filled with bright yellow): The left lightbulb illuminated when “3” was pressed, and the right lightbulb illuminated when “z” was pressed. After the lightbulb had been illuminated for 1,500 ms, the screen returned to the black background, signaling the start of the next trial.
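The event sequence and the contralateral key-to-light mapping can be summarized in code. This is a hypothetical Python sketch, not the E-Prime script used in the study; in particular, drawing the foreperiod uniformly in whole milliseconds is an assumption, since only its 1,000–3,000 ms range is reported.

```python
import random

# Tone -> key and key -> illuminated light, as described in the text.
# The aftereffect is contralateral: the right-side "3" key lights the LEFT bulb.
KEY_FOR_TONE = {"high": "z", "low": "3"}
LIGHT_FOR_KEY = {"z": "right", "3": "left"}
TONE_HZ = {"low": 200, "high": 800}

TONE_DURATION_MS = 100      # duration of the imperative tone
RESPONSE_WINDOW_MS = 1000   # time allowed to respond after tone onset
FEEDBACK_MS = 1500          # the lit bulb stays on this long after the keypress


def aftereffect_for_tone(tone: str) -> tuple[str, str]:
    """Return (key to press, lightbulb that turns on) for a given tone."""
    key = KEY_FOR_TONE[tone]
    return key, LIGHT_FOR_KEY[key]


def pre_tone_events(rng: random.Random) -> list[tuple[str, int]]:
    """Return the (event, duration in ms) sequence up to tone onset.

    Assumption: the foreperiod is drawn uniformly from 1,000-3,000 ms.
    """
    return [
        ("black screen", 500),
        ("fixation cross", 1000),
        ("lightbulbs shown (foreperiod)", rng.randint(1000, 3000)),
    ]


if __name__ == "__main__":
    rng = random.Random(7)
    print(pre_tone_events(rng))
    print(aftereffect_for_tone("low"))  # ('3', 'left')
```

Keeping the response window and feedback as separate constants reflects that the 1,500-ms illumination begins at the keypress, so total trial length depends on the participant's RT.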

Fig. 1

Diagram of experimental events and timelines. Note that the virtual lightbulb on the left of the screen “illuminates” when the “3” button is pressed (diagrammed), and the virtual lightbulb on the right of the screen “illuminates” when the “z” button is pressed (not diagrammed). Compatibility is defined with respect to the relationship between the side of space on which the tone is presented and the location of the response.

Tasks and procedure

The experiment was conducted over a single 30-min session with three practice blocks and six experimental blocks. Practice blocks were included because pilot testing revealed that some individuals found it challenging to form links between the action and aftereffect codes and/or to perform the task as instructed (they could not, or did not, focus on the aftereffect). These individuals demonstrated conventional Simon effects in the two-choice task (cf. Hommel, 1993). The practice blocks were therefore designed to maximize the potential for action/aftereffect binding and to support correct task performance. The participants always completed the practice blocks alone and were responsible for executing both responses. The first practice block familiarized participants with the aftereffect of each keypress and consisted of 20 trials in which a red arrow pointed to one of the virtual lights, prompting the participant to illuminate it by pressing the contralateral key. The second practice block introduced participants to the imperative auditory stimuli: In 20 trials, the imperative stimulus was presented ipsilateral to the to-be-illuminated light. The third practice block consisted of 20 trials and was identical to the experimental blocks. This block familiarized participants with the random presentation of imperative stimuli from the left or right speaker.

Participants began the experimental blocks immediately after practice. There were three two-choice and three joint experimental blocks. Participants completed all three blocks of one task before completing the other task, and the task order was counterbalanced across participants. In both tasks, participants were given instructions to focus on generating the aftereffects (e.g., “illuminate the left light when you hear the low tone”; i.e., Hommel’s, 1993, aftereffect-focused task). In the two-choice task, participants responded to high and low tones with left-hand (“z”) and right-hand (“3”) responses, respectively. In the joint task, each participant worked with a confederate. The participants sat on the right and were responsible for illuminating the left lightbulb in response to low tones (by pressing “3” with the right hand). The confederate sat on the left and illuminated the right lightbulb following high tones (by pressing “z” with the right hand).

Blocks had 40 trials with 10 random presentations of each tone–location combination. Consistent with previous studies (Hommel, 1993), trials were coded as compatible or incompatible on the basis of response location (e.g., low tones from the left speaker—indicating left aftereffects but right buttonpresses—were considered incompatible). At the beginning of each block, participants were reminded to generate the appropriate aftereffect when the imperative stimulus was presented.
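A sketch of how one 40-trial block and its compatibility codes could be generated. Shuffling with Python's `random` module is an assumption (the text says only that presentations were random), and all names here are hypothetical. Note that compatibility is coded by key location, not by aftereffect location.

```python
import itertools
import random
from collections import Counter

TONES = ("low", "high")
SPEAKERS = ("left", "right")
# Side of the key each tone requires: "z" (left) for high, "3" (right) for low.
KEY_SIDE_FOR_TONE = {"high": "left", "low": "right"}


def compatibility(tone: str, speaker: str) -> str:
    """Code a trial by response (key) location, not by aftereffect location."""
    return "compatible" if speaker == KEY_SIDE_FOR_TONE[tone] else "incompatible"


def make_block(rng: random.Random, reps_per_cell: int = 10) -> list[dict]:
    """Build one shuffled block: reps_per_cell trials of each tone-speaker pair."""
    trials = [
        {"tone": tone, "speaker": speaker, "compatibility": compatibility(tone, speaker)}
        for tone, speaker in itertools.product(TONES, SPEAKERS)
        for _ in range(reps_per_cell)
    ]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    block = make_block(random.Random(42))
    print(len(block))  # 40
    print(Counter((t["tone"], t["speaker"]) for t in block))
    # Low tone from the left: left aftereffect but right keypress -> incompatible.
    print(compatibility("low", "left"))
```

With two tones and two speakers there are four cells, so 10 repetitions per cell yields the 40 trials per block reported above, half of them compatible.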

Results

Only trials in which participants responded to low tones were analyzed, because this was the only tone to which participants responded in both tasks. Trials in which the wrong key was pressed or the wrong person responded (response errors) were excluded (3.08% of trials). Subsequently, an individual- and condition-specific outlier procedure was applied in which RTs more than 2 SDs above or below a participant’s mean RT for that condition were eliminated (2.83% of trials). Finally, because pilot testing had revealed that some individuals were unable to perform the task as required, only participants demonstrating the expected inverted compatibility effect in the two-choice task were included in the final analysis. Because the critical research question was whether the inverted compatibility effects observed in the two-choice task would also be observed in the joint context, participants showing conventional response-based effects (mean compatible RT < mean incompatible RT) were eliminated. Individuals who were unable to perform the task as instructed (indicated by conventional compatibility effects in the two-choice task) could not be expected to perform the task as instructed in the key joint task. Although this theoretically driven selection likely enhanced the magnitude of the benchmark effect in the two-choice task, it preserved the integrity of the critical comparison in the key joint task. Fifteen of the 20 participants showed an inverted compatibility effect (mean compatible RT > mean incompatible RT) in the two-choice task. Mean RTs for these participants were entered into a 2 (Task: two-choice, joint) × 2 (Compatibility: compatible, incompatible) repeated-measures ANOVA. Alpha was set at .05 for all tests.
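The exclusion steps above can be sketched as a small pipeline. This is an illustrative reconstruction, not the authors' analysis code: the trial-record fields are hypothetical, and computing the sample SD once per participant × task × compatibility cell (no iterative trimming) is an assumption, since the text does not specify either detail.

```python
from collections import defaultdict
from statistics import mean, stdev


def preprocess(trials: list[dict], n_sd: float = 2.0) -> list[dict]:
    """Keep correct low-tone trials, then trim RTs beyond +/- n_sd SDs.

    Each trial is a dict with keys: participant, task, compatibility, tone,
    error (bool), rt (ms). SDs are computed per participant x task x
    compatibility cell in a single pass (assumed, not stated in the text).
    """
    valid = [t for t in trials if t["tone"] == "low" and not t["error"]]
    cells = defaultdict(list)
    for t in valid:
        cells[(t["participant"], t["task"], t["compatibility"])].append(t)

    kept = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:  # SD undefined; keep the trial(s)
            kept.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)
        kept.extend(t for t in cell if abs(t["rt"] - m) <= n_sd * sd)
    return kept


def shows_inverse_effect(trials: list[dict], participant) -> bool:
    """Inclusion criterion: mean compatible RT > mean incompatible RT (two-choice task)."""
    rts = defaultdict(list)
    for t in trials:
        if t["participant"] == participant and t["task"] == "two-choice":
            rts[t["compatibility"]].append(t["rt"])
    return mean(rts["compatible"]) > mean(rts["incompatible"])
```

Participants for the ANOVA would then be those for which `shows_inverse_effect(preprocess(all_trials), p)` is true; the selection is applied after trimming so that single extreme RTs do not decide inclusion.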

The analysis revealed a main effect of Task, with RTs in the joint task (313 ms) being shorter than those in the two-choice task (397 ms), F(1, 14) = 28.63, p < .001, ηp² = .672. Of greater theoretical relevance, the main effect of Compatibility revealed that RTs on incompatible trials (337 ms) were significantly shorter than RTs on compatible trials (373 ms), F(1, 14) = 33.42, p < .001, ηp² = .705. There was also a significant Task × Compatibility interaction, F(1, 14) = 24.83, p < .001, ηp² = .639. This interaction emerged because the inverse compatibility effect was significantly larger in the two-choice task than in the joint task (Fig. 2). Critically, however, the inverse compatibility effects were significant in both the two-choice, t(14) = 6.601, p < .001, d = 0.606, and joint, t(14) = 2.41, p < .05, d = 0.283, tasks.
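The reported effect sizes can be cross-checked against the F ratios. Since ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error), it follows that ηp² = F·df_effect / (F·df_effect + df_error). With 15 participants in a within-subject 2 × 2 design, each 1-df effect has df_error = 14. The sketch below applies this standard identity to the reported values.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


reported = {
    "Task": (28.63, 0.672),
    "Compatibility": (33.42, 0.705),
    "Task x Compatibility": (24.83, 0.639),
}

if __name__ == "__main__":
    for effect, (f_value, eta_reported) in reported.items():
        eta = partial_eta_squared(f_value, 1, 14)
        print(f"{effect}: recomputed {eta:.3f} vs reported {eta_reported}")
    # For comparison, an error df of 19 would not reproduce the reported .639.
    print(f"Interaction with df_error = 19: {partial_eta_squared(24.83, 1, 19):.3f}")
```

All three recomputed values match the reported ones to three decimals when df_error = 14, whereas df_error = 19 gives about .567 for the interaction.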

Fig. 2

Mean response times (in milliseconds) as a function of Task and Compatibility. SEM bars are shown.

Although inverse compatibility effects were present in the joint task, it is unclear whether the effect was dependent on the presence of the co-actor. To determine whether a co-actor’s presence drove the significant effects in the joint go–no-go task, a control experiment was conducted in which 30 new participants were recruited to perform an individual go–no-go task (i.e., without a co-actor) in addition to the two-choice task. Analysis of the data from participants demonstrating inverse compatibility effects in the two-choice task of the control experiment (n = 12) revealed a significant Task × Compatibility interaction, F(1, 11) = 4.83, p = .05, ηp² = .3. In contrast to the main experiment, however, the interaction arose because the inverse compatibility effect was only significant in the two-choice task, t(11) = 4.78, p < .005, d = 0.503 (for compatible RTs, M = 425 ms, SD = 86.2; for incompatible RTs, M = 382 ms, SD = 80.9). The compatibility effect in the individual go–no-go task was not significant, t(11) = 0.73, p > .47, d = 0.100 (for compatible RTs, M = 344 ms, SD = 86.4; for incompatible RTs, M = 336 ms, SD = 78.6). Overall, the pattern of results suggests that a co-actor is critical to the emergence of reliable inverse compatibility effects in go–no-go tasks.
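The control-experiment means reported above translate directly into effect magnitudes. The sketch computes each inverse effect as compatible minus incompatible RT (positive values mean an inverse effect) and recovers the interaction's ηp² from F(1, 11) with the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The dictionary layout and names are illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Mean RTs (ms) reported for the control experiment (n = 12).
control_means_ms = {
    "two-choice": {"compatible": 425, "incompatible": 382},
    "individual go/no-go": {"compatible": 344, "incompatible": 336},
}


def inverse_effect(cell: dict) -> int:
    """Compatible minus incompatible RT; positive values indicate an inverse effect."""
    return cell["compatible"] - cell["incompatible"]


if __name__ == "__main__":
    for task, cell in control_means_ms.items():
        print(f"{task}: {inverse_effect(cell)} ms")
    print(f"interaction partial eta squared: {partial_eta_squared(4.83, 1, 11):.3f}")
```

The two-choice task shows a 43-ms inverse effect against 8 ms in the individual go–no-go task, and the recovered ηp² of about .305 is consistent with the reported value of .3.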

Discussion

The present research was designed to contrast the action co-representation and spatial-coding accounts of the JSE by eliminating the confound caused by spatial overlap between the task goal and the response. The analyses revealed inverse compatibility effects in the two-choice and the joint tasks, but no compatibility effect in the individual go–no-go task of the control experiment. These findings are not consistent with the spatial-coding explanation of the compatibility effects observed in previous JSE studies. In contrast, the findings do support the hypothesis that action co-representation is the mechanism underlying the JSE and suggest that a similar set of mechanisms leads to compatibility effects in two-choice and joint tasks.

To elucidate, the inverse compatibility effects in the two-choice task replicated Hommel (1993). These results indicate that participants were able to employ an intentional coding approach, coding their responses in terms of the most salient (instructed) feature. In the present work, the most salient spatial feature was that of the aftereffect (rather than the response). Through intentional coding, imperative stimuli automatically elicited subthreshold activation of the spatial codes for the aftereffect. As a result, the facilitation or conflict in response selection produced by the irrelevant spatial feature of the imperative stimulus depended on the location of the aftereffect rather than on the location of the response. This aftereffect-based activation occurred despite the fact that the response had a different and potentially functional spatial feature. These results indicate that the direction of the Simon effect is not strictly tied to the location of the effector, but can be manipulated by intentionally focusing on specific task components (Hommel, 1993).

Beyond replicating Hommel (1993), the present research revealed an inverse Simon effect in the joint, but not in the individual, go–no-go task. These contrasting results indicate that the presence of a co-actor drove the effect in the joint go–no-go task. Although the inverse JSE in the joint task was smaller in magnitude than that in the two-choice task, the difference between compatible and incompatible RTs in the joint task was statistically significant and was similar in magnitude (15 ms) to previously observed JSEs (e.g., ~13 ms in Tsai & Brass, 2007; 15 ms in Welsh et al., 2007). Smaller compatibility effects may occur in a joint than in a two-choice task because participants only need to execute one response in the joint task. That is, because the alternative response is not executed in the joint task, representations of the (co-actor’s) alternative responses may never reach the same level of activation as when the actual alternative response is required in the two-choice task (see also Lam & Chua, 2010; Welsh et al., 2007). More importantly, the inverse JSE suggests that a functionally similar set of processes and codes is employed in the joint and two-choice tasks. We suggest that the inverse JSE emerged because participants represented their co-actor’s actions in the joint task similarly to how they represented their own actions in the two-choice task. Because participants intentionally coded their own actions in terms of aftereffects, they encoded their co-actor’s actions similarly. Thus, rather than associating stimulus codes with response codes based on their co-actor’s effector, participants encoded their co-actor’s intended goal. As a result, the spatial features of the aftereffect were responsible for the facilitation or competition in response selection that led to the inverse JSEs observed here.
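How much smaller the joint effect was can be estimated from the reported marginal means. Because every participant contributed to all four Task × Compatibility cells, each marginal mean is the average of two cell means, so the cells can be reconstructed approximately (the reported values are rounded to the nearest millisecond). This is our arithmetic on the published numbers, not a value reported in the study, and the names below are illustrative.

```python
task_means = {"two-choice": 397, "joint": 313}           # reported marginal means (ms)
compat_means = {"compatible": 373, "incompatible": 337}  # reported marginal means (ms)
joint_effect = 15  # joint task: compatible minus incompatible RT (ms), from the text

# The overall compatibility effect is the average of the two task-specific effects.
overall_effect = compat_means["compatible"] - compat_means["incompatible"]  # 36 ms
two_choice_effect = 2 * overall_effect - joint_effect                       # 57 ms


def cells(task_mean: float, effect: float) -> dict:
    """Split a task's marginal mean into compatible/incompatible cell means."""
    return {"compatible": task_mean + effect / 2, "incompatible": task_mean - effect / 2}


implied = {
    "two-choice": cells(task_means["two-choice"], two_choice_effect),
    "joint": cells(task_means["joint"], joint_effect),
}

if __name__ == "__main__":
    print(two_choice_effect)
    print(implied)
```

As a sanity check, averaging the reconstructed cells across tasks reproduces the reported compatibility marginals (373 and 337 ms), and the implied two-choice inverse effect of roughly 57 ms is several times the 15-ms joint effect.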

Importantly, the inverse effects cannot be explained by the spatial-coding account (e.g., Dolk et al., 2011; Weeks et al., 1995). If participants in the joint task simply coded their responses with respect to the spatial feature of a salient object in the environment, the pattern of RTs in the joint task would be opposite to that of the two-choice task. Such was not the case. Hence, the present findings join those from a growing list of studies that have elucidated the context-specific effects of the co-actor in the joint Simon task. For instance, Tsai and Brass (2007) showed that participants exhibit Simon effects when co-acting with a video of a human hand, but not when co-acting with a video of a puppet hand. Lam and Chua (2010) found that the JSE disappeared when both partners responded to the same stimulus. They reasoned that this occurred because the co-represented response was the same as the one that they were to execute, and hence, there was no conflict following spatially incompatible target presentations, despite the presence of the co-actor. Finally, Hommel, Colzato, and van den Wildenberg (2009) showed that the JSE can be modulated by the relationship between co-actors, with effects emerging when there is a positive relationship between partners, but disappearing when there is a negative relationship between partners (see also Iani, Anelli, Nicoletti, Arcuri, & Rubichi, 2011; Kuhbandner, Pekrun, & Maier, 2010). These context-specific results cannot easily be explained via a spatial-coding mechanism that is based on the relationship between the location of the response and any other environmental stimulus. Instead, these findings suggest that action co-representation is the basis of the JSE and that the nature and intention of a co-actor can have modulating effects on how actors represent their co-actor’s actions.

In sum, the present pattern of effects is consistent with the action co-representation account of the JSE. Furthermore, these findings provide initial evidence that ideomotor coding can play a functional role in joint action. According to ideomotor theory (e.g., Prinz, 1997), the representations of actions are tightly bound to the representations of the perceptual events associated with those actions. It has been suggested that the coupling of action and perception codes leads to efficient response selection when individuals act alone. Activation of the representation of a desired effect also activates the action code that brings about that effect. Likewise, one can efficiently predict the consequences of a selected action because the perceptual codes of the aftereffect resulting from that action are simultaneously activated when a response is selected. Sebanz and Knoblich (2009) extended these potential mechanisms to joint action. They suggested that an individual’s perception or knowledge of a co-actor’s action aftereffects activates the response codes necessary for the co-actor to generate those effects (and vice versa). Once activated, the individual can use these co-represented codes to coordinate their own actions with the predicted actions and effects of their co-actor. Although the present research does not directly address this prediction, the finding that individuals can adopt an aftereffect-based coding system in the joint task suggests that ideomotor coding could be used in joint action contexts. Further research will be needed to test these ideas in other joint action contexts.