Although unrelated actions or tasks performed by two distinct individuals are considered by common sense to be independent, numerous tasks have shown otherwise (Richardson, Marsh, Isenhower, Goodman, & Schmidt, 2007; Sebanz, Knoblich, & Prinz, 2003); a typical example of such tasks is the joint Simon task (JST). The JST derives from the standard Simon task (Simon & Wolf, 1963), in which a participant is asked to respond as quickly as possible to two different stimuli (e.g., a diamond and a square), presented one at a time in random order on either the left or the right side of a screen. Crucially, the participant is asked to respond with the right hand to a specific shape (e.g., the square) and with the left hand to the other shape (e.g., the diamond). As a consequence, responses to incompatible hand–stimulus positions (incompatible responses) result in slower reaction times (RTs) than do responses to compatible hand–stimulus positions (compatible responses). In the joint version of the standard Simon task, two persons are involved in the same task. Unlike the standard Simon task, each participant is instructed to exclusively respond to one of the two stimuli and to withhold the response for the irrelevant stimulus, thus creating a go–no-go type of task. When a participant performs an individual go–no-go version of the JST, no difference is usually observed between compatible and incompatible responses (Sebanz et al., 2003). However, as soon as two participants perform the task together, a spatial compatibility effect reemerges, as evidenced by compatible responses being faster than incompatible ones.

Conventionally, the JST has been explained in terms of action co-representation (Sebanz et al., 2003; Tsai, Kuo, Hung, & Tzeng, 2008) or task co-representation (Milanese, Iani, & Rubichi, 2010). Specifically, the actions of a coactor sitting beside an actor are taken into the actor’s action plan, thus giving rise to a potential representation conflict, which manifests itself in a spatial compatibility effect.

Although the action co-representation hypothesis is able to account for different instances of the JST, it currently fails to explain a growing number of recent findings (Dolk, Hommel, Prinz, & Liepelt, 2013; Doneva & Cole, 2014; Guagnano, Rusconi, & Umiltà, 2010; Klempova & Liepelt, 2015; Liepelt, Wenke, Fischer, & Prinz, 2011; Stenzel & Liepelt, 2015). For example, Dolk et al. (2013) showed that the typical spatial compatibility effect can be achieved by employing nonanimated and nonresponsive “coactors,” such as a clock, a metronome, or a waving cat in an individual go–no-go version of the JST. Given the absence of a human coactor and the nature of the events produced by such objects, these effects cannot be easily accounted for by the co-representation hypothesis. Dolk and colleagues (Dolk et al., 2014; Dolk et al., 2013) have thus put forward an alternative model, namely the referential-coding account, which rests on the ideomotor theory (Prinz, 1997) and on the theory of event coding (TEC; Hommel, 2009).

According to the ideomotor theory and the TEC, there is a profound bond between actions and their corresponding action effects. Actions are represented by codes of their related perceptual consequences (e.g., the sound of the button press, the movement’s speed, etc.). Given the specific nature of these codes, actions can be bidirectionally triggered by the anticipation of the effects (internally triggered action) or by the perception of the effects (externally triggered action) in the environment.

Moving from the aforementioned assumption, the referential-coding account (Dolk et al., 2014; Dolk et al., 2013) suggests that in the context of the JST, events can be coded as potential effects of an action, regardless of the specific source (e.g., a participant, a computer, or a metronome). Hence, each event produced beside a participant is able to trigger a conflict with the potential events produced by the participant him- or herself (Prinz, 2015). Importantly, the likelihood and the strength of this conflict increase with the number of shared events; thus, if two participants share similar events, the conflict increases, and the spatial compatibility effect increases with it. To mediate this conflict, a participant relies on the features of the task that assure the strongest distinction between her or his own actions and the other participant’s actions. In the typical JST, the most relevant feature is the horizontal space, specifically the participants’ positions. According to the intentional-weighting hypothesis (Memelink & Hommel, 2013), indeed, attending more strongly to the respective response side helps to resolve this conflict.

Spatial and feature-based forms of attention (Liepelt, 2014; Sellaro, Dolk, Colzato, Liepelt, & Hommel, 2015), thus, acquire a fundamental role for discriminating one’s own actions/events from others’ actions/events in the referential-coding account. In a recent line of experiments, Liepelt (2014) investigated whether attention modulates the spatial compatibility effect as a function of spatial saliency. To alter the saliency of a specific location, the author manipulated the positions of the participants’ hands. Namely, when participants’ hands were placed on the monitor’s sides, this constituted a high-saliency condition, whereas the hands placed on their respective knees constituted a low-saliency condition. As expected, the high-saliency condition resulted in a stronger spatial compatibility effect than did the low-saliency condition. Remarkably, this effect was observed only when both participants positioned their hands around the screen. This outcome outlines how, according to the intentional-weighting account (Memelink & Hommel, 2013), increasing the similarity of participants’ events gives rise to a stronger conflict and a significantly stronger spatial compatibility effect, by changing the distribution of attention (Abrams, Davoli, Du, Knapp, & Paull, 2008).

In a previous study, Buetti and Kerzel (2010) investigated the impacts of different response modes (a finger lift and a pointing gesture) and gaze conditions (fixed gaze, free eye movements, and a saccade toward the stimulus) on a standard Simon task. They observed that the pointing gesture systematically caused a spatial compatibility effect in the fixed-gaze and free-eye-movement conditions, but not in the saccade condition. As the authors claimed, such results suggested that “the redirection of attention toward the correct response location contributes to the Simon effect” (p. 2187).

Adapting the experimental logic of Buetti and Kerzel (2010) to the JST, in the present study, we aimed to investigate whether indirectly varying the degree of attention by means of pointing gestures in two individuals could elicit a compatibility effect in the individual condition and modulate the joint condition. Toward this aim, we emphasized the discrepancy between the compatible and incompatible responses by asking participants to perform a pointing gesture toward their respective compatible side, regardless of the stimulus location. Such a manipulation assured that responding to an incompatible stimulus would most likely cause a shift of spatial attention from the stimulus location to the pointing target (situated on the opposite side), and conversely, that responding to a compatible stimulus would not cause any extra attention shift. Thus, we hypothesized that we would observe a spatial compatibility effect in the individual go–no-go condition, and a further modulation in the joint Simon condition.

Methods

Participants

Twenty-four right-handed participants (17 female, seven male; mean age = 22.4 years, SD = 4.3) took part in the experiment. All of the participants had normal or corrected-to-normal vision. Prior to the experiment, each participant gave written informed consent to participate in the study. All of the procedures were conducted in accordance with the guidelines of the local ethics committee of the University of Muenster and the 1975 Declaration of Helsinki. After the experiment, participants received course credits.

Stimuli and apparatus

Participants were seated in a sound-attenuated, dimly lit room in front of a CRT monitor (19 in.) at a distance of ~60 cm. The visual stimuli consisted of a white square and a white diamond (1.9° × 1.9°) displayed on a black background (see Fig. 1). Stimuli could appear either to the left or to the right of a centrally presented fixation cross at a visual angle of 9.5°. Responses were given by means of two light-sensor keypads positioned on a desk on the right and left sides of the monitor’s middle line. The keypads were kept at a fixed distance of 15 cm from each other.
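
The visual angles above can be translated into approximate on-screen distances. The following is a minimal sketch, assuming a flat screen, a line of sight perpendicular to the screen at the fixation cross, and that the 9.5° eccentricity is measured from fixation to the stimulus center; the function names are hypothetical and not part of the original setup.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # approximate viewing distance stated in the text


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Size of an object centred on the line of sight that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Offset from fixation of a point seen angle_deg away from the line of sight."""
    return distance_cm * math.tan(math.radians(angle_deg))


# Assumed geometry: stimulus edge length and horizontal offset from fixation.
stimulus_size_cm = extent_cm(1.9)          # roughly 2 cm
stimulus_offset_cm = eccentricity_cm(9.5)  # roughly 10 cm
```

Under these assumptions, a 1.9° stimulus is about 2 cm wide and sits about 10 cm from the fixation cross.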

Fig. 1

Schematic representation of a prototypical trial. The square on the left side of the display represents the current stimulus. The points placed on the right and left sides of the screen represent the targets of the pointing response

Task and procedure

The task consisted of an individual and a joint go–no-go version of the Simon task (Liepelt et al., 2011; Sebanz et al., 2003), in which either a square or a diamond was randomly presented on the left or the right side of the screen. In the individual condition, a single participant was seated on the right or the left side of the monitor, to keep sitting positions constant between the individual and joint conditions, and positions were balanced among dyads of subjects. Each participant was asked to perform a straight pointing response with the right hand to a previously assigned position on the monitor whenever the assigned stimulus appeared. The position of the stimulus was task-irrelevant. Responses were to be given as quickly and accurately as possible. Participants had to withhold their response whenever the not-assigned stimulus was presented. The combination of the participant’s pointing position and the stimulus position determined the compatible and incompatible conditions. In compatible trials, the pointing position and stimulus position spatially corresponded (e.g., right–right), whereas the two positions did not correspond in the case of an incompatible trial (e.g., left–right).
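
The trial classification described above can be sketched as a small function. This is an illustrative sketch only; the function name, argument names, and string labels are hypothetical.

```python
def classify_trial(stimulus_shape, stimulus_side, assigned_shape, pointing_side):
    """Label a trial from one participant's point of view.

    A trial is a no-go trial when the shape is not the participant's assigned
    shape. Otherwise it is compatible when the stimulus appears on the same
    side as the participant's pointing position, and incompatible when not.
    """
    if stimulus_shape != assigned_shape:
        return "no-go"
    if stimulus_side == pointing_side:
        return "compatible"
    return "incompatible"
```

For a participant assigned the square and pointing to the right, a square on the right is a compatible trial, a square on the left is an incompatible trial, and any diamond is a no-go trial.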

In the joint condition, a second participant was seated beside each participant, and each responded with a straight, right-hand pointing response to the monitor whenever the assigned stimulus appeared there. Stimuli were presented randomly, so that both persons of the dyad responded in a turn-taking fashion.

The manual response consisted in lifting the right index finger from the light-sensor keypad and pointing it toward a white dot placed 9.5° to the right or the left of the fixation cross. Participants were instructed to point exclusively at the dot placed on their respective side of the monitor, regardless of the stimulus location. Crucially, RTs and error rates were recorded when the finger left the light-sensor keypads, to avoid any potential confound due to the pointing movement.

Each trial started with the presentation of a central fixation cross and the two target points (250 ms), followed by the presentation of one of the two visual stimuli (150 ms) together with the fixation cross and the two target points. In the case of a correct response, the fixation cross was presented for 300 ms. In the case of a wrong response, error feedback was provided for 300 ms, in the form of the word Fehler (German for “error”). If no response was given within 1,800 ms, the feedback zu langsam (German for “too slow”) was shown for 300 ms. Following the feedback, there was a constant intertrial interval of 1,750 ms.
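
The feedback rules of a trial can be summarized in a short sketch. It assumes that the 1,800-ms deadline is measured from stimulus onset, that a "wrong response" means responding on a no-go trial, and that a correctly withheld response is followed by the fixation cross; the constant and function names, and the "+" symbol for the fixation cross, are hypothetical.

```python
# Durations taken from the text (ms).
FIXATION_MS = 250
STIMULUS_MS = 150
FEEDBACK_MS = 300
INTERTRIAL_INTERVAL_MS = 1750
RESPONSE_DEADLINE_MS = 1800  # assumed to be measured from stimulus onset


def feedback_for(is_go_trial, rt_ms):
    """Return the text shown for 300 ms after a trial.

    rt_ms is the response time in ms, or None when no response was given.
    """
    responded = rt_ms is not None and rt_ms <= RESPONSE_DEADLINE_MS
    if is_go_trial and not responded:
        return "zu langsam"  # "too slow"
    if not is_go_trial and responded:
        return "Fehler"      # "error": response to the nonassigned stimulus
    return "+"               # fixation cross after a correct trial
```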

Prior to each experimental session (individual and joint conditions), participants performed 16 training trials, followed by two blocks (128 trials) per session of the actual experiment. Trials were uniformly distributed among four combinations of conditions (individual: compatible and incompatible; and joint: compatible and incompatible), thus resulting in 64 trials per combination. The order of the experimental sessions was counterbalanced among pairs of subjects.
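
A balanced, randomized trial list of this kind could be generated as follows. This is a sketch under stated assumptions: 128 is treated as the number of trials in one list, shape and side are fully crossed with equal cell counts, and the order is a simple shuffle without repetition constraints. The text does not specify the randomization procedure, and all names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

SHAPES = ("square", "diamond")
SIDES = ("left", "right")


def make_trials(n_trials=128, seed=None):
    """Return a shuffled list of (shape, side) pairs with equal cell counts."""
    cells = list(product(SHAPES, SIDES))
    if n_trials % len(cells) != 0:
        raise ValueError("n_trials must be a multiple of the number of cells")
    trials = cells * (n_trials // len(cells))
    random.Random(seed).shuffle(trials)  # local generator, reproducible with a seed
    return trials


trials = make_trials(128, seed=1)
cell_counts = Counter(trials)  # each shape-by-side cell occurs equally often
```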

Analysis

For the statistical analysis, responses were considered correct when the pointing finger was lifted within a time interval of 150–1,000 ms after target onset; the remaining responses were classified as outliers (0.4 %). The correct RTs were then subjected to a 2 × 2 repeated measures analysis of variance (ANOVA) with the factors Task Setting (individual or joint Simon task) and Compatibility (compatible or incompatible responses). Paired t tests were performed when necessary and were Bonferroni-corrected for multiple comparisons. The statistical analysis was performed in the R environment (version 3.1.2). Because error rates were low overall (0.1 % for the joint condition and 0.06 % for the individual condition), they were not analyzed further.
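
The analysis steps can be illustrated with a short sketch. The original analysis was run in R; the Python code below is not the authors' script, uses only the standard library, and treats the bounds of the RT window as inclusive, which is an assumption. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

RT_MIN_MS, RT_MAX_MS = 150, 1000  # response window stated in the text


def valid_rts(rts_ms):
    """Keep RTs inside the 150-1,000 ms window (inclusive bounds assumed)."""
    return [rt for rt in rts_ms if RT_MIN_MS <= rt <= RT_MAX_MS]


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


def bonferroni(p_values):
    """Bonferroni-adjusted p values: each p times the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]
```

In a 2 × 2 within-participants design, the compatibility effect in each task setting can be tested by passing each participant's mean incompatible and mean compatible RTs to `paired_t`; obtaining p values from the t statistic would require a t distribution, which the standard library does not provide.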

Results

The repeated measures ANOVA revealed a main effect of compatibility [F(1, 23) = 103.8, p < .0001, ηp² = .81], indicating that compatible responses (450 ms) were generally faster than incompatible responses (487 ms). No main effect was found for the task setting [F(1, 23) = 0.9, p = .3, ηp² = .04]. However, a significant Compatibility × Task Setting interaction [F(1, 23) = 8.14, p = .012, ηp² = .3] was observed, indicating that the compatibility effect in the joint condition [49 ms; t(23) = 9.2221, p < .0001] was larger than the compatibility effect in the individual condition [24 ms; t(23) = 4.3494, p < .0005; see Fig. 2].
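
For readers who wish to relate the effect sizes to the F statistics, partial eta squared can be recovered from an F value and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def compatibility_effect(mean_incompatible_ms, mean_compatible_ms):
    """Spatial compatibility effect: incompatible minus compatible mean RT (ms)."""
    return mean_incompatible_ms - mean_compatible_ms


# Overall effect from the reported condition means (487 ms vs. 450 ms).
overall_effect_ms = compatibility_effect(487, 450)
```

For example, `partial_eta_squared(0.9, 1, 23)` is approximately .04, which matches the value reported for the task setting factor, and the overall compatibility effect from the reported means is 37 ms.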

Fig. 2

Picture representing the average reaction times for compatible (depicted in white) and incompatible (depicted in gray) responses in the individual and joint conditions, respectively. Asterisks indicate p values: ** p < .001, *** p < .0005, **** p < .0001. Error bars represent standard errors of the means

Discussion

The aim of the present study was to assess whether indirectly changing the degree of attention necessary to respond to compatible and incompatible conditions could elicit a spatial compatibility effect in the individual Simon task and modulate the joint Simon task.

As expected, and contrary to the results conventionally observed in the individual Simon task (Dolk et al., 2013; Sebanz et al., 2003), our data showed a large spatial compatibility effect in the individual condition. Furthermore, this effect was larger in the joint condition. As we hypothesized, the individual effect was most likely driven by the attention shift inherent in the response planning (Adam et al., 2008).

Considering the dynamics that guide pointing responses, it is indeed plausible to assume that during incompatible trials, participants—to successfully execute the movement—had to covertly shift their spatial attention from the current target position toward the response point on the opposite side. In contrast, in the compatible condition, participants did not have to shift attention. This explanation is motivated by the fact that participants were explicitly instructed to provide a fast pointing response, and therefore were compelled to plan the pointing action before lifting their finger. Due to this early action planning, participants were forced to perform an extra computation only in the incompatible condition—namely, to shift their attention from the location of the appearing stimulus to the target-point location (on the opposite side). Given that this process requires additional time with respect to the traditional button press procedure, we were also able to obtain a compatibility effect in the individual condition.

Furthermore, if participants had not planned their action in advance, the simple lifting response would have provided RTs virtually indistinguishable from those of responses provided with classic button press responses, which may explain the relatively large joint Simon effect in our study as compared to previous studies.

This process seems to account for most of the difference in RTs between the compatible and incompatible conditions. A further factor that contributed to the results might have been the saccadic eye movements toward the response point. Even though participants were instructed to keep their gaze on the central fixation cross for the whole trial, eye movements were not monitored; thus, we cannot rule out effects due to systematic eye movements. However, even assuming the extreme case in which systematic saccades were executed, the contribution of attention would not have been drastically reduced, but most likely the RTs and RT distributions would have changed without affecting the direction of the general spatial compatibility effect.

These conclusions are supported by a recent study by Buetti and Kerzel (2010), in which the authors tested in a standard Simon task the influence of eye movements (or fixed gaze) as a function of two types of responses. Crucially, in the scenario comparable to the present study—the pointing response combined with a fixed gaze—a spatial compatibility effect was observed. The authors, in line with our results, argued that in the absence of eye movements, the effect was driven by a spatial covert attention shift.

More interestingly, in the joint condition we observed that the spatial compatibility effect was further modulated by the presence of a coactor who also pointed to his or her target location. The latter result is in line with the referential-coding account (Dolk et al., 2014; Dolk et al., 2013), and specifically with the intentional-weighting hypothesis (Memelink & Hommel, 2013). The pointing gesture did indeed increase the discrepancy between the compatible and incompatible conditions in the horizontal space. Consequently, in the joint condition an additional weighting—a further focus on the respective response side—was most likely required to resolve the conflict between the actor’s and coactor’s events. As we already mentioned, this effect was significantly reduced in the individual condition, in which conflict could only be attributed to the difference between compatible and incompatible responses, but not to any external event, such as the events produced by the participants themselves during the joint condition. These results clearly show that attention plays a crucial role in triggering and modulating the spatial compatibility effect (Doneva & Cole, 2014; Liepelt, 2014). Specifically, our results show that by manipulating the response action and its intrinsic attention component, a spatial compatibility effect can be observed even in an individual Simon task (Stenzel & Liepelt, 2015). Furthermore, we showed that this effect was modulated by the presence of a coactor performing a referential pointing gesture to his or her target location. Joint action—in particular, joint referential pointing—enhances the impact of target processing on the participants’ actions, by changing the spatial covert attention processes, which provides a new view on dimensional overlap in spatial compatibility effects like the joint Simon effect (Kornblum, Hasbroucq, & Osman, 1990). 
These findings support a relatively novel view, holding that domain-general cognitive-control processes that adjust the attentional focus impact the amount of self–other integration (Colzato, van den Wildenberg, & Hommel, 2013; Liepelt et al., 2012), which may be a central mechanism allowing people to navigate the social world (Heyes, 2014). Specifically, the attentional demand necessary to solve the go–no-go task can have a strong influence on the self–other integration component of the joint condition. Such a modulatory effect might not just constitute a linear additional effect, on top of the traditional spatial compatibility effect, but it might have different influences on self–other integration: On the one hand, the increased saliency might generally emphasize our own action effects as well as the other’s action effects, thus resulting in a stronger compatibility effect. On the other hand, the higher saliency of the coactor might also strengthen the pure social component, thus further modulating the spatial compatibility effect.

Overall, the present study shows that a different kind of response mode (pointing action) provides additional evidence for the important roles of individual and joint attention (Böckler, Knoblich, & Sebanz, 2011; Liepelt, 2014) in the joint Simon effect, and more generally in joint action.