When two individuals alternate reaching responses to targets located in a visual display, reaction times are longer when a response is directed to the location where the co-actor has just responded. Although an abundance of work has examined the many characteristics of this phenomenon, it is not yet known why the effect occurs. In particular, some authors have argued that action representation mechanisms are central to the effect. Here, however, we present evidence in support of an account in which the representation of action is not necessary. First, the basic effect occurs even when participants cannot see their co-actor’s movement but, importantly, have their attention shifted to a target side via an attentional cue. Second, its time course is too short-lived to function effectively as a component of action planning. Finally, unlike other joint action phenomena, the effect is not modulated by higher-order mechanisms concerned with the personal attributes of a co-actor. Taken together, these results suggest that this particular joint action phenomenon is due to attentional rather than action mechanisms.