Introduction

A large number of recent papers (e.g. Iacoboni et al. 2001; Brass et al. 2001; Kilner et al. 2003; Brass and Heyes 2005) have been based on a conjecture that was proposed in the late nineteenth century by William James (1890); see Stock and Stock (2004) for a review. This conjecture, commonly referred to as ‘Ideo-Motor theory’ or the ‘Ideo-Motor hypothesis’ (henceforth, ‘IM conjecture’), states that “every representation of a movement awakens in some degree the actual movement which is its object; and awakens it in a maximum degree whenever it is not kept from so doing by an antagonistic representation present simultaneously to the mind” (James 1890, p. 1134).

It can be seen that the IM conjecture has three parts. The first is straightforward: (1) a relationship exists between the representation of an action and the resulting movement trajectory. The conjecture then proposes a specific form to the relationship: (2) movements are (completely) activated through ‘imagining’ an action’s effects (including observing a biological agent perform the action, preparing an action or in any other way representing the action); (3) such a ‘represented’ action must be inhibited or else implemented. The first component of the conjecture is consistent with modern computational models of motor learning (e.g. Wolpert et al. 2001) where an inverse model converts a desired trajectory into the appropriate motor commands (the inverse model having developed through a feedback error signal). Thus, the first part of the conjecture is uncontroversial. It is, however, the latter two components that have received much recent attention in the literature. The reason for the renewed interest in the IM conjecture is the discovery of ‘mirror neurons’ that fire both when a monkey executes an action and when the monkey observes another actor execute the same action. Evidence for the existence of such neurons in humans has been provided through fMRI studies (see Rizzolatti 2005). Attempts have been made to understand the role of mirror neurons by relating their activity to component (2) of the IM conjecture. This connection is attractive to researchers because it appears to provide neurophysiological support for the idea that: (a) actions can be learned directly through observation and (b) empathy is achieved via covert simulation of other people’s actions (see Brass and Heyes 2005; Sommerville and Decety 2006; Gallese and Goldman 1998; Meltzoff and Decety 2003; Rizzolatti et al. 2001).

The issue is whether there is any evidence supporting conjectures (2) and (3) outlined above. In short, what is the behavioural evidence that observing another human executing an action causes activation of that movement that requires subsequent inhibition? The prediction from this strong claim is that because viewing an action creates a compulsory activation of a matching action, actually performing that action should be facilitated by being already activated, while performing an incongruous action should be hindered as the activated action is inhibited and the new action prepared. The majority of studies within this area (e.g. Brass et al. 2001; Press et al. 2005; Sebanz et al. 2003) have therefore attempted to support the IM conjecture by showing that an actor’s movements can be facilitated (faster reaction times) by observing someone else make a congruous movement (i.e. the same action in the same direction). Additionally, there has been an attempt by Kilner et al. (2003) to show that an actor’s movements can be disrupted (increased variability) by observing someone else make an incongruous movement (the same movement in an orthogonal direction). These effects are claimed to be specific to observing human (biological) movements, implicating an IM type mechanism rather than a more general stimulus–response compatibility effect.

There may be a fundamental problem with such approaches. For example, researchers (e.g. Brass et al. 2001; Press et al. 2005) have attempted to demonstrate movement facilitation using measures of reaction time (RT). The difficulty with this approach is that RT is influenced by a number of factors, including the visual salience of the imperative stimulus and stimulus–response compatibility (SRC). These factors are separate from any possible effects caused by observing human movement. Thus, regardless of the nature of the stimuli (e.g. the stimuli might be entirely symbolic), RTs are faster when the stimulus is easier to detect (Aicken et al. 2007; Van Donkelaar et al. 1994) or when the required response is spatially or conceptually compatible with the stimulus. RTs are slower when the stimulus is hard to detect or incompatible with the required response (see Vu and Proctor 2004 for a useful overview). The spatial SRC effect can be shown simply by using a choice reaction time task where participants are asked to press a button on the right or a button on the left when a stimulus can appear on the right or left of a computer screen. Participants are faster when asked to press the button on the right when the stimulus appears on the right than when it appears on the left and vice versa (Fitts and Seeger 1953). These spatial compatibility effects have also been shown within a simple response task (SRT) paradigm (Hommel 1996), in which the response is the same throughout a block of trials. RTs are also faster when the response is conceptually compatible (e.g. both ‘opening’ movements regardless of spatial orientation; Press et al. 2005) to the imperative stimulus (Shaffer 1965; DeJong 1995; Stoffels 1996; Vu and Proctor 2004). Again, these effects are separate from any effects that might occur through observation of human movement.

In an attempt to avoid the use of RT measures when seeking support for the IM conjecture, Kilner et al. (2003) asked participants to move their limbs in response to a moving visual target. In one condition (compatible), the participants were asked to directly track the target. In the other condition (incompatible), the participants were asked to move their arm at right angles to the visual target. It is known that humans are competent at directly tracking a target with their hands (see e.g. Miall et al. 2000). If a predictable visual signal is used (e.g. a sinusoidal movement) this task can be completed quite easily. For a less predictable signal, the actor must rely on feedback (with the resulting decrement in performance due to delays, etc.). Any manipulation that makes it harder for humans to use feedback should result in decreased performance on a tracking task. Asking someone to track a target moving orthogonal to their hand makes it harder for them to detect spatial errors between the position of their hand and the position of the target (in fact, the task requires a complicated spatial mapping between hand and target location). Thus, one would predict that performance (measured by variance of the spatial path) on an incongruous tracking task would be lower than performance on a congruous tracking task. These effects are again separate from any influence of human movement on performance, raising issues about the extent to which Kilner et al.’s (2003) paradigm can be taken as support for IM conjecture (2). However, Kilner et al. reported a most unexpected finding: their participants showed the expected difference between congruous and incongruous tracking, but only in response to human-generated movement. Kilner et al. found no difference between congruous and incongruous tracking in response to a computer-generated signal (implemented through a robotic arm). Kilner et al. interpreted this to mean that the robotic movement did not produce a movement representation that had to be inhibited. However, from an SRC perspective the lack of a difference between congruous and incongruous tracking is most surprising and thus worthy of further investigation.

The SRC considerations above raise concerns about some existing studies that were designed to provide support for component (2) of the IM conjecture (Brass et al. 2001; Press et al. 2005; Kilner et al. 2003). In short, the facilitation and interference effects might be characteristic of human response mechanisms to any stimuli that vary with respect to salience, predictability or response compatibility (spatial or conceptual). In order for these paradigms to support component (2) of the IM conjecture, it is necessary to establish that these effects are actually specific to observing human movement. The current study therefore set out to test whether the effects reported by Brass et al. (2001), Press et al. (2005) and Kilner et al. (2003) could be replicated using non-human (symbolic) stimuli.

Methods: Experiment 1

Brass et al. (2001) used a simple response task paradigm where participants were asked to tap or lift their index finger (in respective blocks) as they observed movements of the stimuli on a computer screen. The stimuli were either a finger or a cross and the movement of the stimuli was either an upward movement or a downward movement. The participants’ task was to perform a pre-defined response as quickly as possible when the imperative stimuli moved. This response was the same for all trials in one block and consisted of the participants either tapping their index finger or lifting it. Thus, in one and the same block, participants performed both congruent and incongruent movements. Brass et al. found that RTs were faster when participants performed a congruent movement and when they attended to the finger stimulus compared to the crosses. However, the finger and cross stimuli were not matched for salience, and examination of the displays used suggests that this difference in salience may be an alternative mechanism by which the different RT patterns could have arisen. The current study replicated the Brass et al. (2001) experiment but used more closely matched stimuli to test whether the Brass et al. (2001) results are open to an alternative explanation.

Participants

Eight students at the University of Aberdeen (three males) ranging in age between 22 and 30 years (mean age 24.13 years) volunteered for this experiment (two other participants’ data were lost due to technical problems). All were naïve to the purpose of the study. Seven of the eight participants were right-handed and all participants had normal or corrected-to-normal vision. The study was approved by a University ethics committee and was performed in accordance with the ethical standards laid down in the Declaration of Helsinki.

Apparatus

An Optotrak 3020 system (Northern Digital, Waterloo, Ontario, Canada) recorded position data at a sample rate of 125 Hz by tracking an infrared emitter (IRED) placed upon the nail of the participant’s left index finger. Optotrak data recording and stimulus presentation were synchronised via an electronic trigger. The stimuli were displayed on a Toshiba Tecra 8000 Pentium 233 MHz laptop with a 13.3” display with a frame rate of 60 Hz, a screen resolution of 800 × 600 and 24-bit colour settings.

Procedure

The study used two separate simple response tasks (i.e. the response was the same irrespective of the stimulus). In one block, participants were required to tap their left index finger when the stimulus changed (see below) and in the other block the required response was to lift their left index finger. In both blocks, the starting position was the same: participants rested their left hand on a table with the index finger elevated a few centimetres above the surface. Participants sat in front of the laptop screen at a distance of 85 cm. The stimuli subtended approximately 5.4° × 1.4° of visual angle for the pen and 5.4° × 1.7° for the finger, and the overall movement of the stimulus was approximately 2°.
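The relationship between viewing distance, physical stimulus size and visual angle can be illustrated with a minimal sketch. The function names are hypothetical and the standard formula θ = 2·atan(s / 2d) is assumed; the physical sizes on the screen are not reported in the text and are only derived here for illustration.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size (cm) needed to subtend angle_deg at distance_cm (inverse of the above)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# Illustration only: the long axis of a 5.4 degree stimulus viewed from 85 cm
long_axis_cm = size_for_angle_cm(5.4, 85.0)
```

Under these assumptions a 5.4° stimulus at 85 cm corresponds to roughly 8 cm on the screen.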

Participants were shown a two-frame animation. The first frame showed the stimulus (either a finger or a pen, positioned horizontally) for either 800, 1,600 or 2,400 ms. The second frame showed the stimulus tilted either upwards or downwards, which gave the strong phenomenological appearance of the object moving. The second frame was presented for 500 ms (see Fig. 1). This movement was the cue for participants to respond (by either lifting or tapping their finger, depending on the condition) as quickly as they could. In between trials, a blue screen was shown for either 2,600, 3,400 or 4,200 ms in order to maintain a constant overall trial length of 5.5 s.
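The trial timing can be checked with simple arithmetic. The sketch below assumes that the longest blank screen was paired with the shortest first frame so that every trial sums to 5.5 s; the constant and function names are hypothetical.

```python
FIRST_FRAME_MS = (800, 1600, 2400)  # durations of the first (static) frame
SECOND_FRAME_MS = 500               # duration of the second (tilted) frame
TRIAL_MS = 5500                     # constant overall trial length

def blank_screen_ms(first_frame_ms, second_frame_ms=SECOND_FRAME_MS, trial_ms=TRIAL_MS):
    """Blank-screen duration that pads a trial to the constant overall length."""
    return trial_ms - first_frame_ms - second_frame_ms

blank_durations = [blank_screen_ms(d) for d in FIRST_FRAME_MS]
```

The three resulting durations match the 4,200, 3,400 and 2,600 ms blank screens reported above.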

Fig. 1

The finger (biological) and pen (non-biological) stimuli used in Experiment 1. From left: the starting frame, the raised stimulus and the lowered stimulus

Design

Participants performed two separate sessions, one with the finger stimulus and one with the pen. Each session consisted of two blocks of 120 trials, with one block requiring participants to lift their finger in response to the stimulus, and the other block requiring them to tap their finger as the response. Session and block were both counterbalanced: half the participants saw the pen stimuli in the first session and the finger stimuli in the second, and the other half had the opposite order. Within each session group, half the participants first performed the block where the response was to lift their index finger and then the block where the response was to tap it, and half did the blocks in the reverse order. Participants had a short break between blocks, and the two sessions were separated by an average of three days.

Data analysis

The major dependent variable was reaction time (RT). The stored data files were analysed using Labview (Version 8) software routines. The data were filtered using a dual-pass Butterworth second-order filter with a cut-off frequency of 16 Hz (equivalent to a fourth-order zero phase lag filter of 10 Hz). RT was computed offline as the amount of time between stimulus onset and the index finger beginning to move (when the velocity exceeded a threshold of 5 cm/s). A repeated measurement ANOVA with four within-subjects factors was computed for the dependent variable of median RT. The factors were ‘observed stimuli’, which was either a pen or a finger; ‘observed movement direction’ (up versus down); ‘executed movement direction’ (lifting versus tapping) and ‘onset time of stimulus’ which was either 800, 1,600 or 2,400 ms. Whenever the assumption of sphericity was violated, degrees of freedom have been corrected by using the Greenhouse–Geisser correction.
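The velocity-threshold criterion for movement onset can be sketched as follows. This is a simplified illustration rather than the Labview routine used in the study: it assumes already-filtered position samples along a single axis, a backward-difference speed estimate, and a hypothetical function name.

```python
def reaction_time_ms(positions_cm, sample_rate_hz=125.0, threshold_cm_s=5.0, stimulus_index=0):
    """Time (ms) from stimulus onset to the first sample whose speed exceeds the threshold.

    positions_cm: filtered 1-D finger positions (assumption: one movement axis only).
    Returns None if the threshold is never exceeded.
    """
    dt = 1.0 / sample_rate_hz
    for i in range(stimulus_index + 1, len(positions_cm)):
        # Backward-difference speed estimate between consecutive samples
        speed = abs(positions_cm[i] - positions_cm[i - 1]) / dt
        if speed > threshold_cm_s:
            return (i - stimulus_index) * dt * 1000.0
    return None

# Illustration: finger at rest for 40 samples, then moving 0.1 cm per sample (12.5 cm/s)
example_positions = [0.0] * 40 + [0.1 * k for k in range(1, 11)]
example_rt = reaction_time_ms(example_positions)
```

In this illustration the first supra-threshold sample is the 40th after stimulus onset, giving an RT of about 320 ms at 125 Hz.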

Errors

There were three types of errors that could occur: participants could start their movement early, before the stimulus moved; participants could execute the wrong response for a given block; or the Optotrak could fail to record the position due to occlusion, so that RT could not be computed. RTs smaller than 100 ms or larger than 1,000 ms were excluded from further analysis. We ensured that the errors for each participant did not exceed 10% of the total trials (following Brass et al. 2001).
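The exclusion window and the median RT used as the dependent variable can be sketched as below. The function names are hypothetical, trials without a computable RT are represented as None, and counting every discarded trial towards the rate is an assumption made for illustration (the text does not say whether out-of-window RTs were counted as errors).

```python
from statistics import median

def valid_rts(rts_ms, low_ms=100.0, high_ms=1000.0):
    """Keep RTs inside the 100-1,000 ms window; None marks a trial with no computable RT."""
    return [rt for rt in rts_ms if rt is not None and low_ms <= rt <= high_ms]

def summarise(rts_ms):
    """Return (median RT of retained trials, proportion of trials discarded)."""
    kept = valid_rts(rts_ms)
    discarded = 1.0 - len(kept) / len(rts_ms)
    return median(kept), discarded

# Illustration with made-up values: one anticipation, one missing trial, one slow outlier
example_median, example_discarded = summarise([90, 250, 300, None, 1200, 350])
```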

Results

The reaction time data are shown in Fig. 2, which shows a statistically significant interaction between ‘observed movement direction’ and ‘executed movement direction’ (F (1,7) = 7.6, P < 0.05). Participants were faster when the required response was in the same direction as the observed stimulus movement than if it was in the opposite direction. There was no main effect of ‘observed stimuli’ nor any three way interaction (all P > 0.05). The ‘compatibility’ effect was therefore present for both the biological (finger) and non-biological (pen) stimuli. There was a main effect of ‘onset time of stimulus’ (F (1.07,7.45) = 18.2, P < 0.01) showing that RTs were slower when the stimulus onset time was short.

Fig. 2

Reaction times from Experiment 1 for tapping or lifting finger movements when observing either tapping or lifting movements performed by the finger (biological) and pen (non-biological) stimuli

Brass et al. (2001) performed a quintile analysis (Ratcliff 1979) that separated the RT distributions for the compatible and incompatible trials into five bins. They noted that the compatibility effect was larger for trials eliciting slower reaction times. They proposed that this suggested two mechanisms, operating over two time scales. The first mechanism they suggested was simple spatial compatibility that involved fast processing and therefore had an early influence. They also proposed that a second mechanism came into play with more time, suggesting more complex, time-consuming processing. This, they suggested, was likely to be ‘IM compatibility’. We replicated this analysis and a repeated measurement ANOVA with two within-subjects factors was computed. The factors were ‘compatibility’ and ‘quintile number’. However, we were unable to replicate their finding of a significant ‘compatibility’ × ‘quintile’ interaction, F (1.19,8.33) = 1.7, P > 0.05. There was a significant main effect of ‘quintile’, F (1.10,7.68) = 120.8, P < 0.01, which simply confirmed the shape of the quintile distributions.
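A quintile analysis of this kind can be sketched as follows: each participant's RTs are rank-ordered within a condition, split into five equal-sized bins, and the compatibility effect is computed per bin. The function names are hypothetical and the binning rule (integer division of the sorted list) is an assumption; the original analyses may have handled remainders differently.

```python
def quintile_means(rts_ms, n_bins=5):
    """Mean RT in each of n_bins equal-sized bins of the rank-ordered distribution."""
    ordered = sorted(rts_ms)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least one RT per bin")
    means = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(sum(chunk) / len(chunk))
    return means

def compatibility_effect_by_quintile(compatible_ms, incompatible_ms):
    """Incompatible minus compatible mean RT, bin by bin."""
    return [i - c for c, i in zip(quintile_means(compatible_ms), quintile_means(incompatible_ms))]

# Illustration with made-up data: a constant 20 ms cost in every quintile
compatible = list(range(200, 300, 10))
incompatible = [rt + 20 for rt in compatible]
effects = compatibility_effect_by_quintile(compatible, incompatible)
```

A compatibility effect that grows across bins would correspond to the ‘compatibility’ × ‘quintile’ interaction tested above; the made-up data here show a flat effect.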

Discussion

In contrast to Brass et al. (2001), the results in the current study showed that there was no difference in the compatibility effect as measured by reaction times whether participants attended to a biological or a non-biological stimulus. The most parsimonious explanation of these results is therefore that they arise out of the spatial stimulus–response compatibility and visual salience effects common to both stimulus types. Thus, data of the type reported by Brass et al. (2001) do not provide unambiguous support for IM conjecture (2). Furthermore, this study was unable to replicate Brass et al.’s (2001) finding that compatibility effects increased over time exclusively for responses to biological stimuli. This effect might therefore not be robust enough to support the existence of the postulated ‘second mechanism’.

Experiment 1 suggests that attempts to support IM conjecture (2) by showing reduced RTs when participants imitate an observed action are confounded by the presence of simple spatial compatibility effects. Press et al. (2005) attempted to circumvent these difficulties by avoiding spatial compatibility effects. Nonetheless, Press et al.’s design retained conceptual response compatibility that, as discussed in the Introduction, might be sufficient to explain their results. Our second experiment therefore explored whether Press et al.’s (2005) findings could be replicated with symbolic stimuli.

Methods: Experiment 2

Press et al. (2005) examined responses made by participants when they were performed orthogonal to the observed stimuli, in order to eliminate any direct spatial agreement between the stimulus and response. The starting stimuli used in their experiment consisted of pictures of a semi-open hand or a robotic hand. The imperative stimulus was either a horizontal opening movement or a closing movement made by the stimulus. Participants were asked to perform either a vertical opening or a closing movement with their hands as quickly as possible when they perceived a movement in the imperative stimulus. Press et al.’s results showed that participants were faster when they performed a congruent movement. While having the stimuli and response orthogonal to each other did rule out spatial SRC as an explanation for this result, there remains the issue of conceptual SRC, i.e. ‘opening’ versus ‘closing’.

Participants

Sixteen students at the University of Aberdeen (four males) ranging in age between 18 and 29 years (mean age 21.9 years) volunteered for this experiment (four participants’ data were removed when they did not follow task instructions). All were naïve to the purpose of the study. Fifteen participants were right-handed and all participants had normal or corrected-to-normal vision. The study was approved by a University ethics committee and was performed in accordance with the ethical standards laid down in the Declaration of Helsinki.

Apparatus

An electromagnetic kinematic recording system (‘Flock of Birds’: Ascension Mini-bird magnetic measurement system) recorded position data at a sample rate of 100 Hz by tracking a (1.1 cm × 0.8 cm × 0.8 cm) marker placed upon the participant’s left index finger nail. The measurement volume for this system was calibrated checking loci every 2 cm in a 3D grid over the reach space. Measurements were reliable and accurate within 1 mm. Data recording and stimulus presentation were synchronised using an electronic trigger. The stimuli were displayed on an Acer TravelMate 4150, 1.6 GHz, with a 15” screen running at a frame rate of 60 Hz and a 1,024 × 768 screen resolution with 32-bit colour settings.

Procedure

The experiment used two separate simple response tasks as in Experiment 1, but the stimuli and the responses were different. In one block, participants were required to close their hand when the stimulus changed (see below) and in the other block the required response was to open their hand. The response was executed orthogonally to the stimulus; the direction of the opening/closing movement was horizontal whilst the direction of the presented stimuli was vertical. Press et al. had the stimulus and response orthogonal to each other to eliminate the possibility that a direct spatial stimulus–response compatibility effect was driving the results. In both cases the starting position was the same, with participants resting their right arm on a table with their right hand semi-open in a comfortable resting posture (see Fig. 3).

Fig. 3

The hand (biological) and dot (non-biological) stimuli used in Experiment 2. From left: the starting frame, the opened stimulus and the closed stimulus

Participants sat in front of the laptop screen at a distance of 80 cm. The hand stimulus subtended approximately 11.0° × 14.7° of visual angle in the opened condition and 10.6° × 7.1° in the closed condition; the dots subtended 1.1° × 14.0° in the opened condition and 1.1° × 1.8° in the closed condition. The overall movement of the stimulus was approximately 2°. Participants were shown a two-frame animation. The first frame showed the stimulus (either a hand or a pair of dots, separated by an angle of 7.1° for the hand and 6.1° for the dots) for either 800, 1,600 or 2,400 ms. The second frame showed either the hand stimulus open or closed, or the pair of dots farther apart or closer together, inducing an experienced movement of the stimulus. This frame was presented for 500 ms (see Fig. 3). This movement was the cue for participants to respond (by either closing or opening their hand, depending on the condition) as quickly as they could. In between trials a white screen was shown for either 2,100, 2,900 or 3,700 ms, depending on how long the first frame was presented, in order to maintain a constant trial length of 5 s.

Design

Participants performed two separate sessions, one with the hand stimulus and one with the dots. Each session consisted of two blocks of 30 trials, with one block requiring participants to open their hand in response to the movement, and the other block requiring them to close their hand. Session and block were both counterbalanced: half the participants saw the hand stimulus in the first session and the dot stimuli in the second and the other half had the opposite order (Footnote 1). Within each session group, half the participants first performed the block where the response was to open their hand and then the block where the response was to close it, and half did the blocks in the reverse order. Participants had a brief break between blocks, and the two sessions were separated by an average of two days.

Data analysis

Reaction time was computed as described in Experiment 1. A repeated measurement ANOVA with two within-subjects factors was computed for the dependent variable of median RT. The factors were ‘observed stimuli’ (dots or hand) and ‘executed movement direction’ (compatible and incompatible). Whenever the assumption of sphericity was violated, degrees of freedom have been corrected by using the Greenhouse–Geisser correction.

Errors

No participant exceeded the 10% error threshold; the mean error rate was 2.60% (SD 1.42%). As already highlighted, however, four participants were replaced because they did not follow the task instructions. RTs smaller than 100 ms or larger than 1,000 ms were excluded from further analysis.

Results

The median reaction time data are shown in Fig. 4. There was a statistically significant main effect of ‘compatibility’ (F (1,15) = 5.78, P < 0.05) with reaction times being faster when the performed movement was compatible with the observed movement. There was neither an effect of ‘stimuli’ (F (1,15) = 0.20, P = 0.66) nor any interaction (F (1,15) = 0.376, P = 0.549) between the variables, indicating that participants were not faster at responding to the biological stimulus (the hand), and that the compatibility effect was present in both stimuli conditions.

Fig. 4

Reaction times from Experiment 2 for opening or closing hand movements when observing either opening or closing movements performed by the hand (biological) and dot (non-biological) stimuli

Again, a quintile analysis on the reaction time distributions for the congruent and incongruent data was performed. As in Experiment 1, no interaction effect was found. The three-way interaction ‘stimuli’ × ‘compatibility’ × ‘quintile number’ was not significant, F (1.49,22.33) = 0.73, P < 0.45, showing that the trend of greater compatibility effects over time was present for both stimuli conditions. The only reliable effect was that of ‘quintile number’, F (1.26,18.91) = 249.74, P < 0.01, which confirmed the nature of the distribution analysis.

Discussion

The current study replicated the Press et al. (2005) finding, but also established that these effects can be obtained using symbolic, non-biological stimuli. This finding is consistent with a body of literature showing that responses are faster when conceptually compatible with the imperative stimuli (Shaffer 1965; DeJong 1995; Stoffels 1996). This replication with symbolic stimuli suggests that Press et al.’s finding that participants are faster when conceptually imitating an action (in contrast to producing a conceptually different response) does not provide support for IM conjecture (2), which predicts the effects will be restricted to observing biological stimuli.

We again replicated the quintile analysis, and a repeated measurement ANOVA with two within-subjects factors was computed. The factors were ‘compatibility’ and ‘quintile number’. As in Experiment 1, no statistically reliable effect was found for the ‘quintile’ × ‘compatibility’ interaction. The general trend reported by Press et al. (2005) and Brass et al. (2001) was observed but this was true for both the human and symbolic stimuli. These findings again suggest that the quintile effects are not particularly robust and are therefore not useful evidence for or against the IM conjecture.

The results of Experiments 1 and 2 suggest that the compatibility effects taken as evidence for the IM conjecture are actually best explained in terms of stimulus–response compatibility characteristics shared by the biological and control ‘symbolic’ stimuli. Indeed, these results suggest that it might be impossible for RT experiments to establish interference or facilitation effects whilst controlling for differences in visual salience or stimulus response compatibility effects. It appears that the paradigms typified by the studies of Brass et al. (2001) and Press et al. (2005) might not be able to test or support the IM conjecture.

In an attempt to avoid these difficulties, Kilner et al. (2003) adopted a different methodology, using congruent and incongruent tracking. Experiment 3 explored whether such differences could be found using symbolic stimuli (a finding that surprisingly was not obtained by Kilner et al.).

Experiment 3

Kilner et al. (2003) attempted to demonstrate the influence of observed actions by asking participants to make arm movements as they attended to another person’s arm movements. In their study, participants observed either a blindfolded human or a robotic arm performing horizontal and vertical movements. Participants were placed in front of the stimulus and were instructed to perform arm movements in time with the stimulus. The participants’ movements could be either congruent (e.g. a horizontal movement when observing a horizontal movement) or incongruent (e.g. a horizontal movement when observing a vertical movement). Kilner et al. measured the variability of movement in the axis orthogonal to the main direction of motion, to look for any influence of the observed movement. The results showed that participants showed higher variability when observing a human performing incongruent movements but, surprisingly, not when observing the robotic arm. Kilner et al. interpreted this as evidence that only the human (biological) movement elicited a movement representation that needed to be inhibited.

In the current experiment, the findings of Kilner et al. (2003) were explored. We implemented two conditions, consisting of a moving ‘dot’ instead of an arm. The two conditions were both thus non-biological stimuli but the movements were made by either: (a) generating a pure sinusoidal wave or (b) capturing the human kinematics generated by a human attempting to produce sinusoidal motion (but the displayed signal was then restricted to one dimension despite the human clearly deviating from a straight line path across the two orthogonal dimensions). This created two different signal types, one biologically produced motion and the other not.

Participants

Eight students at the University of Aberdeen (one male) ranging in age between 20 and 28 years (mean age 23.3 years) volunteered for this experiment. All were naïve to the purpose of the study. All participants were right-handed and all had normal or corrected-to-normal vision. The study was approved by a University ethics committee and was performed in accordance with the ethical standards laid down in the Declaration of Helsinki.

Apparatus

The same motion recording techniques were used as in Experiment 2 to obtain movement trajectories from the participants. The stimuli were projected onto a screen with a Dell 3100 MP projector via a Toshiba Tecra A3, 1.7 GHz. The human kinematic data that were used as a stimulus in the kinematic condition were recorded by an Optotrak 3020 system (Northern Digital, Waterloo, Ontario, Canada). The settings of the Optotrak system used in the current experiment were the same as in Experiment 1.

Procedure

Participants were placed 70 cm in front of a screen on which the stimuli were projected. They were asked to move their right arm, fully extended, either horizontally or vertically, throughout the whole trial, as they observed a horizontally or vertically moving dot. Their task was to track the dot as precisely as possible with their index finger. In trials where the participants were moving their arm incongruently to the stimulus (e.g. horizontally as the stimulus was moving vertically) their task was to change direction, in the corresponding dimension, as the stimulus changed its direction, in the orthogonal dimension. Before each trial participants were told what movement they were supposed to perform. The stimuli consisted of a circular dot (2° in size) making vertical/horizontal sinusoidal movements either driven by an algorithm (‘artificial’ condition) or by human kinematics data (‘kinematic’ condition). In neither condition did the stimulus contain any movement in the orthogonal dimension.

The movements of the dot in the artificial condition were driven by a sine function with amplitude of 30 cm and frequency of 0.6 Hz. In the kinematic condition, the movements of the dot were driven by human kinematic data recorded separately where an experimenter performed sinusoidal horizontal and vertical movements whilst blindfolded in accordance with Kilner et al. (2003) although the resultant stimulus movement was constrained to one dimension. Two movements of each movement direction were recorded and their amplitude was set to correspond to the amplitude in the artificial condition. The frequency of the kinematics-driven sinusoidal movement was also 0.6 Hz. This condition therefore consisted of biologically driven movements presented by a non-biological stimulus.
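The artificial stimulus can be illustrated with a minimal sketch of a one-dimensional sine-driven dot. Several details are assumptions made for illustration: the 60 Hz frame rate of the projector, treating 30 cm as the peak (rather than peak-to-peak) amplitude, and the function name.

```python
import math

def sinusoid_positions_cm(duration_s, frame_rate_hz=60.0, amplitude_cm=30.0, frequency_hz=0.6):
    """Dot position (cm) on each display frame for a pure sinusoid along one axis."""
    n_frames = int(round(duration_s * frame_rate_hz))
    return [
        amplitude_cm * math.sin(2 * math.pi * frequency_hz * k / frame_rate_hz)
        for k in range(n_frames)
    ]

# Ten cycles at 0.6 Hz last 10 / 0.6 s, i.e. about 16.7 s per trial
trial_positions = sinusoid_positions_cm(10 / 0.6)
```

The position in the orthogonal dimension is simply held constant, which mirrors the constraint applied to both the artificial and the kinematic stimuli.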

Design

For each of the two conditions, a vertical/horizontal movement was presented four times, including congruent and incongruent movements, so participants performed two compatible and two incompatible movements for each stimulus. There were 16 trials in total, performed in a randomized order, and each trial consisted of 10 cycles.
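The trial count follows from crossing the design factors: 2 conditions × 2 stimulus directions × 2 compatibility levels × 2 repetitions = 16. The sketch below builds such a randomized trial list. The field names, the seed argument and the use of Python's random module are illustrative assumptions, not a description of the software actually used.

```python
import itertools
import random


def build_trials(seed=None):
    """Return the 16 trials of the design in a randomized order."""
    conditions = ("artificial", "kinematic")
    stimulus_directions = ("horizontal", "vertical")
    compatibilities = ("compatible", "incompatible")
    repetitions = 2  # two compatible and two incompatible trials per stimulus
    other = {"horizontal": "vertical", "vertical": "horizontal"}

    trials = []
    for cond, stim_dir, compat in itertools.product(
        conditions, stimulus_directions, compatibilities
    ):
        # A compatible trial is performed along the stimulus axis,
        # an incompatible trial along the orthogonal axis.
        performed = stim_dir if compat == "compatible" else other[stim_dir]
        for _ in range(repetitions):
            trials.append(
                {
                    "condition": cond,
                    "stimulus_direction": stim_dir,
                    "compatibility": compat,
                    "performed_direction": performed,
                }
            )
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials(seed=1)
print(len(trials))
```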

Data analysis

Kilner et al.’s (2003) analysis was repeated, with the mean movement variance in the orthogonal dimension as the dependent variable. LabVIEW (Version 8) software routines were used to analyze the data. The data were filtered using a dual-pass Butterworth second-order filter with a cut-off frequency of 16 Hz (equivalent to a fourth-order zero phase lag filter of 10 Hz). Data for the performed horizontal and vertical movements were segmented into individual up–down and right–left movements. The time points when the movement started and ended were identified as points when the movement speed crossed a threshold of 5 cm/s (each identified point was double checked by eye). The start point of the movement was defined as the point at which the maximum or minimum of the position time series clearly coincided with the minimum of the resultant velocity–time graph. This point defined the start of the nth movement and also the end of the (n-1)th movement. The mean variance was then computed for the movement dimension orthogonal to the executed movement; for a vertical movement the variance in the horizontal dimension was calculated and vice versa. Variance results were averaged across direction for each participant for each condition. This constituted the dependent variable.
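The filtering and variance steps can be sketched as follows. This is an illustrative Python version, not the LabVIEW routines used in the study. It makes three simplifying assumptions: the movement is segmented at the turning points of the primary-axis position only (no 5 cm/s speed threshold and no visual check), population variance is used within each segment, and the sampling rate passed to the filter is a hypothetical value. All function names are illustrative.

```python
import math
import statistics


def butter2_lowpass(x, cutoff_hz, fs_hz):
    """Single forward pass of a second-order Butterworth low-pass filter."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # bilinear-transform pre-warping
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1, b2 = 2.0 * b0, b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    # Assumption: start from the steady state of a constant input equal to x[0].
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


def dual_pass(x, cutoff_hz, fs_hz):
    """Forward then time-reversed pass, which cancels the phase lag."""
    forward = butter2_lowpass(x, cutoff_hz, fs_hz)
    backward = butter2_lowpass(forward[::-1], cutoff_hz, fs_hz)
    return backward[::-1]


def turning_points(primary):
    """Indices of the first sample, each local extremum, and the last sample."""
    idx = [0]
    for i in range(1, len(primary) - 1):
        before = primary[i] - primary[i - 1]
        after = primary[i + 1] - primary[i]
        if before * after < 0:  # direction reversal; flat plateaus are ignored
            idx.append(i)
    idx.append(len(primary) - 1)
    return idx


def mean_orthogonal_variance(primary, orthogonal):
    """Mean, over movement segments, of the variance in the orthogonal axis."""
    cuts = turning_points(primary)
    variances = [
        statistics.pvariance(orthogonal[a:b + 1])
        for a, b in zip(cuts, cuts[1:])
        if b > a
    ]
    return statistics.fmean(variances)
```

A typical call would filter both position traces with dual_pass(trace, 16.0, fs_hz) and then pass the executed-axis trace as primary and the other axis as orthogonal.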

A repeated-measures ANOVA with three within-subjects factors was computed for the dependent variable. The factors were ‘condition’ (artificial and kinematics), ‘compatibility’ (compatible versus incompatible movements) and ‘movement direction’ (horizontal and vertical performed actions).
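Because ‘compatibility’ has only two levels, its main effect in a fully within-subjects design can be checked with a paired comparison: the F ratio on (1, n − 1) degrees of freedom equals the square of the paired t statistic computed on each participant's marginal means. The sketch below illustrates this identity. The numbers are placeholders chosen for illustration and are not data from the experiment; the function name is hypothetical.

```python
import math
import statistics


def two_level_within_f(level_a, level_b):
    """F ratio and degrees of freedom for a two-level within-subjects factor.

    level_a and level_b hold one marginal mean per participant, in the
    same participant order.
    """
    diffs = [b - a for a, b in zip(level_a, level_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t * t, (1, n - 1)


# Placeholder values only (NOT the study's data): orthogonal variance per
# participant for compatible and incompatible movements.
compatible = [1.0, 2.0, 3.0, 4.0]
incompatible = [2.0, 4.0, 3.0, 6.0]
f_ratio, dof = two_level_within_f(compatible, incompatible)
print(round(f_ratio, 3), dof)
```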

Results

The variance data are shown in Fig. 5. The only statistically reliable effect was a main effect of ‘compatibility’ (F (1,7) = 21.5, P < 0.01), which showed that the variance in the orthogonal dimension was greater when an incompatible movement was executed than when a compatible movement was executed. The absence of any interaction effects (all P > 0.17) indicates that the compatibility effect was present irrespective of the stimuli observed.

Fig. 5

Data from Experiment 3: Variance of movement in the orthogonal dimension for compatible and incompatible movements in the artificial and kinematics condition

Discussion

Experiment 3 failed to replicate the finding reported by Kilner et al. (2003) in which only tracking a biological stimulus produced a difference between congruent and incongruent tracking. These current results make sense from an SRC point of view: tracking an incongruent target should produce a decrement in performance relative to congruent tracking. The support offered by Kilner et al. (2003) for IM conjecture (2) rested upon the (surprising) fact that observing human movement produced differences between congruent and incongruent tracking whereas tracking non-human movement did not produce such effects. The data from the current experiment, however, show that non-human movement can produce differences between congruent and incongruent tracking. In other words the effect is not specific to biological stimuli. The surprising results of Kilner et al. are therefore not strong support for IM conjecture (2) which is hypothesised to be restricted to viewing biological stimuli.

In fact, there is an a priori problem with the general approach adopted by Kilner et al. (2003). The problem is that a computer programme can produce a pure, highly predictable signal with movement confined to one dimension. In contrast, human movement is characterised by spatial errors in the two orthogonal planes and by more general effects of ‘noise’. In this context, noise means unpredictability in timing and amplitude. Notably, Kilner et al. reported that there were substantial differences in the temporal characteristics of their robotic and human movement. Thus, a comparison was made between tracking two very different signals. In this light, it can be seen that differences in performance are uninformative. Our prediction was that decreasing the predictability with our ‘kinematic’ signal would produce decrements in performance relative to the pure sinusoidal waveform. Differences between computer-driven and human-driven stimuli are therefore to be expected unless their predictability and spatial variability are taken into account. Nevertheless, differences between congruent and incongruent tracking are expected regardless of signal quality, and this is what the current study found using a symbolic stimulus.

General discussion

This article has examined previously applied methods that have been used in an attempt to provide support for IM conjecture component (2): whether observing somebody performing an action automatically induces the observers to perform the same action themselves. The results from Experiments 1 and 2 showed that there were no differences in the way that participants reacted to abstract stimuli when compared to biological stimuli in an RT paradigm, with both stimuli showing compatibility effects. Experiment 3 showed that incongruous tracking induced significantly greater variance in performance than congruent tracking, even with an abstract stimulus. These results are parsimoniously explained by simple stimulus–response characteristics, specifically spatial and conceptual compatibility effects, and by the fact that incongruous tracking implies higher feedforward and feedback demands. These results suggest that adopting an RT paradigm, or in any other way implementing a ‘compatibility paradigm’, in an attempt to confirm or reject IM conjecture (2) cannot provide unambiguous evidence because spatial and conceptual compatibility effects are inherent (and thus confounding variables) in such designs.

Thus, to date there is a lack of unambiguous behavioural evidence for the claim that observing somebody executing an action automatically induces the same action to be performed in the observer. It is important, though, to separate this notion from the assumption that observed actions bias the observer towards selecting the same action. In a sense these concepts are two different perspectives of IM conjecture component (2); a ‘strong’ and a ‘weak’ view. The ‘strong’ view proposes that observing an action leaves us with no option; we automatically attempt to execute the observed action and must actively inhibit it. This version is implicit in the designs of Brass, Press, Kilner and others—the predicted effects on RT arise from the assumption that the action activated in the observer is being inhibited, which is only required if the activation is above threshold (i.e. able to actually cause a movement if left to run its course). The current study suggests, however, that these designs cannot find unambiguous evidence for the strong view—perhaps taking the weaker view would avoid the difficulties. This is consistent with recent neuro-imaging work that found no inhibitory activity when performing an incongruous action (Williams et al. 2007).

The ‘weak’ view would instead suggest that observing somebody perform a certain action induces a sub-threshold level of activity which biases the observer towards selecting the same action. Because the activation is sub-threshold, the action is primed but does not need to be inhibited. The first consequence of adopting the weaker view is therefore that IM conjecture (3) does not necessarily follow even if empirical support were found for conjecture (2). The second is that the weak version of conjecture (2) simply suggests that action activation is biased towards the action just observed, with additional activation being required before the action is actually executed. There are numerous biases affecting how we select a specific action from the large number of possible actions. The perceptual appearance of objects (e.g. Gibson 1977) can bias us to interact with our environment in certain ways, as the attributes of objects give us information about how to interact with them (for example, apples afford grasping in the first instance). Recent motor history is also a bias when we are performing actions (Cohen and Rosenbaum 2004): it is easier to use the same, or a similar, movement as used previously, because the previous movement constitutes a solution (‘inverse model’; Wolpert et al. 2001) that can be re-selected with minimal effort. It is thus not unreasonable to assume that observing others performing actions provides yet another bias that can help the human actor select the appropriate action in the appropriate situation.

Nonetheless, despite the attractive nature of this form of the conjecture, the primary difficulty raised by the current data remains—many of the current methodologies that could be used to explore the potential bias of action observation contain unavoidable stimulus–response confounds. These confounds mean that all of the key effects can be produced using non-biological stimuli, undermining the claim of the IM conjecture that these effects are specific to viewing human movement. The current results suggest that some of the experimental paradigms currently in use do not provide unambiguous behavioural evidence for, or against, IM conjecture (2). We suggest that what is needed is a new paradigm. One possible design that might be able to test the conjecture is to monitor behaviour in a large population and examine the behaviour for evidence of bias in selecting an observed (but task irrelevant) action (e.g. head scratching) whilst completing another (primary) task. Examining mimicry (see Wilson and Knoblich 2005 for a brief review) in this way might allow for conclusions to be drawn concerning the strength of IM conjecture (2) as an action selecting bias. Until such evidence is obtained, however, the key effects used thus far to support the conjecture are best explained in stimulus–response terms, and there is no need to invoke any additional psychological mechanism.