Introduction

The ideomotor theory of voluntary action is an approach with a long history (James, 1890), but it has regained considerable interest in cognitive psychology in recent decades (Stock & Stock, 2004; Shin, Proctor & Capaldi, 2010; Badets, Koch & Philipp, 2014). According to the ideomotor principle, an action is selected and initiated by anticipating the perceptual effects it is expected and intended to have on the environment. In other words, the selection of a response to a given stimulus is driven by the activation of an anticipatory image of that response’s sensory effects; this endogenous activation of intended effect representations primes and eventually launches the actions, so that the representation of action effects can be considered to act as a mental cue for response selection (see Waszak, Cardoso-Leite & Hughes, 2012, for a review).

One implication of ideomotor theory is that the perception of a learned action effect would be expected to trigger the response it was previously associated with (Hommel, 1996; Elsner & Hommel, 2001). To explain how a stimulus and an effect can independently trigger a given response, Hommel et al. (2001) presented an updated, extended version of ideomotor theory: the Theory of Event Coding (TEC). One of TEC’s main assumptions is that elements representing the stimulus context, the selected action, and the effects of that action are stored in a common and abstract code, the “event-file”. According to this view, an event-file represents parts of people’s sensorimotor experience with their environment (i.e., a sensorimotor episodic memory trace) by integrating distributed feature codes referring to the perceived stimulus, any corresponding action, as well as the effects resulting from it.

As Greenwald (1970b) suggested, an experimental way of testing the ideomotor principle is to present as stimuli the effects from previously learned response–effect (R-E) associations, and to test which responses such effects might prime (see also Hommel, 1998). If R-E associations are as bidirectional as the theory holds them to be, presenting one particular effect in a choice reaction task should prime the previously associated action (see Greenwald, 1970a). This experimental setup was used by Hommel (1996) and by Elsner and Hommel (2001), who demonstrated that presenting a previously learned action effect produced faster and more accurate responses when the selected response was the one associated with the effect. These results were interpreted as supporting the assumption that action–effects are automatically integrated into a unified, durable memory trace.

According to Kunde (2001) and Koch & Kunde (2002), to support the idea that action–effects are indeed anticipated through an endogenous activation, it would be essential to show that the effects are anticipated even if they are not presented before response selection. Kunde’s protocol was based on the assumption that the relationship between responses and their effects can be either compatible or incompatible, just as the relationship between stimuli and responses can be. If action–effects were truly anticipated before or during response selection, they should interact with the forthcoming response whenever effect and response overlap in feature dimensions and possess compatible or incompatible features, such as being located in corresponding or non-corresponding locations. In fact, Kunde (2001) found that responses were faster with spatial correspondence than with spatial non-correspondence between responses and expected effects. This supports the assumption that, during action selection, people consider not only the features of the to-be-selected action but also the features of other action-produced events, and it provides evidence that the respective feature codes are involved in action selection, just as ideomotor theory suggests.

The major aim of the present study was to integrate the rationales underlying the studies of Elsner and Hommel (2001) on the one hand and of Kunde (2001) on the other. The advantage of the former is that it provides direct evidence for the assumption that the experience of R-E contingencies leads to the establishment of durable, integrated event files, but it did not assess whether people are actually using these event files in intentional action selection. The advantage of the latter is that it does provide evidence for the actual usage, that is, the intentional endogenous activation of action–effect representations during action selection, but it does not assess whether the underlying representations are indeed integrated into durable memory traces.

To test whether this is the case, we combined Kunde’s (2001) R-E compatibility design with a subsequent test task that followed the rationale of Elsner and Hommel (2001) but applied a stimulus–response (S-R) design developed by Tucker and Ellis (2001). Kunde’s (2001) R-E compatibility design served as the acquisition phase, in which three groups of participants were exposed to different R-E conditions. All participants were to categorize two shapes (a square and a circle) by carrying out a “precision grasp” (i.e., pressing a switch between thumb and index finger) or a “power grasp” (i.e., closing the whole hand; see Tucker & Ellis, 2001). In a control group, these actions had no further consequences. In an R-E compatible group, however, each action triggered the appearance of the picture of an object whose size was compatible with the respective action (e.g., a cherry produced by a precision grasp or a cucumber produced by a power grasp). In a third group, the R-E incompatible group, each action triggered the appearance of an incompatible object (e.g., a cherry produced by a power grasp or a cucumber produced by a precision grasp). We expected that performance would be worse in the R-E incompatible than in the R-E compatible group, as in Kunde’s (2001) original study, while the control group served as a reference to see whether this compatibility effect would reflect facilitation through compatibility, interference through incompatibility, or both.

To assess whether the experience of R-E contingencies would indeed lead to the establishment of durable event files (including bidirectional associations between the representations of actions and the representations of their effects), we had participants perform a test phase that immediately followed the acquisition phase. According to Kunde (2001) and ideomotor theorizing in general, one would expect that being exposed to contingencies between actions and effects would lead to the integration of representations of both, irrespective of the compatibility between the two. That implies that event files comprising bidirectional action-effect associations should be created in both the R-E compatible group and the R-E incompatible group. To assess whether this would indeed be the case, we presented participants with an S-R compatibility task. The pictures that served as action effects in the acquisition phase now served as stimuli, just as in the Elsner and Hommel (2001) study. Participants were to categorize these pictures (artificial vs. natural) by carrying out a power grasp to all objects from one category and a precision grasp in response to all objects from the other. Given that small and large objects were equally distributed over the two categories, this rendered some S-R relationships compatible (e.g., carrying out a power grasp in response to a hairbrush or a precision grasp in response to a cherry) and others incompatible (e.g., carrying out a power grasp in response to a key or a precision grasp in response to a cucumber).

Considering numerous studies documenting the advantage of object-grasp compatible S-R links over incompatible ones (e.g., Tucker & Ellis, 2001, for a similar protocol), we expected better performance for trials in which the grasp used as a response to categorize the object’s picture was compatible with the object’s size than for trials where this relationship was incompatible. More importantly, however, we predicted that the size of this compatibility effect should be moderated by the R-E relationship in the previous acquisition phase. Consider the situation in which this relationship was compatible. If, according to ideomotor theorizing, the repeated experience of the response-effect relationship has established a corresponding event file that includes a bidirectional association between response and effect representation, the representation of the power grasp would have become associated with the representation of, say, a cucumber. If the participant then responded to a cucumber by carrying out a power grasp, performance should benefit from the established association. This would not be the case if the response-effect relationship was incompatible (e.g., if the cucumber followed a precision grasp in the acquisition phase): now the picture of the cucumber would activate the incorrect response and slow down reaction time or lead to an error. Based on this reasoning, we predicted that the S-R compatibility effect in the test phase would be more pronounced after R-E compatible training than after R-E incompatible training, with the control condition (i.e., no effect condition) falling somewhere in between. Such an outcome would indicate that the acquisition phase has indeed established durable integrations of responses and effects, that is, event files that are stable enough to outlive the task in which they were created.

Methods

Participants

A total of 126 right-handed participants were recruited for this experiment (105 women, mean age = 20.9 years, SD = 3.9, age range: 18–47). All of them were students at the Paul Valery University (Montpellier, France), had normal motor function in their right hand, and normal or corrected-to-normal vision.

Materials

The apparatus and materials used in this experiment were similar to those used by Tucker and Ellis (2001) and by Derbishyre et al. (2006). The experiment was run with E-Prime 2 software (Schneider, Eschman, & Zuccolotto, 2002). The display and timing were controlled by a Fujitsu microcomputer (ESPRIMO Mobile V6535; Fujitsu Technology Solutions) connected to a video projector (Epson EB-U04) for vertical projection. The visual stimuli were projected onto a white and opaque table. Participants sat at the end of the table with their right hand resting in front of their body midline, under the table. The stimulus set was composed of 40 colored pictures, half of them representing natural objects and the other half manufactured ones. The objects were presented at a 1:1 scale and were oriented for a right-hand grasp. Within each category, half the objects were optimally compatible with a precision grasp (i.e., between the thumb and the index finger) and the other half with a power grasp (i.e., whole hand). Two geometric shapes were also used as stimuli, a 7-cm square and a circle with a radius of 3.66 cm, both presented in grey with black edges. Participant responses were recorded on a specially designed hand-held device, which they held in their right hand. The response device was similar to the one used by Tucker and Ellis (2001; see also Derbishyre et al., 2006): it had two switches, one made for a precision grasp and the other for a power grasp.

Procedure

After filling in a written consent form, each participant performed the experiment individually during a session that lasted approximately 20 min in total. The experiment consisted of two phases.

During the first phase, referred to as the acquisition phase, participants were exposed to different kinds of S-R-E associations, depending on their R-E compatibility condition (3 groups; see Fig. 1). Each acquisition trial started with a 1000-ms presentation of a black fixation cross on a white background. Then, either a square or a circle was presented as a target stimulus and the participants had to categorize it using the response device. In each group, half of the participants were instructed to respond with a precision grasp when seeing a square and with a power grasp when seeing a circle, while the other half were given the opposite instructions. There was no time pressure for participants to give their answer. For the first two groups of participants, each response triggered an effect (i.e., a picture of an object presented for 1000 ms). Since the effect was irrelevant to the categorization task, participants were only told to look at it. For the first group (R-E compatible condition, n = 52), a power grasp response triggered the presentation of a large object’s picture (e.g., a cucumber), while a precision grasp response triggered a small object’s picture (e.g., a cherry). In this case, the grasp used as a response was always compatible with the effect. For the second group (R-E incompatible condition, n = 52), the opposite mapping was used, so that the grasp used would always be incompatible with the effect. The last group (no effect condition, n = 22) had the exact same instructions except that a 1000-ms blank screen replaced the object’s picture. The participants in each group worked through 36 trials, which were divided into two blocks (i.e., block 1 = first 18 trials; block 2 = last 18 trials).
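The response-effect contingencies of the three groups can be summarized in a few lines of code. The following is only an illustrative sketch, not the E-Prime script used in the experiment; the group labels, the function name, and the size labels are hypothetical, and only the mapping logic is taken from the description above.

```python
from typing import Optional

# Size class that is compatible with each grasp (from the Materials section).
GRASP_SIZE = {"precision": "small", "power": "large"}


def effect_for_response(group: str, grasp: str) -> Optional[str]:
    """Return the size class of the picture shown after a grasp.

    Returns None when the response is followed by a blank screen.
    Group labels are hypothetical names for the three conditions.
    """
    compatible_size = GRASP_SIZE[grasp]
    if group == "compatible":
        return compatible_size
    if group == "incompatible":
        # Opposite mapping: a small object after a power grasp, and vice versa.
        return "large" if compatible_size == "small" else "small"
    if group == "no_effect":
        return None
    raise ValueError(f"unknown group: {group}")
```

Under this sketch, `effect_for_response("incompatible", "power")` yields `"small"` (e.g., a cherry following a power grasp), which corresponds to the second group described above.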

Fig. 1

Schematic illustration of the display and the timing of events in the acquisition and the test phase. One group of participants was assigned to each R-E compatibility condition (R-E compatible, R-E incompatible, or no effect). All participants went through the same test phase. ITI = intertrial interval

After completing the acquisition phase trials, participants rested for 1 min and were then given 2 min of instructions for the second phase (i.e., the test phase; see Fig. 1). This phase was similar to Experiment 1 of Tucker and Ellis (2001). Each trial started with the presentation of a 1000-ms black fixation cross on a white background. Then, one of the effect pictures from the acquisition phase was presented as a target stimulus, and participants were required to respond as fast as possible according to a fixed S-R mapping: half of the participants were told to respond by using a precision grasp when seeing a natural object and by using a power grasp when seeing an artificial object. The other half were given the opposite instructions. This way, regardless of the object’s category, half of the responses were compatible with the object’s optimal grasp while the other half were incompatible. The 36 experimental trials were preceded by four practice trials. Each trial ended with a 1000-ms intertrial interval.
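The claim that each category-to-grasp mapping yields equal numbers of compatible and incompatible trials follows from the balanced stimulus set, and can be checked with a small sketch. The mapping labels "A" and "B" and the function name are hypothetical, and the toy set contains only one object per category × size cell rather than the actual pictures.

```python
from itertools import product

# Optimal grasp for each size class (from the Materials section).
OPTIMAL_GRASP = {"small": "precision", "large": "power"}

# The two counterbalanced instructions; the labels "A" and "B" are hypothetical.
MAPPINGS = {
    "A": {"natural": "precision", "artificial": "power"},
    "B": {"natural": "power", "artificial": "precision"},
}


def compatible_share(mapping, objects):
    """Fraction of objects whose instructed grasp matches their optimal grasp."""
    hits = sum(mapping[category] == OPTIMAL_GRASP[size] for category, size in objects)
    return hits / len(objects)


# Balanced toy set: one object per category x size cell.
objects = list(product(["natural", "artificial"], ["small", "large"]))
shares = {name: compatible_share(m, objects) for name, m in MAPPINGS.items()}
# shares == {"A": 0.5, "B": 0.5}
```

Because size is crossed with category, the share is one half under either instruction, so S-R compatibility is not confounded with the categorization response.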

Results

Acquisition phase

The mean correct response latencies and the mean error rates were calculated across participants and for each experimental condition. Latencies more than two standard deviations below or above the mean were removed (this cutoff led to the exclusion of less than 5% of the data). The three groups of participants representing the three types of R-E mapping were considered as a single factor, called R-E compatibility. We performed an analysis of variance (ANOVA) on error rates and latencies, with subjects as a random variable, R-E compatibility (R-E compatible, R-E incompatible, and no effect conditions) as a between-subjects factor, and block (block 1 and block 2) as a within-subjects factor.
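A two-standard-deviation cutoff of this kind can be sketched as follows. This is a minimal illustration rather than the analysis script itself: the function name is hypothetical, the example latencies are made up, and the sketch assumes that the mean and standard deviation are computed over whatever set of latencies is passed in, since the text does not state whether trimming was done per participant, per condition, or overall.

```python
from statistics import mean, stdev


def trim_latencies(latencies, n_sd=2.0):
    """Keep latencies within n_sd sample standard deviations of the mean.

    Assumption: mean and SD are taken over the list that is passed in.
    """
    m = mean(latencies)
    sd = stdev(latencies)
    low, high = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in latencies if low <= rt <= high]


# Made-up latencies (ms) with one very slow response.
example = [500, 520, 480, 510, 490, 505, 495, 515, 485, 1500]
trimmed = trim_latencies(example)
```

In this made-up example only the 1500-ms response falls outside the cutoff, so `trimmed` retains the other nine values.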

Regarding the error rates, the main effect of block was the only significant effect, F(1,123) = 15.1, p < .0005, ηp² = .11. More errors were generally made during the first half of the acquisition phase than during the second half (see Table 1). For the latencies, we expected a significant R-E compatibility × Block interaction, which is what we found, F(2,123) = 3.73, p < .05, ηp² = .06. In both the R-E compatible and the no effect conditions, the responses were significantly faster during the second block than during the first one, respectively, t(51) = 4.47, p < .0001, d = .62 and t(21) = 4.84, p < .0001, d = 1.03 (i.e., Cohen’s d = mean difference/standard deviation) (see Fig. 2). It is noteworthy that, in contrast, in the R-E incompatible condition, no significant difference was found between the two blocks, t < 1.2. Also, while during the first block only the difference between the R-E incompatible and the no effect conditions was significant, t(72) = 2.39, p < .02, d = .61, all the contrasts reached significance in the second block: between the R-E compatible and the R-E incompatible conditions, t(102) = 2.29, p < .03, d = .62; between the R-E compatible and the no effect conditions, t(72) = 2.17, p < .04, d = .55; between the R-E incompatible and the no effect conditions, t(72) = 3.6, p < .001, d = .91. Additionally, both the R-E compatibility and the block main effects were significant, respectively, F(2,123) = 5.19, p < .01, ηp² = .08, and F(1,123) = 26.62, p < .0001, ηp² = .18. The responses were in fact significantly faster when no effect was presented during the acquisition phase than in both the R-E compatible and the R-E incompatible conditions, respectively, t(72) = 2.04, p < .05, d = .52, and t(72) = 3.04, p < .005, d = .77. The difference between the R-E compatible and the R-E incompatible conditions did not reach significance (t < 1.7). Moreover, the RTs were generally slower during the first half of the acquisition phase (block 1) than during the second half (block 2), t(125) = 5.05, p < .0001, d = .45.
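The degrees of freedom reported above follow directly from the group sizes (52, 52, and 22), and the within-group effect sizes can be related to the t values. The sketch below makes this arithmetic explicit. The function names are hypothetical, and the reading of "standard deviation" as the standard deviation of the block difference scores is an assumption; under that reading, d equals t divided by the square root of n for a paired comparison.

```python
from math import sqrt

# Group sizes as reported in the Procedure section.
GROUP_SIZES = {"compatible": 52, "incompatible": 52, "no_effect": 22}


def df_error_between(group_sizes):
    """Error df for the between-subjects factor: N minus the number of groups."""
    return sum(group_sizes.values()) - len(group_sizes)


def df_paired(n):
    """df of a paired t-test on n participants."""
    return n - 1


def df_independent(n1, n2):
    """df of an independent-samples t-test."""
    return n1 + n2 - 2


def d_from_paired_t(t, n):
    """Cohen's d for paired scores, assuming d = mean difference / SD of differences."""
    return t / sqrt(n)


checks = {
    "anova_error_df": df_error_between(GROUP_SIZES),            # 123
    "paired_compatible": df_paired(52),                          # 51
    "paired_no_effect": df_paired(22),                           # 21
    "incompatible_vs_no_effect": df_independent(52, 22),         # 72
    "compatible_vs_incompatible": df_independent(52, 52),        # 102
    "d_block_compatible": round(d_from_paired_t(4.47, 52), 2),   # 0.62
    "d_block_no_effect": round(d_from_paired_t(4.84, 22), 2),    # 1.03
}
```

With two blocks, the within-subjects error term has the same 123 degrees of freedom as the between-subjects one, which matches the F(1,123) and F(2,123) values reported above.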

Table 1
Fig. 2

Acquisition phase: mean reaction times (RT) of the acquisition phase for the two blocks in each of the three R-E compatibility conditions (R-E compatible, R-E incompatible and no effect). Test phase: mean reaction times of the test phase for the two S-R compatibility conditions as a function of the R-E compatibility during the previous acquisition phase. *p < .05

Test phase

Mean correct response latencies and mean error rates were calculated across participants for each experimental condition. Latencies more than two standard deviations below or above the mean were removed (<5% of the data). Type of response (i.e., grasp type: power and precision grasps) and object size (i.e., small and large objects) were integrated into a single factor (S-R compatibility) with two levels: compatible (power grasps to large objects and precision grasps to small objects) and incompatible (power grasps to small objects and precision grasps to large objects). We ran ANOVAs on error rates and latencies, with subjects as a random variable, S-R compatibility as a within-subjects factor, and R-E compatibility as a between-subjects factor.
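The recoding of grasp type and object size into a single S-R compatibility factor, followed by averaging correct latencies per participant, might look as follows. This is a sketch under stated assumptions: the trial record fields (`participant`, `grasp`, `size`, `rt`, `correct`) and the function name are hypothetical, and the four example trials are made up.

```python
from collections import defaultdict
from statistics import mean

# Optimal grasp for each size class (from the Materials section).
OPTIMAL_GRASP = {"small": "precision", "large": "power"}


def cell_means(trials):
    """Mean correct RT per (participant, S-R compatibility level)."""
    cells = defaultdict(list)
    for trial in trials:
        if not trial["correct"]:
            continue  # only correct responses enter the latency analysis
        matches = trial["grasp"] == OPTIMAL_GRASP[trial["size"]]
        level = "compatible" if matches else "incompatible"
        cells[(trial["participant"], level)].append(trial["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# Made-up trials for a single participant.
trials = [
    {"participant": 1, "grasp": "power", "size": "large", "rt": 500, "correct": True},
    {"participant": 1, "grasp": "precision", "size": "small", "rt": 520, "correct": True},
    {"participant": 1, "grasp": "power", "size": "small", "rt": 600, "correct": True},
    {"participant": 1, "grasp": "precision", "size": "large", "rt": 640, "correct": False},
]
means = cell_means(trials)
```

Here the error trial is dropped, so the incompatible cell for this participant contains only the 600-ms response.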

Regarding the error rates, neither the main effect of block nor the Block × R-E compatibility interaction reached significance, presumably because participants performed the task rather accurately (overall error rate was 1.7%). Regarding the latencies, we found, as expected, a significant R-E compatibility × S-R compatibility interaction, F(2,123) = 7.1, p < .005, ηp² = .10. Participants were faster to categorize the object pictures when the grasp used to respond was compatible with the object size, but only when previously exposed to compatible R-E or no effect during the acquisition phase, respectively, t(51) = 6.09, p < .0001, d = .84 and t(21) = 3.11, p < .01, d = .66. However, this pattern of results completely disappeared for the group previously exposed to incompatible R-E, t(51) = 1.3, p = .2. Also, the main effect of S-R compatibility was significant, F(1,123) = 38.4, p < .0005, ηp² = .24, showing an overall advantage of S-R compatibility over incompatibility. Finally, we ran an ANOVA on RT compatibility effect sizes (S-R incompatible minus S-R compatible) as a function of the R-E condition in the acquisition phase (see Fig. 3). The R-E compatibility effect was significant, F(2,123) = 7.4, p < .001, ηp² = .11, and follow-up contrasts showed that this was due to the significant difference between the R-E compatible and the R-E incompatible conditions, t(102) = 3.77, p < .005, d = .74; neither the difference between R-E incompatible and no effect conditions, t(72) = 1.78, p < .08, d = .45, nor that between R-E compatible and no effect conditions, t < 1.3, reached significance.
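The compatibility effect size entered into the final ANOVA is a simple difference score per participant. A minimal sketch is given below; the function names and the tuple layout are hypothetical, and the three example participants use made-up latencies rather than data from the study.

```python
from statistics import mean


def compatibility_effect(rt_incompatible, rt_compatible):
    """S-R compatibility effect: incompatible minus compatible mean RT (ms)."""
    return rt_incompatible - rt_compatible


def group_effects(participants):
    """Average the per-participant effect within each acquisition group.

    participants: iterable of (group, mean_rt_incompatible, mean_rt_compatible).
    """
    by_group = {}
    for group, rt_inc, rt_comp in participants:
        by_group.setdefault(group, []).append(compatibility_effect(rt_inc, rt_comp))
    return {group: mean(effects) for group, effects in by_group.items()}


# Made-up participant means (ms).
example = [
    ("compatible", 650, 600),
    ("compatible", 640, 610),
    ("incompatible", 620, 615),
]
effects = group_effects(example)
```

A positive score indicates faster responses on S-R compatible trials; in this made-up example the scores average 40 ms in the compatible group and 5 ms in the incompatible group.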

Fig. 3

Average of the mean compatibility effect size in the test phase (S-R incompatible minus S-R compatible) across all participants for the three previous acquisition conditions (R-E compatible, R-E incompatible and no effect). * p < .05

Discussion

The major aim of the present study was to combine the experimental design used by Kunde (2001), which has the advantage of directly tapping into the spontaneous use of action–effect representations in action selection, and the design employed by Elsner and Hommel (2001), which has the advantage of demonstrating some degree of durability of the resulting R-E representations. Our hybrid design successfully replicated the basic finding reported by Kunde, by showing that performance was worse in the group that had incompatible R-E relationships. This suggests that people do anticipate novel, arbitrary effects that they experienced their actions to produce when selecting these actions. The features these anticipations referred to were systematically different from the features of the actual movements in the incompatible group, which impaired their performance. Note that the corresponding features were related to location in the study of Kunde (2001), but related to object size in the present study, which means that our successful replication demonstrates the generality of the Kunde effect.

Also of interest, we found that the control group without any action effects produced the best performance, that is, better performance than the compatible group. This suggests that processing and/or acquiring novel action effects is effortful and slows down performance in general, in addition to possible compatibility effects. This fits with considerations of Band, van Steenbergen, Ridderinkhof, Falkenstein, and Hommel (2009), who suggested that people entertain active anticipations of action-produced effects that they match against the effects that are actually produced—a scenario that is also consistent with comparator models of action control (Frith, Blakemore & Wolpert, 2000; cf., Verschoor & Hommel, 2017). While the details of this matching process are not yet well understood, it makes sense to assume that preparing and carrying it out takes time and cognitive resources. If these processes were not necessary or less demanding in the control group, this would explain the better performance.

In the test phase, compatible S-R relationships produced better performance than incompatible relations, which replicates a number of previous findings (e.g., Tucker & Ellis, 2001). More importantly for our purposes, however, the size of this compatibility effect varied with the R-E relationship in the acquisition phase. In particular, the effect was significant only if this relationship was compatible or neutral, but not if it was incompatible. This finding is consistent with the idea that the anticipatory images of the previously learned action–effects were integrated into memory traces that must have two characteristics. Firstly, they must have been stable enough to outlive the task carried out in the acquisition phase, suggesting that they are relatively enduring. This fits with the observations of Elsner and Hommel (2001), who also obtained evidence for response-effect associations that were stable enough to transfer to another, unrelated task. Secondly, these memory traces must include the bidirectional association of actions and effects. In the acquisition phase, effects always followed the actions, while in the test phase the previously acquired effects were presented before actions were carried out. If the associations acquired in the acquisition phase were unidirectional, one would not have expected effects to activate actions, so that the previous R-E relationship should not have affected the size of the S-R compatibility effect. Because the same set of object pictures was used during the acquisition and test phases, one question that remains open is whether the associations between actions and effects were specific, i.e., between the grasp performed and a specific object, or more generic, i.e., between a type of grasp and whole classes of objects (those affording either a power or a precision grasp). This question cannot be addressed here and would need to be specifically tested. However, the number of different objects used (n = 36), and the fact that each was seen only once during the acquisition phase as a task-irrelevant action–effect, make it unlikely that so many specific associations could have been acquired in such a limited time (see also Hommel et al., 2001).

In conclusion, our findings support the ideomotor assumption that action–effects play a critical role in action selection, presumably by providing mental retrieval cues for action representations (Hommel, 2009). Given that the size and presence of the object-grasp compatibility effect depend on previous action-effect learning, our findings also suggest that previously reported S-R compatibility effects of that sort actually originate from ideomotor learning. In a wider sense, these observations may be seen as pointing to the importance of acquiring sensorimotor contingencies for the active guidance and adaptive control of ongoing movements (O’Regan & Noé, 2001; Buhrmann et al., 2013).