Temporal binding past the Libet clock: testing design factors for an auditory timer

Muth, Felicitas V.; Wirth, Robert; Kunde, Wilfried

doi:10.3758/s13428-020-01474-5

Open access
Published: 15 October 2020

Volume 53, pages 1322–1341, (2021)

Behavior Research Methods


Abstract

Voluntary actions and causally linked sensory stimuli are perceived to be shifted towards each other in time. This so-called temporal binding is commonly assessed in paradigms using the Libet Clock. In such experiments, participants have to estimate the timing of actions performed or ensuing sensory stimuli (usually tones) by means of a rotating clock hand presented on a screen. The aforementioned task setup is however ill-suited for many conceivable setups, especially when they involve visual effects. To address this shortcoming, the line of research presented here establishes an alternative measure for temporal binding by using a sequence of timed sounds. This method uses an auditory timer, a sequence of letters presented during task execution, which serve as anchors for temporal judgments. In four experiments, we manipulated four design factors of this auditory timer, namely interval length, interval filling, sequence predictability, and sequence length, to determine the most effective and economic method for measuring temporal binding with an auditory timer.

Introduction

Opening an app on an outdated smartphone typically comes with a slight and sometimes barely noticeable time interval between tapping the screen and opening of the app. However, the perceived time interval between tap and the presentation of the app’s content is shortened. More precisely, when the tap opens the app, the tap is judged to occur later, and the app is judged to flash earlier, as compared to situations where there is only a tap or only a flashing of an app. This so-called temporal binding phenomenon (also referred to as intentional binding) is widely employed in research on voluntary actions and their subsequent effects (Haggard, Clark, & Kalogeras, 2002; Moore & Obhi, 2012). It describes the finding that an action and a causally linked sensory event are perceptually shifted towards each other in time, as compared to either of the events happening in isolation. That is, if you tapped on the icon on your smartphone but the app did not open, you would have a more accurate temporal estimate of your action than if the app actually opened (though prediction of what will happen induces a small shift in perceived action time as well, Moore & Haggard, 2008). Likewise, if you watched your screen and an app opened without your involvement, you would have a more accurate estimate of the time the app opened than if you actively pressed an icon to open the app.

Due to the lack of explicit awareness of such perceptual shifts, temporal binding serves as an implicit measure of the sense of agency, i.e., the conception of the self as being responsible for our actions, and through these, changes in the environment (Haggard & Tsakiris, 2009; Moore, 2016; Tsakiris & Haggard, 2003). This sense of agency is informed by predictive and retrospective processes that reflect people’s feelings of agency and people’s judgments of agency, respectively (Sidarus, Vuorre, & Haggard, 2017; Synofzik, Vosgerau, & Voss, 2013). Temporal binding, which is sensitive to intentions but does not require explicit reflections regarding agency, is supposed to reflect predictive processes based on the agent’s internal sensorimotor models (Synofzik, Vosgerau, & Newen, 2008). In contrast, Hughes, Desantis, and Waszak (2013) argue that temporal binding is driven by temporal expectancy rather than by intentional causation.

Beyond the fact that temporal binding is sensitive to intentions and is thus often referred to as intentional binding (e.g., Haggard & Tsakiris, 2009; Moore & Obhi, 2012), it has been shown that temporal binding is also informed by causality, which is why intentions are not a prerequisite for it to arise (Buehner, 2012; Suzuki, Lush, Seth, & Roseboom, 2019). It is a widely employed measure for time estimations in both healthy participants and clinical populations such as patients with schizophrenia or Parkinson’s disease (Buehner & Humphreys, 2009; Haggard, Martin, Taylor-Clarke, Jeannerod, & Franck, 2003; Kirsch, Kunde, & Herbort, 2019; Moore et al., 2010). Despite the common use of temporal binding as a measure, as of yet there are not many ways of studying it. Temporal binding is commonly assessed with two paradigms: interval estimation and the Libet Clock (Engbert, Wohlschläger, Thomas, & Haggard, 2007; Tanaka, Matsumoto, Hayashi, Takagi, & Kawabata, 2019). They are both based on the phenomenon that the perceived interval between voluntary self-generated actions and causally linked sensory events is shortened. However, the major difference is that in studies employing the interval estimation method, participants have to estimate the length of the interval between action and effect, while with the Libet clock, both the timing of the action and the timing of the effect have to be estimated independently.

In studies using the Libet Clock, participants estimate the timing of their actions and subsequent events by means of a clock face presented on a screen. This clock is designed such that a full rotation of the clock hand takes about 2560 ms rather than 60 seconds. During the experiments, participants view the rotating clock hand while performing voluntary button presses and experiencing their effects (usually sounds). Subsequently, they report the position of the clock hand at specific occurrences. These occurrences are either the participants’ actions or the ensuing effects (for more detail see Fig. 2) (e.g., Libet, Gleason, Wright, & Pearl, 1983; Ruess, Thomaschke, & Kiesel, 2017b). Results show that voluntary actions are systematically perceived as having happened later, shifted towards the effect, when occurring in combination with a sensory event compared to when occurring in isolation (action binding). The same holds for time estimations of effects following voluntary actions: subsequent to self-generated actions, effects are judged to have occurred earlier, shifted towards the action, as compared to effects that happened in isolation (effect binding). Consequently, the interval estimation method can only make inferences about the overall binding, while the Libet Clock method is capable of disentangling action binding and effect binding.

However, the use of the Libet Clock has several limitations as well. Pockett and Miller (2007) focused on different factors which might influence results obtained with this method. The authors emphasize that instructions of whether to report the onset or end of the own movement influence participants’ estimations. They also suggest that the luminance of the clock hand and its size might have an influence on the effects found. Additionally, tasks employing the Libet Clock are visually demanding, as participants have to follow the clock hand with their eyes to make accurate temporal judgments. Thus, the setup is ill-suited for many conceivable settings, especially when they involve tasks with visual effects.

To reduce the task’s inherent visual load and to introduce more flexibility in the experimental task, Cornelio Martinez, Maggioni, Hornbæk, Obrist, and Subramanian (2018) proposed an “auditory Libet Clock.” This method uses spoken letters, which are presented over headphones, rather than the visual clock hand to determine the perceived timing of the actions or events. To the best of our knowledge, at this point, this is still the first study using an auditory timer to measure temporal binding, and the obtained results remain to be replicated and extended. Thus, a thorough and reliable approach to systematically studying temporal binding by means of an auditory timer is needed. The seemingly trivial setup of timed auditory cues has various obvious and less obvious design factors that might affect experimental results and the overall aptness of the method. In this line of research, we varied four design factors that we consider most important and substantial for the design of an auditory measure for temporal binding. Therefore, we systematically manipulated the factors interval length, interval filling, sequence predictability, and sequence length of an auditory timer to study temporal binding in a task with visual effects.

First, interval length, that is, the duration of each presented letter, is of utmost importance, as it determines the temporal resolution of the timed auditory stimuli. The shorter the interval, the higher the resolution; however, this resolution gain can come at the cost of discernibility of the individual letters. Hence, we ask: What is the optimal interval length?

Second, interval filling also plays an important role in the configuration of an auditory timer, as it contributes to its temporal resolution. Additionally, it provides anchors for temporal estimations. Previously and subsequently used letters can be used as temporal cues and therefore serve as anchors for participants’ estimations. The salience of these anchors varies with the filling of the interval. Finally, filling time intervals with auditory stimulation can potentially increase the accuracy of duration estimation (Rammsayer & Lima, 1991). Thus, we seek to answer the question: How should intervals be filled?

Third, the predictability of the letter sequence appears to be an important factor, as it might influence participants’ estimation strategies. With decreasing sequence predictability, participants might focus more on auditory anchors while relying less on strategies (e.g., always acting on the same auditory cue). Thus, we ask: Should the sequence of auditory cues be predictable?

Fourth, the number of letters that constitute the auditory scale most likely has an influence on participants’ task load. With increasing length of the letter sequence, the sequence should become more difficult to remember and therefore draw on more cognitive resources. Therefore, we aim to answer the question: What is the optimal number of auditory cues?

The presented experiments introduce a thorough, theory-driven approach to establishing an auditory timer for measuring temporal binding. Within this context, the four aforementioned factors are systematically manipulated in successive experiments to find the most suitable timing configuration. All experiments were preregistered on the Open Science Framework (OSF) and were approved by the ethics committee of the psychology department of the Julius-Maximilians-University of Würzburg (GZ 2019-09). All raw data and analysis scripts are available at the project repository (https://osf.io/d3vz5/).

Experiment 1: Manipulation of interval length

Experiment 1 tested for the ideal presentation length of the letters that constitute the auditory timer for measuring temporal binding. This is what we will refer to as interval length. Letters were either 250 ms, 500 ms, or 750 ms long (for more detail see Apparatus and stimuli). Based on the study by Cornelio Martinez et al. (2018), we expected to find temporal binding in the 250 ms condition. Additionally, we were interested in how variations in the interval length influence temporal binding as an objective measure. As a manipulation check, both action binding and effect binding should be similar to both types of binding found in previous studies using the Libet Clock. Additionally, we collected participants’ perceived task load in order to determine whether there were differences in the subjective quality of the auditory timer depending on the interval lengths.

Methods

Participants

Forty-eight participants (11 male, 8 left-handed, mean age = 24.1 years, SD = 6.3) recruited over the university’s participant pool (SONA) took part in the experiment. Prior to data collection, a power analysis for paired-sample t-tests was performed using G*Power 3.1 (Faul, Erdfelder, Buchner, & Lang, 2009). Because previous studies have found medium effect sizes for action binding (e.g., Ruess, Thomaschke, & Kiesel, 2017b), we conducted the power analysis with d = 0.40, α = .05. With these parameters, a sample size of 41 would have sufficed to ensure high power (.80). However, in order to counterbalance the conditions, we set the sample size to 48. Prior to the experiment, participants signed an informed consent form and they received either monetary compensation or partial course credit for their voluntary participation. All participants were naïve to the purpose of the study and were debriefed afterwards.

Apparatus and stimuli

Visual effect task

The visual effect task was a single-choice task with a visual effect, i.e., the movement of a cursor. It was completed on an iPad 2, which participants operated with the index finger of their right hand. The iPad’s LED screen, with a 9.7-inch diagonal and a resolution of 1024 × 768 px, was used in landscape mode. Compared to standard keyboards, a touch device gives the user less ambiguous feedback as to when the finger touched the surface. With a standard keyboard, there are at least two events that might shape the experienced point in time of a keypress, namely when the finger hit the key and when the key was completely pressed. Additionally, a touchscreen avoids the pitfalls of other sensory input, such as the clicking sounds that usually accompany keypresses on computer keyboards. Thus, touchscreen devices seem to be suitable for studying temporal binding (see Footnote 1). During the experiment, a 3 × 3 grid of circles with diameters of 100 px was presented on the left half of the screen (see Fig. 1). Next to the grid on the right was a keypad with eight spatially arranged arrow keys, each of which measured 100 × 100 px. At trial onset, the center circle (start area) was filled in blue (to illustrate a movable cursor) and displayed the German word for start (“Start”). Simultaneously, one of the other eight circles in the grid displayed the German word for goal (“Ziel”) and was connected to the start area with a straight orange line. The goal location indicated which keypress participants had to perform.

Auditory timer task

During trials, participants repeatedly heard five timed letters over headphones at a preset volume. This letter sequence, consisting of the German letters A, F, I, O, and T, served as auditory timer to reference the perceived timing of actions and effects. In the first experiment, we decided to use a sequence of five letters to ensure that participants would be able to store the entire sequence in their working memory while executing the visual task. Moreover, the selected number of auditory stimuli provided a good temporal resolution when transferred to the visual scale on the iPads, where one pixel represented 2.5 ms (for a systematic manipulation of the number of letters, see Experiment 4: Manipulation of sequence length). The timed auditory letter sequence was designed so that the offset of one letter constituted the onset of the next, so there was no pause in between. In Experiment 1, we varied the length of each letter on three levels^{Footnote 2} (250 ms, 500 ms, 750 ms) between blocks. This resulted in continuous streams of letters that varied only in the broadness of the pronunciation. A representative example of the auditory stream is accessible at the project’s OSF page (https://osf.io/2746f/).
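The structure of the gapless letter stream can be illustrated with a short sketch. It assumes that time is counted from the onset of the first letter of a trial and that the randomly chosen first letter is given as an index into the sequence; the function name `letter_at` and its parameters are illustrative and not taken from the original experiment code.

```python
LETTERS = ["A", "F", "I", "O", "T"]  # letter sequence described in the text


def letter_at(t_ms, interval_ms, start_index=0):
    """Return (letter, ms elapsed since that letter's onset) at time t_ms.

    Assumption: the stream is gapless, so the offset of one letter is the
    onset of the next, and the sequence repeats cyclically.
    """
    if t_ms < 0 or interval_ms <= 0:
        raise ValueError("t_ms must be >= 0 and interval_ms must be > 0")
    step = int(t_ms // interval_ms)              # completed letters so far
    letter = LETTERS[(start_index + step) % len(LETTERS)]
    return letter, t_ms - step * interval_ms


for interval_ms in (250, 500, 750):
    cycle_ms = len(LETTERS) * interval_ms        # duration of one full cycle
    print(interval_ms, cycle_ms, letter_at(1100, interval_ms))
```

With 500 ms letters and the stream starting on "A", a keypress 1100 ms into the stream falls 100 ms after the onset of "I".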

Procedure

Participants encountered four different estimation conditions throughout the experiment (see Fig. 2): (1) Action experimental: Cursor movements followed participants’ keypresses and the perceived timing of the keypress was assessed. (2) Action baseline: Participants’ keypresses were not followed by a cursor movement and the perceived timing of the keypress was assessed. (3) Effect experimental: Cursor movements followed participants’ keypresses and the perceived timing of the cursor movement was assessed. (4) Effect baseline: After a random interval of 2500–5000 ms, a cursor movement occurred without participants’ keypresses and the perceived timing of this cursor movement was assessed. These conditions were used to calculate temporal binding (see Results for more detail). As temporal binding is calculated as the difference between participants’ estimation errors in the experimental compared to the baseline condition, absolute estimation errors will not be reported here, but can be retrieved from the OSF repository (https://osf.io/d3vz5/).

At trial onset, participants saw the grid on the left side of the screen and the keypad on the right side while hearing the letter sequence. The first letter of the letter sequence was selected at random. The circle in the middle of the grid was colored in blue and displayed the German word for start. Simultaneously, one of the other eight circles showed the German word for goal. These two circles were connected with a straight orange line, informing participants which key to press. Participants were asked to press the corresponding arrow key to move the cursor from the start area to the goal area. Additionally, participants received the instruction to wait at least three letters until they performed the keypress. They were also discouraged from pre-planning the time of their keypress and received the explicit information that this was not a speed task, but rather that they could perform the keypresses at their leisure.

In the experimental conditions, their keypress was followed by the respective cursor movement after a random delay of 150, 250, or 350 ms. These delays were chosen in accordance with previous studies (e.g., Haggard et al., 2002; Ruess, Thomaschke, & Kiesel, 2017b; Weller, Schwarz, Kunde, & Pfister, 2020). We used varying delays so participants could not compute the timing of their action by simply subtracting a fixed interval from the perceived timing of the effect and vice versa. This way, they had to concentrate more intently on the event in question. In the action baseline condition, participants only performed a keypress which did not cause the cursor to move. In the effect baseline condition, participants were asked not to press a key. In this condition, the cursor moved after a random delay of 2500–5000 ms after trial onset.

After the last event in each condition (i.e., cursor movement in the experimental conditions and effect baseline condition; keypress in the action baseline condition), the spoken letters presented over the headphones continued for another 1000, 1500, or 2000 ms. Subsequently, participants were asked to report the perceived timing of either their action or the cursor movement by locating it on a visual scale displaying the letter sequence (A-F-I-O-T-A), with the first and last letter being the same to ensure that the entire range of possible estimations was covered. The scale was presented in the center of the screen with a width of 1000 px and a height of 100 px. It had six anchors, one for each displayed letter, each with three subdivisions (see Fig. 2). Participants could press any point on the scale to make their temporal judgment. This position was then translated into a continuous dependent variable reflecting participants’ temporal estimation (1 px = 2.5 ms) for further analyses. Following correct responses, the next trial started after an inter-trial interval of 2000 ms with the presentation of the grid, the start area, a new goal area, and the keypad. In cases where participants’ keypresses did not correspond to the predefined path, the cursor followed participants’ keypresses rather than the orange line. After such commission errors, participants received an error message in the form of the German word for error (“Fehler”) in red font in the center of the circle grid. If participants pressed a key in the effect baseline condition, they were informed in the same way not to press a key. This feedback was displayed after the cursor movement was completed and before participants had to give their time estimations.
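The translation from a tap on the scale to a temporal estimate can be sketched as follows. The sketch assumes that the six anchors are evenly spaced across the 1000 px scale, with the first anchor at 0 px and the last at 1000 px; this spacing and the names `anchor_positions` and `tap_to_ms` are illustrative assumptions, while the scale width and the 2.5 ms per pixel conversion come from the text.

```python
SCALE_LETTERS = ["A", "F", "I", "O", "T", "A"]  # anchors shown on the scale
SCALE_WIDTH_PX = 1000                            # scale width from the text
MS_PER_PX = 2.5                                  # conversion from the text


def anchor_positions(width_px=SCALE_WIDTH_PX):
    """Pixel position of each anchor, assuming even spacing (assumption)."""
    spacing = width_px / (len(SCALE_LETTERS) - 1)
    return [(letter, i * spacing) for i, letter in enumerate(SCALE_LETTERS)]


def tap_to_ms(x_px, ms_per_px=MS_PER_PX):
    """Convert a tap position on the scale into ms from the scale's start."""
    if not 0 <= x_px <= SCALE_WIDTH_PX:
        raise ValueError("tap lies outside the scale")
    return x_px * ms_per_px


anchors = anchor_positions()
i_anchor_px = anchors[2][1]                      # anchor for the letter "I"
tap_px = 500                                     # midway between "I" and "O"
print(tap_to_ms(tap_px) - tap_to_ms(i_anchor_px))  # ms after the "I" anchor
```

Under these assumptions, a tap midway between "I" and "O" corresponds to 250 ms after the "I" anchor, which matches a stream of 500 ms letters.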

In addition to the perceived timing, participants made explicit agency judgments on a continuous 100-point scale from −50 to 50. Participants rated their perceived authorship (“The dot moved as I wanted it to”), control (“I controlled the dot’s movement”), and causation (“I caused the dot’s movement”) over the cursor movement. These ratings were given after every eighth trial in the experimental blocks.

As the variable of interest for this experiment was the interval length, this factor was manipulated within subjects. For counterbalancing, we divided the experiment into thirds and assigned a specific interval length (250, 500, or 750 ms) to each of them. The sequence of the four estimation conditions was also counterbalanced across participants, with the prerequisite that they always had to start with the baseline blocks before completing the experimental blocks. The sequence of conditions remained the same throughout all experimental thirds. Overall, participants completed 12 blocks (two baseline blocks, then two experimental blocks, for every interval length) of 40 trials each.
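One way to enumerate block orders that satisfy these constraints is sketched below. The text does not specify the exact counterbalancing scheme, so this is a hypothetical full crossing of interval-length orders with condition orders; only the interval lengths, the baseline-before-experimental rule, the constant condition order across thirds, and the 40 trials per block are taken from the text.

```python
from itertools import permutations, product

INTERVALS_MS = (250, 500, 750)  # interval lengths from the text
TRIALS_PER_BLOCK = 40           # from the text


def other(event):
    """Return the complementary judgment type."""
    return "effect" if event == "action" else "action"


def block_plans():
    """Enumerate candidate block orders (hypothetical scheme).

    Each plan lists 12 (interval length, condition) blocks: within every
    third, both baseline blocks precede both experimental blocks, and the
    condition order is identical across thirds.
    """
    plans = []
    for interval_order in permutations(INTERVALS_MS):
        for base_first, exp_first in product(("action", "effect"), repeat=2):
            conditions = [
                f"{base_first} baseline",
                f"{other(base_first)} baseline",
                f"{exp_first} experimental",
                f"{other(exp_first)} experimental",
            ]
            plans.append([(ms, c) for ms in interval_order for c in conditions])
    return plans


plans = block_plans()
print(len(plans), "plans,", len(plans[0]), "blocks each,",
      len(plans[0]) * TRIALS_PER_BLOCK, "trials per participant")
```

This crossing yields 24 distinct orders, so a sample of 48 could in principle be split evenly with two participants per order; whether the study used exactly this assignment is not stated in the text.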

At the end of each third, participants filled out a German version of the NASA Task Load Index (TLX) consisting of six items to investigate subjective task load (Hart & Staveland, 1988). It assesses mental demand, physical demand, temporal demand, performance, effort, and frustration on a continuous 10-point scale from low to high. The experiment took about 90 minutes.

Raw data and analysis scripts are available on the Open Science Framework, https://osf.io/d3vz5/.

Design

The study used a 3 × 4 repeated-measures design with interval length (250 ms vs. 500 ms vs. 750 ms) and condition (action experimental vs. action baseline vs. effect experimental vs. effect baseline) as within-subjects factors.

Data analysis

To assess temporal binding, we first calculated estimation errors as the difference between participants’ temporal estimates and the actual timing of the respective event (estimation error = timing_estimation − timing_actual). For example, if participants pressed a key 100 ms after they heard the letter “I” but reported this key press as having occurred in the middle between “I” and “O” (i.e., 250 ms after the onset of letter “I”), the estimation error for this particular trial was 250 ms − 100 ms = 150 ms. We discarded erroneous trials and trials in which the temporal binding exceeded 2.5 SDs of the participant’s cell mean in the respective condition (baseline vs. experimental; 250 ms vs. 500 ms vs. 750 ms). Subsequently, we calculated means for each estimation condition and interval length separately. These were then used to calculate the action binding and the effect binding for each interval length. To this end, participants’ estimation errors in the baseline conditions were subtracted from those in the respective experimental conditions (temporal binding = estimation error_exp − estimation error_base). Positive values indicate that an occurrence in the experimental condition was perceived to have happened later than in the baseline condition, while negative values indicate an earlier perception of an occurrence in the experimental compared to the baseline condition.
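The computation described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis script: it assumes that the 2.5 SD criterion is applied to trial-wise estimation errors around the cell mean, using the sample standard deviation, and the function names are hypothetical.

```python
from statistics import mean, stdev


def estimation_error(estimate_ms, actual_ms):
    """Estimation error = timing_estimation - timing_actual (in ms)."""
    return estimate_ms - actual_ms


def trim(errors, criterion=2.5):
    """Drop trials further than `criterion` SDs from the cell mean.

    Assumption: screening is done on trial-wise estimation errors of one
    participant in one design cell, with the sample SD.
    """
    if len(errors) < 2:
        return list(errors)
    m, sd = mean(errors), stdev(errors)
    if sd == 0:
        return list(errors)
    return [e for e in errors if abs(e - m) <= criterion * sd]


def temporal_binding(exp_errors, base_errors):
    """Mean experimental error minus mean baseline error (in ms)."""
    return mean(trim(exp_errors)) - mean(trim(base_errors))


# Worked example from the text: estimate 250 ms, actual 100 ms after "I".
print(estimation_error(250, 100))
# Toy cell data (made up for illustration only).
print(temporal_binding([150, 120, 180], [40, 60, 50]))
```

A positive result means the event was judged later in the experimental than in the baseline condition, which is the direction expected for action binding; effect binding is expected to be negative.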

To test our hypothesis, we first conducted separate two-tailed t-tests for all types of action binding and effect binding to see whether the differences between experimental and baseline conditions differed significantly from zero, that is, whether participants showed temporal binding. Then, we conducted two one-factorial analyses of variance (ANOVAs), one for action binding and one for effect binding, with interval length (250, 500, 750 ms) as within-subjects factor to uncover specific differences between the individual interval lengths. Follow-up analyses were conducted via two-tailed, paired t-tests. Effect sizes for all paired t-tests were calculated as \( d_z = \frac{t}{\sqrt{n}} \).
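As a quick plausibility check of this effect size formula, the following sketch recomputes \( d_z \) from the t values reported for action binding in the Results, with n = 48 participants. The helper name `cohens_dz` is illustrative.

```python
from math import sqrt

N_PARTICIPANTS = 48  # sample size from the Participants section


def cohens_dz(t, n):
    """Effect size for a paired or one-sample t-test: d_z = t / sqrt(n)."""
    return t / sqrt(n)


# (t value, reported d_z) for action binding per interval length,
# copied from the Results section.
reported = {
    "250 ms": (2.57, 0.37),
    "500 ms": (4.10, 0.59),
    "750 ms": (1.46, 0.21),
}

for label, (t, dz_reported) in reported.items():
    dz = cohens_dz(t, N_PARTICIPANTS)
    print(f"{label}: d_z = {dz:.2f} (reported {dz_reported:.2f})")
```

The recomputed values agree with the reported ones to two decimal places.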

For explicit agency judgments, we calculated mean scores for explicit agency ratings (authorship, control, causation) for each condition (action experimental, effect experimental) and each interval length individually. Then, a one-way ANOVA with condition (action vs. effect) as within-subjects factor was conducted to uncover differences in participants’ subjective judgments of agency between conditions in which participants focused either on the action or on the effect. Finally, three repeated-measures ANOVAs, one for each type of rating, with interval length (250 ms vs. 500 ms vs. 750 ms) as within-subjects factor were conducted.

To assess participants’ task load with different interval lengths, mean scores for each scale of the NASA TLX were calculated and compared between the three interval lengths. A repeated-measures ANOVA with interval length (250 ms vs. 500 ms vs. 750 ms) as within-subjects factor was conducted separately for each scale. Follow-up analyses were carried out via two-tailed, paired t-tests. Effect sizes for all paired t-tests were calculated as \( d_z = \frac{t}{\sqrt{n}} \).

Additionally, for nonsignificant results, we used post-hoc Bayes analyses to further examine the evidence for and against the null hypothesis. We calculated Bayes factors using JASP computer software (JASP Team, 2018). As stated in the preregistration, we expected medium to large effects. Thus, we used a scale parameter of 0.25 for the analyses. This corresponds to a probability of 80% that the effect lies between −0.8 and 0.8. As per convention, a Bayes factor of BF₁₀ < 1/3 can be interpreted as evidence in favor of the null hypothesis, while Bayes factors (BF₁₀) greater than 3 yield at least moderate evidence for the alternative hypothesis (Dienes, 2014). As we tested for equality, however, we used the inverse BF₀₁ (with \( BF_{01} = \frac{1}{BF_{10}} \)) and thus the inverse decision criteria apply (see also Janczyk & Pfister, 2020).
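The stated prior mass can be checked with a short calculation. The sketch assumes that the scale parameter refers to a zero-centered Cauchy prior on the standardized effect size, as is common for Bayesian t-tests in JASP; the text itself does not name the prior family, so this is an assumption. The function names are illustrative.

```python
from math import atan, pi


def cauchy_mass_within(bound, scale):
    """Probability that a zero-centered Cauchy(scale) variable lies in
    [-bound, bound], from the Cauchy CDF: (2 / pi) * atan(bound / scale)."""
    return 2 / pi * atan(bound / scale)


def bf01_from_bf10(bf10):
    """Invert a Bayes factor: BF01 = 1 / BF10."""
    return 1 / bf10


# Scale 0.25 and bounds of +/-0.8 are the values given in the text.
print(f"{cauchy_mass_within(0.8, 0.25):.3f}")
# A BF10 of 1/3 corresponds to a BF01 of 3, so the criteria flip.
print(f"{bf01_from_bf10(1 / 3):.2f}")
```

Under the Cauchy assumption, roughly 80% of the prior mass falls between −0.8 and 0.8, in line with the figure given above.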

Results

Temporal binding

Erroneous trials (0.8%) and trials in which temporal binding exceeded 2.5 SDs of the participant’s cell mean (2.6%) were excluded from the analyses. Errors occurred mainly in the first trials of effect baseline blocks in which participants were asked not to press a key. Nevertheless, error rates showed obvious floor effects. Therefore, error rates will not be analyzed further (see Dixon, 2008 for comments regarding floor and ceiling effects in the analysis of error data).

Action binding

Data showed significantly larger estimation errors for experimental conditions than for baseline conditions in the 250 ms and 500 ms conditions, but not in the 750 ms condition, t₂₅₀(47) = 2.57, p = .013, d_z = 0.37, ∆ = 23.06 ms, t₅₀₀(47) = 4.10, p < .001, d_z = 0.59, ∆ = 51.89 ms, t₇₅₀(47) = 1.46, p = .151, d_z = 0.21, ∆ = 39.22 ms. That is, with letters of 250 ms and 500 ms, participants judged actions to have occurred later in time when they were followed by a cursor movement than when they were executed in isolation. In the 750 ms condition, the shift pointed in the same direction but did not differ significantly from zero.

The ANOVA for action binding with interval length (250 ms vs. 500 ms vs. 750 ms) as within-subjects factor did not show any significant difference in the magnitude of action binding between the three interval lengths, F < 1, BF₀₁ = 7.71 (see Fig. 3).

Effect binding

Estimation errors of effect differed significantly between experimental and baseline conditions for all three interval lengths, t₂₅₀(47) = −8.21, p < .001, d_z = 1.18, ∆ = −159.77 ms, t₅₀₀(47) = −6.26, p < .001, d_z = 0.90, ∆ = −132.74 ms, t₇₅₀(47) = −3.08, p = .003, d_z = 0.44, ∆ = −83.97 ms. Cursor movements were reported to have happened earlier when a keypress preceded this cursor movement.

The ANOVA for effect binding with interval length (250 ms vs. 500 ms vs. 750 ms) as within-subjects factor revealed a significant difference in binding size between the different interval lengths, F(2,94) = 5.15, p = .008, η_p² = .10. That is, effect binding increased significantly between the 750 ms and the 250 ms condition, t(47) = −2.79, p = .008, d_z = 0.40, and between the 750 ms and the 500 ms condition, t(47) = −2.08, p = .043, d_z = 0.30. There was no clear evidence for or against a difference between the short and medium interval length, t(47) = −1.28, p = .206, d_z = 0.18, BF₀₁ = 1.49.

Explicit agency judgments

Explicit judgments of agency did not differ between conditions (i.e., action experimental vs. effect experimental), F(1,47) = 1.25, p = .270, η_p² = .03, BF₀₁ = 9.38, so explicit agency judgments were calculated across conditions. In general, agency ratings were high for all three types of judgment, authorship (M = 25.23, SD = 19.54), control (M = 22.59, SD = 20.72), and causation (M = 35.17, SD = 13.85).

Subsequently, three repeated-measures ANOVAs with interval length (250 ms vs. 500 ms vs. 750 ms) as within-subjects factor were conducted. Explicit authorship ratings differed significantly between the different interval lengths, F(2,94) = 4.75, p = .011, η_p² = .09. This effect was mainly due to participants’ significantly lower authorship ratings in the 250 ms condition compared to the 500 ms condition, t(47) = −2.73, p = .009, d_z = −0.39, while their ratings in the 500 ms and the 750 ms condition did not show clear evidence for or against a difference, t < 1, BF₀₁ = 2.58. Explicit agency judgments for control and causation were not influenced by interval length, F_control(2,94) = 1.51, p = .226, η_p² = .03, BF₀₁ = 4.18, F_causation < 1, BF₀₁ = 7.41 (see Fig. 4).

NASA Task Load Index

Participants filled out the NASA Task Load Index to determine whether the manipulation of interval length had an effect on perceived task load. Here we report only the subscales on which interval length had an influence. All other results can be found on the OSF repository (https://osf.io/d3vz5/).

Data showed a significant effect of interval length on mental demand (MD), F(2,94) = 16.19, p < .001, η_p² = .26. Mental demand decreased significantly between the 250 ms and the 500 ms condition, t(47) = 4.61, p < .001, d_z = 0.67, and between the 250 ms and the 750 ms condition, t(47) = 5.56, p < .001, d_z = 0.80, while there was no clear evidence for or against a difference between the two longer intervals, t < 1, d_z = 0.07, BF₀₁ = 2.45.

The same held true for physical demand (PD). It differed significantly between the three interval lengths, F(2,94) = 5.24, p = .007, η_p² = .10. While there was a slight decrease in physical demand between the 250 ms and the 500 ms condition, t(47) = 2.11, p = .040, d_z = 0.30, and between the 250 ms condition and the 750 ms condition, t(47) = 4.71, p < .001, d_z = 0.68, there was no clear evidence for or against a difference between the medium and the long interval, t < 1, d_z = 0.08, BF₀₁ = 2.39.

The ANOVA for temporal demand (TD) revealed significant differences between the three conditions, F(2,94) = 37.04, p < .001, η_p² = .44. Temporal demand decreased significantly from the 250 ms to the 500 ms condition, t(47) = 5.59, p < .001, d_z = 0.81, as well as from the 250 ms to the 750 ms condition, t(47) = 7.80, p < .001, d_z = 1.13. Temporal demand in the 500 ms condition was also significantly higher than in the 750 ms condition, t(47) = 3.20, p = .003, d_z = 0.46.

Data showed a significant effect of interval length on performance (P), F(2,94) = 4.48, p = .014, η_p² = .09. Performance gradually increased with increasing interval length. However, there was no clear evidence for or against a difference between the 250 ms and the 500 ms condition, t(47) = −1.60, p = .115, d_z = 0.23, BF₀₁ = 1.07, or between the 500 ms and the 750 ms condition, t(47) = −1.39, p = .170, d_z = 0.20, BF₀₁ = 1.34. Performance in the 250 ms condition was rated significantly lower than in the 750 ms condition, t(47) = −3.02, p = .004, d_z = 0.44.

Data showed a significant effect of interval length on effort (E), F(2,94) = 4.36, p = .016, η_p² = .09. Effort gradually decreased with increasing interval length. Effort was significantly lower in the 750 ms condition than in the 250 ms condition, t(47) = 3.00, p = .004, d_z = 0.43. Further analyses did not show any clear evidence for or against a difference between the 250 ms condition and the 500 ms condition, t(47) = 1.15, p = .258, d_z = 0.17, BF₀₁ = 1.67, or between the 500 ms condition and the 750 ms condition, t(47) = 1.78, p = .081, d_z = 0.26, BF₀₁ = 0.86.

The ANOVA revealed a significant effect of interval length on frustration (F), F(2,94) = 4.31, p = .016, η_p² = .08. Frustration decreased significantly from the 250 ms to the 500 ms condition, t(47) = 2.58, p = .013, d_z = 0.37, and from the 250 ms condition to the 750 ms condition, t(47) = 2.58, p = .013, d_z = 0.37, while there was no clear evidence for or against a difference between the two longer intervals, t < 1, BF₀₁ = 2.46.
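The phrase "no clear evidence for or against a difference" in the analyses above rests on the Bayes factor BF₀₁, which expresses how much more likely the data are under the null hypothesis than under the alternative. The Python sketch below shows one way to turn a BF₀₁ value into such a verbal label. The cutoff of 3 (and its reciprocal) is a widely used convention and an assumption here, since the source does not state its thresholds; the function name and labels are hypothetical.

```python
def describe_bf01(bf01: float, threshold: float = 3.0) -> str:
    """Label a Bayes factor BF01 using an assumed conventional cutoff.

    BF01 >= threshold      -> data favor the null hypothesis
    BF01 <= 1 / threshold  -> data favor a difference
    otherwise              -> inconclusive
    """
    if bf01 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf01 >= threshold:
        return "evidence for the null"
    if bf01 <= 1 / threshold:
        return "evidence for a difference"
    return "inconclusive"


# Frustration, 500 ms vs. 750 ms: BF01 = 2.46
print(describe_bf01(2.46))  # inconclusive
# Causation ratings across interval lengths: BF01 = 7.41
print(describe_bf01(7.41))  # evidence for the null
```

Under this assumed cutoff, the reported values between 0.86 and 2.58 fall in the inconclusive range, whereas the values of 4.18 and 7.41 for control and causation ratings favor the null.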

Discussion

We investigated whether varying lengths of the letters constituting the auditory timer have an influence on temporal binding. Experiment 1 served the purpose of determining the optimal interval length for our setup. Participants executed a navigation task on an iPad while hearing timed auditory stimuli over headphones. These stimuli were five German letters with three different interval lengths (250, 500, 750 ms). All interval lengths produced effect binding, and the perceived timing of actions in all conditions tended to be shifted towards the effect. However, action binding did not differ significantly from zero in the condition with letters of 750 ms. These results are in line with previous studies using temporal binding as a measure, which also report smaller action binding than effect binding (Beck, Di Costa, & Haggard, 2017; Ruess, Thomaschke, & Kiesel, 2017b). Thus, we conclude that our setup is in principle capable of measuring temporal binding and of replicating previous findings on temporal binding.

All interval lengths showed medium to large effects for effect binding. This, as well as the absolute magnitude of the estimation errors, replicates previous studies examining temporal binding by means of a visual Libet Clock (Ruess, Thomaschke, & Kiesel, 2017b; Schwarz, Weller, Klaffehn, & Pfister, 2019a; Wolpe, Haggard, Siebner, & Rowe, 2013). As effect binding did not differ significantly between short and medium intervals, there does not seem to be a single ideal interval length for measuring temporal binding with an auditory timer. Rather, auditory stimuli of short to medium length that remain below a certain threshold (in this case 750 ms) appear to be suitable for revealing temporal binding. The same applies to action binding; both effect sizes and absolute estimation errors replicated previous studies at least for the two shorter interval lengths. Therefore, our recommendation is that the auditory stimuli be no shorter than 250 ms and no longer than 500 ms.

Contrary to the implicit temporal binding measures, the length of the presented auditory stimuli did not influence explicit agency judgments. Throughout the experiment, participants rated their sense of agency as high in almost all conditions. The only condition in which explicit sense of agency was slightly diminished was when participants had to rate their authorship over the cursor movements in the 250 ms condition. Previous studies with predictable action–outcome delays have shown that increasing these delays (>200 ms) produces lower explicit agency ratings (Wen, Yamashita, & Asama, 2015). In the present study, action–outcome delays varied on a trial-by-trial basis between 150 ms and 350 ms. Additionally, agency ratings were recorded after every eighth trial, rendering it impossible to map agency ratings to specific action–outcome delays. Therefore, it is plausible that participants made an overall judgment across the previous mini-block, resulting in less differentiated judgments of agency. To sum up, interval length does not seem to have a great influence on participants’ explicit agency judgments, which can therefore be neglected when designing the auditory timer. Researchers should, however, also bear in mind participants’ task load and frustration during task execution, as high load and frustration are often detrimental to concentration and can promote task-irrelevant thoughts over the course of the experimental session.

Over the course of the experiment, task load tended to decrease with increasing interval length. This was also the case for participants’ perceived effort and frustration, which decreased as the length of the presented letters increased. This pattern reversed for participants’ self-ratings of performance: they judged themselves as doing better on the task when interval length increased. Consequently, we recommend using intervals of medium length for the auditory timer. This way, researchers can ensure low to moderate task load while also maintaining participants’ self-image as being competent on the task.

To sum up, with regard to the temporal estimation measure, we decided to use an interval length of 500 ms for subsequent studies. This interval length appeared to create the most robust action binding while also producing reasonably large effect binding. Additionally, considering participants’ task load ratings, the 500 ms interval seemed to evoke a tolerable task load, whereas even shorter intervals unnecessarily increased task load and at the same time descriptively lowered subjective performance ratings. This design decision is supported by participants’ explicit agency judgments, which tended to be slightly lower in the 250 ms condition than in the 500 ms condition.

Experiment 2: Manipulation of interval filling

In Experiment 2 we systematically manipulated the factor interval filling, that is, the way in which the spoken letters were presented. This design factor was chosen as it contributes to the temporal resolution of the auditory timer. Letters were presented in three different ways: filled, half-filled, and sequenced. We expected half-filled intervals to be a poor measure for temporal binding, as the silence in the second half of the interval does not provide temporal information. In contrast, sequenced intervals should provide participants with more anchors and therefore make temporal judgments easier. The additional temporal information should, however, also lead to increased task load.