Introduction

The world consists of complex events that are characterized by diverse features. A simple event, like a traffic light turning from green to red, is characterized by perceptual features, like color, form, size or location, and action features, like effector identity or intensity related to a braking action. According to one of the fundamental organizational principles of the brain, these different features of the event are processed in a distributed manner (Felleman & Van Essen, 1991; Jeannerod, 1999; Mesulam, 1998). Several lines of evidence suggest that, during the processing of events, the perceptual and/or action features of these events become temporarily connected via episodic binding (Hommel, 1998, 2004; Kahneman, Treisman, & Gibbs, 1992; Treisman, 1996, 1999). Such integration or binding of features has been demonstrated for response features (e.g., Hommel, 1998; Zmigrod & Hommel, 2010), and for a large number of visual stimulus features like color, shape and orientation (e.g., Hommel, 1998; Kahneman et al., 1992), as well as for auditory stimulus features like pitch, loudness and vocal features (e.g., Bogon, Eisenbarth, Landgraf, & Dreisbach, 2016; Moeller, Rothermund, & Frings, 2012; Zmigrod & Hommel, 2009). Importantly, visual and auditory events typically also involve temporal features like duration. So far, however, there is no direct evidence for the integration of temporal stimulus features.

Although binding has been demonstrated for almost every type of nontemporal feature, it cannot automatically be inferred that binding also involves temporal features like duration. Despite time being inherent and essential to any stimulus-response event, there are potential arguments against binding of duration to other features. The first is that duration is an intrinsically dynamic feature, which changes over time. In contrast to an auditory stimulus feature like pitch or loudness, which can be steadily present from the first occurrence of the stimulus until its disappearance, duration is constantly redefined. This means that, whenever the current stimulus duration is bound, the bound duration immediately becomes obsolete as the stimulus persists. Another reason why temporal features might not become bound into event representations could lie in the functional role of the binding mechanism itself. Facing a dynamic environment with features that change over time, binding has been proposed to structure perception and action temporally by framing it in temporal units (Fournier & Gallimore, 2013; Hommel & Colzato, 2004; Pöppel, 1997; Wittmann, 2011). Here, time representations are part of the cognitive reference frame for building event representations. If temporal features themselves were integrated into event representations by binding, they could no longer serve to temporally structure perception and action. In view of this, it seems unlikely that temporal features like duration are integrated into event representations (cf. Thomaschke & Dreisbach, 2015, for a similar argument). On the other hand, duration is an important feature when identifying, discriminating and processing events. Especially in the auditory domain, e.g., during music or speech processing, the duration of events plays an important role in identifying and recognizing events, and therefore its integration seems reasonable (Krumhansl, 2000; Lehiste, Olive, & Streeter, 1976; MacGregor, Corley, & Donaldson, 2010; Repp, 1998). Furthermore, explanations of previous findings of temporal stimulus–response compatibility effects involving stimulus duration and response duration draw on the integration of temporal features into event representations (Grosjean & Mordkoff, 2001; Kunde, 2003; Kunde & Stöcker, 2002).

The aim of the present study was to investigate whether temporal duration is bound in the representation of auditory events. Binding produces a characteristic sequential performance pattern, which has been used routinely to demonstrate binding effects in previous research: participants respond more slowly to partial repetitions (one feature changes, the other repeats) than to complete repetitions or complete changes of features (e.g., Hommel, 1998, 2004; Kahneman et al., 1992; Kleinsorge, 1999; Zehetleitner, Rangelov, & Müller, 2012; Zmigrod & Hommel, 2009, 2010). These so-called partial-repetition costs are interpreted as the consequence of the automatic retrieval of previous bindings (Colzato et al., 2012; Colzato, Zmigrod, & Hommel, 2013). Feature bindings established in one trial are retrieved in the successive trial if at least one feature is repeated. Therefore, when all features of the retrieved binding are repeated in the successive trial, no conflict occurs. There is also no conflict when none of the features of the successive trial are part of the retrieved binding (i.e., in the case of a complete change). However, when only one feature of the retrieved binding is repeated in the successive trial (partial repetition), the mismatch between the previously bound feature combination and the feature combination of the current trial causes conflict. In the present study, we used this performance pattern as an indicator for the integration of temporal stimulus features in event representations. In Experiment 1, participants had to respond with two keys to a low- or high-pitched sine tone. Critically, the sine tones were presented with two different presentation durations. If the duration is integrated in auditory event representations, sequential analyses should reveal partial-repetition costs. Repeating the duration in consecutive trials should lead to better performance if the pitch is also repeated, whereas alternation of duration should lead to better performance if the pitch is also alternated.
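
The classification of trial transitions described above can be illustrated with a minimal Python sketch. The names `Tone` and `classify_transition` are hypothetical and chosen only for illustration; the pitch and duration values are those used in Experiment 1.

```python
from typing import NamedTuple


class Tone(NamedTuple):
    pitch_hz: int     # task-relevant feature in Experiment 1
    duration_ms: int  # task-irrelevant feature


def classify_transition(prev: Tone, curr: Tone) -> str:
    """Label the relation between two consecutive trials."""
    pitch_repeats = prev.pitch_hz == curr.pitch_hz
    duration_repeats = prev.duration_ms == curr.duration_ms
    if pitch_repeats and duration_repeats:
        return "complete repetition"
    if not pitch_repeats and not duration_repeats:
        return "complete change"
    # Exactly one feature repeats: the case predicted to produce costs.
    return "partial repetition"


# Illustrative (made-up) trial sequence.
sequence = [Tone(400, 50), Tone(400, 50), Tone(400, 200),
            Tone(800, 50), Tone(800, 200)]
labels = [classify_transition(a, b) for a, b in zip(sequence, sequence[1:])]
print(labels)
```

If duration is bound to pitch, responses on transitions labeled "partial repetition" should be slower than on the other two transition types.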

Experiment 1

Materials and methods

Participants

Seventeen students (age M = 23.31 years, SD = 2.09; one male; all right-handed) from the University of Regensburg participated for course credit. None of the participants reported any hearing impairment. One participant was excluded due to an error rate deviating 3.7 SDs from the sample mean.

Apparatus and stimuli

Participants sat in a dimly lit room facing a computer screen (19” diagonal) at a viewing distance of approximately 50 cm. Responses were collected via the “Y” and “M” keys on a standard QWERTZ keyboard, positioned centrally in front of the participant. The experiment was run in E-Prime (Version 2.0, Psychology Software Tools, Sharpsburg, PA). Target stimuli were four sinewave tones differing in pitch (400 Hz vs. 800 Hz) and duration (50 ms vs. 200 ms), presented via headphones. The loudness of the tones was set at a comfortable listening level (78 dB SPL; same level for all participants), with all pitches of equal loudness.

Design and procedure

A 2 x 2 design was used, with the within-subject factors Pitch (repetition vs. switch) and Duration (repetition vs. switch). Participants were instructed to respond as quickly and accurately as possible, pressing the left response key for low-pitched tones (400 Hz) and the right response key for high-pitched tones (800 Hz). This assignment was kept constant across participants because higher tones are more strongly associated with right positions and lower tones with left positions (Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006). Each trial started with a fixation cross of 300 ms duration. The target tone was then presented, accompanied by a blank screen that remained visible until the response was given. After an inter-trial interval of 600 ms, the next trial started. When participants pressed the wrong key, an error message appeared for 1500 ms. The experiment consisted of one practice block of 20 trials and three experimental blocks of 80 trials. The order of trials was randomized with the constraint that, in the experimental blocks, each factor combination (pitch sequence x duration sequence) appeared at least 18 times.
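
One way to obtain a trial order that satisfies such a constraint is rejection sampling, sketched below in Python. This is an assumption made for illustration: the randomization routine actually used in E-Prime is not described here, and the sketch further assumes that the minimum of 18 applies within each 80-trial block. The function names are hypothetical.

```python
import random
from collections import Counter

PITCHES_HZ = (400, 800)
DURATIONS_MS = (50, 200)
TONES = [(p, d) for p in PITCHES_HZ for d in DURATIONS_MS]
CELLS = [(p, d) for p in ("repetition", "switch")
         for d in ("repetition", "switch")]


def transition_counts(block):
    """Count pitch-sequence x duration-sequence combinations in a block."""
    counts = Counter()
    for (p0, d0), (p1, d1) in zip(block, block[1:]):
        pitch_seq = "repetition" if p0 == p1 else "switch"
        duration_seq = "repetition" if d0 == d1 else "switch"
        counts[(pitch_seq, duration_seq)] += 1
    return counts


def make_block(n_trials=80, min_per_cell=18, rng=None, max_attempts=100_000):
    """Draw random tone orders until every cell occurs often enough."""
    rng = rng or random.Random()
    for _ in range(max_attempts):
        block = [rng.choice(TONES) for _ in range(n_trials)]
        counts = transition_counts(block)
        if all(counts[cell] >= min_per_cell for cell in CELLS):
            return block
    raise RuntimeError("no valid trial order found; relax the constraint")


block = make_block(rng=random.Random(1))
print(transition_counts(block))
```

Because the first trial of a block has no predecessor, an 80-trial block yields 79 classifiable transitions, so a minimum of 18 per cell leaves only little room for imbalance between the four cells.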

Results and discussion

We analyzed data from the three experimental blocks. The first trial of each block was excluded from analysis. Moreover, error trials, trials following error trials, and trials with RTs deviating more than three SDs from the individual condition mean were excluded from the RT analysis (Bush, Hess, & Wolford, 1993).
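
The exclusion steps can be sketched as follows for a single participant. The trial dictionary keys and the function name are hypothetical, and the sketch assumes that the three-SD criterion is computed on the trials that remain after the other exclusions, which is not specified in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def select_rt_trials(trials, sd_cutoff=3.0):
    """Return the trials of ONE participant that enter the RT analysis.

    `trials` is a chronologically ordered list of dicts with the keys
    'trial_in_block' (1-based), 'correct', 'rt_ms' and 'condition'.
    """
    kept, prev = [], None
    for trial in trials:
        first_in_block = trial["trial_in_block"] == 1
        after_error = prev is not None and not prev["correct"]
        if not first_in_block and trial["correct"] and not after_error:
            kept.append(trial)
        prev = trial

    by_condition = defaultdict(list)
    for trial in kept:
        by_condition[trial["condition"]].append(trial)

    result = []
    for cond_trials in by_condition.values():
        rts = [t["rt_ms"] for t in cond_trials]
        if len(rts) < 2:  # an SD cannot be computed from a single trial
            result.extend(cond_trials)
            continue
        m, s = mean(rts), stdev(rts)
        result.extend(t for t in cond_trials
                      if abs(t["rt_ms"] - m) <= sd_cutoff * s)
    return result


# Made-up demo data: trial 5 is an error, trial 10 is an extreme RT.
demo = [{"trial_in_block": i, "correct": i != 5, "condition": "A",
         "rt_ms": 2000 if i == 10 else (390 if i % 2 == 0 else 410)}
        for i in range(1, 21)]
kept_demo = select_rt_trials(demo)
print(sorted(t["trial_in_block"] for t in kept_demo))
```

In the demo, the first trial, the error trial, the trial following the error, and the extreme RT are all dropped.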

Figure 1 plots mean RTs as a function of Pitch and Duration. We conducted a 2 x 2 (Pitch: repetition vs. switch x Duration: repetition vs. switch) ANOVA with repeated measures on both factors. This revealed a significant main effect of Duration, F(1,15) = 10.00, p = .006, ηp² = .40, indicating faster responses when the duration was repeated than when it switched (378 ms vs. 392 ms). The main effect of Pitch was not significant (F = 1.64). Most importantly, there was a significant Pitch x Duration interaction (see Footnote 1), F(1,15) = 10.81, p = .005, ηp² = .42. In pitch-repetition trials, participants responded more slowly when duration switched than when it repeated.
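
As a consistency check, partial eta squared can be recovered from an F value and its degrees of freedom via the identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the two significant RT effects reported above; the function name is chosen for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported for the RT analysis of Experiment 1.
reported_effects = {
    "Duration": (10.00, 1, 15),
    "Pitch x Duration": (10.81, 1, 15),
}
for label, (f_value, df1, df2) in reported_effects.items():
    print(label, round(partial_eta_squared(f_value, df1, df2), 2))
```

Both values agree with the reported effect sizes after rounding to two decimals.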

Fig. 1

Mean RTs (ms) and error rates (%) as a function of pitch sequence (repetition vs. switch) and duration sequence (repetition vs. switch) for Experiment 1. Error bars represent inferential confidence intervals according to Tryon (2001) based on the corresponding duration repetition vs. duration switch comparison. Nonoverlap of these confidence intervals is equivalent to significance in a paired t-test with an alpha level of .05

Figure 1 also plots mean error rates as a function of Pitch and Duration. An analogous ANOVA for errors yielded a significant main effect of Pitch, F(1,15) = 4.83, p = .044, ηp² = .24. None of the other effects was significant (all Fs < .64).

The Duration x Pitch interaction in the RT data confirmed partial-repetition costs to a certain degree: for pitch repetitions, performance was better when both pitch and duration repeated than when only pitch repeated and duration switched. For pitch switches, duration did not influence responses. To replicate this finding, and to investigate whether the asymmetric performance pattern is specific to duration or to pitch, we conducted a second experiment with loudness as the relevant feature.

Experiment 2

Materials and methods

Participants

Nineteen students (age M = 24.61 years, SD = 2.81; nine males; one left-handed) from the University of Regensburg participated for course credit. None of the participants reported any hearing impairment. One participant was excluded due to an error rate deviating 3.1 SDs from the sample mean.

Stimuli and procedure

The procedure of Experiment 2 mirrored that of Experiment 1, with the exception that target stimuli were four sinewave tones with different loudness (78 dB SPL vs. 60 dB SPL) and duration (50 ms vs. 200 ms). All participants were instructed to respond as fast and accurately as possible to soft tones with the left response key, and to loud tones with the right response key.

Results and discussion

Preprocessing was exactly the same as in Experiment 1. Figure 2 plots mean RTs as a function of Loudness and Duration. We conducted a 2 x 2 (Loudness: repetition vs. switch x Duration: repetition vs. switch) ANOVA with repeated measures on both factors. This revealed a significant main effect of Duration, F(1,17) = 7.17, p = .016, ηp² = .30, indicating faster responses when the duration was repeated than when it switched (407 ms vs. 419 ms). The main effect of Loudness was not significant (F = .68). Most importantly, there was a significant Loudness x Duration interaction (see Footnote 2), F(1,17) = 18.73, p < .001, ηp² = .52.
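
In a 2 x 2 repeated-measures design, the interaction test is equivalent to a one-sample t-test on a per-participant interaction contrast (mean RT in partial repetitions minus mean RT in complete repetitions and complete changes), with F(1, n − 1) = t². The Python sketch below illustrates this with invented cell means for four fictitious participants; these numbers are not data from the present study, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_contrast(cells):
    """Partial-repetition cost in ms for one participant.

    `cells` maps (relevant-feature sequence, duration sequence) to mean RT.
    """
    partial = cells[("repetition", "switch")] + cells[("switch", "repetition")]
    complete = cells[("repetition", "repetition")] + cells[("switch", "switch")]
    return (partial - complete) / 2


def contrast_test(participants):
    """One-sample t-test of the contrast against zero; F equals t squared."""
    contrasts = [interaction_contrast(c) for c in participants]
    n = len(contrasts)
    t_value = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return mean(contrasts), t_value, t_value ** 2, n - 1


def cell_means(rr, rs, sr, ss):
    return {("repetition", "repetition"): rr, ("repetition", "switch"): rs,
            ("switch", "repetition"): sr, ("switch", "switch"): ss}


# Invented cell means (ms), for illustration only.
fictitious = [cell_means(380, 400, 395, 385), cell_means(370, 385, 380, 375),
              cell_means(400, 430, 410, 400), cell_means(390, 400, 405, 395)]
mean_cost, t_value, f_value, df_error = contrast_test(fictitious)
print(mean_cost, t_value, f_value, df_error)
```

A positive mean contrast corresponds to partial-repetition costs, that is, slower responses when only one of the two features repeats.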

Fig. 2

Mean RTs (ms) and error rates (%) as a function of loudness sequence (repetition vs. switch) and duration sequence (repetition vs. switch) for Experiment 2. Error bars represent inferential confidence intervals according to Tryon (2001) based on the corresponding duration repetition vs. duration switch comparison. Nonoverlap of these confidence intervals is equivalent to significance in a paired t-test with an alpha level of .05

Figure 2 also plots mean error rates as a function of Loudness and Duration. An analogous ANOVA yielded a significant main effect of Loudness, F(1,17) = 5.64, p = .030, ηp² = .25, indicating that, overall, loudness switches were more error prone than loudness repetitions (3.3% vs. 2.1%). None of the other effects was significant (all Fs < 2.5).

The RT results of Experiment 2 extended the findings of Experiment 1. This time, the partial repetition cost pattern was more symmetric: performance was better when both loudness and duration repeated or switched, relative to when only one repeated or switched, indicating that duration is bound to other features of the auditory event.

General discussion

In two experiments, we investigated whether the duration of auditory events is integrated into coherent event representations via binding processes. Participants had to classify the pitch (Experiment 1) or the loudness (Experiment 2) of sine tones. Although irrelevant for the task, the tones had two different durations. Sequential analysis of RT data indicated binding effects between stimulus duration and other stimulus features. In Experiment 1, performance was better when both pitch and duration repeated than when only pitch repeated and duration switched. In Experiment 2, results revealed a more symmetric pattern of partial-repetition costs: performance was better when both duration and loudness switched or repeated, and was worse when only one feature switched or repeated.

Both experiments provide evidence for the integration of stimulus duration into event representations. When the task-relevant feature repeated, performance was worse if the duration changed than if the duration also repeated. That is, in both experiments, repeating the task-relevant stimulus feature retrieved a previous binding between that feature and the duration, resulting in costs if the previous binding mismatched the current feature combination. The capacity of the task-irrelevant duration feature to act as a retrieval cue appeared to be less robust, and seems to depend on the type of task-relevant feature. In Experiment 1, when the task-relevant feature switched, the repetition of the duration did not lead to performance costs. This implies that, with pitch as the relevant feature, duration failed to serve as a retrieval cue, and thus no conflict occurred even when the previous binding mismatched the current feature combination. In Experiment 2, with loudness as the relevant feature, duration did operate as a retrieval cue: repeating duration retrieved the previous binding and caused conflict if the previous binding involved a different loudness than the current feature combination.

One might argue that our data pattern could be explained by the use of a response strategy: a change in any of the features (including duration) might evoke a tendency to change the response that is strong enough to slightly delay response execution. The use of this so-called bypass rule as a response strategy (Fletcher & Rabbitt, 1978) has been discussed previously in the binding literature (e.g., Frings & Rothermund, 2011; Giesen & Rothermund, 2011; Mayr, Buchner, Möller, & Hauke, 2011). Importantly, Fletcher and Rabbitt (1978) found evidence for the application of this response strategy only in highly practiced participants (five 1.5 h sessions prior to the experimental condition). Unpracticed participants (1 h experimental session) were found not to use this response strategy. Our experimental session lasted about 15 min; it is therefore very unlikely that the application of a bypass rule could account for the partial-repetition costs in our study. Rather, the performance patterns revealed in our experiments are more plausibly the consequence of binding effects involving stimulus duration.

In our experiments, we varied duration, pitch and loudness. Previous literature has reported multidimensional perceptual interactions between the perceptual dimensions involved in our study (e.g., Doughty & Garner, 1948; Ekman, Berglund, & Berglund, 1966). According to these perceptual interactions, the pitch or loudness of a 200 ms tone would be expected to be experienced differently from the pitch or loudness of a 50 ms tone of the same physical pitch or loudness. However, these interactions cannot serve as an alternative interpretation of our results. On the contrary, such compound perception would work statistically against our predicted binding pattern. Were a 50 ms and a 200 ms tone with objectively the same loudness or pitch to be represented so differently, there would subjectively be no partial repetitions in our design, only full repetitions and full changes. Thus, we would not find any partial-repetition costs. Accordingly, we regard these perceptual interactions as conceptually independent of the binding issue. Perception in one dimension biases perception in another dimension. Still, despite their mutual interactions, the percepts of features need to be bound in order to represent them as belonging to one and the same event. Binding might be influenced by these perceptual interactions; for example, some feature combinations might be bound more easily than others. However, these interactions would not compromise the behavioral signature of binding, i.e., partial-repetition costs.

Here we have shown for the first time that temporal features can be bound to other event features. This finding is remarkable for two reasons. First, duration continually changes. Despite this constant redefinition, it is bound into event representations. Second, the flow of time is not only the content (Cole, Barnet, & Miller, 1995; Savastano & Miller, 1998), but also the medium, of the cognitive organization realized by binding (Fournier & Gallimore, 2013; Hommel & Colzato, 2004; Pöppel, 1997; Wittmann, 2011). Binding organizes feature relations in an inherently dynamic way: a certain feature combination is bound during one period of time, but unbound during another. Or, in terms of an episodic memory theory (Los, Kruijne, & Meeter, 2014), relevant binding candidates, i.e., memory traces of feature representations, are weighted by temporal factors. We have shown that time—the actual reference frame for binding—can be bound itself, at least when it is realized as stimulus duration.

The finding that time can be bound to other features is an important advancement for binding theory in general, but it also raises several new questions. First, in the current study, duration is integrated into auditory event representations despite being irrelevant for the task. Thus, our findings may have important implications for other lines of research involving irrelevant stimulus durations (Grosjean & Mordkoff, 2001; Kunde, 2003; Kunde & Stöcker, 2002). Beyond that, an important question is how duration integration influences tasks that involve duration as the relevant feature (Kopec & Brody, 2010). Second, partial-repetition costs in the present study are defined in terms of feature switches. However, in our study, a feature switch was necessarily accompanied by a response switch. Although our effects involving irrelevant stimulus duration most probably rely mainly on stimulus-stimulus integration (Zmigrod & Hommel, 2009), it is not clear whether duration is additionally bound to the response. A number of previous binding studies explicitly dissociated between binding of features to stimulus features or to response features (Herwig & Waszak, 2012; Hommel, 1998; Zmigrod & Hommel, 2010). Similar designs could be employed to investigate this issue in relation to temporal features. A third important question that arises from the present findings relates to the temporal dynamics of temporal feature integration: when does the integration of duration take place, during the ongoing stimulus duration, or after its termination? Or, in other words, is temporal stimulus integration based on absolute (e.g., 200 ms; Bartolo & Merchant, 2009; Meegan, Aslin, & Jacobs, 2000) or relative duration representations (e.g., the longer of two stimuli; Molet & Zentall, 2008; Pinheiro de Carvalho & Machado, 2012; Thomaschke, Kunchulia, & Dreisbach, 2015)? It could be an interesting question for future studies to scrutinize the temporal structure of binding temporal features.