Although time processing has been investigated a great deal, determining the mechanisms by which humans evaluate time remains an open issue. For example, it is unclear how nontemporal factors influence the perception of temporal durations. In the present study, we addressed the effect of a concurrent motor task on the discrimination of sub- and suprasecond time intervals marked by either auditory or visual stimuli.

According to the scalar expectancy theory (SET), individuals encode durations by means of a central, amodal timekeeping device, the so-called internal clock (Gibbon, Church, & Meck, 1984; Treisman, 1963). In this model, when a time judgment is made, the response is the result of a three-stage process: a clock stage, a memory stage, and a decision stage. The clock tracks time through a pacemaker–accumulator system; the pacemaker (an oscillator) emits pulses at a certain rate, and the accumulator integrates these pulses. The flow of pulses from the pacemaker to the accumulator is under the control of attentional processes, regulated by a switch (Penney, 2003; Penney, Gibbon, & Meck, 2000) or a gate mechanism (Block & Zakay, 1997, 2008). In general, the more attention is devoted to timing, the more precise the temporal judgments are. At the memory stage, the accumulated pulses are compared to previously learned durations, which have been recorded in reference memory (Grondin, 2005; Ulrich, Nitschke, & Rammsayer, 2006). Decisions regarding the duration of a particular interval are made on the basis of a comparison between the actual duration and the duration stored in the reference memory. The SET model accounts for both subsecond and suprasecond durations, that is, for durations that do not allow strategies such as counting as well as for those that do (Grondin, Meilleur-Wells, & Lachance, 1999).
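The clock stage described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the SET model itself: the Poisson pacemaker, the 50-Hz pulse rate, and the representation of attention as a per-pulse pass probability (`gate_prob`) are all hypothetical choices made for illustration.

```python
import random
import statistics


def accumulate_pulses(duration_s, rate_hz, gate_prob, rng):
    """Count the pulses that reach the accumulator during one interval.

    Assumptions (not from the source): pulses arrive as a Poisson process,
    and attention is modeled as the probability that a pulse passes the gate.
    """
    elapsed = 0.0
    count = 0
    while True:
        elapsed += rng.expovariate(rate_hz)  # waiting time to the next pulse
        if elapsed > duration_s:
            return count
        if rng.random() < gate_prob:  # pulse passes only if attended
            count += 1


def summarize(duration_s, gate_prob, rate_hz=50.0, n_trials=2000, seed=1):
    """Return the mean pulse count and its coefficient of variation."""
    rng = random.Random(seed)
    counts = [
        accumulate_pulses(duration_s, rate_hz, gate_prob, rng)
        for _ in range(n_trials)
    ]
    mean = statistics.mean(counts)
    return mean, statistics.pstdev(counts) / mean


full_mean, full_cv = summarize(0.5, gate_prob=1.0)     # full attention
shared_mean, shared_cv = summarize(0.5, gate_prob=0.6)  # attention shared
```

In this toy version, diverting attention lowers the accumulated count and raises its relative variability, which is the qualitative pattern the attentional account predicts. A plain Poisson counter does not reproduce the scalar property of timing, so the sketch should not be read as a quantitative model.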

The influence of attention on time perception has been reported for a wide range of durations and using different temporal tasks, such as time estimation, time production, and time reproduction (Brown, 1985, 1997, 2008; Hemmes, Brown, & Kladopoulos, 2004). For instance, when participants are engaged in a time reproduction task together with a concurrent nontemporal task, their temporal estimates are generally less accurate than temporal estimates obtained without the concurrent task (Zakay & Block, 2004). The explanation of this finding has been based on the attentional allocation model, which posits that temporal estimates are influenced by the sharing of cognitive resources between temporal and nontemporal tasks. Nontemporal tasks divert attentional resources away from timing. When temporal and nontemporal tasks are performed simultaneously, participants divide their attentional resources between temporal and nontemporal tasks; therefore, fewer cognitive resources are dedicated to the temporal task, resulting in less accurate temporal estimates (Zakay & Block, 2004).

To investigate the role of attentional resources in time perception, previous studies have added a concurrent task to the temporal one [e.g., reading (visual) tasks (Brown, 1997; Hemmes et al., 2004; Mioni, Stablum, McClintock, & Grondin, 2014; Rakitin, Stern, & Malapani, 2005); articulatory suppression (auditory) tasks (Delgado & Droit-Volet, 2007; Droit-Volet & Clément, 2005; Droit-Volet & Rattat, 2007; see also Rattat & Droit-Volet, 2012); or visuo-auditory tasks (Berry, Li, Lin, & Lustig, 2014)].

It is also known that time perception is influenced by the sensory modality used for marking the time intervals (Grondin, 2003). Timing is more precise when stimuli are presented in the auditory rather than the visual modality (Grondin, 1993; Grondin, Meilleur-Wells, Ouellette, & Macar, 1998; Mayer, Di Luca, & Ernst, 2014; Rammsayer, Buttkus, & Altenmüller, 2012; Ulrich et al., 2006), and this auditory superiority might be due to the automaticity of temporal processing in audition.

However, what remains to be tested is how auditory and visual forms of temporal processing are affected by a concurrent secondary task that is neither auditory nor visual. The aim of the present study was to further investigate the effect of a motor task on a temporal discrimination task. More specifically, we selected a motor task that was neither an auditory nor a visual one, in order to investigate whether its effects differ for auditory versus visual duration discrimination. In the present study, we reduced the amount of attentional resources devoted to the timing task by adding a concurrent, nontemporal motor task (Droit-Volet, 2010; Mioni et al., 2014). Participants were asked to perform a temporal discrimination task with two standard durations, one subsecond (500 ms) and the other suprasecond (1,500 ms), in either the visual or the auditory modality, either alone or together with a self-paced motor tapping task (i.e., a task neither auditory nor visual). The study was designed to tackle the following questions: How is time perception modulated by the modalities used for marking an interval? Does the addition of a secondary motor task affect performance in a time discrimination task? If interference occurs, are the effects the same in auditory and visual duration discrimination and at different temporal ranges? We expected to find lower discrimination Weber fractions (WFs) in audition than in vision. Moreover, we expected to find lower WFs without than with the concurrent task in both modalities. The potential interaction between sensory modality and the effect of the concurrent task would shed some light on whether the mechanisms involved in time processing are (or are not) modality-independent. If we were to find the same extent of interference effects in the two sensory modalities, we could infer the presence of a common underlying attentional mechanism involved in time processing. Alternatively, if we were to find different interference effects between the two modalities, then we could infer the presence of modality-dependent mechanisms for time processing.

Method

Participants

Sixty-two students in the Department of Psychology at the University of Padua took part in the experiment and performed a time discrimination task. They were randomly assigned to one of four discrimination tasks: Sixteen students (mean age = 25.75 ± 3.19 years; ten females, six males) performed the task in the visual modality (visual task); 16 students (mean age = 21.71 ± 1.30 years; 12 females, four males) performed the task in the auditory modality (auditory task); 15 students (mean age = 22.25 ± 1.39 years; 11 females, four males) performed the visual task and the finger-tapping task (visual + concurrent task); finally, 15 students (mean age = 24.06 ± 3.02 years; eight females, seven males) performed the auditory task and the finger-tapping task (auditory + concurrent task). The mean age and sex ratios did not differ significantly across groups (all ps > .05). All participants reported normal hearing and normal or corrected-to-normal vision. None reported motor impairments. They were informed of the general aim of the study and signed a consent form before taking part in the experiment. All experimental procedures were approved by the local ethics committee and were conducted according to the principles expressed in the Declaration of Helsinki.

Apparatus

The experiment was programmed in MATLAB (The Mathworks) using the Psychophysics Toolbox (Brainard, 1997). The software was run on a Pentium IV computer connected to an NEC Multisync FP950 monitor (100-Hz refresh rate). The experiment was conducted in a dark and silent (below 35 dB SPL to the listener’s ear) room.

Stimuli and procedure

Participants were required to compare the durations of successively presented pairs of stimuli. Each trial included a pair of stimuli; one had a standard duration (500 or 1,500 ms), whereas the duration of the other was variable (comparison duration) (see Fig. 1). The two stimuli were separated by a fixed 1,000-ms blank (or silent) interval (interstimulus interval, ISI). The order of presentation of the standard and comparison stimuli was randomized across the trials (roving standard method). At the beginning of each trial, the first stimulus was preceded by a 250-ms-long fixation cross. The blank interval following the fixation cross and preceding the first stimulus of the trial had a random duration of either 400 or 800 ms.

Fig. 1 Time discrimination task used in the visual and auditory modalities. The figure describes the single-task condition. In the concurrent-task condition, the finger-tapping task was included from the presentation of the fixation cross until the offset of the comparison stimulus.

In each trial, the participants were asked to judge whether the second stimulus was shorter or longer than the first one, in a two-alternative forced choice task. Two response keys in the center of the keyboard (the “b” and “n” keys) were used; they were marked with a “B” (which in Italian stands for “breve,” meaning “short”) or an “L” (which in Italian stands for “lungo,” meaning “long”), and the key assignment was counterbalanced across the participants. Participants were instructed to press the keys with the index and middle fingers of their dominant hand. They were instructed never to count during the trials (Rattat & Droit-Volet, 2012). Fifteen practice trials were presented at the beginning of each session to familiarize the participants with the task. No feedback was provided after the practice or after the test trials.
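The trial structure described above can be summarized in a short sketch. It is an illustration only, not the authors' MATLAB code: the function name `build_trial`, the event labels, and the 580-ms comparison value are hypothetical, and the pre-stimulus blank is drawn from the two values (400 or 800 ms) stated in the procedure.

```python
import random

FIXATION_MS = 250                    # fixation cross at trial onset
ISI_MS = 1000                        # blank (or silent) interstimulus interval
FOREPERIOD_CHOICES_MS = (400, 800)   # blank before the first stimulus


def build_trial(standard_ms, comparison_ms, rng):
    """Return the ordered (event, duration) list and the correct response."""
    if comparison_ms == standard_ms:
        raise ValueError("the comparison must differ from the standard")
    # Roving standard: the standard is presented first on a random half of trials.
    if rng.random() < 0.5:
        first_ms, second_ms = standard_ms, comparison_ms
    else:
        first_ms, second_ms = comparison_ms, standard_ms
    events = [
        ("fixation", FIXATION_MS),
        ("blank", rng.choice(FOREPERIOD_CHOICES_MS)),
        ("stimulus_1", first_ms),
        ("isi", ISI_MS),
        ("stimulus_2", second_ms),
    ]
    # Participants judge whether the second stimulus was shorter or longer.
    correct_response = "long" if second_ms > first_ms else "short"
    return events, correct_response


events, correct_response = build_trial(500, 580, random.Random(7))
trial_length_ms = sum(duration for _, duration in events)
```

Because the order of standard and comparison is randomized, the correct response depends on the realized order rather than on the sign of the comparison offset alone.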

In each block of trials, the duration of the comparison stimulus varied in accordance with an adaptive maximum likelihood psychophysical procedure (Grassi & Soranzo, 2009; Soranzo & Grassi, 2014). In this procedure, the experimenter sets several psychometric functions, called hypotheses. After each response, the likelihood of each hypothesis is recalculated, and the most likely hypothesis is selected. The most likely hypothesis is assumed to contain the participant’s WF. The comparison duration could be longer or shorter than the standard duration by a certain amount t. In the next trial, the comparison stimulus duration was the stimulus duration corresponding to 85% correct responses of the most likely hypothesis (see Grassi & Soranzo, 2009, for a detailed description and formulas).
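The logic of this adaptive procedure can be sketched as follows. The sketch is illustrative and is not the toolbox implementation cited above: the logistic form of the hypotheses, the 50% guessing rate, the slope value, and the grid of candidate midpoints are all assumptions introduced here, and the function names are hypothetical.

```python
import math


def p_correct(delta_ms, midpoint_ms, slope=0.05, chance=0.5):
    """Assumed logistic psychometric function for one hypothesis."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-slope * (delta_ms - midpoint_ms)))


def level_for_target(midpoint_ms, target=0.85, slope=0.05, chance=0.5):
    """Duration difference at which a hypothesis predicts `target` correct."""
    q = (target - chance) / (1.0 - chance)
    return midpoint_ms + math.log(q / (1.0 - q)) / slope


def most_likely_hypothesis(midpoints_ms, history):
    """Pick the midpoint that maximizes the log-likelihood of past responses.

    `history` holds (delta_ms, was_correct) pairs from the trials so far.
    """
    best_midpoint, best_loglik = None, -math.inf
    for midpoint in midpoints_ms:
        loglik = 0.0
        for delta_ms, was_correct in history:
            p = p_correct(delta_ms, midpoint)
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
            loglik += math.log(p if was_correct else 1.0 - p)
        if loglik > best_loglik:
            best_midpoint, best_loglik = midpoint, loglik
    return best_midpoint


def next_delta(midpoints_ms, history):
    """Duration difference to present on the next trial."""
    return level_for_target(most_likely_hypothesis(midpoints_ms, history))


hypotheses = [25, 50, 100, 150, 200]   # candidate midpoints in ms (assumed)
history = [(200, True), (120, True), (60, False)]
upcoming = next_delta(hypotheses, history)
```

Each response narrows the set of plausible hypotheses, so the comparison durations converge toward the 85%-correct point of the participant's own psychometric function within a small number of trials.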

Each task was conducted in four sessions, two for each standard duration, presented in a counterbalanced order. Each session comprised three blocks of 25 trials each. In half of the sessions, the discrimination WF was estimated for the 500-ms-duration stimulus; in the remaining two sessions, the discrimination WF was estimated for the 1,500-ms-duration stimulus.

Time discrimination task: Single-task condition

In the visual modality, the stimulus marking time consisted of a grayscale image (5.64° × 2.57°) of the planet Saturn, which was presented centrally on the screen on a black background. In the auditory modality, the stimulus presented for marking time was a bandpass noise, with frequencies ranging from 500 to 5,000 Hz. The noise was synthesized at a sample rate of 44.1 kHz and a 16-bit resolution, and was generated by an M-AUDIO Fast Track Pro soundcard. The output of the soundcard was passed binaurally through a pair of circumaural, closed-back, sound-isolating Sennheiser HD 280 pro headphones, at a level of 65 dB SPL. The starts and ends of all noises presented in the experiment were amplitude-modulated with 5-ms raised cosine ramps. In both tasks, participants were asked to always fixate the computer screen and to avoid counting.
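The onset and offset ramps mentioned above can be illustrated with a minimal sketch. It is not the authors' stimulus code: the function names are hypothetical, the noise is plain white noise from the standard library, and the 500 to 5,000 Hz band-pass filtering and level calibration are deliberately omitted.

```python
import math
import random

SAMPLE_RATE_HZ = 44100


def raised_cosine_envelope(n_samples, ramp_ms=5.0, sample_rate_hz=SAMPLE_RATE_HZ):
    """Gain envelope that rises and falls along half a cosine cycle."""
    n_ramp = min(int(round(ramp_ms / 1000.0 * sample_rate_hz)), n_samples // 2)
    envelope = [1.0] * n_samples
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))  # 0 at i=0, near 1 at the end
        envelope[i] = gain                   # onset ramp
        envelope[n_samples - 1 - i] = gain   # mirrored offset ramp
    return envelope


def ramped_noise(duration_ms, seed=0):
    """White-noise burst of the requested duration with 5-ms ramps applied."""
    rng = random.Random(seed)
    n_samples = int(round(duration_ms / 1000.0 * SAMPLE_RATE_HZ))
    envelope = raised_cosine_envelope(n_samples)
    return [rng.uniform(-1.0, 1.0) * gain for gain in envelope]


burst = ramped_noise(500)  # a 500-ms standard-duration burst
```

Smooth ramps of this kind avoid the audible clicks that an abrupt onset or offset would add, which matters when the stimulus duration itself is the quantity being judged.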

Time discrimination task: Concurrent-task condition

Participants performed the time discrimination tasks in both modalities with a concurrent motor task. The concurrent motor task required them to tap the “A” and “L” keys on the keyboard alternately with their index fingers throughout the whole trial, that is, from the onset of the fixation cross until the offset of the second stimulus (Fig. 1). The random blank interval following the fixation cross and preceding the first stimulus of the trial (i.e., from 400 to 800 ms) added extra variability to the trial duration and further prevented the participants from using the rhythm of their tapping to estimate the duration of the stimuli. To avoid interference from the sound produced by the tapping, the participants wore headphones during the visual task. As in the single-task condition, participants were asked to always fixate the computer screen and avoid counting.

Data analysis

Each participant’s WF was expressed as a proportion, by calculating the absolute difference between the comparison and the standard duration, divided by the standard duration. We used a mixed-model analysis of variance (ANOVA): 2 Standard Duration (500, 1,500 ms) × 2 Modality (visual, auditory) × 2 Type of Task (single, concurrent task), with the first as a within-subjects factor and the last two as between-subjects factors. To examine the performance in the finger-tapping task, the mean number of taps and the mean intertap interval were measured. The mean intertap intervals were compared across standard durations and modalities in order to test the effects of these factors. To this end, a 2 Standard Duration (500, 1,500 ms) × 2 Modality (visual, auditory) ANOVA was run. The alpha level of significance was fixed at .05. Before we ran the ANOVA, the data were checked for normality and sphericity. The Greenhouse–Geisser sphericity correction was applied when needed. The effect sizes of the ANOVA results were quantified by means of partial eta-squared values (ηp²). In the post-hoc analysis, the alpha level was adjusted with a Bonferroni correction for multiple comparisons.
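The WF definition given above can be written compactly as follows. The symbols t_c (comparison duration) and t_s (standard duration) are introduced here for illustration, and the 660-ms comparison is a hypothetical value chosen only to show the arithmetic.

```latex
\[
\mathrm{WF} = \frac{\lvert t_c - t_s \rvert}{t_s},
\qquad \text{e.g.,}\quad
\frac{\lvert 660 - 500 \rvert}{500} = \frac{160}{500} = 0.32 .
\]
```

Expressing the threshold as a proportion of the standard is what allows performance at 500 ms and at 1,500 ms to be compared on a common scale.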

Results

The WFs in each experimental condition are reported in Fig. 2. The analyses yielded a main effect of standard duration [F(1, 58) = 80.88, p < .001, ηp² = .582], a main effect of modality [F(1, 58) = 34.58, p < .001, ηp² = .374], and a main effect of type of task [F(1, 58) = 22.96, p < .001, ηp² = .284]. The WF was higher with the 500-ms duration than with the 1,500-ms duration (.32 ± .13 vs. .22 ± .08, respectively), higher when the discrimination task was performed in the visual rather than the auditory modality (.32 ± .12 vs. .22 ± .09, respectively), and higher in the concurrent-task condition than in the single-task condition (.32 ± .13 vs. .23 ± .09, respectively).

Fig. 2 Mean Weber fractions as a function of standard duration in each experimental condition. Error bars represent standard errors.

Importantly, a significant Modality × Type of Task interaction emerged [F(1, 58) = 6.51, p = .013, ηp² = .101]. Post-hoc analyses revealed that the WF was higher in the concurrent-task condition than in the single-task condition in the visual modality (p < .001) but not in the auditory one (p = .119). The WF was higher in the visual than in the auditory modality in both the single-task (p = .020) and the concurrent-task (p < .001) conditions.

Furthermore, the Standard Duration × Modality interaction was significant [F(1, 58) = 13.21, p = .001, ηp² = .186]. Post-hoc analyses showed that all pairwise comparisons were significant (all ps < .001). However, the effect of modality was larger for 500 ms than for 1,500 ms (Cohen’s ds = 1.30 and 0.857, respectively).

Table 1 reports performance on the finger-tapping task in the visual and auditory modalities at 500 and 1,500 ms. The analysis of finger-tapping performance revealed a significant main effect of the standard duration [F(1, 28) = 9.85, p = .004, ηp² = .260], indicating that the intertap interval was longer (i.e., the finger tapping was slower) in the trials containing the 1,500-ms standard duration than in those containing the 500-ms standard duration. Neither the effect of modality [F(1, 28) = 1.84, p = .185, ηp² = .062] nor the Standard Duration × Modality interaction [F(1, 28) = .008, p = .928, ηp² < .001] significantly affected the intertap interval (i.e., finger-tapping speed).

Table 1 Mean (with SD) numbers of taps produced during trials, mean intertap intervals, and standard deviations of the intertap intervals for each modality (visual, auditory) and standard duration (500 ms, 1,500 ms)

Discussion

In the present study, we investigated the interference effect of a simple motor task on the temporal discrimination of brief (500-ms) and long (1,500-ms) intervals, in the visual and auditory modalities. Toward this aim, participants were asked to discriminate either visual or auditory durations, or to perform the time discrimination task simultaneously with a motor task. The temporal discrimination thresholds were measured and compared between the single-task (visual or auditory) and the dual-task (visual + concurrent and auditory + concurrent) conditions. Furthermore, differences between the sensory modalities and standard durations were examined.

The study yielded a main result: The threshold significantly increased with the addition of the finger-tapping task, but only when the time discrimination task was performed in the visual modality. This finding suggests that a simple motor task generates an interference effect on interval timing in the visual but not in the auditory modality. Such interference likely has an attentional source; in other words, the finger-tapping task acted as a secondary, nontemporal task by reducing some of the attentional resources that would otherwise have been allocated to measuring time (Brown, 1997, 2008; Macar, Grondin, & Casini, 1994). When less attention is allocated to the temporal task, temporal discrimination abilities are disrupted, which leads to higher thresholds.

Although the study was not designed to test the SET model, we can infer that the visual temporal discrimination process was affected by the concurrent motor task, likely by interference at the switch level. The fact that the addition of the motor task affected discrimination in the visual but not in the auditory modality might suggest that the opening and closing of the switch mechanism is more efficient in the auditory than in the visual modality (Penney et al., 2000). According to the parallel-timing model developed by Rousseau and Rousseau (1996), modality-specific switch–accumulator systems receive input from a common pacemaker and feed into common memory and decision mechanisms (see also van Rijn & Taatgen, 2008). Therefore, attention likely modulates time discrimination abilities by acting on two different switches (Wearden & Lejeune, 2008). These claims need to be tested in future research.

Alternatively, the findings might reflect the fact that auditory temporal discrimination is a more automatic process than visual temporal discrimination. Indeed, participants could perform the auditory task simultaneously with the self-paced finger tapping without costs to discrimination task performance (Moors & De Houwer, 2006; Shiffrin & Schneider, 1977). Furthermore, it should be considered that auditory stimuli capture attention more automatically than visual ones (Chen, Huang, Luo, Peng, & Liu, 2010; Posner, 1978), and that the auditory system is superior to the visual system in detecting signals changing rapidly in time (Grahn, Henry, & McAuley, 2011; Grondin, 2001; Grondin et al., 1998; Rammsayer, 2014). Speech and music perception are good examples, showing that the human auditory system needs to be highly efficient for temporal processing. The present results are also consistent with the results of a recent study showing that the auditory modality is less affected by a sustained attention decline (Berry et al., 2014).

A further result is that the finger-tapping task increased the threshold by the same amount at both standard durations in the visual modality (see Fig. 2). In other words, the interference effect was additive. This evidence suggests that the same attentional switch was involved in processing the 500-ms and 1,500-ms durations (van Rijn & Taatgen, 2008). We also measured the mean intertap interval produced during the time discrimination task in both the visual and auditory modalities. The results showed an effect of duration, indicating that participants tapped more slowly on average when the standard was 1,500 ms than when it was 500 ms, but no effect of modality was found. This result indicates that the differences in threshold between the two modalities cannot be explained in terms of differences in tapping performance. Moreover, as we pointed out, the concurrent finger-tapping task was also introduced to discourage the use of simultaneous counting. The results showed that the repetitive motor responses performed during our motor-tapping task constitute a good strategy for avoiding chronometric counting processes. In fact, the threshold was higher in the concurrent-task condition. Moreover, a significant Standard Duration × Modality interaction emerged, which showed that the differences in temporal discrimination thresholds between the visual and auditory modalities were also affected by the standard duration. Specifically, the higher temporal discrimination threshold for visual than for auditory stimuli was more pronounced for the shorter duration (in the 500-ms range). This result suggests that the modality effect on temporal discrimination varies as a function of temporal duration, which is in line with a recent study by Rammsayer (2014), who worked with base durations ranging from 100 through 1,000 ms in both the visual and auditory modalities.

Along the same lines, we found that the standard duration significantly affected the threshold, regardless of sensory modality and the presence of the motor task; namely, the discrimination threshold was higher overall at 500 ms than at 1,500 ms. This finding is consistent with the generalized form of Weber’s law (Grondin, 1993, 2001), but not with the recent report from Grondin (2012). The generalized form of Weber’s law applied to time acknowledges that the relative importance of nontemporal sources of variance over the variance issuing from the temporal process is greater when the duration is brief. Note, however, that we cannot totally exclude that the lower threshold observed for longer durations could originate from the use of strategies, such as counting, at least in the no-tapping condition. Indeed, counting is an effective strategy in timing; that is, it reduces the Weber ratio for durations above 1,200 ms (Grondin et al., 1999). However, participants were explicitly asked not to count (Rattat & Droit-Volet, 2012). Furthermore, the addition of the concurrent finger-tapping task should have discouraged the use of simultaneous counting or other strategies.
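The reasoning about the generalized form of Weber's law can be made explicit with one common formulation. The symbols below are introduced here for illustration: σ is the variability of timing, t the duration, k the Weber constant, and a the duration-independent (nontemporal) variance. Treating the threshold-based WF of the present study as proportional to σ/t is an assumption of this sketch.

```latex
\[
\sigma^2 = k^2 t^2 + a
\quad\Longrightarrow\quad
\frac{\sigma}{t} = \sqrt{k^2 + \frac{a}{t^2}} .
\]
```

Because the term a/t² shrinks as t increases, the ratio is larger at brief durations and approaches k at longer ones, which is the direction of the difference observed here between the 500-ms and 1,500-ms standards.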

We can conclude that auditory and visual temporal mechanisms rely partially on common timekeeping mechanisms, and that the auditory temporal discrimination process is more automatic than the visual one. The present findings also highlight the need for further studies to consider how the effects of attentional modulation on temporal sensitivity differ across sensory modalities. Our results are also in line with some neuroimaging evidence showing that different neural circuits are involved in processing sub- and suprasecond durations (Lewis & Miall, 2003; Mioni, Stablum, & Grondin, 2014; Tarantino et al., 2010).