The present study investigated time perception in a very basic situation closely related to human auditory communication. Our research question was whether—and if so, how—the perception of a time interval demarcated by two successive events was influenced by the durations of these events. This issue is directly related to rhythm perception. Because the perception of rhythm is basically determined by neighboring intervals demarcated by the onsets of successive sounds (e.g., Handel, 1993; Large, 2008; McAdams & Drake, 2002; Patel, 2008), we were particularly interested in the perception of interonset time intervals. Perception of time intervals delimited by successive sounds is vital to speech perception; comparison of syllable durations, for example, is important in English (Handel, 1989; Spencer, 1996), as well as in Japanese (Amano & Hirata, 2010; Greenberg & Arai, 2004).

Previous research tested how the durations of events (i.e., sound markers) influenced the perception of time intervals in between events, either by utilizing single time intervals marked by two events (Divenyi & Sachs, 1978; Grondin, Ivry, Franz, Perreault, & Metthe, 1996; Grondin, Roussel, Gamache, Roy, & Ouellet, 2005; Penner, 1976; Rammsayer & Leutner, 1996; Woodrow, 1928) or by utilizing multiple time intervals marked by three or more events (Handel, 1993; Hasuo, Nakajima, & Hirose, 2011; Repp & Marcus, 2010; Schubert & Fabian, 2001; Yamashita & Nakajima, 1999). The results of these studies showed that marker duration could influence the perception of time intervals. However, we could not relate the results directly to rhythm perception and human auditory communication, for the following reasons. Most of the studies utilizing single time intervals had focused mainly on discrimination paradigms, and not on the subjective duration itself (Divenyi & Sachs, 1978; Grondin et al., 2005; Penner, 1976; Rammsayer & Leutner, 1996). This was in contrast with the fact that the studies utilizing multiple time intervals had often taken up subjective duration directly, with a clear interest in rhythm in music (e.g., Hasuo et al., 2011; Repp & Marcus, 2010; Schubert & Fabian, 2001; Yamashita & Nakajima, 1999). Even in the latter case, however, it was often difficult to determine the functional relationship between marker durations and the subjective duration of each individual time interval. Although Woodrow (1928) was interested in subjective duration per se in a single-time-interval paradigm, he utilized only offset–onset intervals as “empty” time intervals; they were different from the onset–onset intervals that are essential to auditory communication (e.g., Handel, 1993; Large, 2008; McAdams & Drake, 2002; Patel, 2008). 
We have to be careful when defining “empty durations,” because differences between the perception of an onset and the perception of an offset in terms of timing have been demonstrated repeatedly in various contexts (Efron, 1970; Fastl & Zwicker, 2007; Grondin, 1993, 2003; Grondin, Meilleur-Wells, Ouellette, & Macar, 1998; Kato, Tsuzaki, & Sagisaka, 2003). Another problem was that the 500-ms time intervals employed by Woodrow (1928) seem too long to be related directly to speech perception (Patel, 2008). Since the findings on time intervals shorter than 400–500 ms were not always consistent with the findings on longer time intervals (e.g., Grondin, 1993; Nakajima et al., 2004), it was necessary anyway to investigate the effects of marker durations utilizing shorter time intervals. Thus, our study was designed to fill these gaps between previous time perception studies that had been developed in a rigorous tradition of psychophysics and rhythm perception studies oriented more closely toward speech and music.

By time interval in the present study, we refer to the temporal distance between the onsets of two successive sound markers—that is, an interonset interval (IOI). We varied the durations and temporal shapes of two successive sound markers systematically and conducted a series of experiments in which the subjective duration of onset–onset time intervals was measured. Because the effects of marker duration might differ between the first (beginning) and the second (end) markers (e.g., Schubert & Fabian, 2001), we changed the duration of each marker independently. By utilizing simple stimuli—that is, single time intervals marked by two sounds of various durations—we obtained data that can be used to clarify the mechanisms underlying time perception. To make it easier to relate the present findings to basic mechanisms of rhythm perception in speech and music, we focused on short time intervals (120–360 ms). IOIs up to 360 ms appear frequently in music (Fraisse, 1982) and in speech (Greenberg & Arai, 2004; Kato et al., 2003; Warren, 2008), but no other study has investigated the effects of sound marker durations systematically, utilizing such short onset–onset time intervals.

We also varied the temporal energy distribution of the sound markers. Such variations must be common to sounds marking time intervals in speech or music (e.g., Gordon, 1987; Marcus, 1981). This is probably the first time perception study to investigate the effects of the temporal energy distribution of the sound markers.

In Experiments 1 and 2, the effects of marker durations were tested directly. In Experiments 3 and 4, we examined whether the effects of marker durations observed in Experiments 1 and 2 were really caused by the change in duration itself; in Experiment 3, the effects of amplitude decrease accompanying the lengthening of marker duration in Experiments 1 and 2 were tested; in Experiment 4, the effects of sound energy distribution in time, which could be related closely to the duration change, were tested. We will discuss these results in terms of (1) marker detection timing, (2) acceleration of an internal pacemaker by continuous sound, and (3) prolongation of the additional processing following the detection of the second marker.

Experiment 1

We varied the durations of the two markers demarcating a time interval with their onsets and measured the points of subjective equality (PSEs) of the interval.

Method

Participants

Six male students at the Kyushu Institute of Design (Footnote 1), with normal hearing, ranging in age from 19 to 23 years, participated. All 6 participants had received basic training in music and training in technical listening for acoustic engineers (Iwamiya, Nakajima, Ueda, Kawahara, & Takada, 2003).

Stimuli and apparatus

Each presentation consisted of four sound markers. The first and the second marker constituted a standard time interval, and the third and the fourth a comparison interval. All markers were pure tone bursts of 1000 Hz. The duration of the markers was 20, 60, 100, 180, 260, or 340 ms, including a rise and a fall time in a raised cosine shape of 3 ms each. We employed two standard durations (IOIs) of 120 and 360 ms. When the standard duration was 120 ms, the duration of the standard-interval markers was 20, 60, or 100 ms. When the standard duration was 360 ms, the marker duration was 20, 60, 100, 180, 260, or 340 ms. All possible first/second marker combinations of the three (for standard durations of 120 ms) and six (for standard durations of 360 ms) marker durations were utilized. In total, there were 45 standard intervals.
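The count of 45 standard intervals can be checked by enumerating the first/second marker combinations for each standard duration. The following is only an illustrative sketch; the names `MARKER_DURATIONS_BY_IOI` and `enumerate_standards` are hypothetical and were not part of the experimental software.

```python
from itertools import product

# Marker durations (ms) used with each standard IOI (ms), as stated in the text.
MARKER_DURATIONS_BY_IOI = {
    120: [20, 60, 100],
    360: [20, 60, 100, 180, 260, 340],
}


def enumerate_standards(durations_by_ioi):
    """Return (ioi, first_marker, second_marker) for every combination."""
    standards = []
    for ioi, durations in durations_by_ioi.items():
        for first, second in product(durations, repeat=2):
            standards.append((ioi, first, second))
    return standards


standards = enumerate_standards(MARKER_DURATIONS_BY_IOI)
print(len(standards))  # 3 * 3 + 6 * 6 combinations
```

Every first marker in this list is shorter than its IOI, so the first marker always ends before the second one begins.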

The comparison interval markers were always 20 ms. Each presentation started with a silent period of 1.9–2.1 s. In between the standard and comparison was a silent period of 2.4–3.1 s. Both silent periods were randomized to prevent the participants from knowing the beginning of the standard and the comparison interval.

The stimulus patterns were generated digitally (16 bits; a sampling frequency of 44100 Hz), were controlled by a computer (IBM Thinkpad 560X) with an audio card (TDK DMC9000), and were presented via a digital-to-analog converter (Sony DTC500-ES) and headphones (Sony MDR CD-770) to the left ear of the participant in a soundproof room. Monaural presentation was chosen, instead of diotic presentation as in Experiments 2–4, for technical reasons: This experiment was conducted about 10 years before Experiments 2–4, and it was technically difficult at the time to generate exactly the same signal from both sides of headphones. The levels of the sounds were measured with a sound level meter (Brüel & Kjær 2209) and an artificial ear (Brüel & Kjær 4153).
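A marker of the kind described for this experiment (a 1000-Hz pure tone burst whose total duration includes 3-ms raised-cosine rise and fall times, sampled at 44100 Hz with 16 bits) could be synthesized roughly as follows. This is a minimal sketch, not the original stimulus-generation program: the function names, the zero starting phase of the sine wave, and the rounding of sample counts are all assumptions.

```python
import math

SAMPLE_RATE = 44100  # Hz, as stated in the text


def tone_burst(duration_ms, ramp_ms=3.0, freq_hz=1000.0, amplitude=1.0,
               sample_rate=SAMPLE_RATE):
    """Pure-tone burst; duration_ms includes the raised-cosine rise and fall."""
    n_total = round(duration_ms * sample_rate / 1000.0)
    n_ramp = round(ramp_ms * sample_rate / 1000.0)
    samples = []
    for n in range(n_total):
        if n < n_ramp:  # rise
            env = 0.5 * (1.0 - math.cos(math.pi * n / n_ramp))
        elif n >= n_total - n_ramp:  # fall (mirror image of the rise)
            env = 0.5 * (1.0 - math.cos(math.pi * (n_total - 1 - n) / n_ramp))
        else:  # steady state
            env = 1.0
        # Assumption: the carrier starts at zero phase.
        carrier = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        samples.append(amplitude * env * carrier)
    return samples


def to_int16(samples):
    """Quantize floats in [-1, 1] to 16-bit signed integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


burst = tone_burst(20)
pcm = to_int16(burst)
print(len(burst))  # number of samples in a 20-ms marker
```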

Procedure

A method of adjustment was used. The stimulus pattern was presented each time the participant pressed the space bar on the computer keyboard. The participant was instructed to adjust the duration of the comparison time interval until it was perceived as equal to the standard time interval. The duration of the comparison could be changed by pressing the adjustment keys on the computer keyboard (keys of “A,” “D,” “J,” and “L”). There were two types of keys, one for rough adjustments (“D” and “J”) and the other for fine adjustments (“A” and “L”). The longer the participant held the keys down, the larger the amount of change in the comparison interval became, and the minimum step of the adjustment was 1 ms. Printed instructions were shown and read to each participant, and the participant was explicitly informed that the interval to be adjusted was that between the two onsets, and not that between the offset of the first marker and the onset of the second marker. The participant could listen to the stimulus pattern and make the adjustments as many times as he/she wanted. When satisfied, the participant pressed the “enter” key to finish the trial. The final physical duration of the comparison interval on each trial was recorded as the adjusted value.

For each of the 45 standard intervals, there were an ascending and a descending adjustment series, and the PSE for each participant was calculated by averaging the adjusted values obtained from both.
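The PSE computation described above amounts to a simple average over the adjusted values of the ascending and descending series. A minimal sketch, with a hypothetical function name and made-up adjusted values used purely for illustration:

```python
from statistics import mean


def pse(ascending_ms, descending_ms):
    """Average the adjusted comparison durations from both series."""
    return mean(list(ascending_ms) + list(descending_ms))


# Hypothetical adjusted values (ms) for one participant and one standard.
example_pse = pse([118.0, 124.0], [130.0, 128.0])
print(example_pse)
```

Pooling the values in this way equals averaging the two series means whenever both series contribute the same number of adjustments.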

The number of trials in a session was 90 (45 standards × 2 series). These trials were divided into eight blocks. Each block consisted of 13 or 14 trials (2 warm-up trials + 11 or 12 experimental trials in random order). Three sessions were carried out for each participant. The first session was considered as training, and only the results of the second and the third sessions were analyzed.
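The block structure follows from dividing 90 experimental trials as evenly as possible into eight blocks. The sketch below illustrates one way to do this; it assumes that the trials were shuffled across the whole session before being split, which the text does not state, and it leaves out the two warm-up trials per block because their content is not specified.

```python
import random


def split_into_blocks(trials, n_blocks, rng):
    """Shuffle the trials and split them into blocks of near-equal size."""
    shuffled = list(trials)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(shuffled[start:start + size])
        start += size
    return blocks


# 45 standards (indexed 0-44 here) x 2 adjustment series.
trials = [(standard, series)
          for standard in range(45)
          for series in ("ascending", "descending")]
blocks = split_into_blocks(trials, 8, random.Random(0))
block_sizes = sorted(len(block) for block in blocks)
print(block_sizes)
```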

Before these sessions, a preliminary experiment was conducted to determine the presentation levels of the longer markers. The participant was instructed to listen to the sound markers of 20, 60, 100, 180, 260, and 340 ms in isolation, by clicking a starting button on the computer screen, and to adjust the levels of the markers longer than 20 ms to the loudness of a 20-ms reference marker. The 20-ms reference marker was always presented at a sound pressure level of 88 dB, measured as the level of a continuous tone of the same amplitude, for all participants. Marker durations were not indicated on the screen. The silence between the start button click and the beginning of the marker whose loudness was to be judged was 0.4 s. The participant performed five trials, and the presentation levels of the markers longer than 20 ms were determined for each participant by taking the median of the last four adjusted values. The markers in the main part of the experiment were presented at sound pressure levels of 88 dB for the 20-ms marker, 84.3–86.4 dB for the 60-ms marker, 83.6–86.2 dB for the 100-ms marker, 82.3–85.9 dB for the 180-ms marker, 82.0–85.8 dB for the 260-ms marker, and 81.1–85.8 dB for the 340-ms marker.
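The rule for fixing each participant's presentation level (the median of the last four of five adjusted values) can be written compactly. The function name and the level values below are hypothetical and serve only to illustrate the rule.

```python
from statistics import median


def presentation_level(adjusted_levels_db, n_last=4):
    """Median of the last n_last adjusted levels (the first trial is dropped)."""
    return median(adjusted_levels_db[-n_last:])


# Hypothetical adjusted levels (dB) from five trials for one marker duration.
level = presentation_level([87.0, 85.5, 86.0, 85.0, 86.5])
print(level)
```

With an even number of values, the median is the mean of the two middle values.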

Participants could take breaks at any time if they were tired. The preliminary experiment (loudness adjustment) took about 43 min (SD = 22), on average, and the main experiment took about 27 min (SD = 10) per block, on average. Each participant completed the whole experiment over a period of 6 days, on average (7 days at maximum and 4 days at minimum).

Results and discussion

For each of the 45 standard time intervals, PSEs were obtained from 6 participants, and the mean PSE was calculated for each standard. Table 1 shows the mean PSEs and the standard errors of the means (SEMs).
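For reference, a group mean and its SEM can be computed from individual PSEs as sketched below. The PSE values are invented for illustration, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, because the text does not specify the estimator.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the mean and the standard error of the mean."""
    n = len(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    return mean(values), stdev(values) / math.sqrt(n)


# Hypothetical PSEs (ms) from 6 participants for a single standard.
hypothetical_pses_ms = [120.0, 124.0, 128.0, 122.0, 126.0, 130.0]
group_mean, sem = mean_and_sem(hypothetical_pses_ms)
print(group_mean, round(sem, 3))
```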

Table 1 Mean points of subjective equality (PSEs) and standard errors of the means (SEMs) obtained in Experiment 1

A two-way (first-marker duration × second-marker duration) ANOVA was performed for each standard interval. For the 120-ms IOI, significant main effects of first-marker duration and of second-marker duration appeared, F(2, 10) = 5.252, p < .05, ηp² = .51; F(2, 10) = 67.217, p < .001, ηp² = .93, respectively. For the 360-ms IOI, the main effect of first-marker duration bordered on significance, F(5, 25) = 2.581, p = .051, ηp² = .34, whereas the main effect of second-marker duration was significant, F(5, 25) = 6.243, p < .01, ηp² = .56. The interaction between first-marker duration and second-marker duration was not significant for the 120-ms IOI, F(4, 20) = 2.65, p = .063, ηp² = .35, but was significant for the 360-ms IOI, F(25, 125) = 2.430, p < .05, ηp² = .33. To look more closely into this significant interaction, we conducted a post hoc Tukey's test (at the .05 level of significance). As regards the effects of first-marker duration, when the second marker was 20 ms, 20 ms differed significantly from 180 and 340 ms, and 60 ms from 180, 260, and 340 ms. As for the effects of second-marker duration, when the first marker was 20 ms, 20 ms differed significantly from the other durations; when the first marker was 60 ms, 20 ms differed significantly from 100, 180, 260, and 340 ms. No other effects were significant.

The perceived duration of interonset time intervals was influenced by the durations of the first and the second markers. With regard to the effects of the first marker, PSEs tended to decrease as the first marker lengthened when the standard duration was 120 ms. No such effect appeared when the standard duration was 360 ms, however, and the PSEs even tended to increase as the first marker lengthened from 60 to 180 ms, if the second marker was 20 ms. As regards the effects of the second marker, PSEs always tended to increase as the second marker lengthened. This tendency was observed in both standard durations and was particularly clear when the second marker was within the range of 20–100 ms.

The effects of first-marker duration differed between IOIs, but the effects of the second marker were similar for both IOIs; lengthening the second marker caused the marked interval to be perceived as longer, especially in the range of 20–100 ms (Table 1). In the next experiment, we further tested the effects of marker durations within this range on the perceived time interval.

Experiment 2

We varied the duration of the markers in six steps within the range of 20–100 ms and measured the PSEs of the interval demarcated by their onsets.

Method

Participants

Nine listeners (6 females and 3 males) with normal hearing participated. They were 2 researchers and 7 students at Kyushu University, 22–30 years of age. None of the participants had participated in Experiment 1. Six had received basic training in music and training in technical listening for acoustic engineers (Iwamiya et al., 2003), 1 had played the piano informally for 14 years, and the other 2 had no experience of music lessons or technical listening training. Two of the participants had taken part in a pilot experiment.

Stimuli and apparatus

The duration of the markers was 20, 30, 40, 60, 80, or 100 ms, including a rise and a fall time in a raised cosine shape of 10 ms. Instead of having the participants adjust the presentation levels of the longer markers to have approximately equal loudness in isolation, we made the total energy of each marker constant (Scharf, 1978). This means that the amplitude of the markers decreased as the markers lengthened and that the markers were physically the same for all participants. The sound pressure level of the 20-ms markers was 83 dB, measured as the level of a continuous tone of the same amplitude. The IOI of the standard was 120, 240, or 360 ms. All possible combinations of the six marker durations were used for the standard. Thus, the total number of the standard stimulus patterns was 108 (6 first marker durations × 6 second marker durations × 3 standard IOIs).
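To give a sense of how much the amplitude must drop to keep the total energy constant, the sketch below estimates the amplitude of each marker relative to the 20-ms marker. It rests on assumptions that are not stated in the text: energy is approximated by the integral of the squared envelope (ignoring the fine structure of the 1000-Hz carrier), and the rise and fall times are 10 ms each. For a raised-cosine ramp of length r, the squared envelope integrates to (3/8)r. The function names are hypothetical.

```python
import math


def envelope_energy(duration_ms, ramp_ms):
    """Integral of the squared unit-peak envelope (in ms).

    Each raised-cosine ramp of length r contributes (3/8) * r;
    the steady-state part contributes its full length.
    """
    plateau_ms = duration_ms - 2.0 * ramp_ms
    return plateau_ms + 2.0 * 0.375 * ramp_ms


def relative_amplitude(duration_ms, ref_duration_ms=20.0, ramp_ms=10.0):
    """Peak amplitude, relative to the reference marker, giving equal energy."""
    return math.sqrt(envelope_energy(ref_duration_ms, ramp_ms)
                     / envelope_energy(duration_ms, ramp_ms))


for duration in (20, 30, 40, 60, 80, 100):
    amp = relative_amplitude(duration)
    print(duration, round(amp, 3), round(20.0 * math.log10(amp), 2))
```

Under these assumptions, the relative amplitude falls monotonically as the marker lengthens, which is the amplitude decrease examined separately in Experiment 3.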

The silent period before the standard interval was 2.0–2.5 s, and the silent period between the standard and the comparison was 3.0–3.5 s. The silent periods were randomized in a range of 0.5 s in order to prevent the participants from anticipating the beginning of the standard interval and the comparison interval. All the other characteristics of the stimuli were the same as in Experiment 1.

The stimulus patterns were generated digitally (16 bits; a sampling frequency of 44100 Hz), were controlled by a computer (Frontier KZFM71/N) with an audio card (E-MU 0404), and were presented diotically to the participant via a digital-to-analog converter (Fostex VC-8), an active low-pass filter (NF DV8FL, 15 kHz), an amplifier (Stax SRM-313), and headphones (Stax SR-303). The levels of the sounds were measured with a sound level meter (Node 2072 or 2075) and an artificial ear (Brüel & Kjær 4153).

Procedure

The procedure was the same as that in Experiment 1, except for the adjustment device and the total number of trials (including the way to divide these trials).

A presentation button, a sliding bar, and adjustment buttons were presented on a computer screen. The sliding bar was employed mainly for rough adjustments. The adjustment buttons were mainly for fine adjustments, and the longer the participants held the mouse-button down, the larger the amount of change in the comparison interval became. The minimum step of the adjustment was 1 ms. Participants could click the presentation button and listen to the stimulus pattern as many times as they wanted.

The total number of trials was 216 (108 standards × 2 series). These trials were randomized and divided into 12 blocks. Each block consisted of 20 trials (2 warm-up trials + 18 experimental trials). Twenty-seven practice trials were carried out at the beginning of the experiment. Each experimental block (20 trials) took about 24 min (SD = 10), on average. Each participant completed the whole experiment over a period of 5 days, on average (7 days at maximum and 4 days at minimum).

Results and discussion

For each of the 108 standard time intervals, PSEs were obtained from 9 participants, and the mean PSE was calculated for each standard. Table 2 shows the mean PSEs and the SEMs.

Table 2 Mean points of subjective equality (PSEs) and SEMs obtained in Experiment 2

A three-way (standard IOI × first-marker duration × second-marker duration) ANOVA was performed, using the values of constant errors (CE = PSE − standard) as the dependent variable. Using CEs, instead of PSEs, did not change the results concerning the effects of marker duration, while it allowed us to compare the magnitude of deviation from the standard in different conditions. The main effect of standard IOI bordered on significance, F(2, 16) = 3.726, p = .047, ηp² = .32 (it was not significant in the Greenhouse–Geisser and the Huynh–Feldt results, p = .082 and p = .078, respectively, which were consulted because sphericity could not be assumed for the main effect of standard IOI). Significant main effects were obtained for both the first- and the second-marker durations, F(5, 40) = 4.511, p < .05, ηp² = .36; F(5, 40) = 6.169, p < .01, ηp² = .44, respectively. The interaction between first-marker duration and standard IOI was significant, F(10, 80) = 2.025, p < .05, ηp² = .20, but the interaction between second-marker duration and IOI was not, F(10, 80) = 0.698, p = .723, ηp² = .08. We conducted a post hoc Tukey's test (at the .05 level of significance), to look more closely into the significant interaction between first-marker duration and standard IOI. As regards the effects of first-marker duration, when the standard IOI was 240 ms, 20 ms differed significantly from 80 and 100 ms and 40 ms from 100 ms; when the standard IOI was 360 ms, 20 ms differed significantly from 100 ms. The effects of first-marker duration were not significant when the standard IOI was 120 ms. The interaction analyses thus indicated that the effect of second-marker duration was similar for all three IOIs but that the effect of first-marker duration appeared only when the IOI was 240 or 360 ms. No other significant interaction effects were found.
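The constant error that served as the dependent variable is simply the signed deviation of the PSE from the standard IOI. The short sketch below uses invented PSE values to show why CEs are comparable across standard IOIs whereas raw PSEs are not; the function name is hypothetical.

```python
def constant_error(pse_ms, standard_ms):
    """Signed deviation of the PSE from the physical standard (ms)."""
    return pse_ms - standard_ms


# Hypothetical PSEs (ms) keyed by standard IOI (ms).
hypothetical_pses = {120: 131.0, 240: 252.0, 360: 369.0}
ces = {ioi: constant_error(p, ioi) for ioi, p in hypothetical_pses.items()}
print(ces)
```

A positive CE means that the standard interval was matched by a physically longer comparison interval.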

The PSEs tended to increase as the first marker lengthened, except when the standard interval was 120 ms. The PSEs increased more stably as the second marker lengthened, and this tendency was observed in all standard IOI conditions.

One remarkable result, although not exactly within our main focus, was that the PSEs were systematically longer than the standard intervals. This was the case even when both markers were 20 ms—that is, equal to those of the comparison interval. This bias probably reflects a time order error. It was positive for most conditions, as one might expect for very short time intervals (e.g., Eisler, Eisler, & Hellström, 2008; Woodrow, 1951).

The results of Experiments 1 and 2 showed that both the first marker duration and the second marker duration influence the perception of a time interval marked by these markers, but in different ways. Lengthening the first marker up to 180 ms tended to cause the interval to be perceived as longer, but only when IOI ≥ 240 ms. Lengthening the second marker up to 100 ms caused the interval to be perceived as longer in a stable manner for all IOIs, and this effect of the second marker was in line with the findings for longer offset–onset time intervals (Grondin et al., 1996; Woodrow, 1928).

Experiment 3

In the following two experiments, we checked the possibility that the distortion of time perception observed in Experiments 1 and 2 might have been caused by physical factors other than the marker durations themselves. Experiment 3 was designed to check the potential effects of amplitude difference, and Experiment 4 to check the potential effects of sound energy distribution in time.

The changes in marker duration in Experiments 1 and 2 were always accompanied by changes in amplitude; that is, the amplitude decreased as the duration increased. Previous research had shown that the perceptual onset of a tone had been affected by its rise time and maximum level (Gordon, 1987). In the present case, the perceived temporal positions of perceptual onsets might have been affected by the difference in amplitude accompanying the duration changes, resulting in different onset–onset intervals perceptually. It is true that amplitude difference alone cannot explain some parts of the present results; for example, in some cases, the PSEs increased as the markers lengthened even when the first and the second markers had the same duration (see Table 2). In these conditions, the amplitudes of the first and the second markers were always equal, which means that any effect of marker amplitude on the perceptual onset of the first marker should have been canceled by the same effect appearing for the second marker. Thus, the increase in the PSEs for these conditions cannot be explained in a simple manner by amplitude difference. Nevertheless, in Experiment 3, we looked into the potential effects of the amplitude differences that could have accompanied the duration change in our first two experiments.

We examined the influences of marker durations under two experimental conditions: a constant-energy condition with marker amplitude decreasing as duration increased, as in Experiment 2, and a constant-amplitude condition with equal marker amplitudes for all marker durations. If the amplitude difference had been the main reason for the difference in perceived duration of time intervals, the results obtained in Experiment 2 in the constant-energy condition could not be expected to appear to the same degree in the constant-amplitude condition. If the marker durations were really the key factor, however, the same effects should be observed regardless of the marker amplitudes.

Method

Participants

Twelve listeners (7 females and 5 males) with normal hearing participated. They were 1 researcher and 11 students at Kyushu University, and their ages ranged from 21 to 46 years. All of them had received basic training in music, and 9 had received training in technical listening for acoustic engineers (Iwamiya et al., 2003). Three had participated in Experiment 2. The two experiments were separated by a period of at least 5 months.

Stimuli and apparatus

The stimulus patterns were divided into two main conditions: the constant-energy condition and the constant-amplitude condition. The durations of the markers of the standard interval were 20, 40, 60, and 80 ms. In the constant-energy condition, the total energy of each marker was made equal, as in Experiment 2. In the constant-amplitude condition, the amplitude of each marker was made equal to the amplitude of the 20-ms marker in the constant-energy condition. Thus, the 20-ms markers were identical in both conditions. Their sound pressure level was 82 dB, measured as the level of a continuous tone of the same amplitude. The other aspects of the stimuli were the same as in Experiment 2.

For stimulus presentation, the cutoff frequency of the active low-pass filter (NF DV8FL) was 16 kHz in this experiment, and the amplifier was a STAX SRM-323A. The other aspects of the stimuli and apparatus were the same as in Experiment 2.

Summarizing, there were 48 standards (4 first-marker durations × 4 second-marker durations × 3 standard IOIs) both for the constant-energy condition and for the constant-amplitude condition.

Procedure

The procedure was the same as that in Experiment 2, except for the total number of trials and the way to divide these trials. The total number of trials was 192: 96 in each of the constant-energy condition and the constant-amplitude condition (48 standards × 2 series), divided into blocks of 18 trials (2 warm-up trials and 16 experimental). Half the participants first performed the constant-energy blocks, followed by the constant-amplitude blocks. The order was reversed for the other participants. Before beginning each series of blocks (for each condition—i.e., constant-energy condition and constant-amplitude condition), 12 practice trials were carried out in a separate block.

Each experimental block (18 trials) took about 22 min (SD = 8), on average. Each participant completed the whole experiment over a period of 4 days, on average (6 days at maximum and 3 days at minimum).

Results and discussion

The data from 1 male participant were excluded from analyses because two of his PSEs were too different from the corresponding PSEs obtained from the other participants (Footnote 2). Thus, for each of the 96 standard time intervals, PSEs were obtained from 11 participants, and the mean PSE was calculated for each standard. Table 3 shows the mean PSEs and the SEMs for the constant-energy condition, and Table 4 the mean PSEs and the SEMs for the constant-amplitude condition.

Table 3 Mean points of subjective equality (PSEs) and SEMs obtained in the constant-energy condition in Experiment 3
Table 4 Mean points of subjective equality (PSEs) and SEMs obtained in the constant-amplitude condition in Experiment 3

A four-way (amplitude condition × standard IOI × first-marker duration × second-marker duration) ANOVA was performed using the CE values. The main effect of amplitude condition was not significant, F(1, 10) = 3.224, p = .103, ηp² = .24, nor was the main effect of first-marker duration, F(3, 30) = 0.418, p = .741, ηp² = .04. The main effects of standard IOI and second-marker duration, however, were significant, F(2, 20) = 7.169, p < .01, ηp² = .42; F(3, 30) = 34.511, p < .001, ηp² = .78, respectively. The interactions between standard IOI and first-marker duration, between amplitude condition and second-marker duration, between first-marker duration and second-marker duration, and between amplitude condition, standard IOI, and second-marker duration were significant, F(6, 60) = 2.794, p < .05, ηp² = .22; F(3, 30) = 5.015, p < .01, ηp² = .33; F(9, 90) = 3.295, p < .01, ηp² = .25; F(6, 60) = 2.697, p < .05, ηp² = .21, respectively.

No systematic effect of the first marker appeared, but the PSEs increased as the second marker lengthened. This tendency was observed for all standard IOIs and was basically the same in the constant-energy condition and the constant-amplitude condition. In fact, the amount of the PSE change was larger in the constant-amplitude condition (the new condition).

Thus, the marker durations affected the perception of the time intervals in the same way regardless of the amplitude change. If the difference in marker amplitude had been the main factor causing the difference in the perception of time intervals in Experiment 2 (constant-energy condition), keeping the amplitude of all markers constant would have kept the PSEs from changing systematically: In the present experiment, no comparable increase of PSEs accompanying the lengthening of the markers should have been observed in the constant-amplitude condition. This was not the case. On the contrary, the increase in the PSEs seemed to be slightly but systematically larger in the constant-amplitude condition than in the constant-energy condition.

Experiment 4

In Experiment 4, we checked the effects of the temporal sound energy distribution within each marker on the perceived duration of a time interval. Changes in marker duration necessarily cause changes in sound energy distribution in time; that is, the temporal distribution of sound energy as a whole shifts away from the onset as the marker lengthens. We examined whether fixing the marker duration and changing only the temporal distribution of sound energy would influence the perceived duration of a time interval.

We changed the sound energy distribution within markers by varying the rise and fall times and measured the PSEs of time intervals of 120, 240, and 360 ms.

Method

Participants

Twelve listeners (3 females and 9 males) with normal hearing participated. They were 1 researcher and 11 students at Kyushu University, 22–30 years of age. Seven had received basic training in music and training in technical listening for acoustic engineers (Iwamiya et al., 2003). Of the remaining 5, 3 had played the piano informally for 8–14 years, 1 was an amateur musician who had been playing percussion for 8 years, and 1 had no experience of music lessons or technical listening training. Two had participated in Experiment 2, and 5 had participated in Experiment 3. Experiment 2 and Experiment 4 were separated by a period of at least 6 months, and Experiment 3 and Experiment 4 by at least 1 week. (Experiment 3 and Experiment 4 were conducted with some overlap in the experimental periods. Two participants took part in Experiment 4 before Experiment 3, and the other 3 participated in Experiment 3 before Experiment 4.)

Stimuli and apparatus

All markers were pure tone bursts of 1000 Hz. The duration of the markers was 20, 60, or 100 ms, including a rise and a fall time. The rise time and the fall time of the 20-ms markers were both 10 ms. The rise and fall times of the 60- and 100-ms markers were varied in three steps so that the amplitude peak in time (the densest point of sound energy) was changed systematically. For the 60-ms markers, the rise and fall times were 15 and 45 ms (the amplitude peak was in the “beginning”), 30 and 30 ms (“middle”), or 45 and 15 ms (“end”); those for the 100-ms markers were 25 and 75 ms (“beginning”), 50 and 50 ms (“middle”), or 75 and 25 ms (“end”). To make sure that all markers were audible throughout their physical duration, a steady-state tone in the same phase with 10-ms rise and fall times was added as a “base” to the 60- and 100-ms markers. This additional base, if presented in isolation, was 20 dB weaker than the main tone with varied rise and fall times (the amplitude of the “base” tone was one tenth of the amplitude of the main tone with varied rise and fall times and was about 9% of the amplitude of the presented tone). We made the total energy of each marker constant, as in Experiment 2. The sound pressure level of the 20-ms marker was 85 dB, measured as the level of a continuous tone of the same amplitude.
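The relation between the 20-dB level difference, the one-tenth amplitude ratio, and the figure of about 9% can be checked with a small envelope model. This is only a sketch under stated assumptions: the ramps are taken to be raised-cosine in shape (the text does not specify the ramp shape for this experiment), the envelopes of the main tone and the base are simply added because the two tones are in the same phase, and the function names are hypothetical.

```python
import math


def ramp_envelope(t_ms, duration_ms, rise_ms, fall_ms):
    """Unit-peak envelope with raised-cosine rise and fall (assumed shape)."""
    if t_ms < 0.0 or t_ms > duration_ms:
        return 0.0
    if t_ms < rise_ms:
        return 0.5 * (1.0 - math.cos(math.pi * t_ms / rise_ms))
    if t_ms > duration_ms - fall_ms:
        return 0.5 * (1.0 - math.cos(math.pi * (duration_ms - t_ms) / fall_ms))
    return 1.0


# A tone 20 dB weaker has one tenth of the amplitude.
BASE_RATIO = 10.0 ** (-20.0 / 20.0)


def marker_envelope(t_ms, duration_ms, rise_ms, fall_ms):
    """Main tone with varied ramps plus the steady-state base (10-ms ramps)."""
    main = ramp_envelope(t_ms, duration_ms, rise_ms, fall_ms)
    base = BASE_RATIO * ramp_envelope(t_ms, duration_ms, 10.0, 10.0)
    return main + base


# 60-ms marker with the amplitude peak in the "beginning" (15-ms rise, 45-ms fall).
peak = marker_envelope(15.0, 60.0, 15.0, 45.0)
base_share = BASE_RATIO / peak
print(round(peak, 3), round(base_share, 3))
```

At the amplitude peak the base contributes 0.1 of a total of 1.1, that is, roughly 9% of the presented amplitude.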

In the experimental condition, one of the markers of the standard interval was either 60 or 100 ms, with systematically changed rise and fall times, whereas the other marker was 20 ms. The standard duration was 120, 240, or 360 ms. Thus, the number of stimulus patterns in the experimental condition was 36 (2 positions of the longer marker × 2 durations of the longer marker × 3 combinations of rise and fall times × 3 standard IOIs). In the control condition, both markers of the standard interval were 20 ms, and the standard duration itself was 120, 240, or 360 ms. We also utilized all possible combinations of the 60-ms marker and the 100-ms marker with the amplitude peak in the “middle.” These additional stimulus patterns were employed to examine whether the marker duration itself had a similar effect as in Experiments 1–3 even when the rise times of the markers were different. The number of stimulus patterns for this additional condition was 12 (2 first-marker durations × 2 second-marker durations × 3 standard IOIs). Thus, the total number of the standard stimulus patterns was 51 (36 experimental + 3 control + 12 combinations of 60- and 100-ms markers). The other aspects of the stimuli were the same as in Experiment 2, and the apparatus was the same as in Experiment 3.
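The pattern counts given above can be verified by enumerating the factor combinations. The following is a minimal sketch; the tuple layouts and variable names are illustrative and are not taken from the original stimulus-generation procedure.

```python
from itertools import product

standard_iois_ms = (120, 240, 360)
long_marker_positions = ("first", "second")
long_marker_durations_ms = (60, 100)
peak_positions = ("beginning", "middle", "end")

# Experimental condition: one long marker with a varied envelope, one 20-ms marker.
experimental = list(product(long_marker_positions, long_marker_durations_ms,
                            peak_positions, standard_iois_ms))

# Control condition: both markers 20 ms.
control = [(20, 20, ioi) for ioi in standard_iois_ms]

# Additional condition: 60- and 100-ms markers ("middle" peak) in both positions.
additional = list(product(long_marker_durations_ms, long_marker_durations_ms,
                          standard_iois_ms))

total = len(experimental) + len(control) + len(additional)
print(len(experimental), len(control), len(additional), total)
```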

Procedure

The overall procedure was the same as that in Experiment 2. The instructions were basically the same as in Experiments 1–3, but participants were not explicitly told to judge the interval from onset to onset. This modification in the instructions was made because it might have been unnatural and difficult for the participants to focus strictly on onsets when the rise times of sounds were lengthened; the temporal positions of the “perceptual onset” and the “perceptual attack time” (see Footnote 3), which should be important in rhythm perception, were said to differ for sounds with long rise times (Gordon, 1987). We decided not to emphasize to the participants that they should focus strictly on onsets, because in the present study and, particularly, in this experiment, we were more interested in relating the results to rhythm perception in natural situations.

The total number of trials was 102 (51 standards × 2 series). These trials were randomized and divided into 13 blocks. Each block consisted of 8 or 10 trials (2 warm-up trials + 6 or 8 experimental trials). Fifty-one practice trials in separate blocks were carried out utilizing all 51 standard stimuli at the beginning of the experiment.
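The text does not state how many blocks contained 6 and how many contained 8 experimental trials. As a minimal sketch, the split implied by the reported totals can be found by simple enumeration; the variable names are illustrative.

```python
n_standards = 51
n_series = 2
n_blocks = 13

total_trials = n_standards * n_series

# Find every split of the 13 blocks into blocks with 8 and blocks with 6
# experimental trials that adds up to the total number of trials.
splits = [
    (n_eight, n_blocks - n_eight)
    for n_eight in range(n_blocks + 1)
    if 8 * n_eight + 6 * (n_blocks - n_eight) == total_trials
]

print(total_trials, splits)
```

Only one split satisfies both totals: 12 blocks with 8 experimental trials and 1 block with 6.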

Each experimental block (15 trials) took about 13 min (SD = 7), on average. Each participant completed the whole experiment over a period of 4 days, on average (7 days at maximum and 2 days at minimum).

Results and discussion

For each of the 51 standard time intervals, PSEs were obtained from 12 participants. Table 5 shows the mean PSEs and SEMs for the control condition and the experimental condition with different amplitude peak positions.

Table 5 Mean points of subjective equality (PSEs) and SEMs obtained in Experiment 4

We conducted a three-way (standard IOI × first-/second-marker duration × amplitude-peak position) ANOVA utilizing the CE values, separately for the effects of the first marker and the second marker. As for the first marker, a significant main effect of first-marker duration was obtained, F(1, 11) = 7.735, p < .05, ηp² = .41. The main effect of amplitude peak position of the first marker was not significant, F(2, 22) = 0.397, p = .677, ηp² = .03. As for the second marker, significant main effects of the standard IOI and second-marker duration were obtained, F(2, 22) = 4.044, p < .05, ηp² = .27, and F(1, 11) = 12.894, p < .01, ηp² = .54, respectively. The main effect of the amplitude peak position of the second marker was not significant, F(2, 22) = 2.516, p = .104, ηp² = .19. No other significant effects appeared (p > .05). The results showed that changing only the sound energy distribution did not affect the perceived length of the time intervals systematically, for both the first marker and the second marker.
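The reported partial eta squared values can be cross-checked against the F ratios and their degrees of freedom through the standard identity: partial eta squared = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that check; the function name and the labels are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared), from the text.
reported_effects = [
    ("first marker: duration", 7.735, 1, 11, 0.41),
    ("first marker: peak position", 0.397, 2, 22, 0.03),
    ("second marker: standard IOI", 4.044, 2, 22, 0.27),
    ("second marker: duration", 12.894, 1, 11, 0.54),
    ("second marker: peak position", 2.516, 2, 22, 0.19),
]

for label, f_value, df_effect, df_error, reported in reported_effects:
    computed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f}")
```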

We calculated the physical positions of the gravity center of sound energy for the 60- and 100-ms markers used in this experiment. (This was proposed by many colleagues when we presented the data at conferences.) These gravity centers may influence the perception of onset–onset intervals (Howell, 1988). The calculated positions of the gravity centers were at 23 ms (for the 60-ms marker with the amplitude peak at the “beginning”), 30 ms (60 ms, “middle”), 37 ms (60 ms, “end”), 38 ms (100 ms, “beginning”), 50 ms (100 ms, “middle”), and 62 ms (100 ms, “end”), each from the beginning of the markers. If the gravity center position was the main factor affecting the PSEs, it could be predicted that moving the amplitude peak of the first marker from “beginning” to “end” should decrease the PSEs by 14 ms for the 60-ms markers and by 24 ms for the 100-ms markers. Moving the peak of the second marker should increase the PSEs by 14 or 24 ms. However, the effects of peak position in our results were smaller: The largest difference in the PSEs caused by moving the peak position of the 60-ms marker was 6 ms (240-ms standard, 60-ms second marker), and the largest difference caused by moving the peak position of the 100-ms marker was 15 ms (360-ms standard, 100-ms second marker). It is also to be noted that moving the peak of the first marker from “beginning” to “end,” which was expected to cause a decrease in the PSEs, caused an increase in PSEs when the first marker was 100 ms and the standard duration was 240 ms or longer. Taken together, our results could not be explained simply by the shift in gravity centers. This analysis also confirmed that the onset–onset duration still gave a reasonable first approximation of the results despite the fact that we did not mention anything related to the marker onsets in the instructions of the present experiment.
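A gravity center of this kind can be illustrated as the energy-weighted mean time of the amplitude envelope. The sketch below is not the original calculation: it assumes linear amplitude ramps (the ramp shape is not stated in the text), takes energy as squared amplitude, and adds the base tone in phase at one tenth of the main-tone amplitude. The function names, the integration step, and the dictionary layout are illustrative. Under these assumptions, the computed positions fall within about 1 ms of the reported ones.

```python
def linear_envelope(t_ms, duration_ms, rise_ms, fall_ms):
    """Amplitude envelope with linear rise and fall (assumed ramp shape)."""
    if t_ms < rise_ms:
        return t_ms / rise_ms
    if t_ms > duration_ms - fall_ms:
        return (duration_ms - t_ms) / fall_ms
    return 1.0


def energy_centroid_ms(duration_ms, rise_ms, fall_ms, base_gain=0.1, step_ms=0.01):
    """Energy-weighted mean time of the main tone plus the in-phase base tone."""
    n_steps = round(duration_ms / step_ms)
    energy = 0.0
    moment = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * step_ms  # midpoint rule
        amplitude = (linear_envelope(t, duration_ms, rise_ms, fall_ms)
                     + base_gain * linear_envelope(t, duration_ms, 10.0, 10.0))
        energy += amplitude ** 2
        moment += t * amplitude ** 2
    return moment / energy


# (duration, peak position) -> (rise, fall, reported gravity center), all in ms.
markers = {
    (60, "beginning"): (15, 45, 23),
    (60, "middle"): (30, 30, 30),
    (60, "end"): (45, 15, 37),
    (100, "beginning"): (25, 75, 38),
    (100, "middle"): (50, 50, 50),
    (100, "end"): (75, 25, 62),
}

centroids = {
    key: energy_centroid_ms(key[0], rise, fall)
    for key, (rise, fall, _) in markers.items()
}

for key, value in centroids.items():
    print(key, f"computed {value:.1f} ms, reported {markers[key][2]} ms")
```

The remaining differences suggest that the original calculation may have used a somewhat different envelope shape, which this sketch does not attempt to recover.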

Although changing only the sound energy distribution did not affect the subjective duration of the intervals in the same way as changing the marker durations did in Experiments 1–3, the PSEs of intervals marked by second markers of 60 or 100 ms (with varied rise and fall times) were longer than those marked by 20-ms markers in the control condition. Lengthening the duration of the second marker lengthened the subjective duration of the interval (Table 5), just as in the previous experiments.

This tendency appeared clearly in the additional conditions and the control conditions combined together, which could be considered a new experiment; the effects of marker durations were to be examined employing sound markers in which sound energy was concentrated in the temporal middle. Table 6 shows the results (all possible combinations of the 20-, 60-, and 100-ms markers with the amplitude peak in the “middle”). We conducted a three-way (standard IOI × first-marker duration × second-marker duration) ANOVA utilizing the CE values. The results showed significant main effects of standard IOI, F(2, 22) = 3.754, p < .05, ηp² = .25, and of second-marker duration, F(2, 22) = 10.478, p < .01, ηp² = .49. The main effect of first-marker duration and all the interactions were not significant (p > .05). The results again indicated that increasing the duration of the second marker lengthened the subjective duration of the time intervals, even though their amplitude envelopes were considerably different from those used in Experiments 1–3; the effect of second-marker duration turned out to be robust.

Table 6 Mean points of subjective equality (PSEs) and SEMs obtained in the additional conditions in Experiment 4

Together, the results of Experiment 4 indicated that changes in sound energy distribution alone cannot systematically influence the perceived duration of a time interval. This consequently denies the possibility that the effects of marker durations observed in Experiments 1 and 2 were caused mainly by the changes in temporal sound energy distribution within markers.

General discussion

The main purpose of the present study was to examine whether, and if so how, the perceived duration of a time interval demarcated by two successive sound onsets would be influenced by the durations of the sound markers. The present paradigm is strongly related to more realistic perception, as in speech or music. The present results are summarized as follows. In Experiments 1 and 2, we examined the effects of marker durations directly and found that lengthening the second marker, whose onset marked the end of the time interval, caused the interval to be perceived as longer. This effect was stable and especially clear within the marker duration range of 20–100 ms. Lengthening the first marker tended to make the subjective duration of the interval longer when the IOI was 240 ms or over, but this tendency was limited and unstable. In Experiment 3, we examined the effects of decrease in marker amplitude, and in Experiment 4, we examined the effects of changes in the temporal distribution of sound energy. These factors could have affected the results of Experiments 1 and 2, but, in reality, it turned out that they had little influence in the present paradigm. The change of the marker durations proved to be the dominant factor for these effects.

The effect of the second-marker duration was in line with the results of Woodrow (1928) and Grondin et al. (1996), who showed that longer second markers cause an increase in subjective duration. Their studies were different from ours, since they focused on offset–onset intervals. However, in their experiments also, the onset of the second marker was assumed to be the termination of the interval to be judged, as in the present experiments. All these results, taken together, indicate that the effect of the second marker is quite stable. It is also to be noted that our standard intervals (onset–onset intervals), 120–360 ms, were shorter than the 500-ms standards (offset–onset intervals) utilized by Woodrow (1928) or the 250- to 750-ms standards utilized by Grondin et al. (1996). Yet a similar effect of the second-marker duration was observed; the effect of marker duration is now more likely to be related to the perception of note (and rest) duration in music or syllable duration in speech.

A question regarding the nature of the time intervals in the present study may arise: Were the participants really able to time from the onset of the first marker to the onset of the second marker? Tse and Penney (2006) reported that preattentive timing of empty intervals occurred from the offset of a marker to the onset of the subsequent marker. Given this, it seems possible that participants listened to the duration from the offset of the first marker to the onset of the second marker, despite the instruction to judge from onset to onset. Another possibility regarding the nature of time intervals is that participants unintentionally processed the whole duration from the onset of the first marker until the offset of the second marker. Neither possibility, however, seems plausible given the PSE values in the present experiments: If participants timed the intervals from the offset of the first marker to the onset of the second marker, lengthening the first marker, for example, from 20 to 100 ms should decrease the PSEs typically by 80 ms (because the offset of the first marker of 100 ms will be 80 ms closer to the onset of the second marker, as compared with the offset of the 20-ms first marker). If participants timed the intervals from the onset of the first marker to the offset of the second marker, lengthening the second marker from 20 to 100 ms should increase the PSEs typically by 80 ms (the offset of the 100-ms second marker will be 80 ms further from the onset of the first marker, as compared with the offset of the 20-ms second marker). However, in Experiments 1–3, lengthening the first marker did not decrease the PSEs, except for a case in Experiment 1. The change in PSEs caused by lengthening a marker was much smaller than what would have been expected if participants judged from offset to onset or from onset to offset: The change in PSE values was mostly up to about 30 ms (see Tables 1–4). Onset-to-onset timing remains most plausible from the PSEs in the present experiments.
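The predictions of the three candidate timing rules can be laid out side by side. The following is a minimal sketch under the simplifying assumption that the PSE tracks the physical interval defined by each rule; the function name, the rule labels, and the 240-ms IOI used for illustration are not from the original analysis.

```python
def timed_interval_ms(ioi_ms, first_ms, second_ms, rule):
    """Physical interval that would be timed under a given rule."""
    if rule == "onset-onset":
        return ioi_ms
    if rule == "offset-onset":
        return ioi_ms - first_ms
    if rule == "onset-offset":
        return ioi_ms + second_ms
    raise ValueError(f"unknown rule: {rule}")


ioi_ms = 240                 # illustrative standard IOI
short_ms, long_ms = 20, 100  # marker durations compared in the text

predicted_change_ms = {}
for rule in ("onset-onset", "offset-onset", "onset-offset"):
    baseline = timed_interval_ms(ioi_ms, short_ms, short_ms, rule)
    longer_first = timed_interval_ms(ioi_ms, long_ms, short_ms, rule)
    longer_second = timed_interval_ms(ioi_ms, short_ms, long_ms, rule)
    predicted_change_ms[rule] = (longer_first - baseline, longer_second - baseline)

for rule, (first_change, second_change) in predicted_change_ms.items():
    print(f"{rule}: first marker {first_change:+d} ms, second marker {second_change:+d} ms")
```

The 80-ms changes predicted by the offset-to-onset and onset-to-offset rules are far larger than the observed changes of mostly up to about 30 ms.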

What kind of underlying process could have caused these effects? One possible explanation with regard to the effects of the first marker is that an internal clock or pacemaker runs faster when there is sound than when there is only silence. An internal clock or pacemaker component has been proposed in traditional models of interval timing (Gibbon, Church, & Meck, 1984; Grondin, 2001; Treisman, 1963; for a recent review, see Grondin, 2010). Some studies have demonstrated that the pacemaker speed is higher when the interval to be timed is “filled up” with a sound than when it is empty (e.g., Penney, Gibbon, & Meck, 2000; Wearden, Norton, Martin, & Montford-Bebb, 2007). According to this hypothesis, the subjective duration of an interval filled with a sound should be longer than the subjective duration of the same interval made silent. For partially empty IOIs, such as the ones used in the present study, lengthening the first marker should mean that the filled portion increases and that the total subjective duration increases. This hypothesis can explain the increase in perceived durations of intervals longer than 240 ms caused by lengthening the first marker in Experiment 2, but then it is not easy to explain why the same tendency did not appear for 120-ms intervals. Further investigation is needed on this point.

There is one explanation with regard to the effects of the second-marker duration only. It is based on the processing-time hypothesis proposed by Nakajima (1987; see also Nakajima et al., 2004). This hypothesis assumes that the perceptual processing of a time interval does not end immediately after the detection of the second-marker onset, but about 80 ms later, and that the time needed for this “additional processing” is included in the perceived duration of this time interval. By extending this hypothesis, we may assume that the additional processing of the interval is slightly delayed by the continuation of the sound when the second marker is lengthened. The second marker lasting fully or partially in the additional processing period may interfere with the process, leading to an increase in the time needed for this additional processing, thus lengthening the perceived intervals. Since second markers with higher amplitude can interfere more with the additional processing, the larger effects of the second-marker duration for the constant-amplitude condition in Experiment 3 can thus be explained. This explanation is consistent with Woodrow’s (1928) and Grondin et al.’s (1996) results. Grondin et al. (1996) demonstrated that lengthening the second-marker duration causes the perceived interval to lengthen also when the second marker is a visual signal. The additional-processing explanation may be extended to the visual modality.

Finally, one explanation applies to the effects of both the first and the second markers. It is that the perceptual detection or positioning of a marker onset may have been delayed when the marker was lengthened. For example, the onset of a 100-ms marker may have been positioned perceptually later than that of a 20-ms marker. This hypothesis is in line with the studies on P-centers. A P-center is defined as the moment of perceptual occurrence of a sound (Morton, Marcus, & Frankish, 1976). Some studies have indicated that the temporal location of P-centers is affected by the rise time of sounds; that is, the P-center of a sound occurs later for sounds with longer rise times or gradual increase in intensity (e.g., Gordon, 1987; Howell, 1988; Scott, 1998; Terhardt & Schütte, 1976; J. Vos & Rasch, 1981). Although there have not been many reports on effects of marker duration, some studies have suggested that lengthening a sound can cause the P-center to be located later (Howell, 1988; Marcus, 1981; Scott, 1998; Terhardt & Schütte, 1976; Vos, Mates, & van Kruysbergen, 1995).

From these studies, it can be predicted that lengthening the first marker or the rise time of the first marker should shorten the perceived duration of the interval, while lengthening the second marker or the rise time of the second marker should lengthen the perceived duration of the interval. Since the increase in the intensity of a sound and the corresponding sensory input was steeper for sounds with higher intensity in our Experiment 3, the P-center of a longer tone should have occurred earlier in the constant-amplitude condition, in which the amplitudes of longer tones were larger, than in the constant-energy condition, leading to a smaller effect of marker duration in the constant-amplitude condition. Our results did not support these predictions. Lengthening the first marker indeed caused the perceived duration of the interval to shorten, but only when the interval was as short as 120 ms, and only in limited cases. In many cases (typically, in Experiment 2), lengthening the first marker caused the perceived duration of the interval to lengthen. The effect of the second-marker duration on the lengthening of the perceived time interval was larger in the constant-amplitude condition, in which the higher amplitude for longer markers should have reduced the effect of the marker duration according to the P-center hypothesis. Furthermore, varying the rise times in Experiment 4 did not clearly show the expected effects either for the first or for the second marker. The hypothesis of delayed onset, or delayed P-center, may explain some aspects of the present data, but it is unsuitable for explaining the general tendencies we obtained.

The three explanations proposed above are not mutually exclusive; two or more explanations can be utilized together. For example, it is possible that the “filling” effect and the “P-center” effect occurred together and that the balance of these two effects determined the real influences of the first-marker duration.

An alternative explanation for the dissimilarity in the effects of the first-marker duration between the 120-ms intervals and longer intervals, with regard to the effects of both the first and the second markers, includes the notion of a temporal window of sound processing. The estimated length of the temporal window for synthesizing auditory events is about 160–170 ms (Yabe et al., 1998) or 200 ms (Czigler, Winkler, Sussmann, Yabe, & Horvath, 2003; see also Nakajima, Shimojo, & Sugita, 1980). When the interval was 120 ms, both the first- and the second-marker onsets would fit within this temporal window, which means that both markers were probably processed together as one perceptual unit. When the interval was 240 ms or longer, the first- and the second-marker onsets did not fit in the same temporal window, which means that the two markers may each have been processed separately in different units. This difference in the processing of markers may have caused different effects of the first marker for different interval lengths. For example, it may have been easier for the listeners to utilize the information of the (physical) onset of the first marker when the two markers were apart enough to be processed separately, but this may have been more difficult when the markers were too close together; it is possible that other properties of the first marker, such as the sound energy distribution in time, had more influence, as was indeed observed in the data of Experiment 4, on the perception of the time interval when the two markers were close enough to be processed together as one perceptual unit.

Effects of attention should not be ignored. Detection of markers can be faster with more attention (Ivry & Schlerf, 2008). The presence and continuation of the first marker may have drawn the listener’s attention, and when the first marker was lengthened and the silent gap following it was very short, this attention may have caused the second marker to be detected earlier than when the first marker was short. This can also explain the difference in the effects of the first marker for different interval durations; the gap was only 20 ms when the interval was 120 ms and the first marker was 100 ms. The gap may have been short enough to keep the attention very active until the onset of the second marker, causing the second marker to be detected earlier. However, when the gap was longer than 100 ms, as with an interval of 240 or 360 ms, it may have been too long to keep the attention at the maximum level. This explanation, however, does not apply to the results of Experiment 1, where there was a condition for a 360-ms interval in which the gap was as short as 20 ms, but without much decrease in the PSEs. The 340-ms marker, in this case, may have been too long to effectively keep the attention active.
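The silent gaps referred to in this argument follow directly from the IOI and the first-marker duration. The following is a minimal sketch of that arithmetic; the function name and the list of cases are illustrative, with the last case being the Experiment 1 condition mentioned above.

```python
def silent_gap_ms(ioi_ms, first_marker_ms):
    """Silence between the offset of the first marker and the onset of the second."""
    return ioi_ms - first_marker_ms


# (IOI, first-marker duration) in ms, as discussed in the text.
cases = [(120, 100), (240, 100), (360, 100), (360, 340)]

gaps = {case: silent_gap_ms(*case) for case in cases}

for (ioi_ms, first_ms), gap in gaps.items():
    print(f"IOI {ioi_ms} ms, first marker {first_ms} ms: gap {gap} ms")
```

The 120-ms interval with a 100-ms first marker and the 360-ms interval with a 340-ms first marker share the same 20-ms gap, which is why the attention account needs the extra assumption about very long markers.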

The present study showed that sound marker durations influence the perceived duration of an onset–onset time interval. This is the first case in which such marker duration effects have been investigated with very short onset–onset intervals, which should be one necessary step in relating the findings in the field of time perception to those for rhythm perception as in speech and music (e.g., Large, 2008; Patel, 2008). Although sound onsets have been considered to be the most important cues for perceiving time intervals (e.g., McAdams & Drake, 2002), sound durations also had substantial effects.

The importance of sound duration, such as vowel duration in speech and note duration in music, for rhythm has been pointed out in previous studies. For example, the perception of a closure (gap) in a disyllable word can be influenced by the duration of a preceding vowel, which can change the meaning of a Japanese word (Amano & Hirata, 2010). In music, it has been reported that changing note durations without changing the onset–onset timing in a dotted rhythm can influence the perceived “dottedness” of the rhythm (Schubert & Fabian, 2001). Findings such as those in the present study should be accumulated in order to promote understanding of the relationship between sound duration and rhythm perception. Our next step will be to relate simple sound patterns to the perception of rhythm in everyday life, which should involve longer sounds and more complexly structured patterns (Grondin, Bisson, & Gagnon, 2011; Hasuo et al., 2011; Schubert & Fabian, 2001).