In typical psychophysical tasks such as the two-alternative forced-choice (2AFC) task, participants are asked to discriminate between a fixed-magnitude standard stimulus s and a variable comparison stimulus c, whose magnitude can be lower than, equal to, or higher than the magnitude of the standard. In the temporal 2AFC task, these two stimuli are presented successively in one of two temporal orders, 〈s c〉 or 〈c s〉, to balance potential effects of stimulus order on task performance. Researchers often disregard such order effects by aggregating data across stimulus orders. However, when the data are analyzed separately for the two stimulus orders, order effects are commonly observed. For example, the order-conditional psychometric functions observed in a typical 2AFC task might be shifted horizontally from the point of objective equality (i.e., a Type A order effect or time-order error), such that the magnitude of the first stimulus is judged to be either higher or lower than the magnitude of the second one. Theoretically, such a shift might reflect a perceptual, decisional, or response bias, and it has been studied extensively (cf. Eisler, Eisler, & Hellström, 2008). More important for the purpose of the present study, however, is the so-called Type B order effect (Ulrich, 2010; Ulrich & Vorberg, 2009). This effect refers to the phenomenon that the spread of the order-conditional psychometric functions may differ between the two stimulus orders. Specifically, the difference limen (DL) is typically larger, and thus discrimination sensitivity lower, for stimulus order 〈c s〉 than for 〈s c〉 (e.g., Lapid, Ulrich, & Rammsayer, 2008; Nachmias, 2006; Stott, 1935; Ulrich, 2010; Woodrow, 1935). This specific result pattern has been termed a negative Type B effect (Dyjas & Ulrich, 2014).

A negative Type B effect has been observed with both random and blocked stimulus orders (Dyjas, Bausenhart, & Ulrich, 2012; Nachmias, 2006) and in different modalities, including vision (Nachmias, 2006; for converging evidence, see also Patching, Englund, & Hellström, 2012), audition and vision (Grondin & McAuley, 2009; Lapid et al., 2008; Ulrich, Nitschke, & Rammsayer, 2006), and haptics (Ross & Gregory, 1964). Negative Type B effects emerge in different task domains (for an overview, see Dyjas et al., 2012, Table 1), including the discrimination of weights (Ross & Gregory, 1964), durations (e.g., Woodrow, 1935), and visual shapes (Nachmias, 2006), although their absence has occasionally been reported (e.g., García-Pérez & Peli, 2014, for a line bisection task). Finally, the Type B effect seems to be independent of judgment mode, since it has been demonstrated not only for comparative judgments but also for equality judgments (e.g., Dyjas & Ulrich, 2014) and in reproduction tasks (e.g., Bausenhart, Dyjas, & Ulrich, 2014).

Table 1. Means and their standard errors (in italics) of the estimated model parameters of the Internal Reference Model (IRM) and of the basic Sensation Weighting Model (SWM). Parameters were fitted to the individual observed psychometric functions and then averaged across participants.

As was shown by Dyjas et al. (2012), the standard psychophysical difference model (Green & Swets, 1966; Macmillan & Creelman, 2005; Noreen, 1981; Sorkin, 1962; Thurstone, 1927a, b; Wickens, 2002), which assumes that participants base their judgments on the difference of the two stimuli’s internal representations, cannot account for the Type B effect (but see, e.g., García-Pérez, 2014, for an extension of this model).Footnote 1

Instead, Dyjas et al. (2012) suggested that participants rely on an internal reference (see also, e.g., Michels & Helson, 1954) that incorporates both previous and currently available stimulus information and is thus dynamically updated from trial to trial. According to this Internal Reference Model (IRM), the Type B effect emerges as a consequence of this dynamic updating process (Dyjas et al., 2012; Lapid et al., 2008). Specifically, IRM predicts that DL for stimulus order 〈c s〉 should always be either larger than or equal to DL for stimulus order 〈s c〉, depending on the weights assigned to previous and currently available stimulus information in the integration process. In other words, according to IRM, the negative Type B effect can be either present or absent, but it cannot reverse. Because this general mechanism should apply across task domains, stimulus modalities, and types of judgment, IRM is consistent with the robust negative Type B effects reported in the studies cited above.
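The ordinal prediction "present or absent, but never reversed" can be made concrete with a closed-form calculation. The sketch below rests on simplifying assumptions of our own, which need not match the formulation of Dyjas et al. (2012) in every detail: the reference is updated as \(I_n = g I_{n-1} + (1-g) X_{n,1}\), where \(X_{n,1}\) is the noisy representation of the first stimulus and \(g\) (a hypothetical parameter name here) is the weight of information from previous trials; the second representation \(X_{n,2}\) is compared with \(I_n\); sensory noise is Gaussian with SD `sigma`; the carried-over reference \(I_{n-1}\) is independent of the current stimuli with SD `sigma_ref`; and DL is half the interquartile range of the function relating P(c judged longer) to c. The function name `irm_dls` is likewise hypothetical.

```python
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal (about 0.674)

def irm_dls(g, sigma, sigma_ref):
    """Order-conditional DLs under a simplified Gaussian IRM sketch.

    g         -- weight of the carried-over reference, 0 <= g < 1 (assumed name)
    sigma     -- SD of the sensory noise on each stimulus representation
    sigma_ref -- SD of the reference carried over from previous trials
    Returns (DL for <s c>, DL for <c s>).
    """
    if not 0.0 <= g < 1.0:
        raise ValueError("g must lie in [0, 1)")
    # Decision variable D = X2 - [g * I_prev + (1 - g) * X1]. Its SD does not
    # depend on which stimulus comes first.
    sd_d = (sigma**2 + (1 - g) ** 2 * sigma**2 + g**2 * sigma_ref**2) ** 0.5
    # <s c>: c is the second stimulus and enters D with weight 1.
    dl_sc = Z75 * sd_d
    # <c s>: c is the first stimulus and enters D only through the reference,
    # i.e. with weight (1 - g), which flattens the psychometric function.
    dl_cs = Z75 * sd_d / (1 - g)
    return dl_sc, dl_cs

for g in (0.0, 0.3, 0.6):
    dl_sc, dl_cs = irm_dls(g, sigma=10.0, sigma_ref=5.0)
    print(f"g = {g:.1f}: DL<s c> = {dl_sc:5.1f}, DL<c s> = {dl_cs:5.1f}")
```

Because \(1/(1-g) \ge 1\) for every admissible \(g\), DL for 〈c s〉 can never fall below DL for 〈s c〉 in this sketch. With \(g = 0\) the reference is simply the first stimulus, the rule collapses to the standard difference model, and the two DLs coincide.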

Likewise, the predictions of IRM should generalize across different magnitudes of the standard. In our previous studies on duration discrimination, a standard duration of 500 ms was employed (Dyjas et al., 2012; Dyjas, Bausenhart, & Ulrich, 2014; Dyjas & Ulrich, 2014). Therefore, the goal of the present study was to examine the Type B effect across a broader range of standard durations. Interestingly, and contrary to IRM's predictions, some studies have reported reversed, and thus positive, Type B effects under particular circumstances (Hellström, 2003; Hellström & Rammsayer, 2004; for converging evidence, see also Hellström, 1979). Regarding duration discrimination, Hellström and Rammsayer (2004) reported a positive Type B effect for very short durations, especially when the stimuli were separated by brief interstimulus intervals (ISIs). It has been suggested that the processing of short durations might differ from the processing of longer durations, especially when these are presented with longer ISIs. For instance, memory processes, interference between stimuli, and backward and forward masking might play a crucial role (Allan & Rousseau, 1977; Kallman & Morris, 1984; Kallman, Hirtle, & Davidson, 1986; Rammsayer & Lima, 1991). Moreover, the reversal of the Type B effect might indicate qualitatively different timing mechanisms operating at short and long durations (Michon, 1985; Rammsayer & Ulrich, 2011). Nevertheless, the observed reversal of the Type B effect disagrees with a large body of evidence reporting typical negative Type B effects and contradicts IRM's predictions. Therefore, in addition to examining the Type B effect across a broader range of standard durations, a second aim of the present study was to investigate the influence of ISI on the Type B effect.

Experiment 1

In Experiment 1, participants performed a 2AFC duration discrimination task with standard durations of 100 and 1,000 ms presented in different blocks of trials. The ISI was kept constant at 1,000 ms. This experiment thus examined whether the Type B effect generalizes to short and long standard durations.

Methods

Participants

Nineteen women and five men (mean age 21.1 ± 3.4 years) participated in a single session in exchange for course credit. All of them reported normal hearing and were naïve with respect to the hypotheses of the experiment. The data of two participants were replaced because their DLs in one or more conditions exceeded the group mean of that condition by more than three standard deviations.

Apparatus and stimuli

The experiment was implemented in Matlab (The MathWorks, Inc., Version 2009a) using the Psychophysics Toolbox 3.0.8 (Brainard, 1997; Pelli, 1997). Instructions and feedback appeared on a computer screen in black (< 1 cd/m²) on a light-gray background (49 cd/m²). The “y” and “m” keys of a standard German keyboard served as the left and right response keys, respectively. Auditory stimuli were filled white-noise intervals with rise and fall times of 10 ms each, presented binaurally through headphones at a peak level of 65 dB SPL. A new white-noise sample was generated for each stimulus on each trial. In the short standard condition, the standard duration was 100 ms, and the duration of c varied from 52 to 148 ms in fixed steps of 12 ms. In the long standard condition, the standard duration was 1,000 ms, and the duration of c varied from 700 to 1,300 ms in fixed steps of 75 ms. Thus, there were nine levels of c for each standard duration.
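The comparison levels follow directly from the stated ranges and step sizes, and the noise bursts can be sketched in a few lines. This is not the original Matlab/Psychtoolbox code but a Python sketch under assumptions that the text does not specify: a 44.1 kHz sampling rate, uniformly distributed white noise, raised-cosine ramps included in the nominal duration, and peak normalization to 1 (calibration to 65 dB SPL would happen at playback). The function names are hypothetical.

```python
import numpy as np

def comparison_levels(standard_ms, step_ms, n_levels=9):
    """Comparison durations (ms) spaced symmetrically around the standard."""
    half = (n_levels - 1) // 2
    return [standard_ms + step_ms * k for k in range(-half, half + 1)]

def noise_burst(duration_ms, ramp_ms=10.0, fs=44100, rng=None):
    """Fresh white-noise burst with raised-cosine onset and offset ramps.

    Assumed (not stated in the text): sampling rate, uniform noise,
    ramp shape, ramps inside the nominal duration, peak amplitude 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration_ms * fs / 1000.0))
    n_ramp = int(round(ramp_ms * fs / 1000.0))
    if n < 2 * n_ramp:
        raise ValueError("duration too short for the requested ramps")
    x = rng.uniform(-1.0, 1.0, n)              # new noise sample on every call
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    x[:n_ramp] *= ramp                         # 10-ms rise
    x[n - n_ramp:] *= ramp[::-1]               # 10-ms fall
    return x / np.max(np.abs(x))

print(comparison_levels(100, 12))   # [52, 64, 76, 88, 100, 112, 124, 136, 148]
print(comparison_levels(1000, 75))  # [700, 775, 850, 925, 1000, 1075, 1150, 1225, 1300]
```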

Procedure

On each trial, two stimuli were presented successively, separated by an offset-to-onset interstimulus interval (ISI) of 1,000 ms. For stimulus order 〈s c〉, the first stimulus was the fixed-duration standard and the second stimulus was the variable comparison stimulus. For stimulus order 〈c s〉, the order of stimuli was reversed. Stimulus order and the level of c varied randomly from trial to trial. Participants pressed the left (right) response key to indicate that the first (second) stimulus was the longer one. Following the response, “1”, “2”, or “=” was displayed on the screen for 400 ms, indicating that the first or the second stimulus was the longer one or that the two stimuli were identical in duration, respectively. The next trial began 1,600 ms after feedback onset. If the participant did not respond within 5,000 ms after the offset of the second stimulus, the trial was terminated and “zu langsam” (too slow) was displayed on the screen for 800 ms.
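The trial logic above can be summarized as a small sketch of the stimulus timeline, the feedback rule, and the recoding of key presses into "comparison judged longer", which is one way responses from both orders can be expressed on a common axis of c. The labels "sc"/"cs" for the two orders and "left"/"right" for the “y” and “m” keys, as well as all function names, are hypothetical; the recoding step is our assumption about how the responses would be scored.

```python
ISI_MS = 1000       # offset-to-onset interstimulus interval (Experiment 1)
TIMEOUT_MS = 5000   # response deadline after the offset of the second stimulus

def trial_schedule(order, standard_ms, comparison_ms, isi_ms=ISI_MS):
    """Stimulus onsets/offsets (ms) relative to the onset of the first stimulus."""
    if order == "sc":
        first, second = standard_ms, comparison_ms
    elif order == "cs":
        first, second = comparison_ms, standard_ms
    else:
        raise ValueError("order must be 'sc' or 'cs'")
    second_onset = first + isi_ms   # ISI counted from the offset of the first stimulus
    second_offset = second_onset + second
    return {"first": (0, first), "second": (second_onset, second_offset),
            "deadline": second_offset + TIMEOUT_MS}

def feedback_symbol(order, standard_ms, comparison_ms):
    """'1', '2', or '=' depending on which stimulus was actually longer."""
    first, second = ((standard_ms, comparison_ms) if order == "sc"
                     else (comparison_ms, standard_ms))
    if first > second:
        return "1"
    if second > first:
        return "2"
    return "="

def judged_c_longer(order, key):
    """Recode a key press as 'comparison judged longer' (True/False)."""
    first_longer = key == "left"    # left key = first stimulus judged longer
    return first_longer if order == "cs" else not first_longer
```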

The short and the long standard duration were administered in separate blocks and the order of blocks was counterbalanced across participants. Each duration of c was presented 20 times for each stimulus order, such that a block consisted of 360 trials (20 repetitions × 9 levels of c × 2 stimulus orders). Participants could take a short rest after every 90 trials. At the beginning of each block, participants performed 18 practice trials (each level of c presented once for each stimulus order). Practice trials did not enter data analysis.
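The block structure (20 repetitions × 9 levels of c × 2 stimulus orders = 360 trials, a rest after every 90 trials, and 18 practice trials) can be generated as shuffled trial lists. The use of a complete random permutation per block is an assumption, since the text states only that stimulus order and comparison level varied randomly from trial to trial; the function names are hypothetical.

```python
import random

def make_block(levels, n_reps=20, orders=("sc", "cs"), seed=None):
    """Randomly ordered block containing every (order, level) pair n_reps times."""
    trials = [(order, c) for order in orders for c in levels for _ in range(n_reps)]
    random.Random(seed).shuffle(trials)
    return trials

def make_practice(levels, orders=("sc", "cs"), seed=None):
    """Practice trials: each level of c once for each stimulus order."""
    return make_block(levels, n_reps=1, orders=orders, seed=seed)

def rest_points(n_trials, every=90):
    """Numbers of completed trials after which a short rest is offered."""
    return list(range(every, n_trials, every))

levels = list(range(52, 149, 12))        # short-standard comparison levels (ms)
block = make_block(levels, seed=1)
print(len(block), len(make_practice(levels, seed=2)), rest_points(len(block)))
```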

Design and dependent variables

The independent variables were stimulus order (〈s c〉 vs. 〈c s〉) and standard duration (100 ms vs. 1,000 ms), yielding a 2 × 2 within-subjects design. A logistic psychometric function was fitted to the data of each participant in each condition under the constraint that the average of the psychometric functions for stimulus orders 〈s c〉 and 〈c s〉 passes through the point (s, 0.5), a tautology that holds when stimulus order varies randomly and the stimuli differ in only one dimension (Bausenhart, Dyjas, Vorberg, & Ulrich, 2012; García-Pérez & Alcalá-Quintana, 2010, 2011a, 2012; Ulrich, 2010; Ulrich & Vorberg, 2009). From these psychometric functions, DL and the point of subjective equality (PSE) were calculated as dependent variables and submitted to separate repeated-measures analyses of variance (ANOVAs).
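One way to impose the constraint is to centre the stimulus axis at s and give the two order-conditional logistic functions opposite intercepts: because logistic(u) + logistic(-u) = 1, their average at c = s is then exactly 0.5. The sketch below fits this constrained model by maximum likelihood (damped Newton-Raphson). It assumes that the psychometric function describes P(c judged longer), a two-parameter logistic without lapse rate, DL defined as half the interquartile range (ln 3 divided by the slope), and a signed Type B effect DL〈s c〉 - DL〈c s〉, which is negative for the typical pattern. It is an illustration, not necessarily the fitting routine used here; the function name is hypothetical.

```python
import numpy as np

LN3 = np.log(3.0)  # logistic: distance from the 50% to the 75% point is ln(3) / slope

def fit_constrained_logistic(c, k_sc, k_cs, n, s, n_iter=100, tol=1e-10):
    """ML fit of two logistic functions for P('c longer') whose average passes
    through (s, 0.5).

    c          -- comparison durations (ms), same levels for both orders
    k_sc, k_cs -- counts of 'c longer' responses per level for <s c> and <c s>
    n          -- trials per level and order
    s          -- standard duration (ms)
    """
    x = np.asarray(c, dtype=float) - s         # centre the stimulus axis at s
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    # Parameters: slope for <s c>, slope for <c s>, shared intercept d.
    # logit P_sc = a_sc*x + d and logit P_cs = a_cs*x - d, so at x = 0 the two
    # functions equal logistic(d) and logistic(-d), which average to 0.5.
    X = np.vstack([np.column_stack([x, zeros, ones]),
                   np.column_stack([zeros, x, -ones])])
    k = np.concatenate([np.asarray(k_sc, float), np.asarray(k_cs, float)])
    m = np.full_like(k, float(n))

    def loglik(beta):
        eta = X @ beta
        return np.sum(k * eta - m * np.logaddexp(0.0, eta))

    beta = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (k - m * p)
        hess = X.T @ (X * (m * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)     # Newton-Raphson step
        while loglik(beta + step) < loglik(beta) and np.max(np.abs(step)) > tol:
            step = step / 2.0                  # damp the step if it overshoots
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    a_sc, a_cs, d = beta
    dl_sc, dl_cs = LN3 / a_sc, LN3 / a_cs
    return {"DL_sc": dl_sc, "DL_cs": dl_cs,
            "PSE_sc": s - d / a_sc, "PSE_cs": s + d / a_cs,
            "TypeB": dl_sc - dl_cs}            # negative when DL<c s> > DL<s c>
```

Fitting only three free parameters (two slopes and one intercept) instead of four is what enforces the constraint; the PSEs of the two orders are then no longer independent of each other.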

Results and discussion

The two scatterplots in Fig. 1 show the estimated DLs of each participant for the two standard durations.Footnote 2 The x-axis represents DL for stimulus order 〈c s〉 and the y-axis DL for order 〈s c〉. First, the two estimates are significantly and positively correlated, r = .77, t(22) = 5.7, p < .001, for the 100-ms standard, and r = .80, t(22) = 6.3, p < .001, for the 1,000-ms standard. Second, all but two data points lie on or below the main diagonal, indicating a negative Type B effect for almost all participants.
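The t values reported for these correlations follow from the standard test of a Pearson correlation against zero, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(df = n - 2\); with n = 24 participants this gives df = 22. The sketch below recomputes them from the rounded r values, so agreement is only expected up to rounding. The function name is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

print(round(t_from_r(0.77, 24), 1), round(t_from_r(0.80, 24), 1))  # 5.7 6.3
```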

Fig. 1

Scatterplots of individual DL estimates for Experiment 1 (auditory duration discrimination with a fixed interstimulus interval of 1,000 ms). Standard durations are 100 ms (left panel) and 1,000 ms (right panel). The data points of the two replaced participants with suspiciously large DLs are not shown, because these values were considered outliers according to a predetermined three-sigma criterion. The corresponding values \((x_i, y_i)\) of these two participants (\(i = 1, 2\)) were \((x_1 = 59.5, y_1 = 212.0)\) and \((x_2 = 54.0, y_2 = 36.2)\) for the 100-ms standard and \((x_1 = 614.5, y_1 = 454.5)\) and \((x_2 = 348.8, y_2 = 464.2)\) for the 1,000-ms standard.

An ANOVA on DL confirmed the latter impression. Specifically, DL was larger for stimulus order 〈c s〉 (85 ms) than for 〈s c〉 (56 ms), F(1,23)=37.8, p < .001, \({\eta _{p}^{2}} = .62\); that is, a typical negative Type B effect emerged (cf. Fig. 2, top panel). DL was larger for the long standard (126 ms) than for the short standard (15 ms), F(1,23)=167.4, p < .001, \({\eta _{p}^{2}} = .88\). Perhaps unsurprisingly, the absolute size of this negative Type B effect increased with standard duration, as indicated by the interaction of stimulus order and standard duration, F(1,23)=31.4, p < .001, \({\eta _{p}^{2}} = .58\). To compare the magnitude of the Type B effect across standard durations, Weber fractions (WFs) were computed as DL divided by the standard duration. WF was slightly larger for the short (0.15) than for the long (0.13) standard duration, F(1,23)=9.9, p < .01, \({\eta _{p}^{2}} = .30\). Also, WFs were larger for stimulus order 〈c s〉 (0.16) than for stimulus order 〈s c〉 (0.12), F(1,23)=42.0, p < .001, \({\eta _{p}^{2}} = .65\), reflecting the Type B effect. Crucially, this effect was not modulated by standard duration, F(1,23)=1.6, p = .22, \({\eta _{p}^{2}} = .07\) (cf. Fig. 2, middle panel). Thus, relative to the standard, the negative Type B effect was of comparable magnitude for both standard durations and, most importantly, it was not reversed for the short standard.
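The derived quantities in this paragraph follow standard formulas: partial eta squared can be recovered from an F ratio as \(\eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2)\), and the Weber fraction is DL divided by the standard duration. The sketch below applies both to values taken from the text; the function names are hypothetical.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)

def weber_fraction(dl_ms, standard_ms):
    """Weber fraction: DL expressed relative to the standard duration."""
    return dl_ms / standard_ms

# Effects on DL in Experiment 1, all with df = (1, 23); F values from the text.
for label, f in [("stimulus order", 37.8), ("standard duration", 167.4),
                 ("order x standard", 31.4)]:
    print(f"{label}: partial eta^2 = {partial_eta_squared(f, 1, 23):.2f}")
# stimulus order: 0.62, standard duration: 0.88, order x standard: 0.58

print(weber_fraction(15, 100), weber_fraction(126, 1000))  # 0.15 0.126
```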

Fig. 2

Results of Experiment 1 (auditory duration discrimination with a fixed interstimulus interval of 1,000 ms). Mean difference limen (DL, top panel), mean Weber fraction (WF, middle panel), and mean point of subjective equality (PSE, lower panel) ±1 SE as a function of stimulus order and standard duration. The standard error (SE) of the mean was calculated according to Cousineau (2005) for a within-subjects design. Note that axis breaks and scale discontinuities for mean DL and PSE were used to provide suitable scales for both standard durations and to ensure comparability with the results of Experiment 2 displayed in Fig. 4.
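The within-subject standard errors mentioned in this caption can be sketched as follows: following Cousineau (2005), each participant's scores are centred on that participant's own mean and shifted by the grand mean, and ordinary standard errors are then computed for each condition. The function name and the participants × conditions array layout are assumptions made for illustration.

```python
import numpy as np

def cousineau_se(data):
    """Within-subject standard errors (Cousineau, 2005).

    data -- array of shape (participants, conditions), e.g. one DL per cell.
    """
    data = np.asarray(data, dtype=float)
    n_participants = data.shape[0]
    # Remove between-participant differences in overall level:
    # subtract each participant's mean and add back the grand mean.
    normalized = data - data.mean(axis=1, keepdims=True) + data.mean()
    # Ordinary standard error of the mean per condition on the normalized scores.
    return normalized.std(axis=0, ddof=1) / np.sqrt(n_participants)
```

Participants who differ only by a constant offset contribute nothing to these error bars, which is why they suit comparisons between conditions within the same participants.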

As expected, PSE was larger for the long than for the short standard, F(1, 23) = 112,181.0, p < .001, \({\eta _{p}^{2}} = 1.0\). Neither the effect of stimulus order, F(1, 23) = 1.1, p = .30, nor the interaction (F < 1) was significant (cf. Fig. 2, lower panel). The overall pattern of results is thus consistent with our previous research (Dyjas et al., 2012, 2014) showing that stimulus order exerts dissociable effects on DL and PSE.

Experiment 2

In Experiment 1, a typical negative Type B effect emerged for short as well as long standard durations. However, previous research has demonstrated a reversal of the Type B effect at short standard durations, especially when these were paired with short ISIs (Hellström, 2003; Hellström & Rammsayer, 2004). Therefore, Experiment 2 examined whether the Type B effect would be modulated by shortening the ISI from 1,000 ms to 300 ms. Within IRM, for example, it seems plausible that updating the internal reference with information from the first stimulus takes some time and therefore cannot be completed within a short ISI before the representation of the first stimulus is masked by that of the second stimulus. Consequently, the Type B effect should be diminished in the short ISI condition. As outlined above, however, IRM does not predict a reversal of the Type B effect.

Methods

Participants

A new sample of 20 women and 4 men (mean age: 25.2 ± 8.8 years) participated in exchange for course credit. They reported normal hearing and were naïve with respect to the hypotheses. None of them had participated in the previous experiment. The data of one participant were replaced because this participant's DLs in all conditions exceeded the corresponding group means by more than three standard deviations.

Apparatus, stimuli, procedure, and design

These were identical to the ones used in Experiment 1, except for the following changes. First, only the short standard duration was employed. Second, ISI was 300 ms (short ISI) in one block of trials and 1,000 ms (long ISI) in another block of trials; the order of blocks was counterbalanced across participants. Thus, there was a 2 (stimulus order 〈s c〉 vs. 〈c s〉) × 2 (ISI: 300 ms vs. 1,000 ms) within-subjects design.

Results and discussion

Figure 3 depicts the individual DLs for short and long ISIs. First, as in Experiment 1, the two DL estimates are positively correlated, r = .73, t(22) = 5.1, p < .001, for the short ISI, and r = .68, t(22) = 4.4, p < .001, for the long ISI. Second, visual inspection suggests that only the data points for the long ISI show a systematic negative Type B effect, whereas those for the short ISI scatter around the identity line.

Fig. 3

Scatterplots of individual DL estimates for Experiment 2 (auditory duration discrimination with a fixed standard duration of 100 ms). Short ISI (left panel) and long ISI (right panel). The data points of one replaced participant with suspiciously large DLs are not shown, because these values were considered outliers according to a predetermined three-sigma criterion. The corresponding values \((x, y)\) of this participant were \((x = 65.6, y = 89.3)\) for the short ISI and \((x = 65.6, y = 59.8)\) for the long ISI.

This impression was confirmed by an ANOVA. Overall, there was no main effect of stimulus order on DL, F(1,23)=1.8, p = .19, \({\eta _{p}^{2}} = .07\) (cf. Fig. 4, top panel). DL was larger for the short ISI (15.2 ms) than for the long ISI (12.0 ms), F(1,23)=14.3, p < .001, \({\eta _{p}^{2}} = .38\). Importantly, stimulus order and ISI interacted, F(1,23)=12.1, p < .01, \({\eta _{p}^{2}} = .34\). Specifically, a typical negative Type B effect emerged in the long ISI condition, t(23) = 4.4, p < .001, replicating the result of Experiment 1, whereas this effect vanished in the short ISI condition, t(23) = 1.4, p = .16. Accordingly, the temporal interval between the two successive stimuli in the 2AFC task modulates the magnitude of the Type B effect. This is consistent with the idea that the integration of the first stimulus into the internal reference is hampered when the second stimulus follows the first one closely in time.
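Because both factors in Experiment 2 have only two levels, the tests in this paragraph reduce to paired comparisons on per-participant DLs: each simple effect of stimulus order is a paired t test with df = 23, and the interaction F(1, 23) equals the squared t of a paired comparison of each participant's order difference, DL〈c s〉 - DL〈s c〉, between the two ISIs. The sketch below illustrates this identity with hypothetical function names; it does not contain the present data.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic (df = n - 1) for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

def order_by_isi_interaction_f(cs_short, sc_short, cs_long, sc_long):
    """F(1, n - 1) for the stimulus order x ISI interaction (2 x 2 within design).

    Each argument holds one DL per participant, in the same participant order.
    The interaction F equals the squared paired t comparing each participant's
    order difference (DL<c s> - DL<s c>) between the two ISIs.
    """
    diff_short = [a - b for a, b in zip(cs_short, sc_short)]
    diff_long = [a - b for a, b in zip(cs_long, sc_long)]
    return paired_t(diff_short, diff_long) ** 2
```

For example, `paired_t(cs_long, sc_long)` would correspond to the simple effect of stimulus order at the long ISI.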

Fig. 4

Results of Experiment 2 (auditory duration discrimination with a fixed standard duration of 100 ms). Mean difference limen (DL, top panel) and mean point of subjective equality (PSE, lower panel) ±1 SE as a function of stimulus order and ISI. The standard error (SE) of the mean was calculated according to Cousineau (2005) for a within-subjects design.

ISI did not affect PSE, F(1,23)=1.3, p = .26 (cf. Fig. 4, lower panel). However, PSE was larger for stimulus order 〈s c〉 (104 ms) than for order 〈c s〉 (95 ms), F(1, 23) = 30.5, p < .001, \({\eta _{p}^{2}} = .57\). This effect suggests that the duration of the first stimulus (s in the 〈s c〉 condition and c in the 〈c s〉 condition) is overestimated relative to the duration of the second stimulus. This Type A effect corresponds to a positive time-order error, which is often observed for relatively short durations (e.g., Allan, 1977). This effect, however, was modulated by ISI, F(1,23)=6.4, p < .05, \({\eta _{p}^{2}} = .22\): the overestimation of the first stimulus relative to the second was even larger with the short than with the long ISI. Again, this is consistent with the findings of previous studies (e.g., Hellström & Rammsayer, 2004).

General discussion

The major aim of the present study was to investigate whether the Type B effect is modulated by the magnitude of the standard stimulus. Experiment 1 demonstrated higher duration discrimination sensitivity for stimulus order 〈s c〉 than for stimulus order 〈c s〉, regardless of whether a relatively short (100 ms) or long (1,000 ms) standard duration was employed. Thus, the negative Type B effect generalizes from a standard duration of 500 ms (Dyjas et al., 2012, 2014; Dyjas & Ulrich, 2014) to both longer and shorter standard durations. In Experiment 2, however, the negative Type B effect was no longer significant when the two stimuli were separated by a brief ISI (300 ms) rather than a longer one (1,000 ms). Yet, the Type B effect did not reverse. These results therefore fit within the scope of IRM, which predicts that discrimination sensitivity for trials with stimulus order 〈s c〉 should be either higher than or equal to sensitivity for trials with stimulus order 〈c s〉 (as shown in the Appendix, IRM also provides a quantitative account of the present data). Hence, in the domain of auditory duration discrimination, IRM seems to apply to a relatively broad range of standard durations and ISIs.

Within IRM, the absence of the Type B effect at the brief ISI may be attributed to a lack of integration of the current stimulus representation with information from previous trials. This seems plausible if integration does not proceed fully automatically but in a more controlled, and possibly time-consuming, fashion. Previous evidence from a cueing study indeed suggests that the integration process is under cognitive control (Dyjas et al., 2014). Clearly, further studies are required to substantiate this speculation, for example by manipulating the time available for the integration process more directly.

As outlined in the Introduction, there is previous evidence for a reversal of the Type B effect in duration discrimination when an even shorter standard duration (50 ms) than in the present experiments was used, especially with relatively short ISIs (≤ 300 ms; Hellström & Rammsayer, 2004). In addition, the procedure used to measure discrimination performance in that study differed in several respects from the one used in the present experiments. Specifically, ISI was manipulated between participants, and an adaptive testing procedure was used to assess performance for the two stimulus orders. Although stimulus order was randomly intermixed across trials (as in the present experiments), performance at the 25th and 75th percentiles was assessed in separate blocks of trials rather than within the same block. Under such conditions, DL estimates might be influenced by shifts of the underlying psychometric functions between blocks (cf. Ulrich & Vorberg, 2009). These methodological differences thus hamper a direct comparison between the present study and that of Hellström and Rammsayer (2004).

It should be noted that converging evidence for positive Type B effects has occasionally been found in other task domains (e.g., loudness and line length) under certain stimulation conditions and with different assessment methods (Hellström, 1979, 2003). We therefore cannot rule out positive Type B effects under specific conditions. Although such a reversed Type B effect cannot be explained by IRM, the present results nonetheless show that IRM is applicable to a wide range of standard durations and ISIs. A more general framework, such as Sensation Weighting (SW; Hellström, 1979, 1985), would be needed to account for any reversal of the Type B effect.Footnote 3 According to this framework, the internal representations of the two stimuli in a discrimination task are weighted differentially, with the assigned weights depending on the ISI. Accordingly, if the second stimulus receives a larger weight than the first, the SW framework implies a negative Type B effect, whereas a reversed effect is implied when the first stimulus receives a larger weight than the second (see the Appendix for a quantitative account of the present results based on this framework). In any case, neither the typical negative Type B effect nor its reversal can be explained by psychophysical accounts based on the standard difference model (Thurstone, 1927a, b).
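How stimulus weights determine the sign of the Type B effect can be illustrated with a deliberately reduced weighted-difference rule. As an assumption made for illustration only (Hellström's full model additionally includes reference levels and bias terms), suppose the decision variable is \(D = W_2 \psi_2 - W_1 \psi_1 + \varepsilon\), with Gaussian noise \(\varepsilon\) of fixed SD `sigma` in both orders. The psychometric function for c then has slope proportional to \(W_2\) in 〈s c〉 (c second) and to \(W_1\) in 〈c s〉 (c first), so DL〈c s〉/DL〈s c〉 = \(W_2/W_1\). On our reading, IRM corresponds in this reduced form to \(W_1 = 1 - g\) and \(W_2 = 1\), where \(g\) is the weight given to information from previous trials, so \(W_1 \le W_2\) and no reversal can occur. The names `sw_dls`, `type_b`, `w1`, and `w2` are hypothetical.

```python
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # half-interquartile-range factor for a normal

def sw_dls(w1, w2, sigma):
    """Order-conditional DLs for D = w2*psi2 - w1*psi1 + N(0, sigma^2).

    In <s c> the comparison is the second stimulus (slope w2 / sigma);
    in <c s> it is the first stimulus (slope w1 / sigma).
    Returns (DL for <s c>, DL for <c s>).
    """
    if w1 <= 0 or w2 <= 0:
        raise ValueError("weights must be positive")
    return Z75 * sigma / w2, Z75 * sigma / w1

def type_b(dl_sc, dl_cs):
    """Signed Type B effect; negative when DL<c s> exceeds DL<s c>."""
    return dl_sc - dl_cs

for w1, w2 in [(0.8, 1.0), (1.0, 1.0), (1.0, 0.8)]:
    print(w1, w2, round(type_b(*sw_dls(w1, w2, sigma=10.0)), 2))
```

A larger weight on the second stimulus yields a negative effect, equal weights yield none, and a larger weight on the first stimulus yields the reversed, positive effect.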

In summary, the results of this study demonstrate that the negative Type B effect is robust across different stimulus magnitudes. In general, the widespread presence of Type B order effects provides a benchmark for the formulation and advancement of psychophysical theories of stimulus discrimination.