Chronometric analyses of mental processes are typically restricted to mean reaction times (RTs), even though the full distribution of RTs provides more information (e.g., Heathcote, Popiel, & Mewhort, 1991; Yap, Balota, Tse, & Besner, 2008). Indeed, Balota and Yap (2011) estimated that fewer than 11% of chronometric studies consider distributional information. In the present experiments, we use an ex-Gaussian analysis (Heathcote et al. 1991; Hohle, 1965; Plourde & Besner, 1997; Spieler, Balota, & Faust, 1996). The ex-Gaussian distribution describes the sum of a normal and an independent exponential random variable. The analysis fits this distribution to quantiles of the empirical RT data and yields three parameters that capture different characteristics of the distribution: mu, sigma, and tau. Mu is the mean of the normal component, sigma is the standard deviation of the normal component, and tau characterizes the tail of the distribution (it is both the mean and the standard deviation of the exponential component). These parameters reveal changes in the RT distribution that are not visible in analyses of mean RTs.
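As a minimal sketch (Python standard library only, with hypothetical parameter values), the code below simulates ex-Gaussian RTs as a normal draw plus an exponential draw and shows how the three parameters map onto the moments of the distribution: the mean is mu + tau, the variance is sigma² + tau², and only tau contributes skew. The `fit_by_moments` helper is a hypothetical illustration based on the method of moments; it is not QMPE, the quantile-based procedure used in the analyses reported here.

```python
import math
import random
import statistics


def sample_ex_gaussian(mu, sigma, tau, n, rng):
    """Draw n RTs as a normal component plus an independent exponential component."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def ex_gaussian_moments(mu, sigma, tau):
    """Theoretical mean, variance, and skewness of an ex-Gaussian distribution."""
    mean = mu + tau                   # normal mean plus exponential mean
    var = sigma ** 2 + tau ** 2       # variances of independent components add
    skew = 2 * tau ** 3 / var ** 1.5  # only the exponential part is skewed
    return mean, var, skew


def fit_by_moments(rts):
    """Method-of-moments estimates (mu, sigma, tau); illustration only, not QMPE."""
    n = len(rts)
    m = statistics.fmean(rts)
    s = statistics.pstdev(rts)
    skew = sum((x - m) ** 3 for x in rts) / (n * s ** 3)
    # Clamp so that 0 < tau < s, which keeps sigma real and positive.
    skew = min(max(skew, 1e-6), 1.99)
    tau = s * (skew / 2) ** (1 / 3)
    sigma = math.sqrt(s ** 2 - tau ** 2)
    return m - tau, sigma, tau


# Hypothetical parameters in ms; estimates vary slightly with the seed.
rng = random.Random(1)
rts = sample_ex_gaussian(400, 50, 100, 50_000, rng)
print(fit_by_moments(rts))  # roughly (400, 50, 100)
```

Because the mean equals mu + tau, the same effect on mean RTs can arise from a change in mu, a change in tau, or both, which is why distributional analyses can separate effects that mean RTs cannot.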

The semantic Stroop effect

In recent years, there has been renewed interest in the semantic Stroop effect (SSE; e.g., Augustinova & Ferrand, 2012, 2014; Augustinova, Flaudias, & Ferrand, 2010; Augustinova, Silvert, Ferrand, Llorca, & Flaudias, 2015; Labuschagne & Besner, 2015; Manwell, Roberts, & Besner, 2004; Risko, Schmidt, & Besner, 2006; Schmidt & Cheesman, 2005). The semantic manipulation uses color-associated words (e.g., sky, which is associated with the color blue) and is taken to reflect primarily semantic processing (competition at the semantic level when the word and color are incongruent), rather than response competition. Response competition is assumed to be a large component of the standard Stroop effect, in which the color words are themselves members of the response set (see, e.g., Augustinova & Ferrand, 2014; Augustinova et al. 2015). Despite this resurgence of interest, no ex-Gaussian analysis of the SSE has been reported to date. In contrast, numerous studies have reported ex-Gaussian analyses of the standard Stroop effect, and a consistent pattern has emerged across them: the Stroop effect appears in all three ex-Gaussian parameters (see Balota et al. 2010; Heathcote et al. 1991; Spieler et al. 1996).

What should we expect to see in an ex-Gaussian analysis of the SSE?

One account of the SSE holds that it is largely restricted to semantic-level processing and that response competition (argued to be a major component of the standard Stroop effect) is absent, at least to some extent. To the extent that tau is associated with response competition (Spieler et al. 1996), little or no SSE ought to be present in this parameter (although, logically speaking, response competition may also influence mu). Here we report the results of four semantic Stroop experiments in which participants named the color of a printed word and ignored its irrelevant meaning. The major finding was that each experiment yielded an SSE in mean RTs, but in the ex-Gaussian analysis of the combined data the effect was confined to mu: it did not reach significance in sigma and was absent from tau.
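The logic of separating mu from tau can be illustrated numerically. The sketch below uses hypothetical parameter values (mu = 400, sigma = 50, tau = 100 ms), not estimates from these experiments, computes ex-Gaussian quantiles by inverting the CDF with bisection, and compares two manipulations that both raise the mean RT by 20 ms. A shift in mu moves every quantile by the same 20 ms, whereas an increase in tau changes the fast quantiles little and stretches the slow tail.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal (mean 0, SD 1)


def ex_gaussian_cdf(x, mu, sigma, tau):
    """CDF of normal(mu, sigma) plus an independent exponential with mean tau."""
    z = (x - mu) / sigma
    tail = math.exp(-(x - mu) / tau + sigma ** 2 / (2 * tau ** 2))
    return STD_NORMAL.cdf(z) - tail * STD_NORMAL.cdf(z - sigma / tau)


def ex_gaussian_quantile(p, mu, sigma, tau):
    """Invert the CDF by bisection (adequate for moderate sigma/tau ratios)."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma + 40 * tau
    for _ in range(200):
        mid = (lo + hi) / 2
        if ex_gaussian_cdf(mid, mu, sigma, tau) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


PROBS = (0.1, 0.3, 0.5, 0.7, 0.9)
base = (400, 50, 100)       # hypothetical neutral-condition parameters
mu_shift = (420, 50, 100)   # +20 ms in mu only
tau_shift = (400, 50, 120)  # +20 ms in tau only (same +20 ms change in the mean)


def quantile_effects(cond, ref):
    """Difference between two ex-Gaussians at each probability in PROBS."""
    return [ex_gaussian_quantile(p, *cond) - ex_gaussian_quantile(p, *ref)
            for p in PROBS]


print(quantile_effects(mu_shift, base))   # 20 ms at every quantile
print(quantile_effects(tau_shift, base))  # grows from fast to slow quantiles
```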

Experiments 1 and 2

Method

Participants

A total of 40 undergraduate students from the University of Waterloo participated in Experiment 1, and 40 new participants from the same participant pool took part in Experiment 2. Each participant was tested individually and received course credit for participating. All had normal or corrected-to-normal vision, as well as normal color vision.

Stimuli

The stimuli consisted of the neutral words keg, jail, table, and palace, and the color-associated words sky, frog, lemon, and tomato. These items were taken from Manwell et al. (2004) and were matched for length and frequency with the color names in the response set (red, green, blue, and yellow). Items were presented individually in lowercase 18-point Courier New font. Each word was displayed in one of the four response colors: red (RGB: 255, 0, 0), green (RGB: 0, 255, 0), blue (RGB: 0, 0, 255), or yellow (RGB: 255, 255, 0). All letters of a word appeared in the same color, and color-associated words were always presented in an incongruent color (e.g., sky appeared in green rather than blue).

Design

In both experiments, we manipulated a single within-subjects factor, Relatedness, with two levels (neutral vs. color-associated). Experiment 1 had 96 experimental trials, 48 per condition. Experiment 2 was identical to Experiment 1 except that (a) to increase power, a second block of 96 Stroop trials was presented, and (b) in a third block of 48 trials, participants named the color of a rectangle presented in the center of the screen with no accompanying word. The rectangles appeared in the same four colors used in the Stroop blocks. The data from the third block yielded nothing of interest and are not considered further.

Apparatus

The stimuli were displayed on a 22-in. LG Flatron W2242TQ color monitor (29.5 cm high × 47.5 cm wide). Stimulus presentation and data recording were controlled by E-Prime 2.0 software running on an Ultra Vault PC with a 2.40-GHz Intel Core 2 Quad CPU. The display had a refresh rate of 60 Hz and a resolution of 1,680 × 1,050 pixels; the screen resolution in E-Prime was set to 640 × 480. Participants' responses were collected via an Altec Lansing microphone headset attached to a voice key assembly. RTs were measured to the nearest millisecond.

Procedure

Participants were seated approximately 70 cm from the monitor. The experiment began with 16 practice trials, followed by a single block of 96 experimental trials. Each trial began with a fixation marker (+) presented for 500 ms, after which the word appeared centered at fixation. Participants named the font color of the word aloud. They were instructed to ignore the color carrier word and to respond as quickly and accurately as possible. The word remained on the screen until a response was made. The screen then remained blank until the researcher had coded the response as correct, incorrect, or spoiled (e.g., a cough or a microphone failure caused the voice key to trigger too early or too late). The fixation marker for the next trial then appeared immediately.

Results

Experiment 1

Spoiled trials (7.2%) and trials with incorrect responses (0.3%) were removed prior to data analysis. Following Spieler, Balota, and Faust (1996, 2000), correct RTs were subjected to an outlier removal procedure in which RTs more than three standard deviations above or below each participant's mean RT in each condition were excluded from all analyses. This procedure discarded 1.6% of correct responses (see Footnote 1). The outlier removal procedures in Experiments 2, 3, and 4 were identical to the one used in Experiment 1 and are not described again.
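The trimming rule can be sketched as follows. The function name and the `(participant, condition, rt)` record layout are hypothetical, and two details are assumptions not stated in the text: the sample (n - 1) standard deviation is used, and trimming is applied in a single pass rather than repeated until no outliers remain.

```python
import statistics
from collections import defaultdict


def trim_rt_outliers(trials, criterion=3.0):
    """Drop correct-response RTs more than `criterion` SDs from their cell mean.

    trials: list of (participant, condition, rt) tuples; a cell is one
    participant in one condition. Single pass with the sample SD (assumptions).
    Returns the kept trials and the percentage of trials removed.
    """
    cells = defaultdict(list)
    for participant, condition, rt in trials:
        cells[(participant, condition)].append(rt)

    bounds = {}
    for cell, rts in cells.items():
        mean = statistics.fmean(rts)
        sd = statistics.stdev(rts) if len(rts) > 1 else 0.0
        bounds[cell] = (mean - criterion * sd, mean + criterion * sd)

    kept = []
    for participant, condition, rt in trials:
        low, high = bounds[(participant, condition)]
        if low <= rt <= high:
            kept.append((participant, condition, rt))
    removed_pct = 100 * (len(trials) - len(kept)) / len(trials)
    return kept, removed_pct
```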

The data of one participant were discarded due to a large number of errors (more than three standard deviations beyond the mean within any condition). Following the removal of error and RT outliers, the participants’ mean RTs for each condition were submitted to the same outlier removal procedure. One participant was an outlier according to this criterion, leaving the data from 38 participants for further analysis. Table 1 shows the mean RT, 95% confidence interval (CI), and mean percentage error for each condition. CIs were computed following the procedures outlined in Masson and Loftus (2003) for within-subjects designs.
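Within-subjects CIs in the style of Masson and Loftus (2003) use the Subject × Condition interaction as the error term rather than between-participant variability. A minimal sketch follows, assuming one row of condition means per participant; the helper name is hypothetical, and the critical t value must be supplied by the caller because the Python standard library has no t quantile function.

```python
import math
import statistics


def within_subject_ci(data, t_crit):
    """Half-width of a Masson & Loftus (2003)-style within-subjects CI.

    data: one row per participant, one column per condition (condition means).
    t_crit: two-tailed critical t for df = (n - 1) * (k - 1), supplied by the
    caller because the standard library has no t quantile function.
    """
    n, k = len(data), len(data[0])
    grand = statistics.fmean(x for row in data for x in row)
    subject_means = [statistics.fmean(row) for row in data]
    condition_means = [statistics.fmean(row[j] for row in data) for j in range(k)]
    # Error term: Subject x Condition interaction (residuals after removing
    # each participant's overall level and each condition's mean).
    ss_error = sum(
        (data[i][j] - subject_means[i] - condition_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return t_crit * math.sqrt(ms_error / n)


# Hypothetical per-participant means (neutral, color-associated), in ms.
means = [[500, 515], [600, 630], [450, 455], [520, 540]]
print(round(within_subject_ci(means, t_crit=3.18), 1))  # t(.975, df = 3) is about 3.18
```

For a design like Experiment 1 (38 participants, two conditions, df = 37), the two-tailed critical value is about 2.03. With two conditions, the half-width reduces to t × sqrt(var(d) / 2n), where d is each participant's condition difference.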

Table 1 Mean reaction times (RT, in milliseconds), 95% confidence intervals (±CI), and percentage errors (%E) as a function of condition in Experiment 1

A paired-samples t test confirmed a significant SSE, t(37) = 3.98, SE = 3.52, p < .001: participants were slower to name the font colors of incongruent color-associated words than those of color-neutral words. There was no significant error effect, t(37) = –0.46, p = .649.
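The SSE test compares each participant's two condition means. The sketch below (a hypothetical helper, standard library only) returns the mean difference, its standard error, t, and df; p values require a t distribution, for example SciPy's `scipy.stats.ttest_rel`, and are omitted. Because t is the mean difference divided by its SE, the reported statistics also imply the size of the effect in milliseconds, assuming the reported SE is the standard error of the mean difference.

```python
import math
import statistics


def paired_t(cond_a, cond_b):
    """Paired-samples t test on per-participant condition means.

    Returns (mean difference, SE of the difference, t, df).
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("conditions must have one value per participant")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, se, mean_diff / se, n - 1


# t = mean difference / SE, so t(37) = 3.98 with SE = 3.52 implies about 14 ms.
print(round(3.98 * 3.52, 1))  # 14.0
```

Under the same assumption, Experiment 3 (t = 3.16, SE = 5.77) implies an SSE of about 18 ms and Experiment 4 (t = 4.85, SE = 4.02) about 19.5 ms.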

Experiment 2

The data from four of the 40 participants were removed prior to analysis due to a failure to follow instructions (e.g., talking during trials instead of attending to the task). For the remaining 36 participants, spoiled trials (3.1%) and trials on which an incorrect response was made (1.2%) were removed prior to the data analysis. In addition, 1.8% of the trials with correct responses across participants were removed as RT outliers. One participant was removed as an accuracy outlier, and one other was removed due to an inability to fit their data in the ex-Gaussian analysis, leaving the data from 34 participants for further analysis. Table 2 shows the mean RT, 95% CI, and mean percentage error for each condition.

Table 2 Mean reaction times (RT, in milliseconds), 95% confidence intervals (±CI) and percentage errors (%E) as a function of condition in Experiment 2

A 2 × 2 (Relatedness × Block) analysis of variance (ANOVA) on RTs revealed a significant main effect of relatedness (i.e., an SSE), F(1, 33) = 11.38, p = .002. There was no significant main effect of block (F < 1) and no significant Relatedness × Block interaction (F < 1). Further analyses of the SSE are therefore based on mean RTs averaged across Blocks 1 and 2. There was no significant error effect (F < 1).
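When every factor has two levels, each effect in a within-subjects ANOVA has one numerator df and is equivalent to a one-sample t test on a per-participant contrast score, with F = t². The sketch below (hypothetical helpers; it assumes per-participant cell means ordered as neutral and related for Block 1, then for Block 2) makes this explicit. It also shows why averaging across blocks is harmless here: the relatedness F equals the squared paired t computed on block-averaged means.

```python
import math
import statistics


def one_sample_t(scores):
    """t statistic of scores against zero, and its df (n - 1)."""
    n = len(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return statistics.fmean(scores) / se, n - 1


def rm_anova_2x2(rows):
    """F and (df1, df2) for each effect of a 2 x 2 within-subjects design.

    rows: one tuple per participant, ordered (neutral_b1, related_b1,
    neutral_b2, related_b2). With two levels per factor, each F equals the
    squared one-sample t of the matching per-participant contrast.
    """
    contrasts = {
        "relatedness": [(r1 + r2) - (n1 + n2) for n1, r1, n2, r2 in rows],
        "block": [(n2 + r2) - (n1 + r1) for n1, r1, n2, r2 in rows],
        "interaction": [(r1 - n1) - (r2 - n2) for n1, r1, n2, r2 in rows],
    }
    results = {}
    for effect, scores in contrasts.items():
        t, df = one_sample_t(scores)
        results[effect] = (t * t, (1, df))
    return results


# Hypothetical cell means (ms) for four participants.
rows = [(600, 615, 590, 610), (650, 660, 655, 668),
        (580, 600, 575, 585), (700, 712, 690, 705)]
for effect, (f, dfs) in rm_anova_2x2(rows).items():
    print(effect, round(f, 2), dfs)
```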

Ex-Gaussian analysis

An ex-Gaussian analysis was conducted on the pooled data from Experiments 1 and 2. For each participant and condition, the ex-Gaussian distribution was fitted to quantiles of the empirical RT distribution using QMPE (Heathcote, Brown, & Cousineau, 2004). Paired t tests on the resulting parameter estimates were then used to determine whether there was a significant SSE in mu, sigma, or tau.
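QMPE (quantile maximum probability estimation) treats the empirical quantiles as bin boundaries and maximizes the multinomial likelihood of the numbers of RTs falling in the inter-quantile bins. The sketch below is a simplified, hypothetical reimplementation for illustration only, not the QMPE software of Heathcote, Brown, and Cousineau (2004): it assumes decile cut points, uses crude starting values, and maximizes the objective with a simple compass search, so it may differ from the published software in its quantile estimates, starting values, and optimizer.

```python
import bisect
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ex_gaussian_cdf(x, mu, sigma, tau):
    """CDF of normal(mu, sigma) plus an independent exponential with mean tau."""
    z = (x - mu) / sigma
    tail = math.exp(-(x - mu) / tau + sigma ** 2 / (2 * tau ** 2))
    return STD_NORMAL.cdf(z) - tail * STD_NORMAL.cdf(z - sigma / tau)


def quantile_bins(rts, n_bins=10):
    """Empirical cut points (deciles by default) and RT counts per bin."""
    cuts = statistics.quantiles(rts, n=n_bins)
    ordered = sorted(rts)
    edges = [bisect.bisect_right(ordered, c) for c in cuts]
    counts = [hi - lo for lo, hi in zip([0] + edges, edges + [len(ordered)])]
    return cuts, counts


def qmpe_objective(params, cuts, counts):
    """Sum of N_i * log(F(q_i) - F(q_{i-1})), with F(q_0) = 0 and F(q_m+1) = 1."""
    mu, sigma, tau = params
    # Reject invalid values and very small tau, where the CDF formula overflows.
    if sigma <= 0 or tau <= 0 or sigma / tau > 20:
        return -math.inf
    try:
        cdf = [0.0] + [ex_gaussian_cdf(q, mu, sigma, tau) for q in cuts] + [1.0]
    except OverflowError:
        return -math.inf
    total = 0.0
    for count, lo, hi in zip(counts, cdf, cdf[1:]):
        if not hi - lo > 0:
            return -math.inf
        total += count * math.log(hi - lo)
    return total


def fit_qmpe(rts, tol=1e-3):
    """Maximize the objective with a simple compass (pattern) search."""
    cuts, counts = quantile_bins(rts)
    m, s = statistics.fmean(rts), statistics.stdev(rts)
    params = [m - 0.5 * s, 0.5 * s, 0.5 * s]  # crude starting values (assumption)
    best = qmpe_objective(params, cuts, counts)
    step = 0.25 * s
    while step > tol:
        improved = False
        for i in range(3):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * step
                value = qmpe_objective(trial, cuts, counts)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2  # no move helped: refine the step size
    return tuple(params), best


# Simulated RTs from hypothetical parameters (400, 50, 100 ms).
rng = random.Random(7)
rts = [rng.gauss(400, 50) + rng.expovariate(1 / 100) for _ in range(2000)]
print(fit_qmpe(rts))
```

Fitting each participant's RTs separately for each condition yields one (mu, sigma, tau) triple per cell, and the paired t tests then compare those estimates across conditions.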

We observed a marginal SSE in mu, t(71) = 1.75, SE = 5.94, p = .085, and no significant SSE in either sigma, t(71) = 0.83, SE = 4.29, p = .408, or tau, t(71) = 0.30, SE = 7.23, p = .762. Because the SSE in Experiments 1 and 2 was quite small (although highly significant in mean RTs), the failure to detect a significant SSE in any of the ex-Gaussian parameters likely reflects a lack of power. We therefore report two additional experiments, which we included in a combined ex-Gaussian analysis across all four experiments.

Experiments 3 and 4

The stimuli, apparatus, and procedure in Experiments 3 and 4 were identical to those in Experiments 1 and 2, except that each letter was spatially cued by a white line above and below it, and irrelevant characters drawn from the top of the keyboard separated the letters. An example of the color-associated condition is shown in Fig. 1.

Fig. 1 Sample stimuli from Experiments 3 and 4

The purpose of the irrelevant characters was to enhance localization of the spatial cue, because in a separate block only a single letter was cued and participants named the color of that letter (one of these experiments was reported by Labuschagne & Besner, 2015; the other experiment was an unpublished quasi-replication). Here we consider only the results of the blocked condition from these two experiments, in which all letters were homogeneously colored and spatially cued (see Footnote 2).

Method

Participants

We recruited 41 participants for Experiment 3 and 42 participants for Experiment 4, all from the same undergraduate pool as in Experiments 1 and 2.

Results

Experiment 3

For the 41 participants in this experiment, spoiled trials (7.7%) and trials with incorrect responses (0.3%) were removed prior to data analysis. In addition, 1.3% of the trials with correct responses across participants were removed as RT outliers. Two participants were removed as accuracy outliers, and one was removed as an RT outlier, leaving the data from 38 participants for further analysis. Table 3 shows the mean RT, 95% CI, and mean percentage error for each condition. Note that (a) these RTs are very close to those in Experiments 1 and 2, and (b) the size of the SSE is similar to that in Experiments 1 and 2. Furthermore, the SSE is virtually identical in size to the 18-ms effect reported by Augustinova and colleagues (2010) in the all-letters-cued condition of their Experiment 1. We therefore take the view that the irrelevant symbols between the letters had little bearing on the size of the SSE. Because the symbols might nonetheless have altered the shape of the RT distribution, we examine this issue below for Experiments 3 and 4 combined.

Table 3 Mean reaction times (RT, in milliseconds), 95% confidence intervals (±CI), and percentage errors (%E) as a function of condition in Experiment 3

A paired-samples t test confirmed a significant SSE, t(37) = 3.16, SE = 5.77, p = .003. There was no significant error effect, t(37) = 0.48, p = .632.

Experiment 4

For the 42 participants, spoiled trials (6.3%) and trials with incorrect responses (1.0%) were removed prior to data analysis. In addition, 1.5% of the trials with correct responses across participants were removed as RT outliers. One participant was removed as an accuracy outlier, and another was removed because their data could not be fitted in the ex-Gaussian analysis, leaving the data from 40 participants for further analysis. Table 4 shows the mean RT, 95% CI, and mean percentage error for each condition.

Table 4 Mean reaction times (RT, in milliseconds), 95% confidence intervals (±CI), and percentage errors (%E) as a function of condition in Experiment 4

A paired-samples t test confirmed a significant SSE, t(39) = 4.85, SE = 4.02, p < .001. There was no significant error effect, t(39) = 0.12, p = .905.

Ex-Gaussian analysis of Experiments 3 and 4 combined

A marginally significant SSE was discernible in mu, t(77) = 1.99, SE = 6.94, p = .050, but not in sigma, t(77) = 1.46, SE = 5.83, p = .149, or tau, t(77) = 0.59, SE = 9.44, p = .555.

Cross-experiment analysis of all four experiments

Ex-Gaussian analyses were performed as before, but this time on the combined data from all four experiments, in order to maximize power. To show that Experiments 1 and 2 were comparable to Experiments 3 and 4, we include a table that presents the mean RTs and parameter estimates separately for each pair of experiments (see Table 5). Additionally, we conducted an analysis of the mean RTs in which experiment pair (1/2 vs. 3/4) was included as a between-subjects factor. The Experiment Pair × Relatedness interaction was not significant (p > .15).
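In a mixed design with a two-level within-subjects factor, the Between × Within interaction is equivalent to an independent-samples t test on the per-participant difference scores, with F = t². The sketch below illustrates this with a hypothetical helper, assuming each participant contributes one SSE score (related minus neutral mean RT) and that the two groups are the two experiment pairs.

```python
import math
import statistics


def interaction_f(effects_a, effects_b):
    """F(1, n_a + n_b - 2) for a Group x Condition interaction (2-level condition).

    effects_a, effects_b: one SSE score per participant (related minus neutral
    mean RT) in each group. The interaction F equals the squared pooled-variance
    independent-samples t on these difference scores.
    """
    n_a, n_b = len(effects_a), len(effects_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(effects_a)
                  + (n_b - 1) * statistics.variance(effects_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(effects_a) - statistics.fmean(effects_b)) / se
    return t * t, (1, df)


# Hypothetical SSE scores (ms) for two small groups of participants.
print(interaction_f([10, 12, 14], [20, 22, 24]))
```

With the final samples here (72 participants in Experiments 1 and 2, 78 in Experiments 3 and 4), this comparison would have F(1, 148).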

Table 5 Mean reaction times (RTs), p values, and Cohen’s d for Experiments 1 and 2 and Experiments 3 and 4

Three paired t tests were conducted to determine whether a significant SSE emerged in mu, sigma, or tau in the combined data from all four experiments. We also report a Bayes factor for each t test, because Bayes factors quantify how strongly the data favor the alternative hypothesis over the null hypothesis, or vice versa (Masson, 2011; Wagenmakers, 2007).

We found a significant effect of relatedness on mu, t(149) = 2.65, SE = 4.59, p = .009 (scaled JZS Bayes factor = 2.6 in favor of the alternative hypothesis). The effect of relatedness on sigma was marginally significant, t(149) = 1.68, SE = 3.66, p = .096 (scaled JZS Bayes factor = 2.8 in favor of the null hypothesis), and there was no effect of relatedness on tau, t(149) = 0.66, SE = 5.99, p = .509 (scaled JZS Bayes factor = 8.9 in favor of the null hypothesis). The means for each parameter in this analysis are shown in Table 6.
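The scaled JZS Bayes factor for a one-sample or paired t test depends only on t, the sample size n, and the scale r of the Cauchy prior on the standardized effect. The sketch below evaluates the usual integral representation numerically (the Cauchy prior written as a normal prior whose variance g has an inverse-gamma(1/2, 1/2) distribution, scaled by r²) with a trapezoid rule over log g. The helper name, the integration grid, and the default r = 1 are assumptions; the scale used for the values reported here is not stated, so the output need not match them exactly.

```python
import math


def jzs_bf01(t, n, r=1.0, steps=4000, log_g_min=-20.0, log_g_max=20.0):
    """Scaled JZS Bayes factor BF01 (null over alternative), one-sample/paired t.

    t: observed t statistic; n: number of participants (df = n - 1);
    r: scale of the Cauchy prior on effect size (default is an assumption).
    """
    nu = n - 1
    # Likelihood of t under the null (delta = 0), up to a shared constant.
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        # Substitute g = exp(u), so dg = g du.
        g = math.exp(u)
        a = 1 + n * r * r * g
        prior_g = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        lik = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        return lik * prior_g * g

    h = (log_g_max - log_g_min) / steps
    total = 0.5 * (integrand(log_g_min) + integrand(log_g_max))
    total += sum(integrand(log_g_min + k * h) for k in range(1, steps))
    alt_lik = total * h  # trapezoid rule over log g
    return null_lik / alt_lik


# Combined analysis: 150 participants (df = 149).
for label, t in (("mu", 2.65), ("sigma", 1.68), ("tau", 0.66)):
    print(label, round(jzs_bf01(t, 150), 2))
```

Values above 1 favor the null and values below 1 favor the alternative; the Bayes factor in favor of the alternative is BF10 = 1 / BF01.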

Table 6 Mean reaction times (RTs) and parameter estimates from an ex-Gaussian analysis for the combined data of Experiments 1, 2, 3, and 4

General discussion

The data are clear: each of the four experiments yielded an SSE in mean RTs. Color naming was slower when the color carrier was a color-associated word that was incongruent with the displayed color than when it was a neutral word. These results replicate previous reports in the literature (e.g., Augustinova & Ferrand, 2012; Augustinova et al. 2010; Augustinova et al. 2015; Labuschagne & Besner, 2015; Manwell et al. 2004).

According to Spieler et al. (1996) and Heathcote et al. (1991), the standard Stroop effect appears in the mean of the normal component (mu), the standard deviation of the normal component (sigma), and the tail (tau). In contrast, the ex-Gaussian analysis of the combined data from Experiments 1, 2, 3, and 4 yielded an SSE only in mu. The null effect in tau was particularly clear: a scaled JZS Bayes factor of 8.9 in favor of the null is considered positive evidence for the null hypothesis (Wagenmakers, 2007).

Conclusion

Despite intensive research on many aspects of the standard Stroop effect (see MacLeod's, 1991, review), no published article to date has reported an ex-Gaussian analysis of RTs for the semantic Stroop effect. The present results suggest that the standard and semantic Stroop effects differ in how they affect the ex-Gaussian parameters: the former appears in mu, sigma, and tau, whereas the latter is restricted to mu (although the Bayes factor for sigma only weakly favors the null hypothesis). Theoretically, the present data are consistent with the view that the SSE is largely confined to semantic interference, insofar as there is no effect in tau, which is argued to reflect the time taken to resolve response competition between the color and the color carrier word (Spieler et al. 1996). That said, we emphasize that response competition may also contribute to the effect in mu.

More generally, we emphasize the point made by Heathcote et al. (1991), as well as by Balota and Yap (2011), Yap et al. (2008), and others: An analysis of the distribution of RTs offers a more informative picture of mental processing than does the standard approach, which considers only mean RTs.