Abstract
Previous analyses of the standard Stroop effect (which typically uses color words that form part of the response set) have documented effects on mean reaction times in hundreds of experiments in the literature. Less well known is the fact that ex-Gaussian analyses reveal that such effects are seen in (a) the mean of the normal distribution (mu), as well as in (b) the standard deviation of the normal distribution (sigma) and (c) the tail (tau). No ex-Gaussian analysis exists in the literature with respect to the semantically based Stroop effect (which contrasts incongruent color-associated words with, e.g., neutral controls). In the present experiments, we investigated whether the semantically based Stroop effect is also seen in the three ex-Gaussian parameters. Replicating previous reports, color naming was slower when the color was carried by an irrelevant (but incongruent) color-associated word (e.g., sky, tomato) than when the control items consisted of neutral words (e.g., keg, palace) in each of four experiments. An ex-Gaussian analysis revealed that this semantically based Stroop effect was restricted to the arithmetic mean and mu; no semantic Stroop effect was observed in tau. These data are consistent with the views (1) that there is a clear difference in the source of the semantic Stroop effect, as compared to the standard Stroop effect (evidenced by the presence vs. absence of an effect on tau), and (2) that the interference associated with response competition on incongruent trials, which is reflected in tau, is absent in the semantic Stroop effect.
Chronometric analyses of mental processes are typically restricted to mean reaction times (RTs), despite the fact that more information is provided by considering the distribution of RTs (e.g., Heathcote, Popiel, & Mewhort, 1991; Yap, Balota, Tse, & Besner, 2008). Indeed, Balota and Yap (2011) estimated that fewer than 11% of chronometric studies consider distributional information. In the present experiments, we use an ex-Gaussian analysis (Heathcote et al., 1991; Hohle, 1965; Plourde & Besner, 1997; Spieler, Balota, & Faust, 1996). This approach fits an ex-Gaussian distribution (the convolution of a normal and an exponential distribution) to quantiles of the empirical data and generates three parameters that correspond to different characteristics of the distribution: mu, sigma, and tau. Mu corresponds to the mean of the normal component of the distribution, sigma to the standard deviation of the normal component, and tau to the tail of the ex-Gaussian distribution (the mean and standard deviation of the exponential component of the distribution). These parameters provide additional information about changes in the RT distribution that are not reflected in analyses of mean RTs.
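For reference, the ex-Gaussian density and its first two moments can be written as follows. This is the standard textbook form rather than a formula given in the text; the symbol \Phi (the standard normal cumulative distribution function) is notation introduced here for illustration.

```latex
f(x \mid \mu, \sigma, \tau)
  = \frac{1}{\tau}
    \exp\!\left(\frac{\sigma^{2}}{2\tau^{2}} - \frac{x - \mu}{\tau}\right)
    \Phi\!\left(\frac{x - \mu}{\sigma} - \frac{\sigma}{\tau}\right),
  \qquad \sigma > 0,\ \tau > 0

E[X] = \mu + \tau,
  \qquad
\operatorname{Var}[X] = \sigma^{2} + \tau^{2}
```

Because the mean of the distribution equals mu + tau, an effect on mean RT can arise from a shift in mu, a change in tau, or both, which is why the parameters can carry information that the mean alone does not.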
The semantic Stroop effect
In recent years, there has been a renewed interest in the semantic Stroop effect (SSE; e.g., Augustinova & Ferrand, 2012, 2014; Augustinova, Flaudias, & Ferrand, 2010; Augustinova, Silvert, Ferrand, Llorca, & Flaudias, 2015; Labuschagne & Besner, 2015; Manwell, Roberts, & Besner, 2004; Risko, Schmidt, & Besner, 2006; Schmidt & Cheesman, 2005). The semantic manipulation consists of color-associated words (e.g., the word sky, associated with the color blue) and is taken to primarily reflect semantic processing (competition at the semantic level if the word and color are incongruent), rather than response competition (as is assumed to be a large component of the standard Stroop effect, which uses color words that appear in the response set; see, e.g., Augustinova & Ferrand, 2014; Augustinova et al., 2015). Despite a resurgence of interest in the SSE, no ex-Gaussian analysis of this effect has been reported to date. In contrast, numerous studies have reported ex-Gaussian analyses of the standard Stroop effect. Across these investigations a consistent pattern has emerged, such that the Stroop effect is seen in all three ex-Gaussian parameters (see Balota et al., 2010; Heathcote et al., 1991; Spieler et al., 1996).
What should we expect to see in an ex-Gaussian analysis of the SSE?
One account of the SSE is that it is largely restricted to semantic-level processing and that response competition (argued to be a major component in the standard Stroop effect) is, at least to some extent, absent. To the extent that tau is associated with response competition (Spieler et al., 1996), little or no SSE ought to be present in this parameter (though we note that, logically speaking, response competition may also influence mu). Here we report the results of four semantic Stroop experiments in which participants named the color of a printed word and ignored the irrelevant meaning of the word. The major finding was that across all of these experiments we found an SSE on mean RTs, but in the ex-Gaussian analysis the effect was confined to mu and was absent from sigma and tau.
Experiments 1 and 2
Method
Participants
A total of 40 undergraduate students from the University of Waterloo participated in Experiment 1, and 40 new participants from the same participant pool took part in Experiment 2. Each participant was tested individually and received course credit for participating. All had normal or corrected-to-normal vision, as well as normal color vision.
Stimuli
The stimuli consisted of the neutral words keg, jail, table, and palace, and the color-associated words sky, frog, lemon, and tomato. These items were taken from Manwell et al. (2004) and were matched for length and frequency with the colors in the response set (red, green, blue, and yellow). Items were presented individually in lowercase Courier New font, size 18. Each of the letters in a word was colored using the four colors from the response set: red (RGB: 255, 0, 0), green (RGB: 0, 255, 0), blue (RGB: 0, 0, 255), and yellow (RGB: 255, 255, 0). All letters were presented in the same color, with color-associated words always being presented in an incongruent color (e.g., sky would be presented in green, as opposed to blue).
Design
In both experiments, we manipulated a single, within-subjects factor, Relatedness, with two levels (neutral vs. color-associated). Experiment 1 had a total of 96 experimental trials, with 48 trials per condition. Experiment 2 was identical to Experiment 1, with the exception that, to increase power, (a) a second block of 96 Stroop trials was presented, and (b) in a third block consisting of 48 trials, participants named the color of a rectangle presented in the center of the screen with no accompanying words. The rectangles were colored using the same four colors as in the prior blocks. The data from the third block yielded nothing of interest and are not considered further.
Apparatus
The stimuli were displayed on a 22-in. LG Flatron W2242TQ color monitor (29.5 cm high × 47.5 cm wide). Stimulus presentation and data recording were controlled by the E-Prime 2.0 experimental software, which was run on an Ultra Vault PC with an Intel Core 2 Quad CPU with a 2.40-GHz processor. The display had a refresh rate of 60 Hz and a resolution of 1,680 × 1,050 pixels, and the screen resolution in E-Prime was set to 640 × 480. Participant responses were collected via an Altec Lansing microphone headset attached to a voice key assembly. RTs were measured to the nearest millisecond.
Procedure
Participants were seated approximately 70 cm away from a computer monitor. The experiment began with a set of 16 practice trials that were followed by a single block of 96 experimental trials. Each trial began with the presentation of a fixation marker (+), which remained on the screen for 500 ms. Words were then presented centered at fixation. The participants’ task was to name the font color of the displayed word out loud. They were instructed to ignore the color carrier word and to respond as quickly and accurately as possible. The display remained on the screen until participants had made a response. Once a response had been made, the screen remained blank until the researcher had coded the response as correct, incorrect, or spoiled (e.g., a cough or microphone failure caused the microphone to trigger too early or too late). This was followed immediately by the appearance of the fixation marker.
Results
Experiment 1
Spoiled trials (7.2%) and trials on which an incorrect response was made (0.3%) were removed prior to the data analysis. Following Spieler, Balota, and Faust (1996, 2000), correct RTs were subjected to an outlier removal procedure in which RTs more than three standard deviations above or below the mean RT per participant, per condition, were excluded from all analyses. This resulted in 1.6% of correct responses being discarded (see Note 1). The outlier removal procedures in Experiments 2, 3, and 4 were identical to the one used in Experiment 1, and will not be described again.
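The trimming rule described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names and the (participant, condition, rt) trial layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def trim_outliers(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the mean of one cell.

    Assumption: the sample standard deviation is used.
    """
    if len(rts) < 2:
        return list(rts)
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


def trim_by_cell(trials, n_sd=3.0):
    """Apply the trimming rule per participant, per condition.

    trials: iterable of (participant, condition, rt) tuples for correct
    responses only (a hypothetical layout chosen for this sketch).
    """
    cells = {}
    for participant, condition, rt in trials:
        cells.setdefault((participant, condition), []).append(rt)
    return {cell: trim_outliers(rts, n_sd) for cell, rts in cells.items()}
```

Trimming within each participant-by-condition cell, rather than over the pooled data, keeps a generally slow participant from having their ordinary responses flagged as outliers.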
The data of one participant were discarded due to a large number of errors (more than three standard deviations beyond the mean within any condition). Following the removal of error and RT outliers, the participants’ mean RTs for each condition were submitted to the same outlier removal procedure. One participant was an outlier according to this criterion, leaving the data from 38 participants for further analysis. Table 1 shows the mean RT, 95% confidence interval (CI), and mean percentage error for each condition. CIs were computed following the procedures outlined in Masson and Loftus (2003) for within-subjects designs.
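A within-subjects confidence interval of the kind referred to above can be sketched as follows, assuming the interval is built from the Participant × Condition interaction mean square, as in the Loftus and Masson approach. The function name is hypothetical, and the critical t value is supplied by the caller (e.g., from a t table) rather than computed here.

```python
from math import sqrt


def within_subject_ci_halfwidth(scores, t_crit):
    """Half-width of a within-subjects confidence interval.

    scores: one row per participant, one column per condition (mean RTs).
    t_crit: two-tailed critical t for (n - 1) * (k - 1) degrees of freedom.
    Assumption: the error term is the Participant x Condition interaction.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Interaction sum of squares: what remains after removing participant
    # and condition main effects from each cell.
    ss_inter = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_inter = ss_inter / ((n - 1) * (k - 1))
    return t_crit * sqrt(ms_inter / n)
```

The interval for each condition is then that condition's mean plus or minus the returned half-width. Removing between-participant variability in overall speed is what makes such intervals informative about the within-subjects contrast.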
A t test confirmed a significant SSE, t(37) = 3.98, SE = 3.52, p < .001. Participants were slower to name the font colors of incongruent color-associated words than of color-neutral words. There was no significant error effect, t(37) = –0.46, p = .649.
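The paired comparison reported here can be illustrated with a short sketch. It assumes that each participant contributes one mean RT per condition and that the reported SE is the standard error of the mean difference; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Paired-samples t test on per-participant condition means.

    Returns (t, standard error of the mean difference, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, se, n - 1
```

Under that reading, t multiplied by SE recovers the mean difference, so t(37) = 3.98 with SE = 3.52 would correspond to an SSE of roughly 14 ms.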
Experiment 2
The data from four of the 40 participants were removed prior to analysis due to a failure to follow instructions (e.g., talking during trials instead of attending to the task). For the remaining 36 participants, spoiled trials (3.1%) and trials on which an incorrect response was made (1.2%) were removed prior to the data analysis. In addition, 1.8% of the trials with correct responses across participants were removed as RT outliers. One participant was removed as an accuracy outlier, and one other was removed due to an inability to fit their data in the ex-Gaussian analysis, leaving the data from 34 participants for further analysis. Table 2 shows the mean RT, 95% CI, and mean percentage error for each condition.
A 2 × 2 (Relatedness × Block) analysis of variance (ANOVA) on RTs revealed a significant SSE, F(1, 33) = 11.38, p = .002. There was no significant main effect of block (F < 1), and no significant Relatedness × Block interaction (F < 1). Further analyses of the SSE are therefore based on the mean RTs averaged across Blocks 1 and 2. We found no significant error effect (F < 1).
Ex-Gaussian analysis
An ex-Gaussian analysis was conducted on the data from Experiments 1 and 2. The empirical data were fitted to quantiles using QMPE (Heathcote, Brown, & Cousineau, 2004). Then t tests were performed on the parameter estimates obtained from this procedure, to determine whether there was a significant SSE in mu, sigma, or tau.
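To give a sense of how ex-Gaussian parameters relate to the shape of an RT distribution, the sketch below recovers mu, sigma, and tau by the method of moments. This is not QMPE, which uses quantile-based maximum probability estimation; it is a simpler estimator swapped in for illustration. The function names and the simulated parameter values (500, 50, 100) are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_exgauss(n, mu, sigma, tau, seed=0):
    """Draw n RTs as the sum of a normal and an exponential variate."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def exgauss_moments_fit(rts):
    """Method-of-moments estimates of (mu, sigma, tau).

    Uses mean = mu + tau, variance = sigma^2 + tau^2, and
    skewness = 2 * tau^3 / (sigma^2 + tau^2)^(3/2).
    Not QMPE: a rough estimator that can fail on small or unskewed samples.
    """
    n = len(rts)
    m = fmean(rts)
    sd = pstdev(rts, mu=m)
    skew = sum((x - m) ** 3 for x in rts) / (n * sd ** 3)
    skew = max(skew, 0.0)  # negative sample skew is clamped to zero
    tau = sd * (skew / 2.0) ** (1.0 / 3.0)
    mu = m - tau
    sigma = sqrt(max(sd ** 2 - tau ** 2, 0.0))  # guard against negative variance
    return mu, sigma, tau


rts = simulate_exgauss(50000, mu=500.0, sigma=50.0, tau=100.0)
mu_hat, sigma_hat, tau_hat = exgauss_moments_fit(rts)
```

Moment estimates depend heavily on the third moment and are therefore unstable with the trial counts typical of a single participant, which is one reason quantile-based or likelihood-based fitting is generally preferred in practice.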
We observed a marginal SSE in mu, t(71) = 1.75, SE = 5.94, p = .085, and no significant SSE in either sigma, t(71) = 0.83, SE = 4.29, p = .408, or tau, t(71) = 0.30, SE = 7.23, p = .762. Because the SSE in Experiments 1 and 2 was quite small (though it was highly significant in mean RTs), the failure to detect an SSE in any of the ex-Gaussian parameters was likely a power issue. We therefore report two additional experiments that we then included in a combined ex-Gaussian analysis across all four experiments.
Experiments 3 and 4
The stimuli, apparatus, and procedure in Experiments 3 and 4 were identical to those in Experiments 1 and 2, except that the letters in the stimuli were spatially cued with a white line above and below each letter, and irrelevant characters drawn from the top of the keyboard separated the letters. An example of the color-associated condition is shown in Fig. 1.
The purpose of the irrelevant characters was to enhance localization of the spatial cue, because in a separate block only a single letter was cued and participants named the color of that letter (one of these experiments was reported by Labuschagne & Besner, 2015; the other experiment was an unpublished quasi-replication). Here we consider only the results of the blocked condition from these two experiments, in which all letters were homogeneously colored and spatially cued (see Note 2).
Method
Participants
We recruited 41 participants in Experiment 3, and 42 participants in Experiment 4. All of the participants were from the same undergraduate pool as those in Experiments 1 and 2.
Results
Experiment 3
For the 41 participants in this experiment, spoiled trials (7.7%) and trials on which an incorrect response was made (0.3%) were removed prior to the data analysis. In addition, 1.3% of the trials with correct responses across participants were removed as RT outliers. Two participants were removed as accuracy outliers, and one participant was removed as an RT outlier, leaving the data from 38 participants for further analysis. Table 3 shows the mean RT, 95% CI, and mean percentage error for each condition. It is worth noting that (a) these RTs are very close to those seen in Experiments 1 and 2, and (b) the size of the Stroop effect is similar to those seen in Experiments 1 and 2. Furthermore, the size of the SSE is virtually identical to the 18-ms effect reported by Augustinova and colleagues (2010) in the all-letters-cued condition of their Experiment 1. We therefore take the view that the presence of irrelevant symbols between the letters had little bearing on the size of the SSE. However, since we do not know whether anything changed in the distributional analysis, we consider that issue below for Experiments 3 and 4 combined.
A t test confirmed a significant SSE, t(37) = 3.16, SE = 5.77, p = .003. There was no significant error effect, t(37) = 0.48, p = .632.
Experiment 4
For the 42 participants, spoiled trials (6.3%) and trials on which an incorrect response was made (1.0%) were removed prior to the data analysis. In addition, 1.5% of the trials with correct responses across participants were removed as RT outliers. One participant was removed as an accuracy outlier, and another participant was removed due to an inability to fit their data in the ex-Gaussian analysis, leaving 40 for further analysis. Table 4 shows the mean RT, 95% CI, and mean percentage error for each condition.
A t test confirmed a significant SSE, t(39) = 4.85, SE = 4.02, p < .001. We found no error effect, t(39) = 0.12, p = .905.
Ex-Gaussian analysis of Experiments 3 and 4 combined
A marginally significant SSE was discernible in mu, t(77) = 1.99, SE = 6.94, p = .050, but not in either sigma, t(77) = 1.46, SE = 5.83, p = .149, or tau, t(77) = 0.59, SE = 9.44, p = .555.
Cross-experiment analysis of all four experiments
Ex-Gaussian analyses were performed as before, but this time on the combined data from all four experiments, in order to maximize power. To reassure readers that Experiments 1 and 2 and Experiments 3 and 4 were comparable, we include a table that shows the mean RTs and parameter estimates separately for each pair of experiments (see Table 5). Additionally, we conducted an analysis on the mean RTs in which Experiments (1/2 vs. 3/4) was included as a factor. There was no significant Experiments × Relatedness interaction (p > .15).
Three t tests were conducted to determine whether a significant SSE would emerge in mu, sigma, and tau in the combined data of all four experiments. We also report Bayes factors for each of the t tests, because Bayesian analysis provides evidence for how strongly the data favor the alternative versus the null hypothesis (Masson, 2011; Wagenmakers, 2007).
We found a significant effect of relatedness on mu, t(149) = 2.65, SE = 4.59, p = .009 (scaled JZS Bayes factor of 2.6 in favor of the alternative). We also observed a marginally significant effect of relatedness on sigma, t(149) = 1.68, SE = 3.66, p = .096 (scaled JZS Bayes factor of 2.8 in favor of the null), and no effect of relatedness on tau, t(149) = 0.66, SE = 5.99, p = .509 (scaled JZS Bayes factor of 8.9 in favor of the null). The means for each parameter in this analysis can be seen in Table 6.
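To illustrate how a t statistic translates into evidence for or against the null, the sketch below uses the BIC approximation to the Bayes factor described by Wagenmakers (2007). This is not the scaled JZS Bayes factor reported above, which requires numerical integration over a prior on effect size; the function name is hypothetical, and the formula assumes a paired (one-sample) t test with n participants.

```python
from math import sqrt


def bic_bf01_paired_t(t, n):
    """BIC approximation to the Bayes factor favoring the null.

    For a paired or one-sample t test with n participants, the alternative
    has one extra parameter, which gives
        BF01 ~= sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    Values above 1 favor the null; values below 1 favor the alternative.
    Assumption: this stands in for, and is not equal to, the JZS factor.
    """
    df = n - 1
    return sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)


bf01_tau = bic_bf01_paired_t(0.66, 150)  # tau contrast, t(149) = 0.66
bf01_mu = bic_bf01_paired_t(2.65, 150)   # mu contrast, t(149) = 2.65
bf10_mu = 1.0 / bf01_mu                  # evidence for the alternative
```

With these inputs the approximation points in the same direction as the reported JZS factors (toward the null for tau and toward the alternative for mu), although the exact values need not agree because the two methods rest on different priors.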
General discussion
The data are clear, in that each of four experiments yielded an SSE in the mean RTs; color naming was slower when the color carrier was a color-associated word that was incongruent with the displayed color, as compared to when the color carrier was a neutral word. These results replicate previous reports in the literature (e.g., Augustinova & Ferrand, 2012; Augustinova et al., 2010; Augustinova et al., 2015; Labuschagne & Besner, 2015; Manwell et al., 2004).
According to reports by Spieler et al. (1996) and Heathcote et al. (1991), the standard Stroop effect yields effects on the mean of the normal distribution (mu), the standard deviation of the normal distribution (sigma), and the tail (tau). In contrast, an ex-Gaussian analysis of the combined data from Experiments 1, 2, 3, and 4 here yielded an SSE only in mu; the effect in tau was particularly small, and the scaled JZS Bayes factor of 8.9 is considered positive evidence in favor of the null (Wagenmakers, 2007).
Conclusion
Despite intensive research on many aspects of the standard Stroop effect (see MacLeod’s, 1991, review), no articles in the published literature to date have reported an ex-Gaussian analysis of RTs in the context of the semantic Stroop effect. The present results suggest that there is a difference between the standard and semantically based Stroop effects in terms of how they affect the ex-Gaussian parameters. The former yields effects in mu, sigma, and tau, whereas the latter is restricted to mu (for sigma, the evidence only weakly favors the null hypothesis). Theoretically, the present data are consistent with the view that the SSE is largely confined to semantic interference, insofar as there is no effect in tau, which is argued to reflect the time taken to resolve response competition between the color and the color carrier word (Spieler et al., 1996). That said, we emphasize that there is no reason why response competition might not also be part of the effect in mu.
More generally, we emphasize the point made by Heathcote et al. (1991), as well as by Balota and Yap (2011), Yap et al. (2008), and others: An analysis of the distribution of RTs offers a more informative picture of mental processing than does the standard approach, which considers only mean RTs.
Notes
1. The analyses in Experiments 1 and 2 (and both subsequent experiments) were also performed while including all correct trials, to ensure that the results did not vary due to the outlier removal procedure. The results mirrored those of the analyses in which correct-trial outliers were removed.
2. The order of the cue conditions (single letter vs. all letters) in both of these experiments was counterbalanced across participants on the basis of order of arrival at the laboratory. No effect of order emerged; therefore, we averaged the data across orders.
References
Augustinova, M., & Ferrand, L. (2012). Suggestion does not de-automatize word reading: Evidence from the semantically based Stroop task. Psychonomic Bulletin & Review, 19, 521–527. doi:10.3758/s13423-012-0217-y
Augustinova, M., & Ferrand, L. (2014). Automaticity of word reading: Evidence from the semantic Stroop paradigm. Current Directions in Psychological Science, 23, 343–348. doi:10.1177/0963721414540169
Augustinova, M., Flaudias, V., & Ferrand, L. (2010). Single-letter coloring and spatial cuing do not eliminate or reduce a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 17, 827–833. doi:10.3758/PBR.17.6.827
Augustinova, M., Silvert, L., Ferrand, L., Llorca, P. M., & Flaudias, V. (2015). Behavioral and electrophysiological investigation of semantic and response conflict in the Stroop task. Psychonomic Bulletin & Review, 22, 543–549. doi:10.3758/s13423-014-0697-z
Balota, D. A., Tse, C., Hutchison, K. A., Spieler, D. H., Duchek, J. M., & Morris, J. C. (2010). Predicting conversion to dementia of the Alzheimer’s type in a healthy control sample: The power of errors in Stroop color naming. Psychology and Aging, 25, 208–218.
Balota, D. A., & Yap, M. J. (2011). Moving beyond the mean in studies of mental chronometry: The power of response time distributional analyses. Current Directions in Psychological Science, 20, 160–166. doi:10.1177/0963721411408885
Heathcote, A., Brown, S., & Cousineau, D. (2004). QMPE: Estimating lognormal, Wald, and Weibull RT distributions with a parameter-dependent lower bound. Behavior Research Methods, Instruments, & Computers, 36, 277–290. doi:10.3758/BF03195574
Heathcote, A., Popiel, S. J., & Mewhort, D. J. K. (1991). Analysis of response time distributions: An example using the Stroop task. Psychological Bulletin, 109, 340–347. doi:10.1037/0033-2909.109.2.340
Hohle, R. H. (1965). Inferred components of reaction times as functions of foreperiod duration. Journal of Experimental Psychology, 69, 382–386. doi:10.1037/h0021740
Labuschagne, E. M., & Besner, D. (2015). Automaticity revisited: When print doesn’t activate semantics. Frontiers in Psychology, 6, 1–7. doi:10.3389/fpsyg.2015.00117
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203. doi:10.1037/0033-2909.109.2.163
Manwell, L. A., Roberts, M. A., & Besner, D. (2004). Single letter coloring and spatial cuing eliminates a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 11, 458–462. doi:10.3758/BF03196595
Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43, 679–690. doi:10.3758/s13428-010-0049-5
Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57, 203–220. doi:10.1037/h0087426
Plourde, C. E., & Besner, D. (1997). On the locus of the word frequency effect in visual word recognition. Canadian Journal of Experimental Psychology, 51, 181–194.
Risko, E. F., Schmidt, J., & Besner, D. (2006). Filling a gap in the semantic gradient: Color associates and response set effects in the Stroop task. Psychonomic Bulletin & Review, 13, 310–315. doi:10.3758/BF03193849
Schmidt, J. R., & Cheesman, J. (2005). Dissociating stimulus-stimulus and response-response effects in the Stroop task. Canadian Journal of Experimental Psychology, 59, 132–138. doi:10.1037/h0087468
Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer’s type. Journal of Experimental Psychology: Human Perception and Performance, 22, 461–479. doi:10.1037/0096-1523.22.2.461
Spieler, D. H., Balota, D. A., & Faust, M. E. (2000). Levels of selective attention revealed through analyses of response time distributions. Journal of Experimental Psychology: Human Perception and Performance, 26, 506–526. doi:10.1037/0096-1523.26.2.506
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804. doi:10.3758/BF03194105
Yap, M. J., Balota, D. A., Tse, C., & Besner, D. (2008). On the additive effects of stimulus quality and word frequency in lexical decision: Evidence for opposing interactive influences revealed by RT distributional analyses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 495–513. doi:10.1037/0278-7393.34.3.495
Author note
This work was supported by Grant No. A0998 from the Natural Sciences and Engineering Research Council of Canada to D.B.
Cite this article
White, D., Risko, E.F. & Besner, D. The semantic Stroop effect: An ex-Gaussian analysis. Psychon Bull Rev 23, 1576–1581 (2016). https://doi.org/10.3758/s13423-016-1014-9
DOI: https://doi.org/10.3758/s13423-016-1014-9