In cognitive psychology, responding to a task with a conflict has been assumed to induce an adjustment of attentional control. This adjustment allows us to activate goal-relevant features while inhibiting irrelevant ones (e.g., Botvinick, Braver, Barch, Carter, & Cohen, 2001). A current debate is about whether these control processes can generalize across conflicts (e.g., Boy, Husain, & Sumner, 2010; Kane et al., 2016; Miyake & Friedman, 2012; Rey-Mermet, Gade, & Oberauer, 2018; Rey-Mermet, Gade, Souza, von Bastian, & Oberauer, 2019; Stahl et al., 2014). Empirical support for the generalization of control processes comes from studies showing an interaction in reaction times (RTs) between different conflicts when the conflicts were combined within the same trial (Hommel, 1997, Experiment 2; Rey-Mermet & Gade, 2016; Stoffels & van der Molen, 1988, Experiment 2; Wendt, Kluwe, & Peters, 2006, Experiment 2a; but see Akçay & Hazeltine, 2011; Egner, Delano, & Hirsch, 2007; Hommel, 1997, Experiment 1; Kornblum, 1994; Simon & Berbaum, 1990; Wendt et al., 2006, Experiment 2b; Stoffels & van der Molen, 1988, Experiment 1). A closer inspection of these studies revealed, however, that the interaction was mainly observed when the interference induced by the conflict was large. The purpose of the present study was to determine whether large interference effects are the necessary condition for observing an interacting pattern between conflicts.

In laboratory paradigms, a conflict is induced in tasks in which an incongruent trial is presented (i.e., a trial in which the stimulus is associated with two different response alternatives). Typically, the conflict tasks are the Stroop task (e.g., MacLeod, 1991), the Simon task (e.g., Simon & Small, 1969), or the flanker task (e.g., Eriksen & Eriksen, 1974). These tasks are described in Table 1. In all three tasks, participants are slower and more prone to errors on incongruent trials than on congruent trials (i.e., trials in which all information converges on one response; Eriksen & Eriksen, 1974; MacLeod, 1991; Simon & Small, 1969; Stroop, 1935). This congruency effect occurs because in incongruent trials, the relevant feature has to be activated (i.e., the print color for the Stroop task, the central color for the flanker task, and the location of the response key for the Simon task), whereas the irrelevant feature has to be ignored (i.e., the word meaning, the color of the flanking letters, and the location on the screen, respectively).

Table 1 Description of the Stroop, flanker, and Simon tasks

In a few studies, these conflict tasks have been combined pairwise within the same trial (see Table 2). For example, in the Stroop–Simon combination, participants were asked to respond to the color of the color word while ignoring the position of the word on the screen (e.g., Hommel, 1997, Experiment 1). In the flanker–Simon combination, participants were asked to classify the central target while ignoring the flanking characters as well as the position of the complete stimulus on the screen (e.g., Hommel, 1997, Experiment 2). In the flanker–Stroop combination, participants were asked to respond to the color of the central letter while ignoring the flanking letters (e.g., Rey-Mermet & Gade, 2016, Experiment 1). Combining such conflict tasks resulted in mixed findings. Some studies showed no interaction between the congruency conditions of both tasks (Akçay & Hazeltine, 2011; Egner et al., 2007; Hommel, 1997, Experiment 1; Kornblum, 1994; Simon & Berbaum, 1990; Stoffels & van der Molen, 1988, Experiment 1; Weissman, 2019, Experiment 2; Wendt et al., 2006, Experiment 2b). That is, the congruency effect of one task was not affected by whether the trial was incongruent or congruent for the other task. In contrast, other studies showed an underadditive interaction (Frühholz, Godde, Finke, & Herrmann, 2011; Hommel, 1997, Experiment 2; Rey-Mermet & Gade, 2016; Rey-Mermet, Gade, & Steinhauser, 2019; Stoffels & van der Molen, 1988, Experiment 2; Weissman, 2019, Experiment 1; Wendt et al., 2006, Experiment 2a). That is, the congruency effect of one task was smaller when the trials were incongruent for the other task than when they were congruent. For example, when the Stroop and flanker tasks were combined, the Stroop congruency effect was smaller on flanker incongruent trials than on flanker congruent trials.
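To make the notion of an underadditive interaction concrete, the following minimal sketch computes the Stroop congruency effect at each level of flanker congruency and the resulting interaction contrast. The cell means and the helper names (`stroop_effect`, `interaction_contrast`) are hypothetical and chosen only for illustration; they are not data or code from any of the cited studies.

```python
# Hypothetical mean RTs (ms) for the four cells of a Stroop x flanker design.
# Keys are (Stroop congruency, flanker congruency). The values are invented
# for illustration and are not data from any study.
mean_rt = {
    ("incongruent", "incongruent"): 780.0,
    ("incongruent", "congruent"): 760.0,
    ("congruent", "incongruent"): 660.0,
    ("congruent", "congruent"): 600.0,
}


def stroop_effect(cells, flanker):
    """Stroop congruency effect (incongruent minus congruent) at one flanker level."""
    return cells[("incongruent", flanker)] - cells[("congruent", flanker)]


def interaction_contrast(cells):
    """Stroop effect on flanker incongruent trials minus that on flanker congruent trials."""
    return stroop_effect(cells, "incongruent") - stroop_effect(cells, "congruent")


effect_flanker_incongruent = stroop_effect(mean_rt, "incongruent")
effect_flanker_congruent = stroop_effect(mean_rt, "congruent")
contrast = interaction_contrast(mean_rt)

# A negative contrast means the Stroop effect shrinks on flanker incongruent
# trials (underadditive); zero means the two effects simply add up.
if contrast < 0:
    pattern = "underadditive"
elif contrast == 0:
    pattern = "additive"
else:
    pattern = "overadditive"

print(effect_flanker_incongruent, effect_flanker_congruent, contrast, pattern)
```

With these invented means, the Stroop effect is 120 ms on flanker incongruent trials and 160 ms on flanker congruent trials, so the contrast is negative and the pattern counts as underadditive. Whether such a contrast is reliable is a separate statistical question.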

Table 2 Studies combining Stroop, Simon, and flanker tasks

The interaction between the congruency conditions of both tasks could be explained within the conflict monitoring framework (see, e.g., Botvinick et al., 2001) by assuming that the detection and/or resolution of one conflict affects the resolution of the other conflict. That is, the conflict monitoring system estimates the current level of conflict for the first task to be processed. For example, in the flanker–Stroop combination, if the first task is the flanker task and the trial is flanker incongruent (i.e., associated with a high degree of conflict), this leads to a shift of the control signal. This shift triggers an enhancement of all top-down biasing processes. This means that not only the flanking letters (i.e., the irrelevant features for the flanker task) but also the meaning of the word (i.e., the irrelevant feature for the Stroop task) would be inhibited. If the Stroop task is also incongruent, this reduces the impact of the word meaning and thus results in a smaller Stroop congruency effect (see also Egner, 2008). According to this account, control processes generalize across both conflicts, explaining the underadditive interaction (e.g., Boy et al., 2010; Rey-Mermet & Gade, 2016). This account has been challenged by a recent study in which event-related potentials (ERPs) were measured in addition to RTs (Rey-Mermet et al., 2019). In that study, the results showed a sequential resolution of the flanker and Stroop conflicts. That is, an early ERP component (i.e., P2) was associated with the resolution of the flanker conflict, whereas a later ERP component (i.e., N450) was associated with the resolution of the Stroop conflict. However, although the interaction was still observed in the RTs, there was no interaction in the ERP component associated with the Stroop congruency effect. Thus, processing flanker incongruent or congruent trials did not affect the resolution of the Stroop conflict, which calls the conflict-monitoring explanation into question. To explain the interaction still observed in the RTs, we assumed that resolving the flanker conflict and then the Stroop conflict preactivated the motor responses via direct associations. Thus, by the time of response selection or execution, the correct response would have been activated at least twice. This results in faster responding when both the Stroop and flanker tasks are incongruent than when only the Stroop task is incongruent, explaining the reduction of the Stroop congruency effect in flanker incongruent trials.

So far, the reasons why some studies showed an interaction while others did not are unclear. For example, the discrepancy in findings did not result from combining different conflict tasks pairwise (Stroop with Simon, Simon with flanker, or Stroop with flanker), using different types of stimuli (e.g., arrows or letters), or using different stimulus set sizes (e.g., two vs. four). Irrespective of these manipulations, the interaction between the congruency conditions of both tasks was sometimes observed and sometimes not (see Table 2). There is, however, one key finding: The interaction was mainly observed when the congruency effect of one task was large (see Table 2, right part). For example, the interaction was found between the Stroop and Simon congruency when the Stroop congruency effect was the largest. Similarly, the interaction was observed between the flanker and Simon congruency when the flanker congruency effect was the largest. This raises the possibility that an interaction of that sort is only observed when the congruency effect is large. The reason is that a large congruency effect is more likely to be reduced in magnitude than a smaller congruency effect (see Hommel, 1997). Thus, in studies reporting no interaction (see Table 2), the small congruency effects may have worked against the possibility of finding an interaction, so that it is not possible to determine whether or not the cognitive processes resulting in the interaction were required.

The purpose of the present study was to determine whether a large congruency effect is a necessary condition for observing an interaction between the congruency conditions of two conflict tasks combined within the same trial. This is important because if finding an interaction in the RTs simply requires a substantial congruency effect, the theoretical explanations of the interaction in RTs can no longer be validated or invalidated by the simple presence or absence of an interaction. In the present study, three experiments were conducted in which the Stroop task was combined with the flanker task. This task combination was selected because the congruency effect (in particular, the Stroop congruency effect) was the largest (see Table 2), and thus its magnitude should be the easiest to modulate. The size of the Stroop congruency effect was increased by three experimental manipulations: (a) by testing older adults in addition to young adults (Experiment 1); (b) by manipulating the proportion of catch trials in which participants were asked to respond to the meaning of the word (instead of responding to the color of the color word; Experiment 2); and (c) by asking participants to respond vocally (Experiment 3). Thus, if a large congruency effect is a necessary condition for observing an interaction between Stroop and flanker congruency (Hommel, 1997), the interaction should be observed in older adults, in the conditions including catch trials, and with vocal responses.

In all three experiments, the data were analyzed using a null-hypothesis significance testing (NHST) approach with analyses of variance (ANOVAs) and a Bayesian hypothesis testing approach with Bayesian ANOVAs. Thus, the strength of evidence was assessed not only for the alternative hypothesis (e.g., the presence of the interaction), but also for the null hypothesis (e.g., the absence of the interaction). In addition, a state-trace analysis was performed (see Prince, Brown, & Heathcote, 2012, for an overview of state-trace analysis). The goal of this analysis was to determine whether a difference due to the experimental manipulations results from additional processes or simply from large congruency effects.

Experiment 1

In Experiment 1, young and older adults were tested in the same design as the one used in previous studies (Rey-Mermet & Gade, 2016; Rey-Mermet et al., 2019). That is, the Stroop task was integrated with the flanker task such that the color of the central letter of the color word was congruent or incongruent with the color of the flanking letters and congruent or incongruent with the meaning of the word (see Fig. 1). Participants were asked to respond to the color of the central letter while ignoring the meaning of the word and the color of the flanking letters. Here, the Stroop conflict results from the competition between the color of the central letter and the meaning of the word. Because this single-letter Stroop task has been found to eliminate the Stroop congruency effect (Besner, Stolz, & Boutilier, 1997), the presence of this congruency effect was ensured by presenting catch trials. In these trials, participants were asked to respond to the meaning of the word while ignoring the color of the letters.

Fig. 1 Example of one trial sequence. Participants were asked to indicate the color of the central letter while ignoring the meaning of the word and the flanking letters. The top panel shows examples of stimuli for each trial type (Stroop incongruent–flanker incongruent, Stroop incongruent–flanker congruent, Stroop congruent–flanker incongruent, and Stroop congruent–flanker congruent)

Testing older adults has the advantage that older participants typically show larger Stroop congruency effects than young adults (Bugg, DeLosh, Davalos, & Davis, 2007; Spieler, Balota, & Faust, 1996; but see Rey-Mermet & Gade, 2018; Verhaeghen, 2014). Therefore, if a large Stroop congruency effect is the necessary condition for observing the interaction between the Stroop and flanker congruency conditions (Hommel, 1997), an interaction would be expected in both age groups. Moreover, if the size of the interaction depends on the magnitude of the Stroop congruency effect, the reduction of the Stroop congruency effect should be larger for older adults than for young adults. In contrast, if the interaction between the conflicts results from a generalization of control processes, the reduction of the Stroop congruency effect should be smaller for older adults than for young adults. The reason is that, according to the account assuming age-related deficits in attentional-control processes (e.g., Hasher & Zacks, 1988; Hasher, Zacks, & May, 1999), older adults would be less able to recruit attentional control than young adults, and thus they would show a smaller shift of the control signal when the flanker task is incongruent. This should result in a smaller adjustment of attentional control and thus in a smaller reduction of the Stroop congruency effect for older adults.

Comparing the RTs of young and older adults might be problematic because older adults show slower RTs than do young adults by a constant proportion across many speeded tasks (e.g., Cerella & Hale, 1994; Myerson, Hale, Wagstaff, Poon, & Smith, 1990). To take this into account, the analyses were performed not only on raw RTs but also on RTs after a natural logarithm (log) transformation was applied.
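The rationale for the log transformation can be illustrated with a minimal sketch. It assumes a purely proportional slowing model (older RT = constant × young RT); the RT values and the slowing factor of 1.5 are hypothetical and were chosen only to show the arithmetic.

```python
import math

# Hypothetical mean RTs (ms) for a young group; illustrative values only.
young = {"congruent": 600.0, "incongruent": 720.0}

# Assumed constant proportional slowing for the older group (hypothetical).
slowing_factor = 1.5
older = {condition: rt * slowing_factor for condition, rt in young.items()}


def congruency_effect(rts, transform=lambda value: value):
    """Incongruent minus congruent RT, optionally after transforming each RT."""
    return transform(rts["incongruent"]) - transform(rts["congruent"])


# On raw RTs, proportional slowing inflates the congruency effect.
raw_young = congruency_effect(young)
raw_older = congruency_effect(older)

# On log RTs, the common factor cancels: log(k*a) - log(k*b) = log(a) - log(b).
log_young = congruency_effect(young, math.log)
log_older = congruency_effect(older, math.log)

print(raw_young, raw_older)
print(math.isclose(log_young, log_older))
```

Under this assumption, the raw congruency effect grows from 120 ms to 180 ms although nothing but overall speed has changed, whereas the log-transformed effects coincide. An age difference that survives the log transformation therefore cannot be attributed to proportional slowing alone.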

Method

Participants

Sixty-eight young adults (ages between 18 and 35 years) and 59 older adults (ages between 66 and 76 years) took part in the experiment (see Footnote 1). All older adults and 24 young adults participated in a previous study (Rey-Mermet et al., 2018). For that study, participants performed the Mini-Mental State (MMS; Folstein, Folstein, & McHugh, 1975) to screen for dementia, the Consortium to Establish a Registry for Alzheimer’s Disease–Plus (CERAD-Plus) test battery (Satzger et al., 2001) to assess their overall cognitive status, the SF-36 Questionnaire (Bullinger, 1998) to assess their health status, and the Beck Depression Inventory–II (BDI-II; Hautzinger, Keller, & Kühner, 2006) to assess depression. In addition to the BDI-II, older adults performed the Geriatric Depression Scale (GDS; Yesavage et al., 1983). Demographic information as well as the results on these background measures are presented in Appendix A.

The study was carried out according to the guidelines of the ethics committee of the Faculty of Arts and Social Sciences at the University of Zurich. All participants gave informed consent. At study completion, participants received CHF 45 (about US$45) or course credits.

Material

Similar to Rey-Mermet and Gade (2016, Experiment 1a), the experiment consisted of a stimulus–response mapping block, flanker and Stroop pure blocks, as well as mixed blocks. The stimulus–response mapping block served as practice to learn the four-choice stimulus–response mapping. The flanker and Stroop pure blocks served to familiarize participants with each task separately and to control for the presence of congruency effects within each task. Mixed blocks were the critical blocks in which both tasks were combined. The stimuli for each block type are described separately.

Stimulus–response mapping block

The stimuli consisted of a row of four “*” colored in either red, brown, violet, or white. These colors were used in order to have German color words (“rot”, “braun”, “violett”, and “weiss”, respectively) with an odd number of letters and thereby appropriate to create a flanker conflict in the mixed blocks. The stimuli were presented on a black background at the center of the computer screen in 54-point Arial font. The stimulus for each trial was determined randomly.

Pure flanker blocks

For the flanker task, the stimuli were the middle letter of the German words for red, brown, violet, and white (i.e., o for “rot”, a for “braun”, l for “violett”, and i for “weiss”). Each letter was presented five times in a row, displayed either in red, brown, violet, or white. All stimuli were presented at the center of the computer screen in 54-point Arial Bold font. In comparison to Rey-Mermet and Gade (2016), the size of the stimuli was increased so that all stimuli were easily perceived by older adults. The stimulus was determined randomly for each trial. A trial was congruent when the color of the central letter was the same as the color of the flanking letters. A trial was incongruent when the color of the central letter was different from the color of the flanking letters. Congruent and incongruent trials were presented equally often.

Pure Stroop blocks

The stimuli were the four German words red, brown, violet, and white (i.e., “rot”, “braun”, “violett”, and “weiss”), displayed in either red, brown, violet, or white. The stimuli were presented on a black background at the center of the computer screen in 107-point Arial Bold font. The stimulus was determined randomly for each trial. A trial was congruent when the color and the meaning of the word corresponded (e.g., the color word “red” printed in red). A trial was incongruent when the word was printed in a noncorresponding color (e.g., the color word “red” printed in brown). Congruent and incongruent trials were presented equally often. For the pure Stroop blocks, 25% of the trials were catch trials in which participants were asked to indicate the meaning of the word while ignoring the color. In these trials, the words “Wort lesen” (i.e., “read the word” in English) appeared in blue 54-point Arial Bold font 150 pixels above the centrally presented stimulus. These catch trials were randomly interspersed in the block.

Mixed blocks

The material for the mixed blocks was the same as for the pure Stroop blocks, except for the following modifications. First, the middle letter of the German words for red, brown, violet, or white (i.e., o for “rot”, a for “braun”, l for “violett”, and i for “weiss”) was displayed either in the color corresponding to the meaning of the word (Stroop congruent) or in a different color (Stroop incongruent). Moreover, the flanking letters of the color words (e.g., “r” and “t” for “rot”) were presented either in the same color as the middle letter (flanker congruent) or in a different color (flanker incongruent). Thus, there were four trial types, each occurring with a probability of 25%: Stroop incongruent and flanker incongruent, Stroop incongruent and flanker congruent, Stroop congruent and flanker incongruent, Stroop congruent and flanker congruent (see Fig. 1, top panel). Similar to the pure Stroop blocks, 25% of the trials in the mixed blocks were catch trials in which participants were asked to indicate the meaning of the word while ignoring the colors. These trials were randomly interspersed in the block.

Procedure

A trial sequence is illustrated in Fig. 1 (bottom panel). A trial started with a yellow fixation cross presented centrally for 500 ms. Then, the stimulus was presented until a response was given or 5,000 ms had elapsed. In case of an error, the word “Fehler” (i.e., “error” in English) was displayed centrally in yellow for 500 ms. Finally, an intertrial interval of 1,000 ms elapsed. To respond, participants used four response keys (y, v, m, -), which represent every third key on the bottom row of a QWERTZ keyboard and which were mapped to the colors red, brown, violet, and white, respectively.

All participants first performed the stimulus–response mapping block for practice. After this block, half of the participants started with a pure flanker block, followed by a pure Stroop block, whereas the other half started with a pure Stroop block followed by a pure flanker block. After the first two pure blocks, all participants performed six mixed blocks. After these critical blocks, participants performed a pure flanker block and a pure Stroop block in the same order as in the first two pure blocks (e.g., participants who started with a pure flanker block followed by a pure Stroop block again performed a pure flanker block followed by a pure Stroop block). The stimulus–response mapping block included 96 trials. Each pure block included 16 practice trials and 96 experimental trials. Each mixed block included 16 practice trials and 72 experimental trials. In comparison to Rey-Mermet and Gade (2016), the number of trials per mixed block was reduced in the present experiment to avoid large fatigue effects in older adults. The number of blocks was, however, increased so that the number of observations was the same in the present experiment as in the previous one. Overall feedback, consisting of the mean RT and error rate, was displayed at the end of each block. Participants could take brief rests after each block.

At the beginning of the experiment, participants were given short instructions about the task they had to carry out. Before each change of block type (e.g., from the pure flanker block to the mixed blocks), a short instruction for the now relevant task was also given. For the stimulus-response mapping block, participants were instructed to indicate the color of the stimulus. For the pure Stroop blocks, participants were instructed that for most trials they should indicate the color of the color word while ignoring the meaning of the word. However, when “read the word” appeared above the stimulus (catch trial), they should indicate the meaning of the word while ignoring the color of the color word. For the pure flanker blocks, participants were instructed to indicate the color of the central letter while ignoring the flanking letters. For the mixed blocks, participants were instructed that for most trials they should indicate the color of the central letter while ignoring the meaning of the word and the flanking letters. However, when “read the word” appeared above the stimulus (catch trial), they should indicate the meaning of the word while ignoring the colors.

Participants were tested in groups of up to five during two sessions of approximately 2 hours each (including breaks after every block and a longer break in the middle of each session). The two sessions were separated by at least 12 hours and at most 1 week. At the beginning of the first session, participants signed the informed consent sheet. Otherwise, both sessions were structured similarly. That is, participants first performed an unrelated working-memory task and then were given the instructions for the stimulus–response mapping block. The experiment was programmed using Tscope5 (Stevens, Lammertyn, Verbruggen, & Vandierendonck, 2006) and run on IBM-compatible computers.

Data preparation

The stimulus–response mapping block, which served as a practice block, and practice trials at the beginning of the pure and mixed blocks were not analyzed. The first trial of each block was considered a warm-up trial and thus was excluded. The dependent variables were RTs and error rates. For both dependent measures, catch trials and the first trial following a catch trial were also removed in the mixed and pure Stroop blocks. This excluded 44.41% of the trials for the mixed blocks and 46.03% of the trials for the pure Stroop blocks. An arcsine square root transformation was applied to the error rates for statistical analysis. For RTs, error trials and the trial immediately following an error were removed from the raw data set. Additionally, outlier RTs were excluded by removing RTs more than three standard deviations (SDs) below or above the mean for each congruency condition, participant, and session, separately. This further excluded 8.71% of the trials for the mixed blocks, 8% of the trials for the pure Stroop blocks, and 6.53% of the trials for the pure flanker blocks. Speed differences between young and older adults were controlled for (e.g., Myerson et al., 1990; Salthouse, 1996; Salthouse & Babcock, 1991) by applying a natural logarithm (log) transformation to all RTs. In that way, if RTs were increased in older age only due to general proportional slowing, then log-transformed RTs should not differ between age groups.
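The trimming and transformation steps described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the use of the sample standard deviation is an assumption (the estimator is not specified in the text), and the example RTs are invented.

```python
import math
import statistics


def trim_outliers(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the mean of one cell.

    A cell is assumed to hold the correct-trial RTs of one congruency
    condition, participant, and session. Using the sample SD is an
    assumption made for this sketch.
    """
    if len(rts) < 2:
        return list(rts)
    mean = statistics.fmean(rts)
    sd = statistics.stdev(rts)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]


def arcsine_sqrt(error_rate):
    """Arcsine square-root transformation of a proportion between 0 and 1."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a proportion between 0 and 1")
    return math.asin(math.sqrt(error_rate))


# Hypothetical cell: 20 plausible RTs (ms) plus one very slow response.
cell_rts = [580.0, 620.0] * 10 + [5000.0]

trimmed = trim_outliers(cell_rts)                # the 5,000-ms response is dropped
log_rts = [math.log(rt) for rt in trimmed]       # natural-log RTs for the age comparison
transformed_error = arcsine_sqrt(0.25)           # hypothetical error rate of 25%

print(len(cell_rts), len(trimmed), round(transformed_error, 4))
```

Applying the trimming separately to each cell, as in the text, keeps a slow condition or a slow participant from inflating the cutoff for everyone else.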

Data analysis

For the statistical analyses, the focus was on standard (no-catch) trials from the mixed blocks. For the sake of completeness, performance on pure blocks is presented in Appendix B (see Tables 11 and 12 as well as Fig. 9), and performance on the catch trials of the mixed blocks is presented in Appendix C (see Table 16 and Fig. 12). In the mixed blocks, the goal was to investigate whether the two age groups differed in the interaction between Stroop and flanker congruency conditions. To this end, in the NHST approach, a three-way ANOVA was carried out, with Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent) as within-subject factors, and age group (young, older) as a between-subjects factor. This was implemented in R (R Core Team, 2019) with the afex package (Singmann, Bolker, & Westfall, 2018). An alpha level of .05 was used for all these tests. Effect sizes are expressed as $\eta_G^2$ values (generalized eta squared; cf. Olejnik & Algina, 2003). In case of a significant three-way interaction, follow-up two-way ANOVAs were performed for each age group separately. In case of significant two-way interactions, follow-up t tests were performed, and effect sizes are expressed as Cohen’s d.

In addition to these ANOVAs, Bayesian ANOVAs were computed. These were implemented with default prior scales in R using the BayesFactor package (Morey & Rouder, 2015). The Bayes factor (BF) in favor of the alternative hypothesis (i.e., BF10) for the main effect model including either Stroop or flanker congruency was computed by comparing the main effect model against the null model. The BF10 for the two-way interaction models was estimated by comparing the interaction model (i.e., the model with both main effects and the interaction) against the main-effects model (i.e., the model with both main effects, but no interaction). That is, this BF was computed as the ratio $\mathrm{BF}_{\text{Interaction Model}} / \mathrm{BF}_{\text{Main Effects Model}}$. The BF10 for the three-way interaction model was estimated by comparing the three-way interaction model (i.e., the model with all main effects, all two-way interactions, and the three-way interaction) against the model including the main effects and the two-way interactions. That is, this BF was computed as the ratio $\mathrm{BF}_{\text{Three-way Interaction Model}} / \mathrm{BF}_{\text{Main Effects and Two-way Interactions Model}}$ (Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017). For each model comparison, the Bayes factor in favor of the null hypothesis (BF01) was computed as 1/BF10. Raftery’s (1995) classification scheme was used to interpret the BF. Thus, a BF between 1 and 3 was considered as weak evidence, a BF between 3 and 20 as positive evidence, a BF between 20 and 150 as strong evidence, and a BF larger than 150 as very strong evidence.
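The model-comparison arithmetic and the classification scheme described above can be sketched in a few lines. The function names are hypothetical, the input Bayes factors are invented, and the handling of the exact boundary values (3, 20, and 150) is an assumption, because the text gives the categories only as ranges.

```python
def interaction_bf10(bf_interaction_model, bf_main_effects_model):
    """BF10 for an interaction: interaction model relative to main-effects model.

    Both inputs are assumed to be Bayes factors of the respective model
    against the same null model, so the null model cancels in the ratio.
    """
    return bf_interaction_model / bf_main_effects_model


def bf01_from_bf10(bf10):
    """Bayes factor in favor of the null hypothesis."""
    return 1.0 / bf10


def raftery_label(bf):
    """Verbal label for a Bayes factor of at least 1 (ranges as given in the text).

    Assigning the boundary values to the lower category is an assumption of
    this sketch.
    """
    if bf < 1.0:
        raise ValueError("pass the Bayes factor for the favored hypothesis (>= 1)")
    if bf <= 3.0:
        return "weak"
    if bf <= 20.0:
        return "positive"
    if bf <= 150.0:
        return "strong"
    return "very strong"


# Hypothetical Bayes factors of two models against a common null model.
bf10 = interaction_bf10(bf_interaction_model=50.0, bf_main_effects_model=10.0)
bf01 = bf01_from_bf10(bf10)

print(bf10, bf01, raftery_label(bf10))
```

In this invented case, the interaction model is favored over the main-effects model by a factor of 5, which falls in the range labeled positive evidence.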

In a further step, a state-trace analysis was conducted (see Prince et al., 2012). The goal of this analysis was to determine whether cognitive processes are added in the Stroop congruency effect within and across age groups in flanker incongruent trials compared with flanker congruent trials. To this end, a state-trace plot was created in which the Stroop congruency effect for flanker congruent trials is plotted on the x-axis and the Stroop congruency effect for flanker incongruent trials is plotted on the y-axis. In this scatterplot, each participant of each age group (young and older) yields one data point. The aim of this plot was to investigate whether the data displayed in the state-trace plot can be explained with one single line or two lines (one for young adults, one for older adults). If a single line fit the data, it would imply that the two age groups did not differ in the relation between the Stroop congruency effect on flanker congruent trials and that on flanker incongruent trials. In that case, no process was added in the older age group, and a difference between the age groups only occurred because the Stroop congruency effect was larger in the older age group than in the young age group. If two lines were necessary, the interpretation would depend on the pattern that emerged from the two lines. If both lines were parallel to the diagonal but differed in elevation, it would imply that the relation between the Stroop congruency effect in flanker incongruent and congruent trials is additive. In this case, the intercept would be significantly different from zero. In contrast, if both lines diverged with a slope larger than 1, the impact of flanker incongruent trials on the Stroop congruency effect would be considered multiplicative.

To determine statistically whether one or two lines were necessary, the following linear regression model was fitted:

$$ {RT}_{\text{Stroop effect - flanker incongruent},\,i} = \beta_0 + \beta_1\,\mathrm{age\ group} + \beta_2\,{RT}_{\text{Stroop effect - flanker congruent},\,i} + \beta_3\,\left(\mathrm{age\ group} \times {RT}_{\text{Stroop effect - flanker congruent},\,i}\right) + \varepsilon_i, $$

where $RT_{\text{Stroop effect - flanker incongruent},\,i}$ is the Stroop congruency effect for flanker incongruent trials from participant i, $RT_{\text{Stroop effect - flanker congruent},\,i}$ is the Stroop congruency effect for flanker congruent trials from participant i, age group is a dummy variable in which the young and older age groups were coded with 0 and 1, respectively, $\beta_0$ is the intercept, $\beta_1$ is the effect of the age group (young vs. older) on the intercept, $\beta_2$ is the slope relating flanker congruent trials to flanker incongruent trials, $\beta_3$ is the effect of the age group on the slope, and $\varepsilon_i$ is the residual for participant i.
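The meaning of the four coefficients can be illustrated with a minimal sketch. It relies on the fact that, with a 0/1 dummy and its interaction with the predictor, the least-squares estimates of the full model coincide with those obtained by fitting one simple regression line per group. The function names and the data are hypothetical; this is not the analysis code of the study, and it yields point estimates only (no standard errors or tests).

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_full_model(x_young, y_young, x_older, y_older):
    """Coefficients of the dummy-coded model (young = 0, older = 1).

    x: Stroop effect on flanker congruent trials (one value per participant).
    y: Stroop effect on flanker incongruent trials.
    """
    intercept_young, slope_young = fit_line(x_young, y_young)
    intercept_older, slope_older = fit_line(x_older, y_older)
    return {
        "beta0": intercept_young,                    # intercept of the young group
        "beta1": intercept_older - intercept_young,  # age effect on the intercept
        "beta2": slope_young,                        # slope of the young group
        "beta3": slope_older - slope_young,          # age effect on the slope
    }


# Hypothetical Stroop effects (ms); both groups lie on the line y = 10 + 0.7 * x,
# so a single line describes the data and beta1 and beta3 are (near) zero.
x_young, y_young = [100.0, 150.0, 200.0], [80.0, 115.0, 150.0]
x_older, y_older = [150.0, 250.0, 350.0], [115.0, 185.0, 255.0]

params = fit_full_model(x_young, y_young, x_older, y_older)
print({name: round(value, 6) for name, value in params.items()})
```

In this invented example, the older group has larger Stroop effects on both axes but falls on the same line as the young group, which corresponds to the single-line case of the state-trace plot.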

As the goal was to determine the impact of age group on the Stroop congruency effect in flanker incongruent trials compared with flanker congruent trials, the primary focus was on the interaction term. Therefore, a full regression model was compared against a restricted model in which the interaction term was removed. Model selection was based on multiple fit indices: R², the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance (= −2 × log-likelihood). Except for R², smaller indices indicate better fit. To examine whether one model (restricted vs. full) fit the data reliably better than the other, two analyses were performed. First, a χ² difference (Δχ²) test was conducted on the nested models. If the more complex model (i.e., the model with more free parameters) yields a reduction in χ² that is significant given the loss of degrees of freedom, it is accepted as having better fit. Second, a Bayesian hypothesis test using the BIC approximation was performed (Wagenmakers, 2007). That is, the difference between the BIC for the null hypothesis (i.e., the restricted model) and the BIC for the alternative hypothesis (i.e., the full model) was used to compute a Bayes factor in favor of the null hypothesis (BF01). The Bayes factor in favor of the alternative hypothesis (BF10) was computed as 1/BF01. Thus, if the Δχ² test was not significant and the BF01 constituted positive to very strong evidence, it would imply that the restricted model had a better fit than the full model. A single line would then be sufficient to account for the data in the state-trace plot. In contrast, if the Δχ² test was significant and the BF10 constituted positive to very strong evidence, it would imply that the full model had a better fit. In this case, the estimates of the fixed parameters were examined.
Only if the estimate of the interaction term was significant were two lines (one for the young age group and one for the older age group) considered necessary to account for the data in the state-trace plot.
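The two model comparisons can be illustrated with a short sketch. It assumes the standard BIC approximation described by Wagenmakers (2007), BF01 ≈ exp((BIC_full − BIC_restricted) / 2), and a Δχ² test with one degree of freedom, because the restricted model drops a single parameter (the interaction term). The function names and the example values are hypothetical and serve only as an illustration.

```python
import math


def bf01_from_bic(bic_restricted, bic_full):
    """Approximate Bayes factor in favor of the restricted (null) model."""
    return math.exp((bic_full - bic_restricted) / 2.0)


def chi2_difference_p_df1(deviance_restricted, deviance_full):
    """p value of the chi-square difference test for nested models, 1 df.

    For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    delta = max(deviance_restricted - deviance_full, 0.0)
    return math.erfc(math.sqrt(delta / 2.0))


# Hypothetical fit indices, not values from Table 5.
bf01 = bf01_from_bic(bic_restricted=100.0, bic_full=104.0)
bf10 = 1.0 / bf01
p_value = chi2_difference_p_df1(deviance_restricted=90.0, deviance_full=89.0)
```

A BIC difference of 4 in favor of the restricted model thus corresponds to a BF01 of about 7.4, which falls in the range usually labeled positive evidence.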

Results

The most relevant results are whether participants in both age groups (i.e., young and older adults) differ in the modulation of the Stroop congruency effect depending on whether the trials are flanker congruent or incongruent (i.e., the interaction between both congruency variables). First, the results of the ANOVAs from the NHST and Bayesian hypothesis testing approach are reported. Then, the results for the state-trace analysis are presented.

Analysis of variance

The descriptive results for raw and log-transformed RTs as well as for error rates are presented in Fig. 2. The results from the NHST and Bayesian analyses are presented in Table 3. For RTs and log-transformed RTs, the ANOVA showed a significant three-way interaction between Stroop congruency, flanker congruency, and age group. The Bayesian analysis suggested strong to very strong evidence in favor of the three-way interaction model. This interaction was then disentangled by conducting follow-up two-way repeated-measures ANOVAs, with the factors Stroop congruency and flanker congruency, for each age group separately. The results of these follow-up ANOVAs are presented in Table 4. For the young adults, the ANOVA showed a significant two-way interaction for RTs and log-transformed RTs. The Bayesian analysis suggested very strong evidence in favor of the interaction model for both dependent measures. For RTs, the Stroop congruency effect was smaller, but still significant, when the trials were flanker incongruent (124 ms), t(67) = 18.25, p < .001, d = 2.21, BF10 = 3.12 × 10²⁴, BF01 = 3.21 × 10⁻²⁵, compared with when they were flanker congruent (178 ms), t(67) = 21.74, p < .001, d = 2.64, BF10 = 5.92 × 10²⁸, BF01 = 1.69 × 10⁻²⁹ (see Fig. 2a). A similar pattern was observed for log-transformed RTs (see Fig. 2b; Stroop congruency effect for flanker incongruent trials: M = 0.15, t(67) = 21.57, p < .001, d = 2.62, BF10 = 3.7 × 10²⁸, BF01 = 2.66 × 10⁻²⁹; Stroop congruency effect for flanker congruent trials: M = 0.22, t(67) = 25.90, p < .001, d = 3.14, BF10 = 1.78 × 10³³, BF01 = 5.60 × 10⁻³⁴). For the older adults, the ANOVA also showed a significant two-way interaction for RTs and log-transformed RTs (see Table 4). The Bayesian analysis suggested very strong evidence in favor of the interaction model for both dependent measures (see Table 4).
For RTs, the Stroop congruency effect was smaller, but still significant, when the trials were flanker incongruent (142 ms), t(58) = 14.85, p < .001, d = 1.93, BF10 = 2.27 × 10¹⁸, BF01 = 4.41 × 10⁻¹⁹, compared with when they were flanker congruent (300 ms), t(58) = 14.06, p < .001, d = 1.83, BF10 = 2.00 × 10¹⁷, BF01 = 5.01 × 10⁻¹⁸ (see Fig. 2a). A similar pattern was observed for log-transformed RTs (see Fig. 2b; Stroop congruency effect for flanker incongruent trials: M = 0.14, t(58) = 16.54, p < .001, d = 2.15, BF10 = 3.19 × 10²⁰, BF01 = 3.14 × 10⁻²¹; Stroop congruency effect for flanker congruent trials: M = 0.26, t(58) = 17.40, p < .001, d = 2.27, BF10 = 3.52 × 10²¹, BF01 = 2.84 × 10⁻²²). Thus, the reduction of the Stroop congruency effect in flanker incongruent trials was larger for older adults (158 ms and 0.13) than for young adults (55 ms and 0.06 for RTs and for log-transformed RTs, respectively).

Fig. 2

Experiment 1. Modulation of the Stroop congruency effect by flanker congruency within a trial for both age groups (young and older). Mean reaction times (RTs) with raw RTs (a) and log-transformed RTs (b) as well as mean raw error rates (c) for both Stroop and flanker congruency. Error bars represent within-subject confidence intervals (see Cousineau, 2005; Morey, 2008)

Table 3 Experiment 1: Inferential statistical values for the three-way analysis of variance (ANOVA), with Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent) as within-subject factors, and age group (young, older) as a between-subjects factor, and Bayes factors (BF) from the model comparisons
Table 4 Experiment 1: Inferential statistical values for the follow-up two-way repeated-measures analysis of variance (ANOVA), with the factors Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent), and Bayes factors (BF) from the model comparisons for each age group (young and older) separately

For the error rates, the three-way interaction between Stroop congruency, flanker congruency, and age group was significant, but the Bayesian analysis suggested inconclusive evidence regarding the model including the three-way interaction. Nevertheless, the three-way interaction was disentangled by computing two-way repeated-measures ANOVAs with the factors Stroop congruency and flanker congruency for each age group separately. For young adults, the two-way ANOVA showed no significant interaction, and the Bayesian analysis suggested positive evidence against the interaction model. Thus, the Stroop congruency effect was similar for flanker incongruent and congruent trials (.04 and .05, respectively). For the older adults, the two-way ANOVA showed a significant two-way interaction, and the Bayesian analysis suggested strong evidence in favor of the interaction model. In line with the RT results, the Stroop congruency effect in error rates was smaller, but still significant, when the trials were flanker incongruent (.01), t(58) = 3.56, p = .001, d = 0.46, BF10 = 35.10, BF01 = 0.03, compared with when they were flanker congruent (.03), t(58) = 6.47, p < .001, d = 0.84, BF10 = 5.63 × 10⁵, BF01 = 1.78 × 10⁻⁶ (see Fig. 2c). Again, the reduction of the Stroop congruency effect in flanker incongruent trials was larger for older adults (.02) than for young adults (.005).

State-trace analysis

The state-trace plot is depicted in Fig. 3. Goodness-of-fit statistics and model comparison results are presented in Table 5 (upper part). Estimates of the fixed parameters are summarized in Table 6 (upper part). As shown in Table 5, model fit was relatively similar for both models (i.e., with and without the interaction term). For both dependent measures, the Δχ² was not significant, and the Bayesian analysis suggested small to positive evidence for the model without the interaction term. Together, this indicates that the data displayed in the state-trace plot (see Fig. 3) were better explained with one single line.

Fig. 3

Experiment 1: State-trace plot. Solid line results from the regression analysis using the young adults only. Dashed line results from the regression analysis using the older adults only. a Reaction times. b Log-transformed reaction times

Table 5 Experiments 1 and 2: State-trace analysis. Goodness-of-fit statistics and model comparison results between the full regression model and the restricted regression model (i.e., without the interaction term)
Table 6 Experiments 1 and 2: State-trace analysis. Estimates of the fixed parameters for the full regression model and for the restricted regression model (i.e., without the interaction term). Standard errors are presented in parentheses

In sum, the state-trace results speak against an age-group difference in the relation between the Stroop congruency effect in flanker congruent trials and the Stroop congruency effect in flanker incongruent trials. Therefore, no additional process needs to be assumed for the older age group compared with the young age group.

Discussion

The results showed an underadditive interaction between Stroop and flanker congruency in both age groups for raw and log-transformed RTs. Critically, the reduction of the Stroop congruency effect in flanker incongruent trials was larger for older adults than for young adults. Thus, even after age-related differences in processing speed were removed, the size of the interaction still increased when the magnitude of the Stroop congruency effect increased. These findings cannot be explained by an account assuming age-related deficits in attentional-control processes (e.g., Hasher & Zacks, 1988; Hasher et al., 1999). According to such an account, in comparison to young adults, older adults should show a smaller, not a larger, reduction of the Stroop congruency effect. In contrast, the results of Experiment 1 are in line with the view that a large Stroop congruency effect is more reduced and thus creates an optimal condition for finding an interaction between Stroop and flanker congruency (Hommel, 1997).

Experiment 2

As the single-letter Stroop task was found to eliminate the congruency effect (Besner et al., 1997), the size of the Stroop congruency effect was increased in Experiment 1 as well as in the previous experiments (Rey-Mermet & Gade, 2016; Rey-Mermet et al., 2019) by including catch trials. The same logic was used in Experiment 2, so that this experiment included two conditions: a 25%-catch condition, in which 25% of the trials were catch trials, and a 0%-catch condition, in which no catch trials were presented. The presence of a substantial Stroop congruency effect was, however, ensured in both conditions by using the standard Stroop task instead of the single-letter Stroop task. That is, in Experiment 2, participants saw three colored color words, and they were asked to respond to the color of the central word while ignoring the meaning of the words and the flanking words. Because the distance between the central word and the flanker words is critical for the size of the flanker congruency effect (e.g., Eriksen & Eriksen, 1974), the position of the flankers was also manipulated. For half of the participants, flanker words were presented to the left or right of the central word (see Fig. 4, top part). As this horizontal alignment might create some distance between the central and the flanking words, the flankers were also presented vertically (see Fig. 4, bottom part). That is, for the other half of the participants, flanking words were presented above or below the central word. As in Experiment 1, if a large Stroop congruency effect is critical (Hommel, 1997), participants in the 25%-catch condition, compared with those in the 0%-catch condition, would show a larger reduction of the Stroop congruency effect in flanker incongruent trials than in flanker congruent trials.

Fig. 4

Experiment 2. Example of one trial sequence presented when flanking words were displayed horizontally (horizontal flanker position, top part) and when flanking words were displayed vertically (vertical flanker position, bottom part). Participants were asked to indicate the color of the central word while ignoring the meaning of the words and the flanking words

Method

Participants

In total, 97 new young participants were tested. Two participants were removed (one because of accuracy rates at chance level, and one because of a technical error). The final sample consisted of 95 participants (74 women, mean age = 23.5 years, SD = 3.5). Twenty-three participants were assigned to the 25%-catch condition with vertical flankers; 24 were assigned to each of the remaining conditions (i.e., 25%-catch condition with horizontal flankers, 0%-catch condition with vertical flankers, and 0%-catch condition with horizontal flankers). At study completion, participants received CHF 15 (about US$15) or course credits.

Material

Material was the same as in Experiment 1, except for the following modifications. First, the stimuli were presented in 40-point Arial Bold font for the stimulus–response mapping block, and 28-point Arial Bold font for the pure Stroop and flanker blocks. Second, for the pure flanker blocks, each letter was presented three times in a row for the horizontal flanker position, and in a column for the vertical flanker position. The distance between the letters was 5 pixels in the horizontal flanker position and 25 pixels in the vertical flanker position. Third, for the mixed blocks, the same three color words were presented either in a row for the horizontal flanker position or in a column for the vertical flanker position (see Fig. 4). The central word was displayed either in the same color as the meaning of the word (Stroop congruent) or in a different color (Stroop incongruent). The flanking words were presented either in the same color as the color of the central word (flanker congruent) or in a different color (flanker incongruent).

Procedure

The procedure was similar to Experiment 1, except for the following modifications. First, in the mixed blocks, participants were instructed to indicate the color of the central color word while ignoring the meaning of the word and the flanking color words. Second, each pure block included 192 trials, and each mixed block included 144 trials. Third, in the 0%-catch condition, no catch trials were presented. The 25%-catch condition was similar to Experiment 1 with 25% of the trials being catch trials. Overall, the experiment consisted of one session of about 90 minutes.

Data preparation

The data preparation was similar to Experiment 1, except for the following modifications. The trimming procedure dismissed 21.95% of the trials for the mixed blocks (44.36% when only the 25%-catch condition was considered), and 22.91% of the trials for the pure Stroop blocks (46.30% when only the 25%-catch condition was considered). The additional exclusions for RTs removed 11.86% of the trials for the mixed blocks, 13.07% of the trials for the pure Stroop blocks, and 9.40% of the trials for the pure flanker blocks.

Data analysis

Analyses for the pure blocks are presented in Appendix B (i.e., Tables 13 and 14, as well as Fig. 10), and analyses for the catch trials of the mixed blocks are presented in Appendix C (see Table 17 and Fig. 13). As in Experiment 1, the focus was on performance on standard (no-catch) trials, and ANOVAs from both an NHST and a Bayesian hypothesis testing approach were conducted. That is, a four-way repeated-measures ANOVA was carried out with Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent) as within-subject factors, and catch condition (0%, 25%) and flanker position (horizontal, vertical) as between-subjects factors. In a further step, a state-trace analysis was performed similar to Experiment 1, except that age group (young vs. older) was replaced by catch condition (0% vs. 25%).

Results

The main objective of the study was to determine the impact of catch trials on the interaction between Stroop and flanker congruency. First, the results of the ANOVAs from the NHST and Bayesian hypothesis testing approach are reported. Then, the results of the state-trace analysis are presented.

Analysis of variance

For the ANOVAs, the most relevant results are the Stroop congruency effect, depending on whether the trials are flanker congruent or incongruent in both catch conditions (0% and 25%). The descriptive results are depicted in Fig. 5. The results of the four-way ANOVA are presented in Table 7.

Fig. 5

Experiment 2. Modulation of the Stroop congruency effect by flanker congruency within a trial in both catch conditions (0% and 25%). Mean reaction times (RTs, a) and mean raw error rates (b) for both Stroop and flanker congruency. Error bars represent within-subject confidence intervals (see Cousineau, 2005; Morey, 2008)

Table 7 Experiment 2: Inferential statistical values for the four-way analysis of variance (ANOVA), with Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent) as within-subject factors, and catch condition (0%, 25%) and flanker position (horizontal, vertical) as between-subjects factors, and Bayes factors (BF) from the model comparisons

Of most interest, the ANOVA for RTs showed a significant interaction between Stroop congruency, flanker congruency, and catch condition. The Bayesian analysis suggested, however, only weak evidence in favor of the model including the three-way interaction. This interaction was disentangled by conducting follow-up two-way repeated-measures ANOVAs, with the factors Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent) for each catch condition separately. As shown in Table 8, the interaction between Stroop congruency and flanker congruency was not significant in the 0%-catch condition. In line with the NHST analysis, the Bayesian analysis suggested positive evidence against the interaction model. Thus, the magnitude of the Stroop congruency effect was relatively similar across flanker incongruent and congruent trials (89 and 99 ms, respectively). It was significant in both trial types, t(47) = 11.96, p < .001, d = 1.73, BF10 = 8.87 × 10¹², BF01 = 1.13 × 10⁻¹³, and t(47) = 11.68, p < .001, d = 1.69, BF10 = 3.91 × 10¹², BF01 = 2.55 × 10⁻¹³, respectively. In the 25%-catch condition, however, the two-way ANOVA showed a significant interaction between Stroop congruency and flanker congruency, and the Bayesian analysis suggested positive evidence in favor of the interaction model (see Table 8). Thus, the Stroop congruency effect was smaller when the stimuli were flanker incongruent (156 ms), t(46) = 14.30, p < .001, d = 2.09, BF10 = 3.41 × 10¹⁵, BF01 = 2.93 × 10⁻¹⁶, compared with when they were flanker congruent (206 ms), t(46) = 13.83, p < .001, d = 2.02, BF10 = 1.02 × 10¹⁵, BF01 = 9.76 × 10⁻¹⁶. Together, this shows an underadditive interaction between Stroop and flanker congruency in the 25%-catch condition, but no interaction in the 0%-catch condition (see Fig. 5).

Table 8 Experiment 2: Inferential statistical values for the follow-up two-way repeated-measures analysis of variance (ANOVA), with the factors Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent), and Bayes factors (BF) from the model comparisons for each catch condition (0%, 25%) separately

For the error rates, the ANOVA from the NHST approach revealed no significant interaction including Stroop congruency and flanker congruency. In line with the NHST approach, the Bayesian hypothesis testing approach suggested small to positive evidence against the models including such interactions. Therefore, no speed–accuracy trade-offs compromised the effects of interest.

State-trace analysis

The state-trace plot is depicted in Fig. 6. Goodness-of-fit statistics and model comparison results are presented in Table 5 (lower part). Estimates of the fixed parameters are summarized in Table 6 (lower part). As shown in Table 5, model fit was better for the model without the interaction term than for the model including the interaction term. Moreover, the Δχ² was not significant, and the Bayesian analysis suggested positive evidence for the model without the interaction term. Thus, the data displayed in the state-trace plot were better explained with one single line (see Fig. 6).

Fig. 6

Experiment 2: State-trace plot. Solid line results from the regression analysis using the 0%-catch condition only. Dotted line results from the regression analysis using the 25%-catch condition only. Reaction times are given in milliseconds

In sum, the state-trace results speak against a difference between the two catch conditions in the relation between the Stroop congruency effect in flanker congruent trials and the Stroop congruency effect in flanker incongruent trials. Therefore, no additional process needs to be assumed for the 25%-catch condition compared with the 0%-catch condition. This suggests that the interaction between Stroop and flanker congruency became significant in the 25%-catch condition only because the Stroop congruency effect was larger in that condition than in the 0%-catch condition.

Discussion

At first, the results of Experiment 2 suggested that the interaction between Stroop and flanker congruency only occurred when catch trials were occasionally presented (i.e., in the 25%-catch condition). However, the state-trace analysis indicated that the interaction was found in that condition only because the Stroop congruency effect was larger than in the condition without catch trials (i.e., the 0%-catch condition). These findings are in line with the view that a large Stroop congruency effect is a necessary condition for observing a significant interaction between Stroop and flanker congruency (Hommel, 1997).

Experiment 3

The goal of Experiment 3 was to increase the size of the Stroop congruency effect without presenting catch trials. Previous research has shown that the Stroop congruency effect was larger when responses were given vocally (e.g., Sharma & McKenna, 1998; Shichel & Tzelgov, 2018). Experiment 3 thus had the same design as the 0%-catch condition of the previous experiment, but participants were asked to respond vocally. Again, if a large Stroop congruency effect is the necessary condition for observing an interaction (Hommel, 1997), the interaction between Stroop and flanker congruency would be expected to be significant in Experiment 3.

Method

Participants

In total, 28 new young participants were tested. Two participants were removed because they had already participated in a previous study using a similar design, and three participants were removed because the computer malfunctioned. The final sample consisted of 23 participants (21 women, mean age = 21.7 years, SD = 2.2). The study was carried out according to the guidelines of the ethics committee of the Catholic University of Eichstätt-Ingolstadt. Informed consent was acquired from all participants. At study completion, participants received 8 € per hour (about US$9) or course credits.

Material and procedure

Material and procedure were similar to the 0%-catch condition with vertical flankers of Experiment 2, except that participants were asked to respond vocally. To this end, a microphone was used, and the experiment was programmed using the E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA). The stimulus remained on-screen until the participant responded vocally into the microphone, which stopped the timer and eliminated the stimulus from the screen. RT in milliseconds was measured from the stimulus onset until the participant responded. Then, the screen blanked for 500 ms. During this blank screen, the experimenter typed whether or not the response was correct so that the computer could evaluate errors. In case of an error, the word “Fehler” (i.e., “error” in English) was displayed for 500 ms. Finally, an intertrial interval of 500 ms occurred. In addition, the pure Stroop and flanker blocks were divided into miniblocks of 96 trials each so that participants had more breaks. In the mixed and pure flanker blocks, the distance between the flankers and the central stimulus was 0.1 cm. All stimuli were presented in uppercase in 28-point Arial Bold font.

Data preparation

The same data preparation was used as in Experiment 1, except that trials in which the microphone was malfunctioning and the experimenter did not code the accuracy of the response were additionally excluded (i.e., 1.01% of trials for the mixed blocks, 1.51% for the pure Stroop blocks, and 1.54% for the pure flanker blocks). Overall, the trimming procedure dismissed 1.70% of the trials for the mixed blocks, 2.03% of the trials for the pure Stroop blocks, and 2.05% of the trials for the pure flanker blocks. The additional exclusions for RTs removed 8.19% of the trials for the mixed blocks, 6.82% of the trials for the pure Stroop blocks, and 4.69% of the trials for the pure flanker blocks.

Data analysis

Analyses on pure blocks are reported in Appendix B (see Tables 14 and 15 as well as Fig. 11). Performance from the mixed blocks was analyzed using both an NHST and a Bayesian hypothesis testing approach. That is, a two-way repeated-measures ANOVA was conducted with the factors Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent).

Results

In Experiment 3, the focus was on the modulatory effect of the flanker congruency on the Stroop congruency effect. The most relevant results are thus the RTs from the Stroop congruency effect, depending on whether the trials are flanker congruent or incongruent. The descriptive results are depicted in Fig. 7. The results of the two-way ANOVA are shown in Table 9.

Fig. 7

Experiment 3. Modulation of the Stroop congruency effect by flanker congruency within a trial. Mean reaction times (RTs, a) and mean raw error rates (b) for both Stroop and flanker congruency. Error bars represent within-subject confidence intervals (see Cousineau, 2005; Morey, 2008)

Table 9 Experiment 3: Inferential statistical values for the two-way repeated-measures analysis of variance (ANOVA), with the factors Stroop congruency (incongruent, congruent) and flanker congruency (incongruent, congruent), and Bayes factors (BF) from the model comparisons

Of most interest, the ANOVA on the RTs showed a significant two-way interaction between Stroop congruency and flanker congruency. Bayesian analysis suggested small evidence in favor of the interaction model. The interaction was disentangled by conducting follow-up t tests. The Stroop congruency effect was smaller when the stimuli were flanker incongruent (94 ms), t(22) = 17.03, p < .001, d = 3.55, BF10 = 1.65 × 10¹¹, BF01 = 6.04 × 10⁻¹², compared with when they were flanker congruent (109 ms), t(22) = 20.63, p < .001, d = 4.30, BF10 = 7.26 × 10¹², BF01 = 1.38 × 10⁻¹³. For the error rates, the ANOVA from the NHST approach showed no significant interaction, and Bayesian analysis suggested positive evidence against the interaction model (see Table 9). Together, this shows an underadditive interaction between Stroop and flanker congruency for the RTs.

Discussion

The results of Experiment 3 showed an underadditive interaction in RTs between Stroop and flanker congruency when participants were asked to respond vocally and when no catch trials were presented. These findings further emphasize the magnitude of the Stroop congruency effect as the key component in observing an interaction between congruency conditions of conflict tasks in RTs (Hommel, 1997).

General discussion

The purpose of the present study was to determine whether a large congruency effect is a necessary condition for observing an interaction in RTs between the congruency conditions of two conflict tasks combined within the same trial. To this end, the Stroop and flanker tasks were integrated within the same trial by asking participants to respond to the color of the central letter or word while ignoring the meaning of the word and the color of the flanking letters or words. Critically, the size of the Stroop congruency effect was increased by three different manipulations. First, older adults were tested (Experiment 1). Second, the proportion of catch trials (0% vs. 25%) was manipulated (Experiment 2); in catch trials, participants were asked to respond to the meaning of the word instead of responding to the color of the color word. Third, participants were asked to respond vocally (Experiment 3). Data were analyzed using different analyses: ANOVA from the NHST approach, Bayesian ANOVA from the Bayesian hypothesis testing approach, and a state-trace analysis. The results of the present study are summarized in Fig. 8, which illustrates the magnitude of the interaction and of the Stroop congruency effect for each participant of each experimental condition in each experiment, as well as both effects averaged across experimental conditions in each experiment.

Fig. 8

Modulation of the interaction between Stroop and flanker congruency (i.e., the Stroop congruency effect in flanker congruent trials minus the Stroop congruency effect in flanker incongruent trials) by the magnitude of the Stroop congruency effect. Reaction times are given in milliseconds. In this scatterplot, each empty point represents a participant of each experimental condition (i.e., young or older adults either in the 0%- or 25%-catch condition for Experiment 1, Experiment 2, or Experiment 3). Each solid point represents the mean for each experimental condition and each experiment. The solid line results from the linear mixed model in which the magnitude of the interaction was predicted by the fixed Stroop congruency effect with experimental condition and participant as random intercept effects. The slope relating the Stroop congruency effect to the magnitude of the interaction was significant (slope = 0.67, standard error = .07, t(226.14) = 10.07, p < .001)

The results showed an interaction in RTs between Stroop and flanker congruency in conditions with a large congruency effect, such as when young and older adults performed a single-letter Stroop task with 25% of catch trials (Experiment 1), when young adults performed a standard Stroop task with 25% of catch trials (Experiment 2, 25%-catch condition), and when young adults responded vocally to the standard Stroop task (without any catch trials, Experiment 3). In all these conditions, the Stroop congruency effect was larger than 100 ms. There was only one condition in which no significant interaction between the congruency conditions was observed, namely when young adults performed the standard Stroop task without any catch trials (Experiment 2, 0%-catch condition). In this condition, the congruency effect was slightly smaller than 100 ms (i.e., 94 ms). Moreover, as shown in Fig. 8, the size of the interaction was found to be affected by the magnitude of the Stroop congruency effect. This was also observed in Experiment 1 in which older adults were tested in addition to young adults. In this experiment, the reduction of the Stroop congruency effect in flanker incongruent trials was still substantial, even after differences in general processing speed between young and older adults were taken into consideration by applying a log-transformation to all RTs.

At first glance, it might be surprising that as soon as the Stroop congruency effect was slightly under 100 ms (i.e., in the 0%-catch condition of Experiment 2), the interaction was no longer significant. In contrast, it was sufficient that the Stroop congruency effect was slightly above the limit of 100 ms to observe a significant interaction between Stroop and flanker congruency (e.g., 102 ms in Experiment 3). However, participants were overall faster at responding in Experiment 3 than in Experiment 2 (608 ms vs. 755 ms for the 0%-catch condition, respectively). Thus, when the Stroop congruency effect is expressed as a proportional score (i.e., as the difference in RTs between Stroop incongruent and congruent trials, divided by the RTs of Stroop congruent trials), the differences between Experiment 2 (0%-catch) and Experiment 3 become more evident (0.12 and 0.17, respectively; see Footnote 2).
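The proportional score defined above can be written as a one-line function. The sketch below uses made-up mean RTs (not the condition means of the present experiments) to show how the same absolute congruency effect yields a larger proportional score when responses are faster overall; the function name is hypothetical.

```python
def proportional_stroop_effect(rt_incongruent, rt_congruent):
    """Congruency effect divided by the RT of Stroop congruent trials."""
    return (rt_incongruent - rt_congruent) / rt_congruent


# Hypothetical mean RTs in ms: the absolute effect is 100 ms in both cases.
fast_responders = proportional_stroop_effect(600.0, 500.0)   # 100 / 500
slow_responders = proportional_stroop_effect(900.0, 800.0)   # 100 / 800
```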

This suggests that it is not the magnitude of the Stroop congruency effect per se which is critical for the interaction between Stroop and flanker congruency. It is rather the magnitude in relation to the overall response speed (Frühholz et al., 2011; Hommel, 1997; but see Rey-Mermet & Gade, 2016). This possibility was investigated by computing delta plots for each experiment (see Appendix D). Delta plots illustrate the changes in the congruency effect as a function of response speed (Pratte, Rouder, Morey, & Feng, 2010). In the case of the Stroop task, delta plots show an increase of the congruency effect as responses slow down. As shown in Appendix D, this pattern of changes was similar for flanker incongruent trials and flanker congruent trials across the different conditions and experiments. Critically, as soon as the mean RTs were slower than ca. 800 ms, the size of the interaction (i.e., the difference in the Stroop congruency effect between flanker incongruent and flanker congruent trials) also increased. This occurred irrespective of the conditions (young vs. older, 0%-catch vs. 25%-catch) and the experiments. These results are in line with the findings of the state-trace analysis showing that the Stroop congruency effect was simply larger for the older adults than for the young adults in Experiment 1, and for the 25%-catch condition than for the 0%-catch condition in Experiment 2. Thus, although there was a difference between age groups and between both catch conditions, no additional process is necessary to explain the interaction.
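One common way to construct a delta plot is to sort the RTs of each congruency condition, split them into equally sized speed bins, and plot the difference between the bin means against their average. The sketch below follows this construction; it is not necessarily the procedure used for Appendix D, and the function name, the number of bins, and the example RTs are assumptions made for illustration.

```python
def delta_plot(rt_congruent, rt_incongruent, n_bins=5):
    """Return one (mean RT, congruency effect) point per speed bin.

    Each condition needs at least n_bins RTs so that no bin is empty.
    """
    def bin_means(rts):
        ordered = sorted(rts)
        n = len(ordered)
        means = []
        for b in range(n_bins):
            # Slice the sorted RTs into n_bins consecutive chunks.
            chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
            means.append(sum(chunk) / len(chunk))
        return means

    congruent = bin_means(rt_congruent)
    incongruent = bin_means(rt_incongruent)
    return [((c + i) / 2, i - c) for c, i in zip(congruent, incongruent)]


# Hypothetical RTs in ms for one participant.
points = delta_plot([400, 500, 600, 700], [450, 600, 750, 900], n_bins=2)
```

With these made-up values, the congruency effect grows from the fast bin to the slow bin, which is the increasing pattern described for the Stroop task.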

Now, the question is: Why is a Stroop congruency effect of 100 ms a limit for observing the interaction between Stroop and flanker tasks? First, the reason for that limit might lie in biological constraints, such as the speed of execution when pressing the response key or when articulating the response. Another possibility may be related to the processes underlying the Stroop congruency effect. In previous research, the Stroop congruency effect has been assumed to result from at least two processes, such as the interference induced by incongruent trials and the priming or facilitation induced by congruent trials (e.g., Glaser & Glaser, 1982; Tzelgov, Henik, & Berger, 1992), or the processes induced by a task conflict and those induced by the response conflict (e.g., Steinhauser & Hübner, 2009). A limit of 100 ms in the Stroop congruency effect may suggest that when the Stroop congruency effect is smaller than that limit, one of the two processes or conflicts is not present or not sufficiently processed, thus resulting in the absence of the interaction between the congruency conditions. In particular, in a previous experiment (Rey-Mermet & Gade, 2016, Experiment 1c), the necessity of interference in Stroop incongruent trials was put forward because the interaction between Stroop and flanker congruency was observed when the Stroop congruency effect was measured as the difference between incongruent trials and neutral trials (i.e., rows of colored Xs), but not when it was measured as the difference between congruent and neutral trials. Thus, it is possible that experiencing interference in the Stroop incongruent trials is necessary to observe the interaction between Stroop and flanker congruency, in particular to preactivate the correct response, as proposed in Rey-Mermet et al.'s (2019) account.

In the present study, we increased the Stroop congruency effect in three different ways: (1) by testing older adults, (2) by manipulating the proportion of catch trials in which participants were asked to respond to the word meaning instead of responding to the print color, and (3) by asking participants to respond vocally. One may wonder whether the cognitive processes triggered by each of these manipulations were important for the interaction to occur and whether these processes differed across the manipulations. For example, as catch trials require participants to switch between two decisions (i.e., responding to the word meaning vs. responding to the print color), the cognitive processes underlying Stroop performance may be affected by this task switching (see, e.g., Kalanthroff & Henik, 2014). Whether these cognitive processes are affected in the same way by vocal responding is so far an open question. To provide a well-founded answer, further research is necessary in which Stroop performance in the flanker–Stroop combination would be decomposed by using, for example, ex-Gaussian distribution analysis (e.g., Steinhauser & Hübner, 2009) or diffusion modeling (e.g., White, Servant, & Logan, 2018).

To summarize, the findings of the present study show that when conflict tasks—such as the Stroop and flanker tasks—are combined within the same trial, finding an interaction in RTs between the congruency conditions of both tasks requires a large congruency effect, which, in turn, requires slow responses. This emphasizes that finding interactions of that sort can be used to validate or invalidate theoretical explanations only when the precondition—that is, a large congruency effect—is fulfilled.