Conflicts arise at many processing levels throughout perception and action, influencing the efficiency of our actions. A predominant idea of how our cognitive system deals with such conflicts refers to the existence of a conflict-monitoring system (Botvinick, Braver, Barch, Carter, & Cohen, 2001). Whenever conflict is detected and evaluated by this system, reactively adapted mechanisms of cognitive control increase processing selectivity (i.e., selective attention) so that conflict is reduced in subsequent processing episodes. For example, attentional biasing of task-relevant aspects is strengthened and task-irrelevant aspects have less influence, resulting in improved performance (Egner, 2007).

Conflicts can be induced experimentally by providing incongruent stimulus information. For example, in the Stroop task (Stroop, 1935), participants are supposed to indicate the font color of written color words. In congruent trials, the word “red” is also displayed in red, and “red” is the correct answer, whereas in incongruent trials, the word “red” might be displayed in blue (with “blue” as the correct answer). Here, conflict arises between the automatically read color word and the to-be-named font color. Performance is usually better in congruent (no-conflict) than in incongruent (conflict) trials, termed the congruency effect. A similar effect can also be found in flanker tasks (Eriksen & Eriksen, 1974) or Simon tasks (Simon, 1969; Simon & Rudell, 1967).

According to the conflict-monitoring hypothesis (Botvinick et al., 2001; Botvinick, Cohen, & Carter, 2004), conflict detection in a current incongruent trial will invoke mechanisms of cognitive control—for example, increasing the processing weight of the relevant features. This adaptation results in a reduction of the congruency effect after a preceding incongruent trial as compared to after a congruent trial. The first empirical evidence for this sequential modulation was reported by Gratton, Coles, and Donchin (1992). This pattern has proven robust across tasks and paradigms (see Egner, 2007, for a review), although the size of the effect can be manipulated by factors such as reward (van Steenbergen, Band, & Hommel, 2009) or mood states (Schuch & Koch, 2015; van Steenbergen, Band, & Hommel, 2010).

However, there is evidence that adaptation of cognitive control only occurs when the target-specific features remain constant. Kiesel, Kunde, and Hoffmann (2006) investigated the sequential modulation of congruency effects in task switching and found the typical pattern of reduced congruency effects after an incongruent trial only for task repetitions, but not for task switches. For task switches, the congruency in trial n–1 had no effect on the congruency effect in trial n. The authors regarded this as evidence for task-specific conflict resolution.

The notion of specific conflict adaptation is also supported by other studies (see Braem, Abrahamse, Duthoo, & Notebaert, 2014, for an overview). Hazeltine, Lightman, Schwarb, and Schumacher (2011) employed a flanker task including unimodally presented auditory and visual stimuli and found conflict adaptation effects only in successive trials of the same modality. Likewise, Fischer, Plessow, Kunde, and Kiesel (2010) combined sequential congruency effect and dual-task analyses (using a Simon task under single- or dual-task conditions). Remarkably, conflict adaptation was only observed when the task conditions repeated in successive trials (i.e., both were single-task or both were dual-task conditions). Moreover, Akçay and Hazeltine (2011) found conflict adaptation effects only for the same type of conflict (Simon or flanker conflict), and not across conflict types in successive trials.

Here we propose that these seemingly diverse variations of sequential modulation of the congruency effect can be explained by invoking a general principle. We argue that sequential modulation effects are generally based on reliance on contextual information that might signal the need to maintain previous task-relevant cognitive settings (e.g., attentional biases, activated stimulus–response mappings). If so, any change of processing context should disrupt the continuity of cognitive processing across episodes, which in turn should abolish sequential modulation effects.

We tested our proposal using a crossmodal congruency task, in which explicit modality cues indicated the target modality in each trial, followed by simultaneously presented visual and auditory stimuli (number words or visual objects and sounds presented laterally). The stimuli could thus be congruent or incongruent with each other, and we assessed the sequential modulation of congruency effects for modality repetitions and switches separately. Moreover, two different tasks were included, in counterbalanced order, for replication and generalization reasons. Participants performed either a location judgment or a numerical judgment task throughout an entire block of trials.

Importantly, the task itself remained constant within a block, and only the target modality was subject to switches. According to Braem et al. (2014), the conflict-monitoring hypothesis states that conflict monitoring only applies when the task-relevant information remains constant (see also Notebaert & Verguts, 2008). Notebaert and Verguts defined task-relevant information in relation to the cognitive operation of the task, and thus the type of decision that had to be made (e.g., location judgment, in our case). In line with this definition, we used the same cognitive operation in successive trials and only varied target modalities between trials. If task relevance was exclusively related to the cognitive operation, we should find sequential modulation of the congruency effects in both modality repetitions and switches.

If target-specific changes result in a shift of episodic context, which in turn evokes an “attentional reset,” we should not find sequential congruency effects after modality switches. In contrast, in modality repetitions, the processing context remains unchanged, so we expected to find the typical pattern of reduced congruency effects after incongruent trials.

Method

Participants

Twenty students (17 female, three male; mean age of 21 years, SD = 2 years) participated, gave informed consent, and received course credits or monetary compensation (€4). Participants reported normal hearing and normal or corrected-to-normal vision. We excluded the data of one additional participant and retested the condition, because high error rates resulted in missing data for the response time (RT) analysis. With this sample size, our power to detect an effect of Cohen’s dz = 0.66 was approximately .80 (Faul, Erdfelder, Lang, & Buchner, 2007).

Apparatus and stimuli

Participants were seated at 66 cm distance in front of a 21-in. monitor (60 Hz; 1,024 × 768 pixels). Auditory material was presented via Sennheiser PMX 60 headphones. The SR Research Experiment Builder (SR Research Ltd., Mississauga, Ontario, Canada) was used to program the experiment.

Cues indicated the target modality. The auditory cue was a binaural 600-Hz tone. A white “x” served as the visual cue, which was centrally presented against the black screen. In the location judgment task, the bimodal stimulus consisted of a 400-Hz tone presented to the left or right ear and a white diamond of 8-mm side length (diagonal: 11 mm) presented on the left or right side of the screen (172 mm off the center). In the numerical judgment task, the German number words for “two” and “eight” were used as stimuli (zwei and acht). The respective visual number word was presented in small white letters at screen center (Courier, 20 pixels). The auditory number word was presented binaurally. The auditory number words were recorded in cooperation with the Institute of Technical Acoustics at RWTH Aachen University (see Koch, Lawo, Fels, & Vorländer, 2011). In the case of an erroneous response or response omission (and thus RT > 1,500 ms), the German word for “error” appeared centrally on the screen (Fehler; Courier, 20 pixels, white).

Participants’ manual responses were recorded using the left and right “Alt” keys on a centrally placed QWERTZ keyboard. The location judgment of the stimulus corresponded spatially to the left and right keys, respectively. For the numerical judgment, the left key always corresponded to a “smaller” and the right key to a “larger” judgment.

Procedure

Participants were asked for a location (left vs. right) or numerical (smaller vs. larger than five) judgment of the target stimulus by responding manually. They were instructed to respond both quickly and accurately. Each trial started with a binaural auditory or central visual cue for 100 ms, indicating the stimulus’s target modality. A pause of 100 ms followed, resulting in a cue–target interval of 200 ms. Stimuli were presented bimodally. Participants performed either the location judgment or the numerical judgment task first (counterbalanced order across participants). In both tasks, the visual and auditory stimuli could indicate the same or different responses (i.e., congruent vs. incongruent; e.g., tone and diamond to the left vs. tone to the right and diamond to the left). Stimuli were presented for a maximum of 470 ms (the visual stimuli disappeared upon response) in randomized order, with the constraints of a run length not exceeding four trials of the same target modality and an equal distribution of all stimulus combinations. The maximum RT was set to 1,500 ms. Error feedback was presented for 500 ms in erroneous trials. The response–cue interval was 900 ms.

Participants completed eight blocks of 80 trials each, preceded by eight practice trials containing all possible stimulus combinations once. The experiment lasted about 30 min.
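The randomization constraints described above can be illustrated with a minimal sketch for the location judgment task. It assumes that the eight stimulus combinations are the crossings of target modality, auditory side, and visual side, and that a block is drawn by reshuffling until the run-length constraint holds. The function names and the rejection-sampling approach are illustrative assumptions, not the procedure implemented in Experiment Builder.

```python
import random
from itertools import product

MODALITIES = ("auditory", "visual")  # cued target modality
SIDES = ("left", "right")            # side of the tone / of the diamond


def max_run_length(values):
    """Return the length of the longest run of identical consecutive values."""
    longest = current = 1
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def make_block(n_trials=80, max_run=4, rng=None, max_attempts=100_000):
    """Draw one block: every combination equally often, and no more than
    `max_run` consecutive trials with the same target modality.

    Assumption: constraint is enforced by reshuffling (rejection sampling).
    """
    rng = rng or random.Random()
    # Each trial is (target modality, auditory side, visual side).
    combos = list(product(MODALITIES, SIDES, SIDES))
    if n_trials % len(combos) != 0:
        raise ValueError("n_trials must be a multiple of the number of combinations")
    trials = combos * (n_trials // len(combos))
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if max_run_length([t[0] for t in trials]) <= max_run:
            return list(trials)
    raise RuntimeError("no valid sequence found within max_attempts")


block = make_block(rng=random.Random(1))
```

With 80 trials and eight combinations, each combination occurs ten times per block, and a trial is congruent whenever the auditory and visual sides match.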

Design

The independent within-subjects variables were modality transition (repetition or switch), congruency in trial n (congruent or incongruent), congruency in trial n–1 (congruent or incongruent), and task (location or numerical judgment). The results were collapsed over the visual and auditory target modalities because modality-specific effects were not of particular interest here (see Footnote 1). The location and numerical judgment tasks were blocked with a counterbalanced order. The levels of the other variables were randomized, and the levels of target modality and congruency were distributed equally. RT and error rate (ER) were the dependent variables, and the significance level was set at α = .05.

Results

Practice trials and the first trial in each block were excluded from the data analysis. For the RT analysis, trials with RTs shorter than 50 ms were discarded (only two trials), as well as outlier RTs more than ±3 SDs from a participant’s mean (1.5 %). Moreover, error trials (errors, no responses, and slow responses) and trials following errors were discarded (17.7 %). For the error analysis, only trials with RT < 50 ms were excluded.
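The RT exclusion criteria can be summarized in a short sketch for a single participant. The trial field names are hypothetical, and one detail is an assumption: the text does not state whether the ±3 SD criterion was computed before or after removing error and post-error trials, so the sketch applies it last, to the remaining trials.

```python
from statistics import mean, stdev


def select_rt_trials(trials, min_rt=50.0, sd_cutoff=3.0):
    """Return the trials of one participant that enter the RT analysis.

    `trials` is a list of dicts in presentation order with the hypothetical
    keys 'rt' (ms, or None for no response), 'correct' (bool; False also for
    responses slower than the deadline), 'practice' (bool), and
    'first_in_block' (bool).
    """
    kept = []
    prev_error = False
    for t in trials:
        is_error = (not t["correct"]) or t["rt"] is None
        excluded = (
            t["practice"]
            or t["first_in_block"]
            or is_error          # errors, omissions, slow responses
            or prev_error        # trials following an error
            or t["rt"] < min_rt  # anticipations
        )
        if not excluded:
            kept.append(t)
        prev_error = is_error

    if len(kept) < 2:
        return kept
    # Assumption: outliers are defined relative to the mean and SD of the
    # trials that survived the criteria above.
    rts = [t["rt"] for t in kept]
    m, s = mean(rts), stdev(rts)
    return [t for t in kept if abs(t["rt"] - m) <= sd_cutoff * s]
```

For the error analysis, only the anticipation criterion (RT < 50 ms) would apply, in addition to dropping practice trials and the first trial of each block.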

Two four-way analyses of variance (ANOVAs) with repeated measures designs were conducted for the RT and ER analyses separately. Mean RTs and ERs (of data not collapsed across modalities) are depicted in Table 1. Figure 1 illustrates the congruency effects for the two tasks and modality transitions as a function of congruency in trial n–1. We calculated congruency effects as the performance difference between incongruent and congruent trials.

Table 1 Summary of mean response times (RTs, in milliseconds) and error rates (ERs, in percentages) of noncollapsed data as a function of task (location judgment, numerical judgment), modality (auditory, visual), modality transition (repetition, switch), congruency in trial n (congruent, incongruent), and congruency in trial n–1 (congruent, incongruent)

Fig. 1 Congruency effects for response time (RT, upper panel) and error rate (ER, lower panel) as a function of modality transition (repetition, switch) and congruency in trial n–1 (congruent, incongruent) for location and numerical judgments. Error bars indicate 95 % confidence intervals of the congruency effects

We predicted that sequential modulation of the congruency effect (i.e., the two-way interaction of congruency in trial n and congruency in trial n–1) should occur primarily with modality repetitions (i.e., entering a three-way interaction with modality transition). This pattern should be found for both the location and numerical tasks.

For the RT data, we observed a significant four-way interaction of task, modality transition, congruency in trial n, and congruency in trial n–1, F(1, 19) = 6.99, p = .016, ηp² = .27. We conducted further analyses to disentangle this interaction. The predicted three-way interaction of modality transition, congruency in trial n, and congruency in trial n–1 was significant both for the location judgment task, F(1, 19) = 39.00, p < .001, ηp² = .67, and for the numerical judgment task, F(1, 19) = 6.21, p = .022, ηp² = .25. The data patterns were qualitatively similar in the two tasks but more pronounced for location judgments: Congruency effects in trial n were reduced after incongruent trials n–1 relative to congruent trials n–1, but only for modality repetitions, not for switches (see Footnote 2).

More specifically, for location judgments, modality repetitions yielded a larger congruency effect in trial n after a preceding congruent trial than after an incongruent trial (93 vs. 27 ms), t(19) = 4.44, p < .001. For modality switches, the congruency effects in trial n did not differ significantly (congruent trial n–1, 43 ms; incongruent trial n–1, 65 ms), t(19) = 1.70, p = .106. For numerical judgments, modality repetitions showed larger congruency effects after congruent than after incongruent trials in n–1 (60 vs. 26 ms), t(19) = 2.19, p = .042, whereas modality switches were not associated with reduced congruency effects in trial n after incongruent trials (congruent trial n–1, 55 ms; incongruent trial n–1, 56 ms), t(19) = 0.05, p = .962 (see Footnote 3). Figure 1, upper panels, presents this data pattern. For the present purposes, the other effects were theoretically less relevant, but for completeness, the full ANOVA results are listed in Table 2.
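As a worked illustration of the arithmetic behind these comparisons, the sequential modulation in each condition is the congruency effect after a congruent trial n–1 minus the congruency effect after an incongruent trial n–1. The sketch below uses only the RT congruency effects reported in the preceding paragraph; the dictionary layout and function name are illustrative assumptions.

```python
# RT congruency effects (ms) as reported in the text, keyed by
# (task, modality transition) and stored as
# (after congruent trial n-1, after incongruent trial n-1).
REPORTED_RT_EFFECTS = {
    ("location", "repetition"): (93, 27),
    ("location", "switch"): (43, 65),
    ("numerical", "repetition"): (60, 26),
    ("numerical", "switch"): (55, 56),
}


def sequential_modulation(after_congruent, after_incongruent):
    """Reduction of the congruency effect following an incongruent trial.

    Positive values indicate the typical conflict adaptation pattern.
    """
    return after_congruent - after_incongruent


modulation = {
    condition: sequential_modulation(*effects)
    for condition, effects in REPORTED_RT_EFFECTS.items()
}
# modulation[("location", "repetition")]  -> 66
# modulation[("location", "switch")]      -> -22
# modulation[("numerical", "repetition")] -> 34
# modulation[("numerical", "switch")]     -> -1
```

The positive differences for modality repetitions (66 and 34 ms) and the near-zero or negative differences for switches (–22 and –1 ms) correspond to the three-way interaction described above.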

Table 2 Summary of ANOVA effects in RTs and ERs

For the ER analysis, the predicted three-way interaction of modality transition, congruency in trial n, and congruency in trial n–1 was also significant, F(1, 19) = 23.60, p < .001, ηp² = .55. However, for ERs this interaction did not depend significantly on task, F(1, 19) = 1.36, p = .258, ηp² = .07. Further analyses indicated that the reduced congruency effect after an incongruent trial was present only for modality repetitions, F(1, 19) = 47.67, p < .001, ηp² = .72, and not for switches, F < 1, p > .992, ηp² < .01. Figure 1 (lower panels) shows these congruency effects. For location judgments, modality repetitions showed a reduction of the congruency effect from 17.8 % (congruent n–1) to 7.9 % (incongruent n–1), t(19) = 5.27, p < .001, but there was no significant reduction in modality switches (15.7 % to 11.9 %), t(19) = 1.87, p = .078. Likewise, for numerical judgments, we observed a reduction in modality repetitions (11.7 % to 3.9 %), t(19) = 3.72, p = .001, but not in switches (10.3 % to 14.0 %), t(19) = 1.96, p = .065.

Discussion

Synopsis of results

We found sequential modulation of the modality congruency effects only for target modality repetitions. This held for both RTs and ERs and for both tasks (though the pattern was more pronounced in the RTs of the location task). Whenever the modality switched from one trial to the next, no sequential modulation of the congruency effect emerged.

Conflict monitoring

Reduced congruency effects after incongruent trials are interpreted as an indication of the presence of conflict monitoring (Botvinick et al., 2001; Botvinick et al., 2004; see also Scherbaum, Fischer, Dshemuchadse, & Goschke, 2011, for a discussion of across-trial vs. within-trial modulations). In response to conflict detection, mechanisms of cognitive control increase processing selectivity. Consequently, performance in succeeding conflicting situations is improved, thus resulting in reduced congruency effects after incongruent trials.

In line with this theorizing, we consistently found a reduction of congruency effects after incongruent trials with modality repetitions (see Footnote 4). Here, mechanisms of cognitive control seem to be successfully adapted after a conflict, so that succeeding conflicts are less harmful to performance. However, this pattern occurs only for consecutive situations with the same target modality (i.e., modality repetitions). Importantly, successive trials with a modality shift do not show evidence for conflict adaptation. Using visual stimuli, Kiesel et al. (2006) found conflict adaptation effects only for task repetitions, and not for switches. They suggested task-specific conflict resolution. In analogy to their argument, we interpret our findings as evidence for modality-specific conflict resolution. The perceived conflict of trials within one target modality does not affect performance in the next trial adaptively if the target modality changes. Adaptation of cognitive control thus seems to refer to an increase in attentional selectivity that is specific to the target modality of the preceding trial, so that it does not carry over across modality switches.

As an alternative to conflict-monitoring and cognitive-control accounts, it has been argued that the sequential modulation of congruency effects can be explained by partial mismatch/repetition costs (Braem et al., 2014; Egner, 2007; Hommel, Proctor, & Vu, 2004; U. Mayr, Awh, & Laurey, 2003). According to the theory of event coding (Hommel, Müsseler, Aschersleben, & Prinz, 2001), co-occurring stimuli and responses are bound into one event file. If this pairing of stimuli and the associated responses repeats between trials, the episodic memory representation can be easily retrieved and will speed up responses. If, however, only one part of the event file repeats (partial mismatch/repetition), the other part is automatically activated, and this activation needs to be overcome in the current trial, slowing down responses. In complete alternations, there is no preactive representation, and responses are relatively fast.

In our study with only four different stimuli, succeeding congruent or incongruent trials were always complete repetitions or complete alternations. For example, the stimuli in trial n–1 could be auditory left and visual left, and the stimuli in trial n again auditory left and visual left (complete repetition, congruent sequence), or the stimuli in trial n–1 could be auditory left and visual right, and the stimuli in trial n auditory right and visual left (complete alternation, incongruent sequence). On the other hand, a congruent trial followed by an incongruent trial, or vice versa, always included a partial repetition. For instance, the stimuli in trial n–1 could be auditory left and visual left, and in trial n auditory left and visual right (partial repetition of auditory left). Complete repetitions and complete alternations (congruent n–1, congruent n; incongruent n–1, incongruent n) lead to facilitated performance as compared to partial repetitions (congruent n–1, incongruent n; incongruent n–1, congruent n). As a consequence, congruency effects after incongruent trials are smaller than congruency effects after congruent trials: If the incongruent condition is a complete repetition/complete alternation condition (relatively facilitated performance) and the congruent condition is a partial repetition condition (relatively impeded performance), the congruency effect is smaller than in the reverse case, in which performance in incongruent trials is relatively impeded and performance in congruent trials is relatively facilitated. Although we cannot completely rule out partial mismatch/repetition costs as an explanation for the sequential modulation of the congruency effects in modality repetition trials, this account would predict basically the same pattern in modality switches, which was not observed at all. Thus, a “pure” episodic-priming account cannot easily explain our data pattern, because the stimuli in modality switches did not differ from those in modality repetitions and thus formed comparable event files, yet led to a different data pattern.
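The correspondence between congruency sequences and feature transitions can be checked by enumerating all stimulus pairs of the location task. The sketch below is a minimal illustration that considers only the two stimulus features (auditory side, visual side) and ignores the cue and the response; the function names are illustrative assumptions.

```python
from itertools import product

SIDES = ("left", "right")
# Each stimulus is (auditory side, visual side): four stimuli in total.
STIMULI = list(product(SIDES, SIDES))


def is_congruent(stimulus):
    """A bimodal stimulus is congruent when both sides match."""
    return stimulus[0] == stimulus[1]


def transition_type(prev, cur):
    """Classify a trial transition by the number of repeated stimulus features."""
    repeated = sum(p == c for p, c in zip(prev, cur))
    return {
        2: "complete repetition",
        1: "partial repetition",
        0: "complete alternation",
    }[repeated]


def tabulate_transitions():
    """Map (congruency in n-1, congruency in n) to the transition types it allows."""
    table = {}
    for prev, cur in product(STIMULI, STIMULI):
        key = (is_congruent(prev), is_congruent(cur))
        table.setdefault(key, set()).add(transition_type(prev, cur))
    return table


transitions = tabulate_transitions()
```

In this enumeration, sequences with the same congruency in both trials contain only complete repetitions and complete alternations, whereas sequences with a change in congruency contain only partial repetitions. The classification does not depend on the target modality, which is why an account based on stimulus feature overlap alone would predict the same pattern for modality repetitions and switches.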

Theoretical implications

Adaptive conflict resolution seems to be disrupted by modality shifts. Our findings therefore suggest that conflict resolution is modality-specific. Since Kiesel et al. (2006) found that conflict resolution does not carry over across task shifts and seems to be task-specific, the combined results suggest that conflict resolution is strongly influenced by the processing context in general (see also Akçay & Hazeltine, 2011; Fischer et al., 2010; Hazeltine et al., 2011).

Whenever target-specific features repeat between trials—here, the target modality—conflict adaptation is used to regulate attention. However, when target-specific features change, the increased attentional bias after an incongruent trial does not hold for the new target or context. Therefore, any shift in target modality, and thus in attentional “processing mode” (Meiran, 1996), causes an attentional reset, which eliminates the sequential congruency effect.

We suggest that the similarity between the encoding mode and the retrieval mode, or between the prime and the probe (S. Mayr & Buchner, 2007), is the decisive factor for conflict adaptation. If the conflict encoding (incongruence in trial n–1) and conflict retrieval (incongruence in trial n) context are similar, increased processing selectivity after a detected conflict can facilitate performance during the next conflict.

Braem et al. (2014) recently suggested that conflict adaptation only occurs across contexts if context “features are simultaneously and actively maintained (in working memory)” (p. 10). Importantly, we observed that conflict adaptation was only present when target modalities repeated across encoding and retrieval, although, within the same task, auditory and visual stimuli appeared simultaneously and could be maintained in working memory (see Koch, Philipp, & Gade, 2006, and Schuch & Grange, 2015, for inhibitory conflicts across higher-order task representations). On the basis of our findings, we can extend this idea to a broader picture of episodic and attentional similarity. Interestingly, Janczyk (2015) recently observed sequential congruency effects in a backward crosstalk dual-task paradigm across different output modalities. It thus remains to be tested whether switches between different effector modalities would produce similar effects in our paradigm, too.

Conclusions

We found evidence for modality-specific conflict resolution, indicating conflict adaptation only when the attentional state in successive trials remained constant. A shift in attentional state across trials caused attentional reset, which in turn eliminated sequential modulation of congruency effects.