Goal-directed behavior and performance in complex task situations are assumed to critically depend on executive functions. Although there is no clear consensus regarding the number of separable executive functions, shifting between mental task sets (i.e., cognitive flexibility or set shifting) has been proposed to be one of the core components along with inhibition of prepotent responses, and updating and monitoring of information held in working memory (Diamond, 2013; Lehto, Juujärvi, Kooistra, & Pulkkinen, 2003; Miyake et al., 2000). These processes are considered to comprise a fundamental ability of intelligent behavior that is highly relevant for various domains such as education (Diamond, Barnett, Thomas, & Munro, 2007) and mental health (Kashdan & Rottenberg, 2010). The importance of executive functions has also been demonstrated in clinical populations with impaired cognitive control in task-switching situations due to frontal lobe damage or Parkinson’s disease (e.g., Meiran, Friedman, & Yehene, 2004; Rogers et al., 1998), and set-shifting abilities seem to change considerably across the lifespan, with a significant decline in older adults (Cepeda, Kramer, & Gonzalez de Sather, 2001; Kray, Eber, & Lindenberger, 2004).

Researchers have thus started to investigate whether executive functions can be improved through cognitive training and whether these improvements transfer to novel stimuli and/or unpracticed tasks (for reviews, see Katz, Shah, & Meyer, 2018; Melby-Lervåg & Hulme, 2013; Schubert, Strobach, & Karbach, 2014). Considerable evidence for training-related improvement of executive functions and its transfer to untrained tasks comes from studies in which participants were trained on updating of information in working memory (e.g., Dahlin, Neely, Larsson, Backman, & Nyberg, 2008; Jaeggi, Buschkuehl, Jonides, & Perrig, 2008; Salminen, Kühn, Frensch, & Schubert, 2016; Salminen, Strobach, & Schubert, 2012). In addition to the performance increments on the trained and related tasks (i.e., near transfer), some of these studies reported “far transfer” effects to structurally different tasks, indicating enhancement of general cognitive control mechanisms or fluid intelligence (Jaeggi et al., 2008; Klingberg et al., 2005; Schmiedek, Lövdén, & Lindenberger, 2010). Other researchers, however, failed to replicate transfer effects to fluid intelligence (as reported by Jaeggi et al., 2008) after cognitively demanding training of working-memory updating with two simultaneous n-back tasks (e.g., Chooi & Thompson, 2012; Redick et al., 2013). Much less research has been dedicated to the training of inhibition and set shifting abilities though (see a recent review by Koch, Poljac, Müller, & Kiesel, 2018), and it has been argued that these two executive functions may not benefit equally from standard computerized cognitive training programs (Pereg, Shahar, & Meiran, 2013).

In the present study, a task-switching paradigm in which participants are instructed to perform two tasks sequentially is utilized to investigate the transfer of a cognitive training that addresses set-shifting abilities. In a typical task-switching situation (Jersild, 1927; Rogers & Monsell, 1995), participants have to switch between two tasks, such as between adding and subtracting numbers or between categorizing letters (e.g., vowel vs. consonant) and digits (e.g., even vs. odd). It has been found that performance drops (i.e., longer response times and lower accuracy) after a task switch (switch trials) as compared to trials on which the task from the previous trial is repeated (repeat trials). These switch costs are assumed to reflect the “specific” demands involved in shifting between two particular mental task sets (i.e., low costs reflecting high cognitive flexibility; Mayr & Kliegl, 2003; Rogers & Monsell, 1995; Strobach, Liepelt, Schubert, & Kiesel, 2012), which is presumably driven by both the activation of the relevant task set and the inhibition of the irrelevant task set(s) (Monsell, 2003). In addition, there is usually also a performance decrement in mixed-task blocks (in which participants have to switch between tasks based on a particular trial protocol) as compared to single-task blocks (e.g., longer response times). These so-called mixing costs are typically much larger than the switch costs, and they were proposed as a more “global” measure of set shifting, possibly reflecting sustained executive control processes including the maintenance, selection, and reconfiguration of a cognitive task set in working memory (see Allport, Styles, & Hsieh, 1994; Karbach & Kray, 2009; Koch et al., 2018; Koch, Prinz, & Allport, 2005; Mayr, 2001).
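The two cost measures described above can be illustrated with a short computation. The following Python sketch is not taken from the study; the function names, the trial format, and the response times are hypothetical. It assumes that mixing costs are computed from all trials of a mixed-task block relative to the single-task blocks, as in the wording above; some studies instead restrict the comparison to repeat trials.

```python
from statistics import mean

def switch_cost(mixed_trials):
    """Mean RT on switch trials minus mean RT on repeat trials (mixed block only)."""
    switch = [t["rt"] for t in mixed_trials if t["type"] == "switch"]
    repeat = [t["rt"] for t in mixed_trials if t["type"] == "repeat"]
    return mean(switch) - mean(repeat)

def mixing_cost(mixed_trials, single_trials):
    """Mean RT in the mixed-task block minus mean RT in the single-task blocks."""
    mixed_rt = mean(t["rt"] for t in mixed_trials)
    single_rt = mean(t["rt"] for t in single_trials)
    return mixed_rt - single_rt

# Hypothetical response times in ms, for illustration only.
mixed = [
    {"type": "repeat", "rt": 800}, {"type": "switch", "rt": 950},
    {"type": "repeat", "rt": 820}, {"type": "switch", "rt": 930},
]
single = [{"rt": 600}, {"rt": 620}, {"rt": 580}, {"rt": 600}]

print("switch cost (ms):", switch_cost(mixed))
print("mixing cost (ms):", mixing_cost(mixed, single))
```

With these made-up values the mixing cost exceeds the switch cost, which mirrors the typical pattern described above but carries no empirical weight.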

Several studies found that both types of performance costs in task switching can be reduced with practice (e.g., Berryhill & Hughes, 2009; Cepeda et al., 2001; Koch, 2001; Kray & Lindenberger, 2000), though residual costs seem to remain even at the end of very extensive training (Stoet & Snyder, 2007). Moreover, training in general seems to have a stronger effect on the mixing costs than on the switch costs (see Strobach et al., 2012 for a study in which the mixing costs were practically eliminated after eight sessions of training), and the evidence for a training-related decrease of switch costs is less consistent (e.g., Minear & Shah, 2008). In addition to these direct practice effects, task-switching training was also found to reduce the performance costs of task switching in untrained tasks that are closely related and structurally similar to the trained tasks in terms of task mechanics and procedural protocols (e.g., Karbach & Kray, 2009; Minear & Shah, 2008; Pereg et al., 2013; Zinke, Einert, Pfennig, & Kliegel, 2012).

In a training study with a developmental focus, task-switching training was found to reduce both the mixing and the switch costs, and far-transfer effects were reported for tasks that involve entirely different executive control functions (Karbach & Kray, 2009). Specifically, the authors showed that 4-day training in task switching reduced both mixing and switch costs in three different age groups (children, young adults, and older adults) for the trained tasks as well as for switching between untrained but structurally similar tasks. Moreover, far-transfer effects were reported for a variety of tasks measuring inhibitory control, working memory, and fluid intelligence, suggesting that task-switching training may not only have improved specific task-coordination skills (enabling near transfer), but it may even have elicited generalized cognitive enhancement, thus also affecting other functions of executive control.

Other studies, however, found task-switching training to primarily reduce the mixing costs, but not the switch costs (i.e., there was no specific effect of the task-switching training, but the switch costs were reduced both after task-switching and after single-task training; Minear & Shah, 2008; Zinke et al., 2012), and there are several studies that were not able to replicate the far-transfer effects to tasks that address executive functions other than set shifting (e.g., Berryhill & Hughes, 2009; Pereg et al., 2013). For instance, Minear and Shah (2008) assessed whether task-switching training with a specific pair of categorization tasks (e.g., upper-/lowercase and divisible/not divisible by 5) influences the costs of switching between a different pair of tasks (e.g., consonant/vowel and odd/even categorization). The authors found that the mixing costs were reduced not only for the trained tasks, but also for the transfer tasks (relative to a control group that was trained with the single-component tasks without switching), whereas the training had no specific effect on the switch costs (i.e., switch costs were also reduced in the control group). Such near-transfer effects are theoretically important because they indicate actual improvement of the underlying construct being trained (e.g., set shifting), rather than the acquisition of simple stimulus-response mappings or superficial task-specific strategies (Katz et al., 2018).

Another direct follow-up on Karbach and Kray’s (2009) study was conducted by Pereg et al. (2013), showing that visual task-switching training with mixed-task blocks using the alternating-runs protocol with a task switch occurring on every other trial (and using the same small/large and fruit/vegetable categorization tasks and procedures as in Karbach & Kray, 2009) reduces the costs of switching for untrained tasks (e.g., an upper/lower location and a furniture/electrical appliance categorization task) only if the switches followed the same alternating-runs protocol as during training. However, no improvements were observed when the task switches followed a different switching protocol during transfer than during training (i.e., either random task order or task switches after every third trial), suggesting that the training-related improvement is highly specific to the exact tasks and procedures that were presented during task-switching training (Pereg et al., 2013). Specifically, Pereg et al. (2013) argued that the absence of a (near-) transfer effect to a different task-switching protocol may be due to the fact that the alternating-runs protocol of task switching did not improve “pure switching ability” (i.e., set shifting), but it may have trained specific working memory processes that can be used only for task switching with the alternating-runs protocol (e.g., maintenance and updating of the arbitrary stimulus-response mappings in this procedure), but not for the different task-switching protocols during transfer. Further research is required to test this assumption.

Due to the inconsistent pattern of results obtained from different training studies, it is still unclear whether task-switching training results in enhancement of general executive control functions that are supposed to enable transfer to new tasks, stimuli, and procedures (e.g., Karbach & Kray, 2009), or if the training-related improvement is more specific and tied to the exact tasks, stimuli, or procedures that were used during training (i.e., stimulus-specific improvements). The primary goal of the studies mentioned above was to answer the question of whether training in task switching enables transfer to different tasks and cognitive activities that were not part of the training, and most studies found that, under certain conditions, the improvement transfers to some untrained tasks. However, the how question has often been neglected, and the exact learning mechanism underlying such training and transfer is still not understood. For instance, in most studies, both the training and transfer tasks require rapid and flexible processing of visual input, and it remains an open question whether the learning mechanisms are tied to the trained stimulus modalities (e.g., faster or more efficient processing of visual information), or whether the training effects occur at an amodal processing level. Specifically, previous studies showed that visual task-switching training may generalize to different types of visual stimuli or task sets (e.g., from a letter task to a number task), suggesting that the changed mechanisms are not specific to the exact stimuli as represented at a particular cognitive processing level. However, although task-switching training was found to transfer to different stimuli in one modality, the control operations involved in task switching may still differ as a function of the input modalities in which the stimuli were encoded initially during training (e.g., shifting visual vs. 
auditory attention), and the training-related improvement may be specific to the trained stimulus modality. On the other hand, learning in task-switching could be driven by changes at an amodal processing level, with the same processes being recruited for cognitive tasks in visual, auditory, and other modalities, allowing for transfer of set-shifting abilities across stimulus modalities (compare the idea of an amodal processing bottleneck that has been proposed recently for different types of interference paradigms; Arnell, 2006; Marois & Ivanoff, 2005; Potter, Banks, Muckenhoupt, & Chun, 1998; Vachon & Tremblay, 2008). More specifically, the costs of task switching could be driven by a learning mechanism that is located either at a modality-specific stage or at a more general amodal processing level (which might be related to the prefrontal cortex; Tamber-Rosenau, Dux, Tombu, Asplund, & Marois, 2013). To better understand the underlying learning mechanism, it is crucial to determine whether task-switching training in one modality also reduces the performance costs for switching between tasks in a different modality.

The aim of the present study was to test whether task-switching training in the auditory modality also reduces the costs of task-switching in the visual modality. If the training-related improvement is based on a modality-specific learning mechanism, then it should not enable transfer to tasks in which the stimuli are presented in a different modality (see also Pashler & Baylis, 1991). In other words, if task-switching training led to modality-specific improvement (e.g., enhanced auditory attention, faster and more efficient processing of auditory information), then task-switching training in one modality may only enhance set shifting in the trained modality (e.g., auditory task switching), but it would not be expected to reduce the costs of switching between tasks in a different modality (cf. “strategy-based” training; Schubert et al., 2014). If, however, task-switching training influenced set shifting at a more general and amodal processing level (i.e., “process-based training”; Karbach & Schubert, 2013; Schubert et al., 2014), then training-related improvements should transfer to tasks that require stimulus processing in a different, untrained modality (e.g., visual attention switches). Modality-specific improvement of cognitive control processes has been investigated previously in the context of working memory training, revealing larger performance increments in a visual n-back task after eight sessions of training in the visual modality, as compared to both no training and working memory training in the auditory modality (Schneiders, Opitz, Krick, & Mecklinger, 2011). Based on the accompanying neural changes, the authors concluded that the visual training must have evoked changes in both modality-specific (visual) and more general (amodal) processes. 
In a follow-up study, it was found that an auditory working memory training also induces modality-specific gains in working memory updating (i.e., performance in an auditory 2-back task), whereas the improvement did not transfer to performance in an analogous visual 2-back task (as compared to a no-training control group; Schneiders et al., 2012). While these studies showed that working memory training does evoke both modality-specific and amodal changes, the possibility of cross-modal transfer after task-switching training has not been studied yet.

In the present study, the possibility of cross-modal transfer was assessed for task-switching training in the auditory modality in which participants were required to switch between two types of semantic categorization tasks with spoken words presented to the left and right ear (i.e., fruit/vegetable words and even/odd numbers). Several studies have already demonstrated the occurrence of switch costs using a dichotic-listening task requiring auditory attention switches (e.g., Lawo, Fels, Oberem, & Koch, 2014). In that paradigm, two different number words were presented by a male and a female voice in the left and right ear via headphones, and participants were asked to quickly respond to the numerical magnitude of one of the two numbers depending on a visual cue. Performance costs (i.e., prolonged response times) were observed both when the relevant feature had switched within the same dimension from one trial to the next (e.g., from male to female or from right to left), and when the relevant dimension had switched from gender to ear or vice versa, although the feature-switch effect was much larger than the dimension-switch effect (Lawo et al., 2014). While such studies suggest the occurrence of additional costs when switching attention between auditory stimulus features or dimensions (i.e., attention switching), they leave open the question of whether the switching between different task sets (task switching) presented to the two ears would lead to the same switch and mixing costs in the auditory domain that were reported for visually presented tasks (Rogers & Monsell, 1995), and whether these costs can also be reduced as a result of training.

Here we tested whether multi-day training in switching between two auditory tasks also reduces the costs of task-switching for structurally equivalent tasks in the visual modality. Therefore, the training paradigm was developed in analogy to a typical visual task-switching situation (Rogers & Monsell, 1995), with two words from different semantic categories (i.e., numbers and foods) presented simultaneously to the left and right ear (for a similar auditory task-switching paradigm, see Seibold, Nolden, Oberem, Fels, & Koch, 2018). Based on a visual cue, participants quickly categorized the words from a particular semantic category (e.g., the number word) regardless of whether the word was presented in the left or right ear. Auditory task switching was trained by switching the relevant category within a block of trials based on the alternating-runs protocol (i.e., ABBA, referring to a task switch on every second and fourth trial). We tested (a) whether the performance costs observed in auditory task switching (i.e., mixing and switch costs) can be reduced through training, and (b) to what extent these training effects transfer to performance in a visual task-switching situation (i.e., with the same words presented on the screen). As previous studies on task-switching training primarily used stimulus material in the visual modality (for an exceptional cross-modal task-switching situation with auditory and visual tasks, see Strobach et al., 2012; note that training-related decreases of both switch and mixing costs were observed), the present purely auditory task-switching training paradigm allows us to test the generalizability of previous knowledge about the training-related changes in set shifting to the auditory modality. 
Moreover, successful transfer from auditory task-switching training to switching between visual tasks would indicate that the training-induced improvement in set shifting is driven by changes at an amodal processing level (i.e., independent of the particular stimulus modality), thus enabling generalization of the training effects across different stimulus modalities.

Importantly, we used the training protocol proposed by Strobach et al. (2012; see also Salminen et al., 2016; Schubert, Liepelt, Kübler, & Strobach, 2017) comprising a training group as well as an active and a passive control group. Participants in the training group were trained for 4 days with two auditory categorization tasks that were presented with the alternating-runs protocol. To control for possible unspecific training effects, an active control group was trained on the same two auditory categorization tasks in single-task blocks (i.e., each component task was presented in isolation with the same total number of trials as in the training group). Note that a repeated but isolated training of the categorization tasks in single-task blocks should also lead to processing advantages for the component tasks, but it should not influence the processes involved in switching between two sequential tasks. An additional passive control group was not trained on any task. The costs of auditory and visual task switching as well as performance in various transfer tasks were then assessed for all three groups at pre- and post-test. According to studies on dual-task training, this design allows us to disentangle the effects of extended training on task-coordination and task-switching processes (i.e., processes that are required only in mixed-task blocks) from training-related improvements in the processing of the single component tasks (Strobach et al., 2012; Strobach & Schubert, 2017).

The degree of cross-modal transfer was assessed by comparing both the mixing and the switch costs in a visual task-switching paradigm between pre-test and post-test. Since the semantic processing requirements in the visual transfer task conditions were structurally similar to those of the training tasks (except for the different stimulus modality), cross-modal transfer would suggest that the training-related improvement in task switching occurs at the level of amodal cognitive control functions (i.e., set shifting) rather than at a modality-specific processing stage within the particular sensory-motor system that is being addressed by the training tasks (see Strobach et al., 2012). In addition to the transfer across stimulus modalities, we also investigated the possibility of far-transfer effects of an auditory task-switching training to structurally different tasks requiring response inhibition, verbal and visuospatial working memory, and fluid intelligence (previous results suggest that 4 days of visual task-switching training might affect these cognitive control functions; Karbach & Kray, 2009). Therefore, effects on response inhibition (i.e., the ability to suppress inappropriate responses) were assessed with a Number Stroop task (Salthouse & Meinz, 1995), and fluid intelligence was measured with a short version of Raven’s Advanced Progressive Matrices test (Arthur & Day, 1994; Raven & Raven, 2003). Furthermore, a Corsi span task (Milner, 1971) was used to measure the capacity of visuospatial working memory, and an analogous serial digit span task (e.g., Kattner & Ellermeier, 2018) was used to measure the capacity of verbal working memory. Going beyond Karbach and Kray (2009), we also tested the degree of perceptual interference in visual and verbal working memory resulting from the presence of task-irrelevant distractor information. 
To that effect, the memory disruptions produced by task-irrelevant visuospatial information were investigated for the Corsi span task (Logie, Zucco, & Baddeley, 1990; Quinn & McConnell, 1996), whereas the interference of task-irrelevant speech was investigated for the digit span task (i.e., the irrelevant speech effect; Colle, 1980; Salamé & Baddeley, 1982). It has been argued that this type of distraction in working memory can be eliminated by inhibiting irrelevant stimulus information (e.g., in individuals with high working-memory capacity; Engle, 1996). If the reduced switch or mixing costs after task-switching training are driven by an enhancement of inhibition at the stimulus level (i.e., enabling inhibition of task-irrelevant stimuli; compare Mayr, 2003), then it might be expected that task-switching training will also reduce the degree of perceptual interference in working memory. Hence we can use these paradigms to test whether training-related improvement in auditory task switching will enable participants to reduce the distraction of (in particular verbal) working memory by task-irrelevant information.

Method

Participants

A total of 60 participants (34 women, 26 men) were recruited at the campus of Technische Universität Darmstadt. Their ages ranged between 19 and 58 years (M = 24.1; SD = 8.5). The data of three participants were not included in the analysis due to extremely poor performance during visual task switching at pre-test (< 75% correct). Participants were randomly assigned to the training group (i.e., mixed-tasks training; n = 19, 11 women, 19–27 years, Mage = 21.5 years), the active control group (i.e., the single-tasks training; n = 19, 11 women, 19–31 years, Mage = 22.1 years), or the passive control group (i.e., no training; n = 19, 10 women, 19–58 years, Mage = 28.8 years). All participants reported normal hearing and normal or corrected-to-normal vision. Student participants were compensated with course credits.

Apparatus

The experiment was conducted in a single-walled sound-attenuated listening booth (Industrial Acoustics Company, Niederkrüchten, Germany). Sounds were D/A converted at 44.1 kHz (16 bits) by an RME Multiface II sound card (Audio AG, Haimhausen, Germany) and passed through a Behringer HA 800 Powerplay PRO-8 headphone amplifier (Behringer, Zhongshan, China) before being played diotically via Beyerdynamic DT-990 headphones (Beyerdynamic GmbH & Co. KG, Heilbronn, Germany). Visual stimuli were presented on a 17-in. LCD monitor. The experimental routines were programmed in MATLAB (Mathworks, Natick, MA, USA) utilizing the Psychophysics toolbox extensions (Brainard, 1997; Pelli, 1997).

Stimuli

For the auditory switching tasks, speech recordings were produced by female co-author L.S. vocalizing eight German numerals (from “zwei” [two] to “neun” [nine]) and eight German plural nouns of fruits and vegetables (“Birnen” [pears], “Bohnen” [beans], “Gurken” [cucumbers], “Kirschen” [cherries], “Melonen” [melons], “Tomaten” [tomatoes], “Zitronen” [lemons], “Zwiebeln” [onions]). The duration of the vocalizations varied between 507 ms (“acht”) and 896 ms (“Zitronen”), and there was a small difference in durations between number words (M = 596 ms; SD = 67 ms) and fruit/vegetable words (M = 708 ms; SD = 107 ms). Each recording was filled with symmetric gaps of silence before and after the vocalization to reach a duration of 1,000 ms (i.e., the vocalizations were centered). The recordings of words were played at an average sound pressure level of 68.1 dB(A) (SD = 2.5).
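The centering of each vocalization within a 1,000-ms recording can be sketched as follows. This is an illustrative Python example rather than the original MATLAB routine; the function name `center_pad` and the placeholder waveform are hypothetical, and the sampling rate is taken from the Apparatus description.

```python
SAMPLE_RATE = 44_100  # Hz, as stated in the Apparatus section
TARGET_MS = 1_000     # target duration of every recording in ms

def center_pad(samples, sample_rate=SAMPLE_RATE, target_ms=TARGET_MS):
    """Surround a vocalization with (near-)equal silent gaps up to target_ms."""
    target_len = round(sample_rate * target_ms / 1000)
    missing = target_len - len(samples)
    if missing < 0:
        raise ValueError("recording is longer than the target duration")
    lead = missing // 2     # silence before the vocalization
    trail = missing - lead  # silence after it (one sample longer if odd)
    return [0.0] * lead + list(samples) + [0.0] * trail

# Placeholder waveform with the duration of the shortest word (507 ms).
word = [0.1] * round(SAMPLE_RATE * 0.507)
padded = center_pad(word)
print(len(padded) / SAMPLE_RATE, "s")
```

When the number of missing samples is odd, the sketch places the extra sample of silence at the end; the source does not say how this case was handled.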

Procedure

The multi-day experiment consisted of a pre-test session on the first day, a post-test session on the last day, and four intermediate training sessions for the task switching and the single-task training groups. The no-training control group completed only the pre- and the post-test sessions, which were separated by about 1 week. The full structure of the experimental design is shown in Table 1.

Table 1 Summary of the pre-test – training – post-test design

Pre- and post-tests

In the pre- and post-test sessions participants were asked to complete six different cognitive tasks in counterbalanced order (Latin square design) to measure: (1) auditory task switching, (2) visual task switching, (3) Number Stroop (measuring response inhibition), (4) Digit Span (measuring interference in verbal memory), (5) Corsi Span (measuring interference in spatial memory), and (6) Raven’s Advanced Progressive Matrices (fluid intelligence). The entire pre- and post-test sessions took about 60–90 min each.
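One way to obtain a counterbalanced task order is a cyclic Latin square, in which each task appears once in every serial position across six orders. The Python sketch below is an assumption about how such an assignment could be implemented; the source does not specify the type of Latin square or how participants were mapped to rows, and the names are hypothetical.

```python
TASKS = ["auditory switching", "visual switching", "Number Stroop",
         "Digit Span", "Corsi Span", "Raven APM"]

def cyclic_latin_square(items):
    """Row r is the item list rotated by r positions."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def task_order(participant_index, items=TASKS):
    """Assign orders to participants by cycling through the rows (assumption)."""
    square = cyclic_latin_square(items)
    return square[participant_index % len(items)]

for p in range(3):
    print(p, task_order(p))
```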

Auditory task switching started with two single-task blocks followed by a mixed-task block. Each block was preceded by eight additional practice trials for which the data were not analyzed. The first single-task block consisted of 24 trials in which participants were asked to categorize the spoken noun as either a fruit or a vegetable (food task). In the second single-task block (24 trials), participants were asked to categorize the spoken number as being even or odd (number task). In the mixed-task block (48 trials), participants were asked to conduct both the fruit/vegetable task and the even/odd task, and the task switched on every other trial (compare Rogers & Monsell, 1995). For all three blocks, a trial started with the presentation of a central fixation cross for 750 ms before a randomly drawn fruit/vegetable word was presented to one ear and a randomly drawn numeral was presented to the other ear (ear sides were randomly chosen on each trial). A short text message was presented on the screen indicating the current task (“Obst – Gemüse” [fruit – vegetable] or “Ungerade – Gerade” [odd – even]). Participants were instructed to respond as fast and as accurately as possible by pressing the left arrow key if they heard a fruit word or an odd number, respectively, and the right arrow key if they heard a vegetable word or an even number, respectively. After the participant’s response, a feedback message was shown on the screen for 750 ms or 1,500 ms, indicating that the response was correct (“Richtig!” [Correct!], in green font color) or incorrect (“Falsch!” [Incorrect!], in red font color), respectively. If no response was given within 5,000 ms, then the trial was continued and a text message (“Zu langsam!” [Too slow!]) was displayed for 750 ms. If the response time was longer than 1,000 ms (but within 5,000 ms), then an additional message was presented for 750 ms prompting the participant to be faster on the next trial (“Das geht aber noch schneller!” [You can go even faster!]). 
The next trial started after the feedback message.
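The trial sequence of a mixed-task block and the feedback rules described above can be summarized in a short Python sketch. It is illustrative only and does not reproduce the original MATLAB routines. The names are hypothetical, and the sketch assumes that the block starts with the food task and that the first trial is classified as neither switch nor repeat; neither point is specified in the source.

```python
import random

TASKS = ("food", "number")

def build_mixed_block(n_trials=48, rng=None):
    """Alternating-runs sequence (AABBAABB...) with random ear assignment."""
    rng = rng or random.Random()
    trials = []
    for i in range(n_trials):
        task = TASKS[(i // 2) % 2]  # the task changes on every other trial
        if i == 0:
            kind = "first"          # no predecessor (assumption)
        elif task != trials[-1]["task"]:
            kind = "switch"
        else:
            kind = "repeat"
        food_ear = rng.choice(("left", "right"))
        number_ear = "right" if food_ear == "left" else "left"
        trials.append({"task": task, "type": kind,
                       "food_ear": food_ear, "number_ear": number_ear})
    return trials

def feedback(correct, rt_ms):
    """Return the (message, duration in ms) pairs shown after a trial."""
    if rt_ms is None or rt_ms > 5000:  # no response within the deadline
        return [("Zu langsam!", 750)]
    messages = [("Richtig!", 750)] if correct else [("Falsch!", 1500)]
    if rt_ms > 1000:                   # slow, but within the deadline
        messages.append(("Das geht aber noch schneller!", 750))
    return messages

block = build_mixed_block(rng=random.Random(0))
print([t["type"] for t in block[:6]])
print(feedback(correct=True, rt_ms=1200))
```

Under these assumptions, a 48-trial block contains 23 switch trials and 24 repeat trials.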

Visual task switching was identical to the auditory task-switching procedure except that the numerals and nouns were not presented acoustically but the same words were presented visually as text in the center of the screen. In analogy to the random ear sides in the dichotic presentation of words, the order of the two words presented on the screen was randomly chosen on each trial (e.g., it could be “Acht Zitronen” or “Zitronen Acht”). The arrangement of single-task and mixed-task blocks as well as the trial sequence were identical to auditory task switching.

In the Number Stroop task, between one and four identical characters (either letters – “H”, “K”, “L”, “P” – or digits – “1”, “2”, “3”, “4”) were visually presented on the screen and participants were asked to report the number of characters by pressing the respective number on the keyboard. The displayed characters were letters (neutral trials, e.g., “LL”), digits corresponding to the number of characters (compatible trials, e.g., “22”), or digits not corresponding to the number of characters (incompatible trials, e.g., “33”). Each type of trial (neutral/compatible/incompatible × 1/2/3/4 characters) was repeated 16 times resulting in a total of 192 trials (presented in random order). Participants started with an additional practice block containing all 12 types of trials for which the data were not analyzed. Each trial started with a central fixation cross for 500 ms, followed by the characters, which were presented for 2,000 ms or until the participant hit a response key. A feedback message indicated whether the response was correct or incorrect (750 ms). If no response was made within 2,000 ms, a text message (“Zu langsam!”) indicated that the response deadline had passed.
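The factorial structure of the Number Stroop trial list (3 trial types × 4 character counts × 16 repetitions = 192 trials) can be made explicit with a small Python sketch. The function name is hypothetical, and the random selection of the specific letter or incompatible digit on each trial is an assumption, since the source does not describe how characters were chosen.

```python
import random

LETTERS = "HKLP"
DIGITS = "1234"

def make_stroop_trials(repeats=16, rng=None):
    """Build the shuffled list of neutral, compatible, and incompatible trials."""
    rng = rng or random.Random()
    trials = []
    for count in range(1, 5):  # number of characters on the screen
        for condition in ("neutral", "compatible", "incompatible"):
            for _ in range(repeats):
                if condition == "neutral":
                    char = rng.choice(LETTERS)
                elif condition == "compatible":
                    char = str(count)  # digit matches the count, e.g. "22"
                else:
                    # digit differs from the count, e.g. "33"
                    char = rng.choice([d for d in DIGITS if d != str(count)])
                trials.append({"condition": condition, "count": count,
                               "stimulus": char * count})
    rng.shuffle(trials)  # trials were presented in random order
    return trials

trials = make_stroop_trials(rng=random.Random(0))
print(len(trials), trials[0])
```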

In the Digit Span task, a random sequence of eight digits (from 1–9) was presented on each trial. On half of the trials, the digits were presented visually on the screen for 1,000 ms each (black font color on gray background), and on the other half of the trials, the digits were presented diotically via headphones. After an additional 6,000-ms retention interval showing a gray blank screen, a 3 × 3 numeric pad was shown on the screen and participants were asked to click the presented digits in serial order. After clicking the eighth digit, feedback was presented on the screen for 1,000 ms showing the number of correctly recalled digits (e.g. “5 von 8 Ziffern korrekt.” [5 of 8 digits correct]). To measure the phonological interference with verbal memory, either an excerpt of free-running Finnish speech or white noise (14 s) was presented diotically as task-irrelevant sound during both the presentation of the digits and the retention interval. Participants were instructed to ignore the sound. Each type of trial (visual / acoustical × speech / noise) was repeated five times resulting in a total of 20 trials. There were two additional practice trials at the beginning of the task (one with speech and one with noise) for which the data were not analyzed.
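The 2 × 2 design of the Digit Span task and the feedback score can be sketched in Python as follows. The names are hypothetical, and three points are assumptions not stated in the source: digits are drawn without repetition, trial types are presented in shuffled order, and a digit counts as correct only if it is recalled at its original serial position.

```python
import random
from itertools import product

def make_digit_span_trials(repeats=5, rng=None):
    """2 (presentation modality) x 2 (irrelevant sound) x 5 repetitions."""
    rng = rng or random.Random()
    trials = []
    for modality, sound in product(("visual", "auditory"), ("speech", "noise")):
        for _ in range(repeats):
            digits = rng.sample(range(1, 10), 8)  # eight distinct digits from 1-9
            trials.append({"modality": modality, "sound": sound, "digits": digits})
    rng.shuffle(trials)
    return trials

def serial_recall_score(presented, recalled):
    """Number of digits recalled at their correct serial position."""
    return sum(p == r for p, r in zip(presented, recalled))

presented = [1, 2, 3, 4, 5, 6, 7, 8]
recalled = [1, 2, 3, 4, 5, 7, 6, 9]
print(f"{serial_recall_score(presented, recalled)} von 8 Ziffern korrekt.")
```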

The Corsi Span task consisted of 24 trials with sequentially presented Corsi blocks. Each trial started with the presentation of a central fixation cross for 750 ms, which was followed by a display of 16 empty blocks arranged on the screen in a 4 × 4 matrix (at +/-3° and +/-9° eccentricity relative to the center of the screen). Six randomly chosen target blocks were then successively highlighted by filling them with red color for 1,250 ms each (i.e., 6,000 ms total target presentation time). After an additional 5,000-ms retention interval, the 16 empty blocks were presented again and participants were asked to click the previously highlighted blocks in correct serial order. To measure interference with spatial memory, half of the trials contained additional irrelevant colored blocks (one every 1,250 ms), which were presented at random locations in the gaps between the empty squares during both the presentation of target blocks and during the retention interval. Text feedback was presented for 1,000 ms showing the number of correctly recalled blocks (e.g., “4 Richtige”) before the next trial started.

To measure fluid intelligence, 18 problems from the 36-item short form of Raven’s Advanced Progressive Matrices test (Arthur & Day, 1994; Raven & Raven, 2003) were presented at the pre-test (odd items) and at the post-test (even items). The problems were presented in order of difficulty starting with the easy problems. Each problem was presented on the screen and participants had to choose one of eight possible solutions to the problem by pressing the respective number on the keyboard. There was no response deadline for any individual problem, but participants had 10 min for the 18 problems. After pressing a response key, participants had to continue with the next problem, and they were not allowed to change their previous responses. Feedback was not presented.

Training tasks

Between the pre- and the post-test sessions, the training group was trained in auditory task switching with a total of 2,880 mixed-task trials, divided across four training sessions (720 trials per day). Two successive training sessions were separated by a minimum of 4 h and a maximum of 2 days, and there were no more than two training sessions on a single day. Each training session took about 30–40 min and contained 15 blocks each consisting of 48 trials, which were identical to the mixed-task blocks of auditory task switching in the pre- and post-test sessions. After each block, participants could take a short break.

The active control group was trained on the same tasks as the mixed-task training group (i.e., the fruit/vegetable task and the even/odd task) for 2,880 trials on four separate training days (720 trials per day). On each day, participants completed 15 blocks of 48 trials. However, the two tasks did not alternate within blocks, but only between blocks. That is, participants were trained on the fruit/vegetable task in odd blocks and on the even/odd task in even blocks. Each block was identical to a single-task block of the pre-test and post-test sessions, except that it consisted of 48 trials.
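The single-task schedule of the active control group can be sketched in a few lines of Python. This is only an illustration of the design described above: the function name and the tuple layout are hypothetical, and the sketch assumes that block numbering restarts on each training day, which the text does not state explicitly.

```python
TRAINING_DAYS = 4
BLOCKS_PER_DAY = 15
TRIALS_PER_BLOCK = 48


def active_control_schedule():
    """Return (day, block, task, n_trials) tuples for the single-task training.

    Assumption: blocks are numbered 1-15 within each day, so odd blocks
    (fruit/vegetable) outnumber even blocks (even/odd) by one per day.
    """
    schedule = []
    for day in range(1, TRAINING_DAYS + 1):
        for block in range(1, BLOCKS_PER_DAY + 1):
            task = "fruit/vegetable" if block % 2 == 1 else "even/odd"
            schedule.append((day, block, task, TRIALS_PER_BLOCK))
    return schedule


schedule = active_control_schedule()
total_trials = sum(n_trials for _, _, _, n_trials in schedule)
trials_day_1 = sum(n for day, _, _, n in schedule if day == 1)
print(total_trials, trials_day_1)
```

Under these assumptions the totals match the reported dosage of 720 trials per day and 2,880 trials overall, the same as for the mixed-task training group.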

Participants in the passive control group did not complete any training sessions, and the pre- and post-test sessions were separated by 4–12 days.

Results

Data processing

A one-way ANOVA revealed a significant age difference between the groups, F(2,54) = 4.69; p = .013; η2G = 0.15, with Bonferroni-corrected pairwise t tests confirming a significant age difference between the passive control group and the active control group (p = .03), as well as between the passive control group and the training group (p = .02), whereas there was no difference in age between the active control group and the training group (p = .82). Because of these group differences, age was included as a covariate in all the analyses reported below.

For the analysis of task switching and Number Stroop data, response times were analyzed only for trials on which participants made correct responses. In addition, due to large data variability, task-switching response times longer than 2,000 ms and Stroop-task response times longer than 1,000 ms were treated as outliers (corresponding to approximately two inter-quartile ranges above the 75th percentile, i.e., 2,080 ms for auditory task switching and 2,189 ms for visual task switching, and 998 ms for the Number Stroop task). Using these criteria, 2.41% of the auditory task-switching trials, 4.37% of the visual task-switching trials, and 3.58% of the Number Stroop trials were removed prior to the analysis of response times.
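The outlier rule described above (a fixed cutoff motivated by roughly two inter-quartile ranges above the 75th percentile) can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function names and the trial dictionary keys (`rt_ms`, `correct`) are hypothetical, and the quartile method (Python's "inclusive" interpolation) is an assumption, since the text does not say how percentiles were computed.

```python
import statistics


def upper_outlier_cutoff(rts_ms, k=2.0):
    """Return Q3 + k * IQR for a list of response times in ms."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(rts_ms, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def keep_valid_trials(trials, fixed_cutoff_ms):
    """Keep correct trials whose response time does not exceed the cutoff."""
    return [t for t in trials if t["correct"] and t["rt_ms"] <= fixed_cutoff_ms]


# Made-up response times, for illustration only.
example_rts = [400, 500, 600, 700, 800]
print(upper_outlier_cutoff(example_rts))  # Q3 = 700, IQR = 200 -> 1100.0

example_trials = [
    {"rt_ms": 650, "correct": True},
    {"rt_ms": 2400, "correct": True},   # slower than the 2,000-ms cutoff
    {"rt_ms": 700, "correct": False},   # error trial
]
print(len(keep_valid_trials(example_trials, fixed_cutoff_ms=2000)))  # 1
```

In the study, the data-driven values (2,080 ms, 2,189 ms, and 998 ms) were rounded to the fixed cutoffs of 2,000 ms and 1,000 ms, so the first function only illustrates where those cutoffs come from, whereas the second applies them.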

Training data

Participants in both the training group and the active control group significantly reduced the response times on repeat trials across the four training days (see Fig. 1a). A 4 (training day) × 2 (group: training, active control) mixed-factors ANCOVA on the response times (including only the repeat trials of the training group) during training using age as a covariate revealed a significant main effect of group, F(1,36) = 15.70; p < .001; η2G = 0.26, with longer response times on the repeat trials of the mixed-task blocks in the training group (M = 738 ms; SD = 163 ms) than in the single-task blocks in the active control group (M = 553 ms; SD = 59 ms). There was also a significant main effect of training day, F(3,108) = 38.09; p < .001; η2G = 0.15, confirming the overall decrease in response times, as well as a significant training day × group interaction, F(3,108) = 6.49; p < .001; η2G = 0.03, with a greater decrease in response times for the training group (ΔRT = 174 ms; from M1 = 844 ms; SD1 = 121 ms to M4 = 670 ms; SD4 = 158 ms) than for the active control group (ΔRT = 71 ms; from M1 = 598 ms; SD1 = 65 ms to M4 = 527 ms; SD4 = 45 ms).

Fig. 1. Mean response times (a) and mean accuracy (b) in auditory task-switching training for the training group (trained on 15 mixed-task blocks per day) and the active control group (trained on 15 single-task blocks per day) across the four training days. Error bars depict standard errors of the mean.

Moreover, a 2 (trial type: repeat, switch) × 4 (training day) repeated-measures ANOVA on the response times in the training group revealed a small but significant difference between repeat and switch trials, F(1,18) = 33.40; p < .001; η2G = 0.01 (i.e., switch costs), but no interaction between trial type and training day, F(3,54) = 0.85; p = .47; η2G < 0.01, indicating that the auditory switch costs did not change over the course of the four training days. The main effect of training day was also significant for the response times across all trials in the training group, F(3,54) = 21.39; p < .001; η2G = 0.16.

The accuracy of responses during training is illustrated in Fig. 1b. A further 4 (training day) × 2 (group: training, active control) mixed-factors ANCOVA on accuracy during training (with age being included as a covariate) also revealed a significant main effect of group, F(1,36) = 11.35; p < .001; η2G = 0.19, confirming the higher accuracy of the active control group (M = 93.6%; SD = 3.1%) compared to the training group (M = 87.8%; SD = 6.6%). There was also a significant main effect of training day, F(3,108) = 7.55; p < .001; η2G = 0.05, but no interaction with group, F(3,108) = 0.29; p = .83; η2G < 0.01.

In the training group, switch costs were also evident in terms of accuracy differences, as confirmed by the main effect of trial type, F(1,18) = 27.56; p < .001; η2G = 0.05. However, there was again no trial type × training day interaction for the training group, F(3,54) = 1.45; p = .28; η2G < 0.01, suggesting that the auditory switch costs in terms of accuracy did not change over the course of training either, whereas the main effect of training day was significant, F(3,54) = 3.07; p = .04; η2G < 0.04.

Auditory task switching (pre- and post-tests)

The average response times for auditory and visual task switching at pre- and post-test are illustrated in Fig. 2, separately for the single-task and mixed-task blocks. Mixing costs are indicated by the longer response times in the mixed-task blocks (squares and diamonds) compared to the single-task blocks (circles). Further, switch costs in the mixed-task blocks are also evident both for auditory and visual task switching with slightly longer response times on switch trials (diamonds) than on repeat trials (squares). Most importantly, the figure also shows a stronger decrease in mixing costs for the training group (black symbols) compared to the active and passive control groups (gray and white symbols) for both the trained auditory tasks and the untrained visual tasks.

Fig. 2. Mean response times in auditory and visual task switching at pre- and post-test for single-task blocks and both repeat and switch trials of mixed-task blocks. Error bars depict standard errors of the mean.

Mixing costs in the pre- and post-test sessions were calculated for each individual by subtracting the mean response times for correct trials in the single-task blocks (2 × 24 trials) from the mean response times of correct repeat trials in the mixed-task blocks (24 trials). Switch costs were calculated for each participant by subtracting the mean response times on the correct repeat trials from the mean response times on correct switch trials of the mixed-task blocks at pre- and post-test.
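The two cost measures defined above can be written out as a short Python sketch for a single participant and test session. The data layout is hypothetical (each trial is a dictionary with the keys `condition`, `rt_ms`, and `correct`), and the example values are made up for illustration only.

```python
from statistics import mean


def mean_correct_rt(trials, condition):
    """Mean response time (ms) over correct trials of one condition."""
    rts = [t["rt_ms"] for t in trials
           if t["condition"] == condition and t["correct"]]
    return mean(rts)


def mixing_and_switch_costs(trials):
    """Compute response-time costs for one participant and session.

    mixing cost = mean RT(repeat trials, mixed blocks) - mean RT(single-task blocks)
    switch cost = mean RT(switch trials, mixed blocks) - mean RT(repeat trials, mixed blocks)
    """
    single = mean_correct_rt(trials, "single")
    repeat = mean_correct_rt(trials, "repeat")
    switch = mean_correct_rt(trials, "switch")
    return {"mixing_cost": repeat - single, "switch_cost": switch - repeat}


# Made-up trials, for illustration only.
example_trials = [
    {"condition": "single", "rt_ms": 500, "correct": True},
    {"condition": "single", "rt_ms": 540, "correct": True},
    {"condition": "repeat", "rt_ms": 700, "correct": True},
    {"condition": "repeat", "rt_ms": 740, "correct": True},
    {"condition": "switch", "rt_ms": 760, "correct": True},
    {"condition": "switch", "rt_ms": 800, "correct": True},
    {"condition": "switch", "rt_ms": 1500, "correct": False},  # error: excluded
]
print(mixing_and_switch_costs(example_trials))
```

Note that both measures share the repeat-trial mean: it is the minuend of the mixing cost and the subtrahend of the switch cost, so a change in repeat-trial response times alone moves the two costs in opposite directions.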

Auditory mixing costs

Figure 3 illustrates the average mixing costs and switch costs for auditory task switching at pre- and post-test for the three experimental groups. As can be seen in Fig. 3a, auditory mixing costs (response times) were reduced considerably in the group that was trained in auditory task switching, but not in the two control groups (active and passive). A 2 (test: pre vs. post) × 3 (group: training, active control, passive control) mixed-factors ANCOVA on auditory mixing costs (with age as a covariate) revealed a significant main effect of group, F(2,54) = 10.01; p < .001; η2G = 0.20, as well as a significant interaction between group and test, F(2,54) = 17.61; p < .001; η2G = 0.18, confirming that the decrease in mixing costs differed between the three experimental groups. There was also a significant main effect of test, F(1,54) = 38.15; p < .001; η2G = 0.19. Separate follow-up 2 × 2 ANCOVAs revealed a significant test × group interaction for the contrast between the training and the active control group, F(1,36) = 33.09; p < .001; η2G = 0.24, as well as between the training and the passive control group, F(1,36) = 22.86; p < .001; η2G = 0.19, but there was no interaction for the contrast between the two control groups, F(1,36) = 0.15, p = .70. Bonferroni-corrected pairwise t tests (adjusting for nine comparisons) revealed no significant group differences in auditory mixing costs at pre-test (all adjusted ps > .99), whereas there were significant group differences at post-test between the training group and both the active control group (p < .001) and the passive control group (p < .001). There were no post-test differences between the two control groups though (p > .99). The comparisons also revealed a significant pre-post difference in auditory mixing costs for the training group (p < .001), but not for the active (p > .99) and passive control groups (p > .99).

Fig. 3. Training Transfer: Mean costs of auditory task switching at pre- and post-test for the training group, the active control group, and the passive control group. (a) Auditory mixing costs (i.e., differences in response times between mixed blocks and single-task blocks) and (b) auditory switch costs (i.e., differences in response times between switch and repeat trials of mixed blocks). Error bars depict standard errors of the mean.

Auditory mixing costs were also calculated for the accuracy of responses (i.e., accuracy in single-task blocks subtracted from accuracy on the repeat trials of the mixed-task block). However, a 2 (test) × 3 (group) mixed-factors ANCOVA (age as a covariate) on accuracy-based auditory mixing costs revealed no main effects of group, F(2,54) = 2.17; p = .12; η2G = 0.05, or test, F(1,54) = 0.68; p = .41; η2G < 0.01, as well as no interaction, F(2,54) = 0.10; p = .91; η2G < 0.01, suggesting that the auditory task-switching training did not affect mixing costs in terms of accuracy in the training group (Mpre = 4.4%; SDpre = 11.4% vs. Mpost = 2.1%; SDpost = 10.4%), the active control group (Mpre = 2.1%; SDpre = 7.3% vs. Mpost = 4.4%; SDpost = 11.4%), and the passive control group (Mpre = 7.6%; SDpre = 9.0% vs. Mpost = 6.6%; SDpost = 11.1%).

Auditory switch costs

A different pattern was observed for the auditory switch costs (see Fig. 3b). A 2 (test) × 3 (group) mixed-factors ANCOVA on switch costs (response times) with age as a covariate revealed no main effects of test, F(1,54) = 2.65; p = .11; η2G = 0.02, or group, F(2,54) = 0.10; p = .91; η2G < 0.01, but a significant interaction between group and test, F(2,54) = 3.81; p = .03; η2G = 0.06, suggesting that the pre-post changes in switch costs differed between the three groups. Interestingly, follow-up 2 × 2 ANCOVAs revealed a significant interaction between test and group for the contrast between the training group and the active control group, F(1,36) = 4.81; p = .03; η2G = 0.05, as well as between the two control groups, F(1,36) = 6.93; p = .01; η2G = 0.07, but not between the training group and the passive control group, F(1,36) = 0.50; p = .48; η2G < 0.01. Likewise, Bonferroni-corrected pairwise t tests (adjusting for nine comparisons) revealed no significant group differences at pre-test (ps > .68) or post-test (ps > .77), but a significant increase in switch costs for the active control group (p = .04). Switch costs did not change from pre- to post-test in the other two groups (ps > .99).

For the accuracy-based auditory switch costs, the 2 (test) × 3 (group) mixed-factors ANCOVA (age as covariate) revealed no significant main effects of group, F(2,54) = 1.34; p = .27; η2G = 0.02, or test, F(1,54) = 0.03; p = .86; η2G < 0.01, and no significant group × test interaction, F(2,54) = 1.71; p = .19; η2G = 0.03, suggesting that there were no systematic changes in accuracy-based switch costs from pre- to post-test in the training group (Mpre = 4.6%; SDpre = 10.6% vs. Mpost = 2.4%; SDpost = 11.6%), the active control group (Mpre = 2.2%; SDpre = 9.5% vs. Mpost = 7.9%; SDpost = 10.4%), and the passive control group (Mpre = 2.6%; SDpre = 8.9% vs. Mpost = 0.2%; SDpost = 10.7%).

Visual task switching (cross-modal transfer)

The primary goal of the present study was to assess the near transfer of auditory task-switching training on mixing and switch costs in the transfer tasks of visual task switching. The average mixing and switch costs in visual task switching of the three experimental groups at pre- and post-tests are shown in Fig. 4.

Fig. 4. Near Transfer Effects: Mean costs of visual task switching at pre- and post-test for the training group, the active control group, and the passive control group. (a) Visual mixing costs (i.e., differences in response times between mixed blocks and single-task blocks) and (b) visual switch costs (i.e., differences in response times between switch and repeat trials of mixed blocks). Error bars depict standard errors of the mean.

Visual mixing costs

A 2 (test) × 3 (group) mixed-factors ANCOVA (age as a covariate) on visual mixing costs (response times) revealed a significant main effect of test, F(1,54) = 36.92; p < .001; η2G = 0.16, as well as a significant interaction between group and test, F(2,54) = 3.29; p = .04; η2G = 0.03, suggesting that there was a greater reduction of visual mixing costs in the training group compared to the two control groups (see Fig. 4a). The analysis further revealed a marginally significant main effect of group, F(2,54) = 2.95; p = .06; η2G = 0.07. Follow-up 2 × 2 ANCOVAs revealed a significant group × test interaction for the contrast between the training and the active control group, F(1,36) = 5.42; p = .03; η2G = 0.05, whereas the interaction was not significant for the contrast between the training and the passive control group, F(1,36) = 2.58; p = .12; η2G = 0.02. There was no interaction for the contrast between the two control groups, F(1,36) = 1.11; p = .30; η2G < 0.01. Bonferroni-corrected pairwise t tests (adjusting for nine comparisons) revealed no significant group differences at pre-test (all ps > .78). At post-test there was a significant difference in visual mixing costs between the training group and the passive control group (p = .04), but not between the training and the active control group (p = .27), or between the two control groups (p > .99). In addition, there was a significant decrease in visual mixing costs from pre-test to post-test in the training group (p < .001), but not in the two control groups (active control: p > .99; passive control: p = .15), suggesting that auditory task-switching training yielded generalized improvements in set shifting that transferred to the visual modality.

The reduction of visual mixing costs in terms of accuracy was also slightly greater in the training group (Mpre = 4.4%; SDpre = 11.1% vs. Mpost = 2.1%; SDpost = 10.4%) than in the active (Mpre = 2.7%; SDpre = 6.8% vs. Mpost = 2.1%; SDpost = 7.3%) and passive control groups (Mpre = 7.6%; SDpre = 9.0% vs. Mpost = 6.6%; SDpost = 11.1%). However, a 2 (test) × 3 (group) mixed-factors ANCOVA (age as covariate) on these costs of accuracy did not reveal main effects of group, F(2,54) = 0.49; p = .61; η2G = 0.01, or test, F(1,54) = 0.30; p = .59; η2G < 0.01, and only a marginally significant interaction, F(2,54) = 2.47; p = .09; η2G = 0.04, suggesting that the auditory task-switching training did not reliably affect accuracy-based mixing costs in the visual modality.

Visual switch costs

A 2 (test) × 3 (group) mixed-factors ANCOVA (age as a covariate) on response-time-based visual switch costs revealed no significant main effect of group, F(2,54) = 2.42; p = .10; η2G = 0.04, no main effect of test, F(1,54) = 0.18; p = .67; η2G < 0.01, and no interaction between group and test, F(2,54) = 0.79; p = .46; η2G = 0.02, suggesting that visual switch costs did not change as a function of auditory task-switching training (see Fig. 4b).

For the accuracy-based switch costs in visual task switching, the ANCOVA revealed a significant main effect of group, F(2,54) = 3.36; p = .04; η2G = 0.05, with higher switch costs in the active (M = 7.6%; SD = 8.7%) and passive control groups (M = 5.9%; SD = 8.6%), as compared to the training group (M = 2.3%; SD = 12.6%). However, there was no main effect of test, F(1,54) < 0.01; p > .99; η2G < 0.01, and no interaction, F(2,54) = 0.46; p = .64; η2G = 0.01, suggesting that the changes from pre-test to post-test did not differ between the training group (Mpre = 1.5%; SDpre = 14.0% vs. Mpost = 3.1%; SDpost = 11.4%), the active control group (Mpre = 9.0%; SDpre = 6.7% vs. Mpost = 6.1%; SDpost = 10.2%), and the passive control group (Mpre = 5.3%; SDpre = 9.5% vs. Mpost = 6.6%; SDpost = 7.9%).

Far transfer effects

Number Stroop task

Response times on neutral, compatible, and incompatible trials of the Number Stroop task are illustrated in Fig. 5. As can be seen, response times decreased from pre-test to post-test in all three groups, but there were no group differences in transfer effects for the degree of response inhibition as measured with the Number Stroop task (i.e., differences in response times between compatible and incompatible trials). A 2 (test) × 3 (trial type: neutral, compatible, incompatible) × 3 (group) mixed-factors ANCOVA (age as a covariate) on the response times revealed a significant main effect of trial type, F(2,106) = 212.16; p < .001; η2G = 0.07, with significantly longer response times on incompatible trials (M = 595 ms; SD = 90 ms) than on compatible trials (M = 550 ms; SD = 92 ms), p < .001, and with significantly longer response times on neutral trials (M = 580 ms; SD = 90 ms) than on compatible trials, p = .04, whereas the difference between neutral and incompatible trials was not significant, p = .58 (Bonferroni-corrected t tests), suggesting that there was a facilitation effect by compatible information, but no interference by incompatible information. The ANCOVA further revealed a significant main effect of test, F(1,53) = 77.26; p < .001; η2G = 0.15, with shorter response times at post-test than at pre-test, as well as a significant trial type × test interaction, F(2,106) = 4.94; p = .01; η2G < 0.01, reflecting the general pre-post practice and reduced compatibility effects across all groups. There was no main effect of group, F(2,53) = 0.66; p = .52; η2G = 0.02, and there were no other interactions with group, all Fs < 1, suggesting that auditory task-switching training did not have any specific effects on the inhibition or facilitation of response times in the Number Stroop task.

Fig. 5. Mean response times in the Number Stroop task on neutral (letters), incompatible (digits not corresponding to the number of characters), and compatible (digits corresponding to the number of characters) trials at pre- and post-test for the three experimental groups. Error bars depict standard errors of the mean.

Digit span task

Verbal working memory was assessed in terms of the number of spoken or visually presented digits that were recalled correctly while either irrelevant speech or noise was presented in the background. At pre-test, participants on average recalled 5.3 (SD = 1.2) of the eight spoken digits when noise was presented, but only 4.0 digits (SD = 1.3) when irrelevant speech was presented. Likewise, 5.2 (SD = 1.3) and 4.0 (SD = 1.1) visually presented digits were recalled at the pre-test when irrelevant noise and speech were presented, respectively.

A 2 (test) × 2 (modality) × 3 (group) mixed-factors ANCOVA (age as a covariate) on the digit spans revealed a significant main effect of test, F(1,54) = 23.73; p < .001; η2G = 0.06, with enhanced digit spans at post-test (M = 5.17; SD = 1.26) compared to pre-test (M = 4.65; SD = 1.38). Moreover, there was a significant three-way interaction, F(2,54) = 4.38; p = .02; η2G = 0.02, suggesting that the training interventions differentially affected digit span for items presented in the two modalities. Additional two-way ANCOVAs (age as a covariate) revealed a marginally significant group × test interaction for the visually presented digits, F(2,54) = 2.50; p = .09; η2G = 0.02, with a slightly greater increase in digit span in the active control group (Mpre = 4.29; Mpost = 5.31) than in the training group (Mpre = 4.75; Mpost = 5.19) and the passive control group (Mpre = 4.74; Mpost = 5.19). No indication of such an interaction was observed for the digit span with acoustically presented digits, F(2,54) = 1.20; p = .31; η2G = 0.01.

As auditory task-switching training may have affected the ability to inhibit or filter task-irrelevant information (Mayr, 2003), the degree of interference produced by irrelevant speech, which has also been argued to be related to inhibition, was further assessed as a function of the training intervention. Individual irrelevant speech effects (ISE scores) were calculated by subtracting the digit span during speech from the digit span during noise. The average ISE scores are illustrated in Fig. 6a for the three groups at pre- and post-test. A 2 (test) × 2 (modality) × 3 (group) mixed-factors ANCOVA (age as a covariate) on the ISE scores revealed a significant main effect of test, F(1,54) = 15.47; p < .001; η2G = 0.06, indicating that in all three groups the ISE was greater at pre-test (M = 1.24 ± 1.25 digits) than at post-test (M = 0.69 ± 1.07 digits). The analysis further revealed a significant test × modality interaction, F(1,54) = 4.60; p = .04; η2G = 0.02, suggesting that the reduction of interference by irrelevant speech was more pronounced for spoken digits than for visually presented digits. There was also a marginally significant main effect of modality, F(1,54) = 3.24; p = .08; η2G = 0.02, with a slightly greater ISE in the visual modality (M = 1.11 ± 1.18 digits) than in the auditory modality (M = 0.83 ± 1.20 digits). However, there was no main effect of group, F(2,54) = 0.43; p = .65; η2G < 0.01, and no significant interactions with group, Fs < 1, suggesting that the auditory task-switching training did not help to control the interference of task-irrelevant speech on serial recall of verbal information.
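The ISE score defined above is a simple difference of two mean spans. A minimal Python sketch for one participant, one presentation modality, and one test session is given below; the function name, the trial dictionary keys (`sound`, `n_correct`), and the example values are hypothetical and serve only to illustrate the subtraction.

```python
from statistics import mean


def irrelevant_speech_effect(trials):
    """ISE = mean digit span under noise - mean digit span under speech.

    Positive values indicate that irrelevant speech impaired serial recall
    relative to the noise baseline.
    """
    span_noise = mean(t["n_correct"] for t in trials if t["sound"] == "noise")
    span_speech = mean(t["n_correct"] for t in trials if t["sound"] == "speech")
    return span_noise - span_speech


# Made-up data: five noise trials and five speech trials, eight digits each.
example_trials = (
    [{"sound": "noise", "n_correct": n} for n in (6, 5, 5, 4, 5)]
    + [{"sound": "speech", "n_correct": n} for n in (4, 4, 3, 5, 4)]
)
print(irrelevant_speech_effect(example_trials))  # 5 - 4 = 1 digit
```

As a rough plausibility check against the group means reported above, the pre-test spans for spoken digits (5.3 under noise vs. 4.0 under speech) and for visually presented digits (5.2 vs. 4.0) imply differences of about 1.3 and 1.2 digits, which is in line with the reported average pre-test ISE of 1.24 digits.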

Fig. 6. (a) Average irrelevant speech effects (ISE) on verbal working memory for auditory and visually presented digits (i.e., digit span) produced by task-irrelevant speech (as compared to task-irrelevant noise) at pre- and post-test for the three experimental groups. (b) Average interference effects on spatial working memory (i.e., Corsi span) produced by visual distractors (as compared to no distractors) at pre- and post-test for the three experimental groups. All error bars depict standard errors of the mean.

Corsi span task

Visuospatial working memory was assessed in terms of the number of correctly recalled Corsi block locations. On average, participants recalled 4.25 (SD = 1.05) and 4.65 (SD = 0.85) of the six sequentially presented Corsi blocks at pre-test and post-test, respectively. A 2 (test) × 3 (group) mixed-factors ANCOVA (age as covariate) confirmed this increase in Corsi span with a significant main effect of test, F(1,54) = 20.07; p < .001; η2G = 0.05, but there was no main effect of group, F(2,54) = 0.05; p = .95; η2G < 0.01, and no interaction, F(2,54) = 0.87; p = .42; η2G < 0.01, suggesting that the auditory task-switching training did not affect the capacity of visuospatial working memory.

As auditory task-switching training may have affected the ability to inhibit task-irrelevant information, visuospatial interference scores were calculated by subtracting the Corsi span on trials with distractor blocks from the Corsi span on trials without distractor blocks. These visuospatial interference scores are illustrated in Fig. 6b. However, a 2 (test) × 3 (group) mixed-factors ANCOVA (age as covariate) on these interference scores revealed no significant main effects, and no interaction, all Fs < 1, suggesting that the present training did not yield a far-transfer effect on the cognitive control of interference in visuospatial working memory.

Fluid intelligence

A 2 (test) × 3 (group) mixed-factors ANCOVA (age as a covariate) showed that the number of correctly solved items in Raven’s Advanced Progressive Matrices test did not change from pre-test (M = 11.12; SD = 2.58) to post-test (M = 10.82; SD = 2.19), F(1,54) = 1.10; p = .30; η2G < 0.01. The analysis further revealed no main effect of group, F(2,54) = 1.56; p = .22; η2G = 0.04, with only slightly higher pre-test scores of fluid intelligence in the active control group (M = 11.82; SD = 2.32) than in the training group (M = 10.97; SD = 2.50) and the passive control group (M = 10.13; SD = 2.07) (note that the main effect was marginally significant when age was not included as a covariate, F(2,54) = 3.16; p = .05; η2G = 0.08). However, the crucial interaction between test and group was far from significant, F(2,54) = 0.28; p = .76; η2G < 0.01, suggesting that auditory task-switching training did not affect fluid intelligence scores.

Discussion

The primary goal of the present study was to assess the cross-modal and far transfer of a newly developed auditory task-switching training. Both types of transfer were investigated by comparing performance in various cognitive tasks before and after four sessions of training in auditory task switching, relative to both an active and a passive control group.

Training effects on auditory task-switching abilities

First of all, the present auditory task-switching training successfully reduced the performance costs resulting from mixing two auditory tasks. That is, the increment in response times resulting from switching between two tasks (as compared to conducting a single task) was reduced in participants who were trained in auditory task switching, whereas no reductions of mixing costs were observed in the active control group that was trained on the single component tasks, or in the passive control group that did not receive any training. This observation demonstrates that the training-related decrease in mixing costs, as compared to an equally comprehensive single-task training and a passive control group, is not restricted to the switching between two visual tasks (as previously reported, e.g., Karbach & Kray, 2009; Minear & Shah, 2008), but can be generalized to an auditory task-switching situation. The observed training effect of the present study suggests that the auditory task-switching training may have improved the same set-shifting capabilities, that is, the ability to quickly and flexibly process two different auditory tasks, as a visual task-switching training.

However, the present auditory task-switching training did not have an effect on the switch costs, that is, the performance costs resulting from switching between two different task sets from one trial to the next. Specifically, the small difference in response times between switch and repeat trials of the mixed-task blocks was not reduced after four sessions of auditory task-switching training (and also not in the two control groups – but we note that there was an increase in switch costs for the active control group, see below). Interestingly, previous studies on visual task switching also sometimes did not find reduced switch costs after training (e.g., Minear & Shah, 2008; Zinke et al., 2012), whereas other studies did find relatively strong effects of training on the magnitude of the switch costs (e.g., Karbach & Kray, 2009; Strobach et al., 2012). Specifically, Minear and Shah (2008) found a general decrease of switch costs from pre-test to post-test in the two groups that were trained in task switching as well as in a control group that was trained on the single tasks (in addition, switch costs were reduced across the training blocks), whereas Karbach and Kray (2009) observed slightly stronger reductions of switch costs in the task-switching training group than in the single-task group. However, in both studies, the visual switch costs were reduced also after single-task training, whereas there was no such reduction of auditory switch costs in the present study. This discrepancy might be due to possible functional differences between visual and auditory task processing, and it could also be the case that switch costs are more resistant to training in the auditory modality (as assessed in the present study) than in the visual modality. This might be an indication that the cognitive mechanisms involved in task-switching training differ between visual and auditory input modalities with the specific costs of auditory switches (i.e., the switch costs) being less susceptible to practice than the costs of visual task switching, whereas the more general mixing costs, reflecting amodal processes of executive control, can also be reduced via training in the auditory modality.

On the other hand, it could be argued that the discrepancy in terms of an effect on the switch costs was due to procedural differences between the studies, such as the total training dosage or the spacing of training sessions. Minear and Shah (2008), for instance, trained participants for a total of 1,152 trials across only two experimental sessions, whereas Karbach and Kray (2009) as well as Zinke et al. (2012) trained participants for 1,768 trials across four experimental sessions (including 34 practice trials in each session). In the present study, however, participants were trained for a total of 2,880 trials, so it seems unlikely that the present study was underpowered in terms of the training dose. The differences in training dosage, however, might account for the inconsistent results in terms of the effects on the switch costs in the previous visual studies (note that the switch costs are typically much smaller than the mixing costs, so more statistical power and more training is required to test for a training effect on the switch costs).

Surprisingly, however, the active control group that was trained on the single-component tasks without being required to switch between the two tasks on a trial-by-trial basis showed a significant increase in switch costs from the pre-test to the post-test. This observation was not expected (note that this finding, with a Bonferroni-corrected p = .04, could still be a false positive and needs to be replicated, so the following interpretation is purely speculative). Nevertheless, it seems to suggest that the performance costs resulting from switching between two tasks grow for strongly practiced or automatized single tasks. This is consistent with studies using the global-local task, which requires participants to categorize global letters (H or S) that are composed of smaller local letters (also H or S; see Navon, 1977). These studies showed that the degree of interference produced by the irrelevant local stimulus features on global stimulus processing increases with the amount of training on the task, suggesting that participants improved their ability to categorize global stimulus features at the cost of a deteriorated ability to suppress task-irrelevant local information (Stoffer, 1993, 1994; Weissman, Gopalakrishnan, Hazlett, & Woldorff, 2005). Likewise, the auditory single-task training used in the present study might have enhanced the processing of the single-component tasks at the cost of a reduced ability to flexibly switch between the two task sets.

Cross-modal transfer of auditory task-switching training

As mentioned above, reduced auditory mixing costs suggest that the auditory task-switching training improved the efficiency of certain cognitive control processes such as the maintenance, selection, and reconfiguration of a task set (i.e., set shifting). However, it is possible that this improvement is specific to the trained input modality (e.g., enhanced shifting of auditory attention). The present study thus aimed to determine whether the training-related gain in set shifting is specific to the trained stimulus modality, or whether it is driven by improvement of more generalized executive control mechanisms operating at an amodal processing level. Hence, as a measure of cross-modal transfer, both mixing and switch costs were also determined for a visual task-switching situation that was structurally equivalent to the auditory paradigm from the training sessions (i.e., near transfer). We found that the auditory task-switching training also reduced the mixing costs for visual tasks, as compared to both the passive and the active control groups. This finding clearly suggests that the auditory task-switching training enhanced set-shifting abilities at an amodal processing level, rather than being specific to the trained stimulus modality, thus allowing for cross-modal transfer of set-shifting abilities. More precisely, the modality-independent reduction of mixing costs suggests that the training improved amodal mechanisms of sustained executive control such as the maintenance, selection, and reconfiguration of a cognitive task set in working memory. Given that the auditory task-switching training did not reduce the auditory switch costs (see above), it is not surprising that there was also no reduction of visual switch costs. 
Hence, auditory task-switching training does not seem to affect the more specific skills required for rapidly shifting the mental task set from one trial to the next, whereas it does seem to enhance the ability to flexibly process two task sets regardless of the stimulus modality (cf. Strobach et al., 2012).

The observed cross-modal transfer effect in terms of mixing costs thus indicates that the improvement resulting from auditory task-switching training is not specific to the trained stimulus modality, but generalizes to a structurally similar switching paradigm in a different stimulus modality. This finding suggests that task-switching training can be considered a process-based intervention (Karbach & Schubert, 2013; Schubert et al., 2014) eliciting generalized amodal improvements of set shifting, which reduce the costs of mixing two tasks irrespective of the exact stimulus modalities that were used during training. Thus, the current findings provide initial evidence for the assumption that the cognitive control processes involved in set shifting operate at a general amodal level and that these processes can be enhanced through training in either modality. The observation of modality-independent enhancement in set-shifting performance thus extends previous findings of near-transfer effects after visual task-switching training to untrained tasks that were still presented in the visual modality (Berryhill & Hughes, 2009; Minear & Shah, 2008; Pereg et al., 2013; Zinke et al., 2012), showing that training of set shifting in one specific (i.e., auditory) modality affected cognitive processes that are independent of the stimulus modality present during training. Moreover, the cross-modal transfer of task-switching training is comparable to the near-transfer effects that were found after working memory training. For instance, the training-related improvements in complex span tasks were found to transfer to performance in structurally similar span tasks (e.g., reading-span or rotation-span tasks; Harrison et al., 2013). While the present cross-modal transfer after task-switching training can be considered a form of near transfer as well, it further indicates that the cognitive control mechanisms involved in task switching can operate at an amodal processing level.

No evidence of far transfer

In addition to the observed near-transfer effects of auditory task switching to the visual modality, we were also interested in possible far-transfer effects to executive functions other than set shifting. Of particular interest was inhibition of inappropriate responses and task-irrelevant information, which has been discussed as a fundamental process involved in task switching (Kramer, Hahn, & Gopher, 1999; Mayr, 2001, 2003; Minear & Shah, 2008). We tested response inhibition before and after task-switching training using a Number Stroop task in which participants have to inhibit processing the meaning of presented digits in order to respond to the number of characters. In addition, inhibition of task-irrelevant stimuli was assessed for verbal and visuospatial working memory. While we observed some indication of retest effects (e.g., reduced interference by irrelevant speech on verbal memory in the post-test), we did not observe any differential effects of the auditory task-switching training on measures of response inhibition (i.e., Number Stroop effect) or the degree of interference in verbal or visuospatial working memory. These null effects suggest that the effect of auditory task-switching training is specific to set shifting, though irrespective of the stimulus modality, and that it does not seem to influence other executive functions such as inhibition of inappropriate responses or task-irrelevant information in verbal and visuospatial working memory. Finally, and in line with several previous studies (e.g., Pereg et al., 2013; Zinke et al., 2012; but in contrast to the findings reported by Karbach and Kray, 2009), we did not find any evidence for far-transfer effects on the capacity of verbal and spatial working memory (i.e., digit span and Corsi span) or fluid intelligence. 
While this suggests that the current auditory task-switching training primarily improved the set shifting processes operating at an amodal processing level, it remains unclear whether other types of task-switching training (e.g., in the visual modality; Karbach & Kray, 2009) may induce broader training-related improvements affecting more general cognitive functions.