Introduction

The collection of processes that underlie our ability to flexibly direct thoughts and actions according to our top-down goals is known as cognitive control (Miller & Cohen, 2001). This entails (a) maintaining goals and the rules linking them to appropriate actions (task-sets) in working memory, and (b) updating those rules in response to changing circumstances (i.e., task-switching; Frank et al., 2001; Monsell, 2003). Much research on cognitive control has focused on how we apply a given control operation in a context-appropriate manner (Blais et al., 2007; Verguts & Notebaert, 2008). One way in which context-appropriate control could be implemented is through mnemonic association between control processes – such as task-switching – and particular contexts or stimuli (Abrahamse et al., 2016; Braem & Egner, 2018; Egner, 2014); for instance, associating a particular intersection on your route to work with the need to avoid a pothole you encountered that morning. While this notion of “control learning” has attracted much interest in the recent literature, much remains unknown about how stimuli become associated with and later serve as retrieval cues of particular control states. That is, the encoded stimulus-control association from your morning drive could potentially be retrieved later to optimize switching from your ongoing task of maintaining the current direction of travel to a task sequence of motor actions to avoid this dangerous road obstruction. The present study investigated the fundamental question of what role attention plays in these stimulus-control binding processes.

The learning of stimulus-control associations has typically been investigated by repeatedly pairing specific contexts or stimuli and control demands. For instance, studies of task switching have established that switching between tasks from one trial to the next, as compared to repeating a task, leads to switch costs (i.e., relatively longer response times and more errors; Monsell, 2003). However, when presenting a given stimulus more often in the context of a switch trial, relative to a repeat trial, there is a reduction in task-switch costs for trials involving that stimulus (Chiu et al., 2020; Chiu & Egner, 2017; Leboe et al., 2008; Siqi-Liu & Egner, 2020). This reduction in task-switch costs suggests that specific stimuli can become associated with particular control demands and can subsequently cue the retrieval of context-appropriate control settings when they are re-encountered. Parallel findings in the domain of conflict-control support these conclusions (reviewed in Bugg & Crump, 2012; Chiu & Egner, 2019).

Importantly, more recent work has demonstrated the formation of such stimulus-control associations even for a one-shot, single-exposure pairing of a stimulus and a control state (Brosowsky & Crump, 2018; Whitehead et al., 2020). This work provides evidence for an episodic memory contribution to control learning, supporting prior studies on the contextual adjustments of cognitive control that have situated their findings within an episodic control-binding framework (Dignath et al., 2019; Jiang et al., 2015; Spapé & Hommel, 2008). The episodic control-binding hypothesis conceives of the associative process of control learning within an event-file framework (Hommel et al., 2001): here, a given event (e.g., a trial in a task-switching study) is thought to be encoded in episodic memory, and this process involves the mnemonic binding of the event’s features, including different stimulus characteristics (Treisman & Gelade, 1980), actions taken (Hommel et al., 2001), and internal states, such as ongoing cognitive control operations (Egner, 2014; Spapé & Hommel, 2008). If one of the event features recurs subsequently, the entire file gets retrieved from memory, which can serve as a shortcut to appropriate processing strategies and actions (Frings et al., 2020; Hommel et al., 2001).

In the present study, we sought to determine to what degree the process of initial encoding and subsequent retrieval of a control state within an event-file is affected by the level of attention participants are paying to their task. Research on the implementation of context-appropriate cognitive control – including that for an episodic control-binding hypothesis – has largely assumed a continuous engagement of attention when encoding and implementing cognitive control processes. However, continuous, on-task attention is not a natural state (Seli, Beaty, et al., 2018; Smith et al., 2018). A large body of research has sought to characterize and explain these periods of inattention during mind wandering (MW; see Christoff et al., 2004; Christoff et al., 2016; Christoff et al., 2018; Seli, Kane, et al., 2018; Smallwood & Schooler, 2006). While engaging in MW does not necessitate negative outcomes (Baird et al., 2011; Brosowsky et al., in press; Pereira et al., 2020), in many cases it is associated with costs, such as reduced processing of the environment, impaired behavioral performance, and difficulties with learning (Farley et al., 2013; Kam & Handy, 2013; Smallwood et al., 2007).

It is therefore plausible that fluctuations in on-task attention would alter the efficacy of one-shot stimulus-control learning, but whether this is the case, and whether on-task attention differentially influences event encoding and retrieval, is presently unknown. Some rival predictions can be derived from the prior literature depending on whether internal control states are treated as task-relevant or -irrelevant features of an event-file. On the one hand, the event-file framework posits that during the encoding of an event-file, the task-relevant stimulus features and actions are encoded together into an episodic memory (Hommel et al., 2001; see also Frings et al., 2020). Once encoded, the recurrence of an event feature is thought to automatically trigger the retrieval of the relevant (most similar) event-file (Frings et al., 2020; but see also Moeller & Frings, 2014). The automatic nature of the retrieval and implementation of information in the event-file to the current event implies that focused attention would not be required during implementation (Logan, 1988). This suggests that periods of inattention during the encoding stage of an event-file might interfere with the formation of bindings in an event-file, and thus diminish the potential effects of stimulus-control bindings on future behavior more than inattention during the event-file retrieval stage. In turn, this implies that if ongoing control processes were treated similarly to task-relevant stimulus and response features of an event, then the effect of one-shot control associations should be more impaired by inattention at encoding than by inattention at retrieval.

On the other hand, other work has provided evidence to support a contrasting prediction – namely, it has been shown that orienting attention towards response-irrelevant event features at encoding has no effect on their subsequent retrieval, but that attention to such response-irrelevant features at retrieval is necessary for them to be retrieved (and affect behavior) (Moeller & Frings, 2014; see also Hommel et al., 2014). Since ongoing control processes, like task-set reconfiguration, are not directly tied to the correct motor response on a given trial, it is possible that control states are treated similarly to response-irrelevant stimulus features. If this were the case, then one would expect that periods of inattention during encoding of stimulus-control associations would not impair formation of these bindings, nor the future use of that event-file, whereas periods of inattention during retrieval of the event-file might diminish the ability to retrieve and implement the encoded stimulus-control bindings. Here, we adjudicated between these possibilities by conducting the first study examining how the encoding and retrieval of cognitive control components of an event-file are affected by natural fluctuations in ongoing task focus.

To this end, we sought to leverage techniques used in MW research to investigate the role of task-focused attention in episodic control-binding. The most common method used to identify periods of MW (or inattention) is known as the thought-sampling method, which involves the use of intermittently presented questions that ask participants to report whether they were focused on their ongoing task (“on task”) or were inattentive to their task (“MW”) (Smallwood & Schooler, 2006). Another method for measuring MW uses trial-by-trial pupillometry indices (i.e., pupil diameter), which have been shown to correlate with thought-sampling reports (O’Neill et al., 2019; Unsworth & Robison, 2018). Pupil dilation is a longstanding assay of alertness and attention in cognitive tasks (Beatty, 1982; Kahneman, 1973), and has been linked to the locus coeruleus norepinephrine (LC-NE) system, which plays a major role in arousal (van den Brink et al., 2016; Eldar et al., 2013; Jepma & Nieuwenhuis, 2010; Varazzani et al., 2015). Specifically, when arousal is low or attention unfocussed – such as during MW – tonic LC activity is low, and baseline pupil diameter is smaller than when attention is high. We will therefore here refer to pupil size as an index of (in)attention.

In the present study, we sought to investigate the role of (in)attention, characterized as MW, on the encoding (prime) and implementation (probe) phases of stimulus-control associations in episodic event-files. To do this, we combined thought-sampling (Experiments 1 and 2) and pupillometry (Experiment 2) with an adaptation of a prime-probe design previously used to document one-shot acquisition of stimulus-action, stimulus-classification, and stimulus-control associations (Moutsopoulou et al., 2015; Pfeuffer et al., 2017; see also Whitehead et al., 2020). In the “prime” phase, stimuli (images of objects) are presented in the context of either a task repetition or a switch. In a subsequent “probe” phase, the image recurs as either a task repetition or a switch trial. The key finding is that switch costs in the probe phase are reduced for images that had been presented as a switch (compared to repeat) trial in the prime phase (Whitehead et al., 2020). Thus, if the creation of one-shot stimulus-control associations were supported by episodic event-files as task-relevant information, we would expect periods of MW during the encoding (prime phase) of stimulus-control bindings to interfere with the formation of these bindings, thus impairing their successful implementation, as reflected by a decreased magnitude of switch-cost reduction in the probe phase. By contrast, periods of MW during implementation (probe phase) should not affect the automatic implementation of previously created stimulus-control associations, and we would therefore expect to replicate the switch cost reductions seen in Whitehead et al. (2020).
Alternatively, if ongoing control processes were treated similarly to response-irrelevant stimulus information – such that inattention during encoding did not affect their consolidation – one would expect to see the opposite interaction between periods of MW at encoding and retrieval: that is, diminished effects of control learning (no reduction in switch cost) for inattention during retrieval compared to encoding.

Experiment 1

Here, we investigated how task inattention, as indexed by self-reports of MW, affects the strength of one-shot stimulus-control associations. Specifically, we examined whether MW at encoding or retrieval/implementation of stimulus-control associations would reduce the effects of a matching event on performance. To test this, in a within-participants design, we modified the task-switching experiment used in Whitehead, Pfeuffer, and Egner (2020, Experiment 1b) by inserting thought-sampling questions (see Footnote 1) during the encoding (prime phase) and implementation (probe phase) stages in some mini-blocks to determine at which stage (if any) MW affected the impact of one-shot stimulus-control associations.

Method

Participants

Based on data from Whitehead et al. (2020; Experiment 1), we used the “simr” package to simulate a set of mixed-effect models where the DV and resulting beta (β) estimates represented millisecond response times. To obtain .80 power to detect an effect as low as |β| = 80, CI [26, 133] (representing millisecond response times), a sample size of at least 60 participants was required. Thus, we recruited 64 Amazon Mechanical Turk workers to participate in the study. Data from four participants were excluded for accuracy of <70% in the task (see Whitehead et al., 2020), leading to a final sample size of N = 60 (M age 37.03 years, SD 10.74; 31 women; 50 White). Participants provided informed consent in accordance with the policies of the Duke University Institutional Review Board. To be eligible to participate, workers were required to have a US-based IP address and more than 50 approved HITs (Human Intelligence Tasks). They were informed that, to receive monetary compensation, they had to respond accurately to at least 50% of the trials.
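The logic of simulation-based power estimation can be illustrated with a deliberately simplified sketch. It is not the simr mixed-model procedure described above: the mixed model is replaced by a two-sided z-test on one effect score per participant, the code is Python rather than R, and the between-participant SD of 200 ms is a hypothetical value chosen only for illustration.

```python
import random
import statistics


def simulate_power(effect_ms, sd_ms, n_participants, n_sims=2000, seed=1):
    """Estimate power by Monte Carlo simulation.

    Simplifying assumptions (not taken from the study): each participant
    contributes one effect score drawn from Normal(effect_ms, sd_ms), and
    significance is judged with a two-sided z-test at alpha = .05.
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% cutoff
    hits = 0
    for _ in range(n_sims):
        scores = [rng.gauss(effect_ms, sd_ms) for _ in range(n_participants)]
        standard_error = statistics.stdev(scores) / n_participants ** 0.5
        if abs(statistics.mean(scores) / standard_error) > z_crit:
            hits += 1
    # Proportion of simulated experiments that reached significance.
    return hits / n_sims


# Hypothetical inputs: an 80-ms effect, an SD of 200 ms, 60 participants.
power = simulate_power(effect_ms=80, sd_ms=200, n_participants=60)
print(f"Estimated power: {power:.2f}")
```

Repeating the call over a range of sample sizes gives the smallest N at which the estimated power reaches the target value.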

Stimuli and procedure

The design of the present study combined a quasi-experimental method (i.e., measuring mind wandering) with experimental manipulations of different factors (i.e., switch vs. repeat). The primary task was a basic cued task-switching protocol wherein participants classified items (images of objects) according to size (small vs. large), or as mechanical (i.e., wheels, hinge, or other moving part) versus non-mechanical. Items were randomly selected from a set of 512 images (Moutsopoulou et al., 2015; Pfeuffer et al., 2017).

Items were presented in the center of the screen for 2,000 ms and were accompanied by concurrent letter cues on both sides of the image. These letter cues (a) indicated which task participants should complete (i.e., the size-judgment task or the mechanical/non-mechanical task) and (b) conveyed the correct response mapping (see Fig. 1). For the size task, the letters ‘S’ (small) and ‘L’ (large) appeared on either side of the item, whereas for the non-mechanical/mechanical task, the letters ‘N’ (non-mechanical) and ‘M’ (mechanical) appeared on either side of the item. The side of the item on which the letter appeared indicated the corresponding response button: either the ‘1’ (left) or ‘0’ (right) key on a standard keyboard. By instructing response mappings along with task instructions on each trial (rather than using fixed response mappings), this design allows one to fully dissociate classification-rule and response-mapping factors. A response to the stimuli would result in the immediate removal of stimuli from the screen and the presentation of feedback (“correct,” “incorrect,” or “too slow”) for the first 500 ms of a 1,000-ms inter-trial interval.

Fig. 1 The paradigm for Experiments 1 and 2, illustrating the two phases – prime and probe – of an example mini-block. Each image is presented in the center of the screen with letters on either side indicating the classification task and response mapping. “Prime Task Sequence” represents whether the prime task sequence (trial N-1 to trial N) applied to a specific stimulus in the prime stage was a “task-switch” or “task-repeat” trial. “Probe Task Sequence” indicates the task sequence type (task repeat vs. switch from trial N-1 to N) for that stimulus in the probe stage. The first stimulus in each mini-block did not have a Prime Task Sequence (represented by an X) as there was no trial N-1 for this stimulus. The periodic thought-sampling questions were placed either between the prime and probe sequence or post the probe sequence, as indicated by the arrows.

The experiment was divided into mini-blocks. Each mini-block contained eight trials, broken down into a four-trial prime phase and a subsequent four-trial probe phase. Each mini-block used a unique set of four items that each occurred once as a prime and once as a probe. Participants were not informed about this structure, and there were no breaks between prime and probe phases or between mini-blocks (i.e., participants were presented a steady sequence of trials without interruption). Participants completed 800 trials, seeing each item once as a prime (or encoding trial) and once as a probe (or an implementation trial). Each item was shown a maximum of twice (once as a prime and once as a probe within a mini-block), and no items ever recurred in other mini-blocks. The distance between an encoding trial and its reappearance as an implementation trial varied randomly between two and seven trials. Crucially, whereas we kept the classification task and response mapping constant between the encoding trial and implementation trials for each analyzed item (thus controlling for their respective effects), we selectively manipulated whether cognitive control requirements matched (or mismatched) between encoding and implementation. Specifically, whether a given item occurred on a task-repetition trial (same classification task as trial n-1) or on a task-switch trial (different classification task from trial n-1) could vary from prime to probe (Fig. 1). The first trial of every mini-block was not analyzed, as it could be neither a switch nor a repeat trial. However, we used first-trial items as null trials in the probe sequence (manipulating the classification task, action sequence, and task sequence) to create a less predictable presentation of the trials of interest (i.e., to prevent the order of image presentation from being repeated between encoding and implementation sequences).
Half of the prime trials matched their respective probe trials in terms of control demands (i.e., both were task repeat/switch trials), and half of them were mismatched (i.e., the encoding trial was a task repetition but the implementation trial was a task switch or vice versa).
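The coding of trials as task repetitions or task switches, and of prime-probe pairs as matched or mismatched in control demand, can be sketched as follows. This is a minimal illustration in Python, not the experiment code; the function names and task labels are hypothetical.

```python
def label_trial_types(tasks):
    """Label each trial as 'repeat' or 'switch' relative to trial n-1.

    The first trial in the list has no predecessor, so it is labeled None
    (in the study, such trials were excluded from analysis).
    """
    if not tasks:
        return []
    labels = [None]
    for previous, current in zip(tasks, tasks[1:]):
        labels.append("repeat" if current == previous else "switch")
    return labels


def control_demand_matches(prime_type, probe_type):
    """Return True when an item's prime and probe trials share a trial type."""
    return prime_type == probe_type


# Hypothetical four-trial prime phase of one mini-block.
prime_tasks = ["size", "size", "mechanical", "mechanical"]
prime_types = label_trial_types(prime_tasks)
print(prime_types)
```

An item encoded on a switch trial and probed on a switch trial counts as matched, whereas an item encoded on a repeat trial and probed on a switch trial counts as mismatched.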

The task also incorporated 24 thought-sampling questions that assessed MW by asking participants to report on the content of their thoughts just prior to the presentation of each question. Twelve thought-sampling questions were presented immediately following the encoding sequence of a mini-block, and 12 were presented immediately following the implementation sequence of a (different) mini-block. These questions were randomly presented approximately every 60 ± 20 s. Upon presentation of each question, participants were instructed to press a button to indicate whether they were “on task,” “off task – trying,” or “off task – not trying” (Fig. 1; O’Neill et al., 2020). These conditions were defined to participants as follows:

  1. On task: Being focused on the task means that, just before the thought-sampling screen appeared, you were focused on some aspect of the task at hand. For example, if you were thinking about your performance on the task, or if you were thinking about when you should make a button press, these thoughts would count as being on task.

  2. Off task – trying (unintentional MW): Experiencing task-unrelated thoughts means that you were thinking about something completely unrelated to the task. Some examples of task-unrelated thoughts include thoughts about what to eat for dinner… Any thoughts that you have that are not related to the task you are completing count as task unrelated… task-unrelated thoughts can occur in cases where you are trying to focus on the task, but your thoughts unintentionally drift to task-unrelated topics…

  3. Off task – not trying (intentional MW): …they [task-unrelated thoughts] can occur in cases where you are not trying to focus on the task, and you begin to think about task-unrelated topics.

All code and data can be found at https://osf.io/kazrb/. These experiments were not preregistered.

Analysis

Thought-sampling questions. As each thought-sampling question required a response to continue, no thought-sampling questions were excluded from analysis nor were any other data-cleaning procedures performed for those questions. Differences between response rates for each category of MW – “on task,” “off task – trying,” “off task – not trying” – were assessed with a simple regression model. Here, we analyzed the three discrete levels of the MW factor, which we conceptualized as describing a continuous severity of mind wandering rather than functionally distinct processes, using a categorical regression.

Cued task-switching. Our analysis focused on response times (RTs) in the probe trials (i.e., implementation of control); we analyzed only implementation trials for which both the encoding and the implementation responses were correct. RTs faster than 200 ms were trimmed as outliers, and the implementation trial for the item that had been presented as the first encoding trial of each mini-block was removed. Further, the trial immediately following a thought-sampling question was removed (as the thought-sampling question interrupted the switch/repeat sequence of the task-switching task).
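The exclusion rules for probe-trial RTs can be summarized in a short sketch. The record layout and field names below are hypothetical and serve only to make each rule explicit; the actual cleaning scripts are those shared on OSF.

```python
from dataclasses import dataclass


@dataclass
class ProbeTrial:
    rt_ms: float
    prime_correct: bool          # response on the item's encoding trial
    probe_correct: bool          # response on the implementation trial
    first_prime_item: bool       # item was the first encoding trial of its mini-block
    follows_thought_probe: bool  # trial came right after a thought-sampling question


def keep_for_rt_analysis(trial, min_rt_ms=200):
    """Apply the exclusion rules described in the text to one probe trial."""
    return (
        trial.prime_correct
        and trial.probe_correct
        and trial.rt_ms >= min_rt_ms          # trim RTs faster than 200 ms
        and not trial.first_prime_item        # first-trial items served as null trials
        and not trial.follows_thought_probe   # task sequence was interrupted
    )


# Hypothetical trials: only the first one survives cleaning.
trials = [
    ProbeTrial(850.0, True, True, False, False),
    ProbeTrial(150.0, True, True, False, False),
    ProbeTrial(900.0, False, True, False, False),
    ProbeTrial(900.0, True, True, True, False),
    ProbeTrial(900.0, True, True, False, True),
]
cleaned = [t for t in trials if keep_for_rt_analysis(t)]
```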

Due to lack of predictions for “trying” versus “not trying” off-task responses (and given that there was a low trial count in the “off task – trying” and “off task – not trying” conditions when separated), we recoded responses to the thought-sampling questions as either (a) “on task” or (b) “MW” (i.e., the sum of off task – trying and off task – not trying) when relating them to performance on task-switching trials. The four closest implementation trials to each thought-sampling question (that occurred either post-encoding or post-implementation) were considered to reflect “on-task” or “off-task” performance at the encoding or implementation stage as indicated by the participant’s response to the thought-sampling question. Thus, an implementation trial could be labeled as “on task at encoding,” “on task at implementation,” “off task at encoding,” or “off task at implementation” to reflect whether MW occurred at the encoding or implementation stage for each image (below, we refer to these as “MW-known” trials). These MW-known trials were analyzed separately from trials in which no MW information was known.
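The recoding of thought-sampling responses and the resulting trial labels can be sketched as follows. The response strings and function names are assumptions made for illustration; only the collapsing of the two off-task categories and the four resulting labels come from the description above.

```python
def recode_report(report):
    """Collapse the three response options into 'on task' versus 'MW'."""
    mapping = {
        "on task": "on task",
        "off task – trying": "MW",
        "off task – not trying": "MW",
    }
    return mapping[report]


def mw_known_label(report, question_position):
    """Build the label applied to the implementation trials nearest a question.

    question_position is 'encoding' for questions shown after the prime
    sequence and 'implementation' for questions shown after the probe sequence.
    """
    if question_position not in ("encoding", "implementation"):
        raise ValueError("question_position must be 'encoding' or 'implementation'")
    state = "on task" if recode_report(report) == "on task" else "off task"
    return f"{state} at {question_position}"


# Hypothetical response to a question presented after a prime sequence.
label = mw_known_label("off task – trying", "encoding")
print(label)
```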

Our previous research (Whitehead et al., 2020) demonstrated a Current by Previous Trial type interaction: the Current Trial Type (probe) switch cost was smaller when the Previous Trial Type (prime) was a switch trial versus repeat trial for that image. The implementation trial RTs for trials in which MW information was known were submitted to a set of hierarchical mixed models with a nested random effects structure to determine whether MW during the encoding phase affected the Current by Previous Trial Type interaction more than MW during the implementation phase, thus demonstrating the effect of MW on stimulus-control associations.

The data were fit to the following four mixed effects models using the lme4 and lmerTest packages in R. The use of mixed models, over other common analysis procedures such as ANOVAs, allowed us to better model trial-by-trial variance that would otherwise be incorporated as unaccounted for error in averaged (over trials) RTs used in ANOVA analyses. Each model had an identical nested random effects structure (see Online Supplementary Material for the specific equations).

The hierarchical structure for this set of models can be summarized as Model 1: Null (random effects only), Model 2: Current trial type, Model 3: Current × Previous trial type, Model 4: Current × Previous trial type × MW. Trials in which MW information was not known were analyzed using a separate set of hierarchical mixed models, with a nested random effects structure. The fixed effects structure fit to data in which MW information was not known was identical to those of Models 1, 2, and 3. The fit of mixed models was determined using the anova() command in R to conduct a chi-squared test of each model against its hierarchically subordinate model (e.g., null vs. 1-factor model).
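The chi-squared comparison of nested models can be illustrated with a likelihood-ratio test, which is our reading of the comparison described above. The sketch below is in Python with the standard library only (the analyses themselves were run in R), and the log-likelihoods and parameter counts in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: the tail is a finite Poisson-type sum.
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail and add the recurrence terms.
    tail = math.erfc(math.sqrt(y))
    half_steps = (df - 1) // 2
    tail += math.exp(-y) * sum(
        y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, half_steps + 1)
    )
    return tail


def likelihood_ratio_test(loglik_reduced, loglik_full, n_params_reduced, n_params_full):
    """Compare a reduced model with the fuller model that nests it."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    df = n_params_full - n_params_reduced
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical values for a null model versus a one-factor model.
statistic, df, p_value = likelihood_ratio_test(-5010.0, -5004.0, 4, 5)
print(f"chi2({df}) = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the added fixed effect improves fit beyond what the subordinate model achieves.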

Results

Thought-sampling questions

We found that the rates of “off task – trying” and “off task – not trying” responses were generally lower than rates of being “on task” (Table 1).

Table 1 Results of regression model for determining differences in rates of mind wandering

Cued task-switching: MW known

After data were fit to each model, the model fit test indicated that the full-effects model, in which the Current trial type, Previous trial type, and MW information were included as main effects and a full set of factorial interactions (Table 2), was the best fitting model.

Table 2 Results of model comparison for hierarchical models of task-switching when mind-wandering (MW) information was known

The summary output of this model indicated that there was a main effect of the Current trial type (i.e., a switch cost; β = -86.41, p < .001; Table 3). Further, there was a Current by Previous trial type interaction, replicating our previous findings (Whitehead et al., 2020; β = 76.74, p = .002; Table 3). Critically, there was also a three-way interaction between MW at encoding, Current trial type, and Previous trial type (β = -111.95, p = .024; Table 3); there was less reduction of the switch cost for previous switch trials versus previous repeat trials when participants reported being off task during the encoding stage for that implementation trial (-28 ms; Fig. 2) compared to when participants reported being on task during encoding (64 ms), on task during implementation (41 ms), and off task during implementation (33 ms; Fig. 2). In other words, one-shot control learning was abolished when participants did not attend to the task during encoding but was intact regardless of whether or not they were on-task during retrieval/implementation.
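The switch-cost reduction reported here can be made concrete with a small worked example. The sketch assumes that the descriptive values are differences between condition means, and the cell means below are hypothetical numbers, not data from the study.

```python
def switch_cost(mean_rt_switch, mean_rt_repeat):
    """Switch cost: how much slower switch trials are than repeat trials."""
    return mean_rt_switch - mean_rt_repeat


def switch_cost_reduction(cell_means):
    """Reduction in probe switch cost for items primed on switch trials.

    cell_means maps (previous_trial_type, current_trial_type) to a mean RT
    in ms. Positive values indicate one-shot control learning.
    """
    cost_after_repeat = switch_cost(
        cell_means[("repeat", "switch")], cell_means[("repeat", "repeat")]
    )
    cost_after_switch = switch_cost(
        cell_means[("switch", "switch")], cell_means[("switch", "repeat")]
    )
    return cost_after_repeat - cost_after_switch


# Hypothetical cell means (ms): the switch cost is 100 ms after a repeat
# prime and 60 ms after a switch prime, a reduction of 40 ms.
hypothetical_means = {
    ("repeat", "repeat"): 900.0,
    ("repeat", "switch"): 1000.0,
    ("switch", "repeat"): 920.0,
    ("switch", "switch"): 980.0,
}
reduction = switch_cost_reduction(hypothetical_means)
```

Under this definition, a negative value such as the -28 ms observed for off-task encoding means the switch cost was, if anything, larger for items primed on switch trials.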

Table 3 Summary results of the Current Trial Type × Previous Trial Type × mind-wandering (MW) model
Fig. 2 Descriptive results for task-switching trials where mind-wandering information is known. Error bars are pseudo-95% confidence intervals

Cued task-switching: MW not known

After data were fit to each model, the model fit test indicated that the full-effects model, in which the Current trial type and Previous trial type were included as main effects and an interaction (Tables 4 and 5; Fig. 3), was the best-fitting model. These results replicated previous work (Whitehead et al., 2020), showing evidence for one-shot stimulus-control associations, reflected in a 37-ms switch cost reduction for items from previous switch trials versus previous repeat trials (Fig. 3).

Table 4 Results of model comparison for hierarchical models of task-switching when mind wandering information was not known
Table 5 Summary results of the Current × Previous model
Fig. 3 Results of the task-switching task when mind wandering information was not known. Error bars are pseudo-95% confidence intervals

Discussion

We replicated previous findings of one-shot learning of stimulus-control associations from Whitehead et al. (2020), showing a reduced switch cost for probes that were switch trials during prime presentation (Fig. 3; Table 5). Critical to the current study, we also demonstrated that inattention (MW) during the prime encoding stage of a stimulus-control binding negatively impacts the later implementation of that binding, whereas inattention during the probe implementation stage does not (Fig. 2; Tables 2 and 3). These results support the hypothesis that cognitive control states are integrated into episodic event-files in a similar way to task-relevant stimulus and response features – thus requiring attention at encoding but not retrieval (Frings et al., 2020; Laub et al., 2018). Further, they speak against the possibility that control states are treated like response-irrelevant event features, which would have resulted in attention-dependence at retrieval rather than encoding (Moeller & Frings, 2014).

Experiment 2

Based on thought-sampling data, Experiment 1 suggests that attention to the task during encoding, rather than during retrieval, of an event-file determines whether a control state employed during encoding is later successfully applied during retrieval of that file. However, it could be argued that the use of thought-sampling questions in Experiment 1 is associated with two shortcomings. First, to promote a reasonable rate of attention drifting away from the task, we presented thought-sampling questions infrequently (Seli et al., 2013), which in turn resulted in relatively low trial counts for trials with known MW status. Second, thought-sampling questions tap subjective self-report, which may be inaccurate and/or biased to suit demand characteristics (e.g., participants may prefer to be perceived as being on task).

To overcome both of these limitations, in Experiment 2 we sought to replicate and extend the findings of Experiment 1 with a continuous and objective measure of task focus; specifically, in Experiment 2 we used pupillometry as a trial-by-trial measure of attention. Periods of off-task MW have been associated with smaller tonic pupil diameters (i.e., baseline pupil size; Unsworth & Robison, 2018), and a burgeoning literature has established that when attention (and hence, performance) during cognitive tasks is low, baseline pupil diameter is smaller than when attention is high (Eldar et al., 2013; Jepma & Nieuwenhuis, 2010; Unsworth & Robison, 2018; van den Brink et al., 2016; Varazzani et al., 2015). Thus, here, we adapted the Experiment 1 task design to an in-person study wherein we measured trial-by-trial pupil size in order to obtain a continuous measure of on-task focus/attention.

Methods

Participants

We aimed to collect N = 60 in order to have .80 power to detect an effect as low as |β| = 20, CI [6, 34] (representing millisecond RTs), and to replicate the sample size from our initial experiment. However, due to COVID-19-related interruptions, only 55 participants provided informed consent in accordance with the policies of the Duke University Institutional Review Board (M age 18.96 years, SD 1.99; 35 women; 29 White). Data from three participants were excluded: one for accuracy <70% in the task (see Whitehead et al., 2020), and two due to incomplete experimental sessions, leaving a final sample size of N = 52.

Stimuli and procedure

The stimuli and procedure were identical to those from Experiment 1, except for the following changes, which were implemented to accommodate eye tracking: Unlike Experiment 1, wherein participants were presented with 24 thought-sampling questions, here, they were presented with 20 (see Footnote 2), half of which appeared post-encoding and the other half of which appeared post-implementation. We implemented this change in Experiment 2 because, in this experiment, MW reports were not intended to be analyzed in interaction with the Previous × Current Trial Type factors but served only as a measurement check for the continuous pupil dilation measure.

Participants were presented with the task-switching stimuli for 2,000 ms. The inter-trial interval lasted for a random time interval between 2,000 and 3,000 ms, during which a central fixation cross was present in the middle of the screen; participants were instructed to remain fixated on this cross. This timing change was to accommodate the recording of pupillometry measures, which require a longer ITI between stimuli in order for pupil diameter to return to baseline dilation.

Each participant received a series of mini-blocks that were eight trials long, grouped into five larger blocks of 80 trials in order to accommodate regular re-calibration of the eye tracker. Prior to the beginning of each large block, participants performed a nine-point calibration and validation sequence in order to ensure high data quality; a block was only started once tracking error was under 0.5°. Throughout the entire experiment, participants rested their chin on a table-mounted head rest. Pupil diameter (not area) data were recorded from the participant’s left eye with an Eyelink© 1000+ eye tracker placed directly underneath a presentation monitor, 90 cm away from participant’s eyes, in a dimly lit room. All data were collected continuously throughout each block at 500 Hz.

Pupillometry data processing

Offline, data were converted from the proprietary EDF format to a .mat file format using custom scripts from SR Research©. For time periods in which Eyelink© identified blinks, as well as time periods in which pupil diameter deviated from the mean pupil size by more than 2.5 standard deviations (e.g., partial blinks, eyes off screen; see Kret & Sjak-Shie, 2019; Mathôt et al., 2018), the data were linearly interpolated. After interpolation, a 6-Hz Butterworth filter was applied to the data. For each trial, the average pupil diameter in the 500-ms pre-stimulus baseline period was calculated. Prior to analysis, all baseline pupil-diameter measures were scaled within each participant.
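
As an illustration only, the interpolation, baseline, and scaling steps described above might be sketched as follows. This is not the authors' pipeline: the function names are hypothetical, the mean and standard deviation are assumed to be computed over non-blink samples, edge gaps are assumed to be filled with the nearest valid sample, "scaled" is read here as z-scoring within participant, and the 6-Hz Butterworth filter is omitted because it would require a signal-processing library.

```python
from statistics import mean, pstdev


def interpolate_invalid(samples, blink_mask, sd_cutoff=2.5):
    """Linearly interpolate over blinks and extreme pupil samples.

    Assumption: the mean and SD are taken over non-blink samples only.
    """
    kept = [s for s, is_blink in zip(samples, blink_mask) if not is_blink]
    mu, sd = mean(kept), pstdev(kept)
    bad = [is_blink or abs(s - mu) > sd_cutoff * sd
           for s, is_blink in zip(samples, blink_mask)]
    good_idx = [i for i, is_bad in enumerate(bad) if not is_bad]
    if not good_idx:
        raise ValueError("no valid samples to interpolate from")

    out = [float(s) for s in samples]
    # Fill each gap between two neighbouring valid samples with a straight line.
    for i, j in zip(good_idx, good_idx[1:]):
        for k in range(i + 1, j):
            frac = (k - i) / (j - i)
            out[k] = samples[i] + frac * (samples[j] - samples[i])
    # Assumption: gaps at the edges are held at the nearest valid sample.
    for k in range(good_idx[0]):
        out[k] = float(samples[good_idx[0]])
    for k in range(good_idx[-1] + 1, len(samples)):
        out[k] = float(samples[good_idx[-1]])
    return out


def baseline_diameter(trace, onset_idx, sampling_hz=500, window_ms=500):
    """Mean pupil diameter in the pre-stimulus window (250 samples at 500 Hz)."""
    n_samples = int(sampling_hz * window_ms / 1000)
    start = onset_idx - n_samples
    if start < 0:
        raise ValueError("not enough samples before stimulus onset")
    return mean(trace[start:onset_idx])


def zscore(values):
    """Standardize one participant's baseline values (assumed meaning of 'scaled')."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]
```

In this sketch, `interpolate_invalid` would be applied to each continuous recording, `baseline_diameter` to each trial's stimulus-onset index, and `zscore` to the resulting list of baselines for one participant.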

Behavioral data processing

The same cleaning steps were taken on behavioral data as in Experiment 1.

Analysis

The use of mixed models allowed us to model the continuous trial-by-trial measure of pupil diameter as it relates to other discrete factors of interest in explaining RT variance. This would not be possible with more common ANOVA designs, which cannot model a continuous, trial-by-trial predictor.

MW and baseline pupil diameter. To validate the use of pupil diameter as an objective proxy for MW, we modeled the baseline pupil diameter preceding each thought-sampling question as a function of the response to that question in a mixed model. This model is shown in the Online Supplemental Material.

Cued task-switching and baseline pupil diameter. RT data were fit to a set of hierarchical mixed models to determine whether the previous baseline pupil diameter (i.e., attention at encoding of a stimulus-control association) or the current baseline pupil diameter (i.e., attention when implementing a stimulus-control association) significantly interacted with the Current by Previous trial type interaction that is indicative of a stimulus-control association.

Five mixed-effects models were fit to the data using the lme4 and lmerTest packages in R. The null model included only the random-effects structure. All models shared an identical nested random-effects structure, which can be found in the Online Supplemental Material.

The hierarchical structure for this set of models can be summarized as Model 1: Null (random effects only); Model 2: Current trial type; Model 3: Current × Previous trial type; Model 4: Current × Previous trial type × Previous Baseline Pupil Diameter; Model 5: Current × Previous trial type × Previous × Current Baseline Pupil Diameter.
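
To make the nesting of these five models concrete, the sketch below writes out lme4-style formula strings for each model and checks that every model's fixed-effect terms are contained in those of the next. It is illustrative only: the column names (`rt`, `current`, `previous`, `prev_pupil`, `curr_pupil`) are hypothetical, each "×" is assumed to denote a full-factorial product, and `(1 | subject)` is a placeholder rather than the nested random-effects structure reported in the Online Supplemental Material.

```python
from itertools import combinations

# Hypothetical column names (not taken from the paper).
RESPONSE = "rt"
# Placeholder only: the actual nested random-effects structure is reported
# in the Online Supplemental Material and is not reproduced here.
RANDOM_EFFECTS = "(1 | subject)"

# Fixed-effect factors entered by each model, Model 1 (null) to Model 5.
MODEL_FACTORS = [
    [],
    ["current"],
    ["current", "previous"],
    ["current", "previous", "prev_pupil"],
    ["current", "previous", "prev_pupil", "curr_pupil"],
]


def factorial_terms(factors):
    """Expand a full-factorial product into main effects and interactions."""
    return [":".join(combo)
            for size in range(1, len(factors) + 1)
            for combo in combinations(factors, size)]


def lme4_formula(factors, response=RESPONSE, random_effects=RANDOM_EFFECTS):
    """Build an lme4-style formula string for one model in the hierarchy."""
    fixed = " * ".join(factors) if factors else "1"
    return f"{response} ~ {fixed} + {random_effects}"


def is_nested(smaller, larger):
    """True if every fixed-effect term of `smaller` also appears in `larger`."""
    return set(factorial_terms(smaller)) <= set(factorial_terms(larger))


for number, factors in enumerate(MODEL_FACTORS, start=1):
    print(f"Model {number}: {lme4_formula(factors)}")
```

Under these assumptions, Model 5 expands to 15 fixed-effect terms (four main effects, six two-way, four three-way, and one four-way interaction), and each successive model adds terms to the previous one without removing any, which is what permits a stepwise comparison of model fit.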

Results

Thought-sampling questions

Similar to Experiment 1, we found that the rates of “off task – trying” and “off task – not trying” responses were lower than response rates for being “on task” (Table 6).

Table 6 Results of regression model for determining differences in rates of mind wandering

MW and baseline pupil diameter

We observed a significantly smaller baseline pupil diameter prior to participants reporting being “off task – not trying” versus being “on task” (Table 7; Fig. 4), thus corroborating the expected relationship between MW and pupil dilation.

Table 7 Results of the analysis for mind wandering by baseline pupil size
Fig. 4 Standardized baseline pupil size (500 ms pre-stimulus presentation) prior to the thought-sampling question as a function of mind wandering report. Error bars are standard error

Cued task switching and baseline pupillometry

The results of our pupillometry analysis revealed that the model including all four factors, with a full set of factorial interactions, was the best-fitting model (p = .017; Table 8). Critical to our interpretation, the summary of that model showed a significant three-way interaction between Previous Baseline Pupil Diameter, Current Trial Type, and Previous Trial Type (β = -18.75, p = .023; Table 9). By contrast, the three-way interaction between Current Baseline Pupil Diameter, Current Trial Type, and Previous Trial Type was non-significant (β = -2.00, p = .768; Table 9). We observed a 6-ms switch-cost reduction under low attention conditions versus a 34-ms switch-cost reduction under high attention conditions at encoding (Fig. 5). Conversely, we observed a 19-ms switch-cost reduction under low attention conditions versus a 22-ms switch-cost reduction under high attention conditions at implementation (Fig. 6). Finally, we also replicated the critical Current by Previous Trial Type interaction that is indicative of a one-shot stimulus-control association (β = -21.13, p = .008; Table 9), a main effect of Current Trial Type (β = 26.25, p < .001), and a main effect of Previous Trial Type (β = 26.14, p < .001).

Table 8 Results of model comparison for hierarchical models of task-switching and baseline pupil diameter
Table 9 Summary results of the Current × Previous × Previous Baseline Pupil × Current Baseline Pupil model
Fig. 5 Median split of task-switching data by pupil diameter at the prime phase. Error bars are within-subjects 95% confidence intervals

Fig. 6 Median split of task-switching data by pupil diameter at the probe phase. Error bars are within-subjects 95% confidence intervals
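
For readers who want the arithmetic behind the switch-cost reductions and the median splits shown in Figs. 5 and 6, a minimal sketch follows. It reads the switch-cost reduction as the switch cost for previously repeated items minus the switch cost for previously switched items, and it splits trials at the median baseline pupil value across the supplied trials; both are assumptions about the descriptive figures, not a description of the mixed-model analysis. The dictionary keys and the RT values are invented for illustration and are not data from the experiment.

```python
from statistics import mean, median


def switch_cost(trials):
    """Mean RT on current switch trials minus mean RT on current repeat trials."""
    switch = [t["rt"] for t in trials if t["current"] == "switch"]
    repeat = [t["rt"] for t in trials if t["current"] == "repeat"]
    return mean(switch) - mean(repeat)


def switch_cost_reduction(trials):
    """Switch cost after a prime repeat minus switch cost after a prime switch."""
    prev_repeat = [t for t in trials if t["previous"] == "repeat"]
    prev_switch = [t for t in trials if t["previous"] == "switch"]
    return switch_cost(prev_repeat) - switch_cost(prev_switch)


def reduction_by_median_split(trials, pupil_key):
    """Switch-cost reduction for trials at/below vs. above the median pupil value."""
    cutoff = median(t[pupil_key] for t in trials)
    low = [t for t in trials if t[pupil_key] <= cutoff]
    high = [t for t in trials if t[pupil_key] > cutoff]
    return {"low": switch_cost_reduction(low), "high": switch_cost_reduction(high)}


# Invented toy trials (NOT data from the study): one trial per design cell.
TOY_TRIALS = [
    {"prime_pupil": -1.0, "previous": "repeat", "current": "switch", "rt": 700},
    {"prime_pupil": -1.0, "previous": "repeat", "current": "repeat", "rt": 650},
    {"prime_pupil": -1.0, "previous": "switch", "current": "switch", "rt": 695},
    {"prime_pupil": -1.0, "previous": "switch", "current": "repeat", "rt": 650},
    {"prime_pupil": 1.0, "previous": "repeat", "current": "switch", "rt": 700},
    {"prime_pupil": 1.0, "previous": "repeat", "current": "repeat", "rt": 650},
    {"prime_pupil": 1.0, "previous": "switch", "current": "switch", "rt": 680},
    {"prime_pupil": 1.0, "previous": "switch", "current": "repeat", "rt": 660},
]

toy_result = reduction_by_median_split(TOY_TRIALS, "prime_pupil")
print(toy_result)
```

Passing a probe-phase pupil key instead of the prime-phase key would give the corresponding split for the implementation stage.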

Discussion

In Experiment 2, we extended the results of Experiment 1, showing that attention – as measured via pupil diameter – during the prime encoding stage is significantly related to the formation of one-shot stimulus-control associations, measured via the switch-cost reduction during the probe stage, while attention at the probe implementation stage is not related to successful implementation of one-shot stimulus-control bindings. Further, these results also demonstrate that the match or mismatch of cognitive control settings between the prime and the probe affects performance. Namely, the increase in RTs for probe Repeat trials that were previously Switch trials in the prime stage could stem from the retrieval of a switch setting in response to the probe image – a high readiness to switch tasks – that could interfere with repeating the task from the previous trial. Together, the results of Experiment 2 again suggest that cognitive control states are integrated into event-files in the same way as task-relevant stimulus and response features – the integration of these control states depends on attention during encoding but not at retrieval – rather than being treated as response-irrelevant event features.

General discussion

Across two experiments, we demonstrated that MW/inattention differentially affected the encoding of a stimulus-control binding at the prime stage versus the implementation of these bindings at the probe stage. For trials in which a participant was inattentive at the prime encoding phase, we did not observe the probe-trial switch-cost reductions indicative of stimulus-control bindings in episodic event-files (Figs. 2 and 5). Conversely, low attention or MW during the implementation (probe) phase did not affect the deployment of these stimulus-control bindings (Figs. 2 and 6). This serves as strong evidence for a variant of the episodic control-binding hypothesis in which the internal cognitive-state component of these event-files is treated similarly to task-relevant stimulus and response features.

In Experiment 1, using thought-sampling questions, we showed that MW during the prime (encoding) phase of stimulus-control bindings significantly affected the switch-cost reduction seen for previous switch versus previous repeat trials in the probe phase (Fig. 2). Conversely, reports of MW during the implementation of stimulus-control bindings in the probe phase did not adversely affect their efficacy: the switch-cost reduction for previous switch versus repeat trials remained intact. In Experiment 2, we extended this finding using trial-by-trial baseline pupil diameter to determine the attention level prior to each trial in both the prime and probe phases. Here we found a similar pattern of results: low attention during the prime – when encoding the stimulus-control binding – impaired subsequent implementation of said binding in the probe phase (Fig. 5), but low attention during the probe phase did not impair successful implementation of the stimulus-control binding (Fig. 6). These results are in line with the prediction of an episodic control-binding hypothesis that, once an event-file is successfully encoded, the retrieval and implementation of the information contained in that episodic memory trace is automatic and therefore largely unaffected by the transient arousal or attention state. Further, they also provide support for a key, but under-investigated, theoretical point of the binding and retrieval in action control (BRAC) framework: that distinct processes underlie the encoding and integration versus retrieval of event-files (Frings et al., 2020). The current results demonstrate that the role of attention in the encoding of event-files is markedly different from its role during the retrieval of those same episodic traces.

This work serves to further confirm the utility of both event-file theory and the episodic control-binding hypothesis in accounting for contextually appropriate implementation of cognitive control. The theory of event-coding, and recent updates on BRAC, posit that stimulus features, responses, and outcomes are encoded into event-files via a common representational format (Frings et al., 2020; Hommel et al., 2001). This common encoding format allows for the automatic retrieval of an event-file through the recurrence of any feature. This framework explains the basic switch cost as a result of the hierarchical retrieval of the previous task-set – either incompatible (switch trial) or compatible (repeat trial) with the current trial – due to the repeated context throughout the task (i.e., the repeated classification of stimuli as either small/large or mechanical/non-mechanical; Frings et al., 2020; see also Altmann, 2011; Koch et al., 2018; Moeller & Frings, 2017). Thus, in a switch trial, this process creates task-based and response-based incompatibility between the current and previous classification tasks and responses (Hommel et al., 2001; Moeller & Frings, 2017).

The episodic control-binding hypothesis extends this framework by proposing the integration of task-relevant internal control states (which are required to overcome the incompatibility of switch trials) into the event-file architecture. That is, the compatibility (task repetition vs. switching) and the resulting control process (or its specific setting, e.g., a readiness to update a task set) are also hierarchically integrated in an event-file, forming a retrievable stimulus-control binding in episodic memory (Whitehead et al., 2020; see also Brosowsky & Crump, 2018; Chiu & Egner, 2017). Within this framework, hierarchically superordinate features of the event-file – the classification task and the prior internal state – would be retrieved prior to the subordinate features (response or outcome), allowing for control adjustment to proactively limit response conflict and reactively limit conceptual, categorization-based conflict. The goal of the current experiments was to demonstrate that the encoding and retrieval of stimulus-control associations, based on one-shot episodic memory formation, adhere to predictions made by the underlying event-file theory, and that these associations are treated similarly to other task-relevant (rather than response-irrelevant) event features. Accordingly, these task-relevant stimulus-control bindings would require active on-task attention to be encoded in event-files in episodic memory, but their retrieval would be automatic, as indicated by the present results.

One might wonder why task-evoked pupillary responses were not also used to assess attention during stimulus presentation in the task-switching task. First, previous work has found that baseline pupil diameter is associated with MW (Unsworth & Robison, 2018), as well as with arousal or attention levels (Eldar et al., 2013; Jepma & Nieuwenhuis, 2010; Unsworth & Robison, 2018; van den Brink et al., 2016; Varazzani et al., 2015). Second, although we could also have used pupillary responses during the presentation of stimuli, this could have resulted in a methodological confound: the real-world images used in the task-switching studies introduce a wide range of luminosity and other visual factors into the dataset. This uncontrolled variance in the stimulus set is an issue for measuring pupil diameter, which is highly sensitive to the physical properties of the presented stimuli. To sidestep this issue, we focused on baseline pupil diameter, which allowed us to control the on-screen stimulus properties (a single fixation cross) and remove any subtle biases or associations between the physical properties of a stimulus (e.g., luminosity, hue) and the task conditions. The use of baseline pupil diameter thus allowed us to make concrete, testable predictions that were grounded in extensive previous research using this measure.

In addition to their relevance for event-file theory and the episodic control-binding hypothesis, the current results are also of relevance to the growing literature investigating the ways in which MW need not always impair performance. Engaging in MW is indeed very often associated with impaired performance outcomes (Farley et al., 2013; Kam & Handy, 2013; Smallwood et al., 2007). However, recent work has also shown a more nuanced relationship between MW and performance impairments; MW under certain conditions may not harm performance, and may even be beneficial in some cases (Baird et al., 2011; Brosowsky et al., in press; Pereira et al., 2020; Seli, Carriere, et al., 2018; Thomson et al., 2014; see also Beilock et al., 2004). The current results provide further evidence that MW does not impair performance in every scenario. Here we demonstrate that once a process or information is encoded and automated, MW during the execution of that procedural behavior does not negatively impact performance (see also Beilock et al., 2004).

Further, the current work also informs the existing literature on task-set formation, specifically the one-shot binding of stimulus-action (S-A) and stimulus-classification (S-C) associations (Henson et al., 2014; Moutsopoulou et al., 2015; Pfeuffer et al., 2017, 2018). That is, presumably, if stimulus-control associations require attention or task-engagement at encoding for successful inclusion into an episodic event-file, then the same property might apply to S-A and S-C associations. Future work should investigate this aspect of task-set creation, as it might also provide critical evidence for the encoding/retrieval distinctions made in recent theoretical proposals (see Frings et al., 2020). Further, the degree to which participants are explicitly aware of the encoding and retrieval process of these episodic event-files is currently not known, but is an interesting avenue for future research.

Conclusion

The episodic control-binding hypothesis predicts that stimulus-control bindings are held in event-files supported by episodic memory to promote contextually appropriate application of cognitive control. In two experiments, we examined how task-focused attention affects the encoding and retrieval of a control process component of event-files. In particular, we adjudicated between the possibilities that attention is required for the encoding versus the retrieval process. The former would be in line with the idea that internal control processes are integrated into the event-file in a similar manner as task-relevant stimuli or response features, whereas the latter would suggest stimulus-control bindings are treated similarly to response-irrelevant stimulus features. In Experiment 1, we demonstrated that self-reports of MW during encoding interfered with successful deployment of stimulus-control bindings at a later point, but MW during implementation (i.e., retrieval) of stimulus-control bindings did not interfere with their successful deployment. In Experiment 2, we used trial-by-trial pupillometry to show that pre-stimulus attention at encoding predicts the subsequent implementation of stimulus-control bindings better than pre-stimulus attention levels at implementation. Together, these results suggest that encoding of stimulus-control bindings in episodic memory requires active attention and engagement; however, once encoding of a stimulus-control association has occurred, these bindings are automatically deployed to guide behavior when the stimulus recurs. This suggests that control states are encoded into event-files in a manner comparable to task-relevant stimulus and response features.