One of the deadliest “friendly fire” incidents in recent US military history occurred in Afghanistan in 2002 and involved the use of a global positioning system (GPS) by a US soldier. The soldier provided coordinates displayed on the GPS for an airstrike involving a 2000-lb satellite-guided bomb. Instead of hitting an enemy outpost as intended, the bomb landed on his own battalion command, injuring and killing many. While this soldier’s military training had taught him well that the GPS defaults to displaying its own location’s coordinates when its batteries are changed as he had just done, he used these coordinates anyway (Loeb 2002).

Because the soldier failed to bring well-learned information to mind from long-term memory to guide his behavior for the task at hand, this tragedy appears to have resulted from a working memory failure during the fog of war. Working memory (WM) is a multifaceted system which includes the ability to call to mind task-relevant information from long-term memory (Lewis-Peacock and Postle 2008; Sreenivasan et al. 2011), as well as the ability to maintain and manipulate recently encountered information over short intervals (seconds) while protecting against task-irrelevant interference (Baddeley 1986; Cowan 2016). WM is critical for surviving and thriving in complex and challenging situations. Yet, as the anecdote above and a growing literature suggest, highly demanding and stressful circumstances compromise WM performance (see Arnsten 2009; Evans and Schamberg 2009; Oei et al. 2006; Qin et al. 2009).

In military servicemembers, cognitive performance across a variety of domains, including WM, degrades over the course of a combat deployment (Vasterling et al. 2006) and over the course of military field training, whether the training interval is relatively short (e.g., 5 days: Lieberman et al. 2002a, 2005; Morgan et al. 2006) or long (8 weeks: Jha et al. 2010, 2015, 2016). Degraded WM performance may signal risk and vulnerabilities for intrusive thoughts, poor mood, and psychological disorders such as PTSD (see Brewin and Smart 2005; Jha et al. 2010 for discussion). As such, there is a pressing need to develop and implement training regimens to strengthen WM and best protect against its decline. Such training could benefit soldiers preparing for and facing combat, as well as civilian first-responders, humanitarian aid workers, and others in high-demand professions who experience intense and often protracted intervals of challenge and stress.

Cognitive resilience is “the ability to maintain or regain cognitive capacities at risk of degradation, depletion, or failure in the face of situational challenges experienced over protracted time periods” (Jha et al. 2016, p. 46). One emerging area of research involves offering mindfulness training (MT) as a form of cognitive resilience training by which to bolster cognitive control processes such as WM and attention (see Jha et al. 2010, 2015, 2016; Leonard et al. 2013; Morrison et al. 2014), which can be compromised due to high demand and stress (e.g., Arnsten 2009; Hofmann et al. 2008, 2012; Oei et al. 2006; Qin et al. 2009). Mindfulness is described as “a mental mode characterized by attention to present moment experience without judgment, elaboration, or emotional reactivity” (Jha et al. 2010, p. 54; see also Kabat-Zinn 2013).

Typical MT programs offer didactic content and formal exercises on how to stabilize and focus attention on one’s present moment experience without elaboration or reactivity (Jha et al. 2010, 2015). Most programs for novices emphasize concentrative exercises that direct participants to focus on a target object, such as a body sensation or sound. During a typical MT breath-focused practice, for example, participants are instructed to sit in a relaxed, upright posture, and direct their attention to the sensations of breathing and to maintain their attention on the selected target object for the period of formal practice. When they notice that their attention has wandered away from the selected object, they are instructed to gently return it to the object.

A prediction from a cognitive training perspective is that repeatedly engaging in core cognitive control processes involving attentional orienting and selection, sustained attention, WM maintenance, and control over mind wandering as part of MT practice, will lead to corresponding strengthening and enhancement of these processes (see Lutz et al. 2009; Morrison et al. 2014; Morrison and Jha 2015). In line with this prediction, several prior studies have demonstrated that MT can improve performance on measures of selective and sustained attention (e.g., Allen et al. 2012; Jensen et al. 2012; Jha et al. 2007; MacLean et al. 2010; Zanesco et al. 2013), and WM (Chambers et al. 2008; Jensen et al. 2012; Mrazek et al. 2013; Quach et al. 2016; Van Vugt and Jha 2011; Zeidan et al. 2010; but see Morrison et al. 2014), and can reduce performance lapses associated with mind wandering (Jha et al. 2015; Morrison et al. 2014; Mrazek et al. 2013).

There is growing evidence suggesting that in addition to cognitive enhancement, engaging in MT programs improves psychological health and reduces symptoms across a variety of disorders (e.g., Gotink et al. 2015; Goyal et al. 2014), including PTSD (Polusny et al. 2015). The field of military medicine has recently recommended MT as adjunctive care for servicemembers and veterans suffering from insomnia, PTSD, and chronic pain (Khusid 2013; Khusid and Vythilingam 2016a, b). In addition to recent clinical interest in MT, a literature on MT’s neural effects has emerged (see Fox et al. 2014, 2016). Theoretical accounts suggest that the health-related benefits and neural correlates of MT are tied to functional improvements in cognitive processing that result from regular engagement in MT practices (Creswell 2017; Goldin and Gross 2010; Lutz et al. 2015).

While there is an extensive evidence base for the benefits of MT on physical and psychological outcomes (Gotink et al. 2015), as well as a growing literature on MT-related cognitive enhancement (see Lutz et al. 2015), recent findings suggest that MT may also be protective against degradation of attention and WM. Several studies suggest that MT may promote cognitive resilience of these functions over protracted periods of high demand (Jha et al. 2010, 2015; Leonard et al. 2013; Morrison et al. 2014; Rooks et al. 2017).

For military servicemembers, declines in cognitive control over high-demand intervals could have dire consequences. The military deployment cycle increases the likelihood of servicemembers enduring psychological and physical harm, as well as suffering degradation in cognitive functioning (Marx et al. 2009; Tanielian et al. 2008; Vasterling et al. 2006). In the months leading up to their deployment to a combat zone, servicemembers engage in mission-critical operational field training and “stress-inoculation” training, which have been linked to degradation in cognitive performance (Lieberman et al. 2002b, 2005; Morgan et al. 2006). Thus, the very cognitive faculties necessary for troops to best meet the challenges of combat may be compromised even before they are deployed.

A recent study investigated MT’s ability to promote cognitive resilience in WM when offered to military servicemembers over the intensive period of predeployment training (Jha et al. 2010). Two questions were examined: (1) Does the high-demand predeployment interval have deleterious effects on WM, as indexed by the operation span task (OSPAN; Unsworth et al. 2005)? (2) If WM is degraded, can MT prevent or dampen such effects over this interval? A no-training control group (NTC) of servicemembers was compared to a group who received a 24-h, 8-week MT program, called Mindfulness-based Mind Fitness Training (MMFT)® (Stanley 2014; Stanley et al. 2011). In the NTC group, WM performance significantly declined over the 8-week interval. For the MT group, salutary effects on WM were commensurate with the amount of time individuals spent daily engaging in MT practices. Servicemembers who practiced MT exercises regularly outside of class (an average of ~12 min per day or more) maintained or improved their WM performance over time. Those who practiced less frequently or not at all showed degraded WM performance over time. As such, these findings were in line with past research suggesting that intensive military training compromises WM (e.g., Lieberman et al. 2002a, 2005; Morgan et al. 2006), while also demonstrating that engaging in MT practice was protective. Nonetheless, the MT program employed in the study by Jha et al. (2010) was time-intensive (24 h over 8 weeks), making its broad inclusion into military training schedules potentially challenging. An open question is whether short-form variants of MT are able to similarly protect against the predeployment interval’s deleterious effects on WM.

A recent study examined if offering short-form MT programs (8 h) to predeployment servicemembers promotes greater cognitive resilience of sustained attention (Jha et al. 2015). Taking into account the many components of the 24-h MT program, two short-form variants were developed and delivered by the same team who developed the 24-h course (Jha et al. 2010; Stanley et al. 2011). Given prior evidence suggesting that MT’s benefits are commensurate with engagement in MT homework exercises (e.g., Carmody and Baer 2008; Jha et al. 2010; Stanley et al. 2011), these variants were created to manipulate the level of in-class emphasis on MT exercises. One course variant was training-focused and prioritized in-class instruction about and engagement in MT exercises. The other variant was didactic-focused and prioritized in-class instruction about the basic principles of mindfulness, neuroplasticity, stress, resilience, and self-regulation of the autonomic nervous system. Both variants had identical time requirements for MT exercises outside of class.

In order to compare the influence of these two course variants on sustained attention, the Sustained Attention to Response Task (SART) was administered to cohorts of military servicemembers (Jha et al. 2015). During the predeployment interval, SART performance was assessed once before and once after the MT course interval in the MT groups, as well as in a group of servicemembers who received no training (no-training control group, NTC). SART performance degraded over time in all groups. Yet, there were significantly fewer performance lapses in the military cohorts receiving MT relative to NTC, with training-focused MT outperforming didactic-focused MT at the end of the MT course interval. These results suggested that while sustained attention, much like WM (Jha et al. 2010; Lieberman et al. 2002a, 2005; Morgan et al. 2006), is vulnerable to compromise over protracted periods of high-demand military training, short-form MT is protective. The key finding was that training-focused short-form MT promoted greater cognitive resilience of attention in predeployment servicemembers relative to the didactic-focused program. These results can be interpreted to suggest that similar to the amount of engagement in homework exercises (see Jha et al. 2010, 2016), an in-class focus on mindfulness training elements (e.g., MT exercises, discussion) is a strong contributor to the benefits of MT to cognitive control processes.

Herein, we investigated if WM performance would enjoy the protective benefits of short-form MT in the same training cohorts examined in the Jha et al. (2015) study. We predicted that training-focused short-form MT, more so than didactic-focused, would protect against WM decline in predeployment soldiers. This prediction was motivated by (1) MT-related salutary effects reported in these participants’ sustained attention that were strongest in the training-focused variant (Jha et al. 2015); (2) the established relationship between attention and WM (Jha 2002; Kane et al. 2007; Levinson et al. 2012; McVay and Kane 2012); and (3) prior results suggesting that WM is bolstered by short-form MT during typical civilian life (Chambers et al. 2008; Mrazek et al. 2013; Quach et al. 2016; Zeidan et al. 2010).

Several studies reporting MT-related WM benefits (Chambers et al. 2008; Jha et al. 2010; Mrazek et al. 2013; Quach et al. 2016) employed complex span tasks. In the context of cognitive training and individual differences studies, these tasks are typically implemented as a holistic indicator of WM capacity instead of a method by which to isolate specific component processes of WM (Conway et al. 2005; Redick et al. 2012; Unsworth et al. 2005). However, the delayed-recognition task utilized in the current study has been extensively used to study the component processes of WM (Dolcos and McCarthy 2006; Dolcos et al. 2007, 2008). This task is amenable to selectively isolating and manipulating processes, making it suitable for uncovering which aspects of WM are made more vulnerable over high-demand/high-stress intervals, as well as those selectively strengthened by MT.

WM delayed-recognition tasks have a structure that allows for temporal segmentation of component WM processes so they can be selectively engaged and studied via manipulation of demand level. These tasks begin with the presentation of the memory set, comprising one or more items to be maintained over a delay interval of a few to several seconds. At the end of the delay, a memory probe is presented requiring a response to indicate if it was or was not part of the memory set. Varying the number of items in the memory set (i.e., memory load; see Jha and McCarthy 2000) differentially taxes WM maintenance processes, with higher maintenance demands as the load is increased. In addition, increasing the number or type of distracters presented during the delay interval may increase disruption to ongoing maintenance processes (i.e., distracter interference; see Gazzaley et al. 2007; Jha et al. 2004).

Recently, delayed-recognition tasks, as described above, have been used to investigate the impact of laboratory-induced stress (Oei et al. 2006) as well as stress-related disorders (PTSD, Morey et al. 2009) on component processes of WM. Oei et al. (2006) reported that task performance during high- but not low-load trials was worse in individuals induced to experience psychosocial stress via the Trier stress test versus those in a no-stress group. One interpretation for this pattern is that stress produces internal sources of distraction (e.g., physiological arousal, personal preoccupations, fears, and worry), which consume processing resources necessary for maintenance (see Qin et al. 2009). When task demands are low, and there are ample resources to perform the task-at-hand, internal distraction may not compromise performance. Yet when task demands are high, internal distraction may lead to a paucity of resources and concomitant performance errors. Other studies, in contrast, suggest that performance degradation for high-load trials may be due to stress-related increases in the presence of glucocorticoids, which selectively reduce the functioning of neural regions supporting maintenance, such as the dorsolateral PFC (dlPFC; see Birnbaum et al. 2004; Wang et al. 2007).

Relatedly, a series of studies revealed that the presentation of task-irrelevant emotionally salient distracters during the delay interval is associated with disrupted delay activity in the dlPFC together with increased activity in visual and emotional processing regions (Dolcos and McCarthy 2006; Dolcos et al. 2007, 2008). These findings suggest that salient negative stimuli, despite being task-irrelevant, capture attention and interfere with ongoing maintenance of task-relevant memoranda leading to decreased WM performance. These stimuli may be similar to what is encountered in the external environment under high-stress circumstances (Morey et al. 2008, 2009). Several studies have now reported that WM task performance is worse for trials on which negative vs. neutral distracters are presented (Beblo et al. 2010; Morey et al. 2008, 2009; Oei et al. 2012), with even greater performance costs under experimentally-induced stress (Oei et al. 2006).

Given that manipulations of load and distracter interference are able to identify stress-related vulnerabilities in WM, perhaps they can also provide specificity regarding which aspects of WM are strengthened with MT in individuals experiencing protracted periods of high demand, such as the predeployment interval that soldiers experienced herein. As such, the current study examined performance on a WM delayed-recognition task in which memory load (1 vs. 2 items) and distracter interference (neutral vs. negative task-irrelevant distracters) were manipulated across trials. Experiment 1 was conducted as a manipulation check to ensure that performance varied as a function of load and distracter category and to ensure that task performance did not change in individuals undergoing a typical interval of civilian life. In Experiment 2, the same task was administered to active-duty soldiers to investigate if performance declined over a period of high-demand military training, and to determine if MT protected against this decline. In both experiments, the Perceived Stress Scale (PSS; Cohen et al. 1983) was administered at each time point in order to examine its correspondence with task performance and to track self-reported perceived stress over intervals of civilian and military life.

One advance from prior work investigating the influence of MT on WM using complex span tasks (Chambers et al. 2008; Jha et al. 2010; Mrazek et al. 2013; Quach et al. 2016) is the use of a delayed-recognition task, which allows us to probe if MT differentially bolsters component WM processes (i.e., maintenance, distracter interference, or both). Another advance is offering military cohorts MT in a short-form program as training- vs. didactic-focused variants, to determine the effectiveness of short-form MT to protect WM over high-demand intervals.

Experiment 1

Method

Participants

To examine WM task performance in individuals undergoing a typical period of civilian life, a delayed-recognition WM task with valenced distracters was administered to a group of young adult males who were recruited from the University of Miami community (N = 22; M = 22.27 years, SD = 4.65) and were remunerated for their participation. Informed consent was obtained in accordance with the Institutional Review Boards of the University of Miami and other author-affiliated universities, with oversight from the Human Research Protections Office of the US Department of Defense.

Experimental Stimuli and Design

All participants were tested before (T1) and after (T2) an 8-week interval. A trained experimenter proctored sessions during which groups of up to 10 participants were tested, each at his own PC laptop workstation. Testing occurred in a quiet room where participants sat approximately 57 cm from a PC laptop display and performed the WM delayed-recognition task, arousal and valence rating scales, the Perceived Stress Scale, and measures outside of the scope of this report (see Jha et al. 2015).

WM Delayed-Recognition Task

The WM task instructed participants to remember faces or shoes over a delay interval spanned by distracting images. These categories were selected to ensure that the differences between exemplar faces or shoes within each memory set were not easily verbalizable. Using these exemplars within each category emphasized perceptual, as opposed to verbal, representations of objects in visual WM (see Jha et al. 2004; Jha and Kiyonaga 2010; Sreenivasan et al. 2007). The task timing and mnemonic stimuli were identical to those used in a previous study of WM (Jha and Kiyonaga 2010). The primary modification here was the inclusion of delay-spanning valenced distracters, which were neutral or negative images that were intended to elicit varying levels of affective interference.

Figure 1 presents a schematic of the progression of each trial. Trials began with the encoding phase during which a memory array (S1) containing either two memory items (high mnemonic load) or one memory item paired with a noise mask (low mnemonic load) was presented for 3000 ms. S1 was followed by a delay interval of 3000 ms, after which a test item (S2) was presented for 2500 ms. On half of the trials, S2 was a single image that appeared in S1 (match trials), while on the remaining trials, S2 was a novel image (non-match trials) that did not appear in S1 or elsewhere in the experiment. S2 was always of the same category as S1 (face or shoe). Participants were instructed to determine whether S2 matched either memory item in S1 and indicate a match or non-match response by pressing a designated key. Participants were instructed to respond quickly and accurately, with greater emphasis on accuracy. Half of the trials utilized faces as stimuli and the other half utilized shoes, with both trial types intermixed throughout the task.

Fig. 1 Time course of a sample delayed-recognition working memory task trial (high mnemonic load). A low mnemonic load trial would have a noise mask in place of the second image in S1. Participants were shown 2 images of either faces or shoes (S1), and asked to remember them over a delay interval during which they were shown a distracter image (either negative or neutral in valence). Participants were shown a single face or shoe (S2) and were asked to determine whether this image matched either of the images seen in S1. S1 type (faces vs. shoes) varied randomly across trials, but S2 type always matched S1 type within trials

During the delay interval, a task-irrelevant distracter that was neutral or negative in valence was displayed for 2000 ms and was preceded and followed by a fixation cross for 500 ms. Instructions at the beginning of the task directed participants to keep their gaze in the center of the screen at all times. The delay-spanning images were drawn from a previous study conducted in military populations (Morey et al. 2008). The negative stimuli were generated from internet searches and photo collections of soldiers that depicted combat-related scenes from Afghanistan and Iraq, while the neutral stimuli depicted civilian scenes that matched the negative stimuli in terms of figure/scene ratio, scene complexity, and chromatic structure. Memory items (face or shoe stimuli) and distracter images were not repeated across trials.
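The trial timing described above can be laid out as a simple sequence of phases. The following is a minimal sketch, not the authors' presentation code: the `Phase` class and the function names are hypothetical, and no inter-trial interval is modeled because the text does not report one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One display phase of a trial (hypothetical helper type)."""
    name: str
    duration_ms: int


def trial_timeline():
    """Phases of one trial, using the durations reported in the text."""
    return [
        Phase("S1 memory array", 3000),
        Phase("fixation", 500),      # start of the delay interval
        Phase("distracter", 2000),   # neutral or negative image
        Phase("fixation", 500),      # end of the delay interval
        Phase("S2 test item", 2500),
    ]


def onsets(phases):
    """Return (name, onset in ms) for each phase, counted from trial start."""
    elapsed = 0
    result = []
    for phase in phases:
        result.append((phase.name, elapsed))
        elapsed += phase.duration_ms
    return result
```

Summing the fixation, distracter, and fixation phases gives the 3000 ms delay interval stated for the task, so the two descriptions of the delay agree.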

On half of the trials, the delay-spanning distracters were negatively valenced; on the other half of trials, they were neutrally valenced. The task consisted of a 36-trial practice block (with accuracy feedback for the first 6 trials) and two 30-trial experimental blocks.

Thus, task demands were manipulated along two levels of mnemonic load (low vs. high) and two levels of valenced distraction (neutral vs. negative), yielding four distinct trial types that were used for analysis: low load-neutral distracter, low load-negative distracter, high load-neutral distracter, and high load-negative distracter. Each trial type occurred with equal frequency. Across the experiment, trials varied along four variables: S1/S2 category (faces/shoes), match vs. non-match trials, mnemonic load level (low/high), and distracter valence (neutral/negative). Trial order was pseudo-randomly intermixed along these four variables so that identical trial types were never consecutively presented.
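One way to generate a trial list with these properties is sketched below. It rests on several assumptions that are not stated in the text: "identical trial types" is read as trials that match on all four variables, the 60 experimental trials are balanced so that each load-by-valence type occurs 15 times, and the ordering constraint is met by reshuffling until it holds. All names are hypothetical.

```python
import itertools
import random

LOADS = ("low", "high")
VALENCES = ("neutral", "negative")
CATEGORIES = ("face", "shoe")
PROBES = ("match", "non-match")


def build_trials(n_trials=60, seed=0):
    """Return a pseudo-random list of (load, valence, category, probe) trials."""
    rng = random.Random(seed)
    per_type = n_trials // 4  # 15 trials per load x valence type
    sub_cells = list(itertools.product(CATEGORIES, PROBES))
    trials = []
    for offset, (load, valence) in enumerate(itertools.product(LOADS, VALENCES)):
        for i in range(per_type):
            # Rotating the starting cell per type keeps category and probe
            # balanced overall (30 faces, 30 shoes, 30 match, 30 non-match).
            category, probe = sub_cells[(i + offset) % len(sub_cells)]
            trials.append((load, valence, category, probe))
    # Assumed ordering rule: reshuffle until no two neighbors are identical.
    while True:
        rng.shuffle(trials)
        if all(a != b for a, b in zip(trials, trials[1:])):
            return trials
```

Rejection sampling is used here only because it is short and easy to verify; a constructive ordering would serve equally well if the constraint were stricter.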

Image Rating

After completing the WM task, participants were asked to complete arousal and valence ratings of the delay-spanning images utilizing a 9-point scale ranging from 1 to 9. For valence, 1 represented highly negative emotional content, 5 represented neutral emotional content, and 9 represented highly positive emotional content. For arousal, 1 represented the lowest level of arousal and 9 represented the highest level of arousal. Participants were given as much time as needed to complete the rating scales.

Perceived Stress Scale

Participants’ level of self-reported stress was indexed by the Perceived Stress Scale (PSS; Cohen et al. 1983). On this 10-item scale, participants were instructed to indicate how often they felt or thought a certain way in the past month on a scale from 0 (never) to 4 (very often). Six items were negatively stated (e.g., “How often have you felt nervous and stressed?”) and four were positively stated (e.g., “How often have you felt that you were on top of things?”). PSS total score was calculated by first reversing the responses to positively stated items, and then summing all responses. A greater PSS total score indicates a greater level of perceived stress.
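The scoring rule just described can be sketched as follows. The positions of the four positively stated items (4, 5, 7, and 8) follow the commonly published PSS-10 scoring key and are an assumption here, since the text does not list them; the function name is hypothetical.

```python
# Assumed 1-based positions of the positively stated items (standard PSS-10 key).
POSITIVE_ITEMS = (4, 5, 7, 8)


def pss_total(responses):
    """Sum ten 0-4 responses after reverse-scoring the positively stated items."""
    if len(responses) != 10:
        raise ValueError("expected responses to all 10 items")
    total = 0
    for position, value in enumerate(responses, start=1):
        if not 0 <= value <= 4:
            raise ValueError("responses must lie between 0 and 4")
        # Reverse-scoring maps 0 to 4, 1 to 3, and so on.
        total += (4 - value) if position in POSITIVE_ITEMS else value
    return total
```

Under this rule, totals range from 0 to 40, with higher totals indicating greater perceived stress.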

Data Analysis

All 22 participants were included in all Experiment 1 analyses.

WM Delayed-Recognition Task

Trials where the participant did not respond were excluded, which resulted in no more than 3 trials excluded for any participant. One trial was removed from analyses for all participants due to a programming error.

The primary outcome of interest was task accuracy (% correct), in order to capture the overall evaluation of the test item as a match or non-match to the memory item, without the constraint of the duration of motor response. While accuracy was emphasized in task instructions to participants, response time (RT, in ms) for correct trials was also included in analyses for the sake of completeness. To address whether performance differed due to levels of mnemonic load and distracter valence over time, a mixed model ANOVA examining task accuracy and RT was conducted across three factors: mnemonic load (low vs. high), distracter valence (neutral vs. negative), and time (T1 vs. T2).
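As an illustration of how the accuracy measure could be derived before entering it into the ANOVA, the sketch below computes percent correct per time point, load, and valence for one participant, dropping trials with no response as described above. The trial dictionary keys are hypothetical and do not reflect the authors' data format.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Percent correct per (time, load, valence) cell, skipping no-response trials.

    Each trial is a dict with the hypothetical keys 'time', 'load', 'valence',
    'response' (None when the participant did not respond), and 'answer'.
    """
    n_correct = defaultdict(int)
    n_valid = defaultdict(int)
    for trial in trials:
        if trial["response"] is None:
            continue  # excluded, as in the analysis described above
        key = (trial["time"], trial["load"], trial["valence"])
        n_valid[key] += 1
        if trial["response"] == trial["answer"]:
            n_correct[key] += 1
    return {key: 100.0 * n_correct[key] / n_valid[key] for key in n_valid}
```

The resulting cell means per participant are the kind of input a load by valence by time analysis would take.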

Image Rating

In order to confirm the basic manipulation of valence and arousal across distracter types, as well as to investigate whether this pattern varied over time, a mixed model ANOVA examined the influence of time (T1 vs. T2) and distracter valence (neutral vs. negative) on valence and arousal ratings for the distracter images.

Perceived Stress Scale

To examine the relationship between WM performance and perceived stress, correlations were conducted between overall WM task accuracy and PSS score at T1 and T2, separately. In order to investigate whether there were changes in PSS over an 8-week interval of typical civilian life, a paired samples t test was conducted to compare PSS scores at T1 and T2. Cohen’s \( d_z \) was calculated to estimate the effect size for the T1 to T2 comparison (Lakens 2013).
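For a within-subject comparison, the effect size described by Lakens (2013) is the mean of the difference scores divided by their standard deviation, which also equals the paired t statistic divided by the square root of the sample size. The sketch below shows both routes using only the Python standard library; the function names are hypothetical and this is not the authors' analysis code.

```python
import math
import statistics


def paired_t_and_dz(scores_t1, scores_t2):
    """Return (t, d_z, df) for paired scores, using T1 minus T2 differences."""
    diffs = [a - b for a, b in zip(scores_t1, scores_t2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the difference scores
    d_z = mean_diff / sd_diff
    t = d_z * math.sqrt(n)             # identical to mean_diff / (sd_diff / sqrt(n))
    return t, d_z, n - 1


def dz_from_t(t, n):
    """Recover d_z from a reported paired t statistic and sample size n."""
    return t / math.sqrt(n)
```

The second function is useful for checking a reported effect size against its accompanying t value when raw scores are unavailable.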

Results

WM Task Accuracy

Consistent with previous findings (Jha and Kiyonaga 2010; Jha and McCarthy 2000), there was a main effect of mnemonic load, such that accuracy was greater for low- versus high-load trials (F(1, 21) = 29.482, p < 0.001, \( {\eta}_p^2 \) = 0.584; Fig. 2a). There was also a main effect of distracter valence, such that accuracy was greater for trials with neutral versus negative distracters (F(1, 21) = 24.905, p < 0.001, \( {\eta}_p^2 \) = 0.543; Fig. 2b). There was no main effect of time, suggesting that performance did not differ from T1 to T2 (F(1, 21) = 1.504, p = 0.234). There were also no significant 2- or 3-way interactions (all p values > 0.114). Table 1 provides accuracy results (Ms and SDs) for each condition at each time point.
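Partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity \( \eta_p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error}) \). The sketch below applies that identity; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

Applied to the load effect, `partial_eta_squared(29.482, 1, 21)` gives about 0.584, and for the valence effect `partial_eta_squared(24.905, 1, 21)` gives about 0.543, in agreement with the values reported above.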

Fig. 2 Experiment 1. a Participant mean accuracy (% correct) at T1 and T2 for each load condition: High = high mnemonic load, Low = low mnemonic load. b Participant mean accuracy (% correct) at T1 and T2 for each valenced distracter condition (Negative and Neutral)

Table 1 Working memory task accuracy by time, group, and condition

WM Task RT

Given that the task instructions emphasized task accuracy more so than speed of response, accuracy was our primary measure of interest. RT results were considered only secondarily and no post hoc comparisons were performed. For RT, there was a main effect of mnemonic load, such that RT was faster for low- versus high-load trials (F(1, 21) = 96.230, p < 0.001, \( {\eta}_p^2 \) = 0.821). There was also a main effect of distracter valence, such that RTs were faster for trials with neutral versus negative distracters (F(1, 21) = 6.101, p = 0.022, \( {\eta}_p^2 \) = 0.225). There was no main effect of time, which suggests that performance did not differ from T1 to T2 (F(1, 21) = 0.148, p = 0.704). There were no significant 2-way or 3-way interactions (all p values > 0.059). Table 2 provides the response time results (Ms and SDs) for each condition at each time point.

Table 2 Working memory task response time by time, group, and condition

Image Rating Analyses

There was a main effect of distracter valence on valence ratings, such that distracters classified as neutral were rated as less negative than distracters classified as negative (F(1, 21) = 203.020, p < 0.001, \( {\eta}_p^2 \) = 0.906). There was also a main effect of distracter valence on arousal ratings, such that neutral distracters were rated as less arousing than negative distracters (F(1, 21) = 21.193, p < 0.001, \( {\eta}_p^2 \) = 0.502). No main effect of time or interaction between time and distracter valence was significant for either valence or arousal ratings (all p values > 0.175). Valence and arousal ratings (Ms and SDs) for each distracter condition at T1 and T2 can be found in Table 3.

Table 3 Valence and arousal ratings of distracter images

PSS Analyses

Correlations between PSS scores and overall task accuracy were non-significant at T1 (r(20) = 0.321, p = 0.145) and T2 (r(20) = −0.253, p = 0.255). Additionally, PSS scores did not change significantly from T1 (M = 13.36, SD = 6.63) to T2 (M = 12.18, SD = 6.40; t(21) = 1.262, p = 0.221, \( d_z \) = 0.269).

Thus, the results of Experiment 1 demonstrated the predicted effects of manipulating WM load and distracter valence and showed that task accuracy did not significantly change over an 8-week interval of typical civilian life. Moreover, PSS did not significantly correspond to task accuracy and did not change over the study interval.

Experiment 2

In Experiment 2, the task was administered to military cohorts before and after an 8-week training period occurring during a high-demand interval, which included intensive demands such as field training and training to prepare for combat deployment. Didactic-focused MT (M8D) and training-focused MT (M8T) were delivered over 8 weeks, with the first 4 weeks comprising in-class sessions and the second 4 weeks comprising independent practice and an individual interview.

We investigated if task performance degraded over the training interval and, if so, whether M8D and/or M8T were protective. Similar to Experiment 1, we also examined the relationship between perceived stress and task performance as well as perceived stress within groups over time.

Methods

Participants

Figure 3 depicts the flow of participants through each stage of the experiment. We recruited 80 healthy male active-duty US Army volunteers (M = 26.25 years, SD = 5.41) to receive training. Participants were in the predeployment phase of their military deployment cycle. The testing and training were conducted at Schofield Barracks, Hawaii, 8 to 10 months prior to their deployment to Afghanistan. The study utilized a quasi-experimental design where units (as opposed to individuals) were randomized into two training groups based on troop availability. This was due to the military requirement that organic unit structure be maintained during testing and course session scheduling. This requirement represents a typical assignment strategy for experiments involving military participants (Adler et al. 2008; Jha et al. 2015; Johnson et al. 2014). Both MT groups were made up of two partial platoons with 20 soldiers each. One group (M8D: n = 40; M = 25.78 years, SD = 4.53) contained infantry soldiers and the second group (M8T: n = 40; M = 26.73 years, SD = 6.19) contained field artillery soldiers.

Fig. 3 A CONSORT chart describing the breakdown of group allocation, follow-up, and analysis for Experiment 2

A no-training control (NTC) group of 46 healthy male active-duty US Army soldiers (M = 23.48 years, SD = 3.45) did not receive MT but was tested as a convenience sample comparison group to the MT groups. NTC was recruited from Ft. Stewart, Georgia, in conjunction with another study (Ramos et al. 2016), the results of which will be reported elsewhere. These participants were infantry soldiers from the same battalion who were undergoing intensive field training for combat readiness although, unlike the MT groups, they were not preparing for an impending combat deployment.

All testing and training occurred during the soldiers’ duty day, and thus participants did not receive compensation for their time, as per US Department of Defense rules regarding soldier compensation during the duty day. Participants provided informed consent in accord with the Institutional Review Boards of the University of Miami and other author-affiliated universities, with oversight from the Human Research Protections Office of the Department of Defense.

Mindfulness Training Course Variants

Soldiers receiving MT were trained in one of two 8-h variants of Mindfulness-based Mind Fitness Training (MMFT). Details regarding these 8-h variants can be found in a related study reporting measures of attentional performance (Jha et al. 2015). MMFT has similarities in course structure to mindfulness-based stress reduction (MBSR, Kabat-Zinn 2013) but differs in its approach to mindfulness training and in the scope of the didactic content and contextualization for military populations. In addition to MBSR, the program draws from body-based trauma therapies, incorporating and extending concepts and self-regulation skills from Sensorimotor Psychotherapy (Ogden et al. 2006), Somatic Experiencing® (Levine 1997; Payne et al. 2015), and the Trauma Resilience Model® (Leitch 2007; Leitch et al. 2009). The MMFT program, including the variants used herein, is described in greater detail elsewhere (Jha et al. 2015; Stanley 2014; Stanley et al. 2011).

Soldiers were assigned either to an 8-h variant that emphasized didactic content (M8D) or an 8-h variant that emphasized MT-related practices (M8T). Table 4 provides a breakdown of the course composition of each MMFT variant. Both of the 8-h variants were delivered over 8 weeks, with the first 4 weeks comprising one 2-h course session per week, the fifth week involving mandatory 15-min individual practice interviews with the instructor and the remaining 3 weeks involving only instructions to practice independently.

Table 4 Course composition, content, and delivery structure for each MMFT variant, M8T, and M8D

M8D and M8T participants were asked to complete 30 min of daily MT exercises outside of class for the duration of each MT program (see Footnote 2). Participants were provided with audio CDs intended to bolster the instruction from their in-class sessions and guide their independent engagement in mindfulness exercises. Their self-reported minutes of practice time were recorded in weekly practice logs that were submitted to the research team, but not viewed by the instructor. Participants were informed of this policy and encouraged to report their actual practice time as honestly as possible. Additionally, at the end of the course, the instructor rated each participant on a 5-point Likert scale according to how much the instructor surmised that the soldier had been practicing outside of class, based on in-class participation and the individual interviews; these ratings were submitted to the research team for use as a second-person rating of practice time.

Both courses were interrupted by a 2-week block leave during which participants were not on post and did not receive training. M8T’s block leave occurred during a homework-only period when they were not scheduled to meet with the instructor. M8D’s block leave occurred between weeks 4 and 5 (following the MMFT classes but before interviews with the instructor).

Experimental Stimuli and Design

All participants, comprising the M8T, M8D, and NTC groups, were tested before (T1) and after (T2) an 8-week training period. All testing sessions occurred within a 2-week window of the onset and completion of the training period. In the NTC group, participants were also tested at an intermediate time point during week 4, but these data are not included herein. At each time point, all participants completed the same WM task as well as the PSS. Participants in the MT groups also completed valence and arousal ratings of distracter images as outlined in Experiment 1, whereas these ratings were not collected from the NTC group, as the ratings were not part of the study design involving NTC (Ramos et al. 2016).

Data Analysis

Participant exclusions by group are shown in Fig. 3. Participants were excluded if they did not complete both testing sessions, which occurred due to scheduling conflicts or an inability to engage in the training (n = 10). Additionally, participants were excluded if they failed to meet a benchmark level of performance and task engagement, which included failure to follow instructions (n = 1), failure to respond to at least two-thirds of all experimental trials (T1: n = 2, T2: n = 5), and mean task accuracy falling more than three standard deviations below their group’s mean (T1: n = 1, T2: n = 1). Any trials where participants did not provide a response were excluded. After exclusions based on these criteria, there were 33 participants in M8T, 37 in M8D, and 36 in NTC included in the following analyses.
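The two quantitative engagement criteria above can be sketched in code. This is a minimal illustration only, not the authors' actual screening script: the function name, argument names, the use of the sample standard deviation, and the inclusion of the participant's own score in the group mean are all assumptions.

```python
from statistics import mean, stdev


def passes_engagement_criteria(response_rate, accuracy, group_accuracies,
                               min_response_rate=2 / 3, sd_cutoff=3.0):
    """Hypothetical helper: True if a participant meets both benchmarks.

    response_rate    -- proportion of experimental trials with a response
    accuracy         -- the participant's mean task accuracy
    group_accuracies -- mean accuracies of everyone in the participant's group
    """
    group_mean = mean(group_accuracies)
    group_sd = stdev(group_accuracies)  # sample SD (an assumption)
    # Criterion 1: responded to at least two-thirds of all trials.
    responded_enough = response_rate >= min_response_rate
    # Criterion 2: accuracy not more than three SDs below the group mean.
    not_low_outlier = accuracy >= group_mean - sd_cutoff * group_sd
    return responded_enough and not_low_outlier
```

In such a sketch the check would be run separately at T1 and T2, and a participant failing at either time point would be dropped from the analyses.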

WM Delayed-Recognition Task

To examine WM task performance across the groups at each time point, we conducted a mixed model four-factor ANOVA with mnemonic load (low vs. high), distracter valence (neutral vs. negative), time (T1 vs. T2), and group (M8T vs. M8D vs. NTC) on task accuracy and RT. Planned contrasts or t tests were conducted to follow up significant interactions involving time and group for our primary outcome of interest, task accuracy. Planned contrasts were utilized to further investigate whether each group, separately, changed over time (T1 − T2), while t tests compared the magnitude of change in task accuracy over time between the groups. Effect sizes were calculated for independent samples (\( d_s \) for between-group comparisons) utilizing the procedures outlined in Lakens (2013).
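For readers who wish to check the between-group effect sizes, the sketch below recovers Cohen's d for independent samples from a t statistic and the two group sizes, using the conversion d = t × sqrt(1/n1 + 1/n2) given in Lakens (2013). The function name is hypothetical, and the source does not state whether the authors used this conversion or computed the effect size directly from means and pooled standard deviations, so small rounding differences from the reported values are to be expected.

```python
import math


def cohens_ds_from_t(t_value, n1, n2):
    """Cohen's d for two independent groups, recovered from an
    independent-samples (pooled-variance) t statistic.

    The absolute value of t is used so the effect size is reported
    as a positive magnitude.
    """
    return abs(t_value) * math.sqrt(1.0 / n1 + 1.0 / n2)
```

This conversion assumes a pooled-variance t test; it is only approximate for comparisons that used a correction for unequal variances (fractional degrees of freedom).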

Image Rating

Valence and arousal ratings of distracters were also investigated utilizing a mixed model three-factor ANOVA with distracter type (neutral vs. negative), time (T1 vs. T2), and group (M8T vs. M8D) on valence and arousal ratings, separately.

Perceived Stress Scale

In order to examine the relationship between WM performance and perceived stress in military servicemembers, correlations were conducted in all military participants between overall task accuracy and PSS score at T1 and T2, separately. Next, we examined PSS scores at each time point and over time (i.e., T1 vs. T2).

Homework Completion

Mean total practice time (in minutes) was compared between groups utilizing an independent samples t test. Correlations were conducted between mean total practice time and instructor ratings of participant practice (indicating impressions of how much participants practiced outside of class). Additionally, correlations were conducted between mean total practice time and WM overall accuracy for each group, separately.

Results

WM Task Accuracy

Similar to the results found in Experiment 1, there were significant main effects of both mnemonic load and distracter valence. Participants were more accurate in low- versus high-load trials (F(1, 103) = 182.053, p < 0.001, \( {\eta}_p^2 \) = 0.639) and trials with a neutral versus negative distracter (F(1, 103) = 84.015, p < 0.001, \( {\eta}_p^2 \) = 0.449). There was a significant main effect of time (F(1, 103) = 13.912, p < 0.001, \( {\eta}_p^2 \) = 0.119), where participants were more accurate at T1 compared to T2. There was also a significant main effect of group (F(2, 103) = 5.658, p = 0.005, \( {\eta}_p^2 \) = 0.099), although examination of baseline performance between groups showed no main effect of group at T1 (F(2, 103) = 0.933, p = 0.396).
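As a consistency check on reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom via the standard identity \( {\eta}_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error}) \). The sketch below is illustrative; the function name is hypothetical and the code is not taken from the authors' analysis pipeline.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)
```

For the main effect of mnemonic load on accuracy, `partial_eta_squared(182.053, 1, 103)` rounds to 0.639, matching the reported value.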

There was a significant time by group interaction (F(2, 103) = 10.135, p < 0.001, \( {\eta}_p^2 \) = 0.164; Fig. 4a), where planned contrasts revealed that NTC (p < 0.001) and M8D (p = 0.018) showed significant decreases in accuracy from T1 to T2, while accuracy in M8T (p = 0.263) did not significantly differ from T1 to T2. Analysis proceeded by comparing the magnitude of change in accuracy over time (T1 minus T2) between the groups. NTC showed significantly larger decreases in accuracy over time than M8T (t(67) = 4.098, p < 0.001, \( d_s \) = 0.990), and directionally larger decreases than M8D (t(71) = 1.913, p = 0.060, \( d_s \) = 0.450), while M8D showed significantly larger decreases in accuracy over time than M8T (t(68) = −3.042, p = 0.003, \( d_s \) = 0.729). Furthermore, a one-way ANOVA investigating the change in accuracy over time (T1 minus T2) revealed a significant linear trend with the greatest degree of degradation in NTC, followed by M8D, and near stable, above zero change in performance in M8T (F(1, 103) = 19.900, p < 0.001, \( {\eta}^2 \) = 0.162). Table 1 shows the accuracy results (Ms, SDs) of each trial type for the groups at T1 and T2.

Fig. 4 Experiment 2. a Percent differences in mean accuracy (T2 − T1) for all military groups. b The mean accuracy (% correct) of the military groups (M8T, M8D, NTC) over time (T1, T2) and load (high, low)

There was also a significant interaction between mnemonic load, time, and group (F(2, 103) = 4.464, p = 0.014, \( {\eta}_p^2 \) = 0.080; Fig. 4b). Follow-up two-way ANOVAs examining time and group were performed at each level of load, separately. At low load, there was a significant time by group interaction (F(2, 103) = 11.906, p < 0.001, \( {\eta}_p^2 \) = 0.188). Planned contrasts demonstrated that NTC significantly degraded from T1 to T2 (p < 0.001), while M8D and M8T did not change over this same time period (p = 0.318, p = 0.299, respectively). Next, the magnitude of change over time (T1 minus T2) was examined to determine differences between the groups. NTC showed significantly larger decreases in accuracy over time than M8T (t(67) = 4.223, p < 0.001, \( d_s \) = 1.020) and M8D (t(54.822) = 3.211, p = 0.002, \( d_s \) = 0.758; see Footnote 3), while M8D demonstrated only directionally but not significantly larger decreases in accuracy than M8T (t(68) = −1.793, p = 0.077, \( d_s \) = 0.430). At high load, there was also a significant time by group interaction (F(2, 103) = 5.684, p = 0.005, \( {\eta}_p^2 \) = 0.099), and planned contrasts demonstrated that both NTC (p = 0.001) and M8D (p = 0.002) degraded from T1 to T2, while M8T did not change (p = 0.351). Comparisons of the magnitude of change over time (T1 minus T2) between the groups revealed significantly larger decreases in accuracy over time for NTC versus M8T (t(67) = 2.951, p = 0.004, \( d_s \) = 0.712) and M8D versus M8T (t(68) = −3.017, p = 0.004, \( d_s \) = 0.722). M8D and NTC did not significantly differ from one another (p = 0.793). There were no other significant 2-way, 3-way, or 4-way interactions (all p values > 0.06; see Footnote 4).

This pattern of results suggests that the significant interaction between time and group varies with mnemonic load, but not with distracter valence. While both MT variants may provide benefits to WM performance in low-load conditions, the superior benefits of training-focused MT over didactic-focused MT may be exclusive to high-load conditions.

WM Task RT

As in Experiment 1, the task instructions’ emphasis on accuracy over speed motivated our use of accuracy as our primary measure of interest. While RTs were examined, no post hoc comparisons were performed. As expected, there was a main effect of mnemonic load (F(1, 103) = 198.456, p < 0.001, \( {\eta}_p^2 \) = 0.658) and distracter valence (F(1, 103) = 62.015, p < 0.001, \( {\eta}_p^2 \) = 0.376), such that RT was faster for low-load versus high-load trials and neutral versus negative distracter trials. In contrast to the results for accuracy, there was no main effect of time (F(1, 103) = 0.577, p = 0.449). There was a time by load interaction (F(1, 103) = 8.079, p = 0.005, \( {\eta}_p^2 \) = 0.073), and a load by distracter valence interaction (F(1, 103) = 5.125, p = 0.026, \( {\eta}_p^2 \) = 0.047). There was also a load by distracter valence by group interaction (F(2, 103) = 4.235, p = 0.017, \( {\eta}_p^2 \) = 0.076). There were no other significant 2-way, 3-way, or 4-way interactions (all p values > 0.064). Table 2 provides the response time results (Ms, SDs) of each trial type for the groups at T1 and T2.

Image Rating Analyses

Ratings of distracter valence showed a significant main effect of distracter valence, confirming that the distracters classified as neutral were rated as less negative than the distracters classified as negative (F(1, 68) = 350.244, p < 0.001, \( {\eta}_p^2 \) = 0.837). There was a time by distracter valence interaction, with a greater change over time (T1 − T2) for neutral distracters than negative distracters (F(1, 68) = 8.605, p = 0.005, \( {\eta}_p^2 \) = 0.112), such that neutral distracters were rated as more negative at T2 versus T1 (p < 0.001), while negative distracter ratings did not significantly differ from T1 to T2 (p = 0.550). There was also a distracter valence by group interaction (F(1, 68) = 6.600, p = 0.012, \( {\eta}_p^2 \) = 0.088) where both groups rated neutral distracters as less negative than negative distracters (all p values < 0.001), but M8T showed a larger difference between neutral and negative distracter ratings than M8D. There was no significant main effect of time nor any other 2-way or 3-way interactions (all p values > 0.080).

For arousal ratings, there was a main effect of distracter valence on arousal rating, where neutral distracters were rated less arousing than negative distracters (F(1, 68) = 50.408, p < 0.001, \( {\eta}_p^2 \) = 0.426). There was no main effect of time, group, or any interactions on arousal ratings (all p values > 0.161). Valence and arousal ratings (Ms, SDs) for each distracter condition within each group at T1 and T2 can be found in Table 3.

PSS Analysis

There was a significant correlation between PSS scores and WM overall accuracy at T1 (r(104) = −0.216, p = 0.026) that was not present at T2 (r(104) = −0.033, p = 0.734). There were significant differences between group PSS scores at T1 (F(2, 103) = 4.428, p = 0.014, \( {\eta}_p^2 \) = 0.079), where M8T (p = 0.031) and NTC (p = 0.006) had significantly higher PSS scores than M8D, but M8T and NTC did not differ from each other (p = 0.566). Because of these T1 differences, PSS scores over time were investigated in each group separately using paired samples t tests. While there was no change in PSS scores over time for NTC (T1: M = 17.972, SD = 7.666; T2: M = 16.694, SD = 6.328; t(35) = 1.291, p = 0.205, \( d_z \) = 0.215) or M8D (T1: M = 13.189, SD = 7.074; T2: M = 13.676, SD = 6.799; t(36) = −0.539, p = 0.593, \( d_z \) = 0.089), M8T showed lower PSS scores at T2 versus T1 (T1: M = 16.970, SD = 6.917; T2: M = 14.182, SD = 7.248; t(32) = 2.696, p = 0.011, \( d_z \) = 0.469).
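The within-group effect sizes reported for the paired comparisons are consistent with the standard conversion for dependent samples, d = t / sqrt(n), where n is the number of pairs (Lakens 2013). The sketch below is a hedged illustration with a hypothetical function name; the source does not state how the authors computed these values.

```python
import math


def cohens_dz_from_t(t_value, n_pairs):
    """Cohen's d for paired samples, recovered from a paired-samples
    t statistic and the number of pairs (degrees of freedom + 1).

    The absolute value of t is used so the effect size is reported
    as a positive magnitude.
    """
    return abs(t_value) / math.sqrt(n_pairs)
```

For M8T, `cohens_dz_from_t(2.696, 33)` rounds to 0.469, and for NTC, `cohens_dz_from_t(1.291, 36)` rounds to 0.215, in agreement with the values above.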

Homework Completion Assessment

Data from participants’ weekly practice logs were analyzed to determine group differences in minutes of practice time over the 8-week study period. The mean total practice time for M8T was 182.000 min (SD = 279.675) and for M8D was 79.257 min (SD = 173.086). An independent samples t test revealed that M8T practiced directionally but not significantly more than M8D (t(52.184) = −1.822, p = 0.074, \( d_s \) = 0.448; see Footnote 5). During the first 4 weeks of the program, mean practice time for M8T was 159.939 min (SD = 218.056) and for M8D was 53.108 min (SD = 112.982). During the second 4 weeks of the MT program, mean practice time for M8T was 22.061 min (SD = 80.268) and for M8D was 26.149 min (SD = 70.823). Additionally, the instructor’s ratings of how much each participant practiced were significantly or marginally correlated with participants’ self-reported practice time for each group (M8D: r(35) = 0.394, p = 0.016; M8T: r(31) = 0.328, p = 0.062). There was no correlation between total practice time and WM overall accuracy for M8D (r(35) = 0.022, p = 0.897) or M8T (r(31) = 0.182, p = 0.312).

Discussion

The current study investigated whether short-form MT (8 h) promotes cognitive resilience of WM in soldiers experiencing an intensive interval of military training. A novel delayed-recognition paradigm requiring WM maintenance of faces or shoes (memory load of 1 vs. 2 items) in the presence of task-irrelevant valenced distracters (negative vs. neutral images) was employed to examine the relative effectiveness of two short-form MT program variants, one training focused (M8T) and the other emphasizing didactic content (M8D).

Prior to offering this task to soldiers, Experiment 1 confirmed that overall task accuracy did not change over an 8-week interval of typical civilian life. The task successfully manipulated the level of demand by varying memory load and distracter interference. Task accuracy was significantly higher on low- vs. high-load trials, and on trials containing neutral vs. negative distraction. In Experiment 2, the task was administered to three groups of soldiers before and after an 8-week training interval. This interval comprised an intensive period of military training, and for two of the groups also included MT. Like civilians, soldiers demonstrated main effects of load and distracter valence, which confirmed that our task manipulations generalized to this population. Of particular interest was the significant time by group interaction in overall task accuracy, which revealed that task accuracy degraded from T1 to T2 for NTC and M8D, but did not change over time for M8T. In addition, there was a time by group by load interaction, which demonstrated that at low load, NTC degraded from T1 to T2, and M8T and M8D did not change over time. At high load, NTC and M8D both degraded from T1 to T2, but M8T still did not change over time. Thus, while both program variants were protective at low load, only M8T was protective at high load. As such, M8T promoted greater cognitive resilience of WM than M8D.

Surprisingly, no time by group by distracter valence interactions were observed, suggesting that there were no differences in the susceptibility to negative distraction over time and across groups. As confirmed by the significant main effect of distracter valence, task accuracy was lower for trials containing negative vs. neutral distracters. However, the magnitude of the difference across these trial types remained constant; it did not vary as a function of time or group. This performance pattern suggests that the intensive military training interval did not alter susceptibility to negative distracters, nor did receiving MT.

The lack of variability in distracter interference over time or as a function of training group is notable and unexpected for two reasons. First, prior studies suggest increased susceptibility to negative distraction during induced stress (Oei et al. 2006, 2012) as well as in patients with stress-related disorders (e.g., PTSD, Morey et al. 2009). Second, prior studies suggest that MT may reduce reactivity to negative images (Brefczynski-Lewis et al. 2007; Ortner et al. 2007). Here, while susceptibility to negative distraction was found at both time points for all groups, its magnitude did not change over time for any group, suggesting that military training may not exacerbate, and short-form MT may not mollify, this susceptibility. Future studies should examine the generalizability of this pattern. Follow-up studies could, for example, examine the impact of tasks with more potent distracters, the influence of even higher-demand intervals, such as deployment itself, or the benefits of alternate MT variants.

The a priori classification of distracters as negative vs. neutral was confirmed via self-reported image ratings of valence and arousal in civilians and the MT groups. Both civilian and military MT participants rated negative distracters as more negative and more arousing than neutral distracters. In the MT groups, neutral distracters were rated as more negative at T2 than T1, regardless of group membership (i.e., M8T, M8D), whereas negative distracter ratings did not change over time. It is unclear if this change in neutral image rating is due to exposure to MT, the military training interval, or both. Regardless, as noted previously, there was no difference in the magnitude of the distracter valence effect (accuracy on neutral trials minus accuracy on negative trials) in WM performance. Thus, together these results suggest that while soldiers may have perceived the neutral images (but not negative images) as more negative over time, their WM task performance did not reflect a valence-specific vulnerability over time.

In contrast to the distracter manipulation, the load manipulation effect varied over time and group. We predicted that the intensive interval of military training would selectively degrade performance in high-load trials, as found in studies involving laboratory induction of stress (Oei et al. 2006). However, the NTC group demonstrated that the intensive military training interval degraded performance on both high- and low-load trials. Perhaps this is because the high-demand interval they endured put them at risk for depletion of WM resources. It is likely that WM is heavily utilized in the service of engaging in drills, learning new information, and regulating mood during military training. As such, the myriad intensive and persistent cognitive and emotional challenges may have continuously taxed WM over time, leading to reduced availability of this resource. This interpretation is in line with a depletion framework of executive control which proposes that specific executive functions may fatigue from overuse (see Persson and Reuter-Lorenz 2010; Hofmann et al. 2012).

In addition to resource depletion from overuse, WM maintenance may become compromised over the military training interval due to increases in internal distraction, such as intrusive thoughts, preoccupations, worries, and fears, as suggested in prior studies of psychosocial stress and WM (Oei et al. 2006). Indeed, prior studies have suggested that intrusive thoughts degrade WM performance (Brewin and Smart 2005). In addition, task-unrelated thought (see Smallwood et al. 2003) has been observed to increase over the predeployment interval in prior studies investigating mind wandering in predeployment cohorts (Jha et al. 2016). Future studies should include measures of internal distraction (i.e., task-unrelated and/or intrusive thoughts) during an ongoing WM task to examine these as possible causes for degradation in WM maintenance over such intervals.

Thus, the current study is quite limited in its ability to unequivocally explain why WM performance declines over this intensive interval, as observed in the NTC group. Yet, these results provide evidence that MT protects WM performance from decline over time. One explanation for these findings is that MT strengthens maintenance processes and reduces susceptibility to specific forms of task-unrelated distraction, such as mind wandering. Indeed, prior studies have reported that MT reduces self-reported mind wandering (Brewer et al. 2011; Jha et al. 2016; Mrazek et al. 2013). In addition, a sister study to the current project found that M8T more so than M8D protected against performance lapses often associated with mind wandering (Jha et al. 2015).

If MT indeed bolsters WM, why might the M8T group have outperformed M8D at high WM load? The WM delayed-recognition task has several elements that may index “near transfer” from the cognitive processes exercised by MT. For example, during many exercises, participants are instructed to focus attention on a target object (such as body sensations or sounds) and maintain this focus over the practice period. If distractions emerge, especially from internal sources like mind wandering, attention is to be disengaged from these mental contents and redirected back to the target object (see Hasenkamp et al. 2012 for discussion). Likewise, during the WM task, participants are to attend to the memoranda and actively maintain them over a delay interval. If they are distracted by the negative or neutral delay-spanning images, they are to return to the internal representation of the memoranda. Accordingly, the processes trained during MT exercises, such as maintaining focus and overcoming distraction, could be applied towards WM task performance. Yet, while both M8D and M8T showed benefits to WM performance at low load, only M8T benefitted at high load. This pattern suggests that mindfulness training delivered via a training-focused (vs. didactic-focused) course may result in more effective transfer of skill from the MT exercises to high-load trials.

Although both groups were taught the same MT exercises, only M8T’s emphasis allowed for ample access to instructor-led MT practice, opportunities to discuss and receive feedback about MT exercises, and greater time spent practicing MT exercises in class (4 vs. 1 h). As such, it is unclear if the performance advantage of M8T over M8D during high-load trials was due to greater in-class practice time or cumulative practice time (i.e., both in- and out-of-class). While the primary aim of the present study was to examine the contribution of training-focused MT on WM performance, we also examined out-of-class practice. The M8T group reported directionally but not significantly greater time spent out-of-class engaging in MT exercises, and neither the M8T nor M8D groups showed a significant correspondence between out-of-class practice time and change in WM performance.

Further, examination of independent practice over the 8-week program showed that the participants reported low levels of practice during the second 4 weeks of the program. One possible explanation for this may be that during the second 4 weeks, the MT groups submitted practice logs to the research team on their own, whereas during the first 4 weeks, these logs were collected at the in-class sessions. Without the knowledge that these logs would be collected in-class, they may have practiced less or kept less careful records. Future studies could include weekly check-ins during the second half of the program to collect practice logs and provide additional practice support. Nonetheless, our findings were not tightly linked to the specific amount of independent practice. One interpretation of the WM performance benefits described herein, which we favor, is that they are related to the in-class emphasis of the MT course rather than out-of-class practice.

Given previous literature suggesting that higher stress may lead to lower task performance, especially under higher task demands (i.e., high-load or negative distracter trials; Oei et al. 2006), we investigated whether levels of perceived stress corresponded with performance on our WM task. At baseline, there was no correspondence between PSS and WM task performance in civilians, but there was a significant correspondence in all military participants. This may have been due to the greater personal relevance of the combat-related distracter images to the military participants compared to civilians. There were also group differences in self-reported perceived stress at baseline. While we might have predicted that the NTC group would report lower stress as they were not preparing for an impending deployment, we found higher levels of perceived stress in the NTC and M8T groups than the M8D group. These results may suggest that level of military demand is not directly associated with levels of self-reported perceived stress. We also investigated PSS over time in each of our groups. While civilians, NTC, and M8D did not show changes in PSS over time, M8T showed decreases in perceived stress over the training interval. This provides some evidence suggesting that the benefits of training-focused MT include stress reduction. However, prior to making strong conclusions, it is important to evaluate other stress-related measures, such as cortisol and related hormone levels (Hoge et al. 2007). A multimodal investigation of stress is preferable due to concerns that military servicemembers may underreport levels of perceived stress or other psychological symptoms due to professional concerns or military culture (Langston et al. 2007; Greene-Shortridge et al. 2007; Stanley et al. 2011).

Nonetheless, in finding that M8T provided greater cognitive resilience for WM processes relative to M8D, the current study does advance knowledge regarding best practices for MT implementation. While informative, it is important to note several limitations of the current study. Many of these limitations are tied to the constraints of conducting research with active-duty military cohorts, including unit selection and randomization procedures. MT randomization occurred by group, with participants assigned to M8T or M8D based on their platoon assignment. Yet, training groups were matched as best as could be accommodated (age, gender, predeployment training regimen, expected mission during deployment, and point in the deployment cycle), and no baseline differences in overall performance were found across groups at T1. A second limitation was that the M8T and M8D groups had a 2-week block leave during the training interval, during which participants were not on post and did not receive training. However, because both the M8T and M8D groups had block leave after participants received all 8 h of in-class content, we do not consider this to be a strong factor driving our pattern of results. A related limitation is tied to the timing of the T2 testing session. Participants were assessed prior to and following the 8-week MT program, which was designed to include 4 weeks of in-class sessions followed by 4 weeks of independent practice and an individual interview. As the second half of the MT program was also interrupted by a 2-week block leave, T2 occurred several weeks after the completion of the in-class sessions. This may limit our ability to draw strong conclusions about the immediate effects of in-class MT.

All three military groups included US Army soldiers from combat arms military occupational specialties, engaging in field training. A final limitation, however, is that only the MT groups were in the predeployment interval. These two groups were US Army soldiers preparing for a counterinsurgency combat deployment to Afghanistan, whereas the NTC group were US Army soldiers engaged in intensive field training for combat readiness, but without a specific deployment date. Moreover, the MT groups were preparing to deploy to Afghanistan during the US troop surge period, during which US troops experienced relatively more violent and fatal combat. Their knowledge of this may have added to the predeployment demands and concerns experienced by the MT groups. As such, unlike the two MT groups, the NTC group was neither part of the same parent unit and command climate nor preparing for an impending combat deployment. While this can be seen as a limitation of the study, it does not appear to explain our pattern of findings. The MT groups, who showed protective benefits over the high-stress interval, were preparing for an impending combat deployment, whereas the NTC group, who were not in the predeployment interval, showed the largest degree of degradation.

Many prior MT studies have been limited by inadequate designs, such as failure to include an active control group for comparison with the MT and no-training control groups. An ideal active control group would be well-matched to the MT group in instructor expertise, enthusiasm, and personality, as well as psychosocial support, in- and out-of-class time demands, and participant expectations of benefits. An important feature of the current study was that both M8T and M8D included course content drawn from the same longer-form MMFT course and were taught by the same instructor. Thus, the present study design was able to address many of the limitations regarding appropriate controls raised in previous MT studies. These considerations allowed for stronger conclusions regarding the superiority of the M8T vs. M8D group at T2 for high-load conditions, because these findings cannot be attributed to instructor-related differences, differences in psychosocial support, or participant bias.

Future studies should aim to replicate the current study in larger cohorts of troops, with a study design including random assignment at the individual level, with WM performance, intrusive thoughts, and cortisol measured longitudinally over the entire deployment cycle. Yet, we acknowledge that such designs, while theoretically appropriate and experimentally ideal, may not be practically appropriate or feasible in active-duty contexts due to troops’ military training schedules. Thus, research in this context aims to best accommodate high research standards while accepting that this is secondary to the military mission. Another important consideration in administering MT to military groups is the availability of qualified trainers over the entirety of the deployment cycle. The availability of such trainers could be increased by using train-the-trainer methods, which have been implemented in military resilience protocols (Reivich et al. 2011), and are beginning to be incorporated into MT programs (Ramos et al. 2016).

In sum, the current results suggest that protracted periods of high demand experienced during military servicemembers’ intensive field training may compromise WM. However, MT programs akin to M8T, which emphasize engaging in MT exercises, may protect against associated performance costs, especially at high load. Short-form MT should be further considered as a method to bolster cognitive resilience in high-demand environments. With such training, soldiers may be less likely to make preventable mistakes that may result from failures of WM.