The studies that have directly investigated the relation between the n-back task and other WM measures always revealed weak correlations. (Szmalec, Verbruggen, Vandierendonck, & Kemps, 2011, p. 148)

Correlations between the n-back task and more complex span measures are variable, with some studies reporting relatively low correlations . . . and others reporting high correlations. (Li, Schmiedek, Huxhold, Röcke, Smith, & Lindenberger, 2008, p. 739)

Theories of WM . . . all predict n-back and WM span tasks to measure largely the same thing, that is, to reflect primarily the same WM construct. Why don’t they? (Kane, Conway, Miura & Colflesh, 2007, p. 621)

Working memory (WM) is a construct that has been studied extensively in the past 50 years, since it was first mentioned by Miller, Galanter, and Pribram (1960), and especially since the influential WM model proposed by Baddeley and Hitch (1974). The concept is a more dynamic version of the short-term memory construct that was present in initial information-processing models (e.g., Atkinson & Shiffrin, 1968). WM has been studied extensively not only in cognitive psychology, but also in other areas, including social, clinical, developmental, and personality research. WM is critical to activities involving the goal-directed use of immediate memory, the maintenance and manipulation of recently attended information, and switching and scheduling task priorities in multitasking situations. An important consideration for such research efforts is how to operationally define and measure WM. In the present research, we investigated the degree of overlap between two commonly used categories of WM measures: complex span and n-back tasks.

Complex span task measures of WM

Beginning with reading span (Daneman & Carpenter, 1980), “complex span” tasks became popular measures of WM, in contrast to existing “simple span” tasks such as digit span. Other complex span tasks followed: (a) counting span (Case, Kurland, & Goldberg, 1982); (b) operation span (Turner & Engle, 1989); (c) rotation span (Shah & Miyake, 1996); and (d) symmetry span (Kane et al., 2004), to name a few. Instead of being given a list of digits to serially recall, as in digit span, subjects taking an operation span task see a series of items such as the following: “IS (2 × 1) + 3 = 6 ? DOG”. Because complex span tasks combine the recall of some items (e.g., words) with a secondary processing task (e.g., math operations), such tasks are also known as storage-plus-processing tests. Performance on complex span tasks is consistently positively related to higher-order cognitive abilities, including reasoning, reading comprehension, and mathematics achievement (Daneman & Merikle, 1996; Unsworth & Engle, 2007).

N-back task measures of WM

The n-back task was reported first by Kirchner (1958). However, n-back tasks did not rise to prominence until the surge of cognitive neuroscience research, largely because the task administration is amenable to the methodological constraints (stimulus and response timing, response formats) for many neuroimaging techniques. In the n-back task, individuals are asked to report whether or not the item currently presented matches the item that had been presented n items back. Often in studies, the n varies across experimental blocks, in order to assess the effect of different memory demands upon behavioral performance and physiological correlates. In some studies, stimulus sequences are explicitly manipulated to include not only match and nonmatch trials, but also lure trials. Lure trials are those in which the current stimulus is the same as a recently presented stimulus, but is not in the correct serial position to be a match (e.g., in a three-back task, the second letter K in the sequence TKWK is a lure, because it matches the letter presented two back).
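
The trial types described above can be illustrated with a short sketch. It is not taken from any of the cited studies: the function name `classify_trials` and the choice to treat only repeats at lags n − 1 and n + 1 as lures are assumptions made for illustration, and individual studies may define lures differently.

```python
from typing import List, Optional, Sequence


def classify_trials(sequence: Sequence[str], n: int,
                    lure_lags: Optional[List[int]] = None) -> List[str]:
    """Label each item in an n-back stream as 'match', 'lure', or 'nonmatch'.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if lure_lags is None:
        # Assumption: a lure is a repeat at a lag adjacent to n.
        lure_lags = [lag for lag in (n - 1, n + 1) if lag >= 1]

    labels = []
    for i, item in enumerate(sequence):
        if i >= n and sequence[i - n] == item:
            # Same item exactly n positions back.
            labels.append("match")
        elif any(i >= lag and sequence[i - lag] == item for lag in lure_lags):
            # Same item recently presented, but at the wrong serial position.
            labels.append("lure")
        else:
            # Includes the first n items, which cannot be matches.
            labels.append("nonmatch")
    return labels


# The example from the text: in a three-back task, the second K in TKWK
# repeats the item presented two back, so it is a lure.
print(classify_trials("TKWK", 3))
# ['nonmatch', 'nonmatch', 'nonmatch', 'lure']
```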

Similarities and discrepancies between complex span and n-back tasks

On the surface, both complex span and n-back tasks seem to tap related aspects of WM functioning. In both tasks, subjects need to maintain information from a set of possible stimuli (e.g., letters, words, digits, symbols, or locations). The to-be-remembered items are typically presented visually or aurally. Across trials, subjects must remember the currently relevant information and prevent interference from recently presented items. Items need to be accessible in memory for short periods of time (seconds), until the necessary retention interval has passed. Thus, complex span and n-back tasks both seem likely candidates for assessing the WM system. Indeed, performance on both kinds of tasks has provided converging evidence about similar research topics (Table 1).

Table 1 Comparison of complex span and n-back tasks as measures of working memory (WM), with representative publications

However, other, more direct lines of experimental and correlational research have provided evidence suggesting a relatively weak relationship between complex span and n-back performance. First, training studies offer the opportunity to observe whether practice on one type of WM task affects performance on other WM measures. When subjects repeatedly practice n-back tasks (dual or single), they consistently show transfer to other versions of n-back tasks (Anguera et al., 2012; Jaeggi, Studer-Luethi et al., 2010; Li et al., 2008). Likewise, when subjects repeatedly practice complex span tasks, they show transfer to other, unpracticed versions of complex span tasks (Richmond, Morrison, Chein, & Olson, 2011). However, multiple WM-training studies using either single or dual n-back training did not demonstrate transfer to complex span measures (Chooi & Thompson, 2012; Jaeggi, Buschkuehl, Perrig, & Jonides, 2008; Jaeggi, Studer-Luethi et al., 2010; Kundu, Sutterer, Emrich, & Postle, in press; Li et al., 2008; Lilienthal, Tamez, Shelton, Myerson, & Hale, 2013; Redick et al., 2013; Thompson et al., 2013; but see Anguera et al., 2012). Thus, many WM training studies suggest that transfer is task-specific: Improving performance on one type of WM measure (n-back) does not lead to improved performance on another category of WM task (complex span). The lack of transfer suggests that improving the processes involved in n-back task performance via repeated practice does not result in changes to the processes involved in complex span task performance.

Second, and more directly, correlational research has indicated surprisingly weak relationships between complex span and n-back performance. For example, Kane et al. (2007) observed nonsignificant-to-weak correlations between operation span and two- and three-back letter tasks. The weak relationship in Kane et al. (2007) was not attributable to a lack of statistical power, given their large sample size (N = 129), nor was it due to low reliability for the operation span or n-back tasks. In fact, the nonsignificant-to-weak correlations observed by Kane et al. (2007) replicated those from previous studies (Oberauer, 2005; Roberts & Gibson, 2002). Notably, Kane et al. (2007) observed that the complex span and n-back tasks accounted for independent variance in Raven’s Advanced Progressive Matrices, despite the weak association between the two WM measures. More recently, Jaeggi, Buschkuehl, Perrig, and Meier (2010) demonstrated similar nonsignificant complex span and n-back correlations.

Present research

The present research was designed to quantify the relationship between complex span and n-back tasks. In addition to Kane et al. (2007) and Jaeggi, Buschkuehl, et al. (2010), the authors of several other recently published studies have administered complex span and n-back tasks. Interestingly, two studies using latent-variable analyses indicated substantial correlations between factors containing complex span tasks, on the one hand, and n-back tasks, on the other (Burgess, Gray, Conway, & Braver, 2011; Schmiedek, Hildebrandt, Lövdén, Wilhelm, & Lindenberger, 2009). Therefore, we conducted a meta-analysis to determine the magnitude of the relationship between complex span and n-back tasks across studies. We also attempted to identify variables that might moderate the relationship.

Because both Jaeggi, Buschkuehl, et al. (2010) and Kane et al. (2007) cited studies suggesting that n-back tasks might be more strongly correlated with simple span measures than with complex span tasks, we also conducted a meta-analysis on the relationship between simple span and n-back tasks. Although complex and simple span tasks tend to correlate strongly and measure many similar processes (Unsworth & Engle, 2007), assessing the relationship between n-back and simple span tasks could provide information about what common WM processes the different tasks measure.

Method

Study selection

We identified studies by searching through the PsycINFO database using the keywords “working memory,” “complex span,” “simple span,” and “n-back,” along with the specific names of complex span tasks. Other studies were identified by searching through the publications and references of prominent WM researchers. Several studies were identified in which both complex span and n-back tasks were administered, but the correlations were not reported. In these cases, the authors were contacted and asked to provide the specific correlations, if possible. In some cases, the authors provided correlations for tasks or conditions that were administered but not reported separately in the published article. If the study contained separate analyses for young and older adults, only the young-adult data were used. If WM tasks were administered at both pre- and posttest, only the pretest data were used.

Design and analyses

Most studies have reported complex span task scores as performance on the storage aspect of the task (accuracy or number of items recalled), excluding performance on the processing part. This practice is relatively common in the WM literature and is based on examination of the psychometric properties of alternative scoring methods (Conway et al., 2005; Redick et al., 2012; Unsworth & Engle, 2007). However, there is no consensus on which dependent variable to use for the n-back task. Authors have reported performance in terms of mean accuracy and/or response times, and have included (a) overall accuracy; (b) accuracy and/or response times reported for specific trial types (targets, nontargets, lures); (c) hit and false alarm rates; and (d) signal detection measures of sensitivity and bias. Although the use of overall accuracy may not be ideal, we decided that it was the one measure of n-back performance that was both reported most often and the easiest for the authors to obtain upon request. In addition, multiple formulas can be used to derive signal detection measures, particularly in terms of the decision about how to correct for hit and false alarm rates at ceiling and at floor. Therefore, except where noted in Table S1, overall n-back accuracy was used as the dependent variable in the meta-analysis (see Footnote 1).
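
To illustrate why the choice of correction matters, the following sketch computes the sensitivity index d′ under two widely used conventions for hit and false alarm rates at ceiling or floor. Both function names are hypothetical, and nothing here is claimed to be the procedure used in any study in the meta-analysis. The log-linear rule adds 0.5 to each count and 1 to each trial total, whereas the second rule replaces rates of 0 and 1 with 1/(2N) and 1 − 1/(2N).

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z transform).
_z = NormalDist().inv_cdf


def d_prime_loglinear(hits: int, n_targets: int,
                      false_alarms: int, n_nontargets: int) -> float:
    """d' with the log-linear correction applied to every cell."""
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    return _z(hit_rate) - _z(fa_rate)


def d_prime_clipped(hits: int, n_targets: int,
                    false_alarms: int, n_nontargets: int) -> float:
    """d' with rates of 0 or 1 replaced by 1/(2N) or 1 - 1/(2N)."""
    def clip(count: int, total: int) -> float:
        # Observed rates are multiples of 1/N, so clipping to this
        # interval changes only rates of exactly 0 or exactly 1.
        edge = 1 / (2 * total)
        return min(max(count / total, edge), 1 - edge)

    return _z(clip(hits, n_targets)) - _z(clip(false_alarms, n_nontargets))


# Hypothetical subject at ceiling: 20/20 hits, 0/40 false alarms.
print(round(d_prime_loglinear(20, 20, 0, 40), 2))
print(round(d_prime_clipped(20, 20, 0, 40), 2))
```

For the same raw counts, the two rules return different d′ values, which is one reason signal detection scores are hard to compare across studies.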

The main analyses of interest were meant to specify the magnitude of the correlations between (a) complex span and n-back, and (b) simple span and n-back tasks. For studies with multiple measures of complex span or n-back, the correlations were averaged together to deal with sample-dependence issues. After calculating the averaged correlations for each study, mean-weighted correlation coefficients (r+) were calculated, along with the 95 % confidence interval for r+ (Hedges & Olkin, 1985). Finally, as a measure of the heterogeneity of the correlations derived from the different studies, Q_total was calculated and tested for significance in relation to a χ² distribution (degrees of freedom = number of studies – 1).
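
The steps above can be sketched in code. The text does not spell out the exact formulas, so this sketch assumes a fixed-effect approach based on Fisher's z transformation with weights of n − 3, which is one of the procedures described by Hedges and Olkin (1985). The function names and the example (r, n) pairs are hypothetical and are not data from the meta-analysis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_r(correlations: Sequence[float]) -> float:
    """Average several correlations from one sample into a single value.

    Assumption: a simple arithmetic mean.
    """
    return sum(correlations) / len(correlations)


def pool_correlations(studies: Sequence[Tuple[float, int]]) -> Dict[str, object]:
    """Pool (r, n) pairs, one pair per independent sample."""
    zs: List[float] = [math.atanh(r) for r, _ in studies]  # Fisher z
    ws: List[int] = [n - 3 for _, n in studies]            # inverse variance of z
    total_w = sum(ws)

    z_bar = sum(w * z for w, z in zip(ws, zs)) / total_w
    se = 1 / math.sqrt(total_w)

    # 95 % confidence interval, back-transformed to the r metric.
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))

    # Heterogeneity statistic, referred to chi-square with k - 1 df.
    q_total = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))

    return {
        "r_plus": math.tanh(z_bar),
        "ci95": ci,
        "q_total": q_total,
        "df": len(studies) - 1,
    }


# Hypothetical input: the first sample reported two correlations,
# which are averaged before pooling.
samples = [(average_r([0.10, 0.20]), 129), (0.41, 80), (0.05, 60)]
print(pool_correlations(samples))
```

Averaging within a sample first means that each sample contributes one correlation, so samples with many tasks do not receive extra weight.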

Moderator variables

In an effort to examine sources of heterogeneity in the literature, we conducted moderator analyses using the following variables:

  • N-back load: Where possible, our analyses focused separately on correlations of the complex and simple span tasks with n = 2 and 3. Too few studies had n at other levels for us to include them.

  • Automated versus traditional complex span: For complex span tasks only, we examined separately the correlations with n-back for traditional versus automated complex span tasks (Unsworth, Heitz, Schrock, & Engle, 2005). Automated complex span tasks are completely mouse-driven and require subjects to identify the to-be-remembered items by clicking among a matrix of possible stimuli at test. In contrast, traditional complex span tasks are presented in either paper-and-pencil or a combination of computerized and paper-and-pencil response formats, in which the subject must generate the to-be-remembered items at test. Although the researchers in previous work have concluded that the automated and traditional complex span tasks largely measure the same underlying WM construct (Unsworth et al., 2005), we included administration format as a possible moderator here.

  • Complex and simple span content: Complex and simple span tasks with verbal stimuli (numbers, letters, or words) as to-be-remembered items were coded separately from complex and simple span tasks with nonverbal memoranda (arrows, locations, patterns, shapes, or symbols).

  • N-back content: N-back tasks with verbal stimuli (numbers, letters, or words) were coded separately from n-back tasks with nonverbal stimuli (locations or shapes).

  • Recall order: For digit span only, studies were coded separately if the authors had administered either the forward or the backward version of the test. Although backward digit span is typically assumed to be a WM measure because item order must be manipulated, other studies have provided evidence that in young adults, forward and backward digit span measure largely common cognitive processes (Rosen & Engle, 1997; St. Clair-Thompson, 2010).

Results and discussion

Descriptive information about all of the studies included in the meta-analysis is provided in Table S1. As can be seen in that table, verbal, numerical, and visuospatial domains were represented across the studies. Also apparent in Table S1 is that in numerous studies the researchers administered multiple measures of complex span tasks but only one n-back task to subjects. Finally, the samples consisted mostly of young-adult subjects, as indicated by the mean age of the subjects, the age range of the subjects, or the fact that subjects were high school or undergraduate students.

Table 2 provides the aggregated correlations from the studies contributing to the meta-analysis. The meta-analysis results are presented in Table 3. Examining the confidence intervals reveals that all of the mean-weighted correlations were statistically greater than zero. The complex span and n-back correlation was r+ = .20 (95 % CI = .16 to .24), and the simple span and n-back correlation was r+ = .25 (95 % CI = .21 to .30). The overlap of the confidence intervals indicates that the difference between the two correlations was not significant. Although the test for heterogeneity was not significant for the simple span correlation, a significant Q_total value for the complex span correlation indicated heterogeneity among the correlations across the 20 samples.

Table 2 Correlations with the n-back task from individual studies in the meta-analysis
Table 3 Overall meta-analysis results and results for moderators of complex and simple span correlations with the n-back task

As can be seen in Table 2, some studies have reported rather large zero-order correlations between complex span and n-back tasks. Several moderator variables were examined to try to account for this heterogeneity (Table 3). First, the complex span correlations did not differ as a function of n-back load of n = 2 or 3. Second, the administration method of the complex span tasks did not affect the correlation with n-back accuracy. However, moderator analyses determined that the verbal or nonverbal nature of the to-be-remembered items influenced the magnitude of the complex span and n-back correlation: N-back correlations were significantly higher with nonverbal (r+ = .31, 95 % CI = .26 to .36) than with verbal (r+ = .18, 95 % CI = .14 to .22) complex span tasks. However, complex span correlations were not statistically different for nonverbal (r+ = .23, 95 % CI = .17 to .30) versus verbal (r+ = .16, 95 % CI = .11 to .21) n-back tasks. Finally, the studies were further divided according to the content of both the complex span and n-back tasks. The correlation between nonverbal complex span and nonverbal n-back tasks was highest (r+ = .32, 95 % CI = .23 to .40) and was significantly greater than the correlation between verbal complex span and verbal n-back tasks (r+ = .14, 95 % CI = .09 to .19).

For the simple span correlations, none of the moderator analyses based on the content (verbal/nonverbal) of the simple span or the n-back tasks produced a significant difference. In addition, the simple span correlations did not differ as a function of n-back load (n = 2 or 3). However, when we analyzed recall order for digit span, the correlation with n-back was significantly greater for the backward (r+ = .31, 95 % CI = .24 to .37) than for the forward (r+ = .16, 95 % CI = .10 to .23) condition.

Discussion

The meta-analysis results can be summarized as follows. First, the correlations of the complex span measures with n-back tasks were significantly greater than zero, but still weaker than would be expected for measures of the same underlying WM construct. In this respect, the meta-analysis results confirm those of Kane et al. (2007) and Jaeggi, Buschkuehl, et al. (2010), individual studies that were specifically designed to address the magnitude of the relationship between complex span and n-back. Of course, the meta-analytic results extend the findings of Kane et al. (2007) and Jaeggi, Buschkuehl, et al. (2010) by measuring the relationship across multiple samples, different complex span tasks, and variations in the n-back tasks administered.

As we mentioned in the introduction, some studies have reported rather large zero-order correlations among complex span and n-back tasks, indicating more overlap in the processes involved in the successful completion of both tasks. The meta-analysis results indicated significant heterogeneity among the complex span and n-back correlations in the literature. Moderator analyses determined that the verbal or visuospatial nature of the to-be-remembered items influenced the magnitude of the complex span and n-back correlation. One prediction was that higher correlations would be obtained when the different WM tasks used stimuli from the same domain, instead of from across domains. For example, Redick et al. (2012) observed, in a sample of over 6,000 subjects, that the correlation between verbal complex span tasks (r = .68) was higher than the correlation between verbal and visuospatial complex span (r = .52–.53) tasks. The moderator analyses both partially support and partially undermine this prediction. The highest correlation was obtained when both the complex span and n-back tasks used visuospatial content. However, the lowest correlation was obtained when both the complex span and n-back tasks used verbal to-be-remembered information. Thus, as can be seen in Table 3, a more accurate description of the complex span and n-back results would be that the correlation between the types of tasks tended to be higher if one or both included visuospatial content as to-be-remembered memory items.

For the simple span results, the simple span and n-back correlation did not differ from the complex span and n-back correlation. Interestingly, when examining digit span specifically, the digit span backward correlation with n-back was greater than the digit span forward correlation with n-back. In addition, the digit span backward correlation with n-back (r+ = .31, 95 % CI = .24 to .37) was greater than the verbal complex span correlation with n-back (r+ = .18, 95 % CI = .14 to .22). A speculative interpretation is that the digit span backward task requires subjects to temporally reorder information after the digits have been encoded, thereby forcing them to update the relative serial position of the items in memory (e.g., 1–6–4–7 must be recoded as 7 in the first serial position, 4 in the second serial position, etc.). Similarly, across trials in the n-back task, subjects must change the serial position of items that have been previously encoded as new items are continuously presented (e.g., H goes from being item n, to item n – 1, to item n – 2, etc.). The similarity of these reordering processes may account for the higher n-back correlation with digit span backward than with either digit span forward or verbal complex span tasks (see Oberauer, 2005, and Szmalec, Verbruggen, Vandierendonck, & Kemps, 2011, for more on the role of binding items to the appropriate temporal context within the n-back task).

Sample composition as a moderator variable

An often overlooked aspect of individual-differences studies is whether the sample included a wide variation in cognitive abilities. Redick et al. (2012) showed that complex span intracorrelations varied as a function of the sample type—correlations among complex span tasks were smaller for samples from more selective universities than for samples from more diverse universities and samples composed of community volunteers. The pattern of results in Redick et al. (2012) is consistent with Spearman’s (1927) “law of diminishing returns,” in which correlations among mental-ability tests are smaller in individuals with higher IQ. Although IQ estimates are not available for all subjects in the present meta-analysis, it is apparent that, across studies, the samples represented different points along the mental-ability continuum.

For example, Roberts and Gibson (2002) used a sample of students at Massachusetts Institute of Technology and obtained an average complex span and n-back correlation of r = .01. In two separate samples of University of Georgia students, Unsworth (2010) and Unsworth et al. (2009) obtained average complex span and n-back correlations of r = .08. In Study 1 of Jaeggi, Buschkuehl, et al. (2010), 95 % of the subjects had a college degree or higher level of education—and the average complex span and n-back correlation was r = −.07. In contrast, Burgess et al. (2011) used a combination of Washington University St. Louis students and community volunteers and obtained an average complex span and n-back correlation of r = .41. Greenstein and Kassel (2009) recruited Chicago area community members and obtained an average complex span and n-back correlation of r = .50. Greenstein and Kassel’s results are particularly interesting, given that they used the same two- and three-back task as Kane et al. (2007), who found low and nonsignificant correlations with operation span using an undergraduate-only sample. In fact, Greenstein and Kassel argued that their “sample was probably more diverse in general cognitive ability” (p. 87) than was the Kane et al. (2007) sample. Making some assumptions about the IQ range of the subjects in these example studies, the pattern of correlations is consistent with the law of diminishing returns.

In order to further investigate the role of sample composition in complex span and n-back correlations, we analyzed previously unpublished data (see Footnote 2). One hundred fifty-five subjects from 18 to 35 years old completed, as part of a larger study, automated operation span, automated symmetry span, and a three-back task with letter stimuli. Importantly, we also had access to the self-reported college status of these subjects. Seventy-five of the subjects were enrolled as students at Georgia Tech (GT) or Georgia State University (GSU), whereas 80 subjects were not college students or had attended another area college (primarily technical- and associate-degree-granting schools). This division of the sample led to roughly equal sample sizes that were also large enough to afford meaningful conclusions.

The correlations are presented in Table 4. In the overall sample, the zero-order correlations were moderate and larger than the complex span and n-back correlation obtained in the meta-analysis, but similar in magnitude to those from studies using similar sampling methods (e.g., Burgess et al., 2011). When examining the subsamples, the patterns of correlations differed, despite the similar sample sizes. In the GT + GSU subsample, the complex span and n-back correlations were not significant, but in the None + Other subsample, the complex span and n-back correlations were significant. For symmetry span, the subsample results indicated 10.6 % (.35² – .13²) more shared variance with the n-back in the None + Other subsample than in the GT + GSU subsample.
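
The shared-variance figure follows from squaring each correlation and taking the difference. The short check below assumes symmetry span correlations of .35 (None + Other) and .13 (GT + GSU), as read from the parenthetical in the text; the function name is hypothetical.

```python
def shared_variance_gap(r_high: float, r_low: float) -> float:
    """Difference in shared variance (r squared) between two correlations."""
    return r_high ** 2 - r_low ** 2


# .35^2 = .1225 and .13^2 = .0169, so the gap is .1056.
gap = shared_variance_gap(0.35, 0.13)
print(f"{gap:.1%}")  # 10.6%
```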

Table 4 Complex span and n-back correlations as a function of college status

One possible reason that samples with high-ability subjects have tended to produce smaller correlations is that such samples show insufficient variability on the tasks, as compared to more diverse samples. Restriction of range would limit the magnitude of the correlations. Although we cannot speak for the studies in the meta-analysis, restriction of range did not appear to be a problem in the GT + GSU subsample here. As can be seen in Table S3, although the GT + GSU subsample had a higher mean than the None + Other subsample on operation span, symmetry span, and three-back accuracy (all ps < .01), the variability did not appear to differ much between the subsamples. Levene’s tests for variance differences were significant only for operation span (p = .02), and not for symmetry span (p = .99) or the three-back task (p = .46).

Implications for WM research

The most important finding from the meta-analysis is that complex span and n-back tasks are weakly correlated, which is notable given that both tasks are thought to measure the same WM system. For comparison, in Daneman and Merikle’s (1996) meta-analysis, the complex span correlations with global and specific language comprehension measures ranged from r = .30 to .52. The measures correlated in Daneman and Merikle are not assumed to tap the same underlying construct, yet those correlations are higher than the meta-analytic correlation reported here between complex span and n-back. Because of positive manifold, in which reliable tests produce positive correlations with each other regardless of the exact underlying construct that each measure is designed to assess, it is striking how small the correlation is between complex span and n-back tasks.

The meta-analysis results validate what Kane et al. (2007) discussed. Namely, WM researchers should be specific in interpreting the results of studies that use complex span versus n-back tasks as WM measures. The results of WM research using the different categories of tasks cannot simply be used interchangeably. Therefore, as Kane et al. (2007) stated, the large body of cognitive neuroscience research using the n-back task is informative for understanding the neural substrates of WM, but it may shed little light on the nature of individual differences in WM as measured by complex span tasks. Likewise, for WM-training studies using n-back tasks, attempts to find “near transfer” to complex span tasks are somewhat misguided. In light of the small amount of shared variance between the types of WM tasks, practice on the n-back task need not affect performance on complex span measures of WM at all, and vice versa.

At first glance, our results indicating minimal processing overlap between complex span and n-back tasks seem completely inconsistent with the findings of Schmiedek et al. (2009). Those researchers created two WM latent variables, one with loadings from multiple updating tasks, including a three-back task, and a complex span factor composed only of reading, counting, and rotation span. The best-fitting model in Schmiedek et al. (2009) indicated that the correlation between these two WM latent variables was r = .96. However, closer inspection reveals that their results do fit with those of our meta-analysis. Reading span and counting span (both verbal tasks) had small, nonsignificant correlations with the nonverbal three-back task. In contrast, rotation span had sizeable, significant correlations with three-back; both rotation span and three-back used nonverbal stimuli. The correlation between the two latent variables was so high at least partially because the complex span factor loading was much stronger for rotation span (.70) than for reading (.34) or counting span (.37).

Obviously, the reliability of the WM measures will place an upper limit on the maximum correlation that can be obtained between them. Given their frequent use in the individual-differences literature, the reliability of various complex span tasks has been assessed, and it is often quite high (Redick et al., 2012). However, less psychometric work has been carried out using the n-back task, and the findings have been largely inconsistent. For example, Jaeggi, Buschkuehl et al. (2010) examined the split-half reliability of verbal (auditory modality) and spatial (visual modality) one-, two-, and three-back tasks in three different samples. The reliabilities for the corrected-accuracy (hits minus false alarms) dependent variables were generally low (two-back, r = .09 to .85; three-back, r = .39 to .60). Indeed, Jaeggi, Buschkuehl et al. (2010) concluded that “the N-back task does not seem to be a useful measure of individual differences in WMC, due to its low reliability” (p. 409).
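
The upper limit mentioned above can be made concrete with the classical test theory attenuation formula, under which the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below uses hypothetical function names and illustrative reliability values that are not taken from any specific study.

```python
import math


def max_observable_r(rel_x: float, rel_y: float) -> float:
    """Ceiling on the observed correlation given two reliabilities."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return math.sqrt(rel_x * rel_y)


def disattenuate(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation between true scores."""
    return r_obs / max_observable_r(rel_x, rel_y)


# Illustrative values: a reliable span task and a less reliable n-back task.
print(round(max_observable_r(0.81, 0.49), 2))    # 0.63
print(round(disattenuate(0.20, 0.81, 0.49), 2))  # 0.32
```

Under these illustrative values, an observed correlation of .20 would correspond to a true-score correlation of about .32, so low reliability alone would not turn a weak correlation into a strong one.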

However, other studies have reported acceptable reliabilities for the n-back task. For example, the following studies included in the meta-analysis reported reliabilities greater than .70: Jaeggi, Studer-Luethi et al. (2010); Krumm et al. (2009); Oberauer (2005); Schmiedek et al. (2009); and Unsworth (2010). In addition, although the complex span and n-back tasks might not be correlated strongly, the fact that n-back tasks are significantly correlated with other measures, such as Raven’s Advanced Progressive Matrices (Colom, Abad, Quiroga, Shih, & Flores-Mendoza, 2008; Jaeggi, Buschkuehl et al., 2010; Jaeggi, Studer-Luethi et al., 2010; Kane et al., 2007), indicates that the n-back can capture meaningful individual-differences variation. In contrast to the Jaeggi, Buschkuehl, et al. quote above, Kane et al. (2007, p. 618) concluded that “n-back is a reliable individual-differences indicator of some construct(s).” This, then, is the puzzle—why would two measures of WM correlate less strongly than measures of putatively different constructs?

In contrast to viewing WM as a monolithic construct, we agree with other theories (e.g., Oberauer, Süß, Wilhelm, & Sander, 2007; Unsworth & Spillers, 2010b) that WM is a multifaceted system that relies on multiple processes (encoding, maintenance, recall, recognition, familiarity, updating, temporal ordering, binding, attention, and inhibition). If complex span and n-back tap largely separate components of the WM system, then one can account for why two WM measures might be only weakly correlated.

One notable distinction between complex span and n-back tasks is whether retrieval is based on recall or recognition. On complex span tasks, subjects must generate the sequence of stimuli presented on a given trial, whereas n-back tasks require subjects to recognize a current item as an item that was recently presented in the correct serial position. The importance of this distinction between recall versus recognition is evident in two separate lines of research.

First, Shelton and colleagues (Shelton, Elliot, Hill, Calamia, & Gouvier, 2009; Shelton, Metzger, & Elliot, 2007) measured the relationship between complex span tasks and a variant of the n-back task called the modified lag task. In this task, subjects are presented with a series of stimuli of unknown length. During the presentation of the stimuli, no response is required, but after the sequence is presented, subjects are required to recall either the last item, the penultimate item, or the antepenultimate item. The item that the subjects will need to recall is also unknown until the stimulus presentation is complete. As can be seen, the method of the modified lag task is quite different from the typical n-back task, in which, for example, subjects know in advance that they are making target/nontarget decisions about every item presented and know that the decision is based on whether the current item matches the one presented n items ago. Germane to the present work, across three unique samples, Shelton and colleagues observed complex span and modified lag task correlations ranging from r = .38 to .51. Note that in the Shelton et al. studies, the samples were composed entirely of college students, so the stronger correlations do not seem to reflect the use of a more diverse sample involving community nonstudents. Instead, the stronger correlations using the modified lag task may reflect increased overlap in the processes involved because of the use of explicit recall, without the opportunity to respond relying upon a familiarity signal. In fact, the modified lag task seems more similar to a running memory span task (Broadway & Engle, 2010; Cowan et al., 2005), in that subjects do not know exactly which part of the upcoming sequence they will need to recall, but do know that the last three items are the most important items to remember.

Second, neuroimaging studies seem to point to a common neural underpinning for performance on both complex span and n-back tasks, one involving prefrontal, parietal, and cingulate cortex regions (for an n-back fMRI meta-analysis, see Owen, McMillan, Laird, & Bullmore, 2005). However, although hundreds of fMRI studies have been conducted using n-back tasks, relatively few fMRI studies have been conducted using complex span tasks. One reason is that complex span tasks do not easily lend themselves to administration in fMRI studies, because there is typically little control over the timing of events such as processing task decisions, and especially the retrieval process for each trial. As such, no meta-analysis of complex span fMRI activation has been conducted. Nevertheless, two recent complex span fMRI studies (Chein, Moore, & Conway, 2011; Faraco et al., 2011) suggest an interesting difference in brain activity patterns relative to the n-back fMRI results. Both studies showed significant involvement of the medial temporal lobe during performance of various complex span tasks. The medial temporal activity, which is not typically present during n-back tasks, suggests again that the retrieval processes involved in the explicit search and recall of items during complex span tasks differentiate the two types of WM tasks.

Limitations and future directions

The advantage of a meta-analysis is that the method aggregates across multiple studies in order to estimate the magnitude of the relationship on the basis of larger samples. However, a limitation of meta-analysis is that aggregation may obscure factors that affect whether or not a relationship is observed. Although we tested potential moderators, and also examined the role of sample composition, many of the moderators were coarse, given the limited number of available studies. The verbal/nonverbal moderator variable collapsed across different types of stimuli within each category. Other n-back task variables that varied across the studies included (a) response type (target-only or target/nontarget decision), (b) presentation rate, (c) presentation modality, and (d) the size of the pool of possible stimuli. Also, the frequency and inclusion of lure trials may be an important consideration. As Szmalec et al. (2011) stated, “an n-back procedure with and one without lure trials are almost two different tasks in terms of what they measure” (p. 148). Many of the studies included here did not clearly indicate whether the n-back task included lures, and if so, on what proportion of trials.

In addition, the meta-analysis collapsed over variables that affect complex span correlations with other measures, including (a) the scoring procedure (Unsworth & Engle, 2007); (b) experimenter- versus subject-paced tasks (Friedman & Miyake, 2004); and (c) random versus ascending list-length presentation (St. Clair-Thompson, 2012). Research assessing whether these variables affect correlations with n-back performance in theoretically meaningful ways (e.g., a scoring procedure that minimizes contributions from secondary memory; Unsworth & Engle, 2007) may prove fruitful for understanding more about what complex span and n-back tasks share.

In addition, there is strong evidence (e.g., Bailey, Dunlosky, & Kane, 2011; Unsworth & Spillers, 2010a) that individual differences in strategy use contribute to performance on complex span tasks. However, less is known about the explicit strategies that different subjects engage in during the performance of n-back tasks. Because strategy use during complex span task performance has been shown to mediate the relationship with other cognitive measures (Turley-Ames & Whitfield, 2003), more knowledge about the strategies underlying n-back performance would be informative. One possibility is that the presentation rate of stimuli during an n-back task may affect the particular strategies employed: A slower rate may allow covert rehearsal and active updating of the current memory set, whereas a faster rate may force subjects to rely more on familiarity matching. The n-back task is often assumed to measure updating, with subjects actively updating the current contents of a limited portion of temporary memory. However, as was illustrated by Szmalec et al. (2011), the underlying processes contributing to n-back performance are not simply the reflection of a mental counter that updates with the relevant information as necessary.

In general, many of the studies in the meta-analysis used multiple complex and/or simple span measures, but only one n-back task. Future latent-variable studies with multiple complex span and multiple n-back tasks would be helpful for determining the relationship at the construct level.

Conclusion

The results of the meta-analysis indicate that two categories of tasks used to measure WM, complex span and n-back tasks, are only weakly related. Although tasks using nonverbal stimuli have produced higher correlations, overall the findings demonstrate little shared variance between the two types of tasks. In fact, the digit span backward task was more strongly correlated with n-back performance than the verbal complex span tasks were. The present findings indicate that complex span and n-back tasks cannot be used interchangeably as WM measures in research applications. WM researchers should consider how the weakness of this relationship affects interpretation of their own results and the work of others.