Age differences in sustained attention tasks: A meta-analysis

Vallesi, Antonino; Tronelli, Virginia; Lomi, Francesco; Pezzetta, Rachele

doi:10.3758/s13423-021-01908-x

Theoretical/Review
Open access
Published: 26 March 2021

Volume 28, pages 1755–1775, (2021)

Psychonomic Bulletin & Review

Antonino Vallesi ORCID: orcid.org/0000-0002-4087-2845^1,2,
Virginia Tronelli¹,
Francesco Lomi³ &
…
Rachele Pezzetta²

Abstract

Many aspects of attention decline with aging. There is a current debate on how aging also affects sustained attention. In this study, we contribute to this debate by meta-analytically comparing performance on the go/no-go Sustained Attention to Response Task (SART) in younger and older adults. We included only studies in which the SART had a low proportion of no-go trials (5%–30%), there was a random or quasirandom stimulus presentation, and data on both healthy younger and older adults were available. A total of 12 studies were suitable with 832 younger adults and 690 older adults. Results showed that older adults were slower than younger adults on go trials (g = 1, 95% CI [.72, 1.27]) and more accurate than younger adults on no-go trials (g = .59, 95% CI [.32, .85]). Moreover, older adults were slower after a no-go error than younger adults (g = .79, 95% CI [.60, .99]). These results are compatible with an age-related processing speed deficit, mostly suggested by longer go RTs, but also with an increased preference for a prudent strategy, as demonstrated by fewer no-go errors and greater posterror slowing in older adults. An inhibitory deficit account could not explain these findings, as older adults actually outperformed younger adults by producing fewer false alarms to no-go stimuli. These findings point to a more prudent strategy when using attentional resources in aging that allows reducing the false-alarm rate in tasks producing a tendency for automatic responding.

The ability to maintain the focus of attention on a task over time is known as sustained attention or vigilance, and it is a fundamental component of normal cognitive capacities. Indeed, without this ability, many other cognitive functions would be compromised (Parasuraman, 1998). Given its importance for general cognitive functioning, sustained attention has been investigated in many studies.

One of the first experimental tasks used to study sustained attention dates back to the 1940s and was used to evaluate vigilance in the British Air Force (Mackworth, 1948). The original device, known as the “Mackworth Clock,” was similar to a watch with a pointer moving in short jumps. Double jumps occurred at irregular intervals, and the task was to respond to them by pushing a button. The overall task duration was about 2 hours. At first, the task may seem easy, and one would rarely make mistakes. With time on task, however, it becomes harder and harder to maintain the attentional focus, and accuracy starts to decrease.

This task was the starting point for many studies on sustained attention. Over the years, new tasks were developed in which the participant has to monitor a continuous flow of stimuli for a prolonged period and has to respond to rare target stimuli. These types of tasks have recently been defined as “traditionally formatted tasks” (TFTs; Stevenson et al., 2011). In this case, the vigilance decrement is the index of deterioration of sustained attention, characterized by a decrease in accuracy and/or an increase in reaction times (RTs) with time on task. The duration of TFTs varies between studies (from 150 s to 2 h), but the average duration is about 30–45 minutes (Staub et al., 2013).

Another type of task aimed at investigating sustained attention is the Sustained Attention to Response Task (SART; Robertson et al., 1997). The original SART introduced by Robertson et al. (1997) is a go/no-go task with a quasirandom presentation of digits from 1 to 9, in which the participant has to respond to all the digits except for 3, which is the no-go target. Digits are presented for 250 ms, followed by a 900-ms mask. The task takes about 4 minutes. The no-go trials represent only 11% of total trials, in order to favour an automated response to go trials. Hence, contrary to a TFT, the SART requires one to withhold the response to targets and to respond to nontargets. Robertson and colleagues argued that sustained attention to the task would be taxed more heavily if the automatic response was directed to nontarget stimuli. Indeed, active, controlled processing would need to be engaged more to overcome the prepotent automatic response at the onset of the rare target. In this sense, commission errors (i.e., responses to the target) are the main indicator of impaired sustained attention. The SART is more sensitive to sustained attention deficits than are traditional vigilance tasks (Staub et al., 2013) and seems to have a higher ecological validity: Commission errors are indeed positively correlated with a tendency to report everyday cognitive errors (Manly et al., 1999; Robertson et al., 1997), and more specifically, attention-related everyday cognitive errors (Cheyne et al., 2006).

Sustained attention is essential for functioning in everyday life; thus, it is important to understand how it changes across the adult lifespan, and in particular with aging. Several studies reported that older adults showed longer RTs and fewer errors on sustained attention tasks than younger adults (e.g., Brache et al., 2010; Carriere et al., 2010; Grandjean & Collette, 2011; Heilbronner & Münte, 2013; Hsieh et al., 2016; Jackson et al., 2013; Jackson & Balota, 2012; Kousaie et al., 2014; McVay et al., 2013; Mioni et al., 2019; Staub et al., 2014c; Staub et al., 2015). Longer RTs could be in line with an age-related processing speed deficit (Salthouse, 1996), which has been attributed, among other factors, to the reduction in white matter integrity associated with aging (Salthouse, 2017). However, the longer RTs and the difference in the number of errors also suggest that older adults adopt a conservative strategy to compensate for their poorer response inhibition (Staub et al., 2013): in other words, older adults could be more cautious in responding on go trials to avoid errors on no-go trials. Although many studies show higher accuracy for older adults on go/no-go tasks, there are contrasting results reporting no age-related differences or even better performance in younger adults (e.g., Cassarino et al., 2019; Harty et al., 2013; Hong et al., 2014; Hsieh et al., 2016a, 2016b; Langenecker et al., 2007; Lin et al., 2018; Lucci et al., 2013; McAvinue et al., 2012; Nielson et al., 2002; Rush et al., 2006; Vallesi & Stuss 2010; Vallesi, 2011; Vallesi et al., 2011; Zavagnin et al., 2014).

To deal with these issues, the objective of the present meta-analytical study is to contribute to the debate on SART performance in cognitive aging. To this end, we selected the studies that used a cross-sectional design involving participants from 18 to 95 years of age.

The first aim was to determine the difference between older and younger adults on SART performance, above all in terms of accuracy on no-go trials. This variable indicates the ability to avoid a commission error (i.e., the capacity to inhibit the response). Indeed, calculating the accuracy on no-go trials was useful in investigating whether the inhibition capacities in older adults are preserved (Rey-Mermet & Gade, 2018) or impaired (Hasher & Zacks, 1988). Further, previous studies found that the stimulus evaluation in younger adults decreases with time on task, as compared with older adults, in whom the evaluation processes become even more controlled as the task advances. This suggests that younger adults might adopt a more automatic behavior, rather than a careful and controlled strategy (Carriere et al., 2010; Staub et al., 2015). Thus, in line with previous reports, we expected that response automatization could occur in younger adults, and consequently it could increase the likelihood of committing errors on no-go trials (Staub et al., 2015). Conversely, older adults could adopt a high degree of control over the motor system, enabling them to reach a good level of performance (Staub et al., 2015).

Indeed, some studies (Jackson et al., 2013; Jackson & Balota, 2012; Staub et al., 2014b, 2014c; Staub et al., 2015) reported a reduction in self-reported mind-wandering in older adults compared with younger ones while performing the SART. This may be attributable to older adults finding the SART more difficult and/or more engaging than do younger ones (Jackson et al., 2013; Jackson & Balota, 2012; Staub et al., 2014b, 2014c; Staub et al., 2015). These age differences may have resulted in more effort, and therefore less mind-wandering and a higher degree of control over the motor system in the older group (Jackson & Balota, 2012). A high degree of motor control could also be associated with the increase of RTs in older adults: they may prefer to be slower in order to be more careful and cautious in responding (speed–accuracy trade-off; Staub et al., 2013). For this reason, besides examining RTs on go trials in younger and older adults, we considered it necessary to also analyze posterror slowing (PES), namely, the prolonged RT that is observed after the commission of an error. Indeed, several studies found that RTs after a commission error on no-go trials increased more in older adults than in younger ones (Jackson & Balota, 2012; McVay et al., 2013; Staub et al., 2014c).

One of the main accounts for PES suggests that this effect reflects the implementation of cognitive control to improve subsequent performance (Danielmeier & Ullsperger, 2011). Cognitive control refers to processes that allow information processing of current goals and support flexible, adaptive, and complex responses. Hence, the increased PES in older adults may be indicative of a decline in cognitive control ability—that is, a difficulty in reestablishing the task set after an error has been made (Jackson & Balota, 2012). Moreover, the age difference in PES could be due to the engagement of a type of reactive thought process, also called “task-related interference” (Smallwood et al., 2004): Older adults could be more conscientious, and hence increase their self-assessment of performance after an error, thereby producing prolonged RTs (Jackson & Balota, 2012; Staub et al., 2013). The two hypotheses are not mutually exclusive.

Finally, we also analyzed the accuracy on go trials to evaluate the ability not to make an omission error. We expected to find no age-related differences (Carriere et al., 2010; Hsieh et al., 2016; Jackson et al., 2013; Jackson & Balota, 2012; McAvinue et al., 2012; McVay et al., 2013; Mioni et al., 2019). Indeed, this type of response should be simpler than no-go trials, as we chose to include only studies with a higher percentage of go trials. The second aim of this meta-analytical study was to investigate how performance varies over time in older and younger adults. Based on some of the reported findings, we hypothesized a better preservation of performance over time in older adults than in younger ones (Brache et al., 2010; Staub et al., 2014a, 2014b, 2014c; Staub et al., 2015). The more controlled response strategy in older adults could lead them to maintain a stable level of performance in the go/no-go SART over the course of the task. We also checked whether older adults’ performance is associated with increased fatigue over time.

Method

The meta-analysis is reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA; Liberati et al., 2009). Each of the recommended steps (search and eligibility criteria, study selection, data extraction and analysis) was carried out independently by two authors; results were compared, and any disagreements were resolved by discussion and consensus with a third author.

Eligibility criteria

The following inclusion criteria were used to select articles for the meta-analysis:

Using the Sustained Attention to Response Task (SART; Robertson et al., 1997) or a modified SART version. In the latter case, we included only those works that used paradigms adhering to the main parameters of Robertson’s task, such as the presence of a single no-go trial type, random or quasirandom presentation of stimuli, a higher proportion of go trials (i.e., 70%–95%) than no-go trials (i.e., 5%–30%), and instructions emphasizing speed and accuracy equally. Only studies with a lower percentage of no-go than go trials were chosen, to reflect the criteria identified in Mackworth's (1956) review about the nature of classic vigilance tests. According to this author, there are two types of vigilance: one is needed throughout a long test to detect the occasional significant stimuli among many others presented at a slow pace, and the other one is necessary during a short test to detect rare signals among many other rapidly presented stimuli (Mackworth, 1956). We chose the second type because it is closer to more recent definitions of sustained attention (Leclercq, 2002). Furthermore, tasks that adopt no-go stimuli as targets, considered more difficult than TFTs (Robertson et al., 1997), could be more sensitive to age-related differences.
Inclusion of healthy samples for younger (about 18–35 years old) and older adults (60 years old and over).
Enough statistical information, such as means or medians, standard deviations (SD) or ranges, separately for the younger and older adults of the whole sample, or t or F, in order to calculate the differential effect size and perform the meta-analysis.

Information sources

A systematic literature search was carried out using PubMed, PsycINFO, and Scopus in order to retrieve relevant articles. Further, we checked the references in the selected articles and additional studies on the SART from different sources to find other potentially relevant articles.

Search strategy

The search for eligible studies was carried out between March and April 2020. An update was then performed on December 20–21, 2020, but no additional suitable studies were found. The literature search was performed using the conjunction of the following terms: (“older adults” OR “elderly” OR “aging” OR “ageing” OR “cognitive aging” OR “cognitive ageing” OR “normal aging” OR “normal ageing”) AND (“SART” OR “Sustained Attention to Response Task”). All terms were searched both as a keyword within the text and as a word belonging to the title and/or abstract. No restriction on publication date range was applied, and only published works with an English version available were considered.

Study selection

The relevant material was searched through databases, with the strategy explained above, or through other sources (e.g., citations of the articles obtained by database search). The relevance and eligibility of articles were evaluated using a hierarchical approach. The total sum of papers was first assessed for duplicates. Then, the papers were screened on the basis of title and abstract, and those that did not meet the inclusion criteria were excluded. The remaining articles were finally examined in more depth—that is, by reading the full manuscript—and those that met the inclusion criteria were included in the meta-analysis.

When a potentially eligible paper did not provide some of the information necessary to perform the analyses, the corresponding author was contacted via email. For example, when a study did not stratify the whole sample by age, we asked the authors for the data separately for older and younger adults. If we did not get an answer or the requested information could not be found, that study was discarded.

Before analyzing each variable taken into consideration, some clarifications must be made on some of these included studies:

The study by Carriere et al. (2010) reported the age groups by decade; hence, only the third decade (for the group of younger adults) and the seventh-plus decade (for the group of older adults) were included in the present meta-analysis, since the age of the other groups was out of our interest range.
Three studies included different experiments (Jackson et al., 2013; Jackson & Balota, 2012) and/or different conditions within the same experiment (Jackson et al., 2013; Kousaie et al., 2014), involving different participants; therefore, these experiments and conditions were divided and analyzed as independent.
McAvinue et al. (2012) reported two SART conditions: a random condition, in which the digits appeared in a random order, and a fixed one, in which there was a fixed sequence from 1 to 9. Only the random condition was taken into account as it resembles Robertson’s version. In addition, only the age groups 20s and 30s (for the group of younger adults) and the age groups 60s and 70s (for the group of older adults) were taken into consideration, since the age of the other groups was out of our interest range.
The study by McVay et al. (2013) assigned participants to two conditions based on the SART version, and we only considered Robertson’s one. The other version was excluded because the participants had to respond to targets, which were 11% of total trials. Hence, like in a TFT, the inhibition of the response did not refer to rare stimuli, but to frequent ones (89%). We contacted the authors in order to obtain the sample size and the performance variables of the standard SART condition, separately for older and younger adults. The authors kindly provided us with the sample size and accuracy on go and no-go trials.
The study by Hsieh et al. (2016) investigated cognitive performance on the SART after a reading session and an acute resistance exercise session. Since the former was considered as the baseline in that study, we decided to include only the “reading” condition in the meta-analysis.
In the study by Cassarino et al. (2019), the SART was administered before and after viewing images of natural or urban environments. Therefore, only the SART variables concerning the baseline condition were included.
We contacted Dr. Mioni for more information on her study data (Mioni et al., 2012). She kindly provided us with another article (Mioni et al., 2019), since the article found by us was a conference proceeding. Moreover, she provided us with the RTs for each trial of each participant and the mean and standard deviation of commission errors and omission errors separately for younger and older adults.

Data collection process

The meta-analysis was performed using Meta-Essentials software (Suurmond et al., 2017), in particular, the “Differences Between Independent Groups—Continuous Data” workbook, since the main outcome of interest was the mean difference between younger and older adults. All statistical information necessary for performing the meta-analysis was extracted from the retrieved articles, including sample size, means and standard deviations, separately for younger and older adults, or t or F, so that effect sizes could be calculated or at least estimated. When not directly reported in the text, statistical information was retrieved from plots using WebPlotDigitizer, a freely available software tool that extracts numerical data from images (Rohatgi, 2019).

Data items

Only dependent variables reported by at least five studies were subjected to meta-analysis:

RTs (in ms) on correct go trials

The amount of time taken to respond to routine go stimuli. Eleven articles (Brache et al., 2010; Carriere et al., 2010; Cassarino et al., 2019; Hsieh et al., 2016; Jackson et al., 2013; Jackson & Balota, 2012; Kousaie et al., 2014; McVay et al., 2013; Mioni et al., 2019; Staub et al., 2014c; Staub et al., 2015), for a total of 18 substudies taken separately, were considered in the analysis of correct RTs to go trials. The study by McVay et al. (2013) did not report the RT standard deviation, and therefore the t value was used. The studies by Staub et al. (2014c), Staub et al. (2015), and Cassarino et al. (2019) did not report the means and standard deviations of the RTs in the text, so we obtained these data from the graphs shown in these articles (their Fig. 2, Fig. 1, and Fig. 2, respectively) with the WebPlotDigitizer program. In the studies by Staub and colleagues, the mean and standard deviation were reported separately for the three periods into which the task was subdivided, so we averaged the three blocks. However, in Staub et al.’s (2014c) graph, confidence intervals (95%) were reported instead of standard deviations, so the standard deviation was obtained through the formula \( SD=\frac{ME}{t_{.025,n-1}}\times \sqrt{n} \) (ME = margin of error; n = sample size; \( t_{.025,n-1} \) = critical value corresponding to an area of .025 in each tail for n − 1 degrees of freedom). Also, in Cassarino et al.’s (2019) graph there were standard errors instead of standard deviations of RTs, so the latter were obtained through the formula \( SD= SE\times \sqrt{n} \) (SE = standard error).
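The two conversions described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the t critical value is passed in by hand (here, the tabulated value of about 2.093 for 19 degrees of freedom) rather than computed.

```python
import math

def sd_from_ci_margin(margin: float, n: int, t_crit: float) -> float:
    """SD = ME / t_crit * sqrt(n), where ME is the half-width of the 95% CI.

    t_crit is the two-tailed critical value t_{.025, n-1}, taken from a table.
    """
    return margin / t_crit * math.sqrt(n)

def sd_from_se(se: float, n: int) -> float:
    """SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

# Hypothetical values read off a plot (not data from any included study).
sd_ci = sd_from_ci_margin(margin=10.0, n=20, t_crit=2.093)
sd_se = sd_from_se(se=5.0, n=25)
print(round(sd_ci, 2), round(sd_se, 2))
```

Both functions assume that the plotted error bars refer to a single group of size n; error bars based on a pooled error term would need a different conversion.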

Posterror slowing (PES; in ms)

PES is often quantified as the difference between the mean RTs on the trials immediately following a commission error on no-go trials and the mean RTs on the trials immediately following a correct no-go trial (Danielmeier & Ullsperger, 2011). Three articles (Jackson & Balota, 2012; McVay et al., 2013; Mioni et al., 2019), which included five substudies taken separately, were considered in the analysis of PES. In this case, we only considered the interaction results of the 2 × 2 analysis of variance (ANOVA), with no-go trial response (correct vs. incorrect) as the within-subjects factor and age group (younger vs. older) as the between-subjects factor on go RTs right after no-go trials. Importantly, raw RTs had to be transformed (i.e., into z-scores) to account for age-related generalized slowing. Hence, one study (Staub et al., 2014c) was excluded because, although the authors reported data on PES, they did not apply any kind of transformation to RTs. Among the selected articles, two reported standardized RTs (zRTs) for this analysis (Jackson & Balota, 2012; McVay et al., 2013); for the other study (Mioni et al., 2019), the main author kindly provided us with the necessary data to perform this transformation. For that study, the RT for each go trial was first z-transformed for each subject by using the formula \( zRT=\frac{RT-\mathrm{mean\ RT}}{SD} \), where RT is the raw reaction time at a specific go trial, and mean RT and SD are the within-subjects mean and standard deviation of go RTs. Then, mean zRT after no-go trials was used as the dependent variable for the 2 × 2 ANOVA mentioned above, and the interaction result was considered for the analysis. Two older adults had to be excluded from this analysis, since they did not have any post-no-go error RTs available.
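As a rough sketch of the within-subject standardization described above, the following Python snippet z-transforms one participant's go RTs and then averages the zRTs on trials that follow no-go errors and correct no-go trials. The function names, the trial indices, and the RT values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev

def z_transform(rts):
    """z-score each go RT against the participant's own mean and SD."""
    m, s = mean(rts), stdev(rts)  # assumption: sample SD (n - 1)
    return [(rt - m) / s for rt in rts]

def post_nogo_zrt(rts, after_error, after_correct):
    """Mean zRT on go trials following no-go errors vs. correct no-go trials.

    after_error / after_correct are index lists into rts (hypothetical coding).
    """
    z = z_transform(rts)
    return mean(z[i] for i in after_error), mean(z[i] for i in after_correct)

# Hypothetical go RTs (ms) for one participant.
rts = [300, 320, 310, 400, 305, 315, 290, 330]
z = z_transform(rts)
z_err, z_cor = post_nogo_zrt(rts, after_error=[3], after_correct=[6])
print(round(z_err - z_cor, 2))  # positive values indicate posterror slowing
```

A participant with no post-error trials would make `mean` raise an error on an empty sequence, which mirrors why such participants had to be excluded from this analysis.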

Accuracy on go trials

The proportion of correct go trials out of the total number of go trials. Eight articles (Carriere et al., 2010; Cassarino et al., 2019; Hsieh et al., 2016; Jackson et al., 2013; Jackson & Balota, 2012; McAvinue et al., 2012; McVay et al., 2013; Mioni et al., 2019), including a total of 13 substudies, were considered in the analysis of accuracy on go trials. Carriere et al. (2010), McAvinue et al. (2012), and Mioni et al. (2019) reported only the mean and the standard deviation of omission errors (i.e., failures to respond to go stimuli), so we calculated the mean proportion of errors by dividing the mean number of omissions by the total number of go trials, separately for younger and older adults. The result was then subtracted from 1, since the maximum value of the accuracy index is 1 and accuracy is complementary to error. The standard deviation of accuracy was computed by dividing the standard deviation of omission errors by the total number of go trials. Hsieh et al. (2016) reported the mean and the standard deviation of omission errors in percentages. We obtained the complementary go accuracy percentage by subtracting the mean percentage of errors from 100, and the means and standard deviations were then converted to proportions by dividing by 100. Finally, Cassarino et al. (2019) reported only the median and interquartile range (IQR) of omission errors. Hence, the authors were contacted for these data, and they provided us with the means and standard deviations of this variable. The values were then transformed into accuracy, as for the previous studies.
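The error-to-accuracy conversions above amount to simple rescaling, as in the following Python sketch. The function names and the numbers in the example are hypothetical and are not taken from any of the included studies.

```python
def accuracy_from_error_counts(mean_errors, sd_errors, n_trials):
    """Convert mean/SD of error counts into mean/SD of accuracy (proportion)."""
    mean_acc = 1 - mean_errors / n_trials  # accuracy is complementary to error
    sd_acc = sd_errors / n_trials          # rescaling leaves the spread's shape intact
    return mean_acc, sd_acc

def accuracy_from_error_percent(mean_pct, sd_pct):
    """Convert mean/SD of error percentages into mean/SD of accuracy (proportion)."""
    return (100 - mean_pct) / 100, sd_pct / 100

# Hypothetical group: 200 go trials, 4 omissions on average (SD = 2).
acc_counts = accuracy_from_error_counts(4.0, 2.0, 200)
# Hypothetical group: 12.5% errors on average (SD = 5%).
acc_pct = accuracy_from_error_percent(12.5, 5.0)
print(acc_counts, acc_pct)
```

The standard deviation is only divided by the scaling constant because subtracting from 1 (or from 100) shifts the mean without changing the spread.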

Accuracy on no-go trials

The proportion of correct no-go trials out of the total number of no-go trials. Twelve articles (Brache et al., 2010; Carriere et al., 2010; Cassarino et al., 2019; Hsieh et al., 2016; Jackson et al., 2013; Jackson & Balota, 2012; Kousaie et al., 2014; McAvinue et al., 2012; McVay et al., 2013; Mioni et al., 2019; Staub et al., 2014c; Staub et al., 2015), which included 19 substudies altogether, were considered in the analysis of accuracy on no-go trials. The study by Brache et al. (2010) did not report the standard deviation of accuracy on no-go trials, and therefore the F value was used. Carriere et al. (2010), Kousaie et al. (2014), McAvinue et al. (2012), and Mioni et al. (2019) reported only the means and the standard deviations of commission errors (false alarms to the no-go stimulus). Hence, the mean proportion of errors was calculated by dividing the mean number of commissions by the total number of no-go trials, and the result was subtracted from 1, since accuracy is complementary to error and its maximum value is 1. Then, the standard deviation of accuracy was calculated by dividing the standard deviation of commission errors by the total number of no-go trials. The studies by Staub et al. (2014c) and Staub et al. (2015) reported means and standard deviations of commission errors in percentages, and we obtained these data from the graphs shown in their articles (their Fig. 1, for both) with the WebPlotDigitizer program. Again, since these studies reported the values separately for the three periods of the task, we first averaged them. Then, the complementary value of the mean commission error percentage was calculated to obtain the mean no-go accuracy in percentage, and we finally divided it and the standard deviation by 100 to express accuracy as a proportion. Staub et al. (2014c) reported the confidence intervals (95%) instead of standard deviations in the graphs, so the latter were obtained from confidence intervals through the formula \( SD=\frac{ME}{t_{.025,n-1}}\times \sqrt{n} \). Also, Hsieh et al. (2016) reported means and standard deviations of no-go errors in percentages, so once again we calculated no-go accuracy as described above. Finally, Cassarino et al. (2019) reported only the median and IQR of commission errors, so the authors were contacted. They provided us with the means and standard deviations of this variable. Then, accuracy was calculated as for the previous studies.

Our study also aimed to investigate how performance changes over time in younger and older adults. However, a meta-analysis on this variable was not possible, since the minimum number of five studies was not reached. So, we will only descriptively review the results of the studies that reported block-wise performance for their experimental task.

Risk of bias in individual studies

Only studies with healthy participants—without any psychiatric or neurological disorders—were selected. In order to assess the quality of the included studies we used the Newcastle–Ottawa Scale (NOS), a tool developed to evaluate nonrandomized studies for systematic reviews (Wells et al., 2011), and more specifically we chose a version adapted for cross-sectional studies (Patra et al., 2015). Similar to the other steps, the scoring of the NOS was performed by two authors independently, and any mismatch was solved with the intervention of a third author to reach a consensus. Details on this scale can be found in Table 3.

Risk of bias across studies

The risk of publication bias across studies was assessed through funnel plots, provided by Meta-Essentials (Suurmond et al., 2017). In the absence of publication bias, the funnel should be symmetrical, so the studies should be equally distributed around the mean effect. With high risk of publication bias, some data are expected to be missing in the plot, leading to an asymmetrical funnel. However, this approach is limited by several factors: First, it is a largely subjective procedure, and second, there might be other causes of funnel plot asymmetry besides publication bias (e.g., high heterogeneity among studies; Sterne et al., 2008). To partially circumvent this issue, Meta-Essentials includes a tool more specifically intended for publication bias, namely, the “trim and fill” algorithm (Duval & Tweedie, 2000); this procedure imputes the potentially missing studies and calculates an unbiased estimate for the combined effect size.

Summary measures

The difference in the mean RTs on go correct trials, accuracy on go and no-go trials between younger and older adults and interaction effects of PES were used as the summary measures.

Synthesis of results

Four meta-analyses were performed on the SART in older and younger adults, by reporting subgroup values for each variable (RTs, PES, accuracy on go trials, accuracy on no-go trials). The two healthy subgroups were already combined in the original studies, in terms of means and standard deviations or F or t values. For each meta-analysis, the effect sizes of the individual studies and the combined effect size were estimated and reported in a forest plot, along with measures of heterogeneity (e.g., T), confidence intervals, and prediction intervals. Like the other “difference family” effect sizes (e.g., Cohen’s d, odds ratio), Hedges’ g is used to define the magnitude of a difference between or within groups (Van Rhee et al., 2015); this index, which applies to continuous data, is a standardized mean difference based upon a pooled and weighted standard deviation (Borenstein et al., 2009). Heterogeneity can be defined as the variation in the true effect sizes under a random-effects model, where it is assumed that each observed effect size estimates a different true effect (Borenstein et al., 2009). I² and T are the most indicative measures of heterogeneity, the former indicating the percentage of total variation across studies due to heterogeneity versus chance and the latter representing the estimated standard deviation of true effects, that is, heterogeneity in absolute terms. I² is typically interpreted as follows: 25% = low, 50% = moderate, and 75% = high (Higgins et al., 2003). The T value can instead be put in relation to the length of the prediction interval, which depends on it (see below for the definition of prediction interval; Borenstein et al., 2009). The confidence interval is a numerical range, centered on the point estimate of the parameter, that is likely to include the population parameter (e.g., the difference of the population means). 
The calculation of confidence intervals begins by setting the probability that the interval estimation does not include the parameter. Usually, 5% is accepted as the level of risk, so the confidence interval is 95% (Vaske, 2002). It is interpreted as the range that, if the parameter estimate were calculated repeatedly with different samples from the same population, would contain the true population parameter in approximately 95% of the cases (Hoekstra et al., 2014). If the confidence interval for a difference between groups includes zero, the result is not significant, since it means that the true difference in the population might be null (Van Rhee et al., 2015). The prediction interval is based on the same (frequentist) logic, but it gives the range in which a future sampled data point might fall. Meta-Essentials calculates the prediction interval around the combined effect size, an estimate of how the true effects are distributed around the summary effect (under a random-effects model; Van Rhee et al., 2015). With a confidence level of 95%, the prediction interval gives the range in which 95% of future effect sizes will fall, assuming that true effect sizes are normally distributed (Hak et al., 2016).
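To make the quantities above more concrete, the following Python sketch computes Hedges' g with its variance for two independent groups and then pools several effect sizes under a random-effects model, returning the pooled g, T, and I². It follows the standard textbook formulas; the DerSimonian-Laird estimator of the between-study variance is an assumption made here for illustration, since the text does not state which estimator Meta-Essentials applies, and all function names and input values are hypothetical. Confidence and prediction intervals are not computed.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / s_pooled                 # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects(gs, variances):
    """Pool effect sizes; returns (pooled g, T, I2 in percent).

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, gs)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, gs))
    df = len(gs) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, gs)) / sum(w_star)
    return pooled, math.sqrt(tau2), i2

# Hypothetical go RTs (ms): older adults vs. younger adults, 30 per group.
g, var_g = hedges_g(450, 60, 30, 390, 50, 30)
# Hypothetical set of identical effects: no heterogeneity is expected.
pooled, tau, i2 = random_effects([0.5, 0.5, 0.5], [0.04, 0.05, 0.06])
print(round(g, 2), round(pooled, 2), tau, i2)
```

With identical input effects, the heterogeneity statistic does not exceed its degrees of freedom, so both T and I² are truncated at zero and the random-effects estimate coincides with the fixed-effect one.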