Inhibition refers to the ability to suppress or ignore ongoing irrelevant thoughts and actions to achieve current goals (Logan, 1985). Preserved inhibitory processes are assumed to play an important role in keeping attention focused on relevant information (e.g., attending to traffic while driving rather than becoming distracted by the news on the radio), and in avoiding highly overlearned but currently wrong action tendencies (e.g., suppressing the habit of driving on the right side of the road when entering the UK). Inhibition has been described as one of the key processes of executive functions (i.e., the collection of cognitive processes that enable goal-directed actions and complex cognitive processing; see Burgess, 1997; Miyake et al., 2000; Miyake & Friedman, 2012). Executive functions have been shown to develop late and decline first (e.g., Brainerd & Dempster, 1995; Craik & Bialystok, 2006; S.-C. Li & Baltes, 2006). The purpose of the present study was to conduct a meta-analysis on tasks assumed to measure inhibition to test whether inhibition declines in adult aging.

Hypothesis of an inhibition deficit in older age

In their seminal paper, Hasher and Zacks (1988) proposed that inhibition is specifically impaired with advanced adult age. That is, when performing a task, older adults are less able to overcome dominant responses or to ignore distracting information than young adults are. Hasher and Zacks proposed that this inhibition deficit explains, to a substantial extent, the age-related deficits observed in many cognitive tasks, such as simple and choice reaction-time tasks, working memory and episodic memory tests, tests of spatial and reasoning abilities, mental rotation, and visual search (see, e.g., Kausler, 1991; Salthouse, 1991).

During the past 30 years, a large number of studies tested the hypothesis of an inhibition deficit in older adults. Typically, these studies consisted of the comparison of two age groups (a younger sample and an older one) in one experimental task assumed to measure inhibition. So far, the results are mixed. An age-related inhibition deficit was found in some studies (e.g., Andrés, Guerrini, Phillips, & Perfect, 2008; Kramer, Humphrey, Larish, Logan, & Strayer, 1994), but not in others (e.g., Salthouse, 2010; Sebastian et al., 2013). More surprisingly, in a few studies, older adults were found to score better than young adults on measures of inhibition (Fernandez-Duque & Black, 2006; Madden & Gottlob, 1997).

This inconsistency might be the result of several factors. First, it is possible that in studies revealing no inhibition deficit, older adults with more preserved cognitive functioning were tested than in studies revealing an inhibition deficit (see Kramer et al., 1994, for similar conclusions). Second, different tasks were used to measure inhibition, and the method for each experimental task differed from study to study (see, e.g., Ludwig, Borella, Tettamanti, & de Ribaupierre, 2010). Third, as older adults typically suffer from a general slowing in processing speed (Salthouse, 1996; Salthouse & Babcock, 1991), most studies controlled for speed differences between younger and older adults. However, this was achieved by different methods (proportional scores, see, e.g., Langley, Vivas, Fuentes, & Bagne, 2005; natural logarithm, see, e.g., Van der Lubbe & Verleger, 2002; hierarchical regression, see, e.g., Bugg, DeLosh, Davalos, & Davis, 2007). Taken together, these factors could explain why different studies with different participants and tasks result in inconsistent findings regarding the hypothesis of an inhibition deficit in older adults. The aim of the present study was to conduct a meta-analysis to arrive at a summary of the existing evidence, averaging out the idiosyncrasies of individual studies.

The meta-analytic approach

The meta-analytic approach has at least three advantages (see Cumming, 2013; Verhaeghen, 2014, for more details about the advantages and biases in conducting meta-analyses). First, it uses the data from all available studies performed so far. Second, it condenses the information inherent in each study and pools data from disparate studies so that differences in participant samples and in methodology have less impact. Third, it yields a global finding that can serve as the basis for broad conclusions.

The meta-analytic approach was used by Verhaeghen and colleagues for assessing age differences in individual tasks measuring executive control (Verhaeghen, 2011, 2014; Verhaeghen & Cerella, 2002; Verhaeghen & De Meersman, 1998a, 1998b; Verhaeghen, Steitz, Sliwinski, & Cerella, 2003; Wasylyshyn, Verhaeghen, & Sliwinski, 2011). Table 1 provides a summary of these meta-analyses. As pointed out by Verhaeghen and colleagues, investigating age-related deficits of executive control in a meta-analysis requires a different approach than analyzing mean reaction times (RTs) and/or the effect sizes based on these RTs. Comparing RT performance between younger and older adults can be problematic because older adults’ RTs are slower than young adults’ RTs by a constant proportion (Cerella, 1994; Cerella & Hale, 1994; Myerson, Hale, Wagstaff, Poon, & Smith, 1990; Verhaeghen & Cerella, 2002). For example, absolute RT differences between a baseline and an experimental condition are larger for older adults simply because their RTs in both conditions are proportionally slower than those of younger adults. For example, if younger adults take 700 ms to respond to the baseline condition and 1,000 ms to respond to the experimental condition, older adults would take—according to the proportional slowing of 1.5—1,050 and 1,500 ms, respectively. Thus, the difference between baseline and experimental conditions would be 300 ms for young adults but 450 ms for older adults, although the underlying age effect (a proportional slowing of 1.5) is identical for both conditions. This could lead to the erroneous impression of an age-related deficit where in fact age differences are merely caused by a general proportional slowing of all RTs (Verhaeghen, 2011, 2014).
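The arithmetic of this example can be checked with a minimal sketch. The function name is hypothetical, and the slowing factor of 1.5 is the illustrative value from the example above, not an estimate from the data.

```python
# Illustrative sketch of the proportional-slowing example in the text.
# SLOWING_FACTOR is the hypothetical factor of 1.5 used in the example.
SLOWING_FACTOR = 1.5


def older_rt(young_rt_ms, factor=SLOWING_FACTOR):
    """Older adults' RT under pure proportional slowing (no specific deficit)."""
    return young_rt_ms * factor


young_baseline, young_experimental = 700, 1000
old_baseline = older_rt(young_baseline)
old_experimental = older_rt(young_experimental)

# Absolute interference costs (experimental minus baseline) in ms.
young_cost = young_experimental - young_baseline
old_cost = old_experimental - old_baseline

# The old/young ratio is the same in both conditions, even though
# the absolute costs differ between the age groups.
ratio_baseline = old_baseline / young_baseline
ratio_experimental = old_experimental / young_experimental

print(young_cost, old_cost, ratio_baseline, ratio_experimental)
```

The larger absolute cost for older adults arises here although the only age effect built into the sketch is the common factor, which is why difference scores alone cannot distinguish general slowing from a specific deficit.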

Table 1 Overview of the executive-control tasks used in the previous meta-analyses and their results (for all tasks, the dependent measure was reaction time; RT)

To avoid this misinterpretation, we followed the meta-analytic approach used by Verhaeghen and colleagues (e.g., Verhaeghen, 2011, 2014; Verhaeghen & Cerella, 2002; Wasylyshyn et al., 2011; see also Cerella, 1994; Myerson et al., 1990). That is, we conducted Brinley and state-trace analyses. In these analyses, performance in the baseline and experimental conditions is displayed in plots separately for younger and older adults. The goal is then to determine whether the data displayed in the plots can be reliably fitted with a single line or whether two different lines are required. If two lines are necessary, this implies an age-related inhibition deficit. In contrast, if a single line is sufficient to explain the data, this implies no age-related deficit. To determine statistically whether one or two lines are necessary, we used multilevel modeling and Bayesian hypothesis testing.

The diversity of inhibition

To test the hypothesis of an inhibition deficit in older age, we selected tasks assumed to measure inhibition based on three criteria. First, we opted for tasks for which there is broad (though not necessarily general) agreement that inhibition plays a role. Second, we selected tasks commonly assessed in experimental psychology and individual differences studies. Third, we opted for tasks for which newer studies were available (i.e., the color Stroop and flanker tasks) or for which no meta-analysis has been performed (i.e., the Simon, global–local, stop-signal, go/no-go, positive and negative compatibility tasks, as well as the task assessing n-2 repetition costs). A description of the tasks used in the present study is given in Table 2.

Table 2 Overview of the tasks selected for the present meta-analysis

One advantage of including so many tasks was that we could determine whether all tasks or only some are prone to an age-related inhibition deficit. If older adults are impaired in all tasks, this would suggest that the inhibition deficit in adult aging is rather general. In contrast, it is possible that older adults are impaired in some tasks but not in others. In this case, it would be interesting to determine whether this differential inhibition deficit corresponds to any of the taxonomies of inhibition proposed so far (e.g., Chuderski, Taraday, Nęcka, & Smoleń, 2012; Cyders & Coskunpinar, 2011; Dempster, 1993; Friedman & Miyake, 2004; Harnishfeger, 1995; Hasher, Lustig, & Zacks, 2007; Nigg, 2000; Pettigrew & Martin, 2014; Stahl et al., 2014). Following the fractionations of inhibition in individual differences studies (Chuderski et al., 2012; Friedman & Miyake, 2004; Pettigrew & Martin, 2014; Rey-Mermet, Gade, & Oberauer, in press; Stahl et al., 2014), we considered three forms of inhibitory processes. The first form of inhibition is the ability to ignore distracting information. This form of inhibition can be measured with the flanker and positive compatibility tasks as well as the task assessing n-2 repetition costs (see Friedman & Miyake, 2004; Rey-Mermet et al., in press). The second form of inhibition is the ability to suppress dominant responses and is measured with the go/no-go or stop-signal tasks (Stahl et al., 2014). The third form of inhibition is the ability to ignore response interference and can be measured with the color Stroop, Simon, global–local, or negative compatibility tasks (see Friedman & Miyake, 2004; Rey-Mermet et al., in press). In contrast to previous meta-analyses (see Footnote 1), the present meta-analysis includes different tasks for each form of inhibition (see Table 2). Thus, to infer an age-related deficit in one form of inhibition, we would expect to find an age-related deficit across several tasks assessing this form of inhibition. However, it is also possible that, although older adults are impaired in some tasks but not in others, this differential inhibition deficit does not correspond to this taxonomy (or any other), because the tasks used to assess inhibition do not measure a common underlying process but rather the highly task-specific ability to resolve the interference arising in that task (see Rey-Mermet et al., in press).

The present study

The present study had two purposes. First, we intended to test the hypothesis of an inhibition deficit in older age (Hasher & Zacks, 1988) across a broad range of inhibition tasks to arrive at a summary of the existing evidence. Second, in case an age-related deficit could be identified, we aimed to determine whether this deficit could be accommodated within the taxonomy of inhibition proposed so far in individual differences research. To this end, we selected several tasks, for most of which no meta-analysis has been performed (i.e., the Simon, global–local, stop-signal, go/no-go, positive and negative compatibility tasks, as well as the task assessing n-2 repetition costs). Only the Stroop and flanker tasks had already been analyzed in previous meta-analyses (Verhaeghen, 2011, 2014; Verhaeghen & Cerella, 2002; Verhaeghen & De Meersman, 1998b). As we found more individual studies for both tasks, we included them in the present meta-analysis to keep pace with ongoing research (Cumming, 2013). Moreover, we considered the color Stroop task and the color-word Stroop test as two different tasks because of their differences in methodological requirements (see Ludwig et al., 2010). Most critically, we also complemented the Brinley and state-trace analyses with a Bayesian hypothesis testing approach, which allowed us to provide not only evidence for an age-related deficit but also evidence for the absence of such a deficit.

Method

Sample of studies

Articles were collected using the PsycINFO, PsycARTICLES, PsycBOOKS, and PSYNDEX electronic databases, through personal contacts, and by checking references found in the articles already included. For the search in the electronic databases, we used as keywords the name of each task and its effect (i.e., Stroop task/Stroop effect, flanker task/flanker effect, Simon task/Simon effect, stop-signal task, go/no-go task, global–local task, negative compatibility task/backward masking, n-2 repetition cost/backward inhibition) in combination with the words old adults, older adults, elderly, aging, age, or life span. Following Verhaeghen’s (2014) approach, we defined “study” as an independent sample of participants, and thus a single article might contain multiple studies (i.e., a series of experiments performed on different groups of participants, or multiple between-subjects conditions conducted within the same experiment, e.g., a group with color as a cue and a group with location as a cue). Moreover, a study may include different within-subject conditions (e.g., a short and long stimulus onset asynchrony; SOA).

The search was concluded in September 2015. This resulted in 252 articles (see Table A1 and Table A2 in the Supplementary Materials). Inclusion criteria were as follows: (a) The study included at least one sample of younger adults (mean age between 18 and 30 years) and one sample of older adults (mean age between 60 and 80 years); (b) the study contained one of the tasks typically used to measure inhibition; and (c) the study included a baseline condition and an experimental condition in which inhibition is expected to occur. If several relevant age groups were tested in a study, the data were averaged across these age groups by weighting the mean with the sample size. For example, if a study included 10 older adults between ages 60 and 70 years, and 15 older adults between age 70 and 80 years, a weighted mean was computed with the following equation:

(mean performance group 60–70 × 10 + mean performance group 70–80 × 15) / (10 + 15).
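As a minimal sketch of this weighting, the function below pools subgroup means by sample size. The function name and the two mean RTs (800 and 900 ms) are hypothetical values chosen for illustration; only the sample sizes 10 and 15 come from the example above.

```python
def weighted_mean(group_means, sample_sizes):
    """Pool group means into one mean, weighting each by its sample size."""
    if not group_means or len(group_means) != len(sample_sizes):
        raise ValueError("need one sample size per group mean")
    total_n = sum(sample_sizes)
    weighted_sum = sum(m * n for m, n in zip(group_means, sample_sizes))
    return weighted_sum / total_n


# Hypothetical mean RTs (ms) for the 60-70 and 70-80 subgroups.
pooled_older_mean = weighted_mean([800.0, 900.0], [10, 15])
print(pooled_older_mean)
```

Because the larger subgroup contributes more observations, the pooled value lies closer to its mean than a simple average of the two subgroup means would.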

If some (but not all) participants were retested in a second experiment, the data from the second experiment were removed to avoid mixing variances within and between the studies (see Bruyer & Scailquin, 2000; Nielson et al., 2004; Schlaghecken, Birak, & Maylor, 2011, for such cases). Finally, if some information was not present in the articles, the authors were contacted by e-mail. This procedure resulted in 196 studies contained in 121 published articles (see Table A1 and Table A2 in the Supplementary Materials for the studies included in and excluded from the present meta-analysis, respectively; see also Footnote 2).

Tasks used in the present meta-analysis

In the present study, we conducted a meta-analysis on the color Stroop, flanker, Simon, global–local, positive compatibility, negative compatibility, stop-signal, and go/no-go tasks, as well as the paradigm assessing n-2 repetition costs in task switching. Although Verhaeghen and colleagues had already conducted meta-analyses on the Stroop and flanker tasks (Verhaeghen & De Meersman, 1998b; and Verhaeghen & Cerella, 2002, respectively), we included both tasks in the present study because we collected more studies and could thus keep pace with ongoing research (Cumming, 2013). Moreover, we considered the color Stroop task and the color-word Stroop test as two different paradigms because of their differences in methodological requirements (see Ludwig et al., 2010). To our knowledge, no meta-analysis has been performed on the remaining tasks we selected.

As can be seen in Table 2, each task consisted of interference and baseline trials. Interference trials are those trials that involve a conflict between relevant and irrelevant information. Inhibition is required in these trials to ignore or suppress the irrelevant information that creates the conflict. Baseline trials are those trials without conflict between information or with only one response-relevant feature.

Data analysis

For each study in each task, raw data consisted of the mean reaction times (RTs) and mean error rates for the interference and baseline trials in the young and older age groups (see Table A1 for all raw data). Two types of dependent measures were analyzed (see Table 3): main dependent measures (i.e., the typical measures for the task, such as the RTs for the color Stroop task) and secondary dependent measures (i.e., further measures such as the error rates for the color Stroop task). Furthermore, we performed the analyses on raw data and on zero-centered data (i.e., data in which the mean was subtracted from every value). The results on zero-centered data are only referred to when they diverge from the results on raw data. All results are available at https://osf.io/fthku/. In the present study, we focused on age effects within and across interference and baseline trials for each task by conducting a Brinley analysis and a state-trace analysis.
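Zero-centering as described above can be sketched as follows. The text does not state which mean was subtracted (for example, per variable or per task), so this sketch assumes the mean of a single list of values. The function name and data are hypothetical.

```python
def zero_center(values):
    """Subtract the mean of the values from every value (assumed: one variable)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical mean RTs (ms) of younger adults from three studies.
centered = zero_center([600.0, 700.0, 800.0])
print(centered)
```

Centering leaves the slope of a regression line unchanged and only shifts the point at which the intercept is evaluated, which is a common reason for reporting both versions.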

Table 3 Type of dependent measures for each task

Brinley analysis

In the Brinley analysis (e.g., Cerella, 1994; Cerella & Hale, 1994), a Brinley plot is displayed in which the average performance of younger adults is plotted on the x-axis, and the average performance of older adults is plotted on the y-axis. In this scatterplot, each trial type (interference and baseline) of each study yields a data point. The goal of the Brinley analysis is to determine whether the data displayed in the Brinley plot can be reliably fitted with a single line or two different lines. If two lines were necessary (i.e., one for the interference trials and one for the baseline trials), it would imply an age-related deficit in the interference trials compared to the baseline trials. This would support the conclusion of an age-related inhibition deficit. In contrast, if a single line was sufficient to explain the data, then it would imply that the age-related deficit was comparable in the interference and baseline trials. This would support the conclusion of no inhibition deficit in older age.

To determine statistically whether one or two regression lines were necessary, we used a multilevel modeling approach. To account for within-study and between-studies variances, we computed a multilevel regression model with random intercept and slope. This was implemented in R with the afex package (Singmann et al., 2016) using the following equation:

$$ RT_{\mathrm{older},ij} = \beta_0 + \beta_1\,\mathit{trial} + \beta_2\,RT_{\mathrm{young},ij} + \beta_3\,(\mathit{trial} \times RT_{\mathrm{young},ij}) + \left(b_{0i} + b_{1i}\,RT_{\mathrm{young},ij} + \varepsilon_{ij}\right) $$

where $RT_{\mathrm{older},ij}$ is the average response time of older adults from condition j in study i, $RT_{\mathrm{young},ij}$ is the average response time of younger adults from condition j in study i, trial is a dummy variable in which baseline and interference were coded with 0 and 1, respectively, $\beta_0$ is the intercept, $\beta_1$ is the effect of trial (interference vs. baseline) on the intercept, $\beta_2$ is the slope relating older to young adults, $\beta_3$ is the effect of trial on the slope, $b_{0i}$ is the random intercept for study i, $b_{1i}$ is the random slope for study i, and $\varepsilon_{ij}$ is the residual for condition j in study i.
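To make the role of the interaction term concrete, the following sketch evaluates only the fixed part of the model above and ignores the random effects and the residual. The function names and coefficient values are hypothetical and serve only to show how β3 separates the two Brinley lines.

```python
def predicted_older_rt(rt_young, trial, b0, b1, b2, b3):
    """Fixed part: b0 + b1*trial + b2*RT_young + b3*(trial * RT_young)."""
    return b0 + b1 * trial + b2 * rt_young + b3 * trial * rt_young


def brinley_line(trial, b0, b1, b2, b3):
    """Return (intercept, slope) of the line implied for one trial type."""
    return (b0 + b1 * trial, b2 + b3 * trial)


# Hypothetical coefficients; trial = 0 for baseline, 1 for interference.
no_interaction = dict(b0=50.0, b1=0.0, b2=1.5, b3=0.0)
with_interaction = dict(b0=50.0, b1=0.0, b2=1.5, b3=0.25)

# With b1 = 0 and b3 = 0, both trial types share a single line.
print(brinley_line(0, **no_interaction), brinley_line(1, **no_interaction))

# With b3 > 0, interference trials have a steeper line than baseline trials.
print(brinley_line(0, **with_interaction), brinley_line(1, **with_interaction))
```

In the restricted model, β3 is fixed at zero, so both trial types are forced onto lines with a common slope.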

As we were interested in the effect of trial type (i.e., interference vs. baseline) on the relation between younger and older adults’ performance, the primary focus was on the interaction term. Therefore, we compared the full model with a restricted model in which the interaction term was removed. Model fit was evaluated with multiple fit indices: the pseudo-R² for generalized linear mixed models with random slopes (here, we specifically focused on the marginal coefficient because this is assumed to express how much variance is explained by the fixed factors; Johnson, 2014), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance (= −2 × log-likelihood). Except for pseudo-R², smaller indices indicate better fit.

To examine whether one model (restricted vs. full) fit the data reliably better than the other, we performed two analyses. First, we conducted χ2 difference (Δχ2) tests on the nested models. If the more complex model (i.e., the model with more free parameters) yields a reduction in χ2 that is significant given the loss of degrees of freedom, it is accepted as having the better fit. Second, we performed a Bayesian hypothesis test using the BIC approximation (Wagenmakers, 2007). That is, we used the difference between the BIC for the null hypothesis (i.e., the restricted model) and the BIC for the alternative hypothesis (i.e., the full model) to compute a Bayes factor in favor of the null hypothesis (BF01). The Bayes factor in favor of the alternative hypothesis (BF10) was computed as 1/BF01. Following Raftery's (1995) classification scheme, we considered a BF between 1 and 3 as weak evidence, between 3 and 20 as positive evidence, between 20 and 150 as strong evidence, and larger than 150 as very strong evidence for the hypothesis it favors (the null hypothesis for BF01, the alternative hypothesis for BF10). The advantage of using Bayesian hypothesis testing in addition to the more standard Δχ2 test was that we could assess the strength of evidence not only for the alternative but also for the null hypothesis. Thus, if the Δχ2 test was not significant and the BF01 constituted positive to very strong evidence, this implied that the restricted model had a better fit than the full model. In that case, a single line would be sufficient to account for the data in the Brinley plot, which indicates the absence of age-related deficits. If the Δχ2 test was not significant but the BF01 and BF10 constituted only weak evidence, we concluded that the evidence was too weak to draw any firm conclusions and that more data points (in our case, more studies) would be necessary. In contrast, if the Δχ2 test was significant and the BF10 constituted positive to very strong evidence, this implied that the full model had a better fit. In this case, we examined the estimates of the fixed parameters, in particular the estimate of the interaction term. Only if this estimate was significant did we conclude that two lines (one for the interference trials and one for the baseline trials) would be necessary to account for the data in the Brinley plot, which would thus indicate age-related deficits.
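The two comparisons can be sketched as follows, assuming the models differ by exactly one parameter (the interaction term), so that the Δχ2 test has one degree of freedom. The BIC approximation is taken as BF01 = exp((BIC_full − BIC_restricted) / 2). The function names, the treatment of category boundaries, and all numeric inputs are hypothetical.

```python
import math


def chi2_difference_df1(deviance_restricted, deviance_full):
    """Chi-square difference and p value for nested models with df = 1."""
    delta = deviance_restricted - deviance_full
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(delta, 0.0) / 2.0))
    return delta, p_value


def bf01_from_bic(bic_restricted, bic_full):
    """BIC approximation to the Bayes factor in favor of the restricted model."""
    return math.exp((bic_full - bic_restricted) / 2.0)


def raftery_label(bf):
    """Evidence category for a Bayes factor (boundary handling is assumed)."""
    if bf < 1.0:
        return "evidence favors the other model"
    if bf <= 3.0:
        return "weak"
    if bf <= 20.0:
        return "positive"
    if bf <= 150.0:
        return "strong"
    return "very strong"


# Hypothetical fit values for a restricted and a full model.
delta, p = chi2_difference_df1(deviance_restricted=502.0, deviance_full=500.0)
bf01 = bf01_from_bic(bic_restricted=530.0, bic_full=534.0)
bf10 = 1.0 / bf01

print(delta, round(p, 3), round(bf01, 2), raftery_label(bf01))
```

With these made-up inputs, the Δχ2 test is not significant at the .05 level and BF01 falls in the positive range, which corresponds to the case described above in which a single line is considered sufficient.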

State-trace analysis

In addition to the Brinley analysis, we computed state-trace analyses to determine the type of cognitive processes added within and across age groups in the interference trials compared to the baseline trials (see Prince, Brown, & Heathcote, 2012, for an overview of state-trace analysis). That is, we displayed a state-trace plot in which the average performance in the baseline trials is plotted on the x-axis and the average performance in the more complex trials (here, the interference trials) is plotted on the y-axis. In this scatterplot, each age group (young and older) of each study yields a data point. The goal of the state-trace analysis was to investigate whether the data displayed in the state-trace plot can be explained with a single line or two lines (one for younger adults, one for older adults). If a single line fit the data, it would imply that there was no age difference in the relation between the interference and baseline trials. If two lines were necessary, the interpretation would depend on the pattern that emerged from the two lines (see Verhaeghen, 2014; Verhaeghen & Cerella, 2002). If both lines were parallel to the diagonal but differed in elevation, it would imply that the interference trials add a constant cost to the baseline trials. In this case, the intercept would be significantly larger than zero, and the effect of the interference trials compared to the baseline trials would be considered additive. That is, an extra processing stage is added to the baseline, or one existing processing stage is prolonged. In contrast, if both lines diverged with a slope larger than 1, the effect of the interference trials would be considered multiplicative. That is, each processing stage from the baseline trials is inflated in the interference trials.
We computed the state-trace analysis for the tasks in which additive or multiplicative cognitive processes have been assumed to occur in the interference trials compared to the baseline trials (i.e., the color Stroop, flanker, Simon, global–local, positive and negative compatibility tasks, as well as n-2 repetition costs). In the go/no-go and stop-signal tasks, performance is modeled as a race between a go process, which is triggered by the presentation of the go stimulus, and a stop process, which is triggered by the presentation of the stop signal or the no-go trial (Logan, 1994; Verbruggen & Logan, 2008). When the stop process finishes before the go process, the response is inhibited; when the go process finishes before the stop process, the response is executed. Thus, no additive or multiplicative processes have been assumed. For this reason, no state-trace analysis was performed for the go/no-go and stop-signal tasks.
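The additive and multiplicative patterns described above can be illustrated with a simplified sketch. It fits an ordinary least-squares line to hypothetical (baseline, interference) points, which is a deliberate simplification and not the multilevel model used in the present analyses. The function names, the tolerance, and all data values are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def describe_pattern(intercept, slope, tol=1e-6):
    """Toy labeling of a state-trace line relative to the diagonal."""
    if slope > 1.0 + tol:
        return "multiplicative"
    if abs(slope - 1.0) <= tol and intercept > tol:
        return "additive"
    return "other"


# Hypothetical baseline RTs (ms) from three studies.
baseline = [500.0, 600.0, 700.0]
additive_case = [b + 80.0 for b in baseline]        # constant cost of 80 ms
multiplicative_case = [b * 1.2 for b in baseline]   # cost grows with baseline

for label, interference in (("additive", additive_case),
                            ("multiplicative", multiplicative_case)):
    intercept, slope = fit_line(baseline, interference)
    print(label, round(intercept, 3), round(slope, 3),
          describe_pattern(intercept, slope))
```

In real data, the slope and intercept would be tested statistically rather than compared with a fixed tolerance, so the labeling function is only a reading aid.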

We statistically determined whether one or two regression lines were necessary in the state-trace plot by computing the following multilevel regression model:

$$ RT_{\mathrm{interference},ij} = \beta_0 + \beta_1\,\mathit{age\ group} + \beta_2\,RT_{\mathrm{baseline},ij} + \beta_3\,(\mathit{age\ group} \times RT_{\mathrm{baseline},ij}) + \left(b_{0i} + b_{1i}\,RT_{\mathrm{baseline},ij} + \varepsilon_{ij}\right) $$

where $RT_{\mathrm{interference},ij}$ is the average response time of the interference trials from condition j in study i, $RT_{\mathrm{baseline},ij}$ is the average response time of the baseline trials from condition j in study i, age group is a dummy variable in which the young and older age groups were coded with 0 and 1, respectively, $\beta_0$ is the intercept, $\beta_1$ is the effect of age group (young vs. older) on the intercept, $\beta_2$ is the slope relating the interference to the baseline trials, $\beta_3$ is the effect of age group on the slope, $b_{0i}$ is the random intercept for study i, $b_{1i}$ is the random slope for study i, and $\varepsilon_{ij}$ is the residual for condition j in study i.

As for the Brinley analysis, we compared this full model with a restricted model in which the interaction term was removed. Model selection was carried out in the same way as described for the Brinley analysis.

Results

Main dependent measures

The main dependent measure for each task is described in Table 3 (left part). Figure 1 depicts the Brinley plots (left part) and state-trace plots (right part) for each task. Model fits are presented in Table 4. Estimates of the fixed parameters for the full model are presented in Table 5.

Fig. 1

Main dependent measures (i.e., the reaction times for the color Stroop, flanker, Simon, global, local, positive and negative compatibility tasks, the color-word Stroop test, and the n-2 repetition cost; the stop-signal reaction time for the stop-signal task; and the error rates for the go/no-go task). For the Brinley analysis, the solid line results from the multilevel modeling analyses using the baseline trials only, and the dotted line results from the multilevel modeling analyses using the interference trials only. For the state-trace analysis, the solid line results from the multilevel modeling analyses using the young adults only, and the dotted line results from the multilevel modeling analyses using the older adults only. For all plots, the diagonal is indicated by a dashed line. RT = reaction time. As studies including the color-word Stroop test had different numbers of items per card, we compared them by dividing the completion time of each card by the number of items presented on that card. No state-trace analysis was performed for the stop-signal and go/no-go tasks because neither an additional process nor a prolongation of processing is expected to occur in the interference trials relative to the baseline trials

Table 4 Goodness-of-fit statistics and model comparison results for the main dependent measures of each task (i.e., the reaction times for the color Stroop task, the color-word Stroop test, the flanker task, the Simon task, the global task, the local task, the positive and negative compatibility tasks and the n-2 repetition cost, the stop-signal reaction time for the stop-signal task, and the error rates for the go/no-go task)
Table 5 Estimates of the fixed parameters from the full model for the main dependent measures (i.e., the reaction times for the color Stroop task, the color-word Stroop test, the flanker task, the Simon task, the global task, the local task, the positive and negative compatibility tasks and the n-2 repetition cost, the stop-signal reaction time for the stop-signal task, and the error rates for the go/no-go task)

As shown in Table 4, the Brinley and state-trace analyses showed a better model fit for the restricted model than for the full model in the color Stroop task, the color-word Stroop test, the flanker task, the local task, and the n-2 repetition costs. Thus, for these tasks, the results speak against an inhibition deficit in older age. For the global task, Brinley and state-trace analyses on raw data also revealed a better fit for the restricted model than for the full model. However, on zero-centered data, the evidence from the Brinley analysis was weak. For the positive compatibility task, the state-trace analysis suggested a better fit for the restricted model, but the evidence from the Brinley analysis was weak. For the negative compatibility task, the evidence from both the Brinley and state-trace analyses was too weak to decide between the full and restricted models. Therefore, for these tasks (i.e., the global, positive, and negative compatibility tasks), more studies seem necessary to draw conclusions. Finally, a better fit was found for the full model compared to the restricted model for the Simon, stop-signal, and go/no-go tasks. Moreover, in all three tasks, the estimates of the interaction parameter were significant (see Table 5). Thus, as depicted in Figure 1, data were better fitted with two lines (i.e., one for the interference trials and one for the baseline trials for the Brinley plot; and one for the young age group and one for the older age group for the state-trace analysis; see Footnote 3).

A closer inspection of Figure 1 revealed, however, that older adults had very long RTs (more than 2,000 ms) in one study for the Simon task and in three studies for the color-word Stroop test. To test whether these outliers affected the results, we removed mean RTs larger than 2,000 ms (see Verhaeghen, 2014, for the same exclusion criterion) and reanalyzed the data. A similar pattern of results emerged for the color-word Stroop test. In contrast, for the Simon task, the results differed. That is, the Bayesian hypothesis testing in both Brinley and state-trace analyses revealed only weak evidence for either model (see Table 4). This suggests that the age-related inhibition deficit previously observed in the Simon task was caused by a single outlier study and that more research is necessary to determine whether an inhibition deficit in older adults can be observed in the Simon task (see Footnote 4). Together, the present results support the assumption of an age-related deficit for the stop-signal and go/no-go tasks only.

Secondary dependent measures

The secondary dependent measure for each task is described in Table 3 (right part). Figure A1 in the Supplementary Materials depicts the Brinley plots (left part) and state-trace plots (right part) for each task. Model fits are presented in Table A3 in the Supplementary Materials. Estimates of the fixed parameters for the full model are presented in Table A4 in the Supplementary Materials.

As shown in Table A3, the state-trace analysis for the color Stroop task suggested a better fit for the restricted model, but the evidence from the Brinley analysis was weak. For the flanker task, the Simon task, and the n-2 repetition cost, the Brinley analysis suggested a better fit for the restricted model, but the evidence from the state-trace analysis was weak. For the global, local, and positive compatibility tasks, the evidence from both the Brinley and state-trace analyses was too weak to decide between the full and restricted models. Therefore, for all these tasks (i.e., the color Stroop, flanker, Simon, global, local, and positive compatibility tasks, as well as the n-2 repetition cost), more studies seem necessary to draw conclusions. Finally, a better fit was found for the full model compared to the restricted model for the negative compatibility task. Moreover, in this task, the estimates of the interaction parameter were significant in both analyses (see Table A4). Thus, as depicted in Figure A1, data were better fitted with two lines (i.e., one for the interference trials and one for the baseline trials for the Brinley plot; and one for the young age group and one for the older age group for the state-trace analysis). The analyses on error rates thus revealed a negative compatibility effect (i.e., larger error rates for congruent trials than for incongruent trials) for the young adults, but no such effect for the older adults. According to the explanation underlying the negative compatibility effect (Schlaghecken et al., 2011; Schlaghecken & Maylor, 2005), this points to an inhibition deficit for older adults in the error rates. However, this conclusion should be interpreted with caution, as so far only three studies have been conducted with the negative compatibility task (Rey-Mermet et al., in press; Schlaghecken et al., 2011; Schlaghecken & Maylor, 2005).

General discussion

The purpose of the present study was to conduct a meta-analysis in order to test the hypothesis of an inhibition deficit in older age. According to this hypothesis, older adults are less able to suppress dominant and well-learned responses and/or to ignore irrelevant information than younger adults are (e.g., Hasher & Zacks, 1988). To this end, we conducted a quantitative literature review including 11 tasks typically assumed to measure inhibition in experimental as well as individual differences studies (i.e., the color Stroop task, the color-word Stroop test, the flanker task, the Simon task, the global–local task, the positive and negative compatibility tasks, the paradigm assessing n-2 repetition costs in task switching, the stop-signal task, as well as the go/no-go task). A multilevel modeling approach, combined with Bayesian hypothesis testing, was used to conduct Brinley and state-trace analyses. We opted for Bayesian hypothesis testing in addition to standard null-hypothesis significance testing because this approach allows us not only to report in each task the presence of an effect (i.e., an inhibition deficit in older age) but also to provide clear statistical support for the absence of this effect (i.e., no inhibition deficit in older age). The Bayesian approach also enables us to perform the analyses even with the small number of studies per task. Moreover, it highlights which tasks remain underresearched and clearly require more research to allow firm conclusions.
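The three-way reading that a Bayes factor permits (evidence for a deficit, evidence against a deficit, or too little evidence to decide) can be sketched as follows. The cutoff of 3 (and its reciprocal, 1/3) is a common convention that is assumed here for illustration only; it is not taken from the analyses reported in this article, and the function name is hypothetical.

```python
def classify_bayes_factor(bf10, threshold=3.0):
    """Map a Bayes factor onto one of three conclusions.

    bf10      : Bayes factor for the model with an age-related deficit
                relative to the model without one
    threshold : assumed conventional cutoff (3 and 1/3), not a value
                taken from the article
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if threshold <= 1:
        raise ValueError("The threshold must be greater than 1.")
    if bf10 >= threshold:
        return "evidence for an age-related deficit"
    if bf10 <= 1.0 / threshold:
        return "evidence against an age-related deficit"
    return "inconclusive, more studies needed"


# Hypothetical Bayes factors, one per conclusion.
for bf in (10.0, 0.1, 1.5):
    print(bf, "->", classify_bayes_factor(bf))
```

Unlike a nonsignificant p value, the middle category keeps "no deficit" separate from "not enough data," which is the distinction drawn in the paragraph above.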

An overview of the results is presented in Table 6. When the typical dependent measures were used (i.e., the stop-signal reaction time for the stop-signal task, the error rates for the go/no-go task and the RTs for all remaining tasks), the results of the present meta-analysis showed an age-related deficit for the stop-signal and go/no-go tasks, but they speak against such a deficit for the color Stroop task, the color-word Stroop test, the flanker task, the local task, and for the n-2 repetition costs. Moreover, for the Simon, global, and positive and negative compatibility tasks, the present findings suggest that more research is necessary to draw a conclusion about whether older adults showed impaired inhibition in these tasks.

Table 6 Summary of the results

Together, these results indicate that no general inhibition deficit was observed in older age. Moreover, they are to some extent in line with the taxonomies of inhibitory processes presented to account for the construct of inhibition (Chuderski et al., 2012; Friedman & Miyake, 2004; Pettigrew & Martin, 2014; Stahl et al., 2014; but see Rey-Mermet et al., in press). According to these taxonomies, the color Stroop, local, and flanker tasks as well as the n-2 repetition costs measure the ability to ignore distracting information and response interference. As the present findings showed no age-related deficit in these tasks, this indicates that older adults can ignore distracting information and response interference as well as young adults. In contrast, an age-related deficit was observed for the go/no-go and stop-signal tasks. As these tasks have been associated with the ability to suppress dominant responses (e.g., Chuderski et al., 2012; Stahl et al., 2014), these results support the view that older adults are less able to suppress dominant responses than young adults. Together, the present results are in line with the taxonomy separating the ability to suppress dominant responses from the ability to ignore distracting information and response interference (see Stahl et al., 2014).

Alternatively, it might also be emphasized that in the stop-signal and go/no-go tasks, two task sets (one to perform the response on go trials and one to inhibit the response on stop/no-go trials) have to be maintained and coordinated in order to perform correctly (e.g., Logan, 1994). That is, participants have to maintain the information about the no-go/stop-signal trials while performing the task. Therefore, it is possible that the age-related deficits found in the go/no-go and stop-signal tasks do not result from a decline in the ability to suppress dominant responses but rather from the fact that older adults are less able to maintain and coordinate no-go/stop-signal information than young adults are (see Hsieh & Lin, 2017). This account is in line with the results showing age-related deficits in dual-task performance and global switch costs (see Table 1) and with the interpretation that older adults are less able to maintain and coordinate information under conditions requiring divided attention (Verhaeghen, 2011, 2014; Verhaeghen et al., 2003; Wasylyshyn et al., 2011; see also Kramer & Kray, 2006).

Nevertheless, before discarding the hypothesis of an inhibition deficit in adult aging, it may be necessary to ask to what extent the simple interference effects measured, for example, in the Stroop, Simon, or flanker tasks are adequate to assess inhibition. Previous research has put forward that inhibitory processes are also involved in more complex manipulations, such as in manipulating whether the previous trial is an interference or a baseline trial (called the Gratton effect or congruency sequence effect; see, e.g., Braem, Abrahamse, Duthoo, & Notebaert, 2014; Egner, 2007) or in increasing the number of interference trials relative to baseline trials (called the proportion congruency effect; see, e.g., Gratton, Coles, & Donchin, 1992). So far, only a few studies have investigated these more complex effects in aging (Bugg, 2014a, 2014b; Mutter, Naylor, & Patterson, 2005; Puccioni & Vallesi, 2012; Trewartha, Penhune, & Li, 2011; West & Baylis, 1998; West & Moore, 2005; Yoshizaki, Kuratomi, Kimura, & Kato, 2013). The results were inconsistent. Whereas some studies found an age-related change (Bugg, 2014a; Trewartha et al., 2011; West & Baylis, 1998), other studies reported no change (Bugg, 2014b; Mutter et al., 2005; Puccioni & Vallesi, 2012; West & Moore, 2005; Yoshizaki et al., 2013). Moreover, it would be interesting to extend these manipulations to tasks such as the go/no-go task or the stop-signal task in order to determine whether the inhibition induced by these manipulations is generally sensitive to aging (see, e.g., Hsieh, Wu, & Tang, 2016). Together, this emphasizes the necessity of constructing paradigms in which inhibition is more properly assessed, and of using these paradigms to test the hypothesis of an inhibition deficit in older age.

In addition to using RTs as dependent measures, we also analyzed the error rates for the color Stroop, flanker, Simon, global, local, positive and negative compatibility tasks, as well as for the n-2 repetition costs. Overall, the results on the error rates emphasize the necessity of further research to decide whether an age-related deficit occurs in this dependent measure. However, these analyses may be problematic for at least three reasons. First, some studies did not record or report the error rates separately for baseline and interference trials in both age groups, which reduces the number of studies in each task (see Table 6 for a comparison of the number of studies for both types of dependent measures). Second, there were some floor effects in error rates (see Figure A1 in the Supplementary Materials). In particular, young adults typically show very low error rates in baseline trials, which makes it difficult to interpret the analyses. Third, errors are noisier because they result from different processes, such as anticipatory responses or responses based on irrelevant information (see, e.g., Maier & Steinhauser, 2013; Maier, Yeung, & Steinhauser, 2011). Thus, error rates may involve more than inhibitory processes. Despite these limitations, not analyzing the error rates may also be problematic because it neglects potential speed–accuracy trade-offs or compensatory processes (such as the adoption of a more cautious response style by older adults; see, e.g., Rabbitt, 1979; Salthouse, 1979; Smith & Brewer, 1995; Starns & Ratcliff, 2010). However, these problems could be solved if the raw trial-level data of each individual study were available. This should encourage us to share our research data publicly more frequently (see, e.g., Morey et al., 2016; Vanpaemel, Vermorgen, Deriemaecker, & Storms, 2015).

To conclude, the purpose of the present meta-analysis was to arrive at a summary of the existing evidence to establish some order in the discrepancy of findings regarding the hypothesis of an inhibition deficit in older adults. The results demonstrate that for most tasks (i.e., the color Stroop task, the color-word Stroop test, the flanker task, the local task, as well as the task assessing n-2 repetition costs), no inhibition deficit in older age was observed. For other tasks (i.e., the Simon, global, positive and negative compatibility tasks), more research is necessary to decide whether such a deficit occurs. In only two tasks (i.e., the stop-signal and go/no-go tasks), an age-related deficit was found. However, this deficit might be explained not only by a decline in the ability to inhibit dominant responses but also by a decline in updating and coordinating information. Together, these findings challenge the notion of a general inhibition deficit in older age.

Author note

Alodie Rey-Mermet, Department of Psychology, Cognitive Psychology Unit, University of Zurich, Switzerland, University Research Priority Program (URPP) “Dynamics of Healthy Aging,” University of Zurich, Switzerland, and Department of Psychology, General Psychology, Catholic University of Eichstätt-Ingolstadt, Germany; Miriam Gade, Department of Psychology, Cognitive Psychology Unit, University of Zurich, Switzerland, University Research Priority Program (URPP) “Dynamics of Healthy Aging,” University of Zurich, Switzerland, and Department of Psychology, General Psychology, Catholic University of Eichstätt-Ingolstadt, Germany.

This work was supported by the University Research Priority Program (URPP) “Dynamics of Healthy Aging,” University of Zurich, and by a grant from the Velux Foundation. We thank Lilian Huber, Julia Moser, Jan Nussbaumer, and Lena Pisarzewski for their help in data collection. We would like to thank Henrik Singmann, Klaus Oberauer, and Paul Verhaeghen for helpful comments and hints. Finally, we are indebted to all authors we contacted by e-mail. The data and analysis scripts used in this work can be found at https://osf.io/fthku/.