Introduction

Identifying and understanding causal risk factors for crime over the life-course is a key area of inquiry in developmental and life-course criminology [14]. Prospective longitudinal studies provide valuable information regarding age-related trajectories of criminal offending, which is crucial not only for informing theories of crime etiology but also for developing interventions aimed at reducing criminal behavior (e.g., [57]). Findings from any individual study will rarely provide sufficient evidence for policy or practice recommendations. Even the most well-designed study will be subject to sampling error [8]. Meta-analyses that synthesize findings from multiple studies can provide more compelling evidence than that from a single study, given their greater statistical power to detect effects, generalizability across settings, and ability to systematically appraise conflicting findings across studies. Meta-analyses are therefore uniquely situated to guide the field of developmental and life-course criminology by assisting in the identification of risk factors that are promising targets for intervention and prevention, and identifying effective crime prevention programs. The Campbell Collaboration (www.campbellcollaboration.org), for instance, is an organization committed to producing and disseminating systematic reviews and meta-analyses to identify effective programs in the fields of criminology and other social sciences.

Complex data structures are common in meta-analyses of criminological research, whereby primary studies provide multiple, statistically dependent effect size estimates (i.e., effect size estimates with correlated error terms or correlated effect size parameters). For instance, the guiding example used in this tutorial is a meta-analysis where the effect size of interest is the correlation indexing the relationship between school motivation/attitudes and subsequent delinquency or criminal behavior. The prospective longitudinal studies that were included in this meta-analysis often reported multiple effect sizes (i.e., correlation coefficients) of interest, given the numerous ways in which school motivation/attitudes variables could be operationalized (e.g., academic aspirations, academic self-efficacy) as well as the numerous ways in which crime/delinquency could be operationalized (e.g., property crime, violent crime). In their meta-analysis of correlational research examining the association between school bullying and later aggression/violence, Ttofi et al. [9] also encountered dependent effect sizes, such that primary studies often reported multiple measures of bullying or aggression by informant (e.g., self- vs. peer-rated).

Complex data structures are also common in meta-analyses of intervention effectiveness research. For instance, in a Campbell Collaboration review and meta-analysis examining the effectiveness of drug courts, Mitchell and colleagues [10] found that evaluation studies reported multiple odds ratio effect sizes indexing the effects of drug courts on multiple indicators of recidivism (e.g., referrals vs. convictions, general vs. specific offense types). In another meta-analysis examining the effectiveness of self-control interventions for reducing delinquency, Piquero, Jennings, and Farrington [2] reported that evaluations often included more than one intervention condition and/or comparison condition, thus yielding a set of statistically dependent effect size estimates. And in their meta-analysis synthesizing research on the effects of “pulling levers” focused deterrence strategies, Braga and Weisburd [11] also encountered dependent effect sizes such that evaluations often reported multiple indicators of crime (e.g., homicides, shots fired, gun assaults, youth gun assaults).

Statistically dependent effect sizes would ideally be synthesized via multivariate meta-analysis methods, which permit the inclusion of dependent effect sizes in a meta-analysis by fully modeling those dependencies [1215]. Multivariate meta-analysis is rarely used, however, because it assumes knowledge of the underlying covariance structure among effect sizes (i.e., the covariance between effect size estimates within a study), which is seldom reported in primary studies and thus often unknown to the meta-analyst. Historically, most meta-analysts have consequently used data processing and selection techniques for handling dependent effect sizes. This might include, for instance, selecting one effect size per study, either randomly or based on a set of selection criteria, or creating a single synthetic effect size per study by averaging all effect sizes within a given study [14, 1618]. Although these data processing and selection techniques result in an analytic sample of statistically independent effect sizes for any given analysis, these techniques ultimately result in the loss of potentially valuable information.

One contemporary alternative for handling statistically dependent effect sizes is through robust variance estimation (RVE). This method permits the inclusion of statistically dependent effect sizes within a meta-analysis and therefore permits the meta-analyst to utilize all available information reported in primary studies. This method does not require knowledge of the underlying covariance structure among effect sizes, and thus is practical for use in many scenarios. Recent advances in meta-analytic software have also made RVE easy to implement. Thus, this paper will provide an overview of the RVE method, describing the underlying model and estimation procedures and its applicability to meta-analyses of research in developmental criminology. We then present a tutorial on implementing these methods in the R statistical environment [19], using an example meta-analysis on school motivation risk factors for adolescent crime and delinquency. The goal of this paper is to provide practical guidance to researchers interested in applying this innovative method to complex meta-analytic data structures.

This is the first tutorial to demonstrate the application of RVE in the R statistical environment and the first to discuss the application of RVE for meta-analyses of criminological research. It is also the first to illustrate the use of RVE with small-sample adjustments for both the t- and F-tests, which we introduce below [20, 21]. Although more technical presentations of the RVE method exist (e.g. [22, 23]), as well as a tutorial for implementing RVE in SPSS and Stata [24], this is the first tutorial of which we are aware that provides a user-friendly description of how to implement RVE in the R environment.

Review of Standard Meta-analysis Model

In a standard meta-analysis with independent effect sizes, the fixed-effect model is written as:

$$ {y}_j={\beta}_0+{e}_j $$
(1)

where y j is an effect size in the jth study (j = 1… m); β 0 is the average population effect, and e j is the study level residual ∼ N(0, v j ) [25]. Although this fixed-effect model is appropriate under some circumstances, most meta-analyses of criminological research will assume some underlying heterogeneity in effects (e.g., across settings, samples, outcomes) and/or will desire to make inferences to a larger population of studies, in which case a random-effects model will be more appropriate.In a standard meta-analysis with independent effect sizes, the random-effects model is written as:

$$ {y}_j={\beta}_0+{u}_j+{e}_j $$
(2)

where y j is an effect size in the jth study (j = 1…m); β 0 is the average population effect; u j is the study level random effect ∼N(0, τ 2), and e j is the study level residual ∼N(0, v j ) [25]. Given that heterogeneous effect sizes are common in meta-analyses of criminological research, most meta-analysts are also interested in examining the effect of moderator variables that may account for some of the observed variability in effect sizes [26, 27]. This simple random-effects meta-analytic model can thus be extended to include p covariates or effect size moderators x 1x p :

$$ {y}_j={\beta}_0+{\beta}_1{x}_1{+}_{\dots } + {\beta}_p{x}_p+{u}_j+{e}_j $$
(3)

where β 0 is the intercept, β 1β p are the regression coefficients for the covariates, and Var(u j ) = τ 2 res is the residual heterogeneity after adjusting for the covariates. This meta-regression model can thus be used to model the associations between the p covariates or effect size moderators and the effect size y. The weighted least-squares estimate of β = (β 1,…, β p ) can be calculated using

$$ \mathbf{b}={\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\mathbf{X}\right)}^{-1}\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\mathbf{Y}\right) $$
(4)

where W is a diagonal weight matrix. The variance of the estimate b of β can be written as:

$$ \mathrm{V}\left(\mathbf{b}\right)={\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\mathbf{X}\right)}^{-1}\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\boldsymbol{\Sigma } \mathbf{WX}\right){\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\mathbf{X}\right)}^{-1}. $$
(5)

In a traditional meta-analysis model where the effect sizes from each study are assumed to be statistically independent, the diagonal elements of Σ are the variances v j and the off-diagonal covariance estimates are assumed to be zero. Typically, the weights are defined to be inverse variance, with the diagonals of W equal to (1/v j ) for each effect size, and the off-diagonals equal to zero; in this case, the variance reduces to:

$$ {\mathrm{V}}^{\mathrm{M}}\left(\mathbf{b}\right)={\left({\mathbf{X}}^{\mathbf{\prime}}\mathbf{W}\mathbf{X}\right)}^{-1}. $$
(6)

This formula for standard errors is widely used in meta-analyses of criminological research and is appropriate to use when estimated effect sizes are statistically independent. The estimator VM(b) is inappropriate for use when effect sizes are statistically dependent, however, because it results in standard error estimates that are much too small.

Review of Robust Variance Estimation (RVE)

RVE is a method that can be used to analyze statistically dependent effect sizes in a meta-analysis [22, 23]. Although this method does not require researchers to correctly specify the dependence structure of the data, it is helpful for researchers to understand where and how dependence can arise. For simplicity, we focus here on two broad types of dependence structures: “hierarchical effects” and “correlated effects.” Although we describe these two types of dependence separately, it is common for both forms of dependence to arise within the same meta-analysis (see [24]).

In the hierarchical effects case, dependence arises due to multiple studies being nested within a larger cluster such as a research group, lab, region, or country. For instance, a meta-analysis on drug court effectiveness studies might synthesize effect sizes indexing reductions in recidivism among drug court participants across multiple studies, but some drug courts may have conducted multiple independent evaluations at the same drug court site (e.g., by cohort). Thus, in this scenario, the meta-analyst might have effect sizes from multiple independent studies nested within a larger cluster (i.e., drug court site). In this hierarchical effects case, primary study participants (level 1) provide effect sizes in multiple studies (level 2), which are nested within some larger cluster (level 3). This hierarchical effects model assumes independent sampling errors within clusters, which is often a reasonable assumption when there are no overlapping participant samples represented in the effect size estimates.

Perhaps a more common dependence structure that arises in meta-analyses of criminological research is the correlated effects case, where there are dependent sampling errors within clusters. Correlated effects arise when the same participant samples are used to estimate multiple effect sizes. For instance, a meta-analysis on childhood risk factors for juvenile delinquency might synthesize effect sizes indexing the strength of different parenting-related risk factors across multiple studies; individual studies may report multiple measures of parenting-related risk factors, and individual studies may also report longitudinal effects at multiple follow-up points. In this correlated effects case, the meta-analytic model now represents primary study participants (level 1) providing multiple effect estimates (level 2) that are nested within studies (level 3).

The RVE method is agnostic to the type of dependence structure present in the data, so can be used to handle both the hierarchical effects and correlated effects scenarios described above. Moreover, the same underlying model is used here as in Eq. 2. Assuming a correlated effects model where multiple effect size estimates are nested within studies, the meta-analytic model can now be written as:

$$ {y}_{ij}={\beta}_0+{u}_j+{e}_{ij} $$
(7)

where for i = 1…k j , j = 1…m, y ij is the ith effect size in the jth study; β 0 is the average population effect; u j is the study level random effect such that Var(u j ) = τ 2 is the between-study variance component; and e ij is the residual for the ith effect size in the jth study. To examine effect size moderators, the RVE model can be extended to include p covariates x 1x p , where:

$$ {y}_{ij}={\beta}_0+{\beta}_1{x_{1ij}}_{+ \dots +}{\beta}_p{x}_{pij+}{u}_j+{e}_{ij.} $$
(8)

With RVE models, the weighted least-squares estimate of β = (β 1,…, β p ) can be calculated using the same approach as in the standard meta-analysis model (Eq. 4). The RVE approach differs from the standard meta-analysis approach in the estimate of the variance of b (Eq. 5). Whereas in standard meta-analysis the model-based estimator (Eq. 6) is employed, in RVE instead this variance is estimated by:

$$ {\mathrm{V}}^{\mathrm{R}}\left(\mathbf{b}\right)={\left({\displaystyle {\sum}_{j=1}^m{\mathbf{X}}_j^{\mathbf{\prime}}{\mathbf{W}}_j{\mathbf{X}}_j}\right)}^{-1}\left({\displaystyle {\sum}_{j=1}^m{\mathbf{X}}_j^{\mathbf{\prime}}{\mathbf{W}}_j{\mathbf{A}}_j{\mathbf{e}}_j{\mathbf{e}}_j^{\mathbf{\prime}}{\mathbf{A}}_j^{\mathbf{\prime}}{\mathbf{W}}_j{\mathbf{X}}_j}\right){\left({\displaystyle {\sum}_{j=1}^m{\mathbf{X}}_j^{\mathbf{\prime}}{\mathbf{W}}_j{\mathbf{X}}_j}\right)}^{-1} $$
(9)

where for study j = 1 … m, X j is the design matrix, W j is the weight matrix, A j is an adjustment matrix, and e j is the estimated residual vector. This adjustment matrix is included to adjust for small-sample bias in the estimator VR(b). When the adjustment matrix follows the form provided in Tipton [20], the robust variance estimator VR(b) is unbiased for the true variance V(b) even with a small number of included studies.

The Role of Weights in RVE

Although any weights can be used with the RVE method, Hedges et al. [22] argue that approximately inverse variance weights will often be desired because they are most efficient. Because the RVE estimator is derived for any weights, using weights other than inverse variance weights will not comprise statistical inferences and will only comprise efficiency. Hedges and colleagues [22] therefore proposed two methods for weighting: hierarchical effects weights and correlated effects weights (and these are the weights that are available as default options in the R package used in this tutorial).

In a hierarchical effects model, inverse variance weights can be used:

$$ {\omega}_{ij}=\frac{1}{\left({v}_{ij}+{\tau}^2+{\omega}^2\right)} $$
(10)

where v ij is the within-study sampling variance for effect size i in study j, τ 2 is the estimate of the between-cluster variance component, and ω 2 is the estimate of the within-cluster between-study variance component. The matrix W j thus has the values w ij on the diagonals and zeros on the off-diagonals. Hedges et al. [22] provide method of moments estimators for both τ 2 and ω 2. Importantly, unlike maximum likelihood estimates, these estimates do not require any distributional assumptions.

In a correlated effects model, Hedges et al. also provide a simple method for estimating inverse variance weights:

$$ {\mathrm{w}}_{ij}=\frac{1}{k_j\left({v}_j+{\tau}^2\right)} $$
(11)

where v j is the mean of the within-study sampling variances (v ij ) for the k j effect sizes in study j, τ 2 is the estimate of the between-study variance component, and k j is the number of effect sizes within each study j. Again, Hedges et al. provide a method of moments estimator of τ 2, which is implemented in the R package used in this tutorial.

Because the choice of weights only affects efficiency in estimates, Tanner-Smith and Tipton [24] recommended that meta-analysts faced with complicated dependence structures (i.e., a mixture of correlated and hierarchical effects) choose weights based on the most prevalent type of dependence. Namely, if most studies provide results from multiple evaluations on different participant samples, but a few also measure multiple outcomes on the same participant samples, then the hierarchical effects weights would be most appropriate. In contrast, if most studies provide the results of a single evaluation with multiple measures on the same participant sample, but a few have multiple cohorts of samples, then the correlated effects weights would be more appropriate.

Adjustment Matrices

In the RVE estimator of the variance (Eq. 9), adjustment matrices A j are found on either side of the estimator. In the original formulation of RVE found in Hedges et al. [22], these adjustments were not included, and the estimator was known to under-estimate the true variance when the number of studies is small (see simulations in [22, 23]). Tipton [20] showed that the bias from inclusion of a small number of studies could be reduced if these adjustment matrices were included. In the case in which correlated effects weights are used, these matrices can be written,

$$ {\mathbf{A}}_{\mathrm{j}}={\left(\mathbf{I}\hbox{--} {\mathbf{H}}_{\mathrm{j}}\right)}^{-1/2} $$
(12)

where H j = X j(XWX)−1 X jW j is the hat matrix for study j. These adjustments differ from study to study and take into account the leverage and influence of each study in the analysis. The adjustments for the hierarchical effects case, as well as more general adjustments, are found in Tipton [20] and follow a similar form.

Hypothesis Tests for RVE Meta-regression Coefficients

In meta-regression, analysts are typically interested in testing whether effect sizes vary in relation to particular covariates or effect size moderators. Hypotheses regarding a single coefficient β k in a meta-regression model (e.g., H0: β k  = 0) can be tested using the robust variance estimator as:

$$ {t}_{k = \frac{\beta_k}{\sqrt{{\mathrm{V}}_{\mathrm{k}}^{\mathrm{R}}}}} $$
(13)

where b k is the estimate of β k and VR k is robust estimate of the variance estimate of b k (i.e., the kth diagonal of the VR(b) variance-covariance matrix). In large samples, under the null hypothesis this t k statistic can be shown to be normally distributed (see [22]). In small to moderate samples, Tipton [20] shows that the distribution of t k is approximately that of a t-distribution, where the degrees of freedom are estimated using a Satterthwaite approximation; this approximation holds as long as the degrees of freedom are greater than or equal to four. When the degrees of freedom are less than four, the approximation does not hold and the type I error can be larger than the stated p value associated with the test.

Similarly, researchers are often interested in multi-parameter tests, including omnibus tests of model fit (e.g., H0: β = 0) and incremental tests (e.g., H0: β 1 = β 2 = 0). Incremental tests are particularly useful in the presence of categorical effect size moderators, whereby multiple dummy or indicator variables might be used to examine the effects of a categorical factor in a meta-regression model (e.g., region; age groups; race composition). In this case, Tipton and Pustejovsky [21] showed that the F test,

$$ {F}_q = \left(\frac{\eta -q+1\ }{\eta q}\right){\left({\mathbf{b}}_q\right)}^{\prime }{\left[{\mathbf{V}}_q^{\mathrm{R}}\right]}^{-1}\left({\mathbf{b}}_q\right) $$
(14)

can be used to test the null hypothesis that β q  = 0, where b q is a vector of estimates for the q coefficients being tested, V q R = VR(b q ) is the sub-matrix of VR(b) associated with the q estimates in b q, and η is a small-sample correction factor that is empirically estimated. Tipton and Pustejovsky also showed that the distribution of F q can be approximated by an F(q, η − q + 1) distribution, where q is the number of parameters or contrasts tested and η is empirically estimated. Like the robust t test described above, this approximation holds even when the number of studies is small (and, unlike the t test, for all values of the degrees of freedom).

Importantly, for both the robust t test and F test, the degrees of freedom depend on both the number of studies in the meta-analysis (but not the number of effect sizes), and also on features of the covariate(s). For example, for the robust t test, when a covariate is balanced (e.g., a dummy variable for sex with 50 % females and 50 % males), the degrees of freedom are close to m − p (where m is the number of studies and p is the number of coefficients in the model), but grow smaller as the degree of balance in the covariate values decreases. Similarly, the degrees of freedom decrease when a covariate has large skew or an extreme leverage point. This relationship between covariate features and degrees of freedom holds for the F test as well, although the multivariate nature of the F test makes it harder to diagnose. Nevertheless, the most important practical issue for meta-analysts using this method is that because the degrees of freedom vary in relation to covariate features (and not just the number of studies), these small-sample corrected hypothesis tests should always be implemented when using RVE (and indeed, these are now the default test statistics in the RVE software used in this tutorial).

Comparisons Between RVE and Multilevel Meta-analysis

Multilevel meta-analysis (MLMA) is another method that can be used to include statistically dependent effect sizes in a meta-analysis [2832]. The MLMA approach decomposes the variance of the effect sizes at each level, and thus explicitly models the dependence structure among effect sizes. In contrast to RVE, MLMA is not agnostic to the dependence structure in the data and is only appropriate in the hierarchical effect case, where primary study participants (level 1) provide one effect size in multiple studies (level 2), which are nested within some larger cluster (level 3). This hierarchical effects model assumes independent sampling errors within clusters, which is a reasonable assumption when there are no overlapping participant samples represented in the effect size estimates. For instance, in a meta-analysis of drug court impact evaluations, an evaluation study might report findings from several consecutive cohorts of drug court participants. In this example, the MLMA approach would be appropriate for handling the dependent effect sizes in this hierarchical structure, and would also be useful for decomposing variance of effect sizes at each level (e.g., between drug court sites, between cohorts within drug court sites). Parameters from these hierarchical effects three-level MLMA models can be estimated using iterative Maximum Likelihood or Restricted Maximum Likelihood algorithms [30, 33].

Because MLMA models assume independent sampling errors within clusters, using MLMA under the correlated effects scenario (i.e., when the same participants provide data on multiple effect sizes) violates the model assumptions of independent sampling errors within level 3 units. Because most developmental criminological meta-analyses with complex data structures will include correlated effects (rather than hierarchical effects), this tutorial will focus solely on the application of the RVE method for correlated effects data (but see [32, 34]).

Benefits and Limitations of RVE

The RVE method permits the inclusion of statistically dependent effect sizes within a meta-analysis, and therefore does not require the analyst to throw away information contained in the effect sizes reported in primary studies (i.e., the type of data loss that would occur if selection criteria were used to select a set of statistically independent effect sizes). Indeed, RVE does not require knowledge of the covariance between the effect sizes within each study, information that is required when applying multivariate meta-analysis methods [35]. Because RVE can be easily implemented using free statistical software (demonstrated later in this tutorial), it is an innovative and attractive method for handling multiple types of complex meta-analytic data structures.

Although RVE is a promising method for handling statistically dependent effect sizes within meta-analyses of criminological research, it also has several limitations that should be noted. The first of these has to do with the treatment of heterogeneity in RVE. Whereas MLMA models heterogeneity at multiple levels of the data, and therefore provides estimates of variance parameters (e.g., τ 2 L2 and τ 2 L3) and corresponding hypothesis tests (i.e., Q, I 2), RVE is simply a method for adjusting standard errors where the primary focus is on estimating fixed effects (e.g., the mean effect size, meta-regression coefficients). Heterogeneity parameters, although reported in RVE, are incidental to the RVE analysis; they are only needed for the estimation of approximately inverse variance weights, and are often estimated via a simplistic method-of-moments estimator. Thus, RVE models are not intended to provide precise variance parameter estimates, nor test null hypotheses regarding heterogeneity parameters. This is an important issue to highlight; many meta-analysts are interested in estimating precise variance parameter estimates and want to conduct hypothesis tests around these variance parameters, but RVE may not be the most appropriate method for that purpose.

Second, the fact that the degrees of freedom used in hypothesis testing depend on the number of studies and features of the covariate means that in some cases, tests of particular moderators may be surprisingly under-powered. This problem arises when the covariate under study is highly skewed (e.g., most of the values are between 10 and 20, with one value of 100) or imbalanced (e.g., 27 studies are on adolescents and 3 are on adults, with a dummy code for age as a moderator). In these cases, the number of studies could be quite large (e.g., more than 40) and yet the degrees of freedom could be quite small (e.g., less than 10).

Aside from issues of power, this can be particularly problematic when the degrees of freedom fall below 4 for t tests, where the t-distribution approximation no longer holds. In these cases, two approaches are recommended. First, if the degrees of freedom are much smaller than the number of studies, the analyst should carefully examine the covariate values, paying attention to leverage points and imbalances. The suggested approach here is to follow the rules of thumb and guidelines for dealing with “unusual” data (e.g., outliers, leverage, influence points) in regression. For example, if most of the covariate values are between 10 and 20, with one “outlying” value of 100, a strategy may be to remove or winsorize this extreme value; doing so will improve not only the degrees of freedom, but may also help with the interpretability of the findings, since this observation could be exerting large influence on the coefficient estimate as well. Second, if the degrees of freedom are very small, a lower p value should be used; for example, if p < 0.05 is used as a threshold elsewhere, for these cases p < 0.01 should be used instead (since the type I error is typically higher than stated in these cases).

Finally, RVE provides a statistically appropriate method for the inclusion of all effect sizes in a meta-analysis, but does not address other obstacles commonly encountered in a meta-analysis. For instance, many primary studies fail to report the information needed to estimate a standardized effect size, which often necessitates requesting additional information from primary study authors (and response rates to these requests can be quite low). Such missing effect size data pose a challenge to any meta-analyst, and the use of RVE may not be warranted if standardized effect sizes are not available from a sufficient number of studies. Similarly, the RVE method does not directly address another common obstacle in meta-analyses: how to handle the synthesis of bivariate and partial correlations. Partial correlations are often available from regression coefficients reported in primary studies, and meta-analysts synthesizing correlations must carefully attend to this issue [36]. The RVE method also does not provide a solution to the “apples and oranges” critique common in meta-analysis. Just as in any other meta-analyses, analysts using RVE should be careful to consider whether the effect sizes included in a meta-analysis are “similar” enough to be synthesized. This is a substantive consideration that must always be guided by content expertise, however, so is not a true limitation of the RVE method itself but rather a potential limitation if meta-analysts carelessly apply the method to their data.

Example Research Questions to Guide the Software Tutorial

To illustrate the application of RVE for handling statistically dependent effect sizes in a correlated effects scenario, this tutorial will be guided by the following research questions: (1) How strongly are school motivation risk factors in childhood and adolescence correlated with subsequent crime and delinquency in adolescence, and (2) How does the crime risk associated with school motivation/attitudes vary across the span of adolescence? This example was selected given its relevance to developmental and life-course criminology and its demonstration of the potential developmental variability in risk factors for crime during adolescence. This example is used solely for pedagogical and demonstration purposes, however, and is not intended to provide a review or synthesis of the most current research evidence. And although these example data use correlation coefficients, the same procedures outlined in this tutorial can be applied to intervention effectiveness meta-analyses that may synthesize standardized mean difference effect sizes, risk ratios, or odds ratios.

Method

Sample

This tutorial uses a meta-analytic database that includes correlations indexing the relationship between school motivation or attitudes (hereafter referred to as school motivation) and a later chargeable delinquent or criminal behavior (hereafter referred to as crime) measured during adolescence and adulthood. This database is a subset from a larger meta-analytic database of studies published through 2002 that used prospective longitudinal panel designs with at least 6 months between measurement waves, which reported the correlation between a risk factor and a subsequent antisocial behavior outcome measured during late adolescence or early adulthood (funded by R01MH051685 and R01MH63288 from the National Institutes of Mental Health to PI Mark Lipsey). To be eligible for inclusion in the larger meta-analytic database, studies could have no more than 50 % attrition between consecutive waves, and must have been conducted in the USA and published in English (see [37]). The subset of studies used in this example includes 113 product–moment correlation coefficients, each of which represent a relationship between a school motivation risk factor measured between ages 11 and 18 and a crime outcome measured between ages 12 and 20. These correlations originated from 17 independent respondent samples from 6 longitudinal studies (see [58-63]). Note that this sample of correlation effect sizes is used solely for pedagogical purposes for this tutorial, however, and is not intended to provide a current synthesis of the literature (for which an updated, systematic literature review would be needed).

Measures

The effect sizes used in this tutorial are Fisher’s z-transformed correlation coefficients representing the longitudinal relationship between a school motivation risk factor measured prior to age 18 and a crime outcome measured at least six months after the initial wave of measurement:

$$ z=0.5\times \ln \left(\frac{1+r}{1 - r}\right) $$
(15)

where r is the product–moment correlation coefficient between school motivation and crime. The variance of the z-transformed correlation effect size is estimated as:

$$ V=\frac{1}{n - 3}\ . $$
(16)

Correlation coefficients were coded directly from study reports or estimated from contingency tables or other summary statistics if necessary. As is recommended when estimating correlation effect sizes from primary studies, artifact adjustments were used to correct for the unreliability of any artificial dichotomous variables used in the correlations for underlying continuous variables [26, 38]. We refer readers interested in guidance on how to calculate effect size estimates from primary studies to more comprehensive sources on that topic [25, 26, 38, 39].

Analytic Strategies

On average, studies reported 6.65 correlation effect sizes (SD = 3.81, Minimum = 2, Maximum = 12) measuring the relationship between a school motivation risk factor and a subsequent crime outcome. We therefore used RVE to synthesize these dependent effect size estimates. First, to estimate the overall strength of the correlation between school motivation risk factors and later crime outcomes, we estimated a simple RVE meta-regression model:

$$ {y}_{ij}={\beta}_0+{u}_j+{e}_{ij} $$
(17)

where y ij is the ith correlation effect size in the jth study; β 0 is the average population effect of the risk-crime correlation; u j is the study level random effect such that Var(u j ) = τ 2 is the between-study variance component; and e ij is the residual for the ith effect size in the jth study. Next, to explore variability in the strength of this risk factor over the span of adolescence, we estimated a mixed-effects RVE meta-regression model:

$$ {y}_{ij}={\beta}_0+{\beta}_1{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_1\right)}_{ij}+{\beta}_2{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_2\right)}_{ij}+{\beta}_3{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_1 \times \mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_2\right)}_{ij}+{u}_j+{e}_{ij} $$
(18)

that included the main effects and multiplicative interaction term for the average age of participants at the initial measurement wave (i.e., the age at which the school motivation risk factor was measured) and the age at the follow-up measurement wave (i.e., the age at which the crime outcome was measured). Because the age of participants could vary both within and between studies, we used study-level mean values for all age moderators (see [24] for a discussion on measuring moderators in RVE models). Finally, to examine whether these effects persisted after adjusting for the sex composition of the samples, we estimated a final model:

$$ {y}_{ij}={\beta}_0+{\beta}_1{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_1\right)}_{ji}+{\beta}_2{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_2\right)}_{ij}+{\beta}_3{\left(\mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_1 \times \mathrm{Age}\ \mathrm{at}\ {\mathrm{T}}_2\right)}_{ij}+{\beta}_4\left(\mathrm{All}\ \mathrm{males}\right)+{\beta}_5\left(\mathrm{Mixed}\ \mathrm{sex}\right)+{u}_j+{e}_{ij} $$
(19)

that included two dummy variables indicating whether the participant sample was comprised entirely of males or was of mixed sex composition (with all females being the reference category). Note that this categorization of the gender composition variable is used solely for pedagogical purposes to demonstrate the examination of a categorical effect size moderator. This illustrative example was selected to highlight how RVE can be used to examine developmental changes in risk factors over time, but this meta-regression could easily be extended to include a range of other effect size moderators (provided sufficient sample sizes for parameter estimation). Because the dependence structure in these data reflected correlated effects (i.e., multiple effect sizes available from the same participant samples), we used the recommended approximately inverse variance weights for correlated effects (Eq. 11).

All analyses were conducted in the R statistical environment (version 3.2.2). To illustrate the standard meta-analysis method that (naively) ignores the dependencies in the data, we used the metafor package [40]. For the RVE method, the robumeta package [41] was used to estimate mean effect sizes and meta-regression models and the clubSandwich package [42] was used to estimate the multi-parameter F tests. The package ggplot2 was used to create the scatter plot [43]. Appendix A provides annotated code using the R Markdown language [44].

Robust Variance Estimation in R

We now provide a tutorial on how to conduct a meta-analysis with RVE in the R statistical environment. Although these methods can also be implemented in SAS and Stata (e.g., see [24, 34]), here we use R because it is freely available from the Comprehensive R Archive Network at http://cran.r-project.org. Appendix A includes the R code that reproduces all analyses demonstrated below; Appendix B (in the only supplementary material) Table 3 (and the online supplementary material) provides a text file containing the data used in all analyses.

Prior to synthesizing effect sizes, we examined the distribution of the effect sizes to assess for potential outliers using a violin plot (not shown, but syntax included in Appendix A). The effect size distribution is approximately normal and right skewed, but there do not appear to be any outlying effect sizes.

We now estimate the overall mean effect size, synthesizing the 113 correlations from the 17 independent study samples. For illustrative purposes, we first estimate a naïve standard meta-analysis model that ignores the dependency structure of the data. This naïve meta-analysis model would never be appropriate to use with such a complex meta-analytic data structure; here we present it simply to illustrate how ignoring dependencies can yield inappropriately small standard errors. As shown in Table 1, the naïve standard meta-analysis (\( \overline{z} \) = 0.12, 95 % CI [0.09, 0.14], τ 2 = 0.01) yields a larger estimate of the average effect size and a narrower confidence interval than the analysis appropriately handling the statistically dependent effect sizes using RVE (\( \overline{z} \) = 0.09, 95 % CI [0.04, 0.15], τ 2 = 0.01). Thus, these results suggest that school motivation has a small but significant association with crime during adolescence. Transforming this mean Fisher’s z-transformed effect size into a Pearson correlation (e2*0.09 − 1)/(e2*0.09 + 1) yields a mean correlation of 0.09. Although informative for estimating the overall correlation between school motivation and adolescent criminal behavior, this overall mean correlation estimate ignores any potential developmental variability in this risk-crime relationship. Thus, we next estimate a meta-regression model with RVE that models the potential variability in this risk-crime correlation across the span of adolescence.

Table 1 Mean effect size estimates from standard meta-analysis and RVE meta-analysis

Table 2 presents the results from the RVE meta-regressions that model the potential developmental variability in the risk of adolescent crime associated with school motivation. The left panel presents results from a model that estimates the main effects of participant ages at time 1 (when school motivation risk was measured) and time 2 (when the crime outcome was measured). Overall, the results provide no evidence that the risk-crime correlation varies depending on the average ages of participants at the different measurements waves. The omnibus F test [21] provides no evidence that the age covariates are jointly significant (F = 1.54, df = 3.11, p = 0.34). This model does not appropriately parameterize the potential developmental variability in risk, however, because it does not include the multiplicative interaction term between the two age variables. Namely, the strength of school motivation as a risk factor at age 12 may vary for crime at age 14, 16, 18, etc.; thus, an interaction term is needed to model that potential variability in risk. Furthermore, the small-sample corrected degrees of freedom for the age at time 2 parameter in the main effects model is less than four (df = 1.92) and thus the p value associated with the hypothesis test is likely to under-estimate the true type I error; because the test is not significant, however, taking this into account does not change the conclusion [20]. Thus, we next estimated a full model that included both the main effects and interaction terms to examine the potential developmental variability in crime risk associated with school motivation.

Table 2 Results from RVE meta-regression modeling developmental variability in risk-crime correlation (n = 113, k = 17)

The middle panel of Table 2 presents results from this full model. In this model, all of the small-sample corrected degrees of freedom are in an acceptable range (i.e., greater than four), indicating that the p values for the associated t tests accurately reflect the type I error. The results indicate that there is a significant interaction between age at time 1 and age at time 2 (b = −0.01, 95 % CI [−0.02, −0.00]). Although the omnibus F test provides no evidence of joint significance across all covariates (F = 1.83, df = 3.03, p = 0.32), this is likely due to the small number of studies and the imbalance in covariate values. Plotting the results from this interaction in Fig. 1, we see that the magnitude of the correlation between school motivation and criminal behavior generally decreases over the course of adolescence (evidenced by the negative slopes of the regression lines). However, school motivation around age 17 has the strongest correlation with crime during late adolescence and early adulthood (ages 18–20). Although the magnitude of these risk-crime correlations is somewhat small in magnitude, these results nonetheless suggest that school motivation may be an appropriate target for delinquency prevention and intervention programs, and also suggest that the most developmentally appropriate time to target school motivation risk factors may be prior to age 16.

Fig. 1
figure 1

Developmental variation in the correlation between school motivation risk and subsequent criminal behavior

The right panel of Table 2 includes two additional dummy variables representing the sex composition of the samples in the meta-analysis, contrasting samples that were all male, mixed sex, and all female (the reference category). The results from this model suggest that the developmental variability in the crime risk associated with school motivation is still statistically significant (b = −0.01, 95 % CI [−0.03, −0.01]) even after adjusting for the sex composition of the primary study samples. Neither of the regression coefficients for the sex composition dummy variables are statistically significant, so it is no surprise that the joint F test regarding the effect of gender composition (i.e., the two dummy variables) is non-significant as well (F = 0.14, df = 6.51, p = 0.87). And similar to the prior model, the omnibus F test provides no evidence of joint significance across all covariates (F = 1.05, df = 3.08, p = 0.52).

Discussion

Meta-analyses in the field of developmental and life-course criminology will often encounter complex meta-analytic data structures with statistically dependent effect size estimates. Although statistically dependent effect sizes would ideally be synthesized via multivariate meta-analysis, that method is often not feasible to implement given that it requires knowledge about the covariance structure of the effect size estimates. Historically, therefore, meta-analysts have been forced to use less optimal methods for handling statistically dependent effect sizes, such as selecting one effect size per study or creating a single synthetic effect size per study, thus ultimately throwing away potentially valuable information from primary studies (e.g., [911, 45]).

RVE for meta-regression is a recent innovative technique that can be used to handle statistically dependent effect sizes in a meta-analysis, but does not require information about the effect size covariance structure. Although the RVE method has several limitations (e.g., it does not model variance at all levels of the data, does not provide tests of null hypotheses regarding heterogeneity parameters; see Benefits and Limitations of RVE), the method is easy to implement and can also be used to handle different types of dependency structures. This software tutorial demonstrated how to use RVE in the R statistical environment, providing example code for future researchers to adapt for their own meta-analyses. Using an example meta-analysis examining the crime risk associated with adolescents’ school motivation and attitudes, we illustrated how RVE can be used to address important research synthesis questions regarding the developmental variability in risk factors. Syntheses like these can be useful for identifying risk and protective factors that may be appropriate targets for crime prevention and intervention programs, establishing the most developmentally appropriate time to address these risk factors, and advancing developmental theories of crime etiology.

Although this tutorial illustrated the use of RVE in a meta-analysis with a correlated effects dependency structure, this method is also appropriate for meta-analyses with hierarchical effects dependency structures or a combination of dependence structures (see [24] for a tutorial). Further, although the data used in this tutorial were from a meta-analysis where the effect size metric was a correlation coefficient, the same procedures can be used in meta-analyses using other effect size metrics such as mean differences, standardized mean differences, odds ratios, risk ratios, risk differences, or phi coefficients. Indeed, the example code in this software tutorial can be easily translated to meta-analyses of program or policy effectiveness, such as those examining the effects of neighborhood watch on crime [46], street-level drug law enforcement [47], or batterer intervention programs [48].

This tutorial demonstrated the application of RVE in the R statistical environment (version 3.2.1) using the robumeta [41] and clubSandwich [42] packages. The tutorial illustrates the ease with which the RVE method can be implemented in R, with the additional advantage that R is freely available to all researchers. Meta-analysts interested in implementing RVE with other statistical platforms such as SAS, SPSS, or Stata should consult other resources (e.g., [24, 34]). Importantly, analysts should pay attention to the implementation of RVE in these packages, noting if corrections for small numbers of studies have been implemented before proceeding. Currently, the Stata macro (also called “robumeta”) includes small sample adjustments for t tests but not F tests, whereas the SPSS and SAS macros do not include these adjustments.

Complex meta-analytic data structures with statistically dependent effect size will become more common in criminology as the body of primary studies continues to accumulate, and as the focus of meta-analyses shifts away from estimation of mean effect sizes to the examination of variability in effect sizes. Indeed, the RVE method for handling dependent effect sizes is already being adopted by researchers in the fields of ecology [49, 50], education [51], mental health [52], pediatrics [53, 54], psychology [55], psychiatry [56], and substance abuse [57]. The uptake in RVE is not surprising given that this method obviates the loss of information that occurs when meta-analysts artificially create a set of statistically independent effect sizes, either through selection processes or averaging effect sizes within studies.

This tutorial is intended to provide a guide for criminologists interested in applying the RVE method to their complex meta-analytic data structures. This method will be particularly useful for meta-analysts interested in modeling the developmental graduations in risk factors for crime (e.g., what are the age-crime curves for different risk factors), examining the effectiveness of crime prevention programs during key life-course transitions (e.g., are residential mobility programs effective in reducing juvenile delinquency?), and other developmental dimensions of crime across the life-course.