Meta-analysis is a technique for summarizing the results of previous studies. It is widely used in many disciplines, including medical studies, marketing research, physics, and psychology. The basic idea of meta-analysis is simple: Studies of the same constructs—for example, the correlation between intention and behavior—do not always reach the same conclusions. The sample correlations and the results of significance tests usually vary across studies. This variation may be due to a variety of factors, such as measurement errors, the scales used, or the sample characteristics. In the popular random-effects model (Borenstein, Hedges, Higgins, & Rothstein, 2010), variation in effect sizes is usually explained by moderators, such as population characteristics or the scales used. Following this general idea, various meta-analysis techniques have been proposed (e.g., Hedges & Olkin, 1985; Hedges & Vevea, 1998; Hunter & Schmidt, 2004).

Although the idea is simple, several practical problems affect the application of meta-analyses. One is dependence among the effect sizes in univariate meta-analysis. In primary studies, researchers can usually monitor the data collection process to prevent the occurrence of undesirable dependent data, such as when two questionnaires completed by the same person are erroneously treated as data from different persons. Dependence is desirable in some designs, such as within-subjects designs, in which the change in individual scores is studied across time. This kind of dependence by design can be handled easily in primary studies by techniques such as within-subjects analysis of variance. However, in meta-analysis, dependence usually occurs irregularly. For example, in a collection of studies investigating the correlation between intention and behavior, some researchers may have investigated the intention–behavior relationship across several forms of behavior (e.g., Madden, Ellen, & Ajzen, 1992). Most common statistical techniques used in meta-analysis assume independent effect sizes, and using these techniques to summarize effect sizes when some of them are dependent will lead to biased estimates of the degree of heterogeneity (Cheung & Chan, 2004, 2008; Martinussen & Bjørnstad, 1999; Viswesvaran, Sanchez, & Fisher, 1999). Just as we cannot treat two questionnaires completed by the same person as though they were completed by two different persons, the presence of dependent effect sizes cannot be ignored in meta-analysis. Several techniques have been proposed to handle dependent effect sizes in meta-analysis. We will briefly discuss some of them below, but our aim is not to provide a comprehensive review of all such techniques (see Cooper, 2009, for a review). Instead, we will focus on a pair of related techniques that are easy to apply and have been empirically demonstrated to perform satisfactorily, but that have rarely been applied in meta-analyses. We will also compare these techniques with three-level meta-analysis, because neither type of analysis requires knowledge of the inter-effect-size covariances. In the present article, we first briefly present the procedures proposed by Cheung and Chan (2004) to handle dependent effect sizes. We then introduce SPSS syntax and an R script and provide an example to illustrate how to use the selected procedures to conduct meta-analyses with dependent correlations. We selected correlations as the effect size measure, although in principle the procedures could be applied, with modifications, to other effect size measures without loss of generality.

Existing procedures

Three major approaches have been developed to address the problem of dependent effect sizes. The first approach, which we label the analytic approach, takes into account the inter-effect-size covariances in a meta-analysis and does not assume that all of the effect sizes are independent (i.e., it models the dependence, as described by Van den Noortgate, López-López, Marín-Martínez, & Sánchez-Meca, 2013). For example, Becker's (1992) generalized least squares meta-analysis (see also Raudenbush, Becker, & Kalaian, 1988) and the procedures reviewed by Gleser and Olkin (2009) can model inter-effect-size covariances appropriately. (Most of these procedures were originally proposed to handle multivariate meta-analysis, but they can be extended to univariate meta-analysis with dependent data.) Robust variance estimation can also be classified under this approach (Hedges, Tipton, & Johnson, 2010), although the inter-effect-size covariances are not estimated directly. The second approach is the three-level meta-analytic model (Cheung, 2013c; Konstantopoulos, 2011; Van den Noortgate et al., 2013; also called multilevel modeling in some previous studies), which has been used to take into account dependence within clusters of effect sizes (Marsh, Bornmann, Mutz, Daniel, & O'Mara, 2009; Van den Noortgate et al., 2013). It is an extension of the two-level meta-analytic model (called the v-known model in Raudenbush & Bryk, 2002), with clusters of dependent effect sizes as the second-level units. In the third approach, the samplewise approach, the average of each set of dependent effect sizes from the same sample is computed and denoted as the within-sample mean (Hunter & Schmidt, 2004). These within-sample means are then used, along with the other, independent effect sizes, in subsequent analyses using the common meta-analytic techniques. All of the available effect sizes are used, and the data points to be meta-analyzed are independent. This approach is easy to apply and can be used regardless of the meta-analytic procedures adopted for the main analysis (e.g., Baltes, Briggs, Huff, Wright, & Neuman, 1999; Huffcutt, Conway, Roth, & Stone, 2001).

Each of these three approaches has its own advantages and disadvantages. The analytic approach requires the inter-effect-size covariance to be known between each pair of dependent effect sizes in a sample. If these covariances are accurately estimated, then the dependence is modeled appropriately. However, these covariances, although they can be computed if the full correlation matrices for all of the variables in a sample are available, are rarely reported in published studies. This makes the analytic approach, though theoretically valid, difficult to apply in practice.

As compared to the analytic approach, the three-level meta-analytic model can take into account the clustering of effect sizes without the need to know the inter-effect-size covariances. It also allows for the estimation of within-cluster heterogeneity (i.e., the Level 2 heterogeneity; Cheung, 2013c), which in the analytic approach is assumed to be equal to the Level 3 heterogeneity. The Level 2 heterogeneity can be useful when the effect sizes within clusters vary on the same dimensions, such as different measures of the same construct (Cheung, 2013c). However, the usual formulation of the three-level meta-analytic model assumes that the residuals of the within-cluster effect sizes are uncorrelated. This assumption is unlikely to be true when the effect sizes come from the same samples, share a common comparison group, or involve overlapping variables. The analytic approach, such as the generalized least squares model (Becker, 1992), takes these correlations into account. In principle, the three-level meta-analytic model can accommodate the correlations between the residuals within a cluster by modifying the model such that the residuals are orthogonal (see Kalaian & Raudenbush, 1996, for an example of this transformation for multivariate meta-analysis using two-level meta-analysis). However, this requires estimation of the inter-effect-size covariances, as in the analytic approach. Moreover, in common meta-analysis scenarios, only some of the samples contribute more than one effect size, whereas many samples contribute only one effect size each. In other words, many Level 2 clusters have only one effect size. The performance of three-level meta-analysis in this kind of situation has yet to be examined empirically (see Van den Noortgate et al., 2013, for findings on the performance of this approach when all of the studies contribute the same number of effect sizes).

The common techniques of the samplewise approach assume that the inter-effect-size correlations are either all zeros or all ones (see below for an illustration). Unlike the analytic approach and the three-level meta-analytic model, the samplewise approach is not a complete framework for meta-analysis. Instead, it is an approach to “preprocess” the dependent effect sizes for meta-analysis. This preprocessing is very simple to carry out and can be applied regardless of which meta-analysis approach is adopted. This is important, because a researcher may have legitimate reasons to choose a framework other than the two- or three-level meta-analytic models, or the generalized least squares model, for a meta-analysis. For instance, using the three-level meta-analysis to handle the dependent effect sizes will force researchers to adopt this framework for the whole meta-analysis. Moreover, the samplewise approach does not require knowledge of the inter-effect-size covariances within a sample. However, ignoring the covariances leads to biased estimates of the sampling variance and of the confidence interval of the within-sample mean effect size (Cheung & Chan, 2004; Hunter & Schmidt, 2004). By analyzing the within-sample mean instead of all individual effect sizes, this approach also assumes that the within-sample effect sizes are homogeneous (i.e., that the Level 2 variance is zero, in three-level meta-analysis terminology). This assumption is tenable in some situations, such as when the effect sizes within a sample are conceptually similar or are just repeated measures of the same relationships. However, if the Level 2 heterogeneity is large, the samplewise approach may lead to incorrect results. If the population effect size varies across studies, then the estimated degree of heterogeneity will also be biased (Cheung & Chan, 2004; Hunter & Schmidt, 2004). This last problem is serious, because estimating and explaining the degree of heterogeneity are two of the main goals of meta-analysis (Hunter & Schmidt, 2004).

In sum, none of the three aforementioned approaches is unconditionally better than the others. The analytic approach is theoretically sound but usually cannot be applied, due to insufficient information. Three-level meta-analysis is an attractive framework that not only handles clustered effect sizes, but also allows for the inclusion of predictors to explain the heterogeneity and for the estimation of Level 2 and Level 3 heterogeneity. However, its assumption of independent residuals within a cluster is usually violated, due to the presence of within-cluster covariances among the residuals, which are modeled in the analytic approach. The samplewise procedure is an easy-to-apply method to preprocess the dependent effect sizes, and is therefore not restricted to any particular framework of meta-analysis. Nevertheless, this approach can lead to biased results, because it ignores the within-sample covariances.

If a meta-analyst decides to adopt a framework other than the three-level meta-analytic model, such as the approach of Hunter and Schmidt (2004), and judges that the within-sample heterogeneity is negligible relative to the between-sample heterogeneity, then the samplewise approach is a viable option. However, the aforementioned potential biases have to be reduced. In the next section, we review two improved versions of the existing samplewise procedures.

Samplewise-adjusted procedures

In line with previous studies, in the following discussion we use correlation coefficients as the effect size measure, although most of the arguments and derivations are not restricted to correlation coefficients. One cause of bias in the samplewise approach is the use of an inappropriate sample size to weight each within-sample mean. Suppose that three intention–behavior correlations are found in the same sample of 100 cases. The within-sample mean of these three sample correlations will be used in the meta-analysis. There are two convenient choices of sample size for this average: the original sample size (100), or the original sample size multiplied by the number of effect sizes (300; Hunter & Schmidt, 2004, chap. 10). We denote the procedure using the first choice as the samplewise-n procedure, and the procedure using the second choice as the samplewise-np procedure. The original sample size seems to be the natural choice because it is the actual number of subjects. Yet the sample size multiplied by the number of effect sizes also seems appropriate, because reporting more correlations provides more information than reporting only one effect size. However, Hunter and Schmidt argued that the first choice (100 in the example) tends to overestimate the sampling error, because the within-sample mean tends to have less sampling error than any single correlation from the sample. The second choice (300 in the example) tends to underestimate the sampling error, because the averaged correlations are assumed to be independent when they are not. Cheung and Chan (2004, 2008) proposed two modifications of the samplewise procedure, which are presented in the next section.

The two samplewise-adjusted procedures proposed by Cheung and Chan (2004, 2008) are the samplewise-adjusted-individual and samplewise-adjusted-weighted procedures. As we discussed above, for a sample with n cases and p correlations, neither n nor np is usually the appropriate sample size with which to weight the within-sample mean. The appropriate sample size, which depends on the degree of dependence among the p correlations, is a value between n and np. The goal of the samplewise-adjusted procedures is to find an adjusted sample size that more accurately reflects the sampling variability of the within-sample mean than do the samplewise-n and -np procedures. Once these effective sample sizes are determined for the within-sample means, common meta-analytic techniques can be applied as usual. The samplewise-adjusted-individual procedure is outlined briefly below. For the technical details, please refer to Cheung and Chan (2004, 2008).

The samplewise-adjusted-individual procedure basically involves two steps. For each sample with multiple correlations, the average degree of dependence of the correlations is estimated. The sampling variance for the within-sample mean, which takes into account the degree of dependence, is then computed. In the first step, the average inter-effect-size correlation \( \left({b}_i\right) \) for the ith sample with multiple effect sizes is estimated from the average theoretical sampling variance for this sample \( \left({\sigma}_{ei}\right) \) and the observed variance \( \left({S}_{ESi}^2\right) \) of the dependent effect sizes (Cheung & Chan, 2008) using the following equation,

$$ {\widehat{b}}_i=1-\frac{S_{ESi}^2}{\sigma_{ei}}. $$

The logic is that, if the multiple effect sizes are independent, the observed variance and the average theoretical sampling variance are expected to be equal, and \( {\widehat{b}}_i \) is expected to be zero. To the extent that the multiple effect sizes are dependent, the observed variance is expected to be lower than the average theoretical sampling variance, and \( {\widehat{b}}_i \) is expected to be greater than zero. If the effect sizes are correlations, then \( b_i \) can be estimated by

$$ {\widehat{b}}_i=1-\frac{\left({\displaystyle \sum}_{j=1}^{p_i}{\left({r}_{ij}-{\overline{r}}_i\right)}^2\right)/\left({p}_i-1\right)}{{\left(1-{\overline{r}}_i^2\right)}^2/\left({n}_i-1\right)}, $$

where \( p_i \) is the number of correlations, \( r_{ij} \) is the jth correlation, \( {\overline{r}}_i \) is the within-sample mean correlation, and \( n_i \) is the sample size, all for the ith sample. In their computer simulation studies, Cheung and Chan (2004, Eq. 9) followed the logic of the Hunter–Schmidt approach and used the overall mean correlation \( \left(\overline{r}\right) \), rather than the within-sample mean, to estimate the average sampling variance, so that \( b_i \) is estimated by

$$ {\widehat{b}}_i=1-\frac{\left({\displaystyle \sum}_{j=1}^{p_i}{\left({r}_{ij}-{\overline{r}}_i\right)}^2\right)/\left({p}_i-1\right)}{{\left(1-{\overline{r}}^2\right)}^2/\left({n}_i-1\right)}. $$

Cheung and Chan (2004, 2008) demonstrated empirically that this samplewise-adjusted procedure performed better than the samplewise-n and samplewise-np (labeled samplewise-PN in Cheung & Chan, 2004) procedures, even when the theoretical sampling variance and the degree of dependence varied across and within samples. Therefore, for a sample with more than two correlations, and hence probably with different degrees of dependence between different pairs of correlations, the estimated average degree of dependence \( \left({\widehat{b}}_i\right) \) is more appropriate for the second step of the samplewise-adjusted-individual procedure than the estimated degree of dependence between each individual pair of correlations in a sample.
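To make the first step concrete, the following minimal R sketch (our own illustration; the function name is hypothetical and not part of the script introduced later) computes \( {\widehat{b}}_i \) for a single sample of correlations:

# Estimate the average inter-effect-size correlation (b-hat) for one sample:
# one minus the ratio of the observed variance of the p correlations to their
# average theoretical sampling variance (Cheung & Chan, 2008).
estimate.b <- function(r, n) {
  r.bar <- mean(r)                      # within-sample mean correlation
  s2.es <- var(r)                       # observed variance of the correlations
  sigma.e <- (1 - r.bar^2)^2 / (n - 1)  # theoretical sampling variance of r
  1 - s2.es / sigma.e
}

estimate.b(r = c(.40, .45, .50), n = 100)  # about .61

Note that the estimate can be negative when the observed correlations are more variable than sampling error alone would imply.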

In the second step, the sampling variance of the within-sample mean is estimated by \( {\sigma}_{ei}{C}_i \), where \( C_i \) is given by

$$ {C}_i=1-\frac{\left({p}_i-1\right){S}_{ESi}^2}{p_i{\sigma}_{ei}}=\frac{1+\left({p}_i-1\right){\widehat{b}}_i}{p_i}. $$

If the effect sizes are correlations, the adjusted sample size can be computed by

$$ {n}_i^{\prime }=\left({n}_i-1\right)/{C}_i+1=\left({n}_i-1\right){A}_i+1,\kern0.5em \mathrm{where}\kern0.5em {A}_i=1/{C}_i. $$

The adjustment factor \( A_i \) is introduced to illustrate that the procedure essentially adjusts the sample size, finding the appropriate weight that lies between \( n_i \) (when \( {\widehat{b}}_i=1 \)) and \( {n}_i{p}_i \) (when \( {\widehat{b}}_i=0 \)). If \( n_i \) is large—that is, \( \left({n}_i-1\right)/{n}_i\approx 1 \)—then \( \left({n}_i-1\right){A}_i+1\approx {n}_i{A}_i \). The adjusted sample sizes for the correlations are computed this way so that they can be substituted into the usual formula for the sampling variance of a sample correlation, \( {\left(1-{r}^2\right)}^2/\left({n}_i-1\right) \), which subtracts one from the sample size. Therefore, the programs used for the meta-analysis of the correlations do not need to know which correlations are within-sample means. The adjusted sample size or the adjusted sampling variance \( \left({\sigma}_{ei}{C}_i\right) \) can be used directly as the input.
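Continuing the sketch above, the second step can also be expressed in a few lines of R (again our own illustration, reusing the hypothetical estimate.b):

# Step 2 of the samplewise-adjusted-individual procedure: convert b-hat into
# the adjustment factor A_i, the adjusted sample size n_i', and the adjusted
# sampling variance of the within-sample mean (equal to sigma_ei * C_i).
adjust.individual <- function(r, n) {
  p <- length(r)
  b.hat <- estimate.b(r, n)
  C <- (1 + (p - 1) * b.hat) / p         # C_i
  A <- 1 / C                             # adjustment factor A_i
  n.adj <- (n - 1) * A + 1               # adjusted sample size n_i'
  r.bar <- mean(r)
  svar <- (1 - r.bar^2)^2 / (n.adj - 1)  # adjusted sampling variance
  c(b.hat = b.hat, A = A, n.adj = n.adj, svar = svar)
}

adjust.individual(r = c(.40, .45, .50), n = 100)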

After the adjusted sample sizes of all of the samples with multiple effect sizes have been computed, the within-sample means can be treated like the other, independent correlations: Their adjusted sample sizes, and the usual sample sizes of the independent correlations, are used to compute the weights and sampling variances. For example, if a meta-analysis has five independent correlations and five dependent correlations from the same sample, the samplewise-adjusted procedure yields six independent correlations, one of which is the within-sample mean of the five dependent correlations, with an adjusted sample size calculated by the aforementioned procedure. Meta-analytic procedures and programs for independent effect sizes can then be used to analyze the independent correlations and the within-sample mean correlations without additional treatment.

The samplewise-adjusted-weighted procedure is similar to the samplewise-adjusted-individual procedure, except that a sample-size-weighted average of \( {\widehat{b}}_i \) across all of the samples with multiple effect sizes is computed, and this weighted average \( \left(\overline{\widehat{b}}\right) \) is used to compute the adjustment factor and adjusted sample sizes for all of the multiple-effect-size samples. Cheung and Chan (2008) proposed this procedure following the logic of the Hunter–Schmidt approach, which suggests that the mean correlation is more stable and therefore more suitable for estimating the sampling variance. They demonstrated that this procedure is less biased than the adjusted-individual procedure in estimating the degree of heterogeneity when the degree of dependence varies across samples.
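A minimal sketch of the weighted variant, under the same assumptions as the code above:

# Samplewise-adjusted-weighted: pool b-hat across all multiple-effect-size
# samples, weighting by sample size, and use the pooled value for every sample.
samples <- list(list(r = c(.40, .45, .50), n = 100),
                list(r = c(.48, .52),      n = 250))
b.each <- sapply(samples, function(s) estimate.b(s$r, s$n))
n.each <- sapply(samples, function(s) s$n)
b.bar <- sum(n.each * b.each) / sum(n.each)  # sample-size-weighted mean of b-hat

# b.bar replaces the sample-specific b-hat in the adjustment factor:
A.each <- sapply(samples, function(s) length(s$r) / (1 + (length(s$r) - 1) * b.bar))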

The samplewise-adjusted procedures are similar to the procedures of the analytic approach in that they also take into consideration the degree of dependence, but they estimate it from the within-sample variation. Unlike most procedures in the analytic approach, the samplewise-adjusted procedures do not require the individual inter-effect-size covariances. Moreover, researchers can easily combine these procedures with most other specific meta-analytic procedures (e.g., Hunter–Schmidt, DerSimonian–Laird, or meta-regression by weighted least squares). Cheung and Chan (2004, 2008) have demonstrated empirically that the samplewise-adjusted procedures are less biased than the unadjusted samplewise procedures in estimating the degree of heterogeneity under various conditions.

A preliminary simulation study comparing the samplewise-adjusted procedures and three-level meta-analysis

To the best of our knowledge, no studies have empirically compared the samplewise-adjusted procedures with the other approaches presented above for handling dependent effect sizes. A preliminary empirical study may provide some insights for users considering which approach to adopt. Therefore, we conducted a small-scale simulation study to compare the three-level approach with the samplewise-adjusted procedures. This study cannot be treated as a comprehensive simulation. Nevertheless, we hope that it can serve as a starting point for future simulation studies comparing the performance of the existing approaches to handling dependent effect sizes under different conditions.

Due to space limits and the scope of the present article, we specifically investigated conditions that we expected would lead the samplewise-adjusted procedures and the three-level meta-analytic model to yield different results. We investigated conditions similar to the skewed distribution of effect sizes in Cheung and Chan (2004). For a condition with 12 studies and a maximum of ten effect sizes, the numbers of effect sizes in the studies were 10, 8, 6, 4, 2, 1, 1, 1, 1, 1, 1, and 1. Although this kind of distribution, with most studies contributing one effect size, is common in practice, it has rarely been investigated in previous studies of the three-level meta-analytic model. We also investigated low and high levels of within-sample dependence, \( R_{rr} \) = .30 and .70, where \( R_{rr} \) is the correlation between two effect sizes in the same sample. The numbers of studies investigated were 12 and 60, and the mean sample sizes were 100 and 300, the same conditions investigated in Cheung and Chan (2008). We did not expect the samplewise-adjusted procedures to perform well with as few as 12 studies. However, investigating this end of the continuum could give us an idea of the relationship between the number of studies and the performance of these procedures. To benchmark the performance of the samplewise-adjusted procedures, we also included a procedure that we called R1, in which we randomly selected one effect size from each cluster of dependent effect sizes. This procedure does not violate the assumption of independence, and it shows the performance of the same meta-analytic procedure with the same numbers of studies and mean sample sizes (although with fewer effect sizes). The samplewise-adjusted procedures were considered to have performed well in terms of bias and confidence interval coverage if their performance was similar to that of the R1 procedure; their statistical power (not investigated here) should be higher than that of the R1 procedure, due to the use of all available information. In line with previous studies of the samplewise procedures, the correlation coefficient was chosen as the effect size measure in the simulation study. To be consistent with the assumption of the samplewise procedures, there was no within-sample heterogeneity in the population effect size. Consequently, the Level 2 heterogeneity was constrained to be zero in the three-level meta-analytic model. Other technical details of the simulation are described in Appendix A.
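As an illustration of the benchmark, the R1 selection step can be written in a few lines of base R (our own sketch; dat is assumed to hold one row per correlation, with a SampleID column identifying the sample):

# R1 benchmark: retain one randomly chosen effect size per sample, so that
# the retained effect sizes are independent by construction.
set.seed(1)
r1.dat <- do.call(rbind, lapply(split(dat, dat$SampleID),
                                function(d) d[sample(nrow(d), 1), ]))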

As we discussed above, one problem with the three-level meta-analytic model, as compared to the samplewise-adjusted procedures, is that it ignores within-sample covariance that is due to situations such as overlapping variables or repeated measures. In cases involving small covariance (e.g., \( R_{rr} \) = .30 or \( R_{XX} \) = .55), we expected that violating this assumption would have little impact on the three-level meta-analysis, and that the samplewise-adjusted procedures would have little advantage over three-level meta-analysis. In cases of larger within-sample covariance (e.g., \( R_{rr} \) = .70 or \( R_{XX} \) = .84), we expected the three-level meta-analytic model to produce a biased estimate of the degree of heterogeneity (the Level 3 variance) and to yield suboptimal coverage probabilities for the confidence interval. The samplewise-adjusted procedures were expected to be less biased when the within-sample covariance was large, as had been found in Cheung and Chan (2004, 2008). The differences between the two approaches were expected to remain similar as the sample size and number of studies increased.

As is shown in Fig. 1, when the within-sample covariance was small, all four procedures had similar performance, except when the number of studies was also small, in which case the R1 procedure and the samplewise-adjusted procedures tended to underestimate the degree of heterogeneity. When the within-sample covariance was large, the samplewise-adjusted procedures generally performed as well as the three-level meta-analysis, except when the number of studies was small, in which case the three-level meta-analyses tended to overestimate the degree of heterogeneity. As is shown in Fig. 2, across the conditions examined, the samplewise procedures and the three-level meta-analytic model generally had coverage probabilities close to the nominal value of 95 % for the confidence interval of the degree of heterogeneity. However, the coverage probabilities for three-level meta-analysis were substantially lower than the nominal value when the degree of within-sample covariance was high. Interestingly, this problem seemed to be more severe when, for the same number of studies, the total number of effect sizes was larger (i.e., \( {k}_e \) = 10 vs. \( {k}_e \) = 20).

Fig. 1 Bias in estimating the degree of heterogeneity (\( {\tau}^2 \))

Fig. 2 Proportions of 95 % confidence intervals that included the population degree of heterogeneity (\( {\tau}^2 \))

In sum, when the dependence among effect sizes is high and is due to situations such as overlapping variables or repeated measures, the three-level meta-analytic model can produce biased estimates of the degree of heterogeneity and suboptimal confidence intervals. Provided that the within-sample effect sizes can be considered homogeneous, the samplewise-adjusted procedures perform as well as, or even better than, three-level meta-analysis. When the number of studies is small (e.g., 12 or fewer), both the samplewise-adjusted procedures and three-level meta-analysis should be used with caution.

Problems with samplewise-adjusted procedures

Despite their advantages, the samplewise-adjusted procedures have not been widely used, probably because of the lack of a ready-to-use program implementing them. Currently, researchers who want to apply the samplewise-adjusted procedures must write their own programs or prepare their own electronic spreadsheets. In the following section, we present two tools, an SPSS macro and an R script, that can be used to compute the within-sample means, as well as the adjusted sample sizes and the sampling variances for these means. With these adjusted sample sizes and sampling variances, existing meta-analysis programs can be used as usual for the main analysis.

Two tools for implementing the samplewise procedures

In this section, we give a general description of the two tools for implementing the samplewise procedures. Both the R function and the SPSS macro command accept a data file (called a data frame in R) containing an effect level data set as the input. In an effect level data set, each row represents one correlation and its associated sample size. Correlations from the same sample are identified by a unique identifier assigned to each sample (denoted as the Sample ID below). A sample-level data set is then formed by computing the within-sample mean correlation for each sample with more than one correlation. Regardless of the total number of correlations, the number of rows in the sample-level data set equals the number of independent samples. In the next step, the four samplewise procedures are used to generate, for each procedure, the adjustment factor \( \left({A}_i\right) \), the adjusted sample size \( \left({n}_i^{\prime}\right) \), and the adjusted sampling variance \( \left({\sigma}_{ei}/{A}_i\right) \). For a sample with only one correlation, \( A_i \) = 1, and the sample size and the sampling variance will be the same in all four procedures. For the samplewise-n procedure, \( A_i \) = 1 for all of the samples, regardless of the number of correlations in a sample. For the samplewise-np procedure, \( A_i \) = \( p_i \) for all of the samples, so the adjusted sample size is approximately equal to \( {n}_i{p}_i \) (not exactly \( {n}_i{p}_i \), for the reasons mentioned above). For the two samplewise-adjusted procedures, \( A_i \) is determined by the estimated within-sample correlation \( \left({\widehat{b}}_i\right) \). The samplewise-adjusted-individual procedure computes the adjustment factor of a sample on the basis of the estimated within-sample correlation of that sample. Therefore, even if two samples have the same sample size and the same number of correlations, their adjustment factors may differ. In contrast, the samplewise-adjusted-weighted procedure computes a weighted average of the estimated degrees of dependence and uses it to compute the adjustment factor for all of the samples with more than one correlation. Therefore, if two samples have the same sample size and the same number of correlations, their adjustment factors will be the same. The column or variable names of the output of these four procedures, as generated by the SPSS macro and R function, are presented in Appendix B for quick reference.

Researchers can then use the sample-level data set and the adjusted sample sizes or sampling variances for further analysis. For the analysis of moderators, users can merge the sample-level data set with the original data set. Many common meta-analysis software packages accept the correlations and the sampling variances as input (e.g., the metafor package for R; Viechtbauer, 2010). However, some programs may require correlations and sample sizes as input before they can compute the sampling variances for the users. Some programs require the user-supplied weight for each correlation, which is simply the inverse of the sampling variance (Wilson, 2005). With both pieces of information available, users can choose the software package they prefer.
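For instance, assuming that metafor is installed, that the sample identifier column is named SampleID, and that the effect size and sampling variance columns follow the names used in the numerical example below (es.mean and es.mean.svar; the moderators data frame is hypothetical), the merge and the subsequent analysis might look like this:

# Merge the sample-level results with study-level moderators by sample ID,
# then fit a random-effects model; rma() takes the effect sizes (yi) and the
# sampling variances (vi) directly and uses inverse-variance weights.
library(metafor)
sample.level <- merge(ma.example.1$SampleLevel, moderators, by = "SampleID")
fit <- rma(yi = es.mean, vi = es.mean.svar, data = sample.level, method = "DL")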

SPSS syntax for the meta-analysis of dependent correlations

SPSS is one of the most popular software packages used by researchers, as measured by the number of hits in a Google Scholar search (Muenchen, 2012). Therefore, we first introduce the SPSS macro. Like some common SPSS macros for other analyses (e.g., PROCESS for mediation and moderation [Hayes, 2012] and MeanES for meta-analysis [Wilson, 2005]), our whole syntax file is first run without modification. This defines the macro command !MADep, which can then be used to compute the within-sample means, form a sample-level data set, and compute the adjusted sample sizes and sampling variances. The syntax of this command is very simple, with only four required arguments and one optional argument. A sample command is

!MADep es=r /n=n /sid=SampleID /eid=EffectID /workdir=("C:\Temp").

In the sample command, es is the name of the column of correlations, n is the sample size, sid is the unique identifier of each sample (SampleID, in this example), and eid is the unique identifier of each correlation within a sample (EffectID, in this example). The last subcommand, workdir, is optional. If specified, the output data file will be stored in this directory. The default directory is C:\Temp, but users can change this to any other directory in which they have the rights to save and read files. After the command is successfully executed, the resulting sample-level data file is stored in the file SampleLevel.sav and saved in the directory specified in workdir. This file can then be used for meta-analysis. For example, to use the meta-analysis macro given in Wilson (2005) with the samplewise-adjusted-individual procedure, we would use the following commands:

Compute Correlation_Mean_SVar_Adjusted_SAdj_ind_Inverse = 1/ Correlation_Mean_SVar_Adjusted_SAdj_ind.

MeanES ES = Correlation_Mean /W = Correlation_Mean_SVar_Adjusted_SAdj_ind_Inverse.

The first line computes the weight, which is equal to the inverse of the sampling variance. The second line runs the MeanES macro to conduct the meta-analysis on the sample-level data.

R script for the meta-analysis of dependent correlations

Despite the dominance of SPSS, R (R Development Core Team, 2012) is gaining in popularity (Muenchen, 2012). In addition to being free, R’s capability is quickly expanding, with contributions from researchers who are writing packages and functions to implement newly developed procedures. Therefore, we also developed an R script to implement the samplewise procedures. The R script defines three functions, MADependentES, MADataSamplingVariance, and SampleLevelEffectSizeCorrelation. MADependentES is the main function to compute the adjusted sample sizes and sampling variances, whereas the other two functions are helper functions used by MADependentES. A sample call of the function is

ma.example.1 <- MADependentES(ma.data=example.1, sid=SampleID, n=n, r=r, do.meta=TRUE, ma.method="DL", min.results=TRUE)

In the sample call, ma.data is the data frame containing the effect level data, sid is the column name of the sample IDs in the data frame, n is the sample sizes, and r is the correlations. If the users want to conduct a meta-analysis on the sample-level data frame and have metafor (Viechtbauer, 2010) installed, they must set do.meta to TRUE (ma.method and min.results are discussed below). The results are stored in the object ma.example.1.

The result is a list of three objects—namely, EffectLevel, SampleLevel, and MAResults. EffectLevel is a data frame containing the original effect level data, with sample identifier (sid), sample correlation (es), sample size (n), and sampling variance (es.var). SampleLevel is a data frame storing the sample-level meta-analysis data, with the adjustment factors, adjusted sample sizes, and the adjusted sampling variances for the four samplewise procedures. The SampleLevel data frame can serve as the input for other meta-analysis programs adopted by the user for the main analysis. With the help of the sample identifiers, the sample-level data frame can be merged with the original data frame and with other information, such as the values of the moderators.

The main function can also conduct the meta-analysis using the package metafor. MAResults is null by default. If do.meta is set to TRUE, the rma function in metafor will be called to conduct five meta-analyses, one for each of the four samplewise procedures, and one on the effect level data. The meta-analysis method used by metafor is specified by ma.method. By default, only the major results from the meta-analysis will be returned (min.results = TRUE). The result, stored in MAResults, is a data frame with one row for each of the five procedures. Although no moderator is involved, this can serve as a quick check of the sensitivity of the results to the choice of sample sizes. If min.results is set to FALSE, then the original objects returned from rma will be assigned to MAResults as a list object. This is desirable if a user wants to access other technical information returned from rma.
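For example, the three components of the object created above can be inspected directly:

head(ma.example.1$EffectLevel)  # original effect level data
head(ma.example.1$SampleLevel)  # adjustment factors, adjusted sample sizes, variances
ma.example.1$MAResults          # one row of rma summary results per procedure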

A numerical example

To illustrate the use of the R script described above, we analyzed a meta-analysis data set with dependent correlations. The SPSS output for this data set would be similar. The sample data set contained 20 correlations from 12 samples (see Table 1). Six samples contributed two or three correlations. No information on the within-sample dependence was available. In the first step, the data set was loaded into R as a data frame (named mes.example, stored in the file mes.example.RObj in the supplementary materials). The following command can be used to load the sample data set if the file is placed in the folder C:\Temp:

Table 1 Sample data set for illustration

load('C:/Temp/mes.example.RObj')

The R script MADependentESFucntions.r, given in the supplementary materials, was then run to define the functions required. Alternatively, the following command could have been used to define the function, if the file was placed in the folder C:\Temp:

source('C:/Temp/MADependentESFucntions.r')

We first installed metafor. After the three functions were defined and metafor was installed, the following command was used to run the analysis:

ma.mes.example <- MADependentES(ma.data=mes.example, sid=SampleID, n=n, r=r, do.meta=TRUE, ma.method="DL", min.results=TRUE)

Other meta-analysis methods available in rma could have been used. We chose the DerSimonian–Laird method in this example simply because it yields results identical to those produced by Wilson's (2005) SPSS macro. Accordingly, users who would like to verify the computations can compare the results of the two programs. This can be helpful for users who have used SPSS to conduct meta-analyses but would like to try R and packages such as metafor. We do not intend to suggest that the DerSimonian–Laird method is the most suitable for this example, and the script and macro are not intended to replace other existing tools for meta-analysis. The R script's ability to conduct meta-analyses is mainly for quick examination of the potential impact of the choice of samplewise procedure.

The output object is ma.mes.example, and the sample-level data set is stored in ma.mes.example$SampleLevel. The contents of this sample-level data set are presented in Tables 2 and 3. Table 2 shows the within-sample means for the samples with more than one correlation (es.mean). The other variables are necessary for computing the adjustment factors. As is shown in Table 2, the degree of dependence ranged from negligibly small (Sample 12, .00, with two correlations, .60 and .70) to high (Sample 6, .90, with three correlations, .69 to .70).

Table 2 Sample-level data set generated by MADependentES from the sample data set
Table 3 Adjustment factors, adjusted sample sizes, and adjusted sampling variance for the four samplewise methods

The adjustment variables from the four procedures, stored in ma.mes.example$SampleLevel, are presented in Table 3. Each procedure produced the adjustment factor (adj), the adjusted sample size (n.adj), and the adjusted sampling variance for the sample-level correlation (es.mean.svar). The samplewise-n procedure gave the lower bounds for the adjustment factors and the adjusted sample sizes, whereas the samplewise-np procedure gave the upper bounds. For example, Sample 8 (n = 217) had three correlations (.63, .65, and .68), and the estimated degree of dependence was .60. The samplewise-n sample size was 217, whereas the samplewise-np sample size was 649 = (217 – 1) × 3 + 1. If the estimated degree of dependence in Sample 8 (.60) were used for the adjustment, as in the samplewise-adjusted-individual procedure, the adjustment factor would be 1.36, resulting in an adjusted sample size of 295. If the weighted mean of the estimated degrees of dependence across the six multiple-effect-size samples were used, as in the samplewise-adjusted-weighted procedure, the adjustment factor for Sample 8 would be 1.64, with an adjusted sample size of 355. In sum, the sample sizes from the four procedures were 217, 295, 355, and 649—the value 217 being based on the (unrealistic) assumption that the three correlations were perfectly correlated, 649 on the assumption that the three correlations were uncorrelated, and 295 and 355 on two different estimates of the within-sample degree of dependence.
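The adjustment arithmetic for Sample 8 can be verified directly in R from the values reported above:

# Sample 8: n = 217, p = 3 correlations, estimated dependence b-hat = .60
n <- 217; p <- 3; b <- .60
C <- (1 + (p - 1) * b) / p  # 0.73
1 / C                       # 1.36, the adjustment factor
(n - 1) / C + 1             # roughly 295, the adjusted sample size
(n - 1) * p + 1             # 649, the samplewise-np sample size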

Finally, as in this example the meta-analysis was run with min.results set to TRUE, ma.mes.example$MAResults held the major results from rma for the four samplewise procedures, as well as for the effectwise procedure (Table 4). (For technical details about the results, such as how they are computed, please refer to Viechtbauer, 2010; they are simply outputs extracted from rma in metafor.) The effectwise procedure is rarely recommended, and we will not, therefore, discuss its results. As is shown in Table 4, the weighted mean correlations were very similar across the four samplewise procedures (.61 to .62). This was expected, as previous studies have found that the samplewise procedures have little effect on the estimate of the mean correlation (e.g., Cheung & Chan, 2004). We examined the degree of heterogeneity from two perspectives. First, following the tradition of estimating the "true" variation not due to sampling error, we examined \( {\tau}^2 \), the variance of the population correlations, and \( \tau \), their standard deviation. A credibility interval around the mean effect size can be determined from \( \tau \) (Whitener, 1990). If \( {\tau}^2 \) and \( \tau \) are close to zero, the correlations are highly homogeneous and the observed variation is largely due to sampling variation. The samplewise-n procedure yielded the smallest estimates (\( {\tau}^2 \) = .0011 and \( \tau \) = .0329), whereas the samplewise-np procedure yielded the largest (\( {\tau}^2 \) = .0017 and \( \tau \) = .0411). The samplewise-adjusted procedures yielded estimates that fell between these two (samplewise-adjusted-individual: \( {\tau}^2 \) = .0011 and \( \tau \) = .0337; samplewise-adjusted-weighted: \( {\tau}^2 \) = .0014 and \( \tau \) = .0374). The significance tests of heterogeneity for the four procedures also yielded different conclusions, with only the samplewise-np procedure suggesting that there was heterogeneity among the sample correlations (p < .05).

Table 4 Meta-analysis results from MADependentES, according to the DerSimonian–Laird method

Our second approach to examining the degree of heterogeneity was to use the \( I^2 \) index proposed by Higgins and Thompson (2002), which is roughly the percentage of the "true" variance of the population correlations relative to the total variance of the sample correlations, where the total variance is defined as the sum of the "true" variance and the "typical" sampling variance of the individual samples (see Higgins & Thompson, 2002, for the precise definition of \( I^2 \); a compact form is given after this paragraph). The value of \( I^2 \) ranges from 0 % to 100 % and is numerically easier to interpret than \( {\tau}^2 \) and \( \tau \). In our example, the pattern of results was the same as in the first approach: samplewise-n yielded the smallest estimate (27.02 %), and samplewise-np yielded the largest estimate (50.82 %). The samplewise-adjusted procedures yielded estimates of 32.86 % (samplewise-adjusted-individual) and 38.21 % (samplewise-adjusted-weighted), which fell between the previous two estimates but closer to the samplewise-n estimate. We preferred the results of the samplewise-adjusted-weighted procedure, as Cheung and Chan (2004, 2008) demonstrated that this procedure is less biased than the other three.
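In symbols, with \( {\widehat{\tau}}^2 \) denoting the estimated "true" variance and \( {s}^2 \) the "typical" sampling variance of Higgins and Thompson (2002), the index can be written as

$$ {I}^2=\frac{{\widehat{\tau}}^2}{{\widehat{\tau}}^2+{s}^2}\times 100\%. $$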

For users’ convenience, all of the necessary commands for running the above numerical example are stored in the file MADependentES_Illustration.r in the supplementary materials.

Finally, we briefly describe how the SPSS macro can be used to generate the sample-level data set for meta-analysis. In SPSS, there is no need to specify the name of the data set. To use the example in the supplementary materials, first open the sample SPSS data file (mes.example.sav). After defining the macro !MADep by running the syntax file, with the data set active, run the following command in a syntax window:

!MADep es=r /n=n /sid=SampleID /eid=EffectID /workdir=("C:\Temp").

A new data set, SampleLevel.sav, which will be saved to C:\Temp (or the working directory specified by the user), will become the active data set. It contains data similar to those presented in Tables 2 and 3. Users can then submit the data set to other programs, such as the SPSS macro by Wilson (2005), for meta-analysis. The sample syntax commands shown above, along with syntax commands for running a meta-analysis using Wilson's macro, are available in the file MADependentES_Illustrations.sps in the supplementary materials.

A note on applications

The results of Cheung and Chan (2004, 2008) suggest that the samplewise-adjusted-weighted procedure is the preferred procedure for handling dependent correlations, and that it performs better than the samplewise-adjusted-individual procedure even when the degree of dependence varies within samples. Nevertheless, in practice, researchers can conduct a sensitivity analysis by applying all four procedures and comparing the results to examine how sensitive they are to the choice of sample size for the within-sample means. If the results of the samplewise-n and samplewise-np procedures are similar, then the choice of sample size has little effect on the results. This can happen if only a small proportion of the samples contribute two or more correlations and each such sample contributes only two or at most three correlations. If the results of the samplewise-n and samplewise-np procedures are substantially different, then the researcher cannot ignore the problem of using inappropriate sample sizes, and the results from the two samplewise-adjusted procedures should be examined. If the degree of dependence is high, the results of the two samplewise-adjusted procedures will be similar to those of the samplewise-n procedure. This may justify the simpler and more common samplewise-n procedure. However, if the degree of dependence is small, the results from the two adjusted procedures will be similar to those of the samplewise-np procedure. If the degree of dependence is moderate, then both the samplewise-n and samplewise-np procedures will be inappropriate, and the samplewise-adjusted procedures should be adopted.

For unknown reasons, the samplewise-n procedure is more commonly used than the samplewise-np procedure for handling dependent effect sizes. However, the unconditional use of the samplewise-n procedure may systematically underestimate the degree of heterogeneity in some meta-analyses. This is a severe problem, because discovering the true variation in effect sizes due to moderators is one of the unique values of meta-analysis.

Finally, we will highlight the conditions under which the samplewise-adjusted procedures should not be used. First, these procedures have been developed for univariate meta-analysis. The dependent effect sizes within a sample have to be compatible in order to justify computing a mean. If the dependent effect sizes are too different—if, for example, they involve conceptually different measures of job performance—then they should not be combined to form a mean. In this case, multivariate meta-analysis procedures should be adopted instead (e.g., Cheung, 2013b). Second, like other samplewise procedures, the samplewise-adjusted methods assume that the within-sample heterogeneity in the population effect sizes is negligible, relative to the sampling variances and between-sample heterogeneity. This assumption is tenable in some situations, such as when the effect sizes are repeated measures of the same variables or the differences in the measures are small, relative to the between-sample differences. If a researcher believes that the within-sample heterogeneity is nonnegligible, approaches that take into account within-sample heterogeneity should be considered, such as the three-level meta-analytic model.

Limitations

The SPSS macro and R script have two limitations. First, they can handle only Pearson product-moment correlations. This is not a major limitation, because the correlation is a common effect size measure in meta-analyses. One of the crucial tasks of the samplewise-adjusted procedures is to estimate the sampling variances at several stages of the analysis, which is necessary for computing the adjustment factors. As a result, it is technically difficult to develop a single tool that automatically adapts the computation to different effect size measures. This is especially true for the SPSS macro. For R, though it may be easier to write a general function for different effect size measures with the help of escalc from metafor (a brief sketch of escalc is given below), this package does not easily allow the use of the mean effect size for computing the sampling variance (as in the Hunter–Schmidt procedure). Nevertheless, we plan to prepare additional SPSS macros and R scripts that can be used for other common effect size measures, such as the standardized mean difference.
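For instance, a brief sketch of escalc for standardized mean differences (the data frame and column names are illustrative, following metafor's documented argument names):

# escalc() returns the effect sizes (yi) and sampling variances (vi);
# measure = "SMD" gives the bias-corrected standardized mean difference
# from the group means, standard deviations, and sample sizes.
library(metafor)
d <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
            m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)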

Second, the SPSS macro and R script cannot handle corrections for artifacts (Schmidt, Le, & Oh, 2009). This is also a limitation of the samplewise-adjusted procedure proposed by Cheung and Chan (2004, 2008). In practice, many meta-analyses of Pearson product-moment correlations do not correct for artifacts for various reasons. Therefore, for those types of meta-analyses, the two tools are sufficient for applying the samplewise procedures. Nevertheless, work is being done to generalize the samplewise procedure to include meta-analyses with correction for artifacts.

Conclusions

In this article, we briefly reviewed the four samplewise procedures for handling dependent correlations and compared them with existing procedures, mainly the three-level meta-analytic model. The results from a small-scale simulation study suggested that in some situations the samplewise-adjusted procedures performed as well as or better than three-level meta-analysis. We presented two tools, an SPSS macro and an R script, that can easily be used to apply the four procedures in a meta-analysis. A numerical example was used to illustrate how the R script can be applied and how the four procedures can yield different results and conclusions. We hope that the two tools we have developed will facilitate the implementation of the samplewise-adjusted procedures and raise awareness of the issue of dependent correlations. We also hope that the tools will discourage the unconditional use of the samplewise-n procedure, which tends to underestimate the degree of heterogeneity, resulting in failure to find variation that could be explained by moderators. Finally, we hope that our small simulation study and our comparison of the samplewise-adjusted procedures and the three-level meta-analytic model can stimulate more empirical studies comparing the different approaches to handling dependent effect sizes. The results of these studies can provide a knowledge base for users to make evidence-based decisions when choosing the approach by which to handle dependent effect sizes in meta-analysis.