Analysis of covariance (ANCOVA) with difference scores

https://doi.org/10.1016/j.ijpsycho.2003.12.009

Abstract

When comparing pretest to posttest changes in non-randomized groups, most researchers are correctly avoiding ANCOVA with posttest as the dependent variable and pretest as the covariate. However, there is a widespread use of ANCOVA in which the difference score (posttest minus pretest) is used as the dependent variable, and pretest as the covariate. A computer simulation study is presented which shows that measurement error causes identical, biased conclusions when comparing changes using either the posttest score or the posttest minus pretest difference score as the dependent variable. The reasons for this bias are explained and illustrated.

Introduction

Miller and Chapman (2001) have recently directed attention to the widespread but incorrect belief that ANCOVA can be used to control for or eliminate non-trivial group differences. An assumption underlying ANCOVA is that the covariate is independent of group membership, which happens with random assignment to groups, but not with naturally occurring groups. For example, if control and experimental groups are created by random assignment, possible covariates such as body mass index (BMI) will be independent of group membership. However, if groups are naturally occurring, such as male and female, then BMI will not be independent of group membership. ANCOVA is designed to control for covariates when groups were randomly assigned, but not to control for naturally occurring group differences.

One application of ANCOVA of particular relevance to psychophysiologists is to control for baseline (pretest) differences. When groups differ in baseline, ANCOVA may be used to control for these differences. The usual way to do this ANCOVA is to use the posttest score as the dependent variable, and the pretest score as the covariate. By removing the variance explained by the pretest from the posttest, the residual is variation that reflects the change from the pretest. When groups are assigned at random, ANCOVA is an excellent method for comparing changes between groups. However, when groups are naturally occurring, the baseline differences are not due to chance, and ANCOVA will yield biased conclusions.

The fact that ANCOVA (with covariate=pretest and dependent variable=posttest) should not be used to compare changes between naturally occurring groups has been pointed out many times (e.g. Huitema, 1980; Rogosa, 1988; Schafer, 1992). Inspection of recent journals in psychophysiology and health psychology indicates that this use of ANCOVA to compare changes is appearing less frequently. However, a variation of this usage is quite common, namely, where the dependent variable is the difference score (posttest minus pretest) and the covariate is the pretest. Because this alternative use of ANCOVA is apparently viewed as superior to the traditional method, computer simulations will be presented to show that the two ANCOVA methods are in fact identical.

To understand why ANCOVA produces biased conclusions when the covariate is not independent of group membership, it is useful to examine how ANCOVA adjusts the posttest means for pretest differences. ANCOVA uses two methods for removing the influence of the covariate(s) on the dependent variable. The first method focuses within each group and calculates regression lines for predicting the dependent variable from the covariate in each group. These regression lines are used to find the predicted dependent variable score for each case based on their score on the covariate. The residual scores for each case (observed score on the dependent variable minus the predicted dependent variable score) are pooled to calculate an error term. This within group use of regression is an excellent method for removing the effect of the covariate from error variance, and is not a source of controversy.

The second method of adjustment in ANCOVA is much more problematic. The regression lines from each group are pooled (hence the assumption of homogeneity of regression) to obtain a single regression coefficient (b). This pooled regression coefficient is then used in a formula to adjust the mean on the dependent variable for each group. To facilitate communication, a concrete example will be used instead of notation. The example is based on the most famous illustration of misuse of ANCOVA, Lord's paradox (Lord, 1967). Lord presented a hypothetical example of a group of male and a group of female adolescents, each weighed in two consecutive years (called ‘year 1’ and ‘year 2’). Even though both groups gained the identical number of pounds, ANCOVA resulted in the incorrect conclusion that males increased significantly more than did females.

The critical feature of ANCOVA, which leads to the incorrect adjustments, is that it incorrectly assumes both males and females in the population actually have the same average weight at year 1. ANCOVA obtains an estimate of this average weight in the population by averaging the year 1 weights for both males and females. For example, if males averaged 130 lb at year 1 and females averaged 110 lb, the average of these two weights, 120 lb, would be used as an estimate of the population mean weight at year 1. If the groups had been assigned at random, such an estimate would make sense, but when the groups are naturally occurring, it does not.

ANCOVA then uses a formula to adjust the observed year 2 weight for each of males and females. This formula takes into account both the difference between the actual year 1 weight for each group and the estimated population average year 1 weight, as well as the pooled regression coefficient. For example, if males weighed 140 lb at year 2 and females weighed 120 lb at year 2 (both groups gained the identical weight of 10 lb), the ANCOVA would adjust the average year 2 weights using the following formulas:

Adjusted year 2 weight = year 2 weight − b(year 1 weight − population year 1 weight).

  • For males: adjusted year 2 weight = 140 − b(130 − 120).

  • For females: adjusted year 2 weight = 120 − b(110 − 120).

If there is a moderate amount of measurement error, b might be 0.5, which would yield the following adjusted values:

  • For males: adjusted year 2 weight = 140 − 0.5(130 − 120) = 140 − 5 = 135.

  • For females: adjusted year 2 weight = 120 − 0.5(110 − 120) = 120 + 5 = 125.

Under the null hypothesis that males and females will show identical weights at year 2, after adjusting for year 1 weight (i.e. there is no difference between the changes in weight for males and females), the adjusted year 2 weights for each group should be equal. But because 135 is not equal to 125, ANCOVA will result in the conclusion that there is a significant difference in changes, and that males changed more (since their adjusted year 2 mean is higher than the adjusted year 2 mean for females). This is Lord's paradox: ANCOVA finds a significant difference in changes, when the actual change was identical for males and females (10 lb).

However, it can readily be seen that this paradox is simply due to measurement error. If there is no measurement error, the weights on year 1 and year 2 will be perfectly correlated and b will equal 1.0 (assuming homogeneity of variance from year 1 to year 2). When b=1.0, the adjusted year 2 weights for both males and females will be identical (130 lb), and ANCOVA will yield the correct conclusion that the two groups showed identical changes in weight from year 1 to year 2. As measurement error increases (as b approaches zero), the formulas produce less adjustment, which results in more divergent adjusted means, and hence more bias. When b is less than 1.0, the year 1 difference in weight between males and females is not totally included in the adjusted year 2 weights. This results in a difference between the adjusted year 2 means, which is interpreted by ANCOVA as a significant difference in changes.
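The adjustment arithmetic in the weight example can be reproduced with a short Python sketch. It is illustrative only: the names adjusted_mean and adjusted_gap are introduced here, and the 120 lb estimate of the population year 1 mean assumes equal group sizes, as in the example.

```python
def adjusted_mean(post_mean, pre_mean, grand_pre_mean, b):
    """ANCOVA-style adjusted year 2 mean for one group."""
    return post_mean - b * (pre_mean - grand_pre_mean)


# Group means from the weight example in the text (pounds).
pre = {"males": 130.0, "females": 110.0}
post = {"males": 140.0, "females": 120.0}

# Estimated population year 1 mean: 120 lb (assumes equal group sizes).
grand_pre = sum(pre.values()) / len(pre)


def adjusted_gap(b):
    """Male minus female adjusted year 2 mean for a given pooled slope b."""
    males = adjusted_mean(post["males"], pre["males"], grand_pre, b)
    females = adjusted_mean(post["females"], pre["females"], grand_pre, b)
    return males - females


for b in (1.0, 0.75, 0.5, 0.0):
    males = adjusted_mean(post["males"], pre["males"], grand_pre, b)
    females = adjusted_mean(post["females"], pre["females"], grand_pre, b)
    print(f"b={b:.2f}  males={males:.1f}  females={females:.1f}  "
          f"gap={adjusted_gap(b):.1f}")
```

With b = 1.0 both adjusted means are 130 lb and the gap is zero; with b = 0.5 they are 135 lb and 125 lb, as in the text. The gap widens steadily as b falls toward zero, even though both groups gained exactly 10 lb.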

This example illustrates how measurement error causes ANCOVA to result in misleading conclusions, when comparing naturally occurring groups. If measurement error is present, b will be less than 1, and there will be less adjustment to the dependent variable means (year 2 weights) based on the covariate differences (year 1), which will result in an under-adjustment of the predicted mean year 2 weights. This under-adjustment will yield differences in the adjusted means for each group, which will be interpreted by ANCOVA as a significant difference. However, if the groups had been assigned at random, the group with the higher year 1 mean would contain more positive errors, which would regress on retesting at year 2. The b coefficient takes into account this regression. In contrast, when the groups are naturally occurring, there is absolutely no reason to expect the means to regress at year 2. On the contrary, it is much more reasonable to assume the year 1 differences are ‘real’ (unbiased estimates) and will also be present at posttest. So, with naturally occurring groups, ANCOVA produces an incorrect adjustment. It can also be seen that this incorrect adjustment is caused by measurement error.
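The size of the under-adjustment can be written out in a few lines of algebra. This is a sketch rather than the article's own notation: the symbols are introduced here, and the expression for b assumes classical measurement error (true scores that are stable within each group apart from a common gain, and errors that are independent with equal variance on both occasions).

```latex
% Symbols introduced for illustration (not from the article):
%   \bar{X}_g, \bar{Y}_g : year 1 and year 2 means of group g (M = males, F = females)
%   \delta               : gain common to both groups (10 lb in the example)
%   \sigma_T^2, \sigma_E^2 : true-score and error variances (assumed equal across years)
\begin{align*}
\bar{Y}^{\mathrm{adj}}_M - \bar{Y}^{\mathrm{adj}}_F
  &= (\bar{Y}_M - \bar{Y}_F) - b\,(\bar{X}_M - \bar{X}_F) \\
  &= (1 - b)\,(\bar{X}_M - \bar{X}_F)
  && \text{because } \bar{Y}_g = \bar{X}_g + \delta \\[4pt]
b &\approx \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
  && \text{(pooled within-group slope under the stated assumptions)} \\[4pt]
\text{Weight example:}\quad
(1 - 0.5)\,(130 - 110) &= 10 = 135 - 125
\end{align*}
```

The estimated population mean cancels out of the difference, so the spurious group effect depends only on the baseline gap and on how far b falls below 1.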

A set of computer simulations is presented to show that using ANCOVA to compare changes from pretest (covariate) to posttest (dependent variable) produces a bias, simply as a result of measurement error. Because ANCOVA with the posttest minus pretest difference score as the dependent variable (in place of the posttest score) is widely used, the simulations also include this analysis, to show that using difference scores as the dependent variable is identical to using posttest as the dependent variable, and results in equally biased conclusions.
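The equivalence of the two ANCOVAs can also be checked numerically on a single simulated data set. The Python sketch below is not the article's SAS code: the helper names, the seed, the error standard deviation of 10 and the 10-point gap between the groups' true means are all assumptions chosen for illustration. It uses the textbook one-covariate ANCOVA formulas (pooled within-group slope, adjusted mean difference, error mean square on N − 3 degrees of freedom).

```python
import math
import random


def ancova(x_groups, y_groups):
    """Two-group, one-covariate ANCOVA.

    Returns (pooled within-group slope, adjusted mean difference, t).
    """
    sxx = sxy = syy = 0.0
    means = []
    for xs, ys in zip(x_groups, y_groups):
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        means.append((mx, my))
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy += sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    (mx1, my1), (mx2, my2) = means
    n1, n2 = len(x_groups[0]), len(x_groups[1])
    adj_diff = (my1 - my2) - b * (mx1 - mx2)
    ms_error = (syy - b * sxy) / (n1 + n2 - 3)
    se = math.sqrt(ms_error * (1 / n1 + 1 / n2 + (mx1 - mx2) ** 2 / sxx))
    return b, adj_diff, adj_diff / se


def simulate_group(rng, n, true_mean, true_sd=10.0, error_sd=10.0):
    """Pretest and posttest share one true score; only the errors differ."""
    pre, post = [], []
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)
        pre.append(true_score + rng.gauss(0.0, error_sd))
        post.append(true_score + rng.gauss(0.0, error_sd))
    return pre, post


rng = random.Random(2004)                       # arbitrary seed
pre_a, post_a = simulate_group(rng, 25, 10.0)   # assumed higher-baseline group
pre_b, post_b = simulate_group(rng, 25, 0.0)    # assumed lower-baseline group

pre = [pre_a, pre_b]
post = [post_a, post_b]
diff = [[y - x for x, y in zip(xs, ys)] for xs, ys in zip(pre, post)]

b_post, adj_post, t_post = ancova(pre, post)    # DV = posttest
b_diff, adj_diff, t_diff = ancova(pre, diff)    # DV = posttest minus pretest

print(f"DV=posttest:   b={b_post:.4f}  adjusted diff={adj_post:.4f}  t={t_post:.4f}")
print(f"DV=difference: b={b_diff:.4f}  adjusted diff={adj_diff:.4f}  t={t_diff:.4f}")
```

Subtracting the pretest from the dependent variable lowers the pooled slope by exactly 1 and leaves the adjusted group difference, the error term and the test statistic unchanged (up to floating-point rounding), which is why the two analyses must agree.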

Method

The SAS generator RANNOR (SAS Institute, 1990) was used to generate pseudorandom variates, with means of zero and standard deviations (see below) selected to yield realistic effects. One thousand simulations were computed for each of the 12 conditions.

For comparison of changes between two groups, pretest and posttest scores were created for two groups, each of n=25, by adding a different error component (μ=0) to the same true score component (μ=0, σ=10). The standard deviation of the error

Results

Table 1 presents the Type 1 error rates for ANOVA comparing mean changes in the two groups from pretest to posttest for each of the 12 conditions. In all conditions, the Type 1 error rate was approximately 0.05.
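A simulation of this general kind can be sketched in Python using only the standard library. This is not a reproduction of the article's 12 conditions: the group size, number of replications and true-score standard deviation follow the Method section, but the error standard deviation, the baseline gap between groups, the seed and all function names are assumptions made here. A two-group t test on gain scores stands in for the ANOVA on changes, and the critical values are approximate tabled values.

```python
import math
import random

N_PER_GROUP = 25       # from the Method section
N_SIMS = 1000          # from the Method section
TRUE_SD = 10.0         # from the Method section
ERROR_SD = 10.0        # assumption: error SDs are not given in this excerpt
BASELINE_GAP = 10.0    # assumption: true-score gap between the two groups
T_CRIT_DF48 = 2.0106   # approximate two-tailed .05 critical t, df = 48
T_CRIT_DF47 = 2.0117   # approximate two-tailed .05 critical t, df = 47


def simulate_group(rng, true_mean):
    """No true change: pretest and posttest differ only by measurement error."""
    pre, post = [], []
    for _ in range(N_PER_GROUP):
        true_score = rng.gauss(true_mean, TRUE_SD)
        pre.append(true_score + rng.gauss(0.0, ERROR_SD))
        post.append(true_score + rng.gauss(0.0, ERROR_SD))
    return pre, post


def mean(values):
    return sum(values) / len(values)


def cross_products(xs, ys):
    """Sum of cross-products of deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def gain_score_t(pre1, post1, pre2, post2):
    """Independent-groups t on posttest minus pretest gains (df = n1 + n2 - 2)."""
    g1 = [y - x for x, y in zip(pre1, post1)]
    g2 = [y - x for x, y in zip(pre2, post2)]
    n1, n2 = len(g1), len(g2)
    pooled_var = (cross_products(g1, g1) + cross_products(g2, g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def ancova_t(pre1, post1, pre2, post2):
    """t for the group effect with pretest as covariate (df = n1 + n2 - 3)."""
    sxx = cross_products(pre1, pre1) + cross_products(pre2, pre2)
    sxy = cross_products(pre1, post1) + cross_products(pre2, post2)
    syy = cross_products(post1, post1) + cross_products(post2, post2)
    b = sxy / sxx  # pooled within-group slope
    n1, n2 = len(pre1), len(pre2)
    pre_gap = mean(pre1) - mean(pre2)
    adjusted_gap = (mean(post1) - mean(post2)) - b * pre_gap
    ms_error = (syy - b * sxy) / (n1 + n2 - 3)
    return adjusted_gap / math.sqrt(ms_error * (1 / n1 + 1 / n2 + pre_gap ** 2 / sxx))


def rejection_rates(seed=1):
    """Proportion of replications in which each test rejects at the .05 level."""
    rng = random.Random(seed)
    gain_hits = ancova_hits = 0
    for _ in range(N_SIMS):
        pre1, post1 = simulate_group(rng, BASELINE_GAP)
        pre2, post2 = simulate_group(rng, 0.0)
        gain_hits += abs(gain_score_t(pre1, post1, pre2, post2)) > T_CRIT_DF48
        ancova_hits += abs(ancova_t(pre1, post1, pre2, post2)) > T_CRIT_DF47
    return gain_hits / N_SIMS, ancova_hits / N_SIMS


gain_rate, ancova_rate = rejection_rates()
print(f"gain-score t test rejection rate: {gain_rate:.3f}")
print(f"ANCOVA rejection rate:            {ancova_rate:.3f}")
```

Because neither group changes in this setup, every rejection is a Type 1 error. Under these assumed settings the gain-score test should reject at close to the nominal rate, while ANCOVA should reject noticeably more often, since the baseline gap is only partly removed when b is well below 1.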

Tables 2 and 3 present the Type 1 error rates for the two ANCOVAs. Both ANCOVAs gave identical answers, showing that ANCOVA with posttest as the dependent variable and pretest as the covariate is identical to ANCOVA with the difference score as the dependent variable and pretest as the

Discussion

These simulations clearly show that using ANCOVA with a difference score (posttest minus pretest) as the dependent variable and pretest as the covariate is identical to using ANCOVA with posttest as the dependent variable and pretest as the covariate. The simulations yielded identical answers in each condition. This result shows that using difference scores as the dependent variable provides no advantage over the use of posttest scores. ANCOVA on difference scores shares with ANCOVA on posttest

References (16)

  • J. Jamieson. Dealing with baseline differences: two principles and two dilemmas. Int. J. Psychophysiol. (1999)
  • L. Baldwin et al. A comparison of covariance to within-class regression in the analysis of non-equivalent groups. J. Exp. Educ. (1984)
  • J. Cohen et al. Applied Multiple Regression/Correlation Analyses for the Behavioral Sciences. (1983)
  • L.M. Collins. Is reliability obsolete? A commentary on ‘Are simple gain scores obsolete?’ Appl. Psychol. Meas. (1996)
  • R.A. Cribbie et al. Structural equation models and the regression bias for measuring correlates of change. Educ. Psychol. Meas. (2000)
  • Cribbie, R.A., Jamieson, J., Decreases in posttest variance and the measurement of change,...
  • L.J. Cronbach et al. How should we measure ‘change’—or should we? Psychol. Bull. (1970)
  • B.E. Huitema. The Analysis of Covariance and Alternatives. (1980)
There are more references available in the full text version of this article.
