What residualizing predictors in regression analyses does (and what it does not do)

https://doi.org/10.1016/j.jml.2013.12.003

Highlights

  • Regression analyses are replacing traditional ANOVAs in some areas of research.

  • Some researchers orthogonalize predictors by residualizing.

  • Several effects of residualizing are demonstrated and discussed.

  • Judging by many authors’ rationales, some of these effects are unexpected.

  • Some of the effects of residualizing are quite undesirable.

Abstract

Psycholinguists are making increasing use of regression analyses and mixed-effects modeling. In an attempt to deal with concerns about collinearity, a number of researchers orthogonalize predictor variables by residualizing (i.e., by regressing one predictor onto another, and using the residuals as a stand-in for the original predictor). In the current study, the effects of residualizing predictor variables are demonstrated and discussed using ordinary least-squares regression and mixed-effects models. Some of these effects are almost certainly not what the researcher intended and are probably highly undesirable. Most importantly, what residualizing does not do is change the result for the residualized variable, which many researchers probably will find surprising. Further, some analyses with residualized variables cannot be meaningfully interpreted. Hence, residualizing is not a useful remedy for collinearity.

Introduction

In psycholinguistics there has been a move toward regression studies, which offer several advantages over traditional factorial designs. Baayen, Wurm, and Aycock (2007), for example, used mixed-effects modeling to examine auditory and visual lexical decision and naming times. They found a number of curvilinear effects that are difficult to detect with factorial designs. Even more interestingly, the authors found sequential dependencies in the response times, such that response latency on a given trial could be predicted by latencies on the previous four trials. This sequential dependency, which cannot be assessed in a factorial design, ultimately exhibited more explanatory power than nearly all of the other predictors that were examined.

A second advantage of regression designs is pragmatic. With the increased complexity of many theoretical models, it becomes impractical to isolate a difference on one predictor while adequately equating stimulus materials on the growing number of other variables known to affect psycholinguistic processing. Baayen et al. (2007) examined 18 predictor variables. The influential megastudy of Balota, Cortese, Sergent-Marshall, Spieler, and Yap (2004) examined 19. A factorial design matching on all but one or two of the variables in situations like these is virtually inconceivable, and so a large number of potentially interesting studies simply could not be done. The Balota et al. (2004) study is interesting for the additional reason that they included as stimuli virtually all single-syllable monomorphemic words in English. An exhaustive study such as this cannot be done in a factorial manner, because the words in the language are naturally correlated on a number of variables of theoretical interest.

Many researchers express concern about the extent to which these natural correlations between predictors might lead to collinearity and computational problems. For example, Tabachnick and Fidell (2007) assert that, with predictor intercorrelations of .90 and above, there are statistical difficulties in the precision of estimation of regression coefficients (citing Fox, 1991). Further, Cohen, Cohen, West, and Aiken (2003) state that the estimates of the coefficients will be “very unreliable” and “of little or no use” (p. 390). In addition, Darlington (1990) emphasizes the loss of statistical power of tests on the individual regression slopes.

However, Friedman and Wall (2005) assert and demonstrate that improvements in algorithms and computer accuracy have eliminated the computational difficulties. The current study lends additional support to their claim. Further, Friedman and Wall (2005), along with others, also note that collinearity per se is not necessarily bad. For example, if a researcher’s goal is simply to maximize explained variance, collinearity can be ignored (Darlington, 1990, Tabachnick and Fidell, 2007). The goal of most psycholinguistic applications of regression, though, is to evaluate the effects of several of the individual predictor variables. The potential interpretational problems caused by collinearity here can be thorny, even if the computational problems are not.

Because of concerns like this, some researchers have attempted to deal with collinearity by residualizing one of the correlated predictor variables. To do this, one runs a preliminary regression analysis using one of the predictor variables to predict the other (e.g., using X2 to predict X1). The residuals from this analysis constitute a new predictor variable, X1resid, that is used in subsequent analyses in lieu of X1. X1resid is guaranteed to be uncorrelated with X2, providing an apparent solution to the problem of collinearity. Thus, residualizing seems like a useful and appropriate technique.
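In code, the procedure amounts to a single auxiliary regression. The following is a minimal NumPy sketch with simulated data (the variable names x1 and x2, and the chosen correlation of .7, are illustrative assumptions, not values from any study discussed here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulate two correlated predictors (illustrative; population r12 = .7).
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)

# Preliminary regression: predict X1 from X2 and keep the residuals.
X = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X, x1, rcond=None)
x1_resid = x1 - X @ beta

# X1resid is uncorrelated with X2 by construction.
print(np.corrcoef(x1, x2)[0, 1])        # close to .7 by construction
print(np.corrcoef(x1_resid, x2)[0, 1])  # essentially 0
```

Because the residuals are, by definition, orthogonal to every column used in the auxiliary regression, the zero correlation with X2 is guaranteed, not an empirical finding about the data.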

Psycholinguists have offered several justifications for residualizing. A review of some of those justifications is instructive, as it illustrates a considerable range of beliefs, some erroneous, about what residualizing accomplishes:

“To avoid problems with increased multicollinearity, we included the residuals…in our mixed-effects model…These residuals are thus corrected for the influence of all variables correlated with the original familiarity and meaningfulness measures” (Lemhöfer et al., 2008, p. 23)

To dissociate the effect of one predictor from another and demonstrate that the effect of one predictor does not explain the effect of the other (Green, Kraemer, Fugelsang, Gray, & Dunbar, 2012, pp. 267–268)

To help rule out the possibility that the effect of one predictor masks the effect of another (Kuperman, Bertram, & Baayen, 2010, p. 89)

“…to assess the effect of” [a predictor] (Jaeger, 2010, p. 33)

“…to ensure a true effect of” [a predictor] (Cohen-Goldberg, 2012, pp. 191–192)

“…to allow for assessment of the respective contributions of each predictor” (Ambridge, Pine, & Rowland, 2012, p. 267)

“…to determine the unique contribution of” [a predictor] (Cohen-Goldberg, 2012, p. 188)

To provide “…a reliable estimate of the unique variance explained by each” [predictor] (Ambridge et al., 2012, p. 268)

To pit predictors against one another and determine whether one explains variance that the other cannot (Ambridge et al., 2012, p. 268)

“…to reliably assess effect directions for collinear predictors” and to be able to simultaneously assess “…the independent effects of multiple hypothesized mechanisms” (Jaeger, 2010, p. 30; emphasis in original)

to test the effect of one predictor beyond the properties of two other predictors (Jaeger, 2010, p. 33)

“Orthogonalisation of such variables is crucial for the accuracy of predictions of multiple regression models. Teasing collinear variables apart is also advisable for analytical clarity, as it affords better assessment of the independent contributions of predictors to the model’s estimate of the dependent variable” (Kuperman, Bertram, & Baayen, 2008, p. 1098).

Most researchers do not specify precisely what would trigger the strategy. Cohen-Goldberg (2012) said it was done when a predictor “…was collinear with one or more control variables…” (p. 188). Jaeger and Snider (2013) did it “since the two predictor variables were correlated” (p. 63). Kahn and Arnold (2012) residualized “Because of high correlations” between the predictor variables (p. 317). This last case is interesting for the additional fact that the residualization was restricted to variables that were included only for purposes of statistical control. The individual effects of these variables were not of interest – the goal was simply to be able to assure readers that the analysis had controlled for them. Below, we show that residualizing accomplishes literally nothing in this case. Further, examination of the cut-off values that are reported reveals a lack of consensus about when one should residualize: Kuperman et al. (2008) residualized whenever a zero-order correlation between predictors exceeded 0.50, whereas Bürki and Gaskell (2012) used 0.30 as a cut-off.

Use of this strategy in psycholinguistics is a relatively recent phenomenon. The earliest example we have identified is Baayen, Feldman, and Schreuder (2006). The scope of what Baayen et al. (2006) did was restricted, and the reasons for it were principled and clearly articulated. They wanted to determine if a subjectively-rated version of word frequency offered anything beyond various objective measures. They partialed the objective measures from the subjective measure, and added the residuals to a model they had already specified as more or less complete. They did mention collinearity in this context, but it was not their primary motivation. Indeed, in this study, they handled collinearity among their primary predictors in other ways.

Examples of residualizing can be found in at least a dozen papers published in three of the top journals in the field in 2012 (Cognition; Journal of Experimental Psychology: Learning, Memory, and Cognition; Journal of Memory and Language). Judging by the descriptions found in these studies, some of which were included above, there seems to have been significant drift in researchers’ implementation of the strategy. Concerned that enthusiasm for the technique might be outpacing understanding of what it does, we decided to examine more closely exactly what is (and what is not) achieved by residualization of predictor variables. Our ultimate goal is to clear up some misconceptions and improve statistical practice in psycholinguistics.

Section snippets

Study 1: Reanalysis of data from Lorch and Myers (1990)

Lorch and Myers (1990) presented a data set to illustrate a recommended way to analyze repeated-measures regression data. The DV was time to read a sentence. The predictor variables of theoretical interest were the number of words in the sentence (WORDS) and the number of new arguments in the sentence (NEWARGS). They also included an index of the serial position of each sentence in the experimental list. To make certain points clearer, we exclude this variable from analysis. The data set

Study 2: Simulated data

Friedman and Wall (2005) demonstrated that r12 is merely one of the three pieces of information that determine the presence and extent of problems relating to collinearity. We now simulate data for additional analyses, incorporating all three pieces of information, with the goal of revealing and understanding the underlying issues more systematically.

In a typical behavioral study of word recognition, a researcher might have each of 50 participants respond to each of 100 items. A

Extensions

One might wonder whether our results scale up to more realistic data sets. We present three different pieces of evidence showing that they do. First, we mentioned above an analysis in which Cohen-Goldberg (2012) found an exactly identical result for a predictor before and after residualizing it. That analysis included more than 10,000 responses and had 19 predictor variables. The predictor in question had been residualized against three other predictors.

Second, we took an actual data set

General discussion

The current study has shown several of the effects of residualizing a predictor variable (assume X1 here) in regression analyses. First and foremost, it produces an intercorrelation between predictors of 0, which was of course its desired effect. It is important to note that it does this by substituting a new predictor for one of the originals (e.g., X1resid for X1). This has the concomitant effect of substituting rY1resid for rY1. The difference between these two correlations depends on r12
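The substitution effects described above can be checked numerically. In the sketch below (simulated data; all names and parameter values are illustrative assumptions), the slope for X1resid in a model alongside X2 is identical to the slope X1 had in the original model, whereas the slope for X2 changes, and the two models produce identical fitted values (and hence identical R²):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Illustrative correlated predictors and outcome.
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + 0.8 * rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def ols(y, *cols):
    """OLS with intercept; returns coefficients and fitted values."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, X @ b

# Residualize X1 against X2.
_, fit12 = ols(x1, x2)
x1_resid = x1 - fit12

b_orig, yhat_orig = ols(y, x1, x2)       # y ~ X1 + X2
b_res,  yhat_res  = ols(y, x1_resid, x2) # y ~ X1resid + X2

# The slope for the residualized predictor equals the slope X1
# already had in the original model; only X2's slope changes.
print(b_orig[1], b_res[1])               # equal
print(b_orig[2], b_res[2])               # different
# The fitted values, and hence R^2, are identical across models.
print(np.allclose(yhat_orig, yhat_res))
```

The equality of the X1 slopes is the Frisch–Waugh–Lovell theorem at work: residualizing changes nothing for the residualized predictor, while the coefficient for X2 silently absorbs the shared variance and takes on a different meaning.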

Additional interpretational issues

Psycholinguists using regression have been concerned about statistical undercontrol (i.e., failing to take some important variable into account), but aside from collinearity concerns, they seem to have placed far less emphasis on the issue of statistical overcontrol (i.e., including too many predictor variables in a model). Meehl (1970) framed the conceptual consequences of this in terms of investigators interpreting counterfactual situations (e.g., a world in which written word frequency is

Recommendations and conclusion

Some researchers consider mean-centering to be a viable alternative to the residualizing of predictors because they contend that it reduces collinearity (Kromrey & Foster-Johnson, 1998). However, the strategy is misguided because it ignores the crucial distinction between essential and non-essential collinearity (e.g., Dalal & Zickar, 2012). Mean-centering reduces non-essential collinearity, which is due to the way in which variables are scaled, but not essential collinearity, which is due to
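The distinction can be made concrete with a small simulation (illustrative data, not from any study discussed here). Centering removes the correlation between a predictor and its own square, which exists only because of scaling, but leaves the correlation between two genuinely related predictors untouched:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Non-essential collinearity: X and X^2 correlate because X has a
# nonzero mean; centering X removes this correlation.
x = rng.normal(loc=5.0, scale=1.0, size=n)
xc = x - x.mean()
print(np.corrcoef(x, x**2)[0, 1])    # near 1
print(np.corrcoef(xc, xc**2)[0, 1])  # near 0

# Essential collinearity: two genuinely correlated predictors;
# centering each one leaves their correlation unchanged, because
# correlation is invariant to shifts in location.
z2 = rng.normal(size=n)
z1 = 0.7 * z2 + rng.normal(size=n)
r_before = np.corrcoef(z1, z2)[0, 1]
r_after = np.corrcoef(z1 - z1.mean(), z2 - z2.mean())[0, 1]
print(r_before, r_after)             # identical
```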

References (45)

  • Baayen, R. H. (2010). languageR: Data sets and functions with Analyzing linguistic data: A practical introduction to...
  • R.H. Baayen et al. Semantic density and past-tense formation in three Germanic languages. Language (2005)
  • R.H. Baayen et al. Lexical dynamics for low-frequency complex words: A regression study across tasks and modalities. The Mental Lexicon (2007)
  • D.A. Balota et al. Visual word recognition of single-syllable words. Journal of Experimental Psychology: General (2004)
  • D.A. Belsley. Demeaning conditioning diagnostics through centering. The American Statistician (1984)
  • J.A. Breaugh. Rethinking the control of nuisance variables in theory testing. Journal of Business and Psychology (2006)
  • A. Bürki et al. Lexical representation of schwa words: Two mackerels, but only one salami. Journal of Experimental Psychology: Learning, Memory, and Cognition (2012)
  • A. Campbell et al. The quality of American life: Perceptions, evaluations, and satisfactions (1976)
  • J. Cohen. Partialed products are interactions; partialed powers are curve components. Psychological Bulletin (1978)
  • J. Cohen et al. Applied multiple regression/correlation analysis for the behavioral sciences (2003)
  • D.K. Dalal et al. Some common myths about centering predictor variables in moderated multiple regression and polynomial regression. Organizational Research Methods (2012)
  • R.B. Darlington. Multiple regression in psychological research and practice. Psychological Bulletin (1968)

    Portions of this study were presented at the 54th annual meeting of the Psychonomic Society in Toronto, Ontario (November 14–17, 2013).
