Elsevier

Intelligence

Volume 31, Issue 6, November–December 2003, Pages 543-566

On the relationship between sources of within- and between-group differences and measurement invariance in the common factor model

https://doi.org/10.1016/S0160-2896(03)00051-5

Abstract

Investigating sources of within- and between-group differences and measurement invariance (MI) across groups is fundamental to any meaningful group comparison based on observed test scores. It is shown that by placing certain restrictions on the multigroup confirmatory factor model, it is possible to investigate the hypothesis that within- and between-group differences are due to the same factors. Moreover, the modeling approach clarifies that absence of measurement bias implies common sources of within- and between-group variation. It is shown how the influence of background variables can be incorporated in the model. The advantages of the modeling approach as compared with other commonly used methods for group comparisons are discussed and illustrated by means of an analysis of empirical data.

Introduction

Investigating within- and/or between-group differences on test scores is the focus of a large number of research studies. The variance of an item or subscale score within a group indicates the individual differences within the group. Individual differences with respect to multiple observed variables may be summarized in a within-group variance–covariance (or correlation) matrix. The structure of this matrix can be investigated using confirmatory (or exploratory) factor analysis. Confirmatory factor analysis (CFA) is concerned with “explaining” the common content of observed variables captured by their covariances with a smaller number of underlying latent variables called factors. As CFA is applied to the covariance matrix within a single group, the common factors can be regarded as the sources of systematic within-group differences. Differences between groups, on the other hand, are often tested by comparing the groups with respect to the means of the observed scores or with respect to the means of the factors underlying the observed scores. The latter may be viewed as an analysis of the sources of between-group differences and can be done by carrying out multigroup CFA.

To render the group comparisons meaningful, it is necessary to address the issue of measurement invariance (MI) and demonstrate that a given test measures the same underlying factors across groups. We use the expression “same factor” to indicate that a factor has exactly the same conceptual interpretation across groups. The interpretation of a factor depends on the content of the observed items or subscales that are related to the factor and the strength of those relations. Consequently, for a factor to have an identical interpretation across groups, it is necessary that the relations of the observed variables and the underlying factor are exactly the same across groups.

The present paper focuses on the relation between these three aspects of group comparisons, namely the relation between the sources of within-group differences (i.e., which factors explain individual differences within a group?), the sources of between-group differences (i.e., which factors explain the differences between groups?), and MI (i.e., does the test measure the same factors in all groups?). Although all three aspects have been extensively investigated separately, the relation between the three has not been clearly examined. In addition, we show that hypotheses concerning these three issues can be tested using multigroup confirmatory factor models.

Two areas in which the relation between within- and between-group differences and MI is important are ethnic group differences in IQ test scores and the seemingly linear increase over time in mean IQ test scores, termed the “Flynn effect” (Flynn, 1987, 1999). In these areas of research, it has been frequently noted that sources of within-group differences and sources of between-group differences are not necessarily identical (Lewontin, 1970). Differences between ethnic groups on an IQ test may be due to factors other than those that contribute to the individual differences within each of the groups. Although several studies focus explicitly on the issue of within- and between-group differences, both in a more general context (Turkheimer, 1990, 1991) and specifically with respect to the Flynn effect (Flynn, 2000; Rodgers, 1998), it is common practice to investigate the two sources of variance separately. Examples are single group factor analysis and multiple regression, which are based on within-group differences, and (M)ANOVA, in which groups are compared with respect to their observed means. Another common strategy is to compare the test score means adjusted for the influence of some variable of interest (Phillips, Brooks-Gunn, Duncan, Klebanov, & Crane, 1998). As will be shown, this approach is based on the implicit assumption that sources of within-group variance are the same as sources of between-group variance. This assumption is usually not tested in practice. In order to show that within- and between-group differences are indeed due to the same factors, it is necessary to analyze the means and the covariances of the observed scores simultaneously.

As mentioned above, if comparisons of item or subscale scores are to be valid, the test has to measure the same underlying factors in all groups. The concept of MI provides a theoretical framework that includes the necessary conditions to establish whether a given test measures the same factors in the groups under consideration. The definition of MI states that, conditional on the factor scores, observed scores do not depend on group membership. This means that members of different groups who have the same score on the factor (e.g., the same level of ability) have on average the same observed scores. The definition of MI implies that groups may differ only with respect to the means and covariances of the factors that are measured by the observed scores. In practice, MI can be investigated by fitting multigroup CFA models to a given data set. To represent MI, certain model parameters are restricted to be equal across groups. Both the restricted model and a less restricted model are fitted to the data. The models may be compared by means of a likelihood ratio test. The test can provide evidence that MI is tenable (for applications, see Dolan, 2000; Dolan & Hamaker, 2001).
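The verbal definition above can be written compactly. The notation below (Y for observed scores, η for factor scores, s for group, ν for intercepts, Λ for loadings, Θ for residual covariances, α and Ψ for factor means and covariances) is an assumption based on common factor-model conventions; the paragraph itself states the definition only in words. The equality constraints in the last line are one common reading of MI in the linear factor model, not a quotation of the paper's equations.

```latex
% MI: conditional on the factor scores, observed scores do not depend on group
f(Y \mid \eta, s) = f(Y \mid \eta)

% Linear factor model in group s (assumed notation)
Y_{s} = \nu_{s} + \Lambda_{s}\,\eta_{s} + \varepsilon_{s},
\qquad \operatorname{Cov}(\varepsilon_{s}) = \Theta_{s}

% Restrictions commonly taken to represent MI in this model
\nu_{s} = \nu, \qquad \Lambda_{s} = \Lambda, \qquad \Theta_{s} = \Theta
\qquad \text{for all groups } s

% Parameters left free across groups: factor means and factor covariances
\operatorname{E}(\eta_{s}) = \alpha_{s}, \qquad \operatorname{Cov}(\eta_{s}) = \Psi_{s}
```

Under these constraints the only group-specific quantities are α and Ψ, which matches the statement that groups may differ only in the means and covariances of the factors.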

The central issue of the present paper concerns the relation between MI on the one hand and within- and between-group differences on the other hand. Specifically, the definition of MI across groups implies that between-group differences cannot be due to factors with a different conceptual interpretation than the factors that account for the within-group differences. Although the importance of MI has been acknowledged (Byrne et al., 1989; Dolan, 2000; Lubke et al., 2001; Marsh, 1994; McArdle, 1998; Oort, 1998), this implication is not well recognized. Hence, if in practice the hypothesis of MI is not rejected, one can conclude with some confidence that within- and between-group differences are attributable to the same factors.
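A small numeric sketch may make this implication concrete. It assumes a linear factor model in which intercepts and loadings are equal across groups, so that the model-implied observed means in each group are the intercepts plus the loadings times that group's factor means. All names and numbers are hypothetical and chosen for illustration; none are taken from the paper.

```python
# Illustrative sketch only: every value below is made up.

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def implied_means(intercepts, loadings, factor_means):
    """Model-implied observed means: intercepts + loadings * factor means."""
    weighted = mat_vec(loadings, factor_means)
    return [nu + w for nu, w in zip(intercepts, weighted)]


# Hypothetical invariant measurement parameters: 4 observed scores, 2 factors.
intercepts = [10.0, 12.0, 8.0, 9.0]
loadings = [
    [1.0, 0.0],
    [0.8, 0.0],
    [0.0, 1.0],
    [0.0, 0.6],
]

# Hypothetical group-specific factor means.
alpha_group1 = [0.0, 0.0]
alpha_group2 = [0.5, -0.25]

mu1 = implied_means(intercepts, loadings, alpha_group1)
mu2 = implied_means(intercepts, loadings, alpha_group2)

# Difference in observed means between the two groups.
observed_diff = [b - a for a, b in zip(mu1, mu2)]

# The same difference, rebuilt from the factor mean difference alone.
factor_diff = [b - a for a, b in zip(alpha_group1, alpha_group2)]
diff_via_factors = mat_vec(loadings, factor_diff)

print(observed_diff)
print(diff_via_factors)
```

Because the intercepts cancel, the two printed vectors agree up to floating-point rounding: in this sketch the observed mean differences are carried entirely by the loadings that also structure the within-group covariances.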

Given the importance of conclusions in areas such as ethnic differences and/or the Flynn effect, it is surprising that, at least to our knowledge, few of the recent studies use multigroup CFA. A possible reason for the limited use of more state-of-the-art methods may lie in the rather technical character of publications discussing implications of MI (Bloxom, 1972; Ellis, 1993; Meredith, 1993). Although some technical formulation is unavoidable, it is our aim to explain the relation between MI and common sources of within- and between-group differences in an accessible way and to discuss the advantages of using multigroup confirmatory factor models rather than other commonly used methods. The approach proposed in this paper is applicable to a wide range of research questions. The approach is adequate if groups are to be compared on tests that consist of a larger number of continuous items or subscale scores, which are assumed to measure a smaller number of underlying factors. This includes group comparisons on multidimensional test batteries (e.g., IQ test batteries) as well as group comparisons on personality, mood, and attitude questionnaires or combinations of these.

The paper is organized as follows. First, the multigroup CFA model is presented. We show that observed scores are decomposed into common factor scores and a regression residual, which comprises measurement error and item-specific error. This decomposition has the advantage that groups can be compared with respect to the means and covariances of the factors. Second, we explain the concept of MI on a theoretical level and on a more practical level in the context of the multigroup common factor model. The multigroup common factor model corresponding to MI is characterized by a set of invariance restrictions across groups. Third, we show that MI implies that between-group differences are unlikely to be due to factors other than those capturing systematic within-group differences. We discuss how this result can be used in practice. By comparing a model with the invariance restrictions across groups to a less restricted model in a likelihood ratio test, one can examine not only whether MI holds but also whether between-group differences are due to differences in the same factors as the within-group differences. Fourth, we discuss how the multigroup model can be extended to include background variables. The way in which background variables are integrated can be guided by the outcome of tests of MI (Oort, 1992, 1998). Finally, we briefly discuss the advantages of multigroup CFA as compared with other commonly used methods and present, for the purpose of illustration, an analysis of scores of African and Caucasian Americans on an IQ test (Osborne, 1980).

Section snippets

The multigroup model

The basic idea in multigroup CFA as opposed to single group analysis is to fit factor models in several groups simultaneously. The factor model fitted within a group is a linear regression model, which relates observed item or subscale scores to a smaller number of latent variables called factors. Say we have i=1, …, I observed scores, Y, measuring l=1, …, L factors. Suppose further that the total sample consists of j=1, …, J subjects each belonging to one of s=1, …, S groups. If I=6, L=2, J

Measurement invariance

MI has been defined in a very general context, independent of the sort of data at hand (e.g., binary items, continuous items or subscales, etc.) or the type of model for the data. Essentially, it is a statement that the distribution of observed variables given the underlying factor scores is the same in all groups. In the context of IQ scores for instance, this means that given a certain level of, say, verbal ability, all test takers have the same probability of answering a verbal item

MI implies that between-group differences cannot be due to other factors than those accounting for within-group differences

The statement that between-group differences are attributable to the same sources as within-group differences (or a subset thereof) is another way of saying that mean differences between groups cannot be due to other factors than the individual differences within each group. To confirm this statement, we have to show that two propositions are tenable by the usual statistical criteria: (1) that the same factors are measured in the model for the means as in the model for the covariances and (2)

Testing the measurement invariant model

The multigroup model can be fitted using standard software such as Mplus (Muthén & Muthén, 2002), Lisrel (Jöreskog & Sörbom, 1999), EQS (Bentler, 1993), or Mx (Neale, Boker, Xie, & Maes, 2002). Tenability of the MI model may be evaluated on the basis of measures of goodness-of-fit and/or likelihood ratio tests (see, for instance, Bollen, 1989; Bollen & Long, 1993). Since MI is a composite hypothesis consisting of three restrictions, rejection of MI can have several causes.
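The likelihood ratio comparison of two nested models can be sketched in a few lines. The sketch assumes that each model's fit is summarized by a chi-square statistic and its degrees of freedom, as reported by programs of the kind listed above, and that the difference between the two statistics is referred to a chi-square distribution with the difference in degrees of freedom. The function names and the fit values are hypothetical and are not results from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function, starting from
    Q(1/2, z) = erfc(sqrt(z)) for odd df or Q(1, z) = exp(-z) for even df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(chisq_restricted, df_restricted, chisq_free, df_free):
    """Chi-square difference test for a restricted model nested in a freer one."""
    delta_chisq = chisq_restricted - chisq_free
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_chisq, delta_df, chi2_sf(delta_chisq, delta_df)


# Hypothetical fit statistics for an MI-restricted model and a less restricted one.
delta, ddf, p_value = likelihood_ratio_test(58.3, 30, 45.1, 22)
print(f"delta chi-square = {delta:.2f}, delta df = {ddf}, p = {p_value:.4f}")
```

A large p-value here would mean that the invariance restrictions do not significantly worsen fit. Because MI combines several restrictions, a significant result does not by itself indicate which restriction failed.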

Model extension with background variables

Frequently, researchers have data concerning the subjects in addition to the test scores they want to analyze. There are two ways of integrating background variables in the multigroup model. First, one can specify the hypothesis that background variables influence only the factor(s). We will call this Option 1. Option 1 serves to investigate structural relations, for instance, the hypothesis that nutrition has an impact on IQ factors. Importantly, the influence of the background variable on the

Disadvantages of other commonly used methods for group comparisons

In what follows we briefly discuss some of the drawbacks of alternative methods for group comparisons. Methods for group comparisons such as those used in recent studies concerning the Flynn effect and ethnic differences hinge on, at times implicit, assumptions about the relation between sources of within-group variance and between-group variance. In addition, the issue of MI is not always addressed adequately. Although the MI model is certainly not restricted to the analysis of IQ test

Illustration using Osborne's twin data

For the empirical example, we use data published in Osborne (1980). The subjects are African and Caucasian American twins drawn from public and private schools in Kentucky, Georgia, and Indiana. Note that this is clearly not a representative sample for the population of African and Caucasian Americans in the United States; conceptual interpretations of the analysis below are therefore not generalizable. The analysis is included for illustrative purposes.

The data are scores on four subscales of

Discussion

If groups are to be compared on observed test scores, it is necessary to investigate whether the test is measurement invariant across groups. MI in the factor model, that is, absence of bias, implies that within- and between-group variations are due to the same factors. Consequently, establishing MI and investigating between- and within-group differences coincide in the context of the multigroup confirmatory factor model. A model restricted according to MI can be compared with a less restricted

Acknowledgements

The research by the first author was supported through a subcontract to grant 5 R01 HD30995-07 by NICHD. The research of Conor Dolan was made possible by a fellowship of the Royal Netherlands Academy of the Arts and Sciences.

References (52)

  • C.V. Dolan et al. Investigating black–white differences in psychometric IQ: Multi-group confirmatory factor analysis and a critique of the method of correlated vectors.

  • J.L. Ellis. Subpopulation invariance of patterns in covariance matrices. British Journal of Mathematical & Statistical Psychology (1993).

  • J.R. Flynn. Massive IQ gains in 14 nations: What IQ really measures. Psychological Bulletin (1987).

  • J.R. Flynn. IQ-gains over time: Toward finding the causes.

  • J.R. Flynn. Searching for justice: The discovery of IQ gains over time. American Psychologist (1999).

  • J.R. Flynn. IQ-gains, WISC subtests and fluid g: g theory and the relevance of Spearman's hypothesis to race.

  • J.E. Gustafsson. The relevance of factor analysis for the study of group differences. Multivariate Behavioral Research (1992).

  • A.R. Jensen. The nature of the black–white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences (1985).

  • K.G. Jöreskog et al. Lisrel 8.3 (1999).

  • D. Kaplan et al. A study of the power associated with testing factor mean differences under violations of factorial invariance. Structural Equation Modeling (1995).

  • R.C. Lewontin. Race and intelligence. Bulletin of the Atomic Scientists (1970).

  • R.C. Lewontin. The analysis of variance and the analysis of causes. American Journal of Human Genetics (1974).

  • T.D. Little. Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research (1997).

  • G.H. Lubke et al. Can unequal residual variances across subpopulations mask differences in residual means in the common factor model? Structural Equation Modeling (2003).

  • G.H. Lubke et al. Investigating group differences using Spearman's hypothesis: An evaluation of Jensen's method. Multivariate Behavioral Research (2001).