
Intelligence

Volume 41, Issue 5, September–October 2013, Pages 407-422

The limitations of model fit in comparing the bi-factor versus higher-order models of human cognitive ability structure

https://doi.org/10.1016/j.intell.2013.06.004

Highlights

  • We compare the higher-order and bi-factor models of ability structure.

  • Statistical comparisons of model fits are biased in favour of the bi-factor model.

  • This bias is due to unmodelled complexity.

  • We caution against substantive interpretations of a better fitting bi-factor model.

Abstract

We addressed the question of whether the bi-factor or higher-order model is the more appropriate model of human cognitive ability structure. In previously published nested confirmatory factor analyses, the bi-factor model tended to be better fitting than the higher-order model; however, these studies did not consider a possible inherent statistical bias favouring the fit of the bi-factor model. In our own analyses and consistent with previous empirical results, the bi-factor model was also better fitting than the higher-order model. However, simulation results suggested that the comparison of bi-factor and higher-order models is substantially biased in favour of the bi-factor model when, as is commonly the case in CFA, there is unmodelled complexity. These results suggest that decisions as to which model to adopt either as a substantive description of human cognitive ability structure or as a measurement model in empirical analyses should not rely on which is better fitting.

Introduction

Historically, a key interest in cognitive ability research has been the determination of the structure of human cognitive ability. Early debates were concerned with whether ability should be described in terms of a single general factor (e.g. Spearman, 1927) or multiple specific ability factors (e.g. Guilford, 1967, Thurstone, 1938) but these opposing theoretical perspectives eventually found conciliation in the adoption of models with multiple strata of ability factors ranging in breadth from specific to general (Gustafsson, 2001, Mackintosh, 2011). Such multi-strata models of cognitive ability structure are now well established, are reflected in contemporary theoretical models of ability structure, and have received extensive empirical support from exploratory and confirmatory factor analyses (e.g. Carroll, 1993, Johnson and Bouchard, 2005, McGrew, 2009, Vernon, 1964).

Although, implicitly, it is often assumed in using these multi-strata models that g is super-ordinate to more specific abilities, this is only one hypothesis about how multiple ability factors are related to cognitive performance and other models may explain the data equally well, or better. It is useful to discuss these hypotheses in terms of psychometric models of ability structure because these models facilitate the operationalisation and empirical testing of such hypotheses in a mathematically precise and falsifiable framework (Johnson and Bouchard, 2005, Vrieze, 2012).

The most commonly used psychometric model of human cognitive ability is the higher-order model, an example of which is shown in the top panel of Fig. 1. Implicit in the model are several assumptions about human cognitive ability structure, beyond the base assumption that both a g factor and specific ability factors play roles in cognitive performance. The model also represents the assumption that the effects of g on observed subtests are completely mediated by lower-order, more specific abilities. This means that g is assumed not to be directly involved in cognitive performance, and its effects on cognitive performance are realised only through its influences on the more specific abilities which are directly involved in cognitive performance.

Another model that can equally represent the existence of both a general factor and group factors, and is thus a plausible alternative to the higher-order model, is the bi-factor model; the higher-order model is mathematically more constrained than, and nested within, the bi-factor model. An example of the bi-factor model is shown in the bottom panel of Fig. 1 (Yung, Thissen, & McLeod, 1999). The bi-factor model represents some differing assumptions about the structure of human cognitive ability. In contrast to the higher-order model, the bi-factor model reflects the assumption that the associations of g with observed cognitive performance are direct and independent of the associations of specific abilities with cognitive performance. These specific abilities are assumed to reflect narrower abilities such as ‘Verbal’ or ‘Spatial’ ability (e.g. Brunner, Nagy, & Wilhelm, 2012) that are independent of g.
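The nesting relation between the two models (Yung, Thissen, & McLeod, 1999) can be illustrated numerically: a higher-order model implies exactly the same covariance matrix as a bi-factor model in which each subtest's g loading is its first-order loading times the second-order loading, and its specific loading is scaled by the second-order disturbance. The sketch below uses illustrative loadings, not estimates from any dataset:

```python
import numpy as np

# First-order loadings: subtests 1-3 on F1, subtests 4-6 on F2 (illustrative)
Lam = np.array([
    [0.80, 0.00],
    [0.70, 0.00],
    [0.60, 0.00],
    [0.00, 0.75],
    [0.00, 0.65],
    [0.00, 0.55],
])
gamma = np.array([0.8, 0.7])  # second-order loadings of F1, F2 on g

# Higher-order model: factor correlations implied by g, plus disturbances
Phi = np.outer(gamma, gamma) + np.diag(1 - gamma**2)
Theta = np.diag(1 - np.diag(Lam @ Phi @ Lam.T))  # residuals for unit variances
Sigma_ho = Lam @ Phi @ Lam.T + Theta

# Equivalent bi-factor parameterisation (a Schmid-Leiman-type transformation):
# g loadings are lambda * gamma; specific loadings are lambda * sqrt(1 - gamma^2)
group = np.array([0, 0, 0, 1, 1, 1])          # which factor each subtest loads on
lam1 = Lam[np.arange(6), group]               # each subtest's first-order loading
lam_g = lam1 * gamma[group]
lam_s = np.zeros_like(Lam)
lam_s[np.arange(6), group] = lam1 * np.sqrt(1 - gamma[group]**2)
Sigma_bf = np.outer(lam_g, lam_g) + lam_s @ lam_s.T + Theta

print(np.allclose(Sigma_ho, Sigma_bf))  # True: identical implied covariances
```

Within each group, the ratio of the g loading to the specific loading is constant (here γ/√(1 − γ²)); freeing this proportionality constraint is what gives the bi-factor model its additional degrees of freedom.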

From a substantive perspective, an empirical comparison of the higher-order and bi-factor models may reveal which best approximates the ‘true’ structure of human cognitive ability, similar to historical studies that established that a hierarchical model with both g and specific abilities is a more appropriate description of human cognitive ability structure than a one-factor g model (e.g. Gustafsson, 2001). On a more pragmatic level, the measurement of g and specific abilities in empirical studies should utilise the best available statistical operationalisations of the constructs, and the bi-factor and higher-order models are competing statistical operationalisations. Model selection issues such as those above typically rely on comparing the fits of competing models, with the better fitting model being accepted as the more appropriate substantive description and/or practical operationalisation of a construct or constructs (e.g. Vrieze, 2012).

The bi-factor and higher-order models have been compared in this way in a number of previous empirical studies, which aimed to determine which of the two models best accounts for the observed correlations amongst subtests. Such studies have used nested-model confirmatory factor analysis (CFA) to compare the global fit of the two models. This is made possible by the fact that the class of higher-order CFA models is nested within the class of bi-factor CFA models (Yung et al., 1999). Yung et al. (1999) showed that the higher-order model is equivalent to a bi-factor model in which the ratio of the variances attributable to g and to the relevant specific ability for a given subtest is constrained to be equal across all subtests loading on the same specific ability factor. The statistical significance of the difference in fit between the two models can, therefore, be assessed using a chi-square (χ2) difference test with degrees of freedom equal to the number of additional constraints in the higher-order model. Given that this test can be overly powerful, the authors of these studies have also presented measures of the ‘practical significance’ of the difference in fit of the two models, where a ‘practically significant’ difference must exceed a threshold stricter than mere statistical significance. In particular, Gignac (2007) has argued that a Tucker–Lewis index (TLI; Tucker & Lewis, 1973) difference of > .01 between models is indicative of a practically significant difference in fit. This was based on a review and discussion of criteria for detecting a lack of measurement invariance in nested multi-group CFA models (Vandenberg & Lance, 2000). In principle, however, any fit index that takes into account differing model complexities should be informative in this respect.
For example, root mean square error of approximation (RMSEA) is expressed per degree of freedom and, therefore, recognises an effect of model parsimony (Steiger, 1990) and global information-theoretic fit statistics such as Akaike and Bayesian information criteria (AIC; Akaike, 1987 and BIC; Raftery, 1995) were developed to recognise the importance to theory of model parsimony. Smaller values of all three indicate a better-fitting model.
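As a rough illustration of how these indices weigh fit against parsimony, the sketch below computes TLI, RMSEA, and chi-square-based AIC and BIC. The formulas follow common SEM conventions (software packages differ in exact definitions), and all input values are hypothetical:

```python
import math

def fit_indices(chi2, df, chi2_null, df_null, n, n_params):
    """Common formulations of TLI, RMSEA, AIC, and BIC from a model's
    chi-square; conventions vary across software (an assumption here)."""
    tli = ((chi2_null / df_null) - (chi2 / df)) / ((chi2_null / df_null) - 1)
    rmsea = math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    aic = chi2 + 2 * n_params            # chi-square-based AIC convention
    bic = chi2 + n_params * math.log(n)  # chi-square-based BIC convention
    return tli, rmsea, aic, bic

# Hypothetical fit results for one model on a sample of n = 433
tli, rmsea, aic, bic = fit_indices(chi2=350.0, df=180, chi2_null=3000.0,
                                   df_null=210, n=433, n_params=51)
print(round(tli, 3), round(rmsea, 3))
```

In a nested comparison, the same function would be applied to both models; the RMSEA denominator and the AIC/BIC parameter penalties are what impose the parsimony adjustment discussed above.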

The majority of previous studies using this approach have concluded that the bi-factor model is better fitting than the higher-order model (e.g. Brunner et al., 2012, Gignac, 2005, Gignac, 2006a, Gignac, 2006b, Gignac, 2008, Golay and Lecerf, 2011, Keith, 2005, Watkins, 2010, Watkins and Kush, 2002). These studies have shown that chi-square difference tests support the bi-factor model as statistically significantly better fitting, with smaller RMSEA and larger TLI values, though the difference in model fit was in a few cases not practically significant according to the TLI difference criterion (> .01) (e.g. Gignac, 2008, Watkins, 2010). A small number of comparisons have favoured the higher-order model (e.g. Golay & Lecerf, 2011). Overall, though, support for the superior fit of the bi-factor model over the higher-order model has been strong.

There are some outstanding issues from these studies. For example, judgements of fit have relied mainly on TLI and chi-square, whereas information-theoretic fit statistics such as AIC and BIC have generally not been reported. Previous comparisons of the bi-factor and higher-order models have also tended to utilise relatively narrow ability batteries which often do not meet recommended minimum criteria for indicators per factor (Reeve & Blacksmith, 2009). In addition, all of these studies have based both their bi-factor and higher-order models on exploratory methods that assume a higher-order structure or on pre-existing models that ultimately derive from such methods. This could potentially introduce a bias into the comparison because there is some evidence that whether g is extracted directly (reflecting a bi-factor assumption) versus as a higher-order factor after oblique rotation (reflecting a higher-order factor assumption) can affect the content and number of specific factors (e.g. Gustafsson and Balke, 1993, Jennrich and Bentler, 2011, Johnson and Bouchard, 2007).

The most serious outstanding issue in bi-factor versus higher-order comparisons, however, is the possibility that the superior fit of the bi-factor model has nothing to do with its appropriateness as a description of human cognitive ability structure. There are reasons that the bi-factor model could fit better than the higher-order model even if its core assumptions are not more appropriate descriptions of the structure of human cognitive ability. For example, the bi-factor model may simply be better at accommodating unmodelled complexity in test batteries. Psychometric models of human cognitive ability will necessarily represent simplifications of its true structure; thus, models will always be to some degree mis-specified. For example, in CFA it is customary to constrain the majority of cross-loadings to zero, even though their true magnitudes may be non-trivial (Asparouhov & Muthén, 2009). Similarly, small residual correlations between subtests may remain but be constrained to zero. This unmodelled complexity may be substantive in total but any one aspect of it may be too small to merit inclusion in the model, or it may be due to sampling fluctuations that give rise to non-substantive inter-correlations by chance. It is arguable that the global fit of the bi-factor model may be less sensitive to these mis-specifications than the higher-order model because it includes more free parameters and a g that loads directly on observed subtests. However, in accommodating these mis-specifications, modelled parameters may become biased.

To explicate, consider a pair of nested higher-order and bi-factor CFA models in which there are unmodelled cross-loadings. In the higher-order model, the effect of constraining these cross-loadings to zero is to force the covariances due to these unmodelled cross-loadings to be absorbed by the modelled factor loadings, inflating factor covariances and, in turn, the variance attributable to g (Asparouhov & Muthén, 2009). Thus, specific factor correlations can to some degree compensate for unmodelled cross-loadings that would otherwise decrease global fit (Beauducel & Wittmann, 2005). In a bi-factor model the correlations amongst specific factors and g are constrained to zero so this compensatory route is not available. Instead, subtests are connected directly to g, providing an alternative and more direct mediation route for this unmodelled complexity, i.e. the covariance due to unmodelled cross-loadings can be absorbed by g without going through specific ability factors.
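The mechanism can be made concrete with a small numerical sketch (all loadings and factor correlations are illustrative): a single unmodelled cross-loading raises the population covariance between the two subtest groups, and a model that fixes that cross-loading to zero must reproduce the extra covariance elsewhere, via the factor correlation in the higher-order model or via g in the bi-factor model.

```python
# Population model: two correlated specific factors; item 3 cross-loads on F2
load = {1: {"F1": 0.7}, 2: {"F1": 0.7}, 3: {"F1": 0.7, "F2": 0.3},
        4: {"F2": 0.7}, 5: {"F2": 0.7}, 6: {"F2": 0.7}}
phi = {("F1", "F1"): 1.0, ("F2", "F2"): 1.0,
       ("F1", "F2"): 0.5, ("F2", "F1"): 0.5}

def cov(i, j):
    """Model-implied covariance between items i and j."""
    return sum(l1 * phi[(f1, f2)] * l2
               for f1, l1 in load[i].items() for f2, l2 in load[j].items())

# Average between-group covariance with the cross-loading present...
with_cl = sum(cov(i, j) for i in (1, 2, 3) for j in (4, 5, 6)) / 9
# ...and with item 3's cross-loading deleted from the population
load[3] = {"F1": 0.7}
without_cl = sum(cov(i, j) for i in (1, 2, 3) for j in (4, 5, 6)) / 9
print(with_cl > without_cl)  # True: the cross-loading adds shared covariance
```

A fitted model that omits the cross-loading faces this surplus between-group covariance and can only absorb it through the parameters it does have, which is where the two models' opportunities differ.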

Similar considerations apply to unmodelled correlated residuals. The effects of incorrectly assuming uncorrelated errors between two items can be compensated by upward biasing of factor inter-correlations or by addition of further first-order factors (Westfall, Henning, & Howell, 2012). In the higher-order model, therefore, correlated errors may inflate factor inter-correlations and the strength of g but in the bi-factor model, g is an additional first-order factor which could absorb some of these correlations. Similar to the effects with unmodelled cross-loadings, the route provided by the bi-factor model is a more direct alternative mediation route for the unmodelled covariance than is afforded by the higher-order model.
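A back-of-the-envelope calculation (again with illustrative values) shows how a single unmodelled residual correlation between subtests on different specific factors can inflate the factor inter-correlation a model must estimate:

```python
# Illustrative only: one unmodelled correlated residual across groups
loading = 0.7                       # each subtest's loading on its own factor
phi = 0.5                           # true correlation between the two factors
base_cov = loading * phi * loading  # model-implied cross-group covariance
resid_var = 1 - loading**2          # residual variance with unit total variance
true_cov = base_cov + 0.2 * resid_var  # add a residual correlation of .2

# A model that wrongly assumes uncorrelated errors can only reproduce this
# covariance by inflating the factor correlation: loading * phi' * loading
phi_absorbed = true_cov / loading**2
print(round(phi_absorbed, 3))       # inflated well above the true 0.5
```

In a higher-order model this inflated factor correlation feeds directly into the estimated strength of g, whereas the bi-factor model can route the same surplus covariance through its direct g loadings.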

Thus, the bi-factor model may compensate differently and better for arbitrary and unintended distortions of simple (or near simple) structure and incorrect assumptions of uncorrelated errors, leading to smaller reductions in global fit but perhaps greater parameter bias for essentially the same mis-specification. More importantly, however, the bi-factor model simply has more parameters, and thus, more opportunities for specific parameters to absorb some of the effects of unmodelled complexity. Although some fit statistics have been developed specifically to take account of differing numbers of parameters across models being compared, these are based on applying adjustments for model degrees of freedom. The adjustment, therefore, does not make reference to the ‘location’ of the additional parameters in the model nor to the possibility that additional parameters may model both intended covariances (e.g. adding a cross-loading to account for the correlation between an item and a factor) but also unintended unmodelled covariances (e.g. a factor correlation absorbing covariance due to an unmodelled correlation between item errors). Thus, the parsimony penalties of such fit indices may under-compensate for increases in fit due to the addition of more parameters when there are mis-specifications that can be absorbed by additional parameters and, in particular, when the degree to which this happens depends on the location of the parameter in the model.

The possible existence of such effects is an important consideration when attempting to evaluate which of the two alternative models is the most appropriate description of human cognitive ability structure because they could result in the incorrect model being favoured by global fit statistics. Unfortunately, for any given dataset, the existence, nature, and specific consequences of such biases will be difficult to predict in advance. They will likely depend on the specific subtests and models included in an analysis.

In the present study we addressed the question of whether the bi-factor model represents a superior description of human cognitive ability structure to the higher-order model. To do so, we assessed whether comparisons between the higher-order and bi-factor models are subject to substantial statistical bias for the reasons discussed above. Only if no bias was evident could we infer that a better-fitting bi-factor model than a higher-order model was due to the core substantive assumptions of the models, rather than to the non-substantive statistical advantage of the bi-factor model, i.e. its greater robustness to subtle mis-specifications. We first attempted to replicate previous real data comparisons of the bi-factor and higher-order model, expecting to find the former to be better fitting. We did so to contribute further evidence as to the generality of the superiority of the fit of the bi-factor model over that of the higher-order model in a way that addressed the outstanding methodological issues from previous studies. Thus, we investigated the possibility that bias is introduced when both the bi-factor and higher-order models are implicitly based on a higher-order assumption, we used a large number of diverse indicators, and we examined AIC and BIC in judging fit, in addition to the indices conventionally used to compare the bi-factor and higher-order model. Following our replication attempt in real data, we proceeded to investigate, in a brief simulation study, why bi-factor models tend to prove better fitting. We hypothesised that the increased complexity of the bi-factor model, rather than its superiority as a description of ability structure, can explain its better fit relative to the higher-order model.

Section snippets

Participants

For our real data comparison we analysed data from 433 participants of the Minnesota Study of Twins Reared Apart (MISTRA). The full MISTRA sample (n = 436) included participants in the age range of 18 to 79 years with a mean of 42.7 who were a mix of twin pairs, their spouses, partners, friends and members of their adoptive and biological families. The mean full scale IQ (FSIQ) of the sample was 109.7 (SD = 11.8) when normed at the 1955 level, or 101.2 (SD = 14.8) when adjusted for the Flynn effect

Data screening

The average missingness for the variables of Battery A was 4.9% (SD = 3.7, max = 9.4%) and for Battery B was 5.9% (SD = 6.7, max = 31%). FIML estimation was very likely still appropriate for dealing with these levels of missingness (Enders & Bandalos, 2001). No potentially outlying values were detected in either battery (|z| > 3.29, p < 0.01; Tabachnick & Fidell, 2007). The mean subtest communality for Battery A was 0.51 and for Battery B was 0.49. Collectively these values, the number and breadth of

Discussion

We used nested CFA models to compare the appropriateness of the higher-order and bi-factor models as descriptions of human cognitive ability structure in two batteries each of 21 cognitive tests. We found that, for both batteries, the bi-factor model was favoured in real data comparisons. However, we also discussed several possible reasons why comparison of the two models may be inherently biased due to their differing parsimonies. In our real data analyses, tentative support for this

Conclusions

The measurement of g and specific abilities should be based on the best available statistical operationalisation, and model fit is typically an important consideration in evaluating alternative operationalisations. CFA comparisons of the fit of the higher-order and bi-factor models may, however, be statistically biased, complicating their empirical comparison. Strong substantive reasons for preferring one or other model are also lacking. Given this, and the fact that the relative statistical and

Acknowledgements

We are grateful to Tom Bouchard for providing us with the data used in this study and for helpful comments on an earlier draft.

References (69)

  • Reeve, C.L., et al. (2009). Identifying g: A review of current factor analytic practices in the science of mental abilities. Intelligence.

  • Tang, C.Y., et al. (2010). Brain networks for working memory and factors of intelligence assessed in males and females with fMRI and DTI. Intelligence.

  • Akaike, H. (1987). Factor analysis and AIC. Psychometrika.

  • Asparouhov, T., et al. (2009). Exploratory structural equation modeling. Structural Equation Modeling.

  • Beauducel, A., et al. (2005). Simulation study on fit indexes in CFA with slightly distorted simple structure. Structural Equation Modeling.

  • Bentler, P.M. (1990). Comparative fit indexes in structural models. Psychological Bulletin.

  • Bouchard, T.J., et al. (1990). Sources of human psychological differences: The Minnesota Study of Twins Reared Apart. Science.

  • Brunner, M., et al. (2012). A tutorial on hierarchically structured constructs. Journal of Personality.

  • Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies.

  • DeFries, J.C., et al. (1974). Near identity of cognitive structure in two ethnic groups. Science.

  • Enders, C.K., et al. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling.

  • Fabrigar, L.R., et al. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods.

  • Gignac, G.E. (2005). Revisiting the factor structure of the WAIS-R. Assessment.

  • Gignac, G.E. (2006). A confirmatory examination of the factor structure of the Multidimensional Aptitude Battery (MAB): Contrasting oblique, higher-order and nested factor models. Educational and Psychological Measurement.

  • Gignac, G.E. (2006). The WAIS-III as a nested factors model: A useful alternative to the more conventional oblique and higher-order models. Journal of Individual Differences.

  • Gignac, G.E. (2008). Higher-order models versus direct hierarchical models: g as superordinate or breadth factor? Psychology Science.

  • Golay, P., et al. (2011). Orthogonal higher-order structure and confirmatory factor analysis of the French Wechsler Adult Intelligence Scale (WAIS-III). Psychological Assessment.

  • Guilford, J.P. (1967). The nature of human intelligence.

  • Gustafsson, J.E. On the hierarchical structure of ability and personality.

  • Gustafsson, J.E., et al. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research.

  • Haier, R., et al. (2010). Gray matter correlates of cognitive ability tests used for vocational guidance. BMC Research Notes.

  • Hakstian, A.R., et al. (1975). The Comprehensive Ability Battery.

  • Hakstian, A.R., et al. (1978). Higher stratum ability structures on a basis of twenty primary mental abilities. Journal of Educational Psychology.

  • Horn, J.L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika.