Understanding the limitations of global fit assessment in structural equation modeling
I share the concerns of Paul Barrett (PB) about the limitations of global fit indices in structural equation modeling (SEM). However, his analysis of the problems is both incomplete and unconvincing, and his proposed solutions are unnecessarily regressive. My space is limited, so I will confine myself to some key issues.
The classic chi-square test and the fallacy of accept-support logic
The fundamental facts bear repeating. The traditional chi-square goodness of fit test in SEM is an accept-support test of the statistical null hypothesis of perfect model fit, i.e., that the model's deviation from the data is exactly zero. Accepting H0 supports the model. This accept-support test of zero discrepancy (AS0) is of little direct value, for several reasons. First, this “nil hypothesis” of perfect fit is irrelevant, because SEM models are highly restrictive, and have an essentially zero likelihood
Barrett’s objections
Most of PB’s objections to fit indices are inadequate to support his regressive agenda.
(1) PB notes that recent Monte Carlo studies confirm that “golden rule” cutoff values such as those suggested by Hu and Bentler (1999) sometimes incorrectly identify “misspecified models” as fitting “acceptably”. This is not surprising. However, such Monte Carlo studies often, like AS0 itself, ask the wrong question, by beginning with an assumption that there is a perfect model. They then omit one or more
Barrett’s new agenda
PB is ready to set new “rules” for his field. The first is that AS0 must always be reported, and “a statement that the model fits or fails to fit via this statistic must be provided”. The chi-square and p-value should be reported routinely, but failure to reject the null hypothesis should never be conflated with the notion that “the model fits”. To do so is the classic accept-support fallacy.
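The dependence of AS0 on sample size can be illustrated with a minimal sketch. It assumes a hypothetical model with 24 degrees of freedom and a fixed, small population misfit expressed as an RMSEA of 0.05, and it uses the noncentrality relation λ = (N − 1)·df·ε² from the power-analysis framework of MacCallum et al. (1996). Treating the expected value of the statistic, df + λ, as a "typical" observed chi-square is a simplification made only for illustration; the degrees of freedom, the RMSEA value, the sample sizes, and all function names are assumptions, not values from the source.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a central chi-square with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    log_cdf = a * math.log(half_x) - half_x - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_cdf))


def expected_chi2(n, df, rmsea):
    """Expected test statistic under a fixed population misfit: df + (N - 1) * df * rmsea^2."""
    return df + (n - 1) * df * rmsea ** 2


# Hypothetical illustration values (not taken from the source).
DF = 24
RMSEA = 0.05
SAMPLE_SIZES = [100, 200, 500, 1000]


def as0_table(sample_sizes=SAMPLE_SIZES, df=DF, rmsea=RMSEA):
    """Return (N, expected chi-square, AS0 p-value) for each sample size."""
    rows = []
    for n in sample_sizes:
        stat = expected_chi2(n, df, rmsea)
        rows.append((n, stat, chi2_sf(stat, df)))
    return rows


if __name__ == "__main__":
    for n, stat, p in as0_table():
        print(f"N={n:5d}  expected chi-square={stat:7.2f}  p={p:.4f}")
```

Under these assumptions the same degree of misfit yields a non-significant chi-square at the smallest sample size and a clearly significant one at the largest, which is why a failure to reject says more about N than about whether “the model fits”.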
PB’s second rule states that except in special circumstances “SEM analyses based on samples of less than
Barrett rewrites history
At one point in his discussion (“what happened to the logic of model testing?”), PB gives a brief and seamless historical account that attributes both the chi-square test logic and “the notion of an approximate fit index” to Karl Jöreskog. This makes for smooth reading, but creates a completely inaccurate impression. AS0 was known when Jöreskog was an undergraduate. Tucker and Lewis proposed an early “heuristic” fit index in 1973, and Steiger and Lind (1980) invented statistically based fit
Conclusions, and a recommendation
PB and I agree on a number of issues, but differ sharply on the strategy for dealing with them.
PB’s recommendation to ban fit indices is overkill, much like trying to end air pollution by prohibiting the automobile instead of improving it.
I share PB’s fundamental concern that some researchers have used fit indices as an excuse to publish sloppy and inaccurate models. I urge PB to consider the test of not-close fit (MacCallum et al., 1996) as a constructive approach to
References (17)
- Structural equation modelling: adjudging model fit. Personality and Individual Differences (2007).
- Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research (1983).
- A new approach to factor analysis: the radex
- Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: the stability of a bi-factor solution. In...
- et al. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling (1999).
- Structural analysis of covariance and correlation matrices. Psychometrika (1978).
- et al. Power analysis and determination of sample size for covariance structure modeling. Psychological Methods (1996).
- et al. Goodness of fit in structural equation models