Should I allow my confirmatory factors to correlate during factor score extraction? Implications for the applied researcher

Logan, Jessica A. R.; Jiang, Hui; Helsabeck, Nathan; Yeomans-Maldonado, Gloria

doi:10.1007/s11135-021-01202-x

Published: 23 July 2021

Quality & Quantity, Volume 56, pages 2107–2131 (2022)

Jessica A. R. Logan¹,⁴ (ORCID: orcid.org/0000-0003-3113-4346), Hui Jiang², Nathan Helsabeck¹ & Gloria Yeomans-Maldonado¹,³

Abstract

With complex models becoming increasingly popular in the social sciences, many researchers have begun using latent variable modeling in multiple steps: saving, estimating, or otherwise extracting factor scores from a confirmatory factor analysis (CFA) for use in a second, inferential analysis. When a CFA identifies two or more factors, few practical guidelines exist as to how researchers should proceed. In Study 1, we examine two common practices for CFAs with two or more factors: fitting a separate CFA for each factor, or allowing the factors to correlate in the model used for extraction. We provide a simulation study to demonstrate the bias introduced by each approach. In Study 2, we demonstrate that the between-factor correlation bias can be mitigated by using a different estimator: ten Berge estimation shows near-zero bias on the critical correlations between factors. Finally, we demonstrate this with an example dataset.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Gloria Yeomans-Maldonado
Present address: Children’s Learning Institute, University of Texas Health Science Center, Houston, USA

Authors and Affiliations

The Ohio State University, Columbus, USA
Jessica A. R. Logan, Nathan Helsabeck & Gloria Yeomans-Maldonado
Crane Center for Early Childhood Research and Policy, The Ohio State University, Columbus, USA
Hui Jiang
College of Education and Human Ecology, The Ohio State University, 29 W Woodruff Ave, 211A Ramseyer Hall, Columbus, OH, 43210, USA
Jessica A. R. Logan

Corresponding author

Correspondence to Jessica A. R. Logan.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

This study did not involve human subjects and was not subject to ethics approval.

Informed consent

Informed consent was not applicable for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Data generation

The data generation model was based on Eq. 4 (data model) and Eq. 5 (implied covariance form), where $y_{i}$ is the p × 1 observed data vector for the ith observation (p = number of indicators, here 8); $\Sigma$ is the implied covariance matrix of y; $\mu$ is the p × 1 intercept vector, fixed to 0; $\Lambda$ is the p × m matrix of fixed factor loadings (m = number of factors, here 2); $\eta_{i}$ is the m × 1 vector of “true” factor scores for the ith observation, distributed multivariate normal, MVN$(0, \Phi)$; and $\epsilon_{i}$ is the p × 1 residual vector for the ith observation, distributed MVN$(0, \Psi)$. Here $\Phi$ is the m × m covariance matrix of the factors, and $\Psi$ is the p × p diagonal error covariance matrix, with the error variances set so that each element of y has a variance of 1.

$$y_{i} = \mu + \Lambda \eta_{i} + \epsilon_{i} \tag{4}$$

$$\Sigma = \Lambda \Phi \Lambda^{\prime} + \Psi \tag{5}$$

In addition, we also introduced an external variable z that correlated at 0.5 with factor 1 only. It is assumed that z is normally distributed with a variance of 1. The covariance matrix between factors and variable z is therefore

$$\Phi^{*}_{(m+1) \times (m+1)} = \begin{bmatrix} \Phi & \gamma \\ \gamma^{\prime} & 1 \end{bmatrix},$$

where $\gamma^{\prime}$ = [0.5, 0, …, 0] has dimension 1 × m.

For each replication, data were generated in two steps: (1) the true factor scores ($\eta$) and the external variable (z) were drawn randomly from the multivariate normal distribution MVN(0, $\Phi^{*}$), and the residuals $\epsilon$ were drawn from MVN(0, $\Psi$); and (2) the observed data matrix Y of dimension p × n (n = sample size) was calculated from Eq. 4.
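
The two generation steps can be sketched in Python with NumPy. This is a minimal illustration, not the authors' simulation code: the function name `simulate_two_factor_data`, the seed handling, the unit factor variances, and the simple-structure loading pattern (each indicator loads on one factor only) are all assumptions made for the example. Note that it returns Y in n × p layout, the transpose of the p × n matrix described in the text.

```python
import numpy as np


def simulate_two_factor_data(n, factor_corr, loadings_f1, loadings_f2,
                             z_corr=0.5, seed=0):
    """Draw one replication from the two-factor model in Eq. 4 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    l1 = np.asarray(loadings_f1, dtype=float)
    l2 = np.asarray(loadings_f2, dtype=float)
    p1, p = l1.size, l1.size + l2.size

    # Lambda (p x m): assumed simple structure, no cross-loadings.
    Lambda = np.zeros((p, 2))
    Lambda[:p1, 0] = l1
    Lambda[p1:, 1] = l2

    # Phi: factor covariance matrix, assuming unit factor variances.
    Phi = np.array([[1.0, factor_corr], [factor_corr, 1.0]])

    # Phi*: Phi augmented with z, which correlates with factor 1 only.
    gamma = np.array([z_corr, 0.0])
    Phi_star = np.block([[Phi, gamma[:, None]],
                         [gamma[None, :], np.ones((1, 1))]])

    # Psi (diagonal): error variances chosen so each indicator has variance 1.
    psi = 1.0 - np.diag(Lambda @ Phi @ Lambda.T)

    # Step 1: draw true factor scores and z jointly, then the residuals.
    draws = rng.multivariate_normal(np.zeros(3), Phi_star, size=n)
    eta, z = draws[:, :2], draws[:, 2]
    eps = rng.normal(loc=0.0, scale=np.sqrt(psi), size=(n, p))

    # Step 2: Eq. 4 with mu = 0, applied row-wise (Y is n x p here).
    Y = eta @ Lambda.T + eps
    return Y, eta, z
```

For example, `simulate_two_factor_data(250, 0.5, [0.8] * 4, [0.4, 0.4, 0.8, 0.8])` would correspond to one replication of a condition with n = 250, a true factor correlation of 0.5, and high loadings on factor 1 with mixed loadings on factor 2.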

Appendix 2: Code to calculate ten Berge factor scores
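
The authors' code is not reproduced in this text. As a stand-in, the sketch below shows one common way to compute correlation-preserving (ten Berge) factor score weights in Python with NumPy. It is not the authors' implementation: the function names, the use of an eigendecomposition for the matrix square roots, and the choice to pass the covariance matrix in as an argument (model-implied or sample) are assumptions made for illustration. With L = ΛΦ^{1/2} and C = Σ^{-1/2} L (L′Σ^{-1}L)^{-1/2}, the weights are W = Σ^{-1/2} C Φ^{1/2}, so that W′ΣW = Φ.

```python
import numpy as np


def _sym_power(a, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals ** power) @ vecs.T


def ten_berge_weights(Lambda, Phi, Sigma):
    """Weights W (p x m) whose scores reproduce Phi: W' Sigma W = Phi.

    Lambda: p x m loadings, Phi: m x m factor covariance matrix,
    Sigma: p x p indicator covariance matrix (hypothetical argument layout).
    """
    phi_half = _sym_power(Phi, 0.5)
    sigma_inv_half = _sym_power(Sigma, -0.5)
    L = Lambda @ phi_half
    # C has orthonormal columns (C'C = I), which is what preserves Phi.
    C = sigma_inv_half @ L @ _sym_power(L.T @ np.linalg.inv(Sigma) @ L, -0.5)
    return sigma_inv_half @ C @ phi_half


def ten_berge_scores(Y, Lambda, Phi, Sigma):
    """Factor scores (n x m) for data Y (n x p), after centering the columns."""
    Yc = Y - Y.mean(axis=0)
    return Yc @ ten_berge_weights(Lambda, Phi, Sigma)
```

In practice `Lambda` and `Phi` would be the estimates from the correlated-factor CFA. If `Sigma` is the sample covariance matrix of `Y`, the sample covariance matrix of the resulting scores equals `Phi` up to floating-point error, which is the property that motivates this estimator.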

Appendix 3

Means and standard deviations of the between-factor correlation estimates, which were converted to the bias estimates presented in Table 3.

True corr	n	Orthogonal extraction: High	Orthogonal extraction: High mixed	Orthogonal extraction: Mixed	Correlated extraction: High	Correlated extraction: High mixed	Correlated extraction: Mixed
0	100	0 (0.10)	0 (0.10)	0 (0.10)	0 (0.13)	0 (0.14)	0 (0.15)
	250	0 (0.06)	0 (0.06)	0 (0.06)	0 (0.08)	0 (0.09)	0 (0.09)
	500	0 (0.04)	0 (0.04)	0 (0.04)	0 (0.06)	0 (0.06)	0 (0.07)
0.3	100	0.25 (0.09)	0.25 (0.09)	0.23 (0.10)	0.33 (0.12)	0.34 (0.13)	0.34 (0.14)
	250	0.27 (0.06)	0.25 (0.06)	0.23 (0.06)	0.34 (0.07)	0.34 (0.09)	0.35 (0.09)
	500	0.26 (0.04)	0.25 (0.04)	0.23 (0.04)	0.34 (0.05)	0.34 (0.06)	0.35 (0.06)
0.5	100	0.43 (0.08)	0.40 (0.09)	0.38 (0.09)	0.55 (0.10)	0.56 (0.11)	0.57 (0.13)
	250	0.44 (0.05)	0.41 (0.05)	0.39 (0.05)	0.55 (0.06)	0.57 (0.07)	0.59 (0.07)
	500	0.44 (0.04)	0.42 (0.04)	0.39 (0.04)	0.56 (0.04)	0.57 (0.05)	0.59 (0.05)
0.8	100	0.70 (0.05)	0.65 (0.06)	0.61 (0.07)	0.87 (0.05)	0.88 (0.06)	0.88 (0.06)
	250	0.70 (0.03)	0.66 (0.04)	0.63 (0.04)	0.87 (0.03)	0.88 (0.03)	0.89 (0.04)
	500	0.70 (0.02)	0.66 (0.02)	0.63 (0.03)	0.87 (0.02)	0.88 (0.02)	0.89 (0.03)

Standard deviations are in parentheses. High = all factor loadings set to 0.80; High mixed = the loadings for factor 1 are all 0.80, and the loadings for factor 2 are 0.40, 0.40, 0.80, 0.80; Mixed = the loadings for both factor 1 and factor 2 are 0.40, 0.40, 0.80, 0.80
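
The downward shift in the orthogonal-extraction columns is roughly what a simple population-level calculation would predict. The sketch below is not part of the authors' analysis. It assumes that orthogonal extraction means scoring each factor from its own one-factor model, that indicators are standardized (error variance = 1 − loading²), and that regression or Bartlett scores are used (the two are proportional in a one-factor model, so the correlation is the same). Under those assumptions the expected correlation between the two score variables is the true correlation multiplied by the square root of the product of the two score reliabilities. The function names are hypothetical.

```python
import numpy as np


def score_reliability(loadings):
    """Squared correlation between a one-factor score and its true factor."""
    lam = np.asarray(loadings, dtype=float)
    # s = lambda' Psi^{-1} lambda for standardized indicators.
    s = np.sum(lam ** 2 / (1.0 - lam ** 2))
    return s / (1.0 + s)


def expected_orthogonal_corr(true_corr, loadings_f1, loadings_f2):
    """Population correlation between scores taken from two separate CFAs."""
    attenuation = np.sqrt(score_reliability(loadings_f1)
                          * score_reliability(loadings_f2))
    return true_corr * attenuation


high = [0.8, 0.8, 0.8, 0.8]
mixed = [0.4, 0.4, 0.8, 0.8]
for rho in (0.3, 0.5, 0.8):
    print(rho,
          round(expected_orthogonal_corr(rho, high, high), 2),
          round(expected_orthogonal_corr(rho, high, mixed), 2),
          round(expected_orthogonal_corr(rho, mixed, mixed), 2))
```

For the High condition with a true correlation of 0.8, this calculation gives approximately 0.70, in line with the tabled mean. It does not explain the upward shift under correlated extraction, which arises from a different mechanism.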

Appendix 4

Mean estimates and standard deviations for the correlation between the extracted factor and an external variable (z). These values are presented as bias estimates in Table 4.

True corr	n	Orthogonal extraction: High	Orthogonal extraction: High mixed	Orthogonal extraction: Mixed	Correlated extraction: High	Correlated extraction: High mixed	Correlated extraction: Mixed
0	100	0.47 (0.08)	0.47 (0.08)	0.44 (0.08)	0.47 (0.08)	0.47 (0.08)	0.44 (0.08)
	250	0.47 (0.05)	0.46 (0.05)	0.44 (0.05)	0.47 (0.05)	0.46 (0.05)	0.44 (0.05)
	500	0.47 (0.03)	0.47 (0.04)	0.44 (0.04)	0.47 (0.03)	0.47 (0.04)	0.44 (0.04)
0.3	100	0.47 (0.08)	0.47 (0.08)	0.44 (0.08)	0.46 (0.08)	0.46 (0.08)	0.43 (0.08)
	250	0.46 (0.05)	0.46 (0.05)	0.44 (0.05)	0.46 (0.05)	0.46 (0.05)	0.44 (0.05)
	500	0.47 (0.04)	0.47 (0.04)	0.44 (0.04)	0.46 (0.04)	0.46 (0.04)	0.44 (0.04)
0.5	100	0.47 (0.08)	0.47 (0.08)	0.44 (0.08)	0.45 (0.08)	0.45 (0.08)	0.42 (0.09)
	250	0.47 (0.05)	0.47 (0.05)	0.44 (0.05)	0.45 (0.05)	0.45 (0.05)	0.42 (0.05)
	500	0.47 (0.04)	0.47 (0.04)	0.45 (0.04)	0.45 (0.04)	0.45 (0.04)	0.42 (0.04)
0.8	100	0.46 (0.08)	0.46 (0.08)	0.44 (0.08)	0.40 (0.09)	0.41 (0.09)	0.36 (0.10)
	250	0.47 (0.05)	0.47 (0.05)	0.44 (0.05)	0.40 (0.05)	0.41 (0.05)	0.36 (0.06)
	500	0.47 (0.04)	0.47 (0.03)	0.44 (0.04)	0.40 (0.04)	0.41 (0.04)	0.36 (0.04)

Standard deviations are in parentheses. The correlation between the factor and the external variable was set to 0.50 for all conditions

About this article

Cite this article

Logan, J.A.R., Jiang, H., Helsabeck, N. et al. Should I allow my confirmatory factors to correlate during factor score extraction? Implications for the applied researcher. Qual Quant 56, 2107–2131 (2022). https://doi.org/10.1007/s11135-021-01202-x

Accepted: 04 July 2021
Published: 23 July 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11135-021-01202-x
