Should I allow my confirmatory factors to correlate during factor score extraction? Implications for the applied researcher

Quality & Quantity

Abstract

With complex models becoming increasingly popular in the social sciences, many researchers have begun using latent variable modeling in multiple steps, saving, estimating, or otherwise extracting factor scores from one confirmatory factor analysis (CFA) for use in a second inferential analysis. When two or more factors are identified in a CFA, there exist few practical guidelines as to how researchers should proceed. In Study 1, we examine two common practices when CFAs have two or more factors: fitting a separate CFA for each factor, or allowing the factors to correlate in the model used for extraction. We provide a simulation study to demonstrate the bias introduced by each of the two approaches. In Study 2, we demonstrate that the between-factor correlation bias can be mitigated through the use of a different estimator; ten Berge estimation shows near-zero bias on the critical correlations between factors. Finally, we demonstrate this with an example dataset.

References

  • Anderson, T.W., Rubin, H.: Statistical inference in factor analysis. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 5, pp. 111–150 (1956)

  • Bollen, K.A.: Structural Equations with Latent Variables. Wiley, New York (1989)

  • Borgeest, G.S., Henson, R., Shafto, M., Samu, D., Kievit, R.: Greater lifestyle engagement is associated with better cognitive resilience (2018). https://doi.org/10.31234/osf.io/6pzve

  • Brown, T.A.: Confirmatory Factor Analysis for Applied Research, 2nd edn. Guilford Press, New York (2015)

  • Croon, M.: Using predicted latent scores in general latent structure models. In: Marcoulides, G.A., Moustaki, I. (eds.) Latent variable and latent structure models, p. 195. Lawrence Erlbaum, Mahwah (2002)

  • Curran, P.J., Hussong, A.M.: Integrative data analysis: the simultaneous analysis of multiple data sets. Psychol. Methods 14(2), 81 (2009)

  • Curran, P.J., Cole, V.T., Bauer, D.J., Rothenberg, W.A., Hussong, A.M., Gottfredson, N.: Improving factor score estimation through the use of observed background characteristics. Struct. Equ. Model. 23(6), 827–844 (2016)

  • Curran, P.J., Cole, V.T., Bauer, D.J., Rothenberg, W.A., Hussong, A.M.: Recovering predictor-criterion relations using covariate-informed factor score estimates. Struct. Equ. Model. 25(6), 860–875 (2018). https://doi.org/10.1080/10705511.2018.1473773

  • Devlieger, I., Mayer, A., Rosseel, Y.: Hypothesis testing using factor score regression: a comparison of four methods. Educ. Psychol. Meas. 76(5), 741–770 (2016). https://doi.org/10.1177/0013164415607618

  • DiStefano, C., Zhu, M., Mindrila, D.: Understanding and using factor scores: considerations for the applied researcher. Pract. Assess. Res. Eval. 14(20), 1–11 (2009)

  • Fernández-Giménez, M.E., Allington, G.R., Angerer, J., Reid, R.S., Jamsranjav, C., Ulambayar, T., Hondula, K., Baival, B., Batjav, B., Altanzul, T.: Using an integrated social-ecological analysis to detect effects of household herding practices on indicators of rangeland resilience in Mongolia. Environ. Res. Lett. 13(7), 075010 (2018)

  • Greenbaum, P.E., Wang, W., Henderson, C.E., Kan, L., Hall, K., Dakof, G.A., Liddle, H.A.: Gender and ethnicity as moderators: integrative data analysis of multidimensional family therapy randomized clinical trials. J. Fam. Psychol. 29(6), 919–930 (2015). https://doi.org/10.1037/fam0000127

  • Harrington, D.: Confirmatory Factor Analysis. Oxford University Press, Oxford (2009)

  • Holzinger, K.J., Swineford, F.A.: A Study of Factor Analysis: The Stability of a Bi-Factor Solution. University of Chicago Press, Chicago (1939)

  • Hoshino, T., Bentler, P.M.: Bias in factor score regression and a simple solution. UCLA, Department of Statistics (2011). https://escholarship.org/uc/item/45h3t3t2

  • Jöreskog, K.G., Sörbom, D.: LISREL 8: User’s Reference Guide. Scientific Software International, Chicago (1996)

  • Kim, Y.S., Al Otaiba, S., Wanzek, J., Gatlin, B.: Toward an understanding of dimensions, predictors, and the gender gap in written composition. J. Educ. Psychol. 107(1), 79 (2015)

  • Kline, R.B.: Principles and Practice of Structural Equation Modeling. Guilford Publications, New York (2015)

  • Krijnen, W.P., Wansbeek, T., ten Berge, J.M.F.: Best linear predictors for factor scores. Commun. Stat. Theory Methods 25(12), 3013–3025 (1996)

  • Logan, J.A.R.: Question for researchers: have you ever run a confirmatory factor analysis and then saved out the factor scores (turning into observed scores) for use in another analysis? No(28%), Yes (40%), Just show me the results (32%). [tweet]. (2018). https://twitter.com/jarlogan/status/1058009194006220802

  • Lu, I.R.R., Thomas, D.R.: Avoiding and correcting bias in score-based latent variable regression with discrete manifest items. Struct. Equ. Model. 15(3), 462–490 (2008). https://doi.org/10.1080/10705510802154323

  • McDonald, R.P.: The dimensionality of tests and items. Br. J. Math. Stat. Psychol. 34(1), 100–117 (1981)

  • McNeish, D., Wolf, M.G.: Thinking twice about sum scores. Behav. Res. Methods 52(6), 2287–2305 (2020). https://doi.org/10.3758/s13428-020-01398-0

  • Muthén, L.K., Muthén, B.O.: Mplus: Statistical Analysis with Latent Variables: User’s Guide (Version 8). Los Angeles, CA: Muthén & Muthén (2017). https://www.statmodel.com/

  • Purpura, D.J., Hume, L.E., Sims, D.M., Lonigan, C.J.: Early literacy and early numeracy: the value of including early literacy skills in the prediction of numeracy development. J. Exp. Child Psychol. 110(4), 647–658 (2011)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017)

  • Rimm-Kaufman, S.E., Baroody, A.E., Larsen, R.A., Curby, T.W., Abry, T.: To what extent do teacher–student interaction quality and student gender contribute to fifth graders’ engagement in mathematics learning? J. Educ. Psychol. 107(1), 170 (2015)

  • Rose, J.S., Dierker, L.C., Hedeker, D., Mermelstein, R.: An integrated data analysis approach to investigating measurement equivalence of DSM nicotine dependence symptoms. Drug Alcohol Depend. 129(1–2), 25–32 (2013)

  • Rosseel, Y.: Lavaan: an R package for structural equation modeling. J. Stat. Softw. 48, 1–36 (2012a)

  • Rosseel, Y.: lavaan: an R package for structural equation modeling. J. Stat. Softw. 48(2), 1–36 (2012b)

  • Skrondal, A., Laake, P.: Regression among factor scores. Psychometrika 66(4), 563–575 (2001)

  • ten Berge, J.M.F., Krijnen, W.P., Wansbeek, T., Shapiro, A.: Some new results on correlation-preserving factor scores prediction methods. Linear Algebra Appl. 289(1–3), 311–318 (1999)

  • Thurstone, L.L.: The Vectors of Mind: Multiple-Factor Analysis for the Isolation of Primary Traits. University of Chicago Press, Chicago (1935)

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Corresponding author

Correspondence to Jessica A. R. Logan.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

This study did not involve human subjects and was not subject to ethics approval.

Informed consent

Informed consent was not applicable for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

1.1 Data generation

The data generation model was based on Eq. 4 (data model) and Eq. 5 (implied covariance form), where \(y_{i}\) is the p × 1 observed data vector for the ith observation and p is the number of indicators (here 8); \({\Sigma }\) is the implied covariance matrix of y; \(\mu\) is the p × 1 intercept vector, fixed to 0; \({\Lambda }\) is the p × m matrix of fixed factor loadings, where m is the number of factors (here 2); \(\eta_{i}\) is the m × 1 vector of “true” factor scores for the ith observation, distributed MVN \(\left( {0,{\Phi }} \right)\); and \(\epsilon_{i}\) is the p × 1 residual vector for the ith observation, distributed MVN \(\left( {0,{\Psi }} \right)\). Here \({\Phi }\) is the m × m covariance matrix of the factors, and \({\Psi }\) is the p × p diagonal error covariance matrix, with the error variances chosen so that each element of y has variance 1.

$$y_{i} = \mu + \Lambda \eta_{i} + \epsilon_{i} \tag{4}$$
$$\Sigma = \Lambda \Phi \Lambda^{\prime} + \Psi \tag{5}$$
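
The construction in Eq. 5 can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a simple structure (four indicators per factor, no cross-loadings) and unit factor variances, neither of which is spelled out in this appendix, and the function name `implied_covariance` is hypothetical. The loadings are the "High mixed" values listed in the note to Appendix 3.

```python
import numpy as np

def implied_covariance(loadings_f1, loadings_f2, rho):
    """Build Lambda, Phi, Psi and Sigma = Lambda Phi Lambda' + Psi (Eq. 5).

    Assumptions (not stated in the appendix): no cross-loadings and unit
    factor variances, so Phi is a correlation matrix.
    """
    p1, p2 = len(loadings_f1), len(loadings_f2)
    Lambda = np.zeros((p1 + p2, 2))
    Lambda[:p1, 0] = loadings_f1   # indicators of factor 1
    Lambda[p1:, 1] = loadings_f2   # indicators of factor 2
    Phi = np.array([[1.0, rho], [rho, 1.0]])
    common = Lambda @ Phi @ Lambda.T
    # Residual variances chosen so that every indicator has variance 1.
    Psi = np.diag(1.0 - np.diag(common))
    Sigma = common + Psi
    return Lambda, Phi, Psi, Sigma

# "High mixed" loadings with a between-factor correlation of 0.5.
Lambda, Phi, Psi, Sigma = implied_covariance(
    [0.8, 0.8, 0.8, 0.8], [0.4, 0.4, 0.8, 0.8], rho=0.5
)
print(np.round(np.diag(Psi), 2))    # residual variances, 1 - loading**2
print(np.round(np.diag(Sigma), 2))  # all ones by construction
```

Under these assumptions each residual variance reduces to one minus the squared loading, so a loading of 0.80 implies a residual variance of 0.36 and a loading of 0.40 implies 0.84.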

In addition, we also introduced an external variable z that correlated at 0.5 with factor 1 only. It is assumed that z is normally distributed with a variance of 1. The covariance matrix between factors and variable z is therefore

$$\Phi^{*}_{(m + 1) \times (m + 1)} = \begin{bmatrix} \Phi & \gamma \\ \gamma^{\prime} & 1 \end{bmatrix},$$

where \(\gamma^{\prime}\) = [0.5, 0, …, 0] with dimension 1 × m.

For each replication, data were generated in two steps: (1) the true factor scores (\(\eta\)) and the external variable (z) were drawn jointly from the multivariate normal distribution MVN(0, \({\Phi }^{*}\)), and the residuals \(\epsilon\) were drawn from MVN(0, \({\Psi }\)); and (2) the observed data matrix Y of dimension p × n (n = sample size) was calculated from Eq. 4.

[Figures a–c: not reproduced in this text.]
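
The two generation steps described above can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the authors' listing: the function name `simulate`, the seed, and the example loadings are hypothetical, unit factor variances are assumed, and observations are stored in rows, so `Y` here is n × p (the transpose of the p × n layout in the text).

```python
import numpy as np

def simulate(Lambda, Phi, gamma, n, rng):
    """Draw one replication: factor scores, external variable z, indicators Y."""
    p, m = Lambda.shape
    # Phi*: factor covariance matrix bordered by the factor-z covariances,
    # with Var(z) = 1 in the bottom-right corner.
    Phi_star = np.block([
        [Phi, gamma.reshape(m, 1)],
        [gamma.reshape(1, m), np.ones((1, 1))],
    ])
    # Residual variances that give every indicator unit variance.
    Psi = np.diag(1.0 - np.diag(Lambda @ Phi @ Lambda.T))
    # Step 1: draw (eta, z) jointly, and the residuals separately.
    draws = rng.multivariate_normal(np.zeros(m + 1), Phi_star, size=n)
    eta, z = draws[:, :m], draws[:, m]
    eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
    # Step 2: Eq. 4 with mu = 0, applied to all observations at once.
    Y = eta @ Lambda.T + eps
    return Y, eta, z

rng = np.random.default_rng(2022)          # arbitrary seed
Lambda = np.zeros((8, 2))
Lambda[:4, 0] = 0.8                        # factor 1 indicators
Lambda[4:, 1] = 0.8                        # factor 2 indicators
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])   # between-factor correlation 0.5
gamma = np.array([0.5, 0.0])               # z correlates with factor 1 only
Y, eta, z = simulate(Lambda, Phi, gamma, n=500, rng=rng)
print(Y.shape)  # (500, 8)
```

Drawing the factor scores and z in a single call keeps their joint covariance equal to \({\Phi }^{*}\), which separate draws would not guarantee.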

Appendix 2

2.1 Code to calculate ten Berge factor scores

[Figures d–e: code listing for the ten Berge factor scores, not reproduced in this text.]
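
For readers without access to the original listing, the following NumPy sketch shows one way correlation-preserving (ten Berge) score weights are commonly written. It is not the authors' code. It assumes the weight matrix W = Σ⁻¹L(L′Σ⁻¹L)^(−1/2)Φ^(1/2) with L = ΛΦ^(1/2), and the helper names `sym_power` and `ten_berge_weights` are hypothetical. In practice Σ, Λ, and Φ would be replaced by their estimates from the fitted CFA.

```python
import numpy as np

def sym_power(A, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**power) @ vecs.T

def ten_berge_weights(Sigma, Lambda, Phi):
    """Return the p x m weight matrix W; scores are (centered Y) @ W.

    Assumed form: W = Sigma^-1 L (L' Sigma^-1 L)^(-1/2) Phi^(1/2),
    with L = Lambda Phi^(1/2).
    """
    Sigma_inv = np.linalg.inv(Sigma)
    Phi_half = sym_power(Phi, 0.5)
    L = Lambda @ Phi_half
    return Sigma_inv @ L @ sym_power(L.T @ Sigma_inv @ L, -0.5) @ Phi_half

# Population check with "High mixed" loadings and a factor correlation of 0.5.
Lambda = np.zeros((8, 2))
Lambda[:4, 0] = 0.8
Lambda[4:, 1] = [0.4, 0.4, 0.8, 0.8]
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = Lambda @ Phi @ Lambda.T
np.fill_diagonal(Sigma, 1.0)   # unit indicator variances

W = ten_berge_weights(Sigma, Lambda, Phi)
# Covariance of the scores, W' Sigma W, equals Phi up to rounding.
print(np.round(W.T @ Sigma @ W, 3))
```

The printed matrix is the implied covariance of the scores. That it reproduces Φ is the property that motivates this estimator when the between-factor correlation is the quantity of interest.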

Appendix 3

Means and standard deviations of the between-factor correlation estimates from which the bias estimates presented in Table 3 were computed.

| True corr | n | Orthogonal extraction: High | Orthogonal extraction: High mixed | Orthogonal extraction: Mixed | Correlated extraction: High | Correlated extraction: High mixed | Correlated extraction: Mixed |
|---|---|---|---|---|---|---|---|
| 0 | 100 | 0 (0.10) | 0 (0.10) | 0 (0.10) | 0 (0.13) | 0 (0.14) | 0 (0.15) |
| 0 | 250 | 0 (0.06) | 0 (0.06) | 0 (0.06) | 0 (0.08) | 0 (0.09) | 0 (0.09) |
| 0 | 500 | 0 (0.04) | 0 (0.04) | 0 (0.04) | 0 (0.06) | 0 (0.06) | 0 (0.07) |
| 0.3 | 100 | 0.25 (0.09) | 0.25 (0.09) | 0.23 (0.10) | 0.33 (0.12) | 0.34 (0.13) | 0.34 (0.14) |
| 0.3 | 250 | 0.27 (0.06) | 0.25 (0.06) | 0.23 (0.06) | 0.34 (0.07) | 0.34 (0.09) | 0.35 (0.09) |
| 0.3 | 500 | 0.26 (0.04) | 0.25 (0.04) | 0.23 (0.04) | 0.34 (0.05) | 0.34 (0.06) | 0.35 (0.06) |
| 0.5 | 100 | 0.43 (0.08) | 0.40 (0.09) | 0.38 (0.09) | 0.55 (0.10) | 0.56 (0.11) | 0.57 (0.13) |
| 0.5 | 250 | 0.44 (0.05) | 0.41 (0.05) | 0.39 (0.05) | 0.55 (0.06) | 0.57 (0.07) | 0.59 (0.07) |
| 0.5 | 500 | 0.44 (0.04) | 0.42 (0.04) | 0.39 (0.04) | 0.56 (0.04) | 0.57 (0.05) | 0.59 (0.05) |
| 0.8 | 100 | 0.70 (0.05) | 0.65 (0.06) | 0.61 (0.07) | 0.87 (0.05) | 0.88 (0.06) | 0.88 (0.06) |
| 0.8 | 250 | 0.70 (0.03) | 0.66 (0.04) | 0.63 (0.04) | 0.87 (0.03) | 0.88 (0.03) | 0.89 (0.04) |
| 0.8 | 500 | 0.70 (0.02) | 0.66 (0.02) | 0.63 (0.03) | 0.87 (0.02) | 0.88 (0.02) | 0.89 (0.03) |

  1. Standard deviations are in parentheses. High = all factor loadings set to 0.80; High mixed = the factor 1 loadings are all 0.80 and the factor 2 loadings are 0.40, 0.40, 0.80, 0.80; Mixed = the loadings for both factor 1 and factor 2 are 0.40, 0.40, 0.80, 0.80
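
The pattern in the orthogonal-extraction columns can be approximated at the population level. The sketch below is a rough check rather than the authors' computation, and it rests on assumptions this appendix does not state: each factor is scored from its own one-factor CFA using regression (Thurstone) scores, the weights are evaluated at the true parameters, and factors and indicators have unit variance. The function names are hypothetical. Under these assumptions the correlation between the two sets of scores equals the true correlation multiplied by the square root of the product of the two squared score-factor correlations.

```python
import numpy as np

def score_reliability(loadings):
    """Squared correlation between a regression score and its own factor.

    One-factor model, unit factor variance, unit indicator variances.
    """
    lam = np.asarray(loadings, dtype=float)
    Sigma = np.outer(lam, lam) + np.diag(1.0 - lam**2)
    return float(lam @ np.linalg.solve(Sigma, lam))

def expected_score_correlation(loadings_f1, loadings_f2, rho):
    """Population correlation between separately extracted regression scores."""
    r1 = score_reliability(loadings_f1)
    r2 = score_reliability(loadings_f2)
    return rho * np.sqrt(r1 * r2)

high = [0.8, 0.8, 0.8, 0.8]
mixed = [0.4, 0.4, 0.8, 0.8]
for label, f1, f2 in [("High", high, high),
                      ("High mixed", high, mixed),
                      ("Mixed", mixed, mixed)]:
    print(label, round(expected_score_correlation(f1, f2, rho=0.8), 3))
```

For a true correlation of 0.8 these population values come out at roughly 0.70, 0.67, and 0.64, close to the simulated means of 0.70, 0.66, and 0.63 in the orthogonal-extraction columns.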

Appendix 4

Mean estimates and standard deviations for the correlation between the extracted factor and an external variable (z). This information is presented as bias estimates in Table 4.

| True corr (between factors) | n | Orthogonal extraction: High | Orthogonal extraction: High mixed | Orthogonal extraction: Mixed | Correlated extraction: High | Correlated extraction: High mixed | Correlated extraction: Mixed |
|---|---|---|---|---|---|---|---|
| 0 | 100 | 0.47 (0.08) | 0.47 (0.08) | 0.44 (0.08) | 0.47 (0.08) | 0.47 (0.08) | 0.44 (0.08) |
| 0 | 250 | 0.47 (0.05) | 0.46 (0.05) | 0.44 (0.05) | 0.47 (0.05) | 0.46 (0.05) | 0.44 (0.05) |
| 0 | 500 | 0.47 (0.03) | 0.47 (0.04) | 0.44 (0.04) | 0.47 (0.03) | 0.47 (0.04) | 0.44 (0.04) |
| 0.3 | 100 | 0.47 (0.08) | 0.47 (0.08) | 0.44 (0.08) | 0.46 (0.08) | 0.46 (0.08) | 0.43 (0.08) |
| 0.3 | 250 | 0.46 (0.05) | 0.46 (0.05) | 0.44 (0.05) | 0.46 (0.05) | 0.46 (0.05) | 0.44 (0.05) |
| 0.3 | 500 | 0.47 (0.04) | 0.47 (0.04) | 0.44 (0.04) | 0.46 (0.04) | 0.46 (0.04) | 0.44 (0.04) |
| 0.5 | 100 | 0.47 (0.08) | 0.47 (0.08) | 0.44 (0.08) | 0.45 (0.08) | 0.45 (0.08) | 0.42 (0.09) |
| 0.5 | 250 | 0.47 (0.05) | 0.47 (0.05) | 0.44 (0.05) | 0.45 (0.05) | 0.45 (0.05) | 0.42 (0.05) |
| 0.5 | 500 | 0.47 (0.04) | 0.47 (0.04) | 0.45 (0.04) | 0.45 (0.04) | 0.45 (0.04) | 0.42 (0.04) |
| 0.8 | 100 | 0.46 (0.08) | 0.46 (0.08) | 0.44 (0.08) | 0.40 (0.09) | 0.41 (0.09) | 0.36 (0.10) |
| 0.8 | 250 | 0.47 (0.05) | 0.47 (0.05) | 0.44 (0.05) | 0.40 (0.05) | 0.41 (0.05) | 0.36 (0.06) |
| 0.8 | 500 | 0.47 (0.04) | 0.47 (0.03) | 0.44 (0.04) | 0.40 (0.04) | 0.41 (0.04) | 0.36 (0.04) |

  1. Standard deviations are in parentheses. The correlation between the factor and the external variable was set to 0.50 for all conditions
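
As a rough check on the orthogonal-extraction columns, the population correlation between a factor score and z can be worked out by hand. The derivation is a sketch under assumptions that this appendix does not state: the score is a regression (Thurstone) score from a separate one-factor CFA evaluated at the true parameters, factor 1 has unit variance, \(\lambda_{1}\) and \(\Sigma_{11}\) denote the loadings and implied covariance matrix of the factor 1 indicators \(y_{1}\), and \(\gamma_{1}\) = 0.5 is the factor-z correlation.

$$\hat{\eta}_{1} = \lambda_{1}^{\prime}\Sigma_{11}^{-1}y_{1}, \qquad \operatorname{corr}(\hat{\eta}_{1}, z) = \frac{\gamma_{1}\,\lambda_{1}^{\prime}\Sigma_{11}^{-1}\lambda_{1}}{\sqrt{\lambda_{1}^{\prime}\Sigma_{11}^{-1}\lambda_{1}}} = \gamma_{1}\sqrt{\lambda_{1}^{\prime}\Sigma_{11}^{-1}\lambda_{1}}$$

$$\lambda_{1}^{\prime}\Sigma_{11}^{-1}\lambda_{1} = \frac{s}{1 + s}, \qquad s = \sum_{j}\frac{\lambda_{1j}^{2}}{\psi_{j}}$$

With all four loadings at 0.80, s = 4(0.64/0.36) ≈ 7.11, so the ratio is about 0.877 and the correlation is about 0.5 × 0.936 ≈ 0.47. With loadings of 0.40, 0.40, 0.80, 0.80, s ≈ 3.94, the ratio is about 0.797, and the correlation is about 0.5 × 0.893 ≈ 0.45. Both values sit close to the orthogonal-extraction means in the table (0.46 to 0.47 when factor 1 has high loadings, 0.44 to 0.45 for the Mixed condition).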

About this article

Cite this article

Logan, J.A.R., Jiang, H., Helsabeck, N. et al. Should I allow my confirmatory factors to correlate during factor score extraction? Implications for the applied researcher. Qual Quant 56, 2107–2131 (2022). https://doi.org/10.1007/s11135-021-01202-x

  • DOI: https://doi.org/10.1007/s11135-021-01202-x
