A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers

Psychometrika

Abstract

Factor analysis is regularly used for analyzing survey data. Missing data, data with outliers and consequently nonnormal data are very common for data obtained through questionnaires. Based on covariance matrix estimates for such nonstandard samples, a unified approach for factor analysis is developed. By generalizing the approach of maximum likelihood under constraints, statistical properties of the estimates for factor loadings and error variances are obtained. A rescaled Bartlett-corrected statistic is proposed for evaluating the number of factors. Equivariance and invariance of parameter estimates and their standard errors for canonical, varimax, and normalized varimax rotations are discussed. Numerical results illustrate the sensitivity of classical methods and advantages of the proposed procedures.
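As a point of reference for the test on the number of factors, the classical normal-theory Bartlett correction can be sketched as follows. This is only the textbook complete-data form (Bartlett, 1951; Lawley & Maxwell, 1971), not the rescaled statistic proposed in the article, whose definition is not shown in this preview. The function names and the example numbers are assumptions made for illustration.

```python
def factor_model_df(p: int, k: int) -> int:
    """Degrees of freedom of the exploratory k-factor model on p variables."""
    df = ((p - k) ** 2 - (p + k)) // 2  # the numerator is always even
    if df <= 0:
        raise ValueError("the model is not testable: df must be positive")
    return df


def bartlett_multiplier(n_obs: int, p: int, k: int) -> float:
    """Bartlett's multiplier, used in place of (n_obs - 1) in the LR test."""
    return n_obs - 1 - (2 * p + 5) / 6 - 2 * k / 3


def bartlett_statistic(f_ml: float, n_obs: int, p: int, k: int) -> float:
    """Bartlett-corrected likelihood ratio statistic.

    f_ml is the minimized normal-theory ML discrepancy
    log|Sigma| - log|S| + tr(S Sigma^{-1}) - p, assumed to be
    supplied by whatever routine fitted the factor model.
    """
    return bartlett_multiplier(n_obs, p, k) * f_ml


if __name__ == "__main__":
    # Hypothetical values: 9 variables, 3 factors, 145 cases, F_ML = 0.1.
    stat = bartlett_statistic(0.1, n_obs=145, p=9, k=3)
    print(f"T = {stat:.3f} on {factor_model_df(9, 3)} degrees of freedom")
```

Under normality and complete data the statistic is referred to a chi-square distribution with the stated degrees of freedom. With missing or nonnormal data that reference distribution is generally no longer appropriate, which is the situation the rescaled version is meant to address.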



Author information

Corresponding author

Correspondence to Ke-Hai Yuan.

Additional information

This project was supported by a University of North Texas Faculty Research Grant, Grant #R49/CCR610528 for Disease Control and Prevention from the National Center for Injury Prevention and Control, and Grant DA01070 from the National Institute on Drug Abuse. The results do not necessarily represent the official view of the funding agencies. The authors are grateful to three reviewers for suggestions that improved the presentation of this paper.

About this article

Cite this article

Yuan, KH., Marshall, L.L. & Bentler, P.M. A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers. Psychometrika 67, 95–121 (2002). https://doi.org/10.1007/BF02294711
