Abstract
Factor analysis is regularly used for analyzing survey data. Data obtained through questionnaires commonly contain missing values and outliers and are, consequently, nonnormal. Based on covariance matrix estimates for such nonstandard samples, a unified approach to factor analysis is developed. By generalizing the approach of maximum likelihood under constraints, statistical properties of the estimates of factor loadings and error variances are obtained. A rescaled Bartlett-corrected statistic is proposed for evaluating the number of factors. Equivariance and invariance of parameter estimates and their standard errors under canonical, varimax, and normalized varimax rotations are discussed. Numerical results illustrate the sensitivity of classical methods and the advantages of the proposed procedures.
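For readers unfamiliar with the rotations named in the abstract, the following is a minimal sketch of the textbook SVD-based varimax iteration, with optional Kaiser row normalization for the normalized variant. It is not the authors' estimation procedure; the function name `varimax`, its parameters, and the example loading matrix are illustrative assumptions.

```python
import numpy as np


def varimax(loadings, normalize=False, max_iter=100, tol=1e-8):
    """Textbook varimax rotation (illustrative sketch, not the paper's code).

    loadings  : (p, k) array of unrotated factor loadings.
    normalize : if True, apply Kaiser row normalization before rotating
                and undo it afterwards (normalized varimax).
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    p, k = loadings.shape

    # Kaiser normalization: scale each row to unit length.
    if normalize:
        row_norms = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))
        work = loadings / row_norms
    else:
        work = loadings

    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = work @ rotation
        # Gradient-like matrix of the varimax criterion.
        col_ss = (rotated ** 2).sum(axis=0)
        grad = work.T @ (rotated ** 3 - rotated * col_ss / p)
        # The nearest orthogonal matrix to grad gives the updated rotation.
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion

    # The row scaling cancels, so the result is loadings @ rotation.
    return loadings @ rotation, rotation


# Hypothetical 5 x 2 loading matrix, used only to exercise the function.
example = np.array([[0.8, 0.3],
                    [0.7, 0.4],
                    [0.2, 0.9],
                    [0.3, 0.8],
                    [0.6, 0.5]])
rotated, rotation = varimax(example)
```

Because the rotation matrix is orthogonal, the communalities (row sums of squared loadings) and the reproduced common-factor covariance are the same before and after rotation, which is one simple sense in which quantities can be invariant under rotation.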
Additional information
This project was supported by a University of North Texas Faculty Research Grant, Grant #R49/CCR610528 for Disease Control and Prevention from the National Center for Injury Prevention and Control, and Grant DA01070 from the National Institute on Drug Abuse. The results do not necessarily represent the official view of the funding agencies. The authors are grateful to three reviewers for suggestions that improved the presentation of this paper.
Cite this article
Yuan, KH., Marshall, L.L. & Bentler, P.M. A unified approach to exploratory factor analysis with missing data, nonnormal data, and in the presence of outliers. Psychometrika 67, 95–121 (2002). https://doi.org/10.1007/BF02294711
DOI: https://doi.org/10.1007/BF02294711