Abstract
This chapter introduces principal component analysis (PCA), a technique for dimension reduction in multivariate datasets. At its core is a matrix decomposition technique called singular value decomposition (SVD), which is introduced at the beginning of the chapter. This is followed by the PCA model formulation, its computation, and an application. Relationships with exploratory factor analysis are discussed as well. Subsequently, some PCA variants, such as robust and sparse PCA, are briefly discussed. The final two sections introduce two extensions of PCA. The first is three-way PCA for three-way input data structures. The second is independent component analysis (ICA), which is illustrated using electroencephalography (EEG) data.
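The abstract places singular value decomposition (SVD) at the core of PCA. As a minimal sketch of that connection, written in Python with NumPy rather than the R used in the book, PCA can be computed from the SVD of the column-centered data matrix. The right singular vectors give the principal axes (loadings), the left singular vectors scaled by the singular values give the component scores, and the squared singular values divided by n - 1 give the component variances. The function name `pca_via_svd` and the simulated data are hypothetical illustrations, not taken from the chapter.

```python
import numpy as np

def pca_via_svd(X, k):
    """Hypothetical helper: PCA of an n x p data matrix X via SVD.

    Returns (scores, loadings, explained_variance) for the first k components.
    """
    Xc = X - X.mean(axis=0)                              # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # Xc = U diag(s) Vt, s sorted descending
    loadings = Vt[:k].T                                  # p x k principal axes (right singular vectors)
    scores = U[:, :k] * s[:k]                            # n x k component scores, equal to Xc @ loadings
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)   # eigenvalues of the sample covariance matrix
    return scores, loadings, explained_variance

# Simulated (hypothetical) data: 100 observations on 5 correlated variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
scores, loadings, ev = pca_via_svd(X, 2)

# The component variances should match the largest eigenvalues of the covariance matrix
cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(ev, cov_eigs[:2]))  # True
```

Centering before the decomposition is what makes the SVD of the data equivalent to an eigendecomposition of the covariance matrix; standardizing the columns as well would instead correspond to PCA on the correlation matrix.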
Notes
1. For readers who are more attracted to a sonic introduction to SVD than to a formal one, Michael Greenacre composed an SVD song (“It had to be U”), available on YouTube.
2. In this section we use a notation for the indices of the data and components, and for their respective numbers, that differs from the one used earlier but is standard in the three-way analysis literature (see Kiers, 2000).
3. We omit the matrix formulation for all three-way PCA models since it requires some 3D matrix operations that are beyond the scope of this book (see, e.g., Kroonenberg, 2008, for the corresponding expressions).
4. Biplots are introduced in more detail in Chap. 10.
5. Thanks to Hrag Pailian for sharing this dataset.
References
Arend, A. M., & Zimmer, H. D. (2011). What does ipsilateral delay activity reflect? Inferences from slow potentials in a lateralized visual working memory task. Journal of Cognitive Neuroscience, 23, 4048–4056.
Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607–649.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of Eckart-Young decomposition. Psychometrika, 35, 283–319.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
Fox, J., & Weisberg, S. (2011). An R companion to applied regression (2nd ed.). Thousand Oaks: Sage.
Giordani, P., Kiers, H., & Del Ferraro, M. (2014). Three-way component analysis using the R package ThreeWay. Journal of Statistical Software, 57(1), 1–23. https://www.jstatsoft.org/index.php/jss/article/view/v057i07
Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis (Technical report 16, UCLA Working Papers in Phonetics).
Helwig, N. E. (2015a). eegkit: Toolkit for electroencephalography data. R package version 1.0-2. https://CRAN.R-project.org/package=eegkit
Helwig, N. E. (2015b). ica: Independent component analysis. R package version 1.0-1. https://CRAN.R-project.org/package=ica
Helwig, N. E. (2017). multiway: Component models for multi-way data. R package version 1.0-3. https://CRAN.R-project.org/package=multiway
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.
Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis. New York: Wiley.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York: Springer.
Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab: An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 1–20. http://www.jstatsoft.org/v11/i09/
Kiers, H. A. L. (2000). Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14, 105–122.
Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken: Wiley.
Leibovici, D. (2010). Spatio-temporal multiway data decomposition using principal tensor analysis on k-modes: The R package PTAk. Journal of Statistical Software, 34(1), 1–34. https://www.jstatsoft.org/index.php/jss/article/view/v034i10
Marchini, J. L., Heaton, C., & Ripley, B. D. (2013). fastICA: FastICA algorithms to perform ICA and projection pursuit. R package version 1.2-0. https://CRAN.R-project.org/package=fastICA
Mevik, B. H., Wehrens, R., & Liland, K. H. (2016). pls: Partial least squares and principal component regression. R package version 2.6-0. https://CRAN.R-project.org/package=pls
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research. R package version 1.7.8. http://CRAN.R-project.org/package=psych
Sharma, S. (1996). Applied multivariate techniques. New York: Wiley.
Sidanius, J., Levin, S., van Laar, C., & Sears, D. O. (2010). The diversity challenge: Social identity and intergroup relations on the college campus. New York: The Russell Sage Foundation.
Sigg, C. D., & Buhmann, J. M. (2008). Expectation-maximization for sparse and non-negative PCA. In Proceedings of the 25th International Conference on Machine Learning.
Stacklies, W., Redestig, H., Scholz, M., Walther, D., & Selbig, J. (2007). pcaMethods: A bioconductor package providing PCA methods for incomplete data. Bioinformatics, 23, 1164–1167.
Stone, J. V. (2004). Independent component analysis: A tutorial introduction. Cambridge: The MIT Press.
Timmerman, M. E. (2001). Component analysis of multisubject multivariate longitudinal data. PhD thesis, University of Groningen, Groningen.
Treiblmaier, H. (2006). Datenqualität und individualisierte Kommunikation [Data Quality and Individualized Communication]. Wiesbaden: DUV Gabler Edition Wissenschaft.
Treiblmaier, H., Bentler, P. M., & Mair, P. (2011). Formative constructs implemented via common factors. Structural Equation Modeling: A Multidisciplinary Journal, 18, 1–17.
Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279–311.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer.
Vogel, E. K., & Machizawa, M. G. (2004). Neural activity predicts individual differences in visual working memory capacity. Nature, 428, 748–751.
Willerman, L., Schultz, R., Rutledge, J. N., & Bigler, E. (1991). In vivo brain size and intelligence. Intelligence, 15, 223–228.
Zou, H., & Hastie, T. (2012). elasticnet: Elastic-net for sparse estimation and sparse PCA. R package version 1.1. https://CRAN.R-project.org/package=elasticnet
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 262–286.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Mair, P. (2018). Principal Component Analysis and Extensions. In: Modern Psychometrics with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-93177-7_6
DOI: https://doi.org/10.1007/978-3-319-93177-7_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93175-3
Online ISBN: 978-3-319-93177-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)