Abstract
In this chapter we discuss models that classify samples using linear classification boundaries. We begin by describing a grant applications case study data set (Section 12.1), which is used to illustrate models throughout this chapter as well as in Chapters 13-15. As foundational models, we discuss logistic regression (Section 12.2) and linear discriminant analysis (Section 12.3). In Section 12.4 we define and illustrate partial least squares discriminant analysis and its fundamental connection to linear discriminant analysis. Penalized models, such as logistic regression with a ridge penalty, glmnet, and penalized linear discriminant analysis, are discussed in Section 12.5. Nearest shrunken centroids, an approach tailored to high-dimensional data, is presented in Section 12.6. We demonstrate in the Computing Section (12.7) how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
Notes
- 1.
- 2.
- 3. The RFCD codes can be found at http://tinyurl.com/25zvts while the SEO codes can be found at http://tinyurl.com/8435ae4.
- 4. However, there are several tree-based methods described in Chap. 14 that are more effective if the categorical predictors are not converted to dummy variables. In these cases, the full set of categories is used.
- 5. Data with three or more classes are usually modeled using the multinomial distribution. See Agresti (2002) for more details.
- 6. Bayes’ Rule is examined in more detail in Sect. 13.6.
- 7. This situation is addressed again for the naïve Bayes models in the next chapter.
- 8. Mathematically, if we know C − 1 of the dummy variables, then the value of the last dummy variable is directly implied. Hence, it is also possible to only use C − 1 dummy variables.
- 9. The perturbed covariance structure is due to the optimality constraints for the response matrix. Barker and Rayens (2003) astutely recognized that the response optimality constraint in this setting did not make sense, removed the constraint, and resolved the problem. Without the response-space constraint, the PLS solution is one that involves exactly the between-group covariance matrix.
- 10. As a reminder, the set of predictors is not being selected on the basis of their association with the outcome. This unsupervised selection should not produce selection bias, which is an issue described in Sect. 19.5.
- 11. Recall that the models are built on the pre-2008 data and then tuned based on the year 2008 holdout set. These predictions are from the PLSDA model with four components created using only the pre-2008 data.
- 12. Another method for adding this penalty is discussed in the next chapter using neural networks. In this case, a neural network with weight decay and a single hidden unit constitutes a penalized logistic regression model. However, neural networks do not necessarily use the binomial likelihood when determining parameter estimates (see Sect. 13.2).
- 13.
- 14. In a later chapter, several models are discussed that can represent the categorical predictors in different ways. For example, trees can use the dummy variables in splits but can often create splits based on one or more groupings of categories. In that chapter, factor versions of these predictors are discussed at length.
- 15. Another duplicate naming issue may occur here. A function called sda in the sda package (for shrinkage discriminant analysis) may cause confusion. If both packages are loaded, using sparseLDA:::sda and sda:::sda will mitigate the issue.
- 16. In microarray data, the number of predictors is usually much larger than the number of samples. Because of this, and the limited number of columns in popular spreadsheet software, the convention is reversed.
- 17.
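The redundancy described in note 8, that the last of C dummy variables is implied by the other C − 1, can be sketched with a tiny self-contained example. The chapter's Computing section uses R; this is a language-neutral illustration in plain Python with hypothetical data, not code from the book:

```python
# Dummy-variable encoding for a categorical predictor with C classes.
# With all C indicator columns, the last is implied by the others, so
# C - 1 columns carry exactly the same information.
categories = ["A", "B", "C"]          # C = 3 classes (hypothetical)
samples = ["A", "C", "B", "C"]

def dummies(x, levels, drop_first=False):
    """Return 0/1 indicator columns for each level (optionally dropping the first)."""
    cols = levels[1:] if drop_first else levels
    return [[int(v == c) for c in cols] for v in x]

full = dummies(samples, categories)            # C columns
reduced = dummies(samples, categories, True)   # C - 1 columns
# The dropped column is recoverable: it is 1 exactly when all others are 0.
recovered = [int(sum(row) == 0) for row in reduced]
assert recovered == [row[0] for row in full]
```

Because the full encoding is linearly dependent (the columns sum to one), models with an intercept conventionally use the C − 1 form.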
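The distinction drawn in note 12, that a weight-decayed network need not optimize the binomial likelihood, can be made concrete by writing the two penalized objectives side by side. This is a minimal sketch with hypothetical data; the names `beta`, `X`, `y`, and `lam` are illustrative choices, not the book's notation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fitted_probs(beta, X):
    return [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]

def penalized_binomial_nll(beta, X, y, lam):
    # Ridge-penalized logistic regression objective:
    # negative binomial log-likelihood plus an L2 penalty on the coefficients.
    p = fitted_probs(beta, X)
    nll = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))
    return nll + lam * sum(b * b for b in beta)

def penalized_squared_error(beta, X, y, lam):
    # Objective a network with weight decay might minimize instead:
    # squared error plus the same L2 penalty. The minimizers generally differ.
    p = fitted_probs(beta, X)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    return sse + lam * sum(b * b for b in beta)

# Toy comparison at beta = (0, 0): the penalty term vanishes, but the
# data-fit terms already sit on different scales (2*log(2) vs. 0.5).
X, y = [[1.0, 0.0], [1.0, 1.0]], [0, 1]
nll0 = penalized_binomial_nll([0.0, 0.0], X, y, lam=0.1)
sse0 = penalized_squared_error([0.0, 0.0], X, y, lam=0.1)
```

Both objectives share the L2 penalty; only the data-fit term distinguishes a penalized logistic regression from a squared-error network with weight decay.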
References
Agresti A (2002). Categorical Data Analysis. Wiley–Interscience.
Barker M, Rayens W (2003). “Partial Least Squares for Discrimination.” Journal of Chemometrics, 17(3), 166–173.
Berntsson P, Wold S (1986). “Comparison Between X-ray Crystallographic Data and Physiochemical Parameters with Respect to Their Information About the Calcium Channel Antagonist Activity of 4-Phenyl-1,4-Dihydropyridines.” Quantitative Structure-Activity Relationships, 5, 45–50.
Chung D, Keles S (2010). “Sparse Partial Least Squares Classification for High Dimensional Data.” Statistical Applications in Genetics and Molecular Biology, 9(1), 17.
Clemmensen L, Hastie T, Witten D, Ersboll B (2011). “Sparse Discriminant Analysis.” Technometrics, 53(4), 406–413.
Dobson A (2002). An Introduction to Generalized Linear Models. Chapman & Hall/CRC.
Dunn W, Wold S (1990). “Pattern Recognition Techniques in Drug Design.” In C Hansch, P Sammes, J Taylor (eds.), “Comprehensive Medicinal Chemistry,” pp. 691–714. Pergamon Press, Oxford.
Eilers P, Boer J, van Ommen G, van Houwelingen H (2001). “Classification of Microarray Data with Penalized Logistic Regression.” In “Proceedings of SPIE,” volume 4266, p. 187.
Fisher R (1936). “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics, 7(2), 179–188.
Friedman J, Hastie T, Tibshirani R (2010). “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software, 33(1), 1–22.
Guo Y, Hastie T, Tibshirani R (2007). “Regularized Linear Discriminant Analysis and its Application in Microarrays.” Biostatistics, 8(1), 86–100.
Harrell F (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, New York.
Hastie T, Tibshirani R (1990). Generalized Additive Models. Chapman & Hall/CRC.
Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2 edition.
Liu Y, Rayens W (2007). “PLS and Dimension Reduction for Classification.” Computational Statistics, pp. 189–208.
Park M, Hastie T (2008). “Penalized Logistic Regression for Detecting Gene Interactions.” Biostatistics, 9(1), 30.
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002). “Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression.” Proceedings of the National Academy of Sciences, 99(10), 6567–6572.
Tibshirani R, Hastie T, Narasimhan B, Chu G (2003). “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays.” Statistical Science, 18(1), 104–117.
Welch B (1939). “Note on Discriminant Functions.” Biometrika, 31, 218–220.
Witten D, Tibshirani R (2009). “Covariance–Regularized Regression and Classification For High Dimensional Problems.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 71(3), 615–636.
Witten D, Tibshirani R (2011). “Penalized Classification Using Fisher’s Linear Discriminant.” Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73(5), 753–772.
Copyright information
© 2013 Springer Science+Business Media New York
Cite this chapter
Kuhn, M., Johnson, K. (2013). Discriminant Analysis and Other Linear Classification Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_12
DOI: https://doi.org/10.1007/978-1-4614-6849-3_12
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6848-6
Online ISBN: 978-1-4614-6849-3
eBook Packages: Mathematics and Statistics (R0)