Abstract
Chapter 12 discussed classification models that produce linear classification boundaries. In this chapter we present models that generate nonlinear boundaries, beginning with several generalizations of the linear discriminant analysis framework: quadratic discriminant analysis, regularized discriminant analysis, and mixture discriminant analysis (Section 13.1). Other nonlinear classification models include neural networks (Section 13.2), flexible discriminant analysis (Section 13.3), support vector machines (Section 13.4), K-nearest neighbors (Section 13.5), and naive Bayes (Section 13.6). The Computing section (Section 13.7) demonstrates how to train each of these models in R, and exercises at the end of the chapter reinforce the concepts.
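To give a flavor of the Computing section, the sketch below tunes and fits one of these nonlinear models, a radial basis function SVM, with the caret package. It is a minimal illustration rather than the chapter's exact code; the data frame `training`, the factor outcome `Class`, and the test set `testing` are hypothetical placeholders.

    ## Minimal sketch (not the chapter's exact code): tune and fit a
    ## radial basis function SVM with caret. `training`, `Class`, and
    ## `testing` are hypothetical placeholders.
    library(caret)

    set.seed(476)
    svmFit <- train(Class ~ ., data = training,
                    method = "svmRadial",            # kernlab's ksvm underneath
                    preProc = c("center", "scale"),  # SVMs are scale sensitive
                    tuneLength = 10,                 # 10 candidate cost values
                    trControl = trainControl(method = "repeatedcv",
                                             repeats = 5,
                                             classProbs = TRUE))
    predict(svmFit, newdata = testing)               # predicted classes

With method = "svmRadial", caret typically estimates the kernel scale parameter analytically and tunes only the cost parameter over the candidate grid.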
Notes
1. However, MARS and FDA models tend to be more stable than tree-based models since they use linear regression to estimate the model parameters. (A brief FDA fit is sketched after these notes.)
2. Recall a similar situation with support vector regression models, where the prediction function was determined by the samples with the largest residuals. (The corresponding classification prediction equation is sketched after these notes.)
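To make note 1 concrete, the sketch below fits a flexible discriminant analysis model whose basis functions come from a MARS fit; the inner optimal-scoring regressions are linear least squares on those basis functions, which is the source of the stability mentioned above. It assumes the mda and earth packages; `training` and `Class` remain hypothetical placeholders.

    ## Hedged sketch for note 1: FDA with MARS hinge functions as the
    ## basis; the optimal-scoring regressions are fit by least squares.
    library(mda)     # fda()
    library(earth)   # MARS implementation, usable as fda()'s regression method

    fdaFit <- fda(Class ~ ., data = training,
                  method = earth,   # MARS supplies the nonlinear basis
                  degree = 1)       # additive (first-order) hinge functions
    fdaFit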
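For note 2, recall the standard form of the support vector machine prediction function (textbook notation, not reproduced from the chapter). Only samples with nonzero weight, the support vectors, contribute, mirroring support vector regression, where samples with residuals inside the tolerance tube drop out of the model:

    f(x) = \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b,
    \qquad \alpha_i = 0 \ \text{unless sample } i \text{ is a support vector.}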
Copyright information
© 2013 Springer Science+Business Media New York
Cite this chapter
Kuhn, M., Johnson, K. (2013). Nonlinear Classification Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6848-6
Online ISBN: 978-1-4614-6849-3
eBook Packages: Mathematics and Statistics (R0)