
Measuring Performance in Classification Models

A chapter in Applied Predictive Modeling

Abstract

When predicting a categorical outcome, some measure of classification accuracy is typically used to evaluate the model’s effectiveness. However, there are different ways to measure classification accuracy, depending on the modeler’s primary objectives. Most classification models can produce both a continuous and a categorical prediction output. In Section 11.1, we review these outputs, demonstrate how to adjust probabilities based on calibration plots, recommend ways of displaying class predictions, and define equivocal or indeterminate zones of prediction. In Section 11.2, we review common metrics for assessing classification predictions such as accuracy, kappa, sensitivity, specificity, and positive and negative predicted values. This section also addresses model evaluation when costs are applied to false positive or false negative mistakes. Classification models may also produce predicted classification probabilities. Evaluating this type of output is addressed in Section 11.3, which includes a discussion of receiver operating characteristic curves as well as lift charts. In Section 11.4, we demonstrate how measures of classification performance can be generated in R.
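As a taste of the computations Section 11.4 walks through, the sketch below produces several of these statistics with the caret package's confusionMatrix function (one natural choice; the chapter's own code may differ). The observed and predicted classes are simulated for illustration and are not the chapter's data.

    ## Minimal sketch: accuracy, Kappa, sensitivity, specificity, and the
    ## positive/negative predicted values from a single confusion matrix.
    ## The data are simulated for illustration only.
    library(caret)

    set.seed(1)
    observed  <- factor(sample(c("event", "nonevent"), 100, replace = TRUE,
                               prob = c(0.3, 0.7)))
    ## A noisy "model" that recovers the true class 80% of the time
    predicted <- factor(ifelse(runif(100) < 0.8, as.character(observed),
                               "nonevent"),
                        levels = levels(observed))

    ## One call reports the full suite of class-prediction metrics
    confusionMatrix(data = predicted, reference = observed,
                    positive = "event")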


Notes

  1. In medical terminology, this rate is referred to as the prevalence of a disease, while in Bayesian statistics it would be the prior distribution of the event.

  2. This is true because predictive models seek to find a relationship that is concordant with the truth. A large negative Kappa would imply that there is a relationship between the predictors and the response, and the predictive model would seek to find that relationship in the correct direction.
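     To make the chance correction behind Kappa concrete, here is a small sketch of the computation itself; kappa_stat is a hypothetical helper written for this note, not a function from the chapter.

         ## Cohen's Kappa from a confusion matrix: (O - E) / (1 - E), where O
         ## is the observed accuracy and E the accuracy expected by chance.
         kappa_stat <- function(tab) {
           tab <- as.matrix(tab)
           n <- sum(tab)
           O <- sum(diag(tab)) / n                      # observed agreement
           E <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
           (O - E) / (1 - E)
         }

         ## Example 2 x 2 table (rows = predicted, columns = observed)
         tab <- matrix(c(40, 10, 5, 45), nrow = 2, byrow = TRUE)
         kappa_stat(tab)   # 0.7: agreement well beyond chance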

  3. In relation to Bayesian statistics, the sensitivity and specificity are the conditional probabilities, the prevalence is the prior, and the positive/negative predicted values are the posterior probabilities.
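     This correspondence can be written out directly with Bayes' rule; the sensitivity, specificity, and prevalence values below are hypothetical.

         ## Predicted values via Bayes' rule, with the prevalence as prior
         sens <- 0.90   # P(predicted + | truly +)
         spec <- 0.95   # P(predicted - | truly -)
         prev <- 0.10   # prior probability of the event

         ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
         npv <- (spec * (1 - prev)) /
                ((1 - sens) * prev + spec * (1 - prev))

         ppv   # posterior probability of the event given a positive call
         npv   # posterior probability of a nonevent given a negative call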

  4. This depends on a few assumptions which may or may not be true. Section 20.1 discusses this aspect of the example in more detail in the context of net lift modeling.

  5. In this analysis, we have used the test set to investigate the effects of alternative thresholds. In general, a new threshold should be derived from a data set separate from those used to train the model or to evaluate its performance.
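     A minimal sketch of that recommendation in base R, assuming hypothetical vectors eval_probs and eval_class from a separate evaluation set and test_probs from the test set (none of these objects come from the chapter's code):

         ## Derive a probability threshold on a separate evaluation set,
         ## then apply it, once, to the test set.
         cutoffs <- seq(0.05, 0.95, by = 0.01)

         stats <- sapply(cutoffs, function(cut) {
           pred <- ifelse(eval_probs >= cut, "event", "nonevent")
           sens <- mean(pred[eval_class == "event"] == "event")
           spec <- mean(pred[eval_class == "nonevent"] == "nonevent")
           sens + spec - 1          # Youden's J statistic
         })

         best <- cutoffs[which.max(stats)]  # chosen on the evaluation set
         test_pred <- ifelse(test_probs >= best, "event", "nonevent")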

  6. R has a number of packages that can compute the ROC curve, including ROCR, caTools, PresenceAbsence, and others.
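     For example, a minimal ROC curve and area under the curve with the ROCR package; the class probabilities and labels below are simulated for illustration.

         ## ROC curve and AUC with the ROCR package
         library(ROCR)

         ## Simulated class probabilities and 0/1 labels
         set.seed(2)
         labels <- rbinom(200, 1, 0.4)
         scores <- ifelse(labels == 1, rbeta(200, 4, 2), rbeta(200, 2, 4))

         pred <- prediction(scores, labels)
         perf <- performance(pred, measure = "tpr", x.measure = "fpr")
         plot(perf)                               # the ROC curve itself
         performance(pred, "auc")@y.values[[1]]   # area under the curve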



Copyright information

© 2013 Springer Science+Business Media New York


Cite this chapter

Kuhn, M., Johnson, K. (2013). Measuring Performance in Classification Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_11
