Logistic Regression Tree Analysis

Reference work entry
Springer Handbook of Engineering Statistics

Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter describes a tree-structured extension and generalization of the logistic regression method for fitting models to a binary-valued response variable. The technique overcomes a significant disadvantage of logistic regression, namely, the difficulty of interpreting the fitted model in the presence of multicollinearity and Simpson's paradox. Section 29.1 summarizes the statistical theory underlying the logistic regression model and the estimation of its parameters. Section 29.2 reviews two standard approaches to model selection for logistic regression: model deviance relative to its degrees of freedom and the Akaike information criterion (AIC). A dataset on tree damage during a severe thunderstorm is used to compare the approaches and to highlight their weaknesses. A recently published partial one-dimensional model that addresses some of the weaknesses is also reviewed.
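
To make the comparison concrete, the following is a minimal sketch in Python of deviance- and AIC-based model screening, using statsmodels and simulated stand-in data (the thunderstorm data themselves are not reproduced here, and all variable names are hypothetical):

```python
# Minimal sketch: fit nested logistic regression models and compare
# them by deviance (relative to residual degrees of freedom) and AIC.
# The data are simulated stand-ins, not the thunderstorm dataset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)          # hypothetical predictor, e.g. log(diameter)
x2 = rng.normal(size=n)          # hypothetical predictor, e.g. storm severity
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x1 + 0.8 * x2)))
y = rng.binomial(1, p)           # binary response, e.g. 1 = tree blown down

# Candidate design matrices: intercept only, x1 only, x1 + x2.
designs = {
    "null": np.ones((n, 1)),
    "x1": sm.add_constant(x1),
    "x1+x2": sm.add_constant(np.column_stack([x1, x2])),
}
for name, X in designs.items():
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    print(f"{name:6s} deviance={fit.deviance:8.2f} "
          f"df={fit.df_resid:4d} AIC={fit.aic:8.2f}")
```

A smaller AIC favors a model, while the deviance is judged against its residual degrees of freedom; the two criteria need not agree, which is part of what Sect. 29.2 examines.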

Section 29.3 introduces the idea of a logistic regression tree model. The latter consists of a binary tree in which a simple linear logistic regression (i.e., a linear logistic regression using a single predictor variable) is fitted to each leaf node. A split at an intermediate node is characterized by a subset of values taken by a (possibly different) predictor variable. The objective is to partition the dataset into rectangular pieces according to the values of the predictor variables such that a simple linear logistic regression model adequately fits the data in each piece. Because the tree structure and the piecewise models can be presented graphically, the whole model can be easily understood. This is illustrated with the thunderstorm dataset using the LOTUS algorithm.
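
The piecewise idea can be sketched with a hand-built two-leaf tree (an illustrative sketch only, not the LOTUS algorithm itself; the split point, variables, and data below are invented for the example):

```python
# Conceptual two-leaf logistic regression tree: split the data on one
# predictor z, then fit a simple linear logistic regression in a single
# regressor x within each leaf. All data here are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 600
z = rng.normal(size=n)                      # splitting variable
x = rng.normal(size=n)                      # leaf-model regressor
# The x-effect reverses sign across the split, so one pooled logistic
# model would mislead (a Simpson's-paradox-like pattern).
slope = np.where(z <= 0.0, 2.0, -2.0)
y = rng.binomial(1, 1 / (1 + np.exp(-slope * x)))

for label, mask in [("z <= 0", z <= 0.0), ("z > 0", z > 0.0)]:
    leaf = sm.GLM(y[mask], sm.add_constant(x[mask]),
                  family=sm.families.Binomial()).fit()
    print(f"leaf {label}: intercept={leaf.params[0]:+.2f} "
          f"slope={leaf.params[1]:+.2f}")
```

Because the fitted slope changes sign between the two leaves, a single logistic regression pooling all the data would obscure the relationship; this is exactly the kind of structure a logistic regression tree makes visible.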

Section 29.4 describes the basic elements of the LOTUS algorithm, which is based on recursive partitioning and cost-complexity pruning. A key feature of the algorithm is a correction for bias in variable selection at the splits of the tree. Without bias correction, the splits can yield incorrect inferences. Section 29.5 shows an application of LOTUS to a dataset on automobile crash tests involving dummies. This dataset is challenging because of its large size, its mix of ordered and unordered variables, and its large number of missing values. It also provides a demonstration of Simpson's paradox. The chapter concludes with some remarks in Sect. 29.6.
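
The idea behind the bias correction can be illustrated schematically: rather than searching over all possible splits of every predictor (which favors variables with many distinct values or categories), each predictor is first screened with a significance test of its association with the response. The sketch below uses a plain chi-squared test on discretized predictors; the actual LOTUS procedure uses trend-adjusted chi-squared tests and differs in detail, so this is only a sketch of the principle:

```python
# Schematic of bias-corrected split-variable selection: screen each
# predictor with a chi-squared test of association with the binary
# response and choose the smallest p-value, instead of exhaustively
# searching all splits. Illustration only, not the LOTUS tests.
import numpy as np
from scipy.stats import chi2_contingency

def select_split_variable(X, y, bins=4):
    """Return the index of the predictor most associated with y."""
    pvals = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Discretize an ordered predictor into roughly equal groups.
        edges = np.quantile(col, np.linspace(0, 1, bins + 1)[1:-1])
        groups = np.searchsorted(edges, col)
        table = np.zeros((bins, 2))
        for g, yi in zip(groups, y):
            table[g, yi] += 1
        table = table[table.sum(axis=1) > 0]   # drop empty rows
        pvals.append(chi2_contingency(table)[1])  # p-value
    return int(np.argmin(pvals)), pvals

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 1])))  # only column 1 matters
j, pvals = select_split_variable(X, y)
print("selected predictor:", j, "p-values:", np.round(pvals, 4))
```

Because every predictor is reduced to the same kind of test statistic, each has roughly the same chance of selection when none is informative, which is the sense in which the selection is unbiased.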

References

  1. R. D. Cook, S. Weisberg: Partial one-dimensional regression models, Am. Stat. 58, 110–116 (2004)

  2. A. Agresti: An Introduction to Categorical Data Analysis (Wiley, New York 1996)

  3. J. M. Chambers, T. J. Hastie: Statistical Models in S (Wadsworth, Pacific Grove 1992)

  4. K.-Y. Chan, W.-Y. Loh: LOTUS: An algorithm for building accurate and comprehensible logistic regression trees, J. Comp. Graph. Stat. 13, 826–852 (2004)

  5. J. N. Morgan, J. A. Sonquist: Problems in the analysis of survey data, and a proposal, J. Am. Stat. Assoc. 58, 415–434 (1963)

  6. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees (Wadsworth, Belmont 1984)

  7. J. R. Quinlan: Learning with continuous classes, Proceedings of AIʼ92 Australian National Conference on Artificial Intelligence (World Scientific, Singapore 1992) pp. 343–348

  8. P. Doyle: The use of automatic interaction detector and similar search procedures, Oper. Res. Q. 24, 465–467 (1973)

  9. W.-Y. Loh: Regression trees with unbiased variable selection and interaction detection, Stat. Sin. 12, 361–386 (2002)

Author information

Correspondence to Wei-Yin Loh.

Copyright information

© 2006 Springer-Verlag

About this entry

Cite this entry

Loh, WY. (2006). Logistic Regression Tree Analysis. In: Pham, H. (eds) Springer Handbook of Engineering Statistics. Springer Handbooks. Springer, London. https://doi.org/10.1007/978-1-84628-288-1_29

  • DOI: https://doi.org/10.1007/978-1-84628-288-1_29

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-806-0

  • Online ISBN: 978-1-84628-288-1

  • eBook Packages: Engineering, Engineering (R0)
