Classification and regression trees

  • Hints & Kinks
International Journal of Public Health

Fig. 1

References

  • Breiman L, Friedman JH, Stone CJ, Olshen RA (1993) Classification and regression trees. Chapman and Hall, New York

  • Cardoen S, Van Huffel X, Berkvens D, Quoilin S, Ducoffre G, Saegerman C, Speybroeck N, Imberechts H, Herman L, Ducatelle R, Dierick K (2009) Evidence-based semi-quantitative methodology for prioritization of food-borne zoonoses. Foodborne Pathog Dis 6:1083–1096

  • Havelaar AH, van Rosse F, Bucura C, Toetenel MA, Haagsma JA, Kurowicka D, Heesterbeek JH, Speybroeck N, Langelaar MF, van der Giessen JW, Cooke RM, Braks MA (2010) Prioritizing emerging zoonoses in the Netherlands. PLoS One 5:e13965

  • Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674

  • Kim H, Loh W (2001) Classification trees with unbiased multiway splits. J Am Stat Assoc 96:589–604

  • Protopopoff N, Van Bortel W, Speybroeck N, D’Alessandro U, Coosemans M (2009) Ranking malaria risk factors to guide malaria control efforts in African Highlands. PLoS One 4:e8022

  • Rosicova K, Geckova AM, Rosic M, Speybroeck N, Groothoff JW, van Dijk JP (2011) Socioeconomic factors, ethnicity and alcohol-related mortality in regions in Slovakia. What might a tree analysis add to our understanding? Health Place 17:701–709

  • Saegerman C, Speybroeck N, Roels S, Vanopdenbosch E, Thiry E, Berkvens D (2004) Decision support tools in clinical diagnosis in cows with suspected bovine spongiform encephalopathy. J Clin Microbiol 42:172–178

  • Speybroeck N, Berkvens D, Mfoukou-Ntsakala A, Aerts M, Hens N, Van Huylenbroeck G, Thys E (2004) Classification trees versus multinomial models in the analysis of urban farming systems in Central Africa. Agric Syst 80:133–149

  • Thang ND, Erhart A, Speybroeck N, Hung LX, Thuan LK, Hung TK, Van Ky P, Coosemans M, D’Alessandro U (2008) Malaria in Central Vietnam: analysis of risk factors by multivariate analysis and classification tree models. Malar J 7:28

  • White A, Liu W (1994) Bias in information based measures in decision tree induction. Mach Learn 15:321–329

  • Yewhalaw D, Legesse W, Van Bortel W, Gebre-Selassie S, Kloos H, Duchateau L, Speybroeck N (2009) Malaria and water resource development: the case of Gilgel-Gibe hydroelectric dam in Ethiopia. Malar J 8:21


Acknowledgments

I would like to express my thanks to the Reviewer for the constructive and interesting comments.

Author information

Corresponding author

Correspondence to N. Speybroeck.

Appendix: R code to run the decomposition (comment lines start with #)

The R software is free of charge and can be downloaded from http://www.r-project.org. The R package rpart can handle several types of outcomes and generates classification and regression trees. As an example, we show how a classification tree (CT) for analyzing the relation between malaria (infected/non-infected) and its determinants in Vietnam (Thang et al. 2008) can be generated with the rpart package. After installing rpart, the following code (comment lines start with #) can be copied into R and run once the variable names have been adapted to the user's needs.

library(rpart)

# To grow a tree, use the command below (dat is a placeholder name for the user's data frame)

fit <- rpart(Malaria ~ Age + Forest + Education + Income + Bednet + Housetype + Ethnicity + Gender, data = dat, method = "class")

# with Age, Forest activity, Education, Income, Bednet use, House structure, Ethnicity and Gender the

# explanatory variables [these variables were used in Thang et al. (2008)]

# method can be e.g. "class" for classification trees, "anova" for regression trees, "poisson" for count data.

# detailed summary of splits

summary(fit)

# prune the tree and select the minimal error tree (i.e., with the smallest cross-validated error)

pfit <- prune(fit, cp = fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"])

# detailed summary of the pruned tree

summary(pfit)

# plot the final tree and label its nodes

plot(pfit)

text(pfit)
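
The grow-then-prune workflow above can be tried without access to the original survey. The following is a minimal sketch on simulated data: the column names, sample size and risk values are made-up assumptions for illustration and are not the data of Thang et al. (2008).

```r
library(rpart)

set.seed(1)  # reproducible simulated data (illustrative only)
n <- 2000
dat <- data.frame(
  Income = factor(sample(c("poorer", "richer"), n, replace = TRUE)),
  Forest = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  Bednet = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.3, 0.7))),
  Age    = round(runif(n, 1, 80))
)

# Assumed (made-up) infection risk: higher with forest activity,
# without a bednet, and in the poorer group
p <- 0.05 + 0.45 * (dat$Forest == "yes") + 0.25 * (dat$Bednet == "no") +
  0.10 * (dat$Income == "poorer")
dat$Malaria <- factor(ifelse(runif(n) < p, "infected", "non-infected"))

# Grow a deliberately large tree (small cp) with 10-fold cross-validation
fit <- rpart(Malaria ~ Age + Forest + Income + Bednet, data = dat,
             method = "class",
             control = rpart.control(cp = 0.001, xval = 10))
printcp(fit)  # complexity table: CP, nsplit, rel error, xerror, xstd

# Prune at the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pfit <- prune(fit, cp = best_cp)
print(pfit)
```

Growing with a small cp and pruning afterwards lets the cross-validated error, rather than the default stopping rule, decide the size of the final tree.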

A simplified version of the resulting tree in Thang et al. (2008) is shown in Fig. 2. The tree starts with a root node containing all 3023 individuals in the sample, with a malaria prevalence (pr) of 14%. The root node is first split into two subgroups according to wealth status, with the malaria prevalence being higher in the poorer subgroup (pr = 16%) than in the richer subgroup (pr = 9%). The richer subgroup is split again into a subgroup engaged in regular forest activity (pr = 31%) and a subgroup not engaged in regular forest activity (pr = 8%). The latter subgroup is split according to bednet use, with bednet users showing a lower prevalence (pr = 7%) than non-users (pr = 26%).

Fig. 2 Illustrative classification tree for malaria in Vietnam (adapted from Thang et al. 2008)
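
The prevalence reported in each node of such a tree is simply the proportion of infected individuals among those falling into that node. As a hedged sketch of how these figures could be recovered from a fitted rpart object, the code below uses simulated data (column names and risk values are assumptions, not the survey data) and the where component of the fit, which records the leaf of each individual.

```r
library(rpart)

set.seed(2)  # simulated data for illustration only
n <- 3000
dat <- data.frame(
  Income = factor(sample(c("poorer", "richer"), n, replace = TRUE)),
  Forest = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
# Assumed (made-up) infection risk
p <- 0.05 + 0.50 * (dat$Forest == "yes") + 0.10 * (dat$Income == "poorer")
dat$Malaria <- factor(ifelse(runif(n) < p, "infected", "non-infected"))

fit <- rpart(Malaria ~ Income + Forest, data = dat, method = "class")

# fit$where gives, for each individual, the row of fit$frame holding its leaf
leaf <- fit$where
prevalence <- tapply(dat$Malaria == "infected", leaf, mean)  # pr per leaf
size <- table(leaf)                                           # n per leaf

# Prevalence in the root node (all individuals)
root_pr <- mean(dat$Malaria == "infected")

print(data.frame(leaf = names(prevalence),
                 n    = as.vector(size),
                 pr   = round(as.vector(prevalence), 2)))
```

Because the leaves partition the sample, the root prevalence equals the size-weighted mean of the leaf prevalences, which offers a quick consistency check when reading a published tree.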

The example, simplified for the sake of brevity (see Thang et al. 2008 for more information), indicates that CaRTs can be powerful tools for the analysis of complex public health data.

Conditional inference trees can be created via the function ctree in the party package (see Hothorn et al. 2006 for additional background):

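# The party package provides conditional inference trees.

library(party)

# To grow a conditional inference tree, use the command below (dat is a placeholder name for the user's data frame;

# Malaria should be coded as a factor to obtain a classification tree).

cfit <- ctree(Malaria ~ Age + Forest + Education + Income + Bednet + Housetype + Ethnicity + Gender, data = dat)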
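
A conditional inference tree can be explored in the same way on simulated data. The sketch below assumes that the party package is installed and skips the example otherwise; the data, column names and risk values are invented for illustration, and the Noise variable is deliberately unrelated to the outcome.

```r
set.seed(3)  # simulated data for illustration only
n <- 1000
dat <- data.frame(
  Forest = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  Bednet = factor(sample(c("no", "yes"), n, replace = TRUE)),
  Noise  = rnorm(n)  # variable unrelated to the outcome
)
# Assumed (made-up) infection risk
p <- 0.05 + 0.50 * (dat$Forest == "yes") + 0.15 * (dat$Bednet == "no")
dat$Malaria <- factor(ifelse(runif(n) < p, "infected", "non-infected"))

if (requireNamespace("party", quietly = TRUE)) {
  library(party)
  # ctree() selects split variables via permutation tests (Hothorn et al. 2006)
  cfit <- ctree(Malaria ~ Forest + Bednet + Noise, data = dat)
  print(cfit)
  # Cross-tabulate observed against fitted classes
  print(table(observed = dat$Malaria, fitted = predict(cfit)))
} else {
  message("Package 'party' is not installed; skipping the ctree example.")
}
```

Unlike the rpart workflow, no separate pruning step is shown here, because ctree stops splitting when no explanatory variable shows a significant association with the outcome.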

About this article

Cite this article

Speybroeck, N. Classification and regression trees. Int J Public Health 57, 243–246 (2012). https://doi.org/10.1007/s00038-011-0315-z

  • DOI: https://doi.org/10.1007/s00038-011-0315-z
