A Method for Visual Cluster Validation

  • Conference paper

Abstract

Cluster validation is necessary because the clusters produced by cluster analysis algorithms are not necessarily meaningful patterns. I propose a methodology to explore two aspects of a cluster found by any cluster analysis method: the cluster should be separated from the rest of the data, and its points should not split into further separated subclasses. Both aspects can be assessed visually by linear projections of the data onto two-dimensional Euclidean space. Optimal separation of the cluster in such a projection can be attained by asymmetric weighted coordinates (Hennig (2002)). Heterogeneity can be explored using projection pursuit indexes as defined in Cook, Buja and Cabrera (1993). The projection methods can be combined with splitting the data set into clustering data and validation data. A data example is given.
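The idea of judging a cluster's separation in a two-dimensional linear projection can be illustrated with a minimal sketch. This is not the asymmetric weighted coordinates method of Hennig (2002), whose definition is not given in the abstract. As a stand-in, the first axis is the classical two-group discriminant direction (cluster versus rest), and the second axis is the largest-variance direction orthogonal to it after whitening. The function name `two_dim_projection`, the choice of second axis, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def two_dim_projection(X, in_cluster):
    """Project the rows of X onto two axes chosen to show one cluster vs. the rest.

    Illustrative stand-in only: axis 1 is the classical discriminant direction,
    axis 2 the largest-variance direction orthogonal to it (in whitened space).
    Assumes the pooled within-group covariance matrix is positive definite.
    """
    X = np.asarray(X, dtype=float)
    in_cluster = np.asarray(in_cluster, dtype=bool)
    A, B = X[in_cluster], X[~in_cluster]

    # Pooled within-group covariance of "cluster" and "rest".
    W = ((len(A) - 1) * np.cov(A, rowvar=False)
         + (len(B) - 1) * np.cov(B, rowvar=False)) / (len(X) - 2)

    # Whitening matrix T with T.T @ W @ T = identity.
    evals, evecs = np.linalg.eigh(W)
    T = evecs / np.sqrt(evals)
    Z = (X - X.mean(axis=0)) @ T

    # Axis 1: normalized difference of the group means in whitened space.
    d = Z[in_cluster].mean(axis=0) - Z[~in_cluster].mean(axis=0)
    u1 = d / np.linalg.norm(d)

    # Axis 2: first principal direction of what remains after removing axis 1.
    R = Z - np.outer(Z @ u1, u1)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    u2 = Vt[0]

    return Z @ np.column_stack([u1, u2])


# Simulated example (hypothetical data): a cluster shifted along one variable.
rng = np.random.default_rng(0)
rest = rng.normal(size=(100, 4))
cluster = rng.normal(size=(50, 4)) + np.array([6.0, 0.0, 0.0, 0.0])
X = np.vstack([cluster, rest])
labels = np.array([True] * 50 + [False] * 100)
Y = two_dim_projection(X, labels)  # shape (150, 2), ready for a scatter plot
```

Plotting the two columns of `Y` with the cluster points highlighted gives a picture in which a well-separated cluster appears as a distinct group along the first axis. The second axis only spreads the points out so that structure within the groups is visible.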

References

  • BUJA, A., COOK, D. and SWAYNE, D. (1996): Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, 5, 78–99.

  • COOK, D., BUJA, A. and CABRERA, J. (1993): Projection Pursuit Indexes Based on Orthonormal Function Expansions. Journal of Computational and Graphical Statistics, 2, 225–250.

  • FRALEY, C. and RAFTERY, A. E. (2003): Enhanced Model-Based Clustering, Density Estimation and Discriminant Analysis Software: MCLUST. Journal of Classification, 20, 263–293.

  • GORDON, A.D. (1999): Classification, 2nd Ed. Chapman & Hall/CRC, Boca Raton.

  • HALKIDI, M., BATISTAKIS, Y. and VAZIRGIANNIS, M. (2002): Cluster Validity Methods: Part I. SIGMOD Record, 31, 40–45.

  • HENNIG, C. (2002): Symmetric, asymmetric and robust linear dimension reduction for classification. To appear in Journal of Computational and Graphical Statistics, ftp://ftp.stat.math.ethz.ch/Research-Reports/108.html.

  • HUBER, P. J. (1985): Projection pursuit (with discussion). Annals of Statistics, 13, 435–475.

  • NG, M. and HUANG, J. (2002): M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: M.-S. Chen, P. S. Yu and B. Liu (Eds.): Advances in Knowledge Discovery and Data Mining. Proceedings of PAKDD 2002, Taipei, Taiwan. Springer, Heidelberg, 224–236.

  • RAO, C. R. (1952): Advanced Statistical Methods in Biometric Research, Wiley, New York.

© 2005 Springer-Verlag Berlin · Heidelberg

Cite this paper

Hennig, C. (2005). A Method for Visual Cluster Validation. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_15
