A Method for Visual Cluster Validation

Hennig, Christian

doi:10.1007/3-540-28084-7_15

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Cluster validation is necessary because the clusters resulting from cluster analysis algorithms are not in general meaningful patterns. I propose a methodology to explore two aspects of a cluster found by any cluster analysis method: the cluster should be separated from the rest of the data, and the points of the cluster should not split up into further separated subclasses. Both aspects can be visually assessed by linear projections of the data onto the two-dimensional Euclidean space. Optimal separation of the cluster in such a projection can be attained by asymmetric weighted coordinates (Hennig (2002)). Heterogeneity can be explored by the use of projection pursuit indexes as defined in Cook, Buja and Cabrera (1993). The projection methods can be combined with splitting up the data set into clustering data and validation data. A data example is given.
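The separation aspect described in the abstract can be illustrated with a minimal sketch. It does not implement the asymmetric weighted coordinates of Hennig (2002), whose definition is not given here. As a stand-in, it uses a classical Fisher-type discriminant direction for the first axis and, for the second axis, the direction of largest remaining variance orthogonal to it. The function names (`separating_projection`, `solve`, and the small helpers) and the simulated data are assumptions made for illustration only.

```python
import random

def mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a non-zero vector to length one."""
    norm = dot(v, v) ** 0.5
    return [a / norm for a in v]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def separating_projection(points, in_cluster, iters=200):
    """Project points to 2D so that one cluster is separated from the rest.

    Stand-in technique (an assumption, not the paper's method): axis 1 is a
    Fisher-type discriminant direction, axis 2 is the leading variance
    direction orthogonal to axis 1, found by power iteration.
    """
    inside = [x for x, f in zip(points, in_cluster) if f]
    outside = [x for x, f in zip(points, in_cluster) if not f]
    m_in, m_out = mean(inside), mean(outside)
    p = len(m_in)
    # Pooled within-group scatter matrix of the cluster and the rest.
    W = [[0.0] * p for _ in range(p)]
    for group, m in ((inside, m_in), (outside, m_out)):
        for x in group:
            d = [xi - mi for xi, mi in zip(x, m)]
            for i in range(p):
                for j in range(p):
                    W[i][j] += d[i] * d[j]
    # Axis 1: direction W^{-1} (m_in - m_out), normalised.
    a1 = unit(solve(W, [u - v for u, v in zip(m_in, m_out)]))
    # Remove the axis-1 component from the centred data.
    m_all = mean(points)
    centered = [[xi - mi for xi, mi in zip(x, m_all)] for x in points]
    resid = [[ci - dot(c, a1) * ai for ci, ai in zip(c, a1)] for c in centered]
    # Axis 2: power iteration on the residual scatter, started from the
    # residual of largest norm so the start vector is orthogonal to axis 1.
    a2 = unit(max(resid, key=lambda r: dot(r, r)))
    for _ in range(iters):
        scores = [dot(r, a2) for r in resid]
        a2 = unit([sum(s * r[k] for s, r in zip(scores, resid)) for k in range(p)])
    coords = [(dot(c, a1), dot(c, a2)) for c in centered]
    return coords, a1, a2

# Simulated example: 30 points around the origin and a cluster of 30 points
# shifted along the first variable (made-up data, not the paper's example).
rng = random.Random(0)
rest = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(30)]
cluster = [[rng.gauss(10, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
points = rest + cluster
labels = [False] * 30 + [True] * 30
coords, a1, a2 = separating_projection(points, labels)
```

Plotting `coords` with the cluster highlighted would give a view of the kind the abstract describes for assessing separation. Assessing heterogeneity within the cluster would need a projection pursuit index instead, which this sketch does not cover.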

References

BUJA, A., COOK, D. and SWAYNE, D. (1996): Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, 5, 78–99.
COOK, D., BUJA, A. and CABRERA, J. (1993): Projection Pursuit Indexes Based on Orthonormal Function Expansions. Journal of Computational and Graphical Statistics, 2, 225–250.
FRALEY, C. and RAFTERY, A. E. (2003): Enhanced Model-Based Clustering, Density Estimation and Discriminant Analysis Software: MCLUST. Journal of Classification, 20, 263–293.
GORDON, A. D. (1999): Classification, 2nd Ed. Chapman & Hall/CRC, Boca Raton.
HALKIDI, M., BATISTAKIS, Y. and VAZIRGIANNIS, M. (2002): Cluster Validity Methods: Part I. SIGMOD Record, 31, 40–45.
HENNIG, C. (2002): Symmetric, asymmetric and robust linear dimension reduction for classification. To appear in Journal of Computational and Graphical Statistics, ftp://ftp.stat.math.ethz.ch/Research-Reports/108.html.
HUBER, P. J. (1985): Projection pursuit (with discussion). Annals of Statistics, 13, 435–475.
NG, M. and HUANG, J. (2002): M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: M.-S. Chen, P. S. Yu and B. Liu (Eds.): Advances in Knowledge Discovery and Data Mining. Proceedings of PAKDD 2002, Taipei, Taiwan. Springer, Heidelberg, 224–236.
RAO, C. R. (1952): Advanced Statistical Methods in Biometric Research. Wiley, New York.

Author information

Authors and Affiliations

Fachbereich Mathematik - SPST, Universität Hamburg, 20146, Hamburg, Germany
Christian Hennig

Editor information

Editors and Affiliations

Fachbereich Statistik, Universität Dortmund, 44221, Dortmund
Claus Weihs
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Hennig, C. (2005). A Method for Visual Cluster Validation. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_15

DOI: https://doi.org/10.1007/3-540-28084-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics (R0)
