Abstract
Cluster validation is necessary because the clusters resulting from cluster analysis algorithms are not, in general, meaningful patterns. I propose a methodology to explore two aspects of a cluster found by any cluster analysis method: the cluster should be separated from the rest of the data, and the points of the cluster should not split up into further separated subclasses. Both aspects can be assessed visually by linear projections of the data onto two-dimensional Euclidean space. Optimal separation of the cluster in such a projection can be attained by asymmetric weighted coordinates (Hennig (2002)). Heterogeneity can be explored by the use of projection pursuit indexes as defined in Cook, Buja and Cabrera (1993). The projection methods can be combined with splitting the data set into clustering data and validation data. A data example is given.
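The idea of projecting onto a plane that separates one cluster from the rest, and then checking the projection on held-out data, can be illustrated with a minimal sketch. This is not the asymmetric weighted coordinates of Hennig (2002), whose definition the abstract does not give. As a stand-in, the hypothetical helper `separating_projection` picks the directions along which the non-cluster points spread most around the cluster mean, relative to the cluster's own covariance. The synthetic data, the known labels, and the 50/50 split are likewise assumptions made only for illustration.

```python
import numpy as np

def separating_projection(X, in_cluster, n_dims=2):
    """Hypothetical helper: return a (p, n_dims) projection matrix.

    Columns maximise v' B v / v' S v, where S is the covariance of the
    cluster and B is the scatter of all other points around the cluster mean.
    """
    Xc, Xr = X[in_cluster], X[~in_cluster]
    center = Xc.mean(axis=0)
    S = np.cov(Xc, rowvar=False)          # within-cluster covariance
    D = Xr - center
    B = D.T @ D / len(Xr)                 # scatter of the rest around the cluster
    # Whiten with the Cholesky factor of S, then solve an ordinary
    # symmetric eigenproblem in the whitened coordinates.
    L_inv = np.linalg.inv(np.linalg.cholesky(S))
    evals, evecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    top = np.argsort(evals)[::-1][:n_dims]
    return L_inv.T @ evecs[:, top]

# Synthetic 4-dimensional data (assumption): one cluster at the origin,
# the remaining points shifted along the first coordinate.
rng = np.random.default_rng(0)
shift = np.array([6.0, 0.0, 0.0, 0.0])
X = np.vstack([rng.normal(size=(100, 4)),
               rng.normal(size=(100, 4)) + shift])
labels = np.repeat([True, False], 100)    # True = point belongs to the cluster

# Split into data used to find the projection and data used to validate it.
idx = rng.permutation(len(X))
fit_idx, val_idx = idx[:100], idx[100:]

V = separating_projection(X[fit_idx], labels[fit_idx])
fit_center = X[fit_idx][labels[fit_idx]].mean(axis=0)

# Project the validation half with the projection found on the other half.
proj_val = (X[val_idx] - fit_center) @ V
labels_val = labels[val_idx]
```

A scatter plot of `proj_val`, coloured by `labels_val`, then shows whether the separation found on one half of the data persists on the other half. In practice the labels would come from a clustering method rather than being known in advance.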
References
BUJA, A., COOK, D. and SWAYNE, D. (1996): Interactive High-Dimensional Data Visualization. Journal of Computational and Graphical Statistics, 5, 78–99.
COOK, D., BUJA, A. and CABRERA, J. (1993): Projection Pursuit Indexes Based on Orthonormal Function Expansions. Journal of Computational and Graphical Statistics, 2, 225–250.
FRALEY, C. and RAFTERY, A. E. (2003): Enhanced Model-Based Clustering, Density Estimation and Discriminant Analysis Software: MCLUST. Journal of Classification, 20, 263–293.
GORDON, A. D. (1999): Classification, 2nd Ed. Chapman & Hall/CRC, Boca Raton.
HALKIDI, M., BATISTAKIS, Y. and VAZIRGIANNIS, M. (2002): Cluster Validity Methods: Part I. SIGMOD Record, 31, 40–45.
HENNIG, C. (2002): Symmetric, asymmetric and robust linear dimension reduction for classification. To appear in Journal of Computational and Graphical Statistics, ftp://ftp.stat.math.ethz.ch/Research-Reports/108.html.
HUBER, P. J. (1985): Projection pursuit (with discussion). Annals of Statistics, 13, 435–475.
NG, M. and HUANG, J. (2002): M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: M.-S. Chen, P. S. Yu and B. Liu (Eds.): Advances in Knowledge Discovery and Data Mining. Proceedings of PAKDD 2002, Taipei, Taiwan. Springer, Heidelberg, 224–236.
RAO, C. R. (1952): Advanced Statistical Methods in Biometric Research. Wiley, New York.
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Hennig, C. (2005). A Method for Visual Cluster Validation. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_15
DOI: https://doi.org/10.1007/3-540-28084-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)