Abstract
An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error-perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
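The kind of experiment summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's design: the cluster generator, the single Gaussian-noise perturbation (standing in for one error type), the farthest-first seeding (a stand-in, since the abstract does not name the two alternative starting procedures), and the use of the Rand index as the recovery measure. All function names are hypothetical.

```python
import random

def make_clusters(rng, k=3, n_per=30, dim=2, gap=20.0):
    """Cohesive, isolated clusters: unit-variance clouds with centers `gap` apart per axis."""
    points, labels = [], []
    for c in range(k):
        for _ in range(n_per):
            points.append([c * gap + rng.gauss(0.0, 1.0) for _ in range(dim)])
            labels.append(c)
    return points, labels

def perturb(points, rng, sd=0.5):
    """Hypothetical error type: add Gaussian noise to every coordinate."""
    return [[x + rng.gauss(0.0, sd) for x in p] for p in points]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd-style K-means started from the given seed points."""
    centroids = [list(s) for s in seeds]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def random_seeds(points, k, rng):
    """Random starting points drawn from the data."""
    return rng.sample(points, k)

def farthest_first_seeds(points, k):
    """Stand-in informed start: repeatedly pick the point farthest from all chosen seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same vs. different cluster)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

rng = random.Random(0)
clean, truth = make_clusters(rng)
noisy = perturb(clean, rng)
ri_random = rand_index(truth, kmeans(noisy, random_seeds(noisy, 3, rng)))
ri_spread = rand_index(truth, kmeans(noisy, farthest_first_seeds(noisy, 3)))
print("random start:", ri_random, "spread start:", ri_spread)
```

With clusters this widely separated, the spread-out start places one seed in each cluster and recovers the true partition, whereas a random start can drop two seeds into the same cluster and converge to a poorer solution. Repeating the random start over many draws would give a distribution of recovery values rather than a single number.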
Cite this article
Milligan, G.W. An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika 45, 325–342 (1980). https://doi.org/10.1007/BF02293907
DOI: https://doi.org/10.1007/BF02293907