Abstract
Exemplar theories of categorization rely on similarity to explain subjects’ ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, of generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and are very successful in real-world applications. Their generalization performance depends crucially on the chosen similarity measure. Although similarity plays an important role in describing generalization behavior, it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques in order to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the generalized context model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior, and we suggest some ways in which insights from machine learning can offer guidance.
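The abstract's central technical claim is that exemplar models and kernel methods are built from the same quantity, a similarity-weighted sum over stored exemplars, and that regularization helps control how well such sums generalize. The following is a minimal Python sketch of that idea, not the authors' code. Its assumptions are ours: a generalized context model (GCM) variant with attention-weighted city-block distance and exponential similarity, two categories, toy stimuli, and hypothetical function names (`similarity`, `gcm_prob_a`, `fit_klr`, `klr_prob_a`). The kernel logistic regression (KLR) part reuses the same similarity as its kernel and is fit by plain gradient descent on the log-loss plus a penalty `lam` on the squared RKHS norm of the decision function.

```python
import math

# Hypothetical helper names; a sketch, not the authors' code.
# Assumed GCM variant: city-block distance (r = 1) and exponential decay (p = 1).


def similarity(x, y, attention, c=1.0):
    """Exponential similarity exp(-c * d), where d is the attention-weighted
    city-block distance between stimuli x and y. For c > 0 and nonnegative
    attention weights this is a product of one-dimensional Laplacian kernels,
    so it is also a positive-definite kernel."""
    d = sum(w * abs(xi - yi) for w, xi, yi in zip(attention, x, y))
    return math.exp(-c * d)


def gcm_prob_a(x, exemplars, labels, attention, c=1.0, bias_a=0.5):
    """GCM probability of responding 'A' (label 1): biased summed similarity
    to the A exemplars over the biased summed similarity to all exemplars."""
    s_a = sum(similarity(x, e, attention, c) for e, y in zip(exemplars, labels) if y == 1)
    s_b = sum(similarity(x, e, attention, c) for e, y in zip(exemplars, labels) if y == 0)
    return bias_a * s_a / (bias_a * s_a + (1.0 - bias_a) * s_b)


def fit_klr(exemplars, labels, attention, c=1.0, lam=0.1, lr=0.5, steps=2000):
    """Kernel logistic regression with the same similarity as kernel:
    f(x) = sum_j alpha_j * k(x, x_j) + b, fit by plain gradient descent on
    mean log-loss + (lam / 2) * alpha^T K alpha (the squared RKHS norm of f)."""
    n = len(exemplars)
    K = [[similarity(xi, xj, attention, c) for xj in exemplars] for xi in exemplars]
    alpha, b = [0.0] * n, 0.0
    for _ in range(steps):
        f = [sum(alpha[j] * K[i][j] for j in range(n)) + b for i in range(n)]
        # Derivative of the log-loss with respect to f_i is p_i - y_i.
        resid = [1.0 / (1.0 + math.exp(-fi)) - yi for fi, yi in zip(f, labels)]
        # K is symmetric, so the regularizer's gradient is lam * K alpha.
        grad_alpha = [
            sum(resid[i] * K[i][j] for i in range(n)) / n
            + lam * sum(K[j][i] * alpha[i] for i in range(n))
            for j in range(n)
        ]
        grad_b = sum(resid) / n  # the bias term is not regularized
        alpha = [a - lr * g for a, g in zip(alpha, grad_alpha)]
        b -= lr * grad_b
    return alpha, b


def klr_prob_a(x, exemplars, alpha, b, attention, c=1.0):
    """Logistic read-out of the fitted kernel expansion at stimulus x."""
    f = sum(a * similarity(x, e, attention, c) for a, e in zip(alpha, exemplars)) + b
    return 1.0 / (1.0 + math.exp(-f))


# Toy stimuli (hypothetical): two exemplars per category on two dimensions.
exemplars = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [1, 1, 0, 0]  # 1 = category A, 0 = category B
attention = (0.5, 0.5)

print(round(gcm_prob_a((0.5, 0.5), exemplars, labels, attention), 3))  # 0.731
alpha, b = fit_klr(exemplars, labels, attention, lam=0.1)
print(klr_prob_a((0.5, 0.5), exemplars, alpha, b, attention))
```

In this sketch the GCM weights every stored exemplar of a category equally and has only bias, attention, and sensitivity parameters, whereas the KLR variant learns one weight per exemplar and uses `lam` to trade training fit against the smoothness of the decision function; raising `lam` pulls predictions at the training stimuli toward 0.5. ALCOVE likewise learns one association weight per exemplar node, though with online updates and a different error function, which this sketch does not implement.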
References
Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). The probability problem of pattern recognition learning and the method of potential functions. Automation & Remote Control, 25, 1175–1190.
Alfonso-Reese, L. A., Ashby, F. G., & Brainard, D. H. (2002). What makes a categorization task difficult? Perception & Psychophysics, 64, 570–583.
Ashby, F. G., & Alfonso-Reese, L. A. (1995). Categorization as probability density estimation. Journal of Mathematical Psychology, 39, 216–233.
Ashby, F. G., & Gott, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 33–53.
Ashby, F. G., & Maddox, W. T. (1992). Complex decision rules in categorization: Contrasting novice and experienced performance. Journal of Experimental Psychology: Human Perception & Performance, 18, 50–71.
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372–400.
Ashby, F. G., Waldron, E. M., Lee, W. W., & Berkman, A. (2001). Suboptimality in human categorization and identification. Journal of Experimental Psychology: General, 130, 77–96.
Beals, R., Krantz, D. H., & Tversky, A. (1968). Foundations of multidimensional scaling. Psychological Review, 75, 127–142.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press, Clarendon Press.
Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2, 499–526.
Bradley, R. A. (1976). Science, statistics, and paired comparisons. Biometrics, 32, 213–239.
Briscoe, E., & Feldman, J. (2006). Conceptual complexity and the bias-variance tradeoff. In R. Sun, N. Miyake, & C. Schunn (Eds.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1038–1043). Mahwah, NJ: Erlbaum.
Brown, J. S. (1965). Generalization and discrimination. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 7–23). Stanford: Stanford University Press.
Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences, 89, 60–64.
Bush, R. R., & Mosteller, F. (1951). A model for stimulus generalization and discrimination. Psychological Review, 58, 413–423.
Chater, N., & Vitányi, P. M. B. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, 47, 346–369.
Cristianini, N., & Schölkopf, B. (2002). Support vector machines and kernel methods: The new generation of learning machines. AI Magazine, 23(3), 31–42.
David, H. A. (1988). The method of paired comparisons (2nd ed.). London: Griffin.
Fass, D., & Feldman, J. (2003). Categorization under complexity: A unified MDL account of human learning of regular and irregular categories. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 35–42). Cambridge, MA: MIT Press.
Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature, 407, 630–633.
Fried, L. S., & Holyoak, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 234–257.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Ghirlanda, S., & Enquist, M. (2003). A century of generalization. Animal Behaviour, 66, 15–36.
Graf, A. B. A., & Wichmann, F. A. (2004). Insights from machine learning applied to human visual classification. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems 16 (pp. 905–912). Cambridge, MA: MIT Press.
Graf, A. B. A., Wichmann, F. A., Bülthoff, H. H., & Schölkopf, B. (2006). Classification of faces in man and machine. Neural Computation, 18, 143–165.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2007). A tutorial on kernel methods for categorization. Journal of Mathematical Psychology, 51, 343–358.
Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Similarity, kernels, and the triangle inequality. Manuscript submitted for publication.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Lamberts, K. (1994). Flexible tuning of similarity in exemplar-based categorization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 20, 1003–1021.
Logothetis, N. K., Pauls, J., Bülthoff, H. H., & Poggio, T. (1994). View-dependent object recognition by monkeys. Current Biology, 4, 401–414.
Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111, 309–332.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1, pp. 103–189). New York: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
McKinley, S. C., & Nosofsky, R. M. (1995). Investigations of exemplar and decision bound models in large, ill-defined category structures. Journal of Experimental Psychology: Human Perception & Performance, 21, 128–148.
McKinley, S. C., & Nosofsky, R. M. (1996). Selective attention and the formation of linear decision boundaries. Journal of Experimental Psychology: Human Perception & Performance, 22, 294–317.
Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254–278.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Mostofsky, D. I. (Ed.) (1965). Stimulus generalization. Stanford: Stanford University Press.
Navarro, D. J. (2002). Representing stimulus similarity. Unpublished doctoral dissertation, University of Adelaide, Adelaide, Australia.
Navarro, D. J. (2007). On the interaction between exemplar-based concepts and a response scaling process. Journal of Mathematical Psychology, 51, 85–98.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. M. (1987). Attention and learning processes in the identification and categorization of integral stimuli. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 87–108.
Nosofsky, R. M. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393–418.
Nosofsky, R. M. (1991a). Tests of an exemplar model for relating perceptual classification and recognition memory. Journal of Experimental Psychology: Human Perception & Performance, 17, 3–27.
Nosofsky, R. M. (1991b). Typicality in logically defined categories: Exemplar-similarity versus rule instantiation. Memory & Cognition, 19, 131–150.
Nosofsky, R. M. (1992). Exemplar-based approach to relating categorization, identification, and recognition. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 363–393). Hillsdale, NJ: Erlbaum.
Nosofsky, R. M., & Zaki, S. R. (2002). Exemplar and prototype models revisited: Response strategies, selective attention, and stimulus generalization. Journal of Experimental Psychology: Learning, Memory, & Cognition, 28, 924–940.
Ohl, F. W., Scheich, H., & Freeman, W. J. (2001). Change in the pattern of ongoing cortical activity with auditory category learning. Nature, 412, 733–736.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2001). Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nature Neuroscience, 4, 1244–1252.
Op de Beeck, H., Wagemans, J., & Vogels, R. (2004). A diverse stimulus representation underlies shape categorization by primates (Abstract). Journal of Vision, 4(8), 518a.
Orr, G. B., & Müller, K.-R. (Eds.) (1998). Neural networks: Tricks of the trade. Berlin: Springer.
Palmeri, T. J., & Gauthier, I. (2004). Visual object understanding. Nature Reviews Neuroscience, 5, 291–303.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symposia on Quantitative Biology, 55, 899–910.
Poggio, T., & Bizzi, E. (2004). Generalization in vision and motor control. Nature, 431, 768–774.
Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.
Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning (A.I. Memo No. 1140). Cambridge, MA: MIT Artificial Intelligence Laboratory & Center for Biological Information Processing, Whitaker College.
Poggio, T., Rifkin, R., Mukherjee, S., & Niyogi, P. (2004). General conditions for predictivity in learning theory. Nature, 428, 419–422.
Poggio, T., & Smale, S. (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, 50, 537–544.
Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353–363.
Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382–407.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rosseel, Y. (2002). Mixture models of categorization. Journal of Mathematical Psychology, 46, 178–210.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318–362). Cambridge, MA: MIT Press, Bradford Books.
Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44, 522–536.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325–345.
Shepard, R. N. (1958). Stimulus and response generalization: Deduction of the generalization gradient from a trace model. Psychological Review, 65, 242–256.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. Part I. Psychometrika, 27, 125–140.
Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54–87.
Shepard, R. N. (1965). Approximation to uniform gradients of generalization by monotone transformations of scale. In D. I. Mostofsky (Ed.), Stimulus generalization (pp. 94–110). Stanford: Stanford University Press.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.
Shepard, R. N., & Chang, J.-J. (1963). Stimulus generalization in the learning of classifications. Journal of Experimental Psychology, 65, 94–102.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75(13, Whole No. 517), 1–42.
Sigala, N., Gabbiani, F., & Logothetis, N. K. (2002). Visual categorization and object representation in monkeys and humans. Journal of Cognitive Neuroscience, 14, 187–198.
Sigala, N., & Logothetis, N. K. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318–320.
Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 24, 1411–1436.
Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 3–27.
Spence, K. W. (1937). The differential response in animals to stimuli varying within a single dimension. Psychological Review, 44, 430–444.
Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity and Bayesian inference. Behavioral & Brain Sciences, 24, 629–640.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123–154.
Vapnik, V. N. (2000). The nature of statistical learning theory (2nd ed.). New York: Springer.
Verguts, T., Ameel, E., & Storms, G. (2004). Measures of similarity in models of categorization. Memory & Cognition, 32, 379–389.
Wichmann, F. A., Graf, A. B. A., Simoncelli, E. P., Bülthoff, H. H., & Schölkopf, B. (2005). Machine learning applied to perception: Decision images for gender classification. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (pp. 1489–1496). Cambridge, MA: MIT Press.
Cite this article
Jäkel, F., Schölkopf, B. & Wichmann, F.A. Generalization and similarity in exemplar models of categorization: Insights from machine learning. Psychonomic Bulletin & Review 15, 256–271 (2008). https://doi.org/10.3758/PBR.15.2.256