Special Issue: Probabilistic models of cognition
Theory-based Bayesian models of inductive learning and reasoning

https://doi.org/10.1016/j.tics.2006.05.009

Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.

Introduction

Human cognition rests on a unique talent for extracting generalizable knowledge from a few specific examples. Consider how a child might first grasp the meaning of a common word, such as ‘horse’. Given several examples of horses labeled prominently by her parents, she is likely to make an inductive leap that goes far beyond the data observed. She could now judge whether any new entity is a horse or not, and she would be mostly correct, except for the occasional donkey, deer or camel. The ability to generalize from sparse data is crucial not only in learning word meanings, but in learning about the properties of objects, cause–effect relations, social rules, and many other domains of knowledge.

This article describes recent research that seeks to understand human inductive learning and reasoning in computational terms (see also Conceptual Foundations Editorial by Chater, Tenenbaum and Yuille in this issue). The goal is to build broadly applicable, quantitatively predictive models that approximate optimal inference in natural environments, and thereby explain why human generalization works the way it does and how it can succeed given such sparse data 1, 2. Our focus is on computational-level theories [3], characterizing the functional capacities of human inference rather than specific psychological processes that implement those functions.

Most previous accounts of inductive generalization represent one of two approaches. The first focuses on relatively domain-general, knowledge-independent statistical mechanisms of inference, based on similarity, association, correlation or other statistical metrics 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13. This approach has led to successful mathematical models of human generalization in laboratory tasks, but fails to account for many important phenomena of learning and reasoning in complex, real-world domains, such as intuitive biology, intuitive physics or intuitive psychology. The second approach aims to capture more of the richness of human inference, by appealing to sophisticated domain-specific knowledge representations, or intuitive theories 14, 15, 16, 17, 18, 19, 20. An intuitive theory may be thought of as a system of related concepts, together with a set of causal laws, structural constraints, or other explanatory principles, that guide inductive inference in a particular domain. However, theory-based approaches to induction have been notoriously difficult to formalize, particularly in terms that make quantitative predictions about behavior or can be understood in terms of rational statistical inference.

We will argue for an alternative approach, where structured knowledge and statistical inference cooperate rather than compete, allowing us to build on the insights of both traditions. We cast induction as a form of Bayesian statistical inference over structured probabilistic models of the world. These models can be seen as probabilistic versions of intuitive theories 14, 18, 20 or schemas 21, 22, capturing the knowledge about a domain that enables inductive generalization from sparse data. This approach has only become possible in recent years, as advances in artificial intelligence [23] and statistics [24] have provided essential tools for formalizing intuitive theories and theory-based statistical inferences. The influence is bidirectional, as these Bayesian cognitive models have led to new machine-learning algorithms with more powerful and more human-like capacities 25, 26.

Theory-based Bayesian models of induction focus on three important questions: what is the content of probabilistic theories, how are they used to support rapid learning, and how can they themselves be learned? The learner evaluates hypotheses h about some aspect of the world – the meaning of a word, the extension of a property or category, or the presence of a hidden cause – given observed data x and subject to the constraints of a background theory T. Hypotheses are scored by computing posterior probabilities via Bayes' rule:

P(h|x,T) = P(x|h,T) P(h|T) / Σ_{h′ ∈ H_T} P(x|h′,T) P(h′|T)

The likelihood P(x|h,T) measures how well each hypothesis predicts the data, and the prior probability P(h|T) expresses the plausibility of the hypothesis given the learner's background knowledge. Posterior probabilities P(h|x,T) are proportional to the product of these two terms, representing the learner's degree of belief in each hypothesis given both the constraints of the background theory T and the observed data x (see the Technical Introduction to this special issue by Griffiths and Yuille for further background: Supplementary material online). Adopting this Bayesian framework is just the starting point for our cognitive models. The challenge comes in specifying hypothesis spaces and probability distributions that support Bayesian inference for a given task and domain. In theory-based Bayesian models, the domain theory plays this crucial role.
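The Bayes'-rule computation just described can be sketched concretely. In this minimal Python illustration, the hypothesis names and all probability values are invented for the example, not taken from the article:

```python
# Score hypotheses by Bayes' rule: posterior ∝ prior × likelihood,
# normalized over the theory-generated hypothesis space H_T.

def posterior(priors, likelihoods):
    """Return P(h|x,T) for each hypothesis h, given P(h|T) and P(x|h,T)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnormalized.values())  # denominator: sum over h' in H_T
    return {h: p / z for h, p in unnormalized.items()}

# Toy hypothesis space for a word meaning (illustrative numbers only).
priors = {"horse": 0.5, "quadruped": 0.3, "animal": 0.2}       # P(h|T)
likelihoods = {"horse": 0.8, "quadruped": 0.4, "animal": 0.1}  # P(x|h,T)

post = posterior(priors, likelihoods)
# 'horse' receives the most posterior mass: it is both plausible
# a priori and predicts the observed data well.
```

The prior here stands in for the constraints supplied by the background theory T; in the models discussed in this article it is generated by structured domain knowledge rather than listed by hand.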

More formally, the domain theory T generates a space HT of candidate hypotheses, such as all possible meanings for a word, along with the priors P(h|T) and likelihoods P(x|h,T). Prior probabilities and likelihoods are thus not simply statistical records of the learner's previous observations, as in some Bayesian analyses of perception and motor control 27, 28, or previous Bayesian analyses of inductive reasoning [29]. Neither are they assumed to share a single universal structure across all domains, as in Shepard's pioneering Bayesian analysis of generalization [30]. Rather, they are products of abstract systems of knowledge that go substantially beyond the learner's direct experience of the world, and can take qualitatively different forms in different domains.

We will distinguish at least two different levels of knowledge in a theory (Figure 1). Although intuitive theories may well be much richer than this picture suggests, we focus on the minimal aspects of theories needed to support inductive generalization. The base level of a theory is a structured probabilistic model that defines a probability distribution over possible observables – entities, properties, variables, events. This model is typically built on some kind of graph structure capturing relations between observables, such as a taxonomic hierarchy or a causal network, together with a set of numerical parameters. The graph structure determines qualitative aspects of the probabilistic model; the numerical parameters determine more fine-grained quantitative details. At a higher level of knowledge are abstract principles that generate the class of structured models a learner may consider, such as the specification that a given domain is organized taxonomically or causally. Inference at all levels of this theory hierarchy (Figure 1) – using theories to infer unobserved aspects of the data, learning structured models given the abstract domain principles of a theory, and learning the abstract domain principles themselves – can be carried out in a unified and tractable way with hierarchical Bayesian models [24].
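A toy sketch of inference one level up this hierarchy: learning which kind of structured model best organizes a domain scores each candidate structure S by marginalizing over the hypotheses it generates, P(S|x,T) ∝ P(S|T) Σ_h P(x|h,S) P(h|S). The two candidate structures and all numbers below are purely illustrative assumptions, not content from the article:

```python
# Hierarchical sketch: abstract principles T generate candidate structured
# models S (here, a taxonomic vs a causal organization); each S generates
# hypotheses h. Each structure maps hypothesis name -> (P(h|S), P(x|h,S)).
structures = {
    "taxonomic": {"prior": 0.5,
                  "hypotheses": {"h1": (0.7, 0.6), "h2": (0.3, 0.1)}},
    "causal":    {"prior": 0.5,
                  "hypotheses": {"h3": (0.5, 0.2), "h4": (0.5, 0.2)}},
}

def structure_posterior(structures):
    """P(S|x,T) ∝ P(S|T) * sum over h of P(x|h,S) P(h|S)."""
    scores = {
        name: s["prior"] * sum(ph * px for ph, px in s["hypotheses"].values())
        for name, s in structures.items()
    }
    z = sum(scores.values())
    return {name: v / z for name, v in scores.items()}

post = structure_posterior(structures)
# The taxonomic structure wins here because the hypotheses it generates
# predict the (hypothetical) data better.
```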

The following sections describe theory-based Bayesian models for several important inductive tasks, contrasting them with alternative approaches emphasizing either statistical learning or structured knowledge alone. We begin with the task of learning words or category labels, and focus on the lowest level of inference: theory-based generalization. Then we illustrate the full hierarchical approach in two other domains, property induction and causal inference.

Learning names for things

Behavioral studies of human inductive generalization arguably began with the study of category learning [31]. The basic experimental task presents learners with a set of objects or visual stimuli, and a verbal label (e.g. ‘blicket’) that applies to a subset of the objects. Learners observe several examples of blickets, and perhaps negative examples (non-blickets), and must then infer which other objects the label applies to.
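As a hedged sketch of this task (the hypothesis space and the simple consistency-based likelihood below are illustrative assumptions, not the article's model), label learning can be framed as Bayesian inference over candidate extensions of the label:

```python
# Toy Bayesian concept learning for the 'blicket' task. Candidate hypotheses
# are subsets of the objects; a hypothesis is consistent if it contains every
# positive example and no negative example. Uniform prior; likelihood 1 for
# consistent hypotheses, 0 otherwise (a deliberate simplification).
from itertools import combinations

objects = {"a", "b", "c", "d"}
positives = {"a", "b"}  # observed blickets
negatives = {"d"}       # observed non-blicket

# Hypothesis space: all nonempty subsets of the objects.
hypotheses = [set(c) for r in range(1, len(objects) + 1)
              for c in combinations(sorted(objects), r)]

def consistent(h):
    return positives <= h and not (negatives & h)

weights = {frozenset(h): (1.0 if consistent(h) else 0.0) for h in hypotheses}
z = sum(weights.values())

def p_label(obj):
    """Probability a new object is a blicket: posterior mass of hypotheses containing it."""
    return sum(w for h, w in weights.items() if obj in h) / z

# 'c' appears in some but not all consistent hypotheses, so its probability
# falls between the certain positives and the excluded negative.
```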

These artificial category-learning tasks abstract the essence of the

Reasoning about hidden properties

Many kinds of predicates may be true of a given entity. Some of these predicates correspond to category labels (is a horse, is a fish) but many correspond to properties, such as is brown, has a spleen, or can fly. Property induction has been the subject of numerous behavioral experiments and formal models. In a typical task, learners find out that one or more categories have a novel property, and must decide how to extend the property to other categories in the domain. For instance, subjects

Causal learning and reasoning

The role of intuitive theories in learning and reasoning has been most prominently studied in the context of causal cognition 33, 18, 51, 19. For many authors, causality is central to the notion of a theory. Carey, for instance, suggests that a theory comprises ‘a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed’ (

Conclusion

The theory-based Bayesian framework provides a formal means to address several fundamental questions about human cognition. What is the content and form of human knowledge, at multiple levels of abstraction? How can abstract domain knowledge guide learning of new concepts? How can abstract domain knowledge be learned? What conceptual resources must be built in innately? How do mechanisms of statistical learning and inference interact with – and operate over – structured symbolic knowledge?

Acknowledgements

We thank the current and past members of the Computational Cognitive Science group at MIT for innumerable discussions of the work described here. We owe particular debts to Fei Xu, Patrick Shafto, and Sourabh Niyogi, who have collaborated closely on the projects described here. We acknowledge support from NTT Communication Sciences Laboratories, Mitsubishi Electric Research Labs, the National Science Foundation, the James S. McDonnell Foundation Causal Learning Collaborative, the Paul E. Newton

References (70)

  • Blok, S. et al. (2003) Probability from similarity. In Working Papers of the 2003 AAAI Spring Symposium on Logical...
  • T. Regier (2005) The emergence of words: attentional learning in form and meaning. Cogn. Sci.
  • D.R. Shanks (1995) The Psychology of Associative Learning
  • P. Cheng (1997) From covariation to causation: A causal power theory. Psychol. Rev.
  • A. Gopnik (2004) A theory of causal learning in children: Causal maps and Bayes nets. Psychol. Rev.
  • R.M. Nosofsky (1986) Attention, similarity, and the identification-categorization relationship. J. Exp. Psychol. Gen.
  • J.K. Kruschke (1992) ALCOVE: An exemplar-based connectionist model of category learning. Psychol. Rev.
  • S. Carey (1985) Conceptual Change in Childhood
  • S. Atran (1998) Folk biology and the anthropology of science: Cognitive universals and cultural particulars. Behav. Brain Sci.
  • E. Markman (1989) Naming and Categorization in Children
  • P. Bloom (2000) How Children Learn the Meanings of Words
  • G.L. Murphy et al. (1985) The role of theories in conceptual coherence. Psychol. Rev.
  • H.M. Wellman et al. (1992) Cognitive development: Foundational theories of core domains. Annu. Rev. Psychol.
  • A. Gopnik et al. (1997) Words, Thoughts, and Theories
  • D.E. Rumelhart. Schemata: the building blocks of cognition
  • M. Minsky. A framework for representing knowledge
  • S.J. Russell et al. (2002) Artificial Intelligence: A Modern Approach
  • A. Gelman (2003) Bayesian Data Analysis
  • Kemp, C. et al. (2004) Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16,...
  • Kemp, C. et al. Learning systems of concepts with an infinite relational model. In Proc. 21st Natl Conf. Artif. Intell....
  • D. Purves (2002) Why we see what we do. Am. Sci.
  • K.P. Kording et al. (2004) Bayesian integration in sensorimotor learning. Nature
  • E. Heit (2000) Properties of inductive reasoning. Psychon. Bull. Rev.
  • R.N. Shepard (1987) Towards a universal law of generalization for psychological science. Science
  • J.A. Bruner (1956) A Study of Thinking