Abstract
Active learning differs from “learning from examples” in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples.
In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers “useful.” We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization.
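The idea of selective sampling can be illustrated with a minimal sketch. It is not the SG-network described above: it assumes a much simpler concept class (a single threshold on the interval [0, 1], with points at or above the threshold labeled positive), and the names `selective_sampling`, `random_sampling`, `oracle`, `stream`, and `budget` are hypothetical. The learner reads unlabeled points from the environment and spends a query only when a point falls in the region where the labels seen so far do not yet determine the answer.

```python
def selective_sampling(oracle, stream, budget, lo=0.0, hi=1.0):
    """Learn a threshold concept, querying only points whose label is uncertain.

    All thresholds in (lo, hi] are consistent with the labels seen so far, so
    the label of a point x is undetermined exactly when lo < x < hi.
    """
    queries = 0
    for x in stream:                # unlabeled draws from the environment
        if queries >= budget:
            break
        if not (lo < x < hi):
            continue                # label already implied; no query is spent
        queries += 1
        if oracle(x):
            hi = x                  # positive: the threshold is at or below x
        else:
            lo = x                  # negative: the threshold is above x
    return lo, hi, queries


def random_sampling(oracle, stream, budget, lo=0.0, hi=1.0):
    """Baseline: label every drawn point until the budget is used up."""
    queries = 0
    for x in stream:
        if queries >= budget:
            break
        queries += 1
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return lo, hi, queries


if __name__ == "__main__":
    threshold = 0.37                            # hypothetical target concept
    label = lambda x: x >= threshold            # noise-free oracle
    points = [0.5, 0.9, 0.2, 0.3, 0.45, 0.4]    # hypothetical unlabeled draws
    print(selective_sampling(label, points, budget=3))
    print(random_sampling(label, points, budget=3))
```

With the same budget of three queries, the selective learner skips the draw at 0.9, whose label is already implied, and ends with a narrower interval around the threshold than the baseline that labels every draw.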
References
Aggoune, M., Atlas, L., Cohn, D., Damborg, M., El-Sharkawi, M., & Marks, R. H. (1989). Artificial neural networks for power system static security assessment. Proceedings, International Symposium on Circuits and Systems. IEEE.
Angluin, D. (1986). Learning regular sets from queries and counter-examples. (Technical Report YALEU/DCS/TR-64). Dept. of Computer Science, Yale University, New Haven, CT.
Ash, T. (1989). Dynamic node creation in backpropagation networks. ICS Report 8901. Institute for Cognitive Science, University of California, San Diego, CA.
Baum, E., & Haussler, D. (1989). What size net gives valid generalization? In D. Touretzky (Ed.), Advances in neural information processing systems (Vol. 1). San Francisco, CA: Morgan Kaufmann.
Baum, E., & Lang, K. (1991). Constructing hidden units using examples and queries. In R. Lippmann et al. (Eds.), Advances in neural information processing systems (Vol. 3). San Francisco, CA: Morgan Kaufmann.
Blum, A., & Rivest, R. (1989). Training a 3-node neural network is NP-complete. In D. Touretzky (Ed.), Advances in neural information processing systems (Vol. 1). San Francisco, CA: Morgan Kaufmann.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. JACM, 36(4), 929–965.
Cohn, D., Atlas, L., & Ladner, R. (1990). Training connectionist networks with queries and selective sampling. In D. Touretzky (Ed.), Advances in neural information processing systems (Vol. 2). San Francisco, CA: Morgan Kaufmann.
Cohn, D., & Tesauro, G. (1992). How tight are the Vapnik-Chervonenkis bounds? Neural Computation, 4(2), 249–269.
Eisenberg, B., & Rivest, R. (1990). On the sample complexity of pac-learning using random and chosen examples. In M. Fulk & J. Case (Eds.), ACM 3rd Annual Workshop on Computational Learning Theory. San Francisco, CA: Morgan Kaufmann.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for Motherese speech. Infant Behavior and Development, 10, 279–293.
Freund, Y., Seung, H.S., Shamir, E., & Tishby, N. (1993). Information, prediction, and query by committee. In S. Hanson et al. (Eds.), Advances in Neural Information Processing Systems (Vol. 5). San Francisco, CA: Morgan Kaufmann.
Haussler, D. (1987). Learning conjunctive concepts in structural domains. Proceedings, AAAI '87 (pp. 466–470). San Francisco, CA: Morgan Kaufmann.
Haussler, D. (1992). Decision-theoretic generalizations of the PAC model for neural net and other applications. Information and Computation, 100(1), 78–150.
Hwang, J.-N., Choi, J., Oh, S., & Marks, R. (1990). Query learning based on boundary search and gradient computation of trained multilayer perceptrons. IJCNN 90. San Diego, CA.
Judd, S. (1988). On the complexity of loading shallow neural networks. Journal of Complexity, 4, 177–192.
Le Cun, Y., Denker, J., & Solla, S. (1990). Optimal brain damage. In D. Touretzky (Ed.), Advances in neural information processing systems (Vol. 2). San Francisco, CA: Morgan Kaufmann.
MacKay, D. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590–604.
Mitchell, T. (1982). Generalization as search. Artificial Intelligence, 18, 203–226.
Pratt, L.Y. (1993). Discriminability-based transfer between neural networks. In C.L. Giles et al. (Eds.), Advances in Neural Information Processing Systems (Vol. 5). San Francisco, CA: Morgan Kaufmann.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by error propagation. In D. Rumelhart & J. McClelland (Eds.), Parallel distributed processing. Cambridge, MA: MIT Press.
Seung, H.S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory (pp. 287–294). New York: ACM.
Valiant, L. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.
Additional information
A preliminary version of this article appears as Cohn et al. (1990).
Cite this article
Cohn, D., Atlas, L. & Ladner, R. Improving generalization with active learning. Mach Learn 15, 201–221 (1994). https://doi.org/10.1007/BF00993277
DOI: https://doi.org/10.1007/BF00993277