Abstract
Building intelligent systems capable of learning, acting reactively, and planning actions before executing them is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning-based planners: (a) the introduction of a new component (the "matcher") allows both planners to carry out genuine taskable planning, whereas previous reinforcement-learning-based models used planning only to speed up learning; (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains that are interestingly robust to noise; (c) two novel algorithms are presented that generate the chains of predictions used for planning and that control the flow of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit knowledge of the pursued goal; (e) the two systems integrate reactive behavior and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared using a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive literature review of the relevant issues.
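The abstract outlines a planning loop in which a trained world model rolls out chains of predicted states, a "matcher" checks each prediction against the pursued goal, and a confidence measure arbitrates between reacting and planning. As a rough illustration only, the minimal Python sketch below shows one way such a loop could be organised for a grid path-finding task. All names (predict_next, matcher, confidence, act) and the specific confidence measure (gap between the best and second-best action evaluations) are assumptions made here for clarity, not the paper's actual components or algorithms.

```python
import random

ACTIONS = ["north", "south", "east", "west"]
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def predict_next(state, action):
    """Stand-in for the trained neural world model: maps (state, action) to a predicted next state."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)

def matcher(predicted_state, goal):
    """Hypothetical 'matcher': signals when a predicted state matches the pursued goal."""
    return predicted_state == goal

def confidence(state, q_values):
    """Confidence in reactive action selection; here, the gap between the best and
    second-best action evaluations (an illustrative assumption, not the paper's measure)."""
    ranked = sorted(q_values.get(state, {a: 0.0 for a in ACTIONS}).values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else 0.0

def forward_plan(start, goal, max_depth=50):
    """Generate a forward chain of predictions with the world model until the matcher
    reports the goal; return the corresponding action sequence, or None."""
    state, plan = start, []
    for _ in range(max_depth):
        action = random.choice(ACTIONS)  # placeholder action-proposal policy
        state = predict_next(state, action)
        plan.append(action)
        if matcher(state, goal):
            return plan
    return None

def act(state, goal, q_values, threshold=0.2):
    """Integrate reactive behavior and planning: act reactively when confidence is
    high, otherwise plan forward and execute the first action of the plan."""
    if confidence(state, q_values) >= threshold:
        return max(q_values[state], key=q_values[state].get)  # reactive action
    plan = forward_plan(state, goal)
    return plan[0] if plan else random.choice(ACTIONS)

# Example use with hand-made action evaluations for the start state:
q = {(0, 0): {"north": 0.9, "south": 0.1, "east": 0.2, "west": 0.1}}
print(act((0, 0), goal=(2, 3), q_values=q))
```

The sketch only illustrates the control flow (rollout, match test, confidence-based switch); the paper's planners learn the world model, the evaluations, and the confidence from experience rather than using fixed functions.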
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this chapter
Baldassarre, G. (2003). Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot. In: Butz, M.V., Sigaud, O., Gérard, P. (eds) Anticipatory Behavior in Adaptive Learning Systems. Lecture Notes in Computer Science(), vol 2684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45002-3_11
DOI: https://doi.org/10.1007/978-3-540-45002-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40429-3
Online ISBN: 978-3-540-45002-3