
Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot

Chapter
Anticipatory Behavior in Adaptive Learning Systems

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2684)

Abstract

Building intelligent systems that are capable of learning, acting reactively, and planning actions before their execution is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning-based planners: (a) the introduction of a new component (the "matcher") allows both planners to carry out genuine taskable planning (previous reinforcement-learning-based models used planning only to speed up learning); (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains with an interesting robustness with regard to noise; (c) two novel algorithms are presented that generate chains of predictions in order to plan and that control the flow of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit knowledge of the pursued goal; (e) the two systems integrate reactive behavior and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared using a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive literature review on the relevant issues.
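The forward planner described in the abstract can be pictured, very roughly, as a loop that chains one-step predictions from the trained world model and uses the matcher to compare each predicted state against the goal, falling back on the reactive policy whenever its confidence is high. The sketch below is only a minimal illustration of that idea under stated assumptions, not the paper's implementation: the tabular world model, the reactive policy, the matcher, the confidence measure, and every name in it (predict, match, confidence, plan_forward, act) are hypothetical stand-ins.

# Minimal sketch of a confidence-gated forward planner in the spirit of
# the abstract. All components are illustrative stand-ins, not the
# paper's trained neural networks.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 16, 4  # toy discretized world

# Stand-in "world model": P(next state | state, action), normalized.
world_model = rng.random((N_STATES, N_ACTIONS, N_STATES))
world_model /= world_model.sum(axis=2, keepdims=True)

# Stand-in reactive policy: per-state action preferences.
policy_prefs = rng.random((N_STATES, N_ACTIONS))

def predict(state, action):
    """One-step prediction: most likely next state under the model."""
    return int(np.argmax(world_model[state, action]))

def match(state, goal):
    """The 'matcher': scores how well a predicted state fits the goal
    (here trivially 1.0 on an exact match, 0.0 otherwise)."""
    return 1.0 if state == goal else 0.0

def confidence(state):
    """Confidence of the reactive policy: gap between the two best
    action preferences (one of many possible confidence measures)."""
    p = np.sort(policy_prefs[state])
    return p[-1] - p[-2]

def plan_forward(state, goal, horizon=5, n_rollouts=50):
    """Chain one-step predictions in random rollouts; return the first
    action of the best goal-matching chain, or None if none matches."""
    best_action, best_score = None, 0.0
    for _ in range(n_rollouts):
        s = state
        first = int(rng.integers(N_ACTIONS))
        a = first
        for _ in range(horizon):
            s = predict(s, a)
            if match(s, goal) > best_score:
                best_action, best_score = first, match(s, goal)
            a = int(rng.integers(N_ACTIONS))
    return best_action

def act(state, goal, threshold=0.3):
    """Act reactively when the policy is confident; otherwise plan."""
    if confidence(state) >= threshold:
        return int(np.argmax(policy_prefs[state]))
    planned = plan_forward(state, goal)
    if planned is not None:
        return planned
    return int(np.argmax(policy_prefs[state]))  # last-resort fallback

if __name__ == "__main__":
    print("chosen action:", act(state=0, goal=7))

In the bidirectional variant described in the abstract, the same kind of chaining would presumably also run backward from the goal, with forward and backward chains meeting partway; the confidence gate is what lets such a system stay reactive in well-learned regions of the state space and invoke the (more expensive) planning machinery only where the policy is uncertain.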

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Baldassarre, G. (2003). Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot. In: Butz, M.V., Sigaud, O., Gérard, P. (eds.) Anticipatory Behavior in Adaptive Learning Systems. Lecture Notes in Computer Science, vol. 2684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45002-3_11

  • DOI: https://doi.org/10.1007/978-3-540-45002-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40429-3

  • Online ISBN: 978-3-540-45002-3
