
Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot

Chapter
Anticipatory Behavior in Adaptive Learning Systems

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2684)

Abstract

Building intelligent systems that are capable of learning, acting reactively, and planning actions before their execution is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning-based planners: (a) the introduction of a new component (the "matcher") allows both planners to carry out genuine taskable planning (previous reinforcement-learning-based models used planning only to speed up learning); (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains with an interesting robustness with regard to noise; (c) two novel algorithms are presented that generate chains of predictions in order to plan and that control the flow of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit knowledge of the pursued goal; (e) the two systems integrate reactive behavior and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared using a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive literature review on the relevant issues.
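The forward planner described in the abstract can be pictured, very roughly, as a loop that chains one-step predictions from the trained world model and uses the matcher to compare each predicted state against the goal, falling back on the reactive policy whenever its confidence is high. The sketch below is only a minimal illustration of that idea under stated assumptions, not the paper's implementation: the tabular world model, the reactive policy, the matcher, the confidence measure, and every name in it (predict, match, confidence, plan_forward, act) are hypothetical stand-ins.

# Minimal sketch of a confidence-gated forward planner in the spirit of
# the abstract. All components are illustrative stand-ins, not the
# paper's trained neural networks.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 16, 4  # toy discretized world

# Stand-in "world model": P(next state | state, action), normalized.
world_model = rng.random((N_STATES, N_ACTIONS, N_STATES))
world_model /= world_model.sum(axis=2, keepdims=True)

# Stand-in reactive policy: per-state action preferences.
policy_prefs = rng.random((N_STATES, N_ACTIONS))

def predict(state, action):
    """One-step prediction: most likely next state under the model."""
    return int(np.argmax(world_model[state, action]))

def match(state, goal):
    """The 'matcher': scores how well a predicted state fits the goal
    (here trivially 1.0 on an exact match, 0.0 otherwise)."""
    return 1.0 if state == goal else 0.0

def confidence(state):
    """Confidence of the reactive policy: gap between the two best
    action preferences (one of many possible confidence measures)."""
    p = np.sort(policy_prefs[state])
    return p[-1] - p[-2]

def plan_forward(state, goal, horizon=5, n_rollouts=50):
    """Chain one-step predictions in random rollouts; return the first
    action of the best goal-matching chain, or None if none matches."""
    best_action, best_score = None, 0.0
    for _ in range(n_rollouts):
        s = state
        first = int(rng.integers(N_ACTIONS))
        a = first
        for _ in range(horizon):
            s = predict(s, a)
            if match(s, goal) > best_score:
                best_action, best_score = first, match(s, goal)
            a = int(rng.integers(N_ACTIONS))
    return best_action

def act(state, goal, threshold=0.3):
    """Act reactively when the policy is confident; otherwise plan."""
    if confidence(state) >= threshold:
        return int(np.argmax(policy_prefs[state]))
    planned = plan_forward(state, goal)
    if planned is not None:
        return planned
    return int(np.argmax(policy_prefs[state]))  # last-resort fallback

if __name__ == "__main__":
    print("chosen action:", act(state=0, goal=7))

In the bidirectional variant described in the abstract, the same kind of chaining would presumably also run backward from the goal, with forward and backward chains meeting partway; the confidence gate is what lets such a system stay reactive in well-learned regions of the state space and invoke the (more expensive) planning machinery only where the policy is uncertain.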

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Baldassarre, G. (2003). Forward and Bidirectional Planning Based on Reinforcement Learning and Neural Networks in a Simulated Robot. In: Butz, M.V., Sigaud, O., Gérard, P. (eds.) Anticipatory Behavior in Adaptive Learning Systems. Lecture Notes in Computer Science, vol. 2684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45002-3_11

  • DOI: https://doi.org/10.1007/978-3-540-45002-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40429-3

  • Online ISBN: 978-3-540-45002-3
