Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/3712
|
| Title: | Scaling Reinforcement Learning Paradigms for Motor Control |
| Authors: | Vijayakumar, Sethu; Peters, Jan; Schaal, Stefan |
| Issue Date: | May-2003 |
| Abstract: | Reinforcement learning offers a general framework to explain reward-related
learning in artificial and biological motor control. However, current
reinforcement learning methods rarely scale to high-dimensional movement
systems and mainly operate in discrete, low-dimensional domains
such as game playing and artificial toy problems. This drawback makes them
unsuitable for application to human or bio-mimetic motor control. In
this poster, we look at promising approaches that can potentially scale
and suggest a novel formulation of the actor-critic algorithm which takes
steps towards alleviating the current shortcomings. We argue that methods
based on greedy policies are not likely to scale into high-dimensional
domains as they are problematic when used with function approximation
– a must when dealing with continuous domains. We adopt the path
of direct policy-gradient-based policy improvements since they avoid the
problems of destabilizing dynamics encountered in traditional value-iteration-based
updates. While regular policy gradient methods have demonstrated
promising results in the domain of humanoid motor control, we
demonstrate that these methods can be significantly improved using the
natural policy gradient instead of the regular policy gradient. Based on
this, it is proved that Kakade’s ‘average natural policy gradient’ is indeed
the true natural gradient. A general algorithm for estimating the
natural gradient, the Natural Actor-Critic algorithm, is introduced. This
algorithm converges with probability one to the nearest local minimum in
Riemannian space of the cost function. The algorithm outperforms non-natural
policy gradients by far in a cart-pole balancing evaluation, and
offers a promising route for the development of reinforcement learning for
truly high-dimensional continuous state-action systems. |
| Keywords: | Reinforcement learning |
| URI: | http://hdl.handle.net/1842/3712 |
| Appears in Collections: | Informatics Publications
|
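The abstract contrasts the regular policy gradient with the natural policy gradient. As a rough illustration only, the sketch below estimates both for a one-dimensional Gaussian policy on a one-step toy problem. The toy reward, the function names `score` and `policy_gradients`, and the sample count are all assumptions made for this example; it is not the Natural Actor-Critic algorithm and does not reproduce the cart-pole evaluation described above. The natural gradient is taken here as the solution w of F w = g, where g is the likelihood-ratio gradient estimate and F is the Fisher information matrix estimated from the same samples.

```python
import random


def score(a, mu, sigma):
    """Gradient of log N(a | mu, sigma^2) with respect to (mu, sigma)."""
    d = a - mu
    return (d / sigma**2, (d * d - sigma**2) / sigma**3)


def policy_gradients(mu, sigma, reward, n_samples=20000, seed=0):
    """Return (vanilla, natural) gradient estimates for a Gaussian policy.

    Hypothetical helper for illustration: a one-step problem in which an
    action a ~ N(mu, sigma^2) is drawn and reward(a) is observed.
    """
    rng = random.Random(seed)
    actions = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    rewards = [reward(a) for a in actions]
    baseline = sum(rewards) / n_samples  # simple variance-reducing baseline

    g = [0.0, 0.0]                 # likelihood-ratio (vanilla) gradient
    F = [[0.0, 0.0], [0.0, 0.0]]   # empirical Fisher information matrix
    for a, r in zip(actions, rewards):
        s = score(a, mu, sigma)
        for i in range(2):
            g[i] += (r - baseline) * s[i] / n_samples
            for j in range(2):
                F[i][j] += s[i] * s[j] / n_samples

    # Natural gradient: solve the 2x2 system F w = g in closed form.
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    w = [
        (F[1][1] * g[0] - F[0][1] * g[1]) / det,
        (F[0][0] * g[1] - F[1][0] * g[0]) / det,
    ]
    return g, w


if __name__ == "__main__":
    # Assumed toy reward with its optimum at a = 2.
    vanilla, natural = policy_gradients(0.0, 1.0, lambda a: -(a - 2.0) ** 2)
    print("vanilla:", vanilla)
    print("natural:", natural)
```

For this toy reward the expected return is -((mu - 2)^2 + sigma^2), so at mu = 0, sigma = 1 the exact regular gradient is (4, -2). The Fisher matrix of a Gaussian in (mu, sigma) coordinates is diag(1/sigma^2, 2/sigma^2), which gives an exact natural gradient of (4, -1); the sampled estimates should land close to these values.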