<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>ERA Collection:</title>
  <link rel="alternate" href="http://hdl.handle.net/1842/3390" />
  <subtitle />
  <id>http://hdl.handle.net/1842/3390</id>
  <updated>2013-05-23T16:54:52Z</updated>
  <dc:date>2013-05-23T16:54:52Z</dc:date>
  <entry>
    <title>The Polyadic pi-Calculus: A Tutorial</title>
    <link rel="alternate" href="http://hdl.handle.net/1842/6050" />
    <author>
      <name>Milner, Robin</name>
    </author>
    <id>http://hdl.handle.net/1842/6050</id>
    <updated>2012-07-06T10:33:31Z</updated>
    <published>1991-01-01T00:00:00Z</published>
    <summary type="text">Title: The Polyadic pi-Calculus: A Tutorial
Authors: Milner, Robin
Abstract: The pi-calculus is a model of concurrent computation based upon the notion of naming. It is first presented in its simplest and original form, with the help of several illustrative applications. Then it is generalized from monadic to polyadic form. Semantics is given in terms of both a reduction system and a version of labelled transitions called commitment; the known algebraic axiomatization of strong bisimilarity is given in the new setting, and so also is a characterization in modal logic. Some theorems about the replication operator are proved.

Justification for the polyadic form is provided by the concepts of sort, sorting and sort discipline which it supports. Several illustrations of different sortings are given. One example is the presentation of data structures as processes which respect a particular sorting; another is the sorting for a known translation of the lambda-calculus into the pi-calculus. For this translation, the equational validity of beta-conversion is proved with the help of replication theorems. The paper ends with an extension of the pi-calculus to ω-order processes, and a brief account of the demonstration by Davide Sangiorgi that higher-order processes may be faithfully encoded at first-order. This extends and strengthens the original result of this kind given by Bent Thomsen for second-order processes.
Description: This report was published in F. L. Bauer, W. Brauer and H. Schwichtenberg, editors, Logic and Algebra of Specification. Springer-Verlag, 1993.</summary>
    <dc:date>1991-01-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Value Function Approximation on Non-Linear Manifolds for Robot Motor Control</title>
    <link rel="alternate" href="http://hdl.handle.net/1842/3714" />
    <author>
      <name>Sugiyama, Masashi</name>
    </author>
    <author>
      <name>Hachiya, Hirotaka</name>
    </author>
    <author>
      <name>Towell, Christopher</name>
    </author>
    <author>
      <name>Vijayakumar, Sethu</name>
    </author>
    <id>http://hdl.handle.net/1842/3714</id>
    <updated>2010-08-31T15:11:26Z</updated>
    <published>2007-04-01T00:00:00Z</published>
    <summary type="text">Title: Value Function Approximation on Non-Linear Manifolds for Robot Motor Control
Authors: Sugiyama, Masashi; Hachiya, Hirotaka; Towell, Christopher; Vijayakumar, Sethu
Abstract: The least squares approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for discontinuity, which typically arises in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by the Markov decision processes. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.</summary>
    <dc:date>2007-04-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Reinforcement Learning for Humanoid Robots - Policy Gradients and Beyond</title>
    <link rel="alternate" href="http://hdl.handle.net/1842/3710" />
    <author>
      <name>Vijayakumar, Sethu</name>
    </author>
    <author>
      <name>Peters, Jan</name>
    </author>
    <author>
      <name>Schaal, Stefan</name>
    </author>
    <id>http://hdl.handle.net/1842/3710</id>
    <updated>2010-08-31T15:09:07Z</updated>
    <published>2004-07-01T00:00:00Z</published>
    <summary type="text">Title: Reinforcement Learning for Humanoid Robots - Policy Gradients and Beyond
Authors: Vijayakumar, Sethu; Peters, Jan; Schaal, Stefan
Abstract: Reinforcement learning offers one of the most general frameworks to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high-dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches to reinforcement learning in terms of their applicability in humanoid robotics. Methods can be coarsely classified into three different categories, i.e., greedy methods, 'vanilla' policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale into the domain of humanoid robotics as they are problematic when used with function approximation. 'Vanilla' policy gradient methods, on the other hand, have been successfully applied on real-world robots including at least one humanoid robot [3]. We demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade [10] is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric under suitable conditions. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and for learning non-linear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensional continuous state-action systems.</summary>
    <dc:date>2004-07-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Reconstructing Null-space Policies Subject to Dynamic Task Constraints in Redundant Manipulators</title>
    <link rel="alternate" href="http://hdl.handle.net/1842/3709" />
    <author>
      <name>Howard, Matthew</name>
    </author>
    <author>
      <name>Vijayakumar, Sethu</name>
    </author>
    <id>http://hdl.handle.net/1842/3709</id>
    <updated>2010-08-31T15:03:06Z</updated>
    <published>2007-01-01T00:00:00Z</published>
    <summary type="text">Title: Reconstructing Null-space Policies Subject to Dynamic Task Constraints in Redundant Manipulators
Authors: Howard, Matthew; Vijayakumar, Sethu
Abstract: We consider the problem of direct policy learning in situations where the policies are only observable through their projections into the null-space of a set of dynamic, non-linear task constraints. We tackle the issue of deriving consistent data for the learning of such policies and make two contributions towards its solution. Firstly, we derive the conditions required to exactly reconstruct null-space policies and suggest a learning strategy based on this derivation. Secondly, we consider the case that the null-space policy is conservative and show that such a policy can be learnt more easily and robustly by learning the underlying potential function and using this as our representation of the policy.</summary>
    <dc:date>2007-01-01T00:00:00Z</dc:date>
  </entry>
</feed>

