
Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/532


Files in This Item:

File           Description         Size       Format
wyatt_ps.ps    PostScript          2.09 MB    Postscript
wyatt_pdf      Adobe PDF format    1.06 MB    Adobe PDF
Title: Exploration and Inference in Learning from Reinforcement
Authors: Wyatt, Jeremy
Supervisor(s): Hayes, Gillian; Hallam, John
Issue Date: Jul-1998
Publisher: University of Edinburgh. College of Science and Engineering. School of Informatics.
Abstract: Recently there has been a good deal of interest in using techniques developed for learning from reinforcement to guide learning in robots. Motivated by the desire to find better robot learning methods, this thesis presents a number of novel extensions to existing techniques for controlling exploration and inference in reinforcement learning. First I distinguish between the well-known exploration-exploitation trade-off and what I term exploration for future exploitation. It is argued that there are many tasks where it is more appropriate to maximise this latter measure. In particular, it is appropriate when we want to employ learning algorithms as part of the process of designing a controller. Informed by this insight, I develop a number of novel measures of the probability of a particular course of action being the optimal course of action. Estimators are developed for this measure for boolean and non-boolean processes. These are used in turn to develop probability matching techniques for guiding the exploration-exploitation trade-off. A proof is presented that one such method will converge in the limit to the optimal policy. Following this I develop an entropic measure of task-knowledge, based on the previous measure.
Description: Institute of Perception, Action and Behaviour
Sponsor(s): Engineering and Physical Sciences Research Council (EPSRC)
URI: http://hdl.handle.net/1842/532
Appears in Collections: Informatics thesis and dissertation collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
