Edinburgh Research Archive >
Informatics, School of >
Informatics thesis and dissertation collection >
Please use this identifier to cite or link to this item:
|Title: ||Hierarchical Reinforcement Learning for Spoken Dialogue Systems|
|Authors: ||Cuayáhuitl, Heriberto|
|Supervisor(s): ||Renals, Steve|
|Issue Date: ||2009|
|Abstract: ||This thesis focuses on the problem of scalable optimization of dialogue behaviour in speech-based conversational systems using reinforcement learning. Most previous investigations in dialogue strategy learning have proposed flat reinforcement learning methods, which are more suitable for small-scale spoken dialogue systems.
This research formulates the problem in terms of Semi-Markov Decision Processes (SMDPs), and proposes two hierarchical reinforcement learning methods to optimize sub-dialogues rather than full dialogues. The first method uses a hierarchy of SMDPs, where every SMDP ignores irrelevant state variables and actions in order to optimize a sub-dialogue. The second method extends the first one by constraining every SMDP in the hierarchy with prior expert knowledge. The latter method proposes a learning algorithm called 'HAM+HSMQ-Learning', which combines two existing algorithms in the literature of hierarchical reinforcement learning. Whilst the first method generates fully-learnt behaviour, the second one generates semi-learnt behaviour. In addition, this research proposes a heuristic dialogue simulation environment for automatic dialogue strategy learning. Experiments were performed on simulated and real environments based on a travel planning spoken dialogue system. Experimental results provided evidence to support the following claims: First, both methods scale well at the cost of near-optimal solutions, resulting in slightly longer dialogues than the optimal solutions. Second, dialogue strategies learnt with coherent user behaviour and conservative recognition error rates can outperform a reasonable hand-coded strategy. Third, semi-learnt dialogue behaviours are a better alternative (because of their higher overall performance) than hand-coded or fully-learnt dialogue behaviours. Last, hierarchical reinforcement learning dialogue agents are feasible and
promising for the (semi) automatic design of adaptive behaviours in larger-scale spoken dialogue systems.
This research makes the following contributions to spoken dialogue systems which learn their dialogue behaviour. First, the Semi-Markov Decision Process (SMDP) model was proposed to learn spoken dialogue strategies in a scalable way. Second, the concept of 'partially specified dialogue strategies' was proposed for integrating simultaneously hand-coded and learnt spoken dialogue behaviours into a single learning framework. Third, an evaluation with real users of hierarchical reinforcement learning dialogue agents was essential to validate their effectiveness in a realistic environment.|
|Description: ||Institute for Communicating and Collaborative Systems|
|Keywords: ||Spoken dialogue systems|
Semi-automatic dialogue strategy design
Prior expert knowledge
Semi-Markov decision processes
Hierarchical reinforcement learning
|Appears in Collections:||Informatics thesis and dissertation collection|
This item is licensed under a Creative Commons License
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.