Generative probabilistic models of goal-directed users in task-oriented dialogs
A longstanding objective of human-computer interaction research is to develop better dialog systems for end users. User modelling research, specifically, aims to provide dialog researchers with models of user behaviour to aid the design and improvement of dialog systems. Commercially deployed dialog systems are often used by a vast number of users, so sub-optimal performance can lead to immediate financial loss for the service provider, and even to user alienation. Thus, there is a strong incentive to make dialog systems as functional as possible, crucially prior to their release to the public. Models of user behaviour fill this gap by simulating the role of human users in the lab, without the losses associated with sub-optimal system performance. User models can also greatly aid design decisions by serving as tools for exploratory analysis of real user behaviour prior to designing dialog software. User modelling is the central problem of this thesis. We focus on a particular kind of dialog, termed task-oriented dialogs (those centred around solving an explicit task), because they represent the frontier of current dialog research and commercial deployment. Users taking part in these dialogs behave according to a set of user goals, which specify what they wish to accomplish from the interaction, and tend to exhibit variability of behaviour given the same set of goals. Our objective is to capture and reproduce (at the semantic utterance level) the range of behaviour that users exhibit while remaining consistent with their goals. We approach the problem as an instance of generative probabilistic modelling, with explicit user goals, in which the model is induced entirely from data. We argue that doing so has numerous practical and theoretical benefits over previous approaches to user modelling, which have either lacked a model of user goals or have not been driven by real dialog data.
A principal problem with user model development thus far has been the difficulty of evaluation. We demonstrate how treating user models as probabilistic models alleviates some of these problems, by allowing us to leverage a range of techniques and insights from machine learning for evaluation. We demonstrate the efficacy of our approach by applying it to two different kinds of task-oriented dialog domains, which exhibit two different sub-problems encountered in real dialog corpora. The first are informational (or slot-filling) domains, specifically those concerning flight and bus route information. In slot-filling domains, user goals take categorical values which allow multiple surface realisations, and are corrupted by speech recognition errors. We address this issue by adopting a topic model representation of user goals, which allows us to capture both synonymy and phonetic confusability in a unified model. We first evaluate our model intrinsically using held-out probability and perplexity, and demonstrate substantial gains over an alternative string-goal representation and over a non-goal-directed model. We then show in an extrinsic evaluation that features derived from our model lead to substantial improvements over a strong baseline in the task of discriminating between real dialogs (consistent dialogs) and dialogs composed of real turns sampled from different dialogs (inconsistent dialogs). We then move on to a spatial navigational domain in which user goals are spatial trajectories across a landscape. The disparity between the representation of spatial routes as raw pixel coordinates and their grounding as semantic utterances creates an interesting challenge compared to conventional slot-filling domains. We derive a feature-based representation of spatial goals which facilitates reasoning and admits generalisation to new routes not encountered at training time.
The probabilistic formulation of our model allows us to capture variability of behaviour given the same underlying goal, a property frequently exhibited by human users in this domain. We first evaluate the model intrinsically using held-out probability and perplexity, and find a substantial reduction in uncertainty brought about by our spatial representation. We further evaluate it extrinsically in a human judgement task and find that our model's behaviour does not differ significantly from the behaviour of real users. We conclude by sketching two novel ideas for future work: the first is to deploy the user models as transition functions for MDP-based dialog managers; the second is to use the models as a means of restricting the search space for optimal policies, by treating optimal behaviour as a subset of the (distributions over) plausible behaviour which we have induced.
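As an illustration of the intrinsic evaluation mentioned above, the following is a minimal sketch of how held-out log probability and perplexity might be computed for a goal-conditioned user model. It is not the thesis's implementation: the function names, the four-act inventory, the toy models and the toy held-out data are all hypothetical, and the models here condition only on the goal, ignoring dialog context.

```python
import math

# Hypothetical inventory of semantic user acts: inform(<goal>), affirm, negate, bye.

def held_out_log_prob(prob_fn, dialogs):
    """Sum log P(act | goal) over every user act in the held-out dialogs.

    prob_fn(act, goal) is assumed to return a probability in (0, 1].
    Returns the total log probability and the number of acts scored.
    """
    total = 0.0
    count = 0
    for goal, acts in dialogs:
        for act in acts:
            total += math.log(prob_fn(act, goal))
            count += 1
    return total, count

def perplexity(total_log_prob, count):
    """Per-act perplexity: exp of the negative mean log probability."""
    return math.exp(-total_log_prob / count)

def uniform_model(act, goal):
    """Non-goal-directed toy model: uniform over the four acts."""
    return 0.25

def goal_aware_model(act, goal):
    """Goal-directed toy model: 0.7 on informing the goal, 0.1 on each other act."""
    return 0.7 if act == "inform(" + goal + ")" else 0.1

# Toy held-out data (assumed): each dialog is a goal plus the user's acts.
held_out = [
    ("boston", ["inform(boston)", "affirm", "inform(boston)"]),
    ("denver", ["inform(denver)", "negate"]),
]

ppl_uniform = perplexity(*held_out_log_prob(uniform_model, held_out))
ppl_goal = perplexity(*held_out_log_prob(goal_aware_model, held_out))
print(ppl_uniform, ppl_goal)
```

In this toy setting the uniform model has a perplexity equal to the size of the act inventory (four), while the goal-aware model scores lower on the same data; a lower held-out perplexity indicates less uncertainty about real user behaviour.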