Files in This Item:
|File||Description||Size||Format|
|Hsueh2009.pdf||PhD thesis||3.16 MB||Adobe PDF|
|Hsueh2009_Supplementary.zip||File not available for download||7.76 MB||Unknown|
|Title: ||Meeting decision detection: multimodal information fusion for multi-party dialogue understanding|
|Authors: ||Hsueh, Pei-Yun|
|Supervisor(s): ||Moore, Johanna D.|
|Issue Date: ||2009|
|Publisher: ||The University of Edinburgh|
|Abstract: ||Modern advances in multimedia and storage technologies have led to huge archives
of human conversations in wide-ranging areas. These archives offer a wealth of information
in organizational contexts. However, retrieving and managing information
in these archives is a time-consuming and labor-intensive task. Previous research has applied
keyword-based and computer vision-based methods to this task. However, spontaneous
conversations, complex in the use of multimodal cues and intricate in the interactions
between multiple speakers, have posed new challenges to these methods. We need
new techniques that can leverage the information hidden in multiple communication
modalities – including not just “what” the speakers say but also “how” they express
themselves and interact with others.
In response to this need, the thesis inquires into the multimodal nature of meeting
dialogues and computational means to retrieve and manage the recorded meeting
information. In particular, this thesis develops the Meeting Decision Detector (MDD)
to detect and track decisions, one of the most important outcomes of meetings.
The MDD involves not only the generation of extractive summaries pertaining to the
decisions (“decision detection”), but also the organization of a continuous stream of
meeting speech into locally coherent segments (“discourse segmentation”).
This inquiry starts with a corpus analysis which constitutes a comprehensive empirical
study of the decision-indicative and segment-signalling cues in the meeting
corpora. These cues are uncovered from a variety of communication modalities, including
the words spoken, gesture and head movements, pitch and energy level, rate
of speech, pauses, and use of subjective terms. While some of the cues match the
previous findings on speech segmentation, others have not been studied before.
The analysis also provides empirical grounding for computing features and integrating
them into a computational model. To handle the high-dimensional multimodal
feature space in the meeting domain, this thesis empirically compares feature discriminability
and feature pattern-finding criteria. As the different knowledge sources are
expected to capture different types of features, the thesis also experiments with methods
that can harness synergy between the multiple knowledge sources.
The problem formalization and the modeling algorithm so far correspond to an
optimal setting: an off-line, post-meeting analysis scenario. However, ultimately the
MDD is expected to operate online – right after a meeting, or while a meeting
is still in progress. Thus this thesis also explores techniques that help relax the optimal
setting, especially those using only features that can be generated with a higher
degree of automation. Empirically motivated experiments are designed to handle the
corresponding performance degradation.
Finally, with the users in mind, this thesis evaluates the use of query-focused summaries
in a decision debriefing task, which is common in organizational contexts. The
decision-focused extracts (which represent compressions of 1%) are compared against
the general-purpose extractive summaries (which represent compressions of 10-40%).
To examine the effect of model automation on the debriefing task, this evaluation experiments
with three versions of decision-focused extracts, each relaxing one manual
annotation constraint. Task performance is measured in terms of actual task effectiveness, user-generated
report quality, and user-perceived success. The users’ clicking behaviors are
also recorded and analyzed to understand how the users leverage the different versions
of extractive summaries to produce abstractive summaries.
The analysis framework and computational means developed in this work are expected
to be useful for the creation of other dialogue understanding applications, especially
those that require uncovering the implicit semantics of meeting dialogues.|
|Sponsor(s): ||Google Anita Borg Memorial Scholarship program|
|Keywords: ||Meeting Decision Detector|
|Appears in Collections:||Informatics thesis and dissertation collection|