|Title: ||Entity Coherence for Descriptive Text Structuring|
|Authors: ||Karamanis, Nikiforos|
|Supervisor(s): ||Mellish, Chris|
|Issue Date: ||Jul-2004|
|Publisher: ||University of Edinburgh. College of Science and Engineering. School of Informatics.|
|Abstract: ||Although entity coherence, i.e. the coherence that arises from certain patterns of references to entities, is of attested importance for characterising a descriptive text structure, whether and how current formal models of entity coherence such as Centering Theory can be used for the purposes of natural language generation remains unclear. This thesis investigates this issue and sets out to explore which of the many formulations of Centering best suits text structuring. In doing this, we assume text structuring to be a search task where different orderings of propositions are evaluated according to scores assigned by a metric.
The main question behind this study is how to choose a metric of entity coherence among many alternatives as the only guidance to the text structuring component of a system that produces descriptions of objects. Different ways of defining metrics of entity coherence using Centering’s notions are discussed and a general corpus-based methodology is introduced to identify which of these metrics constitute the most promising candidates for search-based text structuring before the actual generation of the descriptive structure takes place.
The performance of a large set of metrics is estimated empirically in a series of computational experiments using two kinds of data: (i) a reliably annotated corpus representing the genre of interest and (ii) data derived from an existing natural language generation system and ordered according to the instructions of a domain expert.
A final experiment supplements our main methodology by automatically evaluating the best scoring orderings of some of the best performing metrics in comparison to an upper bound defined by orderings produced by multiple experts on additional application-specific data and a lower bound defined by a random baseline.
The main findings are summarised as follows: In general, the simplest metric of entity coherence constitutes a very robust baseline for both datasets. However, when the metrics are modified according to an additional constraint on entity coherence, then the baseline is beaten in domain (ii). The employed modification is supported by the subsidiary evaluation which renders all employed metrics superior to the random baseline and helps identify the metric which overall constitutes the most suitable candidate (among the ones investigated) for search-based descriptive text structuring in
This thesis provides substantial insight into the role of entity coherence as a descriptive text structuring constraint. Viewing Centering from an NLG perspective raises a series of interesting challenges that the thesis identifies and attempts to investigate to a certain extent. The general evaluation methodology and the results of the empirical studies are useful for any subsequent attempt to generate a descriptive text structure in the context of an application that makes use of the notion of entity coherence as modelled by Centering.|
|Description: ||Institute for Communicating and Collaborative Systems|
|Sponsor(s): ||Greek Scholarships Foundation (IKY)|
|Appears in Collections:||Informatics thesis and dissertation collection|
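The abstract frames text structuring as a search over orderings of propositions, each scored by a metric of entity coherence. The sketch below illustrates that framing only. Everything in it is an assumption made for illustration: propositions are reduced to sets of entity names, the hypothetical `count_breaks` metric simply counts adjacent pairs that share no entity, and the search is exhaustive. None of these are claimed to be the metrics or the search procedure used in the thesis.

```python
from itertools import permutations


def count_breaks(ordering):
    """Illustrative metric (an assumption, not the thesis's definition):
    number of adjacent propositions that share no entity. Lower is better."""
    return sum(1 for a, b in zip(ordering, ordering[1:]) if not (a & b))


def best_orderings(propositions, metric=count_breaks):
    """Score every ordering with the metric and return the lowest score
    together with all orderings that achieve it."""
    best_score, best = None, []
    for perm in permutations(propositions):
        score = metric(perm)
        if best_score is None or score < best_score:
            best_score, best = score, [perm]
        elif score == best_score:
            best.append(perm)
    return best_score, best


# Hypothetical propositions, each reduced to the set of entities it mentions.
props = [
    frozenset({"amphora", "painter"}),
    frozenset({"museum"}),
    frozenset({"amphora", "museum"}),
]

score, orderings = best_orderings(props)
print(score, len(orderings))
```

Exhaustive enumeration grows factorially with the number of propositions, so it is only practical for very small inputs; any other metric with the same signature can be passed in through the `metric` parameter.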