Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/2052
| Title: | Extracting information from fiction |
| Authors: | Givon, Sharon |
| Supervisor(s): | Milosavljevic, Maria; Lapata, Mirella |
| Issue Date: | 2006 |
| Abstract: | Information Extraction (IE) based techniques have great potential to enable
companies to leverage valuable information embedded in unstructured
textual data. Such data could be exploited to help drive sales and to enhance
the customer's experience when searching or browsing for products.
Extensive research has been performed in the field of IE; however, to date
no work has been directly applied to the domain of fiction. The aim of this
study is to explore the ability of IE techniques to extract the central
characters and their relationships from the full textual content of works of
fiction. To begin our investigation, we present a collection of hypotheses
outlining our expectations in approaching and resolving these problems. We
then outline our data collection process, which resulted in the creation of a
Gold Standard containing ordered lists of characters and their relationships
for eight classic book texts. For the task of character extraction, we test two
rule-based co-reference resolution models, and two ordering techniques.
Our best model achieves an average of 100% coverage on the three most
important characters and 78.4% across all central characters, compared to
baselines of 73.3% and 57.4%, respectively. For the task of relation
extraction, we implement rule-based systems to detect the presence and
types of relationships between characters. We achieve 73.3% coverage in
detecting the top three pairs of characters involved in relationships. The
results for inferring relationship types are preliminary. We provide an
analysis of relationship mentions in works of fiction and propose a number
of approaches for future work. |
| Keywords: | Information extraction; linguistics |
| URI: | http://hdl.handle.net/1842/2052 |
| Appears in Collections: | Linguistics and English Language Masters thesis collection |
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.