Edinburgh Research Archive, The University of Edinburgh

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/2052

Files in This Item:

File: Sharon Givon.pdf (862.54 kB, Adobe PDF)
Title: Extracting information from fiction
Authors: Givon, Sharon
Supervisor(s): Milosavljevic, Maria; Lapata, Mirella
Issue Date: 2006
Abstract: Information Extraction (IE) based techniques have great potential to enable companies to leverage valuable information embedded in unstructured textual data. Such data could be exploited to help drive sales and to enhance the customer's experience when searching or browsing for products. Extensive research has been performed in the field of IE; however, to date no work has been directly applied to the domain of fiction. The aim of this study is to explore the ability of IE techniques to extract the central characters and their relationships from the full textual content of works of fiction. To begin our investigation, we present a collection of hypotheses outlining our expectations in approaching and resolving these problems. We then outline our data collection process, which resulted in the creation of a Gold Standard containing ordered lists of characters and their relationships for eight classic book texts. For the task of character extraction, we test two rule-based co-reference resolution models and two ordering techniques. Our best model achieves an average of 100% coverage on the three most important characters and 78.4% across all central characters, compared to baselines of 73.3% and 57.4%, respectively. For the task of relation extraction, we implement rule-based systems to detect the presence and types of relationships between characters. We achieve 73.3% coverage in detecting the top three pairs of characters involved in relationships. The results for inferring relationship types are preliminary. We provide an analysis of relationship mentions in works of fiction and propose a number of approaches for future work.
Keywords: Information extraction; linguistics
URI: http://hdl.handle.net/1842/2052
Appears in Collections: Linguistics and English Language Masters thesis collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
