Edinburgh Research Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/2384


Files in This Item:

Clarke J thesis 08.zip (Zip file, 1.3 MB): Original files are restricted access
Clarke J thesis 08.pdf (Adobe PDF, 1.23 MB): Open Access version
Title: Global Inference for Sentence Compression: An Integer Linear Programming Approach
Authors: Clarke, James
Supervisor(s): Lapata, Mirella
Renals, Steve
Issue Date: 2008
Abstract: In this thesis we develop models for sentence compression. This text rewriting task has recently attracted a lot of attention due to its relevance for applications (e.g., summarisation) and its simple formulation by means of word deletion. Previous models for sentence compression have been inherently local and thus fail to capture the long range dependencies and complex interactions involved in text rewriting. We present a solution by framing the task as an optimisation problem with local and global constraints and recast existing compression models into this framework. Using the constraints we instil syntactic, semantic and discourse knowledge that the models otherwise fail to capture. We show that the addition of constraints allows relatively simple local models to reach state-of-the-art performance for sentence compression.

The thesis provides a detailed study of sentence compression and its models. The differences between automatically and manually created compression corpora are assessed, along with how compression varies across written and spoken text. We also discuss various techniques for automatically and manually evaluating compression output against a gold standard. Models are reviewed based on their assumptions, training requirements, and scalability.

We introduce a general method for extending previous approaches to allow for more global models. This is achieved through the optimisation framework of Integer Linear Programming (ILP). We reformulate three compression models (an unsupervised model, a semi-supervised model and a fully supervised model) as ILP problems and augment them with constraints. These constraints are intuitive for the compression task and are both syntactically and semantically motivated. We demonstrate how they improve compression quality and reduce the requirements on training material.

Finally, we delve into document compression, where the task is to compress every sentence of a document and use the resulting summary as a replacement for the original document. For document-based compression we investigate discourse information and its application to the compression task. Two discourse theories, Centering and lexical chains, are used to automatically annotate documents. These annotations are then used in our compression framework to impose additional constraints on the resulting document. The goal is to preserve the discourse structure of the original document and most of its content. We show how a discourse informed compression model can outperform a discourse agnostic state-of-the-art model using a question answering evaluation paradigm.
Description: Institute for Communicating and Collaborative Systems
Keywords: Informatics
Computer Science
URI: http://hdl.handle.net/1842/2384
Appears in Collections:Informatics thesis and dissertation collection
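To make the ILP framing described in the abstract concrete, the following is a minimal, illustrative sketch (not the thesis's actual model): sentence compression is cast as word deletion by introducing one binary variable per word, maximising a toy relevance score under a global length constraint and one hand-written syntactic constraint. The PuLP solver, the example sentence, the scores and the specific constraints are all assumptions made for illustration only.

    # Illustrative sketch of compression-as-ILP (assumed setup, not the thesis's model).
    import pulp

    sentence = ["the", "committee", "will", "meet", "again", "next", "week"]
    # Toy per-word relevance scores standing in for a trained compression model.
    score = {0: 0.1, 1: 0.9, 2: 0.4, 3: 0.9, 4: 0.2, 5: 0.6, 6: 0.7}

    prob = pulp.LpProblem("sentence_compression", pulp.LpMaximize)
    # y[i] = 1 if word i is kept in the compression, 0 if it is deleted.
    y = {i: pulp.LpVariable(f"y_{i}", cat="Binary") for i in range(len(sentence))}

    # Objective: total score of the retained words.
    prob += pulp.lpSum(score[i] * y[i] for i in y)

    # Global constraint: the compression may keep at most 5 words.
    prob += pulp.lpSum(y.values()) <= 5

    # Example syntactic constraint (hypothetical): if the verb "meet" is kept,
    # its subject "committee" must also be kept.
    prob += y[3] <= y[1]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    compressed = [w for i, w in enumerate(sentence) if y[i].value() == 1]
    print(" ".join(compressed))

In the thesis the objective comes from an unsupervised, semi-supervised or fully supervised compression model, and the constraints encode syntactic, semantic and (for document compression) discourse knowledge rather than the single hard-coded rule shown here.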

