dc.contributor.advisor: Lemon, Oliver
dc.contributor.advisor: Lapata, Maria
dc.contributor.advisor: Sutton, Charles
dc.contributor.author: Konstas, Ioannis
dc.date.accessioned: 2014-06-03T14:45:26Z
dc.date.available: 2014-06-03T14:45:26Z
dc.date.issued: 2014-06-27
dc.identifier.uri: http://hdl.handle.net/1842/8926
dc.description.abstract: Much of the data found on the world wide web is in numeric, tabular, or other non-textual format (e.g., weather forecast tables, stock market charts, live sensor feeds), and thus inaccessible to non-experts or laypersons. However, most conventional search engines and natural language processing tools (e.g., summarisers) can only handle textual input. As a result, data in non-textual form remains largely inaccessible. Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input, and holds promise for rendering non-linguistic data widely accessible. Several successful generation systems have been produced in the past twenty years. They mostly rely on human-crafted rules or expert-driven grammars, implement a pipeline architecture, and usually operate in a single domain. In this thesis, we present several novel statistical models that take as input a set of database records and generate a description of them in natural language text. Our unique idea is to combine the processes of structuring a document (document planning), deciding what to say (content selection) and choosing the specific words and syntactic constructs specifying how to say it (lexicalisation and surface realisation), in a uniform joint manner. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). This joint representation allows individual processes (i.e., document planning, content selection, and surface realisation) to communicate and influence each other naturally.
We recast generation as the task of finding the best derivation tree for a set of input database records and our grammar, and describe several algorithms for decoding in this framework that allow us to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. We implement our generators using the hypergraph framework. Contrary to traditional systems, we learn all the necessary document, structural and linguistic knowledge from unannotated data. Additionally, we explore a discriminative reranking approach on the hypergraph representation of our model, by including more refined content selection features. Central to our approach is the idea of porting our models to various domains; we experimented on four widely different domains, namely sportscasting, weather forecast generation, booking flights, and troubleshooting guides. The performance of our systems is competitive with, and often superior to, that of state-of-the-art systems that use domain-specific constraints, explicit feature engineering or labelled data. [en_US]
dc.contributor.sponsor: Engineering and Physical Sciences Research Council (EPSRC) [en_US]
dc.language.iso: en [en_US]
dc.publisher: The University of Edinburgh [en_US]
dc.relation.hasversion: Konstas, I. and Lapata, M. (2012). Concept-to-text generation via discriminative reranking. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 369–378, Jeju Island, Korea. [en_US]
dc.relation.hasversion: Konstas, I. and Lapata, M. (2012). Unsupervised concept-to-text generation with hypergraphs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 752–761, Montréal, Canada. [en_US]
dc.relation.hasversion: Konstas, I. and Lapata, M. (2013). A global model for concept-to-text generation. Journal of Artificial Intelligence Research, 48:305–346. [en_US]
dc.relation.hasversion: Konstas, I. and Lapata, M. (2013). Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1503–1514, Seattle, Washington, USA. [en_US]
dc.subject: natural language generation [en_US]
dc.subject: natural language processing [en_US]
dc.title: Joint models for concept-to-text generation [en_US]
dc.type: Thesis or Dissertation [en_US]
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]

