Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/6254
Files in This Item:
| File | Description | Size | Format |
| Lang2012.pdf | one year restriction | 1.36 MB | Adobe PDF |
| Title: | Unsupervised induction of semantic roles |
| Authors: | Lang, Joel |
| Supervisor(s): | Lapata, Mirella; Sutton, Charles |
| Issue Date: | 25-Jun-2012 |
| Publisher: | The University of Edinburgh |
| Abstract: | In recent years, a considerable amount of work has been devoted to the task of automatic
frame-semantic analysis. Given the relative maturity of syntactic parsing technology,
which is an important prerequisite, frame-semantic analysis represents a realistic
next step towards broad-coverage natural language understanding and has been
shown to benefit a range of natural language processing applications such as information
extraction and question answering.
Due to the complexity which arises from variations in syntactic realization, data-driven
models based on supervised learning have become the method of choice for this task.
However, the reliance on large amounts of semantically labeled data, which is costly
to produce for every language, genre and domain, presents a major barrier to the
widespread application of the supervised approach.
This thesis therefore develops unsupervised machine learning methods that automatically
induce frame-semantic representations without making use of semantically
labeled data. If successful, unsupervised methods would render manual data annotation
unnecessary and therefore greatly broaden the applicability of automatic frame-semantic
analysis.
We focus on the problem of semantic role induction, in which all the argument instances
occurring together with a specific predicate in a corpus are grouped into clusters
according to their semantic role. Our hypothesis is that semantic roles can be induced
without human supervision from a corpus of syntactically parsed sentences, by
leveraging the syntactic relations conveyed through parse trees with lexical-semantic
information.
We argue that semantic role induction can be guided by three linguistic principles. The
first is the well-known constraint that semantic roles are unique within a particular
frame. The second is that the arguments occurring in a specific syntactic position
within a specific linking all bear the same semantic role. The third principle is that
the (asymptotic) distribution over argument heads is the same for two clusters which
represent the same semantic role.
We consider two approaches to semantic role induction based on two fundamentally
different perspectives on the problem. Firstly, we develop feature-based probabilistic
latent structure models which capture the statistical relationships that hold between the
semantic role and other features of an argument instance. Secondly, we conceptualize
role induction as the problem of partitioning a graph whose vertices represent argument
instances and whose edges express similarities between these instances. The graph
thus represents all the argument instances for a particular predicate occurring in the
corpus. The similarities with respect to different features are represented on different
edge layers and accordingly we develop algorithms for partitioning such multi-layer
graphs.
We empirically validate our models and the principles they are based on and show that
our graph partitioning models have several advantages over the feature-based models.
In a series of experiments on both English and German, the graph partitioning models
outperform the feature-based models and yield significantly better scores than a strong
baseline which directly identifies semantic roles with syntactic positions.
In sum, we demonstrate that relatively high-quality shallow semantic representations
can be induced without human supervision and foreground a promising direction of
future research aimed at overcoming the problem of acquiring large amounts of lexical-semantic
knowledge. |
| Sponsor(s): | Engineering and Physical Sciences Research Council (EPSRC) |
| Keywords: | automatic frame-semantic analysis; natural language; unsupervised machine learning methods; semantic role induction; semantic roles |
| URI: | http://hdl.handle.net/1842/6254 |
| Appears in Collections: | Informatics thesis and dissertation collection |
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.