Composition in distributional models of semantics
Mitchell, Jeffrey John
Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhière and Lemaire, 2004; Griffiths et al., 2007) and have been shown to achieve human-level performance on synonymy tests (Landauer and Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as a Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus extraction (Grefenstette, 1994). However, while there has been a considerable amount of research directed at the most effective ways of constructing representations for individual words, the representation of larger constructions, e.g., phrases and sentences, has received relatively little attention. In this thesis we examine this issue of how to compose meanings within distributional models of semantics to form representations of multi-word structures. Natural language data typically consists of such complex structures, rather than just individual isolated words. Thus, a model of composition, in which individual word meanings are combined into phrases and phrases combine to form sentences, is of central importance in modelling this data. Commonly, however, distributional representations are combined in terms of addition (Landauer and Dumais, 1997; Foltz et al., 1998), without any empirical evaluation of alternative choices. Constructing effective distributional representations of phrases and sentences requires that we have both a theoretical foundation to direct the development of models of composition and also a means of empirically evaluating those models. The approach we take is to first consider the general properties of semantic composition and from that basis define a comprehensive framework in which to consider the composition of distributional representations.
The framework subsumes existing proposals, such as addition and tensor products, but also allows us to define novel composition functions. We then show that the effectiveness of these models can be evaluated on three empirical tasks. The first of these tasks involves modelling similarity judgements for short phrases gathered in human experiments. Distributional representations of individual words are commonly evaluated on tasks based on their ability to model semantic similarity relations, e.g., synonymy or priming. Thus, it seems appropriate to evaluate phrase representations in a similar manner. We then apply compositional models to language modelling, demonstrating that the issue of composition has practical consequences, and also providing an evaluation based on large amounts of natural data. In our third task, we use these language models in an analysis of reading times from an eye-movement study. This allows us to investigate the relationship between the composition of distributional representations and the processes involved in comprehending phrases and sentences. We find that these tasks do indeed allow us to evaluate and differentiate the proposed composition functions and that the results show a reasonable consistency across tasks. In particular, a simple multiplicative model is best for a semantic space based on word co-occurrence, whereas an additive model is better for the topic-based model we consider. More generally, employing compositional models to construct representations of multi-word structures typically yields improvements in performance over non-compositional models, which only represent individual words.
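
To make the composition functions named above concrete, the following is a minimal sketch in Python of additive, multiplicative and tensor-product composition of two word vectors, together with a cosine similarity measure of the kind often used to compare phrase representations. It is an illustration only, not the implementation used in the thesis: the function names, the reading of the "simple multiplicative model" as component-wise multiplication, and the toy vectors are all assumptions introduced here.

```python
import math

# Hypothetical toy co-occurrence counts, invented purely for illustration.
VECTORS = {
    "practical": [0.0, 6.0, 2.0, 10.0, 4.0],
    "difficulty": [1.0, 8.0, 4.0, 4.0, 0.0],
    "problem": [2.0, 15.0, 7.0, 9.0, 1.0],
}


def additive(u, v):
    """Compose two vectors by component-wise addition."""
    return [a + b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Compose two vectors by component-wise multiplication (assumed reading)."""
    return [a * b for a, b in zip(u, v)]


def tensor_product(u, v):
    """Compose two vectors by their outer product, flattened row by row.

    The result has len(u) * len(v) components, so it lives in a
    larger space than the input vectors.
    """
    return [a * b for a in u for b in v]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    u, v = VECTORS["practical"], VECTORS["difficulty"]
    target = VECTORS["problem"]
    # Compare each same-dimensional composed phrase vector with a single word vector.
    for name, compose in (("additive", additive), ("multiplicative", multiplicative)):
        print(name, round(cosine(compose(u, v), target), 3))
```

Note that addition and component-wise multiplication keep the phrase vector in the same space as the word vectors, so phrases and words can be compared directly, whereas the tensor product grows in dimensionality with each composition step.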