Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/2066
| Title: | Exploiting linguistically-enriched models of phrase-based statistical machine translation |
| Authors: | Guthmann, Noemie |
| Supervisor(s): | Koehn, Philipp; Lapata, Mirella |
| Issue Date: | 2006 |
| Abstract: | This thesis presents the design and implementation of linguistically informed models for
statistical phrase-based machine translation. Using Koehn’s Pharaoh (2004), a state-of-the-art
SMT system, and Moses (Hoang, 2006), a variant of Pharaoh that supports factored
translation models, we investigated two approaches: Combined Feature Models and
Factored Models. Combined Feature Models enrich their models with concatenations of linguistic
features, whereas Factored Models represent each token as a vector of factors, which makes it
possible to build relatively independent models for each factor. In the context of machine
translation, both approaches were expected to enrich the existing surface-word model with
additional linguistic information.
The research focused on improving output translation quality for English-to-French and
French-to-English translation from several standpoints. Better overall readability and
understandability of a generated document were to be achieved mainly by ensuring the text's
fluency in the target language (syntactic correctness), its adequacy (use of appropriate
terminology) and its fidelity (semantic adequacy). These goals were addressed by first analysing
Pharaoh’s current performance and identifying the language-specific and model-related problems
it encountered. Several experiments were then performed using our two approaches, and their
results were compared.
Although improvements were noted on some of the linguistic issues discussed, notably the
translation of fixed expressions and part-of-speech ambiguity, major problems involving complex
syntactic structures in the source language remained a hard challenge for the approach of
linguistically augmenting phrase-based statistical machine translation. |
| Keywords: | machine translation; linguistics |
| URI: | http://hdl.handle.net/1842/2066 |
| Appears in Collections: | Linguistics and English Language Masters thesis collection |
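The abstract's distinction between Combined Feature Models and Factored Models is easiest to see in how a single token is represented. The following is a minimal sketch, not the thesis's implementation: the names `FactoredToken`, `combined_token`, `factored_token` and `factor_streams`, the choice of factors (surface form, lemma, part-of-speech tag), the Penn-style tags and the `_` separator are all illustrative assumptions. The pipe-separated `word|factor|factor` notation follows the convention Moses uses for factored input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoredToken:
    # Hypothetical factor set: surface form, lemma and part-of-speech tag.
    surface: str
    lemma: str
    pos: str


def combined_token(tok: FactoredToken, sep: str = "_") -> str:
    """Combined-feature style: fuse word and feature into one opaque symbol.

    "houses_NNS" and "houses_VBZ" become unrelated vocabulary items, so the
    model cannot share statistics between them.
    """
    return f"{tok.surface}{sep}{tok.pos}"


def factored_token(tok: FactoredToken, sep: str = "|") -> str:
    """Factored style: keep each factor separable (Moses-like word|factor notation)."""
    return sep.join((tok.surface, tok.lemma, tok.pos))


def factor_streams(tokens: list[FactoredToken]) -> dict[str, list[str]]:
    """Split a factored sentence into one sequence per factor.

    Each sequence could then feed its own, relatively independent model
    (e.g. a separate language model over POS tags).
    """
    return {
        "surface": [t.surface for t in tokens],
        "lemma": [t.lemma for t in tokens],
        "pos": [t.pos for t in tokens],
    }


if __name__ == "__main__":
    sentence = [FactoredToken("the", "the", "DT"), FactoredToken("houses", "house", "NNS")]
    print(" ".join(combined_token(t) for t in sentence))  # the_DT houses_NNS
    print(" ".join(factored_token(t) for t in sentence))  # the|the|DT houses|house|NNS
    print(factor_streams(sentence)["pos"])                # ['DT', 'NNS']
```

In the combined representation the linguistic feature enlarges and fragments the vocabulary, whereas the factored representation keeps each factor addressable on its own, which is what allows separate models per factor.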
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.