Show simple item record

dc.contributor.advisor: Koehn, Philipp
dc.contributor.advisor: Lapata, Mirella
dc.contributor.author: Guthmann, Noemie
dc.date.accessioned: 2007-10-31T10:38:26Z
dc.date.available: 2007-10-31T10:38:26Z
dc.date.issued: 2006
dc.identifier.uri: http://hdl.handle.net/1842/2066
dc.description.abstract: This thesis presents the design and implementation of linguistically informed models for statistical phrase-based machine translation. Using Koehn’s Pharaoh (2004), a state-of-the-art SMT system, and Moses (Hoang, 2006), a variant of Pharaoh that supports factored translation models, we investigated two approaches: Combined Feature Models and Factored Models. Combined Feature Models enrich their models with concatenations of linguistic features, whereas Factored Models view a token as a vector of factors, which makes it possible to build relatively independent models for each factor. In the context of machine translation, both models were expected to enrich the existing surface word model with additional linguistic information. The research focused on finding ways to improve output translation quality for English-to-French and French-to-English translation from various standpoints. Better general readability and understandability of a generated document should be achieved mainly by ensuring the text’s fluency in the target language (syntactic correctness), its adequacy (use of adequate terminology) and its fidelity (semantic adequacy). These goals were addressed by first analysing Pharaoh’s current performance and understanding the language-specific and model-related problems encountered. Several experiments were then performed using our two approaches, and their results were compared. Despite a few noted improvements on some of the linguistic issues discussed, notably fixed-expression translation and part-of-speech ambiguity, major problems involving complex syntactic structures in the source language still posed a hard challenge to the approach of linguistically augmenting phrase-based statistical machine translation. [en]
dc.format.extent: 486136 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en [en]
dc.subject: machine translation [en]
dc.subject: linguistics [en]
dc.title: Exploiting linguistically-enriched models of phrase-based statistical machine translation [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Masters [en]
dc.type.qualificationname: MSc Master of Science [en]
dcterms.accessRights: Restricted Access [en_US]


