Edinburgh Research Archive, The University of Edinburgh

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/5317

Files in This Item:

File: diss.pdf
Description: MSc Thesis Machine Translation for Twitter
Size: 392.91 kB
Format: Adobe PDF

Title: Machine Translation for Twitter
Authors: Jehl, Laura Elisabeth
Supervisor(s): Osborne, Miles
Issue Date: 24-Nov-2010
Publisher: The University of Edinburgh
Abstract: We carried out a study in which we explored the feasibility of machine translation for Twitter for the language pair English and German. As a first step we created a small bilingual corpus of 1,000 tweets. Using this corpus we carried out an analysis of the linguistic features of tweets. We tested different strategies of domain adaptation and found that they improved translation performance. In our experiments we found large differences in performance due to the handling of unknown words. By using XML markup we were able to reduce this difference. We also replaced special Twitter expressions with placeholders, which enabled us to learn more robust n-gram statistics from Twitter data. We carried out a small-scale human evaluation to balance our automatic scores. Finally, we tested strategies to enforce translation output of legal length. Generating n-best lists of translation candidates and searching for legal tweets was found to be helpful, but ultimately too unreliable because there was no systematic way to determine the required value of n. We suggested a feature function based on character count as a potential solution.
Keywords: machine translation
URI: http://hdl.handle.net/1842/5317
Appears in Collections: Linguistics and English Language Masters thesis collection
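
Two of the steps named in the abstract, replacing special Twitter expressions with placeholders and searching an n-best list for a translation of legal length, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The placeholder tokens, the regular expressions, the function names and the 140-character limit (Twitter's limit when the thesis was written) are all assumptions made for illustration, since this record does not specify them.

```python
import re

# Assumed limit: tweets were capped at 140 characters in 2010.
MAX_TWEET_CHARS = 140

# Hypothetical placeholder tokens and patterns; the record does not list
# which Twitter expressions the thesis actually replaced.
PLACEHOLDER_PATTERNS = [
    (re.compile(r"https?://\S+"), "URL_TOKEN"),
    (re.compile(r"@\w+"), "USER_TOKEN"),
    (re.compile(r"#\w+"), "HASHTAG_TOKEN"),
]


def replace_twitter_expressions(text):
    """Swap URLs, @mentions and #hashtags for fixed placeholder tokens."""
    for pattern, token in PLACEHOLDER_PATTERNS:
        text = pattern.sub(token, text)
    return text


def first_legal_candidate(nbest, max_chars=MAX_TWEET_CHARS):
    """Return the best-ranked candidate that fits the limit, or None.

    `nbest` is assumed to be ordered best-first by model score.
    """
    for candidate in nbest:
        if len(candidate) <= max_chars:
            return candidate
    return None
```

When no candidate in the list fits, `first_legal_candidate` returns `None`. This mirrors the limitation the abstract reports: a fixed n does not guarantee that the list contains a legal tweet.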

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
