|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/5317
|
Files in This Item:
| File | Description | Size | Format |
| diss.pdf | MSc Thesis Machine Translation for Twitter | 392.91 kB | Adobe PDF |
|
| Title: | Machine Translation for Twitter |
| Authors: | Jehl, Laura Elisabeth |
| Supervisor(s): | Osborne, Miles |
| Issue Date: | 24-Nov-2010 |
| Publisher: | The University of Edinburgh |
| Abstract: | We carried out a study in which we explored the feasibility of machine translation for Twitter
for the language pair English and German. As a first step we created a small bilingual corpus
of 1,000 tweets. Using this corpus we carried out an analysis of the linguistic features of
tweets. We tested different strategies of domain adaptation and found that they improved
translation performance. In our experiments we found large differences in performance due to
the handling of unknown words. By using XML markup we were able to reduce this difference.
We also replaced special Twitter expressions with placeholders, which enabled us to learn
more robust n-gram statistics from Twitter data. We carried out a small-scale human evaluation
to balance our automatic scores. Finally, we tested strategies to enforce translation output of
legal length. Generating n-best lists of translation candidates and searching for legal tweets
was found to be helpful, but ultimately too unreliable because there was no systematic way to
determine the required value of n. We suggested a feature function based on character count
as a potential solution. |
| Keywords: | machine translation |
| URI: | http://hdl.handle.net/1842/5317 |
| Appears in Collections: | Linguistics and English Language Masters thesis collection
|
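Two of the steps named in the abstract, replacing special Twitter expressions with placeholders and searching an n-best list for a translation of legal length, can be illustrated with a minimal Python sketch. It is not taken from the thesis. The placeholder names (`__URL__`, `__USER__`, `__HASHTAG__`), the regular expressions, the function names, and the 140-character limit (Twitter's limit at the time of the thesis) are all assumptions made for illustration.

```python
import re

# Assumed limit: tweets were capped at 140 characters in 2010.
MAX_TWEET_CHARS = 140

# Hypothetical patterns for "special Twitter expressions"; the abstract
# does not say which expressions or placeholder tokens were actually used.
TOKEN_RE = re.compile(r"(?P<URL>https?://\S+)|(?P<USER>@\w+)|(?P<HASHTAG>#\w+)")


def mask_twitter_tokens(text):
    """Replace URLs, @mentions and #hashtags with placeholder tokens.

    Returns the masked text and the original expressions in order of
    appearance, so they could be restored after translation.
    """
    originals = []

    def _substitute(match):
        originals.append(match.group(0))
        # lastgroup is the name of the alternative that matched.
        return f"__{match.lastgroup}__"

    return TOKEN_RE.sub(_substitute, text), originals


def first_legal_candidate(nbest, limit=MAX_TWEET_CHARS):
    """Return (rank, candidate) for the best-ranked candidate that fits.

    `nbest` is assumed to be ordered from best to worst model score.
    Returns None when no candidate in the list is short enough.
    """
    for rank, candidate in enumerate(nbest):
        if len(candidate) <= limit:
            return rank, candidate
    return None
```

When `first_legal_candidate` returns `None`, the list was too short to contain a legal tweet. That is the weakness the abstract describes: there is no systematic way to know in advance how large n must be.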
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.
|