
Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/2070

Files in This Item:

Emina Kurtic.pdf (815.7 kB, Adobe PDF)
Title: Polyglot voice design for unit selection speech synthesis
Authors: Kurtic, Emina
Supervisor(s): Richmond, Korin
Clark, Robert A J
Issue Date: 2004
Abstract: Current text-to-speech (TTS) systems are increasingly faced with mixed-language textual input. Most TTS systems are designed to allow building synthetic voices for different languages, but each voice is able to "speak" only one language at a time. To synthesize mixed-language input, polyglot voices are needed that can switch between languages when the textual input requires it. A polyglot voice will typically have one basic language and, additionally, the ability to synthesize foreign words when these are encountered in the textual input. The design of polyglot voices for unit selection speech synthesis is still a research question. An inherent problem of unit selection speech synthesis is that the synthesis quality is closely tied to the contents of the unit database. Concatenating units that are not in the database usually results in poor synthesis quality. At the same time, building a database with good unit coverage results in a prohibitively large database if the intended domain of the synthesized text is unlimited. Polyglot databases have the additional problem that not only single-language units have to be stored in the database, but the concatenation points with words from foreign languages also have to be accounted for. This increases the database size even further, so it is worth exploring whether the database size can be reduced by including only single-language units in the database and handling multilingual units at synthesis time. The present work is concerned with database design for a polyglot unit selection voice. Its main aim is to examine whether alternative methods for handling multilingual cross-word diphones result in the same or better synthesis quality than including these diphones in the database. Three alternative approaches are suggested, and model polyglot voices are built to test these methods. The languages included in the synthesizer are Bosnian, English and German. The output quality of the synthesized multilingual word boundary is tested on Bosnian-English and Bosnian-German word pairs in a perceptual experiment.
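The database-size argument in the abstract can be illustrated with a rough count. The sketch below is not taken from the thesis: the phone sets are hypothetical toy inventories, the function names are invented for illustration, and it assumes that any phone may end or begin a word, so the figures are upper bounds. It enumerates the cross-word diphones a polyglot database would need to cover if multilingual boundaries were stored explicitly.

```python
from itertools import product

# Hypothetical toy phone sets for illustration only; these are NOT the
# inventories used in the thesis.
PHONES = {
    "bosnian": {"a", "e", "i", "o", "u", "k", "t", "s"},
    "english": {"ae", "ih", "th", "k", "t", "s", "w"},
    "german": {"a", "y", "oe", "x", "k", "t", "s"},
}


def boundary_diphones(left_lang, right_lang):
    """Language-tagged diphones spanning a boundary between a word in
    left_lang and a following word in right_lang.

    Assumes every phone can occur word-finally and word-initially,
    which gives an upper bound rather than a realistic count.
    """
    return {
        ((left_lang, left), (right_lang, right))
        for left, right in product(PHONES[left_lang], PHONES[right_lang])
    }


def polyglot_boundary_inventory(base, foreign_langs):
    """Cross-word diphones between the base language and each foreign
    language, in both directions (base->foreign and foreign->base)."""
    units = set()
    for lang in foreign_langs:
        units |= boundary_diphones(base, lang)
        units |= boundary_diphones(lang, base)
    return units


monolingual = boundary_diphones("bosnian", "bosnian")
multilingual = polyglot_boundary_inventory("bosnian", ["english", "german"])
print(len(monolingual), len(multilingual))
```

Even with these small toy inventories, the multilingual boundary units outnumber the monolingual Bosnian ones several times over, which is the growth that motivates handling such units at synthesis time instead of storing them.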
Keywords: speech synthesis
polyglot voice design
linguistics
URI: http://hdl.handle.net/1842/2070
Appears in Collections:Linguistics and English Language Masters thesis collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
