|Title: ||Polyglot voice design for unit selection speech synthesis|
|Authors: ||Kurtic, Emina|
|Supervisor(s): ||Richmond, Korin|
Clark, Robert A J
|Issue Date: ||2004|
|Abstract: ||Current text-to-speech (TTS) systems are increasingly faced with mixed language textual
input. Most TTS systems are designed to allow building synthetic voices for different
languages, but each voice is able to “speak” only one language at a time. In
order to synthesize mixed language input, polyglot voices are needed which are able to
switch between languages when it is required by textual input. A polyglot voice will
typically have one basic language and additionally the ability to synthesize foreign
words when these are encountered in the textual input.
The design of polyglot voices for unit selection speech synthesis is still an open research question.
An inherent problem of unit selection speech synthesis is that the synthesis quality
is closely related to the contents of the unit database. Concatenation of units not
in the database usually results in bad synthesis quality. At the same time, building
the database with good coverage of units results in a prohibitively large database if
the intended domain of synthesized text is unlimited. Polyglot databases have the additional
problem that not only must single-language units be stored in the database, but
the concatenation points of words from foreign languages must also be accounted
for. This increases the database size even further, so it is worth exploring whether
database size can be reduced by including only single-language units in the database
and handling multilingual units at synthesis time.
The present work is concerned with database design for a polyglot unit selection voice.
Its main aim is to examine whether alternative methods for handling multilingual
cross-word diphones result in the same or better synthesis quality than including these
diphones in the database. Three alternative approaches are suggested and model polyglot
voices are built to test these methods. The languages included in the synthesizer
are Bosnian, English and German. The output quality of the synthesized multilingual
word boundary is tested on Bosnian-English and Bosnian-German word pairs in a perceptual
|Keywords: ||speech synthesis|
polyglot voice design
|Appears in Collections:||Linguistics and English Language Masters thesis collection|