Edinburgh Research Archive, The University of Edinburgh


Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/4867


Files in This Item:

File: SpeechCommVol53iss3p442-450.pdf
Size: 1.12 MB
Format: Adobe PDF
Title: The Romanian Speech Synthesis (RSS) corpus: building a high quality HMM-based speech synthesis system using a high sampling rate
Authors: Stan, Adriana
Yamagishi, Junichi
King, Simon
Aylett, Matthew
Issue Date: Mar-2011
Citation: Speech Communication, vol. 53, issue 3, pp. 442-450
Publisher: Elsevier
Abstract: This paper first introduces a newly-recorded high quality Romanian speech corpus designed for speech synthesis, called “RSS”, along with Romanian front-end text processing modules and HMM-based synthetic voices built from the corpus. All of these are now freely available for academic use in order to promote Romanian speech technology research. The RSS corpus comprises 3500 training sentences and 500 test sentences uttered by a female speaker and was recorded using multiple microphones at 96 kHz sampling frequency in a hemianechoic chamber. The details of the new Romanian text processor we have developed are also given. Using the database, we then revisit some basic configuration choices of speech synthesis, such as waveform sampling frequency and auditory frequency warping scale, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. As we demonstrate using perceptual tests, these configuration choices can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling frequencies can offer enhanced feature extraction and improved speaker similarity for HMM-based speech synthesis.
Sponsor(s): European Social Fund, project POSDRU/6/1.5/S/5 and European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant agreement 213845 (the EMIME project)
Keywords: Speech Synthesis
HTS
Romanian
HMMs
Sampling frequency
Auditory scale
URI: http://hdl.handle.net/1842/4867
Appears in Collections: Informatics Publications

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
