Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/4544
| Title: | Simple methods for improving speaker-similarity of HMM-based speech synthesis |
| Authors: | Yamagishi, Junichi; King, Simon |
| Issue Date: | 2010 |
| Journal Title: | Proc. ICASSP 2010 |
| Abstract: | In this paper we revisit some basic configuration choices of HMM-based
speech synthesis, such as waveform sampling rate, auditory
frequency warping scale and the logarithmic scaling of F0, with
the aim of improving speaker similarity, which is an acknowledged
weakness of current HMM-based speech synthesisers. All of the
techniques investigated are simple but, as we demonstrate using perceptual
tests, can make substantial differences to the quality of the
synthetic speech. Contrary to common practice in automatic speech
recognition, higher waveform sampling rates can offer enhanced feature
extraction and improved speaker similarity for speech synthesis.
In addition, a generalized logarithmic transform of F0 results
in larger intra-utterance variance of F0 trajectories and hence more
dynamic and natural-sounding prosody. |
| URI: | http://hdl.handle.net/1842/4544 |
| Appears in Collections: | CSTR publications |