Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/4560
| Title: | Roles of the Average Voice in Speaker-adaptive HMM-based Speech Synthesis |
| Authors: | Yamagishi, Junichi; Watts, Oliver; King, Simon; Usabaev, Bela |
| Issue Date: | 2010 |
| Journal Title: | Proc. Interspeech 2010 |
| Abstract: | In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for which the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from within the same corpus. This paper investigates these fluctuations in quality and concludes that as mel-cepstral distance from the average voice becomes larger, the MOS naturalness scores generally become worse. Although this negative correlation is not that strong, it suggests a way to improve the training and adaptation strategies. We also draw comparisons between our findings and the work of other researchers regarding "vocal attractiveness." |
| URI: | http://hdl.handle.net/1842/4560 |
| Appears in Collections: | CSTR publications |