|
Edinburgh Research Archive >
Centre for Speech Technology Research >
CSTR publications >
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/1082
|
| Title: | Unit selection in a concatenative speech synthesis system using a large speech database |
| Authors: | Hunt, Andrew Black, Alan W |
| Issue Date: | May-1996 |
| Citation: | Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, Volume 1, 7-10 May 1996 Page(s):373 - 376. |
| Publisher: | IEEE Signal Processing Society |
| Abstract: | One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning. |
| URI: | http://ieeexplore.ieee.org/ http://hdl.handle.net/1842/1082 |
| Appears in Collections: | CSTR publications
|
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.
|