Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/1208
| Title: | Duration, Pitch and Diphones in the CSTR TTS System |
| Authors: | Campbell, Nick; Isard, Stephen; Monaghan, Alex; Verhoeven, J. |
| Issue Date: | Nov-1990 |
| Citation: | [ICSLP-1990] First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, November 18-22, 1990. pp.825-828. |
| Publisher: | International Speech Communication Association |
| Abstract: | This paper describes the prosodic processing and waveform generation components of the text-to-speech system being developed at Edinburgh University's Centre for Speech Technology Research. Intonation is specified as a sequence of minimal descriptors whose locations are given in terms of syntactically determined prosodic domains. A pitch contour is computed by converting the descriptors into a sequence of abstract targets whose absolute values depend on a specific speaker model. Duration is determined first at the level of the syllable by a neural network, then accommodated at the segment level according to the distributions observed in a phonetically balanced database. The output waveform is generated by LPC resynthesis of diphone units. Three methods of diphone segmentation are discussed. |
| URI: | http://www.isca-speech.org/archive/icslp_1990; http://hdl.handle.net/1842/1208 |
| Appears in Collections: | CSTR publications |
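The abstract states that syllable durations are predicted first and then "accommodated at the segment level according to the distributions observed in a phonetically balanced database", but it does not give the rule. One scheme of this kind, sketched below as an assumption rather than as the paper's method, treats each segment's duration as log-normal and searches for a single z-score `k` so that the segments, each placed at `k` standard deviations from its own log-mean, add up to the syllable duration. The function name `fit_segment_durations`, the bisection solver, the search bracket, and the per-segment statistics are all hypothetical placeholders, not values from the CSTR database.

```python
import math
from typing import List, Sequence, Tuple


def fit_segment_durations(
    syllable_ms: float,
    seg_stats: Sequence[Tuple[float, float]],
    max_iter: int = 100,
) -> Tuple[float, List[float]]:
    """Share one syllable duration among its segments (hypothetical helper).

    seg_stats holds (mu, sigma) per segment: the mean and standard deviation
    of log-duration (log ms), assumed to come from a labelled database.
    Returns the common z-score k and the per-segment durations in ms.
    """
    if not seg_stats or any(sd <= 0 for _, sd in seg_stats):
        raise ValueError("need at least one segment, each with sigma > 0")

    def total(k: float) -> float:
        # Summed duration when every segment sits at z-score k.
        return sum(math.exp(mu + k * sd) for mu, sd in seg_stats)

    lo, hi = -10.0, 10.0  # search bracket for k (an assumption)
    if not total(lo) <= syllable_ms <= total(hi):
        raise ValueError("syllable duration outside the searchable range")

    # total(k) increases monotonically with k, so bisection converges.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) < syllable_ms:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, [math.exp(mu + k * sd) for mu, sd in seg_stats]


# Illustrative statistics for a three-segment syllable (made-up values).
stats = [(math.log(70.0), 0.30), (math.log(110.0), 0.35), (math.log(80.0), 0.30)]
k, durs = fit_segment_durations(300.0, stats)
print(f"k = {k:.3f}", [round(d, 1) for d in durs])
```

Because every segment shares the same `k`, each one is stretched or compressed in proportion to its own variability in the log domain: a segment with a larger sigma absorbs more of the change, while a segment with a tight distribution stays close to its typical length.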
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.