
Field | Value | Language
dc.contributor.author | Cabral, Joao P
dc.contributor.author | Renals, Steve
dc.contributor.author | Richmond, Korin
dc.contributor.author | Yamagishi, Junichi
dc.date.accessioned | 2007-09-19T12:41:47Z
dc.date.available | 2007-09-19T12:41:47Z
dc.date.issued | 2007
dc.identifier.citation | J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Towards an improved modeling of the glottal source in statistical parametric speech synthesis. In Proc. of the 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, 2007 | en
dc.identifier.uri | http://hdl.handle.net/1842/2003
dc.description.abstract | This paper proposes the use of the Liljencrants-Fant model (LF-model) to represent the glottal source signal in HMM-based speech synthesis systems. These systems generally use a pulse train to model the periodicity of the excitation signal of voiced speech. However, this model produces a strong and uniform harmonic structure throughout the spectrum of the excitation, which makes the synthetic speech sound buzzy. The use of a mixed band excitation and phase manipulation reduces this effect, but it can degrade speech quality if the noise component is not weighted carefully. In contrast, the LF-waveform has a decaying spectrum at higher frequencies, which is more similar to the real glottal source excitation signal. We conducted a perceptual experiment to test the hypothesis that the LF-model can perform as well as or better than the pulse train in an HMM-based speech synthesizer. In the synthesis, we used the mean values of the LF-parameters, calculated from measurements of the recorded speech. The result of this study is important not only for improving the speech quality of this type of system, but also because the LF-model can be used to model many characteristics of the glottal source, such as voice quality, which are important for voice transformation and the generation of expressive speech. | en
dc.format.extent | 108129 bytes
dc.format.mimetype | application/pdf
dc.language.iso | en | en
dc.subject | speech technology | en
dc.title | Towards an improved modeling of the glottal source in statistical parametric speech synthesis | en
dc.type | Conference Paper | en
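The abstract contrasts the flat, uniform harmonic structure of a pulse-train excitation with the decaying high-frequency spectrum of the LF waveform. The sketch below illustrates that contrast; it is not the authors' synthesis code. It assumes the standard four-parameter LF formulation of the glottal flow derivative: an exponentially growing sinusoid E0·exp(αt)·sin(πt/tp) up to the main excitation instant te, then an exponential return phase governed by ta, with α chosen so that the net flow per cycle is zero. The function names `lf_period` and `harmonic_levels_db` are hypothetical, and the timing values (tp = 0.45, te = 0.6, ta = 0.02, as fractions of the period) are illustrative assumptions, not the mean LF-parameters measured in the paper. Harmonic levels are computed with a direct DFT over one period, so only the Python standard library is needed.

```python
import cmath
import math


def lf_period(n_samples=2000, tp=0.45, te=0.6, ta=0.02, tc=1.0, ee=1.0):
    """Sample one period (T0 = 1) of an LF-model glottal flow derivative.

    Timing parameters are fractions of the period and are illustrative
    assumptions (not values from the paper); they should satisfy
    tp < te < min(2 * tp, tc) and ta much smaller than tc - te.
    ee is the magnitude of the negative peak at the main excitation te.
    """
    wg = math.pi / tp  # angular frequency of the open-phase sinusoid
    tr = tc - te       # duration of the return phase

    # Return phase: solve eps * ta = 1 - exp(-eps * tr) by fixed-point iteration,
    # starting from 1 / ta to avoid the trivial root eps = 0.
    eps = 1.0 / ta
    for _ in range(100):
        eps = (1.0 - math.exp(-eps * tr)) / ta
    decay_end = math.exp(-eps * tr)
    return_area = -(ee / (eps * ta)) * ((1.0 - decay_end) / eps - tr * decay_end)

    s, c = math.sin(wg * te), math.cos(wg * te)

    def open_area(alpha):
        # Closed-form integral of E0*exp(alpha*t)*sin(wg*t) over [0, te],
        # with E0 chosen so the waveform reaches -ee at t = te.
        num = alpha * s - wg * c + wg * math.exp(-alpha * te)
        return -ee * num / (s * (alpha ** 2 + wg ** 2))

    # Area balance (zero net glottal flow per cycle): bisect on alpha.
    lo, hi = 0.0, 1.0
    while open_area(hi) + return_area > 0.0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if open_area(mid) + return_area > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    e0 = -ee / (math.exp(alpha * te) * s)

    samples = []
    for n in range(n_samples):
        t = n / n_samples
        if t <= te:    # open phase: exponentially growing sinusoid
            samples.append(e0 * math.exp(alpha * t) * math.sin(wg * t))
        elif t <= tc:  # return phase: exponential recovery towards zero
            samples.append(-(ee / (eps * ta)) * (math.exp(-eps * (t - te)) - decay_end))
        else:          # closed phase
            samples.append(0.0)
    return samples


def harmonic_levels_db(x, harmonics):
    """Level of harmonic k (DFT bin k of one period) in dB relative to harmonic 1."""
    n = len(x)

    def mag(k):
        return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))

    ref = mag(1)
    return {k: round(20.0 * math.log10(mag(k) / ref), 1) for k in harmonics}


if __name__ == "__main__":
    n = 2000
    pulse = [1.0] + [0.0] * (n - 1)  # one impulse per period
    lf = lf_period(n)
    ks = [1, 5, 10, 20, 40]
    print("pulse train:", harmonic_levels_db(pulse, ks))
    print("LF model:   ", harmonic_levels_db(lf, ks))
```

The pulse train prints 0.0 dB at every harmonic, which is the uniform harmonic structure the abstract associates with buzziness. The LF levels fall off with harmonic number instead. The exact values depend on the assumed timing parameters, and ta in particular controls how steeply the upper harmonics decay.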

