Field | Value | Language
dc.contributor.author | Mayo, Catherine |
dc.contributor.author | Clark, Robert A J |
dc.contributor.author | King, Simon |
dc.coverage.spatial | 4 | en
dc.date.accessioned | 2006-05-09T11:48:25Z |
dc.date.available | 2006-05-09T11:48:25Z |
dc.date.issued | 2005 |
dc.identifier.citation | In Proceedings, Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005 | en
dc.identifier.uri | http://www.isca-speech.org/archive/interspeech_2005 |
dc.identifier.uri | http://hdl.handle.net/1842/937 |
dc.description.abstract | The move to unit-selection in speech synthesis has resulted in system improvements being made at subtle sub- and suprasegmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by default when asked to evaluate synthetic speech. It may therefore be difficult to design an evaluation method that allows listeners to concentrate on only one dimension of the signal while ignoring others that are perceptually more important to them. This paper describes a pilot study that aims to evaluate multidimensional scaling (MDS) as a possible method of determining which acoustic characteristics of synthetic speech influence listeners’ judgements of the naturalness of the speech. Using distance measures (either real or perceived distances), MDS techniques represent stimuli as points in n-dimensional space. The space is configured so that similar stimuli are close together, while different stimuli are farther apart. Additionally, the dimensions of the space correspond to characteristics of the stimuli that influenced the perceived distances. Our results indicate that MDS techniques should be a useful tool in understanding the complex psychoacoustic processes that listeners undergo when evaluating synthetic speech. This method has allowed us to identify a number of cues that appear to be particularly perceptually salient to listeners evaluating synthetic speech naturalness, namely prosodic cues (in terms of duration and/or intonation) and segmental or unit-level cues (in terms of appropriateness of units, or number of units). | en
dc.format.extent | 42959 bytes |
dc.format.mimetype | application/pdf |
dc.language.iso | en |
dc.publisher | International Speech Communication Association | en
dc.subject | speech synthesis | en
dc.subject | multidimensional scaling | en
dc.title | Multidimensional scaling of listener responses to synthetic speech | en
dc.type | Conference Paper | en
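
The abstract's description of MDS (distances in, points in n-dimensional space out, with similar stimuli placed close together) can be illustrated with classical (Torgerson) metric MDS. The following is a minimal Python/NumPy sketch under stated assumptions: it is not necessarily the MDS variant the authors used, the function names `classical_mds` and `pairwise_distances` are hypothetical, and the five-point input is toy data rather than listener responses from the study. Interpreting what each recovered dimension means (for example, prosodic versus unit-level cues) remains a separate, human step that the algorithm does not perform.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x (n x k)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(d, k=2):
    """Place n stimuli as points in k dimensions from an n x n dissimilarity matrix d.

    Classical (Torgerson) metric MDS, assumed here for illustration:
    double-centre the squared dissimilarities to get a Gram matrix, then
    keep its top-k eigenvectors scaled by the square roots of their eigenvalues.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Centring matrix J = I - (1/n) * 1 1^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix B = -1/2 * J D^2 J, where D^2 holds element-wise squared dissimilarities
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:k]
    # Negative eigenvalues signal non-Euclidean dissimilarities; clip them to zero
    lam = np.clip(eigvals[top], 0.0, None)
    return eigvecs[:, top] * np.sqrt(lam)


# Toy example (hypothetical data, not from the study): five "stimuli" with
# known 2-D positions, turned into a dissimilarity matrix and recovered by MDS.
true_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.5, 1.5]])
d = pairwise_distances(true_points)
coords = classical_mds(d, k=2)
# Distances between recovered points match the input dissimilarities
print(np.allclose(pairwise_distances(coords), d))  # True
```

Because the toy dissimilarities are exact Euclidean distances, the 2-D embedding reproduces them up to rotation, reflection and translation. Perceived distances collected from listeners are generally not Euclidean, which is why the sketch clips negative eigenvalues to zero, and why non-metric MDS variants are a common alternative for rating data.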