<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://hdl.handle.net/1842/906">
    <title>ERA Collection:</title>
    <link>http://hdl.handle.net/1842/906</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://hdl.handle.net/1842/4663" />
        <rdf:li rdf:resource="http://hdl.handle.net/1842/4662" />
        <rdf:li rdf:resource="http://hdl.handle.net/1842/4661" />
        <rdf:li rdf:resource="http://hdl.handle.net/1842/4660" />
      </rdf:Seq>
    </items>
    <dc:date>2013-05-25T21:54:22Z</dc:date>
  </channel>
  <item rdf:about="http://hdl.handle.net/1842/4663">
    <title>Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech</title>
    <link>http://hdl.handle.net/1842/4663</link>
    <description>Title: Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech
Authors: Janska, Anna C.; Clark, Robert A J
Abstract: Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of evaluation. In a series of experiments based on data from the Blizzard Challenge 2008, the relation between Weighted Euclidean Distance Scaling and Simple Euclidean Distance Scaling is investigated to understand how aggregating data affects the MDS configuration. These results are compared to those collected as mean opinion scores (MOS). The ranks correspond, and MOS can be predicted from an object's position in the MDS-generated stimulus space. The main advantage of MDS over MOS is its diagnostic value; dimensions along which stimuli vary are not correlated, as is the case in modular evaluation using MOS. Finally, we attempt to generalize from the MDS representations of the thoroughly tested subset to the aggregated data of the larger-scale Blizzard Challenge.</description>
    <dc:date>2010-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1842/4662">
    <title>Transforming Voice Source Parameters in a HMM-based Speech Synthesiser with Glottal Post-Filtering</title>
    <link>http://hdl.handle.net/1842/4662</link>
    <description>Title: Transforming Voice Source Parameters in a HMM-based Speech Synthesiser with Glottal Post-Filtering
Authors: Cabral, Joao P; Renals, Steve; Richmond, Korin; Yamagishi, Junichi
Abstract: Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, allows voice characteristics of the synthetic speech to be transformed by modifying the source model parameters. We evaluated the proposed synthesiser in a perceptual experiment, in terms of speech naturalness, intelligibility, and similarity to the original speaker's voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.</description>
    <dc:date>2010-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1842/4661">
    <title>Augmentation of adaptation data</title>
    <link>http://hdl.handle.net/1842/4661</link>
    <description>Title: Augmentation of adaptation data
Authors: Vipperla, Ravi Chander; Renals, Steve; Frankel, Joe
Abstract: Linear-regression-based speaker adaptation approaches can significantly improve Automatic Speech Recognition (ASR) accuracy for a target speaker. However, when the available adaptation data is limited to a few seconds, the accuracy of the speaker-adapted models is often worse than that of speaker-independent models. In this paper, we propose an approach to select a set of reference speakers acoustically close to the target speaker, whose data can be used to augment the adaptation data. To determine the acoustic similarity of two speakers, we propose a distance metric based on transforming sample points in the acoustic space with the regression matrices of the two speakers. We show the validity of this approach through a speaker identification task. ASR results on the SCOTUS and AMI corpora, with limited adaptation data of 10 to 15 seconds augmented by data from selected reference speakers, show a significant improvement in Word Error Rate over speaker-independent and speaker-adapted models.</description>
    <dc:date>2010-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/1842/4660">
    <title>Evaluating speech synthesis intelligibility using Amazon Mechanical Turk</title>
    <link>http://hdl.handle.net/1842/4660</link>
    <description>Title: Evaluating speech synthesis intelligibility using Amazon Mechanical Turk
Authors: Wolters, Maria K.; Isaac, Karl B.; Renals, Steve
Abstract: Microtask platforms such as Amazon Mechanical Turk (AMT) are increasingly used to create speech and language resources. AMT in particular allows researchers to quickly recruit a large number of fairly demographically diverse participants. In this study, we investigated whether AMT can be used for comparing the intelligibility of speech synthesis systems. We conducted two experiments in the lab and via AMT, one comparing US English diphone to US English speaker-adaptive HTS synthesis and one comparing UK English unit selection to UK English speaker-dependent HTS synthesis. While AMT word error rates were worse than lab error rates, AMT results were more sensitive to relative differences between systems. This is mainly due to the larger number of listeners. Boxplots and multilevel modelling allowed us to identify listeners who performed particularly badly, while thresholding was sufficient to eliminate rogue workers. We conclude that AMT is a viable platform for synthetic speech intelligibility comparisons.</description>
    <dc:date>2010-01-01T00:00:00Z</dc:date>
  </item>
</rdf:RDF>

