<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>ERA Collection:</title>
    <link>http://hdl.handle.net/1842/3763</link>
    <description />
    <pubDate>Wed, 19 Jun 2013 23:15:02 GMT</pubDate>
    <dc:date>2013-06-19T23:15:02Z</dc:date>
    <item>
      <title>Unsupervised adaptation for HMM-based speech synthesis</title>
      <link>http://hdl.handle.net/1842/3841</link>
      <description>Title: Unsupervised adaptation for HMM-based speech synthesis
Authors: King, Simon; Tokuda, Keiichi; Zen, Heiga; Yamagishi, Junichi
Abstract: It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is model adaptation. It has been shown that supervised speaker adaptation can yield high-quality synthetic voices with an order of magnitude less data than required to train a speaker-dependent model or to build a basic unit-selection system. Such supervised methods require labelled adaptation data for the target speaker. In this paper, we introduce a method capable of unsupervised adaptation, using only speech from the target speaker without any labelling.</description>
      <pubDate>Mon, 01 Sep 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/1842/3841</guid>
      <dc:date>2008-09-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>A Shrinkage Estimator for Speech Recognition with Full Covariance HMMs</title>
      <link>http://hdl.handle.net/1842/3839</link>
      <description>Title: A Shrinkage Estimator for Speech Recognition with Full Covariance HMMs
Authors: Bell, Peter; King, Simon
Abstract: We consider the problem of parameter estimation in full-covariance Gaussian mixture systems for automatic speech recognition. Due to the high dimensionality of the acoustic feature vector, the standard sample covariance matrix has a high variance and is often poorly-conditioned when the amount of training data is limited. We explain how the use of a shrinkage estimator can solve these problems, and derive a formula for the optimal shrinkage intensity. We present results of experiments on a phone recognition task, showing that the estimator gives a performance improvement over a standard full-covariance system.</description>
      <pubDate>Tue, 01 Jan 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/1842/3839</guid>
      <dc:date>2008-01-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>Cross-lingual Portability of MLP-Based Tandem Features -- A Case Study for English and Hungarian</title>
      <link>http://hdl.handle.net/1842/3838</link>
      <description>Title: Cross-lingual Portability of MLP-Based Tandem Features -- A Case Study for English and Hungarian
Authors: Toth, Laszlo; Frankel, Joe; Gosztolya, Gabor; King, Simon
Abstract: One promising approach for building ASR systems for less-resourced languages is cross-lingual adaptation. Tandem ASR is particularly well suited to such adaptation, as it includes two cascaded modelling steps: feature extraction using multi-layer perceptrons (MLPs), followed by modelling using a standard HMM. The language-specific tuning can be performed by adjusting the HMM only, leaving the MLP untouched. Here we examine the portability of feature extractor MLPs between an Indo-European (English) and a Finno-Ugric (Hungarian) language. We present experiments which use both conventional phone-posterior and articulatory feature (AF) detector MLPs, both trained on a much larger quantity of (English) data than the monolingual (Hungarian) system. We find that the cross-lingual configurations achieve similar performance to the monolingual system, and that, interestingly, the AF detectors lead to slightly worse performance, despite the expectation that they should be more language-independent than phone-based MLPs. However, the cross-lingual system outperforms all other configurations when the English phone MLP is adapted on the Hungarian data.</description>
      <pubDate>Tue, 01 Jan 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/1842/3838</guid>
      <dc:date>2008-01-01T00:00:00Z</dc:date>
    </item>
    <item>
      <title>A comparison of phone and grapheme-based spoken term detection</title>
      <link>http://hdl.handle.net/1842/3837</link>
      <description>Title: A comparison of phone and grapheme-based spoken term detection
Authors: Wang, Dong; Frankel, Joe; Tejedor, Javier; King, Simon
Abstract: We propose grapheme-based sub-word units for spoken term detection (STD). Compared to phones, graphemes have a number of potential advantages. For out-of-vocabulary search terms, phone-based approaches must generate a pronunciation using letter-to-sound rules. Using graphemes obviates this potentially error-prone hard decision, shifting pronunciation modelling into the statistical models describing the observation space. In addition, long-span grapheme language models can be trained directly from large text corpora. We present experiments on Spanish and English data, comparing phone and grapheme-based STD. For Spanish, where phone and grapheme-based systems give similar transcription word error rates (WERs), grapheme-based STD significantly outperforms a phone-based approach. The converse is found for English, where the phone-based system outperforms a grapheme approach. However, we present additional analysis which suggests that phone-based STD performance levels may be achieved by a grapheme-based approach despite lower transcription accuracy, and that the two approaches may usefully be combined. We propose a number of directions for future development of these ideas, and suggest that if grapheme-based STD can match phone-based performance, the inherent flexibility in dealing with out-of-vocabulary terms makes this a desirable approach.</description>
      <pubDate>Tue, 01 Jan 2008 00:00:00 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/1842/3837</guid>
      <dc:date>2008-01-01T00:00:00Z</dc:date>
    </item>
  </channel>
</rss>

