Recent Submissions

  • Unsupervised adaptation for HMM-based speech synthesis 

    King, Simon; Tokuda, Keiichi; Zen, Heiga; Yamagishi, Junichi (ISCA, 2008-09)
    It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is ...
  • A Shrinkage Estimator for Speech Recognition with Full Covariance HMMs 

    Bell, Peter; King, Simon (2008)
    We consider the problem of parameter estimation in full-covariance Gaussian mixture systems for automatic speech recognition. Due to the high dimensionality of the acoustic feature vector, the standard sample covariance ...
  • Cross-lingual Portability of MLP-Based Tandem Features -- A Case Study for English and Hungarian 

    Toth, Laszlo; Frankel, Joe; Gosztolya, Gabor; King, Simon (2008)
    One promising approach for building ASR systems for less-resourced languages is cross-lingual adaptation. Tandem ASR is particularly well suited to such adaptation, as it includes two cascaded modelling steps: feature ...
  • A comparison of phone and grapheme-based spoken term detection 

    Wang, Dong; Frankel, Joe; Tejedor, Javier; King, Simon (2008)
    We propose grapheme-based sub-word units for spoken term detection (STD). Compared to phones, graphemes have a number of potential advantages. For out-of-vocabulary search terms, phone-based approaches must generate a ...
  • A comparison of grapheme and phoneme-based units for Spanish spoken term detection 

    Tejedor, Javier; Wang, Dong; Frankel, Joe; King, Simon; Colás, José (Elsevier, 2008)
    The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search ...
  • The Blizzard Challenge 2008 

    King, Simon; Clark, Robert A J; Mayo, Catherine; Karaiskos, Vasilis (2008)
    The Blizzard Challenge 2008 was the fourth annual Blizzard Challenge. This year, participants were asked to build two voices from a UK English corpus and one voice from a Mandarin Chinese corpus. This is the first time ...
  • Single Speaker Segmentation and Inventory Selection Using Dynamic Time Warping Self Organization and Joint Multigram Mapping 

    Aylett, Matthew; King, Simon (2008)
    In speech synthesis the inventory of units is decided by inspection and on the basis of phonological and phonetic expertise. The ephone (or emergent phone) project at CSTR is investigating how self organisation techniques ...
  • Covariance Updates for Discriminative Training by Constrained Line Search 

    Bell, Peter; King, Simon (2008)
    We investigate the recent Constrained Line Search algorithm for discriminative training of HMMs and propose an alternative formula for variance update. We compare the method to standard techniques on a phone recognition task.
  • Robustness of HMM-based Speech Synthesis 

    Yamagishi, Junichi; Ling, Zhenhua; King, Simon (2008)
    As speech synthesis techniques become more advanced, we are able to consider building high-quality voices from data collected outside the usual highly-controlled recording studio environment. This presents new challenges ...
  • HMM-based synthesis of child speech 

    Watts, Oliver; Yamagishi, Junichi; Berkling, Kay; King, Simon (2008)
    The synthesis of child speech presents challenges both in the collection of data and in the building of a synthesiser from that data. Because only limited data can be collected, and the domain of that data is constrained, ...
  • Thousands of Voices for HMM-Based Speech Synthesis-Analysis and Application of TTS Systems Built on Various ASR Corpora 

    Yamagishi, Junichi; Usabaev, Bela; King, Simon; Watts, Oliver; Dines, John; Tian, Jilei; Guan, Yong; Hu, Rile; Oura, Keiichiro; Wu, Yi-Jian; Tokuda, Keiichi; Karhila, Reima; Kurimo, Mikko (IEEE, 2010-05)
    In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a ...
  • Improved Average-Voice-based Speech Synthesis Using Gender-Mixed Modeling and a Parameter Generation Algorithm Considering GV 

    Yamagishi, Junichi; Kobayashi, Takao; Renals, Steve; King, Simon; Zen, Heiga; Toda, Tomoki; Tokuda, Keiichi (2007-08)
    For constructing a speech synthesis system which can achieve diverse voices, we have been developing a speaker independent approach of HMM-based speech synthesis in which statistical average voice models are adapted to ...
  • Factoring Gaussian Precision Matrices for Linear Dynamic Models 

    Frankel, Joe; King, Simon (2007)
    The linear dynamic model (LDM), also known as the Kalman filter model, has been the subject of research in the engineering, control, and more recently, machine learning and speech technology communities. The Gaussian noise ...
  • Articulatory feature classifiers trained on 2000 hours of telephone speech 

    Frankel, Joe; Magimai-Doss, Mathew; King, Simon; Livescu, Karen; Çetin, Ozgur (2007)
    The so-called tandem approach, where the posteriors of a multilayer perceptron (MLP) classifier are used as features in an automatic speech recognition (ASR) system, has proven to be a very effective method. Most tandem ...
  • Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. 

    Livescu, Karen; Çetin, Ozgur; Hasegawa-Johnson, Mark; King, Simon; Bartels, Chris; Borges, Nash; Kantor, Arthur; Lal, Partha; Yung, Lisa; Bezman, Ari; Dawson-Haggerty, Stephen; Woods, Bronwyn; Frankel, Joe; Magimai-Doss, Mathew; Saenko, Kate (2007)
    We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we ...
  • Speech production knowledge in automatic speech recognition 

    King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam (2007)
    Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds and numerous models, speech production knowledge is ...
  • Sparse Gaussian graphical models for speech recognition. 

    Bell, Peter; King, Simon (2007)
    We address the problem of learning the structure of Gaussian graphical models for use in automatic speech recognition, a means of controlling the form of the inverse covariance matrices of such systems. With particular ...
  • Modelling prominence and emphasis improves unit-selection synthesis 

    Strom, Volker; Nenkova, Ani; Clark, Robert A J; Vazquez-Alvarez, Yolanda; Brenier, Jason; King, Simon; Jurafsky, Daniel (2007)
    We describe the results of large scale perception experiments showing improvements in synthesising two distinct kinds of prominence: standard pitch-accent and strong emphatic accents. Previously prominence assignment has ...
  • Articulatory feature recognition using dynamic Bayesian networks. 

    Frankel, Joe; Wester, Mirjam; King, Simon (2007)
    We describe a dynamic Bayesian network for articulatory feature recognition. The model is intended to be a component of a speech recognizer that avoids the problems of conventional "beads-on-a-string" phoneme-based models. ...
  • Speech recognition using linear dynamic models. 

    Frankel, Joe; King, Simon (IEEE, 2007-01)
    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with subphone states. This approach, whilst successful, models ...
