CSTR is an interdisciplinary research centre linking Informatics with Linguistics and English Language.

Founded in 1984, CSTR is concerned with research in all areas of speech technology including speech recognition, speech synthesis, speech signal processing, information access, multimodal interfaces and dialogue systems. We have many collaborations with the wider community of researchers in speech science, language, cognition and machine learning for which Edinburgh is renowned.

Recent Submissions

  • Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners 

    Pucher, Michael; Schabus, Dietmar; Yamagishi, Junichi (2010)
    In this paper we evaluate a method for generating synthetic speech at high speaking rates based on the interpolation of hidden semi-Markov models (HSMMs) trained on speech data recorded at normal and fast speaking rates. ...
  • Out-of-vocabulary spoken term detection 

    Wang, Dong (The University of Edinburgh, 2010)
    Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge faced by an STD system is the serious performance reduction when detecting out-of-vocabulary (OOV) terms. The ...
  • Speaker normalisation for large vocabulary multiparty conversational speech recognition 

    Garau, Giulia (The University of Edinburgh, 2009)
    One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noises etc.) ...
  • Identification of Contrast and Its Emphatic Realization in HMM-based Speech Synthesis 

    Badino, Leonardo; Andersson, J. Sebastian; Yamagishi, Junichi; Clark, Robert A J (2009)
    The work presented in this paper proposes to identify contrast in the form of contrastive word pairs and prosodically signal it with emphatic accents in a Text-to-Speech (TTS) application using a Hidden-Markov-Model (HMM) ...
  • A Tangible Interface for the AMI Content Linking Device -- The Automated Meeting Assistant 

    Ehnes, Jochen (2009)
    In this paper we describe our approach to supporting ongoing meetings with an automated meeting assistant. The system, based on the AMIDA Content Linking Device, aims at providing relevant documents used in previous meetings ...
  • Reducing Working Memory Load in Spoken Dialogue Systems 

    Wolters, Maria; Georgila, Kallirroi; Logie, Robert; MacPherson, Sarah; Moore, Johanna; Watson, Matt (Elsevier, 2009)
    We evaluated two strategies for alleviating working memory load for users of voice interfaces: presenting fewer options per turn and providing confirmations. Forty-eight users booked appointments using nine different ...
  • A Posterior Probability-based System Hybridisation and Combination for Spoken Term Detection 

    Tejedor, Javier; Wang, Dong; King, Simon; Frankel, Joe; Colas, Jose (2009)
    Spoken term detection (STD) is a fundamental task for multimedia information retrieval. To improve the detection performance, we have presented a direct posterior-based confidence measure generated from a neural network. ...
  • Participant Subjectivity and Involvement as a Basis for Discourse Segmentation 

    Niekrasz, John; Moore, Johanna (2009)
    We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content. We test the hypothesis that linguistic features which express such ...
  • A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models 

    Huang, Songfang; Renals, Steve (2009)
    The Hierarchical Pitman-Yor Process Language Model (HPYLM) is a Bayesian language model based on a non-parametric prior, the Pitman-Yor process. It has been demonstrated, both theoretically and practically, that the HPYLM ...
  • Speech-driven animation using multi-modal hidden Markov models 

    Hofer, Gregor Otto (The University of Edinburgh, 2010)
    The main objective of this thesis was the synthesis of speech-synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve ...
  • Join Cost for Unit Selection Speech Synthesis 

    Vepa, Jithendra (The University of Edinburgh. College of Science and Engineering. School of Informatics, 2004-07)
    Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. This is due to a large speech database containing many instances of each speech unit, with a varied ...
  • Bayesian regularisation methods in a hybrid MLP-HMM system. 

    Renals, Steve; MacKay, David (International Speech Communication Association, 1993)
    We have applied Bayesian regularisation methods to multi-layer perceptron (MLP) training in the context of a hybrid MLP-HMM (hidden Markov model) continuous speech recognition system. The Bayesian framework adopted here ...
  • Speaker-Adaptation for Hybrid HMM-ANN Continuous Speech Recognition System 

    Neto, Joao; Almeida, Luis; Hochberg, Mike; Martins, Ciro; Nunes, Luis; Renals, Steve; Robinson, Tony (International Speech Communication Association, 1995)
    It is well known that recognition performance degrades significantly when moving from a speaker-dependent to a speaker-independent system. Traditional hidden Markov model (HMM) systems have successfully applied speaker-adaptation ...
  • Characterization of Speakers for Improved Automatic Speech Recognition 

    Lincoln, Michael (School of Information Systems. University of East Anglia, 1999-06)
    Automatic speech recognition technology is becoming increasingly widespread in many applications. For dictation tasks, where a single talker is to use the system for long periods of time, the high recognition accuracies ...
  • Linear dynamic models for automatic speech recognition 

    Frankel, Joe (The University of Edinburgh. College of Science and Engineering. School of Informatics, 2004-06)
    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models (HMM), in which the output distribution associated with each state is modelled by a mixture of diagonal covariance Gaussians. Dynamic ...
  • Precise Estimation of Vocal Tract and Voice Source Characteristics 

    Shiga, Yoshinori (The University of Edinburgh. College of Science and Engineering. School of Informatics, 2006-04)
    This thesis addresses the problem of quality degradation in speech produced by parameter-based speech synthesis, within the framework of an articulatory-acoustic forward mapping. I first investigate current problems in ...