CSTR is an interdisciplinary research centre linking Informatics with Linguistics and English Language.

Founded in 1984, CSTR is concerned with research in all areas of speech technology including speech recognition, speech synthesis, speech signal processing, information access, multimodal interfaces and dialogue systems. We have many collaborations with the wider community of researchers in speech science, language, cognition and machine learning for which Edinburgh is renowned.

Recent Submissions

  • Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech 

    Janska, Anna C.; Clark, Robert A J (2010)
    Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of ...
  • Transforming Voice Source Parameters in a HMM-based Speech Synthesiser with Glottal Post-Filtering 

    Cabral, Joao P; Renals, Steve; Richmond, Korin; Yamagishi, Junichi (2010)
    Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve ...
  • Augmentation of adaptation data 

    Vipperla, Ravi Chander; Renals, Steve; Frankel, Joe (2010)
    Linear regression based speaker adaptation approaches can improve Automatic Speech Recognition (ASR) accuracy significantly for a target speaker. However, when the available adaptation data is limited to a few seconds, the ...
  • Evaluating speech synthesis intelligibility using Amazon Mechanical Turk 

    Wolters, Maria K.; Isaac, Karl B.; Renals, Steve (2010)
    Microtask platforms such as Amazon Mechanical Turk (AMT) are increasingly used to create speech and language resources. AMT in particular allows researchers to quickly recruit a large number of fairly demographically diverse ...
  • Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech 

    De Leon, P.L.; Pucher, M.; Yamagishi, Junichi (2010)
    In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis ...
  • Ageing voices: The effect of changes in voice parameters on ASR performance 

    Vipperla, Ravi Chander; Renals, Steve; Frankel, Joe (2010)
    With ageing, human voices undergo several changes which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we have examined the effect on Automatic Speech Recognition ...
  • Unsupervised Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis 

    Oura, Keiichiro; Tokuda, Keiichi; Yamagishi, Junichi; Wester, Mirjam; King, Simon (2010)
    In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user's spoken input in one language is used to produce spoken output in another language, while ...
  • Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection 

    Andersson, Sebastian; Georgila, Kallirroi; Traum, David; Aylett, Matthew; Clark, Robert A J (2010)
    Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read aloud speech. However, synthetic speech generated using neutral read aloud data lacks all the attitude, intention ...
  • A Digital Microphone Array for Distant Speech Recognition 

    Zwyssig, Erich; Lincoln, Mike; Renals, Steve (2010)
    In this paper, the design, implementation and testing of a digital microphone array is presented. The array uses digital MEMS microphones which integrate the microphone, amplifier and analogue-to-digital converter on a ...
  • Designing Usable and Acceptable Reminders for the Home 

    Wolters, Maria; McGee-Lennon, Marilyn (2010)
    Electronic reminders can play a key role in enabling people to manage their care and remain independent in their own homes for longer. The MultiMemoHome project aims to develop reminder designs that are accessible and ...
  • Native and Non-Native Speaker Judgements on the Quality of Synthesized Speech 

    Janska, Anna C.; Clark, Robert A J (2010)
    The difference between native speakers' and non-native speakers' naturalness judgements of synthetic speech is investigated. Similar/difference judgements are analysed via a multidimensional scaling analysis and compared ...
  • A classifier-based target cost for unit selection speech synthesis trained on perceptual data 

    Strom, Volker; King, Simon (2010)
    Our goal is to automatically learn a PERCEPTUALLY-optimal target cost function for a unit selection speech synthesiser. The approach we take here is to train a classifier on human perceptual judgements of synthetic speech. ...
  • Recognition and Understanding of Meetings 

    Renals, Steve (2010)
    This paper is about interpreting human communication in meetings using audio, video and other signals. Automatic meeting recognition and understanding is extremely challenging, since communication in a meeting is spontaneous ...
  • Learning Dialogue Strategies from Older and Younger Simulated Users 

    Georgila, Kallirroi; Wolters, Maria; Moore, Johanna D. (2010)
    Older adults are a challenging user group because their behaviour can be highly variable. To the best of our knowledge, this is the first study where dialogue strategies are learned and evaluated with both simulated younger ...
  • The role of higher-level linguistic features in HMM-based speech synthesis 

    Watts, Oliver; Yamagishi, Junichi; King, Simon (2010)
    We analyse the contribution of higher-level elements of the linguistic specification of a data-driven speech synthesiser to the naturalness of the synthetic speech which it generates. The system is trained using various ...
  • Personalising speech-to-speech translation in the EMIME project 

    Kurimo, Mikko; Byrne, William; Dines, John; Garner, Philip N.; Gibson, Matthew; Guan, Yong; Hirsimaki, Teemu; Karhila, Reima; King, Simon; Liang, Hui; Oura, Keiichiro; Saheer, Lakshmi; Shannon, Matt; Shiota, Sayaka; Tian, Jilei; Tokuda, Keiichi; Wester, Mirjam; Wu, Yi-Jian; Yamagishi, Junichi (2010)
    In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt ...
  • HMM-based Text-to-Articulatory-Movement Prediction and Analysis of Critical Articulators 

    Ling, Zhen-Hua; Richmond, Korin; Yamagishi, Junichi (2010)
    In this paper we present a method to predict the movement of a speaker's mouth from text input using hidden Markov models (HMM). We have used a corpus of human articulatory movements, recorded by electromagnetic articulography ...
  • Roles of the Average Voice in Speaker-adaptive HMM-based Speech Synthesis 

    Yamagishi, Junichi; Watts, Oliver; King, Simon; Usabaev, Bela (2010)
    In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for which the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from ...
  • Comparison of HMM and TMD Methods for Lip Synchronisation 

    Hofer, Gregor; Richmond, Korin (2010)
    This paper presents a comparison between a hidden Markov model (HMM) based method and a novel artificial neural network (ANN) based method for lip synchronisation. Both model types were trained on motion tracking data, and ...
  • Power Law Discounting for N-Gram Language Models 

    Huang, Songfang; Renals, Steve (2010)
    We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a computationally expensive approximate inference ...