Information Services banner Edinburgh Research Archive The University of Edinburgh crest

Edinburgh Research Archive >
Philosophy, Psychology and Language Sciences, School of >
Linguistics and English Language >
Linguistics and English Language PhD thesis collection >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/4912

This item has been viewed 29 times in the last year. View Statistics

Files in This Item:

File Description SizeFormat
Bell 2010 source.zipFile not available for download27.28 MBZip file
Bell 2010.pdfPhD thesis2.26 MBAdobe PDFView/Open
Title: Full Covariance Modelling for Speech Recognition
Authors: Bell, Peter
Supervisor(s): King, Simon
Issue Date: 2010
Publisher: The University of Edinburgh
Abstract: HMM-based systems for Automatic Speech Recognition typically model the acoustic features using mixtures of multivariate Gaussians. In this thesis, we consider the problem of learning a suitable covariance matrix for each Gaussian. A variety of schemes have been proposed for controlling the number of covariance parameters per Gaussian, and studies have shown that in general, the greater the number of parameters used in the models, the better the recognition performance. We therefore investigate systems with full covariance Gaussians. However, in this case, the obvious choice of parameters – given by the sample covariance matrix – leads to matrices that are poorly-conditioned, and do not generalise well to unseen test data. The problem is particularly acute when the amount of training data is limited. We propose two solutions to this problem: firstly, we impose the requirement that each matrix should take the form of a Gaussian graphical model, and introduce a method for learning the parameters and the model structure simultaneously. Secondly, we explain how an alternative estimator, the shrinkage estimator, is preferable to the standard maximum likelihood estimator, and derive formulae for the optimal shrinkage intensity within the context of a Gaussian mixture model. We show how this relates to the use of a diagonal covariance smoothing prior. We compare the effectiveness of these techniques to standard methods on a phone recognition task where the quantity of training data is artificially constrained. We then investigate the performance of the shrinkage estimator on a large-vocabulary conversational telephone speech recognition task. Discriminative training techniques can be used to compensate for the invalidity of the model correctness assumption underpinning maximum likelihood estimation. On the large-vocabulary task, we use discriminative training of the full covariance models and diagonal priors to yield improved recognition performance.
Keywords: Speech technology
Automatic Speech Recognition
URI: http://hdl.handle.net/1842/4912
Appears in Collections:Linguistics and English Language PhD thesis collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Valid XHTML 1.0! DSpace Software Copyright © 2002-2010  Duraspace - Feedback