Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/4912
Files in This Item:
| File | Description | Size | Format |
| Bell 2010 source.zip | File not available for download | 27.28 MB | Zip file |
| Bell 2010.pdf | PhD thesis | 2.26 MB | Adobe PDF |
| Title: | Full Covariance Modelling for Speech Recognition |
| Authors: | Bell, Peter |
| Supervisor(s): | King, Simon |
| Issue Date: | 2010 |
| Publisher: | The University of Edinburgh |
| Abstract: | HMM-based systems for Automatic Speech Recognition typically model
the acoustic features using mixtures of multivariate Gaussians. In this
thesis, we consider the problem of learning a suitable covariance matrix
for each Gaussian. A variety of schemes have been proposed for
controlling the number of covariance parameters per Gaussian, and
studies have shown that in general, the greater the number of parameters
used in the models, the better the recognition performance. We
therefore investigate systems with full covariance Gaussians. However,
in this case, the obvious choice of parameters – given by the sample
covariance matrix – leads to matrices that are poorly conditioned and
do not generalise well to unseen test data. The problem is particularly
acute when the amount of training data is limited.
We propose two solutions to this problem. Firstly, we impose the requirement
that each matrix should take the form of a Gaussian graphical
model, and introduce a method for learning the parameters and
the model structure simultaneously. Secondly, we explain how an
alternative estimator, the shrinkage estimator, is preferable to the
standard maximum likelihood estimator, and derive formulae for the
optimal shrinkage intensity within the context of a Gaussian mixture
model. We show how this relates to the use of a diagonal covariance
smoothing prior.
We compare the effectiveness of these techniques to standard methods
on a phone recognition task where the quantity of training data is
artificially constrained. We then investigate the performance of the
shrinkage estimator on a large-vocabulary conversational telephone
speech recognition task.
Discriminative training techniques can be used to compensate for the
invalidity of the model correctness assumption underpinning maximum
likelihood estimation. On the large-vocabulary task, we use discriminative
training of the full covariance models and diagonal priors
to yield improved recognition performance. |
| Keywords: | Speech technology; Automatic Speech Recognition |
| URI: | http://hdl.handle.net/1842/4912 |
| Appears in Collections: | Linguistics and English Language PhD thesis collection |
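The abstract's point about poorly conditioned sample covariance matrices and shrinkage can be illustrated with a small numerical sketch. It is not taken from the thesis: the function names, the data sizes (39 dimensions, 60 frames), the use of the matrix's own diagonal as the shrinkage target, and the fixed intensity of 0.5 are all assumptions made for illustration. The thesis derives formulae for the optimal shrinkage intensity, and those are not reproduced here.

```python
import numpy as np


def sample_covariance(x):
    """Maximum likelihood covariance (divides by n) of the rows of x."""
    centred = x - x.mean(axis=0)
    return centred.T @ centred / x.shape[0]


def shrink_to_diagonal(s, intensity):
    """Convex combination of s and its own diagonal.

    intensity = 0 returns s unchanged; intensity = 1 returns diag(s).
    The value is chosen by hand here (an assumption), not estimated.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    target = np.diag(np.diag(s))
    return (1.0 - intensity) * s + intensity * target


# Hypothetical setting: few frames relative to the feature dimension,
# mimicking a Gaussian that has been assigned little training data.
rng = np.random.default_rng(0)
dim, n_frames = 39, 60
x = rng.standard_normal((n_frames, dim))

s = sample_covariance(x)
shrunk = shrink_to_diagonal(s, 0.5)

print("condition number, sample covariance:", np.linalg.cond(s))
print("condition number, shrunk estimate:  ", np.linalg.cond(shrunk))
```

Shrinking towards the diagonal leaves the per-dimension variances untouched and scales down only the off-diagonal terms, so the condition number of the shrunk estimate cannot exceed that of the sample covariance matrix.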
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.