Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/1291
| Title: | Speech recognition using linear dynamic models. |
| Authors: | Frankel, Joe; King, Simon |
| Issue Date: | Jan-2007 |
| Citation: | J. Frankel and S. King. Speech recognition using linear dynamic models. IEEE Transactions on Speech and Audio Processing, 15(1):246-256, January 2007. |
| Publisher: | IEEE |
| Abstract: | The majority of automatic speech recognition (ASR)
systems rely on hidden Markov models, in which Gaussian
mixtures model the output distributions associated with subphone
states. This approach, whilst successful, models consecutive
feature vectors (augmented to include derivative information)
as statistically independent. Furthermore, spatial correlations
present in speech parameters are frequently ignored through
the use of diagonal covariance matrices. This paper continues
the work of Digalakis and others who proposed instead a first-order
linear state-space model which has the capacity to model
underlying dynamics, and furthermore gives a model of spatial
correlations. This paper examines the assumptions made in
applying such a model and shows that the addition of a hidden
dynamic state leads to increases in accuracy over otherwise
equivalent static models. We also propose a time-asynchronous
decoding strategy suited to recognition with segment models. We
describe implementation of decoding for linear dynamic models
and present TIMIT phone recognition results. |
| Keywords: | LDM; ASR; Stack decoding |
| URI: | http://hdl.handle.net/1842/1291 |
| Appears in Collections: | CSTR publications; Linguistics and English Language publications |
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.