A fast method of speaker normalisation using formant estimation.
It has recently been shown that normalisation of vocal tract length can significantly increase recognition accuracy in speaker-independent automatic speech recognition systems. An inherent difficulty with this technique lies in automatically estimating the normalisation parameter from a new speaker's speech, and previous techniques have typically relied on an exhaustive search to estimate this parameter. In this paper, we present a method of normalising utterances by a linear warping of the mel filter bank channels, in which the normalisation parameter is estimated by fitting formant estimates to a probabilistic model. This method is fast and computationally inexpensive, and it requires only a limited amount of data for estimation. It generates normalisations that are close to those an exhaustive search would find. The normalisation is applied to a phoneme recognition task using the TIMIT database, and results show a useful improvement over an un-normalised speaker-independent system.
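The abstract does not specify the probabilistic model. The sketch below is therefore a hypothetical illustration of how one might replace an exhaustive search with a closed-form estimate, and it is not the authors' method. It assumes each formant k follows an independent Gaussian with reference mean mu_k and variance sigma_k^2, and that the speaker's normalised formants are alpha * F. Maximising the log-likelihood over alpha then gives alpha = sum(mu_k F / sigma_k^2) / sum(F^2 / sigma_k^2). The mel formula (2595 log10(1 + f/700)), the channel layout, and all function names (`estimate_warp`, `warp_centres`, and so on) are likewise assumptions made for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (assumed; the paper may use a different variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_centres(n_channels, f_low, f_high):
    # Centre frequencies (Hz) of n_channels triangular filters equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_channels + 2)
    return mel_to_hz(mels[1:-1])


def estimate_warp(formants, means, variances):
    """Closed-form ML warp factor under a hypothetical per-formant Gaussian model.

    formants:  (n_frames, n_formants) formant estimates in Hz; NaN marks a failed track.
    means, variances: (n_formants,) reference model parameters (assumed, not from the paper).
    """
    F = np.asarray(formants, dtype=float)
    valid = np.isfinite(F)
    F0 = np.where(valid, F, 0.0)  # missing estimates contribute nothing to either sum
    w = 1.0 / np.asarray(variances, dtype=float)  # inverse-variance weights per formant
    mu = np.asarray(means, dtype=float)
    num = np.sum(w * mu * F0)  # sum of mu_k * F / sigma_k^2 (broadcast over frames)
    den = np.sum(w * F0 ** 2)  # sum of F^2 / sigma_k^2
    return num / den


def warp_centres(centres_hz, alpha):
    # Linear warp: normalised frequency = alpha * original frequency, so a filter meant to
    # sit at c Hz in the normalised space must sample the original spectrum at c / alpha.
    return np.asarray(centres_hz, dtype=float) / alpha
```

For example, a speaker whose formants sit about 10% above the reference means should receive alpha close to 1/1.1. Once alpha is known, `warp_centres(mel_centres(24, 0.0, 8000.0), alpha)` places the filter bank for that speaker without searching over candidate warp factors. This reflects the speed advantage the abstract claims.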