Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/3914
| Title: | A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models |
| Authors: | Huang, Songfang; Renals, Steve |
| Issue Date: | 2009 |
| Journal Title: | Proc. Interspeech'09 |
| Abstract: | The Hierarchical Pitman-Yor Process Language Model (HPYLM) is a Bayesian language model based on a non-parametric prior, the Pitman-Yor process. It has been demonstrated, both theoretically and practically, that the HPYLM can provide better smoothing for language modeling than state-of-the-art approaches such as interpolated Kneser-Ney and modified Kneser-Ney smoothing. However, estimating Bayesian language models is expensive in both computation time and memory: the inference is approximate and requires a number of iterations to converge. In this paper, we present a parallel training algorithm for the HPYLM, which enables the approach to be applied to automatic speech recognition using large training corpora with large vocabularies. We demonstrate the effectiveness of the proposed algorithm by estimating language models from meeting-transcription corpora containing over 200 million words, and observe significant reductions in perplexity and word error rate. |
| URI: | http://hdl.handle.net/1842/3914 |
| Appears in Collections: | CSTR thesis and dissertation collection |