

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/3914


File: sh_interspeech09.pdf (567.42 kB, Adobe PDF)
Title: A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models
Authors: Huang, Songfang; Renals, Steve
Issue Date: 2009
Journal Title: Proc. Interspeech'09
Abstract: The Hierarchical Pitman-Yor Process Language Model (HPYLM) is a Bayesian language model based on a non-parametric prior, the Pitman-Yor process. It has been demonstrated, both theoretically and practically, that the HPYLM can provide better smoothing for language modeling than state-of-the-art approaches such as interpolated Kneser-Ney and modified Kneser-Ney smoothing. However, estimation of Bayesian language models is expensive in terms of both computation time and memory; the inference is approximate and requires a number of iterations to converge. In this paper, we present a parallel training algorithm for the HPYLM, which enables the approach to be applied in the context of automatic speech recognition, using large training corpora with large vocabularies. We demonstrate the effectiveness of the proposed algorithm by estimating language models from corpora for meeting transcription containing over 200 million words, and observe significant reductions in perplexity and word error rate.
URI: http://hdl.handle.net/1842/3914
Appears in Collections: CSTR thesis and dissertation collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.
