



Files in This Item:
File: huang-icassp10.pdf (251.99 kB, Adobe PDF)
Title: Power Law Discounting for N-Gram Language Models
Authors: Huang, Songfang
Renals, Steve
Issue Date: 2010
Journal Title: Proc. IEEE ICASSP-10
Abstract: We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a computationally expensive approximate inference process. This approximation, which we term power law discounting, has a similar computational complexity to interpolated and modified Kneser-Ney smoothing. We performed experiments on meeting transcription using the NIST RT06s evaluation data and the AMI corpus, with a vocabulary of 50,000 words and a language model training set of up to 211 million words. Our results indicate that power law discounting results in statistically significant reductions in perplexity and word error rate compared to both interpolated and modified Kneser-Ney smoothing, while producing similar results to the hierarchical Pitman-Yor process language model.
URI: http://dx.doi.org/10.1109/ICASSP.2010.5495007
http://hdl.handle.net/1842/4553
Appears in Collections:CSTR publications
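Editor's note: the abstract describes power law discounting as a closed-form stand-in for the hierarchical Pitman-Yor process language model, with computational cost on the order of Kneser-Ney smoothing. The Python sketch below illustrates one plausible reading of that idea for a bigram model: the Pitman-Yor "table count" t(u, w) is approximated by the raw count raised to the discount power, c(u, w)**d, and the probability mass freed by discounting is interpolated with the lower-order distribution. The class name, the bigram restriction, and the exact backoff structure are illustrative assumptions, not taken from the paper; consult the full text (DOI above) for the exact formulation.

from collections import Counter, defaultdict

class PowerLawBigramLM:
    """Bigram LM with power-law discounting (illustrative sketch).

    Assumption: the hierarchical Pitman-Yor table count t(u, w) is
    approximated by c(u, w) ** d, so the discounted count becomes
    c - d * c**d and no sampling-based inference is required.
    """

    def __init__(self, tokens, d=0.8):
        assert 0.0 < d < 1.0
        self.d = d
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)
        self.N = sum(self.unigrams.values())
        # Per-context totals and reserved (backoff) mass: d * sum_w c(u, w)**d
        self.context_total = Counter()
        self.reserved = defaultdict(float)
        for (u, _w), c in self.bigrams.items():
            self.context_total[u] += c
            self.reserved[u] += d * c ** d
        self.uni_reserved = d * sum(c ** d for c in self.unigrams.values())

    def p_unigram(self, w):
        # Discounted unigram estimate, backing off to a uniform distribution.
        c = self.unigrams[w]
        discounted = max(c - self.d * c ** self.d, 0.0)
        return discounted / self.N + (self.uni_reserved / self.N) / self.vocab_size

    def p_bigram(self, w, u):
        cu = self.context_total[u]
        if cu == 0:  # unseen context: back off entirely to the unigram model
            return self.p_unigram(w)
        c = self.bigrams[(u, w)]
        discounted = max(c - self.d * c ** self.d, 0.0)
        return discounted / cu + (self.reserved[u] / cu) * self.p_unigram(w)

For example, lm = PowerLawBigramLM("the cat sat on the mat the cat ate".split()) followed by lm.p_bigram("cat", "the") returns a smoothed conditional probability. Because d * c**d <= c whenever 0 < d < 1 and c >= 1, the discounted counts stay non-negative and each conditional distribution sums to one over the training vocabulary.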
