Edinburgh Research Archive, The University of Edinburgh

Files in This Item:

File | Size | Format
eurosp97-lsa.pdf | 142.93 kB | Adobe PDF
(not listed) | (not listed) | Gzipped PostScript
Title: Document space models using latent semantic analysis.
Authors: Gotoh, Yoshihiko; Renals, Steve
Issue Date: 1997
Citation: In Proc. Eurospeech, pages 1443-1446, Rhodes, 1997.
Publisher: International Speech Communication Association
Abstract: In this paper, an approach for constructing mixture language models (LMs) based on a notion of semantics is discussed. To this end, a technique known as latent semantic analysis (LSA) is used. The approach encapsulates corpus-derived semantic information and is able to model the varying style of the text. Using this information, the corpus texts are clustered in an unsupervised manner and mixture LMs are automatically created. This work builds on previous work in the field of information retrieval, which was recently applied by Bellegarda et al. to the problem of clustering words into semantic categories. The principal contribution of this work is to characterize the document space resulting from LSA modeling and to demonstrate its use for mixture LMs. Manual and automatic clustering are compared in order to elucidate how semantic information is expressed in this space. It is shown that, by using semantic information, mixture LMs perform better than a conventional single LM, with only a slight increase in computational cost.
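The pipeline outlined in the abstract (term-document matrix, LSA projection into a document space, unsupervised clustering of documents, one LM per cluster, combined as a mixture) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several choices are assumptions made for brevity: raw term counts (LSA systems commonly apply a term weighting first), power iteration with deflation instead of a library SVD, cosine k-means with farthest-first initialization as the clustering method, add-one smoothed unigram models in place of n-gram LMs, and mixture weights fixed to cluster priors. All function names are hypothetical.

```python
import math
from collections import Counter


def term_doc_matrix(docs):
    """Term-by-document count matrix: rows are vocabulary terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    return vocab, [[c[w] for c in counts] for w in vocab]


def right_singular(A, k, iters=300):
    """Top-k right singular vectors/values of A, found by power iteration
    with deflation on the document-by-document matrix M = A^T A."""
    n = len(A[0])
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    vecs, sigmas = [], []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        vecs.append(v)
        sigmas.append(math.sqrt(max(lam, 0.0)))  # singular value = sqrt(eigenvalue)
        # Deflate so the next pass converges to the next-largest eigenvector.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs, sigmas


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def cluster(points, k, iters=10):
    """k-means under cosine similarity, with farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(min(points, key=lambda p: max(cosine(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def unigram(docs, vocab):
    """Add-one smoothed unigram LM estimated from one cluster's documents."""
    c = Counter(w for d in docs for w in d.split())
    total = sum(c.values()) + len(vocab)
    return {w: (c[w] + 1) / total for w in vocab}


def mixture_prob(word, lms, weights):
    """Mixture LM: P(w) = sum_k lambda_k * P_k(w)."""
    return sum(lam * lm[word] for lam, lm in zip(weights, lms))


docs = [
    "stock market price stock",
    "market price rise stock",
    "team goal match goal",
    "match team win",
]
vocab, A = term_doc_matrix(docs)
vecs, sigmas = right_singular(A, k=2)
# Document j in the LSA space: (sigma_1 * v_1[j], ..., sigma_k * v_k[j]).
doc_vecs = [[s * v[j] for s, v in zip(sigmas, vecs)] for j in range(len(docs))]
labels = cluster(doc_vecs, k=2)
print(labels)  # [0, 0, 1, 1]
lms = [unigram([d for d, lab in zip(docs, labels) if lab == c], vocab) for c in range(2)]
weights = [labels.count(c) / len(labels) for c in range(2)]  # assumption: cluster priors
print(mixture_prob("stock", lms, weights))
```

On this toy corpus the two finance documents and the two sports documents share no terms, so they fall on near-orthogonal directions of the LSA space and are separated without supervision. Power iteration keeps the sketch dependency-free; for realistic vocabularies a sparse truncated SVD would be the practical choice, and mixture weights would typically be estimated (for example by EM on held-out or recent text) rather than fixed.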
Appears in Collections: CSTR (Centre for Speech Technology Research) publications

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.


Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh 2013, and/or the original authors.