Edinburgh Research Archive, The University of Edinburgh




Files in This Item:

File: rs00-preprint Gotoh.pdf
Size: 250.66 kB
Format: Adobe PDF
Title: Information extraction from broadcast news
Authors: Gotoh, Yoshihiko
Renals, Steve
Issue Date: 15-Apr-2000
Citation: Philosophical Transactions of the Royal Society of London: Series A, Vol. 358, No. 1769 (Apr. 15, 2000), pp. 1295-1310.
Publisher: The Royal Society
Abstract: This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite-state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word-word and class-class transitions explicitly. A common n-gram-based formulation is used for both models. The task of named-entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American broadcast news.
Keywords: named entity
information extraction
language modelling
ISSN: 1364-503X
Appears in Collections: CSTR publications

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.


Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh 2013, and/or the original authors.