Files in This Item:

File: asr2000.pdf
Size: 136.24 kB
Format: Adobe PDF
Title: Sentence Boundary Detection in Broadcast Speech Transcripts
Authors: Gotoh, Yoshihiko; Renals, Steve
Issue Date: Sep-2000
Citation: [ASR-2000] ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium, ISCA Tutorial and Research Workshop (ITRW), Paris, France, September 18-20, 2000. pp.228-235.
Publisher: International Speech Communication Association
Abstract: This paper presents an approach to identifying sentence boundaries in broadcast speech transcripts. We describe finite state models that extract sentence boundary information statistically from text and audio sources. An n-gram language model is constructed from a collection of British English news broadcasts and scripts. An alternative model is estimated from pause duration information in speech recogniser outputs aligned with their programme script counterparts. Experimental results show that the pause duration model alone outperforms the language modelling approach, and that combining the two models improves performance further, attaining precision and recall scores of over 70% for the task.
Appears in Collections: CSTR publications
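
Illustrative note: the abstract reports precision and recall for boundary detection after combining a language model with a pause duration model. The Python sketch below shows one way such scores could be combined and evaluated. It is not the authors' finite state method: the log-linear interpolation, the `weight` and `threshold` values, the function names, and the toy probabilities are all assumptions made for illustration.

```python
import math


def combine_scores(p_lm, p_pause, weight=0.5):
    """Log-linearly interpolate two boundary probabilities.

    Hypothetical combination rule, not taken from the paper.
    `weight` is the share given to the language model score.
    """
    eps = 1e-12  # guards against log(0)
    log_yes = weight * math.log(p_lm + eps) + (1 - weight) * math.log(p_pause + eps)
    log_no = (weight * math.log(1 - p_lm + eps)
              + (1 - weight) * math.log(1 - p_pause + eps))
    # Renormalise so the result is a probability in [0, 1].
    top = max(log_yes, log_no)
    yes = math.exp(log_yes - top)
    no = math.exp(log_no - top)
    return yes / (yes + no)


def detect_boundaries(lm_probs, pause_probs, weight=0.5, threshold=0.5):
    """Return the word positions whose combined score exceeds the threshold."""
    return [
        i
        for i, (p_lm, p_pause) in enumerate(zip(lm_probs, pause_probs))
        if combine_scores(p_lm, p_pause, weight) > threshold
    ]


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions against a reference."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy data (invented): per-position boundary probabilities from each model.
lm_probs = [0.2, 0.7, 0.1, 0.6]
pause_probs = [0.1, 0.8, 0.3, 0.9]
reference = [1, 2, 3]  # positions where a sentence really ends

predicted = detect_boundaries(lm_probs, pause_probs)
precision, recall = precision_recall(predicted, reference)
```

With this toy input, positions 1 and 3 are flagged, so every prediction is correct but one true boundary is missed. Precision measures how many proposed boundaries are right, while recall measures how many true boundaries were found.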

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.