Information Services banner Edinburgh Research Archive The University of Edinburgh crest

Edinburgh Research Archive >
Centre for Speech Technology Research >
CSTR publications >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/2130

This item has been viewed 9 times in the last year. View Statistics

Files in This Item:

File Description SizeFormat
Cetin_etal_ASRU2007.pdf54.98 kBAdobe PDFView/Open
Title: Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs
Authors: Çetin, Ozgur
Magimai-Doss, Mathew
Kantor, Arthur
King, Simon
Bartels, Chris
Frankel, Joe
Issue Date: Dec-2007
Citation: Ö. Çetin, M. Magimai-Doss, A. Kantor, S. King, C. Bartels, J. Frankel, and K. Livescu. Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs. In Proc. ASRU, Kyoto, December 2007. IEEE.
Publisher: IEEE
Abstract: In recent years, the features derived from posteriors of a multilayer perceptron (MLP), known as tandem features, have proven to be very effective for automatic speech recognition. Most tandem features to date have relied on MLPs trained for phone classification. We recently showed on a relatively small data set that MLPs trained for articulatory feature classification can be equally effective. In this paper, we provide a similar comparison using MLPs trained on a much larger data set - 2000 hours of English conversational telephone speech. We also explore how portable phone- and articulatory feature- based tandem features are in an entirely different language - Mandarin - without any retraining. We find that while phone-based features perform slightly better in the matched-language condition, they perform significantly better in the cross-language condition. Yet, in the cross-language condition, neither approach is as effective as the tandem features extracted from an MLP trained on a relatively small amount of in-domain data. Beyond feature concatenation, we also explore novel observation modelling schemes that allow for greater flexibility in combining the tandem and standard features at hidden Markov model (HMM) outputs.
Keywords: speech technology
Speech recognition
URI: http://hdl.handle.net/1842/2130
Appears in Collections:CSTR publications

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Valid XHTML 1.0! DSpace Software Copyright © 2002-2010  Duraspace - Feedback