|
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/4659
|
| Title: | Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech |
| Authors: | De Leon, P.L.; Pucher, M.; Yamagishi, Junichi |
| Issue Date: | 2010 |
| Journal Title: | Proc. Odyssey (The speaker and language recognition workshop) 2010 |
| Abstract: | In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker by adapting a background model, together with a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% EER. When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability of SV systems to synthetic speech. To detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers. |
| URI: | http://hdl.handle.net/1842/4659 |
| Appears in Collections: | CSTR publications |
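The abstract reports two kinds of figures: an equal error rate (EER) for the SV system and an acceptance rate for synthetic-speech claims. As a rough illustration of how such figures are commonly computed from verification scores, the sketch below sweeps a decision threshold over the observed scores and picks the point where the false acceptance and false rejection rates are closest. It is not the authors' evaluation code; the function names and the toy scores are hypothetical, and it assumes a higher score means stronger evidence for the claimed speaker.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when claims scoring >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # true speakers rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of claims (e.g. synthetic-speech claims) accepted at threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical toy scores, for illustration only (not data from the paper).
genuine_scores = [2.0, 3.0, 4.0, 5.0]
impostor_scores = [0.0, 1.0, 2.5, 1.5]
synthetic_scores = [2.4, 2.6, 3.1, 4.2]

eer, thr = equal_error_rate(genuine_scores, impostor_scores)
synthetic_accept = acceptance_rate(synthetic_scores, thr)
```

Fixing the threshold at the EER operating point and then scoring the synthetic claims against it is one plausible way to arrive at an acceptance figure like the one quoted in the abstract. The operating point actually used in the paper is not stated in this record.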