Investigating Non-Uniqueness in the Acoustic-Articulatory Inversion Mapping
The task of inferring articulatory configurations from a given acoustic signal is a problem for which a reliable and accurate solution has been lacking for decades. The changing shape of the vocal tract alters the acoustic parameters of the resulting sound, and each configuration of the articulators regularly produces a single distinct sound (a unique mapping from articulation to acoustics). It should therefore be possible to take an acoustic signal and invert the process, recovering the exact vocal-tract shape for a given sound. This would have far-reaching applications in speech and language technology, such as improving facial animation and speech recognition systems. Vocal-tract information inferred from the acoustic signal can also support a richer understanding of the actual constraints on articulator movement. However, research on the inversion mapping has revealed that the mapping from the acoustic domain to the articulatory domain is often multi-valued. Work on identifying and resolving this non-uniqueness has so far been somewhat successful, with Mixture Density Networks (MDNs) and articulator trajectory systems offering probabilistic methods for finding the most likely articulatory configuration for a given signal. Using a subset of an EMA (electromagnetic articulography) corpus, together with an instantaneous inversion mapping and a non-parametric clustering algorithm, I aim to quantify the extent to which acoustic vectors similar to a given phone can exhibit qualitatively different vocal-tract shapes. Categorically identifying acoustically similar sounds that exhibit a multi-valued mapping in the articulatory domain, as well as determining which articulators this occurs for, could be key to improving the reliability and quality of the inversion mapping.
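The core measurement described above can be sketched in miniature: gather the articulatory vectors of frames that are acoustically close to a query, then cluster them; more than one cluster signals a multi-valued mapping. This is a minimal illustrative sketch, not the thesis's actual method; the toy frame data, the radius and `eps` thresholds, and the greedy single-linkage grouping (a stand-in for a proper non-parametric clustering algorithm) are all assumptions.

```python
# Hypothetical sketch: detecting a multi-valued acoustic-to-articulatory
# mapping. The data and thresholds below are illustrative assumptions.
from math import dist

def acoustic_neighbours(frames, query, radius):
    """Articulatory vectors of frames whose acoustics lie within `radius` of the query."""
    return [art for ac, art in frames if dist(ac, query) <= radius]

def cluster(points, eps):
    """Greedy single-linkage grouping: a point within `eps` of any cluster
    member joins that cluster (a stand-in for a non-parametric algorithm)."""
    clusters = []
    for p in points:
        for c in clusters:
            if any(dist(p, q) <= eps for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

# Toy EMA-style data: near-identical acoustics, two distinct articulator shapes.
frames = [
    ((1.0, 2.0), (0.10, 0.50)),   # (acoustic vector, articulatory vector)
    ((1.0, 2.1), (0.12, 0.52)),
    ((0.9, 2.0), (0.90, 0.10)),   # similar sound, different configuration
    ((1.1, 2.0), (0.88, 0.12)),
]
arts = acoustic_neighbours(frames, query=(1.0, 2.0), radius=0.2)
groups = cluster(arts, eps=0.1)
print(len(groups))  # more than one group signals non-uniqueness
```

In real data the acoustic vectors would be spectral features (e.g. MFCCs) and the articulatory vectors EMA coil positions, with per-articulator clustering revealing which articulators drive the non-uniqueness.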