Head motion synthesis: evaluation and a template motion approach
Braude, David Adam
The use of conversational agents has increased across the world. From providing automated support for companies to acting as virtual psychologists, they have moved from an academic curiosity to an application with real-world relevance. While many researchers have focused on the content of the dialogue and on synthetic speech to give the agents a voice, animating these characters has more recently become a topic of interest. Character animation technology is also useful in the film and video game industries, where animating characters without expensive manual labour would save considerable cost. There are many aspects to consider when animating characters, for example the way they walk. However, to truly assist with communication, automated animation needs to reproduce the body language used when speaking. In particular, conversational agents are often an animation of only the upper parts of the body, so head motion is one of the keys to a believable agent. While certain linguistic features are obvious, such as nodding to indicate agreement, research has shown that head motion also aids understanding of speech. Head motion also often carries emotional cues, prosodic information, and other paralinguistic information. In this thesis we present our research into synthesising head motion using only recorded speech as input. During this research we collected a large dataset of head motion synchronised with speech, examined evaluation methodology, and developed a synthesis system. Our dataset is one of the larger ones available. From it we present some statistics about head motion in general, including differences between read speech and storytelling speech, and differences between speakers. From these we draw some conclusions about what type of source data will be the most interesting in head motion research, and whether speaker-dependent models are needed for synthesis.
In our examination of head motion evaluation methodology we introduce Forced Canonical Correlation Analysis (FCCA). FCCA distinguishes head-motion-shaped noise from motion capture better than the standard objective evaluation methods used in the literature. We show that, for subjective testing, it is best practice to use a variation of MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) testing, adapted for head motion. Through experimentation we have developed guidelines for implementing the test and for the constraints on its length. Finally, we present a new system for head motion synthesis. We make use of simple templates of motion, automatically extracted from source data, that are warped to suit the speech features. Our system uses clustering to pick the small motion units, and a combined HMM and GMM based approach to determine the values of the warping parameters at synthesis time. This results in highly natural-looking, believable motion that outperforms other state-of-the-art systems, and the system requires minimal human intervention. The key innovations were the new methods for segmenting head motion and the creation of a process similar to language modelling for synthesising head motion.
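As background for the objective evaluation mentioned above, the following is a minimal sketch of standard canonical correlation analysis between two synchronised feature sequences, for example speech features and head rotation angles. It is not the FCCA procedure introduced in the thesis, which this abstract does not specify. The function name `canonical_correlations`, the regularisation constant `reg`, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def canonical_correlations(x, y, reg=1e-8):
    """Standard CCA (not FCCA): canonical correlations between
    x of shape (n, p) and y of shape (n, q), in descending order.

    `reg` is a small ridge term (an assumption of this sketch) that keeps
    the covariance matrices positive definite.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]

    # Sample covariance and cross-covariance matrices.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    # Whiten both sides using Cholesky factors: C = L L^T.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)

    # m = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    m = np.linalg.solve(lx, cxy)
    m = np.linalg.solve(ly, m.T).T
    s = np.linalg.svd(m, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal((2000, 3))        # hypothetical speech features
    mixing = rng.standard_normal((3, 3))
    head = speech @ mixing + 0.01 * rng.standard_normal((2000, 3))  # strongly related
    noise = rng.standard_normal((2000, 3))         # unrelated motion-like signal

    print("related:  ", canonical_correlations(speech, head))
    print("unrelated:", canonical_correlations(speech, noise))
```

In this toy setup a motion signal that is a linear function of the speech features yields canonical correlations close to one, while an independent signal yields values close to zero.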
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International
Showing items related by title, author, creator and subject.
Peacock, Andrew M (University of Edinburgh; College of Science and Engineering; School of Engineering and Electronics, 2001-07) Motion Estimation is an important research field with many commercial applications including surveillance, navigation, robotics, and image compression. As a result, the field has received a great deal of attention and ...
Visualisation of the Lip Motion of Brass Instrument Players, and Investigations of an Artificial Mouth as a Tool for Comparative Studies of Instruments. Bromage, Seona (2007) When playing a brass instrument the lips of the player fulfil a similar role to the cane reeds of wood-wind instruments. The nature of the motion of this lip-reed determines the flow of air through the lips, between the ...
Hofer, Gregor; Shimodaira, Hiroshi; Yamagishi, Junichi (2007) Making human-like characters more natural and life-like requires more inventive approaches than current standard techniques such as synthesis using text features or triggers. In this poster we present a novel approach ...