Please use this identifier to cite or link to this item:
http://hdl.handle.net/1842/5765
Files in This Item:
| File | Description | Size | Format |
| Coco2011.pdf | one year restriction | 28.01 MB | Adobe PDF |
| Title: | Coordination of vision and language in cross-modal referential processing |
| Authors: | Coco, Moreno Ignazio |
| Supervisor(s): | Keller, Frank |
| Issue Date: | 24-Nov-2011 |
| Publisher: | The University of Edinburgh |
| Abstract: | This thesis investigates the mechanisms underlying the formation, maintenance,
and sharing of reference in tasks in which language and vision
interact. Previous research in psycholinguistics and visual cognition has
provided insights into the formation of reference in cross-modal tasks. However,
the two strands have reached their conclusions largely independently, focusing on
mechanisms pertaining to either linguistic or visual processing.
In this thesis, we present a series of eye-tracking experiments that aim to
unify these distinct strands of research by identifying and quantifying factors
that underlie the cross-modal interaction between scene understanding
and sentence processing. Our results show that both low-level (image-based)
and high-level (object-based) visual information interacts actively
with linguistic information during situated language processing tasks. In
particular, during language understanding (Chapter 3), image-based information,
i.e., saliency, is used to predict the upcoming arguments of the
sentence when the linguistic material alone is not sufficient to make such
predictions.
During language production (Chapter 4), visual attention plays an active
role in sourcing referential information for sentence encoding. We show
that two important factors influencing this process are the visual density
of the scene, i.e., clutter, and the animacy of the objects described. Both
factors influence the type of linguistic encoding observed and the associated
visual responses. We uncover a close relationship between linguistic
descriptions and visual responses, triggered by the cross-modal interaction
of scene and object properties, which implies a general mechanism
of cross-modal referential coordination. Further investigation (Chapter 5)
shows that visual attention and sentence processing are closely coordinated
during sentence production: similar sentences are associated with
similar scan patterns. This finding holds across different scenes, which
suggests that coordination goes beyond the well-known scene-based effects
guiding visual attention, again supporting the existence of a general
mechanism for the cross-modal coordination of referential information.
The extent to which cross-modal mechanisms are activated depends on the
nature of the task performed. We compare the three tasks of visual search,
object naming, and scene description (Chapter 6) and explore how the
modulation of cross-modal reference is reflected in the visual responses of
participants. Our results show that the cross-modal coordination required
in naming and description triggers longer visual processing and higher
scan-pattern similarity than search does. This difference is due to the coordination
required to integrate and organize visual and linguistic referential
processing.
Overall, this thesis unifies explanations of distinct cognitive processes (visual
and linguistic) based on the principle of cross-modal referentiality,
and provides a new framework for unraveling the mechanisms that allow
scene understanding and sentence processing to share and integrate information
during cross-modal processing. |
| Sponsor(s): | Engineering and Physical Sciences Research Council (EPSRC) |
| Keywords: | eye-tracking, coordination, scene understanding |
| URI: | http://hdl.handle.net/1842/5765 |
| Appears in Collections: | Informatics thesis and dissertation collection |
Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.