Until recently, research efforts in automated Question Answering (QA) have mainly
focused on getting a good understanding of questions to retrieve correct answers. This
includes deep parsing, lookups in ontologies, question typing and machine learning
of answer patterns appropriate to question forms. In contrast, I have focused on the
analysis of the relationships between answer candidates as provided in open domain
QA on multiple documents. I argue that such candidates have intrinsic properties,
partly regardless of the question, and those properties can be exploited to provide better
quality and more user-oriented answers in QA.
Information fusion refers to the technique of merging pieces of information from
different sources. In QA over free text, it is motivated by the frequency with which
different answer candidates are found in different locations, leading to a multiplicity
of answers. The reason for such multiplicity is, in part, the massive amount of data
used for answering, and also its unstructured and heterogeneous content: Besides am¬
biguities in user questions leading to heterogeneity in extractions, systems have to deal
with redundancy, granularity and possible contradictory information. Hence the need
for answer candidate comparison. While frequency has proved to be a significant char¬
acteristic of a correct answer, I evaluate the value of other relationships characterizing
answer variability and redundancy.
Partially inspired by recent developments in multi-document summarization, I re¬
define the concept of "answer" within an engineering approach to QA based on the
Model-View-Controller (MVC) pattern of user interface design. An "answer model"
is a directed graph in which nodes correspond to entities projected from extractions
and edges convey relationships between such nodes. The graph represents the fusion
of information contained in the set of extractions. Different views of the answer model
can be produced, capturing the fact that the same answer can be expressed and pre¬
sented in various ways: picture, video, sound, written or spoken language, or a formal
data structure. Within this framework, an answer is a structured object contained in the
model and retrieved by a strategy to build a particular view depending on the end user
(or taskj's requirements.
I describe shallow techniques to compare entities and enrich the model by discovering four broad categories of relationships between entities in the model: equivalence,
inclusion, aggregation and alternative. Quantitatively, answer candidate modeling im¬
proves answer extraction accuracy. It also proves to be more robust to incorrect answer
candidates than traditional techniques. Qualitatively, models provide meta-information
encoded by relationships that allow shallow reasoning to help organize and generate
the final output.