Functional inferences over heterogeneous data
Nuamah, Kwabena Amoako
MetadataShow full item record
Inference enables an agent to create new knowledge from old or discover implicit relationships between concepts in a knowledge base (KB), provided that appropriate techniques are employed to deal with ambiguous, incomplete and sometimes erroneous data. The ever-increasing volumes of KBs on the web, available for use by automated systems, present an opportunity to leverage the available knowledge in order to improve the inference process in automated query answering systems. This thesis focuses on the FRANK (Functional Reasoning for Acquiring Novel Knowledge) framework that responds to queries where no suitable answer is readily contained in any available data source, using a variety of inference operations. Most question answering and information retrieval systems assume that answers to queries are stored in some form in the KB, thereby limiting the range of answers they can find. We take an approach motivated by rich forms of inference using techniques, such as regression, for prediction. For instance, FRANK can answer “what country in Europe will have the largest population in 2021?" by decomposing Europe geo-spatially, using regression on country population for past years and selecting the country with the largest predicted value. Our technique, which we refer to as Rich Inference, combines heuristics, logic and statistical methods to infer novel answers to queries. It also determines what facts are needed for inference, searches for them, and then integrates the diverse facts and their formalisms into a local query-specific inference tree. Our primary contribution in this thesis is the inference algorithm on which FRANK works. This includes (1) the process of recursively decomposing queries in way that allows variables in the query to be instantiated by facts in KBs; (2) the use of aggregate functions to perform arithmetic and statistical operations (e.g. prediction) to infer new values from child nodes; and (3) the estimation and propagation of uncertainty values into the returned answer based on errors introduced by noise in the KBs or errors introduced by aggregate functions. We also discuss many of the core concepts and modules that constitute FRANK. We explain the internal “alist” representation of FRANK that gives it the required flexibility to tackle different kinds of problems with minimal changes to its internal representation. We discuss the grammar for a simple query language that allows users to express queries in a formal way, such that we avoid the complexities of natural language queries, a problem that falls outside the scope of this thesis. We evaluate the framework with datasets from open sources.