Information Services banner Edinburgh Research Archive The University of Edinburgh crest

Edinburgh Research Archive >
Informatics, School of >
Informatics thesis and dissertation collection >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/3781

This item has been viewed 72 times in the last year. View Statistics

Files in This Item:

File Description SizeFormat
Byrne2009.pdfPhD thesis1.52 MBAdobe PDFView/Open
Byrne2009sourceFiles.zipFile not available for download2.92 MBUnknown
Title: Populating the semantic web: combining text and relational databases as RDF graphs
Authors: Byrne, Kate
Supervisor(s): Klein, Ewan
Grover, Claire
Issue Date: 2009
Publisher: The University of Edinburgh
Abstract: The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the SemanticWeb but there is a need to incorporate existing knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built. Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effectively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives. The next significant element of the work is an attempt to extract structure automatically from English text using natural language processing methods. The first claim made is that the semantic content of the text documents can be adequately captured as a set of binary relations forming a directed graph. It is shown that the data can then be grounded using existing domain thesauri, by building an upper ontology structure from these. A schema for cultural heritage data is proposed, intended to be generic for that domain and as compact as possible. Another hypothesis is that use of a graph will assist retrieval. The structure is uniform and very simple, and the graph can be queried even if the predicates (or edge labels) are unknown. Additional benefits of the graph structure are examined, such as using path length between nodes as a measure of relatedness (unavailable in a relational database where there is no equivalent concept of locality), and building information summaries by grouping the attributes of nodes that share predicates. These claims are tested by comparing queries across the original and the new data structures. The graph must be able to answer correctly queries that the original database dealt with, and should also demonstrate valid answers to queries that could not previously be answered or where the results were incomplete.
Sponsor(s): Economic and Social Research Council (ESRC)
Keywords: semantic web
relation extraction
ontology
RDB2RDF
TXT2RDF
URI: http://hdl.handle.net/1842/3781
Appears in Collections:Informatics thesis and dissertation collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Valid XHTML 1.0! Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh 2013, and/or the original authors. Privacy and Cookies Policy