



Files in This Item:

File              Size     Format
Makalias2010.pdf  4.11 MB  Adobe PDF
Title: Exploring a striped XML world
Authors: Makalias, Savvas
Supervisor(s): Viglas, Stratis
Issue Date: 2010
Publisher: The University of Edinburgh
Abstract: The Extensible Markup Language (XML) was designed as a markup language for structuring, storing and transporting data on the World Wide Web. The focus of XML is on data content; arbitrary markup is used to describe data. This versatile, self-describing data representation has established XML as the universal data format and the de facto standard for information exchange on the Web. This has gradually given rise to the need for efficient storage and querying of large XML repositories.

To that end, we propose a new model for building a native XML store, based on a generalisation of vertical decomposition. Nodes of a document that satisfy the same label-path are extracted and stored together in a single container, a Stripe. Stripes make use of a labelling scheme that allows us to maintain full structural information. Over this new representation, we introduce various evaluation techniques, which allow us to handle a large fragment of XPath 2.0.

We also focus on the optimisation opportunities that arise from our decomposition model during each query evaluation phase. During query validation, we present an input minimisation process that exploits the proposed model to identify, in terms of Stripes, only the input that is relevant to the given query. We also define query equivalence rules for query rewriting over our proposed model. Finally, during query optimisation, we examine whether and under which circumstances certain evaluation algorithms can be replaced by others with lower I/O and/or CPU cost.

We propose three storage schemes under our general decomposition technique. The schemes differ in the compression method imposed on the structural part of the XML document. The first storage scheme imposes no compression. The second storage scheme exploits structural regularities of the document to minimise storage and, thus, I/O cost during query evaluation. Finally, the third storage scheme performs structure-agnostic compression of the document structure, which results in minimised storage regardless of the actual XML structure.

We experiment on XML repositories of varying size, recursion and structural regularity. We consider query input size, execution plan size and query response time as metrics for our experimental results. We process query workloads by applying each of the proposed optimisations in isolation and then all of their combinations. In addition, we apply the same execution pipeline for all proposed storage schemes. As a reference point for our proposed query evaluation pipeline, we use the current state-of-the-art system for XML query processing. Our results demonstrate that:

• Our proposed data model provides the infrastructure for efficiently selecting the parts of the document that are relevant to a given query.
• The application of query rewriting, combined with input minimisation, reduces query input size as well as the number of physical operators used. In addition, when evaluation algorithms are specialised to the decomposition method, query response time is further reduced.
• Query evaluation performance is largely affected by the storage schemes, which are closely related to the structural properties of the data. The achieved compression ratio greatly affects storage size and, therefore, query response times.
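
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the labelling scheme, so Dewey-style labels (tuples of child positions) are assumed here, and the names `decompose`, `relevant_stripes` and `is_ancestor`, as well as the sample document, are hypothetical. The sketch groups element nodes by label-path, selects only the containers that a simple child-axis path needs, and uses the labels to recover ancestor relationships.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
DOC = ("<lib><book><title>A</title><author>X</author></book>"
       "<book><title>B</title></book></lib>")


def decompose(xml_text):
    """Group element nodes by label-path, one container per path.

    Each entry is (label, text). The label is a Dewey-style tuple of
    child positions; this scheme is an assumption, not the thesis's.
    """
    stripes = defaultdict(list)

    def visit(node, parent_path, label):
        path = f"{parent_path}/{node.tag}"
        stripes[path].append((label, (node.text or "").strip()))
        for position, child in enumerate(node, start=1):
            visit(child, path, label + (position,))

    visit(ET.fromstring(xml_text), "", (1,))
    return dict(stripes)


def relevant_stripes(stripes, query_path):
    """Keep only the container matching a simple child-axis path."""
    return {p: s for p, s in stripes.items() if p == query_path}


def is_ancestor(a, b):
    """True if label a is a proper prefix of label b."""
    return len(a) < len(b) and b[:len(a)] == a


stripes = decompose(DOC)
titles = relevant_stripes(stripes, "/lib/book/title")

# Structural check using labels only: titles of books that have an author.
books_with_author = [
    book for book, _ in stripes["/lib/book"]
    if any(is_ancestor(book, a) for a, _ in stripes["/lib/book/author"])
]
titles_with_author = [
    text for label, text in stripes["/lib/book/title"]
    if any(is_ancestor(book, label) for book in books_with_author)
]
```

In this sketch a query over `/lib/book/title` reads one container rather than the whole document, and the prefix test on labels stands in for the structural information that the thesis's labelling scheme maintains.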
Keywords: XML; native store
Appears in Collections: Informatics thesis and dissertation collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.


Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh 2013, and/or the original authors.