Edinburgh Research Archive, The University of Edinburgh

Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/5020

Files in This Item:

File: Mungall2011.pdf
Size: 5.96 MB
Format: Adobe PDF
Title: Next-generation information systems for genomics
Authors: Mungall, Christopher
Supervisor(s): Tyers, Mike
Issue Date: 27-Jun-2011
Publisher: The University of Edinburgh
Abstract: The advent of next-generation sequencing technologies is transforming biology by enabling individual researchers to sequence the genomes of individual organisms or cells on a massive scale. To realize the translational potential of this technology, we will need advanced information systems to integrate and interpret this deluge of data. These systems must be capable of extracting the location and function of genes and biological features from genomic data, which requires the coordinated parallel execution of multiple bioinformatics analyses and intelligent synthesis of the results. The resulting databases must be structured to allow complex biological knowledge to be recorded in a computable way, which requires the development of logic-based knowledge structures called ontologies. To visualise and manipulate the results, new graphical interfaces and knowledge acquisition tools are required. Finally, to help understand complex disease processes, these information systems must be able to integrate and make inferences over multiple data sets derived from numerous sources.

RESULTS: Here I describe the research, design and implementation of some of the components of such a next-generation information system. I first describe the automated pipeline system used for the annotation of the Drosophila genome, and the application of this system in genomic research. This was succeeded by the development of a flexible graph-oriented database system called Chado, which relies on the use of ontologies for structuring data and knowledge. I also describe research to develop, restructure and enhance a number of biological ontologies, adding a layer of logical semantics that increases the computability of these key knowledge sources. The resulting database and ontology collection can be accessed through a suite of tools. Finally, I describe how genome analysis, ontology-based database representation and powerful tools can be combined to make inferences about genotype-phenotype relationships within and across species.

CONCLUSION: The large volumes of complex data generated by high-throughput genomic and systems biology technology threaten to overwhelm us unless we can devise better computing tools to assist us with its analysis. Ontologies are key technologies, but many existing ontologies are not interoperable or lack features that make them computable. Here I have shown how concerted ontology, tool and database development can be applied to make inferences of value to translational research.
Sponsor(s): Howard Hughes Medical Institute
National Institutes of Health NIH Grant HG00739 to FlyBase (W.M.Gelbart)
roadmap initiative grant U54 HG004028 from the NIH National Human Genome Research Institute (P41 grant 5P41HG002273-08 to Gene Ontology Consortium).
UK BBSRC, 1996-2003 PAGA and GAIT Bioinformatics grants.
Keywords: genomic data
bioinformatics
genome analysis
ontology
URI: http://hdl.handle.net/1842/5020
Appears in Collections: Biological Sciences thesis and dissertation collection

This item is licensed under a Creative Commons License

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
