Please use this identifier to cite or link to this item: http://hdl.handle.net/1842/3869

Files in This Item:

File | Description | Size | Format
Edwards2009.pdf | MPhil thesis | 4.3 MB | Adobe PDF
Edwards2009.doc | File not available for download | 5.69 MB | Microsoft Word
MilkMine tutorial.doc | additional tutorial | 11.13 MB | Microsoft Word
Title: MilkMine: text-mining, milk proteins and hypothesis generation
Authors: Edwards, Stephen
Supervisor(s): Sawyer, Lindsay
Webber, Bonnie
Issue Date: 2009
Publisher: The University of Edinburgh
Abstract: The vast and increasing volume of biological data can make it a struggle for scientists to keep up to date with the latest research, and as a consequence they may miss significant biological links, particularly those that extend outwith their own area of expertise. MilkMine is an attempt to provide a single informatics resource to help milk protein scientists mine this information mountain more effectively, by integrating standard experimental data types with data generated by emerging text-mining techniques. A method was initially developed to identify milk-related terminology from peer-reviewed biological literature, and this was used to complement the Unified Medical Language System (UMLS), a large thesaurus of biological concepts, their variant names and their types. The resultant enriched ontology was then mapped to the free text of peer-reviewed biological literature using the MMTx program, producing a database of semantically enriched sentences. A co-occurrence relation extraction algorithm was written to identify relationships between milk proteins and peptides, and other biological concepts, such as diseases or biological processes. Using these literature relation sets, new hypotheses can be generated using the basic principle that if “A is linked to B”, and if “B is linked to C”, then we can infer an association between A and C. Filtering and downstream processing of the many generated relationships promotes significant interactions. These literature relations and hypotheses are integrated with biological data into the MilkMine database. The MilkMine database is built on a generic data warehousing system, InterMine. This tool enabled the integration of traditional data types, such as protein sequence or structural data, from a variety of sources (e.g. UniProt). However, the standard InterMine model was also extended by the author to include other data sources (e.g. the Protein Data Bank) and to incorporate the output of the text-mining algorithm. This integration of otherwise disparate information allows more complex querying of the data, across many data types. For example, protein sequences are mapped to instances of the names, synonyms or symbols of the protein in text; therefore, a raw fragment of amino acid sequence (e.g. a particular binding region) can be used to search the MilkMine database for literature information as well as the interactions and hypotheses of those proteins that contain the sequence. The MilkMine resource is accessible online (www.bioinformatics.ed.ac.uk/milkmine) through a professional-level query interface offering many features such as an interactive query builder, standard ready-to-run queries, bulk downloads and the ability to store user preferences and query histories. Evaluation of MilkMine showed that the text-mining algorithm, as well as the data integration, could provide the user with interesting connections for further study.
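Illustrative note: the "A is linked to B, B is linked to C" principle described in the abstract can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the thesis's actual algorithm: it assumes each sentence has already been reduced to a list of concept names, treats any two concepts in the same sentence as linked, and applies none of the filtering the abstract mentions. The function names and the placeholder concepts (protein_A, concept_B, disease_C, concept_D) are hypothetical and carry no biological meaning.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_relations(sentences, min_count=1):
    """Count unordered concept pairs that appear in the same sentence."""
    counts = defaultdict(int)
    for concepts in sentences:
        # De-duplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def infer_hypotheses(relations, source):
    """Return {C: [B, ...]} for concepts C reachable from `source` only via some B."""
    neighbours = defaultdict(set)
    for a, b in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    direct = neighbours[source]
    hypotheses = defaultdict(list)
    for b in sorted(direct):
        for c in sorted(neighbours[b]):
            # Keep C only if it is not the source and has no direct link to it.
            if c != source and c not in direct:
                hypotheses[c].append(b)
    return dict(hypotheses)


# Placeholder input (hypothetical): each inner list is one sentence's concepts.
sentences = [
    ["protein_A", "concept_B"],
    ["concept_B", "disease_C"],
    ["protein_A", "concept_D"],
]
relations = cooccurrence_relations(sentences)
hypotheses = infer_hypotheses(relations, "protein_A")
print(hypotheses)
```

Here protein_A and disease_C never share a sentence, so the only link between them is the intermediate concept_B; in a real setting the candidate pairs would then need ranking and filtering before being treated as hypotheses.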
Sponsor(s): Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords: text-mining
informatics
milk protein
milk peptide
URI: http://hdl.handle.net/1842/3869
Appears in Collections:Biological Sciences thesis and dissertation collection

Items in ERA are protected by copyright, with all rights reserved, unless otherwise indicated.
