Genome-scale transcriptomic and epigenomic analysis of stem cells
Embryonic stem cells (ESCs) are a special type of cell marked by two key properties: the capacity to create an unlimited number of identical copies of themselves (self-renewal) and the ability to give rise to differentiated progeny that can contribute to all tissues of the adult body (pluripotency). Decades of research have identified many of the genetic determinants of the state of these cells, such as the transcription factors Pou5f1, Sox2 and Nanog. Many other transcription factors and, more recently, epigenetic determinants such as histone modifications have been implicated in the establishment, maintenance and loss of pluripotent stem cell identity. The study of these regulators has been boosted by technological advances in high-throughput sequencing (HTS) that have made it possible to investigate the binding and modification of many proteins on a genome-wide level, resulting in an explosion in the amount of genomic data available to researchers. The challenge now is to use these data effectively and to integrate the manifold measurements into coherent and intelligible models that help us to better understand how gene expression in stem cells is regulated to maintain their precarious identity. In this thesis, I first explore the potential of HTS by describing two pilot studies that use the technology to investigate global differences in the transcriptional profiles of different cell populations. In both cases, I was able to identify a number of promising candidates that mark and, possibly, explain the phenotypic and functional differences between the cells studied. The pilot studies highlighted a strong requirement for specialised software to deal with the analysis of HTS data. To meet this requirement, I have developed GeneProf, a powerful computational framework for the integrated analysis of functional genomics experiments.
This software platform solves many recurring data analysis challenges and streamlines, simplifies and standardises data analysis workflows, promoting transparent and reproducible methodologies. The software offers a graphical, user-friendly interface and integrates expert knowledge to guide researchers through the analysis process. All primary analysis results are supplemented with a range of informative plots and summaries that ease the interpretation of the results. Behind the scenes, computationally demanding tasks are handled remotely on a distributed network of high-performance computers, removing rate-limiting requirements on local hardware set-up. A flexible and modular software design lays the foundations for a scalable and extensible framework that will be expanded to address an even wider range of data analysis tasks in future. Using GeneProf, billions of data points from over a hundred published studies have been re-analysed. The results of these analyses are stored in a web-accessible database as part of the GeneProf system, building up an accessible resource for all life scientists. All results, together with details about the analysis procedures used, can be browsed and examined in detail, and all final and intermediate results are available and can instantly be reused and compared with new findings. In an attempt to elucidate the regulatory mechanisms of ESCs, I use this knowledge base to identify high-confidence candidate genes relevant to stem cell characteristics by comparing the transcriptional profiles of ESCs with those of other cell types. In doing so, I describe 229 genes with highly ESC-specific transcription. I then integrate the expression data for these ESC-specific genes with genome-wide transcription factor binding and histone modification data.
After investigating the global characteristics of these "regulatory inputs", I employ machine learning methods, first to cluster subgroups of genes with ESC-specific expression patterns and then to define a "regulatory code" that marks one of the subgroups based on its regulatory signature. The tightly co-regulated core cluster of genes identified in this analysis contains many known members of the transcriptional circuitry of ESCs and a number of novel candidates that I deem worthy of further investigation thanks to their similarity to their better-known counterparts. Integrating these candidates and the regulatory code that drives them into our models of the workings of ESCs might eventually help to refine the ways in which we derive, culture and manipulate these cells, with all the prospective benefits this holds for research and medicine.