Identification of risk factors for major depressive disorder
For complex traits, population genetic studies ask to what extent genetic and environmental variation influence, determine and predict phenotypic variation. More specifically, researchers ask two questions. First, how much of the phenotypic variation is genetic in origin? Second, once a genetic component of a trait has been established, by what mechanisms do the causal variants contribute to the genetic variation that affects the phenotype? Previous studies have indicated a polygenic architecture for many complex traits, meaning that the genetic variation in those traits results from the cumulative effect of hundreds or even thousands of genetic variants. To further decipher the polygenic architecture of a complex trait, genetic studies aim to identify the number, genomic location and effect-size distribution of the causal variants, as well as their individual and interacting effects. Linkage analysis and genome-wide association studies (GWAS), based either on single variants or on sets of variants grouped by functional annotation, can be applied to map potentially causal variants in the genome. The identification of disease-associated loci, however, is only the starting point in identifying causal variants. Causal variants are usually difficult to distinguish from the large number of variants in linkage disequilibrium (LD) with them within the associated loci, and may be in incomplete LD with genotyped variants. Computational prediction integrated with multi-level ‘omic’ data helps to prioritize candidate causal variants, which then become important targets for experimental validation (Chapter 1). Major depressive disorder (MDD) is a complex trait and the second-largest contributor to the global burden of disease.
Previous studies have suggested both genetic and environmental components for this disorder, although a clear partitioning of their contributions and the identification of the major contributing components have yet to be achieved. In efforts to map causal genetic variants, genome-wide association studies of MDD have so far identified few significant associations. The polygenic architecture of MDD, combined with its widespread clinical and genetic heterogeneity between populations, may impede the identification of causal variants (Chapter 2). In this thesis, I present three studies. The first estimated the proportions of phenotypic variation that are genetic or familial-environmental in origin for two definitions of depression (Chapter 3); the other two used distinct (non-GWAS) methods to identify candidate causal genetic variants for MDD (Chapters 4 and 5). In Chapter 3, a variance component analysis was applied to GS:SFHS (Generation Scotland: Scottish Family Health Study) to investigate the relative genetic and environmental contributions to diagnosed major depressive disorder (MDD) and self-declared depression (SDD). Models for MDD and SDD that simultaneously included genetic and environmental effects suggested that estimates of narrow-sense heritability could be inflated by the environment shared by nuclear family members. The most parsimonious models selected for both MDD and SDD included SNP-associated and pedigree-associated genetic effects and the effect of the common environment of couples. In Chapter 4, I integrated pathway analysis and multi-level regional heritability analyses in a pipeline designed to identify MDD-associated pathways. The pipeline was applied to two independent GWAS samples (GS:SFHS and PGC1-MDD). The NETRIN1 signalling pathway showed the most consistent association with MDD across the two samples.
Polygenic risk scores (PRSs) derived from this pathway showed better predictive accuracy than whole-genome PRSs when evaluated using AUC statistics, logistic regression and a linear mixed model. In Chapter 5, genome-wide haplotype-block-based regional heritability mapping (HRHM) was applied to identify haplotype blocks that contribute significantly to MDD. A haplotype block spanning a 24 kb region within the TOX2 gene reached genome-wide significance in GS:SFHS. Single-SNP and haplotype-based association tests were used to localize the association signal within the region identified by HRHM, and demonstrated that five of the nine genotyped SNPs and two haplotypes were significantly associated with MDD. The results were replicated in the UK-Ireland group of PGC2-MDD. Brain expression of TOX2 and of the brain-specific lncRNA RP1-269M15.3 was also significantly regulated by MDD-associated SNPs within the identified haplotype block. Together, the three studies highlight the value of applying multiple population genetic and bioinformatic methods to multiple family-based and population-based cohorts in identifying risk factors for MDD.
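
As an illustration of the PRS comparison summarized above, the following is a minimal sketch, not the pipeline used in the thesis. It assumes a PRS is computed as a weighted sum of risk-allele dosages and that predictive accuracy is summarized by the AUC, here in its Mann-Whitney form. The function names, genotypes, weights and case-control labels are all hypothetical and invented for illustration; the toy numbers carry no evidential weight.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2) across SNPs."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data, invented for illustration: 4 individuals genotyped at 3 SNPs.
genotypes = [
    [2, 0, 1],
    [1, 2, 0],
    [0, 2, 1],
    [0, 1, 0],
]
status = [1, 1, 0, 0]  # 1 = case, 0 = control

# Hypothetical per-SNP weights (e.g. log odds ratios from a discovery GWAS).
whole_genome_weights = [0.1, 0.2, 0.1]
# Hypothetical pathway PRS: only the first SNP lies in the pathway.
pathway_weights = [0.1, 0.0, 0.0]

wg_scores = [polygenic_risk_score(g, whole_genome_weights) for g in genotypes]
pw_scores = [polygenic_risk_score(g, pathway_weights) for g in genotypes]

print("whole-genome PRS AUC:", auc(wg_scores, status))
print("pathway PRS AUC:", auc(pw_scores, status))
```

In practice the weights would be estimated in a discovery sample independent of the target sample, and the AUC would be complemented by regression-based measures such as those named above.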