Discovery of Polygenic Effects in Complex Diseases
Although many genetic variants associated with complex diseases have been discovered, most have only a small effect on disease risk and therefore have not been useful for diagnosis or prognosis. This gap between the known impact of heredity on disease risk and the lack of clinically useful genetic variants has been called the "missing heritability problem" and is a major challenge facing the field of complex disease genetics. One current hypothesis is that this "missing heritability" may be found in the complex web of interactions (epistasis) between multiple genes; that is, a variant may exert a meaningful effect only in the presence of one or more other variants. My early work focused on developing an algorithm for detecting multi-locus associations in genome-wide association study data. The algorithm adapts a standard search and optimization technique, called a genetic algorithm (based on the process of natural selection), and is guided by multiple sources of prior knowledge about the disease of interest, including known genetic associations and known protein-protein interactions. Because the number of possible combinations of interacting variants on a genome-wide scale is far too large to test exhaustively, this type of probabilistic algorithm, in combination with prior knowledge, allows for the discovery of meaningful genetic interactions without the need for extensive computational resources.
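To make the idea concrete, the following is a minimal sketch of a prior-guided genetic algorithm, not the published algorithm itself. Everything in it is an assumption made for illustration: the toy fitness function (difference in carrier frequency between cases and controls), the per-locus `weights` standing in for prior knowledge, the selection and mutation settings, and the synthetic data in `make_toy_data`.

```python
import random

def fitness(combo, genotypes, is_case):
    """Toy fitness (assumed): absolute difference between cases and controls
    in the fraction of individuals carrying a risk allele at every locus."""
    case_hits = case_n = ctrl_hits = ctrl_n = 0
    for row, case in zip(genotypes, is_case):
        carrier = all(row[i] > 0 for i in combo)
        if case:
            case_n += 1
            case_hits += carrier
        else:
            ctrl_n += 1
            ctrl_hits += carrier
    return abs(case_hits / case_n - ctrl_hits / ctrl_n)

def sample_combo(weights, k, rng):
    """Draw k distinct loci, favouring loci with a higher prior weight.
    Assumes more than k loci have a positive weight."""
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(range(len(weights)), weights=weights)[0])
    return tuple(sorted(chosen))

def mutate(combo, weights, rng):
    """Replace one locus with a prior-weighted draw not already in the combo."""
    loci = list(combo)
    pos = rng.randrange(len(loci))
    while True:
        new = rng.choices(range(len(weights)), weights=weights)[0]
        if new not in loci:
            loci[pos] = new
            return tuple(sorted(loci))

def crossover(a, b, k, rng):
    """Build a child from k loci drawn from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))

def genetic_search(genotypes, is_case, weights, k=2,
                   pop_size=30, generations=40, seed=0):
    """Search for a k-locus combination with high fitness."""
    rng = random.Random(seed)

    def score(combo):
        return fitness(combo, genotypes, is_case)

    population = [sample_combo(weights, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < 0.3:  # assumed mutation rate
                child = mutate(child, weights, rng)
            children.append(child)
        population = parents + children
    return max(population, key=score)

def make_toy_data(n_loci=10, n_people=200):
    """Synthetic data: loci 3 and 7 co-occur in every case, never in controls."""
    genotypes, is_case = [], []
    for i in range(n_people):
        case = i < n_people // 2
        row = [(i + j) % 2 for j in range(n_loci)]
        if case:
            row[3] = row[7] = 1
        else:
            row[3] = 1 if i % 2 == 0 else 0
            row[7] = 1 - row[3]
        genotypes.append(row)
        is_case.append(case)
    return genotypes, is_case
```

In this sketch, prior knowledge enters only through `weights`: raising the weight of loci with known associations or known interaction partners makes them more likely to be drawn during initialization and mutation, so the search spends more time in plausible regions of the combinatorial space.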
The realization that complex diseases are polygenic in nature has led researchers to develop a variety of methods to provide biological context for the numerous genetic associations identified to date. These techniques are collectively referred to as gene-set analyses or pathway analyses (since biological pathway models are often used to group genes in a meaningful way), and they aim to provide a measure of association between a disease and a group of genes or variants. Because there are numerous methods for summarizing the effects of multiple genetic variants at the level of genes or gene sets, interpreting and comparing results from these types of analyses has become a major challenge. The focus of my work in this area has been to shed light on the inherent biases and the statistical and bioinformatic intricacies of these methods, with the goal of improving cross-method and cross-study comparability.
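As an illustration of what a gene-set association measure can look like, the sketch below implements an over-representation test based on the hypergeometric distribution. It is offered as one simple, widely used example of this family of methods, not as the specific method evaluated in my work; the function names and the choice of background gene list are assumptions.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background,
    K of them in the gene set, n associated genes drawn."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

def over_representation(gene_set, associated, background):
    """P-value for seeing at least the observed overlap between a gene set
    and the disease-associated genes, given a shared background."""
    background = set(background)
    gene_set = set(gene_set) & background      # ignore genes outside the background
    associated = set(associated) & background
    overlap = len(gene_set & associated)
    return hypergeom_upper_tail(len(background), len(gene_set),
                                len(associated), overlap)
```

Even in this small example, the result depends on choices that differ between tools, such as which genes count as the background and how variants are mapped to genes, which is one reason results from different gene-set methods can be hard to compare.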
The development of techniques for measuring the abundance of all mRNA (the transcriptome) in a tissue sample has been a major breakthrough in complex disease genetics. Gene expression microarray experiments have allowed researchers to gain insights into how the complex patterns of gene activity inside a cell contribute to many complex traits. Furthermore, the vast number of genome-wide expression data sets produced in recent years has provided a wealth of publicly available data useful for secondary analyses and meta-analyses. However, because a number of different platforms have been used to produce these data, comparing and integrating data across studies remains challenging. My work has focused on evaluating methods for cross-platform comparison of gene expression data, and has illustrated both differences among platforms in their ability to identify alternatively spliced transcripts and the impact of polymorphisms on microarray performance.