Clusters of gene duplicates: Identification, evolutionary consequences, and phylogenetic patterns
Ortiz, Juan Felipe
2019-04-16
Abstract
Animal body plans, olfaction, immune response, and neuronal connectivity, among many others, are traits that, in addition to being highly polymorphic, are driven by clusters of tandemly duplicated genes (CTDGs). CTDGs are not only capable of producing genic variability, but are also recurrent genomic structures across species. Although the importance of individual CTDGs has been extensively studied (e.g. the Hox CTDGs), a systematic approach to studying CTDGs is still lacking. In this dissertation, I start by identifying the problems with how CTDGs are currently defined, and I design a formal definition. Next, I implement that definition as a computational algorithm called CTDGFinder, which identifies CTDGs across genomes and facilitates comparative and evolutionary analyses. With CTDGFinder, I show that CTDGs are features of mammalian genomes, but also that they are not uniformly distributed. I focus on the observation that human chromosome 19 is by far the most clustered chromosome of the human genome. Taking that observation further, I find that one of the two ancestral chromosomes that gave rise to chromosome 19 in the history of mammalian diversification was already clustered. That piece of evidence suggests that CTDGs are not only prevalent in genomes, but are also maintained as such. Moreover, to hint at other sources of evidence, I show that the syntenic regions of the chromosome 19 orthologs have been contracting. Finally, I use CTDGFinder to develop LOLCAT, an algorithm for the identification of sequentially duplicated clusters, and present the interesting case of the keratin type I CTDG. In mammals, the keratin type I CTDG is highly conserved not only in the number of genes, but also in the order of those genes. Given that level of conservation, I end by proposing a method for resolving complex orthology calling problems in CTDGs by using gene order information and CTDG-specific genic divergence patterns.
This dissertation provides a more consistent and systematic way of studying CTDG biology and opens up interesting new challenges. With this definition and the kinds of analyses presented here, we can start thinking about building evolutionary models of genome structure that take into account the gene family dynamics of CTDGs.