Date of Award

2019

Degree Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Statistics

First Advisor

Noah Daniels

Abstract

While a number of methods exist for gene-based fin prediction, these studies are mostly limited to single sites and to binary morphological characters, and they are very susceptible to missing data. Here we present GATRS (Gene Association in TRee Space), a novel algorithm that can operate on continuous morphological observations such as fin size and is robust to missing data because it operates in gene tree space, analyzing linkages between entire genes and morphological traits. GATRS performs a large number of comparisons between closely related trees, which requires a tree distance method that is particularly adept at distinguishing closely related trees in a distant forest. Due to the number of comparisons, the employed method's computational requirements must also be reasonable. Finally, gene trees are likely to contain substantial amounts of missing data, to which any distance method deployed within GATRS must be reasonably robust. I conduct a thorough comparison of popular distance methods that are known to provide the best distance scores and find that none of them serves GATRS' purpose: the best-performing methods are computationally prohibitive, while the fastest methods falter in the presence of missing data. I therefore develop my own novel tree distance method, called TDeft, that focuses on taxon neighborhoods and produces scores of a quality rivaling that of the best methods while requiring modest computational time that only marginally exceeds that of the fastest methods. It also proves to be robust to a wide range of realistic scenarios. I test GATRS on a comprehensive dataset of Osteichthyes with 211 taxa and 2072 loci and produce 160 implied associations. Despite the sparsity of existing research, I am able to demonstrate the correctness of more than half of the most significant associations, confirming the validity of my approach.
I also report new discoveries for which no previous laboratory research exists and suggest that GATRS serve as a guide for future biological research studies, reducing the need for expensive and invasive exploratory laboratory studies required to narrow the field of candidate genes.

Available for download on Saturday, April 17, 2021
