Date of Award

2020

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Statistics

First Advisor

Ying Zhang

Abstract

Metagenomics studies microbes taken directly from environmental samples, a process that generates millions of short DNA sequences called reads. The ultimate goal is to characterize the diversity and function of complex microbial communities using computational tools. Most taxonomic classifiers, which aim to assign a taxonomic group to each individual read, compare the target DNA sequences against a reference database and report the taxon of the best match. This approach has several disadvantages, including the large computing resources needed to process large sequence databases. The current project explores the application of deep learning architectures to taxonomic classification. For this purpose, datasets of 150 bp reads were generated from the genomes of six bacterial species thought to be relevant to the marine microbiome, and three different deep neural networks were implemented, resulting in a collection of eight toy models trained with the same parameters. After evaluating their performance, it was concluded that the best configuration in terms of training time and test accuracy was a convolutional neural network. In parallel, a new method for handling imbalanced data was developed that expands the training data via simulated evolution. This method also has the potential to improve the identification of reads from unknown genomes that are closely related to the species in the training dataset.
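
The abstract does not say how the 150 bp reads were drawn or how the simulated evolution was carried out. The sketch below is only one plausible reading, under two stated assumptions: reads are sampled uniformly at random from a genome string, and "simulated evolution" is approximated by independent point substitutions at a fixed per-base rate. The function names (`sample_reads`, `mutate`, `augment_minority`) and the default rate of 0.01 are hypothetical and do not come from the thesis.

```python
import random

BASES = "ACGT"


def sample_reads(genome, read_len=150, n_reads=10, rng=None):
    """Draw n_reads substrings of length read_len at uniformly random positions."""
    rng = rng or random.Random()
    if len(genome) < read_len:
        raise ValueError("genome is shorter than the read length")
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def mutate(seq, rate=0.01, rng=None):
    """Substitute each A/C/G/T independently with probability `rate`.

    Assumption: only point substitutions are modelled (no indels).
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        if base in BASES and rng.random() < rate:
            # Replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def augment_minority(reads, target, rate=0.01, rng=None):
    """Grow an under-represented class to `target` reads with mutated copies."""
    if not reads:
        raise ValueError("need at least one read to augment")
    rng = rng or random.Random()
    augmented = list(reads)
    while len(augmented) < target:
        augmented.append(mutate(rng.choice(reads), rate, rng))
    return augmented
```

Under these assumptions, a class with few reads could be padded up to the size of the largest class before training, for example `augment_minority(reads, target=len(largest_class))`. The mutated copies keep the original label, so they also act as stand-ins for reads from close relatives of the training genomes.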
