Author(s)

Emily Light

Major

Computer Science

Advisor

Daniels, Noah, M

Advisor Department

Computer Science and Statistics

Date

5-2023

Keywords

Computer Science; Computational Biology; Multiple Sequence Alignment

Abstract

For my honors project, I am continuing my research with my academic advisor, Dr. Daniels, on creating an approach to the Multiple Sequence Alignment problem in the Rust programming language. This approach will be attached to Dr. Daniels’ CLAM (Clustered Learning for Approximate Manifolds) to enable it to globally and locally align DNA sequences. The research was divided into three parts: building the algorithms, integrating them into CLAM’s metric, and measuring and improving their performance. The project includes two separate but related algorithms: the Needleman-Wunsch algorithm and the Smith-Waterman algorithm.

The Needleman-Wunsch algorithm takes in two DNA sequences and inserts gaps (extra spaces between nucleotides) so that as many nucleotides as possible match at the same position in both sequences. It returns the best global alignment (an alignment that includes all nucleotides in both sequences), whereas the Smith-Waterman algorithm returns the best local alignment (an alignment that includes only sub-sequences of the two starting sequences). These two algorithms are often used to compare and identify different strands of DNA, which makes them useful tools for studying biological problems related to genetics.
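
To make the difference between global and local alignment concrete, the following is a minimal Rust sketch of the scoring recurrences behind both algorithms. It is not the project's implementation: the function names (`global_score`, `local_score`, `substitution`), the byte-slice representation of sequences, and the scoring values (match +1, mismatch -1, linear gap penalty -1) are all assumptions chosen for illustration. It computes only the best alignment score, not the aligned sequences themselves.

```rust
// Illustrative scoring values (assumptions, not taken from the project).
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -1;

/// Score for aligning nucleotide `a` against nucleotide `b`.
fn substitution(a: u8, b: u8) -> i32 {
    if a == b { MATCH } else { MISMATCH }
}

/// Needleman-Wunsch: score of the best global alignment of `x` and `y`.
/// Every nucleotide of both sequences must take part in the alignment.
fn global_score(x: &[u8], y: &[u8]) -> i32 {
    let (n, m) = (x.len(), y.len());
    let mut table = vec![vec![0i32; m + 1]; n + 1];

    // Aligning a prefix against the empty sequence costs one gap per nucleotide.
    for i in 1..=n {
        table[i][0] = table[i - 1][0] + GAP;
    }
    for j in 1..=m {
        table[0][j] = table[0][j - 1] + GAP;
    }

    for i in 1..=n {
        for j in 1..=m {
            let diag = table[i - 1][j - 1] + substitution(x[i - 1], y[j - 1]);
            let up = table[i - 1][j] + GAP; // gap inserted in `y`
            let left = table[i][j - 1] + GAP; // gap inserted in `x`
            table[i][j] = diag.max(up).max(left);
        }
    }
    table[n][m]
}

/// Smith-Waterman: score of the best local alignment of `x` and `y`.
/// Cells are clamped at zero, so an alignment may start and end anywhere.
fn local_score(x: &[u8], y: &[u8]) -> i32 {
    let (n, m) = (x.len(), y.len());
    let mut table = vec![vec![0i32; m + 1]; n + 1];
    let mut best = 0;

    for i in 1..=n {
        for j in 1..=m {
            let diag = table[i - 1][j - 1] + substitution(x[i - 1], y[j - 1]);
            let up = table[i - 1][j] + GAP;
            let left = table[i][j - 1] + GAP;
            let cell = diag.max(up).max(left).max(0);
            table[i][j] = cell;
            best = best.max(cell);
        }
    }
    best
}
```

Under these assumed scores, `global_score(b"ACGT", b"AGT")` evaluates to 2 (three matches and one gap), while `local_score(b"TTTACGTTT", b"GGGACGGGG")` evaluates to 3, since only the shared `ACG` region contributes. Recovering the alignment itself would additionally require a traceback through the table, which this sketch omits.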

Once fully developed, the alignment algorithms were incorporated into CLAM as a library. Tools such as Valgrind and Callgrind were used to find areas within the algorithms that performed poorly. These areas were then revised to improve performance.
