Date of Award

2015

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Lutz Hamel

Abstract

The exponential growth of proteome databases has increased the demand for methods that can reveal the structural relationships between proteins. Large protein families generally need to be studied at several levels to be fully understood, because their key characteristics and relationships are hidden within complex structures. While similarities in the primary sequences of two proteins give basic clues about their relationship, three-dimensional structural information provides the details needed to determine protein function.

As such, powerful and efficient computational methods of analysis are becoming increasingly essential. The function of a protein, and particularly the function of its specific functional sites, is determined primarily by its three-dimensional structure, so analysis based on that structure is necessary. Structural similarities therefore often point to functional similarities as well.

This analysis, based on the functional site, proposes a way of constructing a structural comparison model using the self-organizing map (SOM), an unsupervised machine learning algorithm. The experiment was performed on two popular protein families. The protein structures were structurally aligned before the analysis in order to minimize error in their three-dimensional structures. The SOM technique was then applied to the aligned structures. The results obtained with the SOM algorithm highlight the similarities and dissimilarities among the proteins. Finally, by analyzing clusters in the SOM grid, the structure-function relationships between proteins could be identified.
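The SOM step described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes each protein's functional site has already been aligned and flattened into a fixed-length feature vector, and it uses synthetic vectors in place of real coordinates. The function names, grid size, and training parameters are all hypothetical choices for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the SOM node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM and return weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.normal(size=(rows, cols, dim))
    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood radius shrinks
            x = data[idx]
            bmu = best_matching_unit(weights, x)
            # Gaussian neighborhood centered on the best matching unit.
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Synthetic stand-in data (an assumption, not data from the thesis):
# two groups of 6-dimensional vectors, imagined as flattened aligned coordinates.
data_rng = np.random.default_rng(1)
group_a = data_rng.normal(0.0, 0.1, size=(10, 6))
group_b = data_rng.normal(5.0, 0.1, size=(10, 6))
data = np.vstack([group_a, group_b])

weights = train_som(data)

# Map each vector to its node; structurally similar inputs should share
# the same or neighboring nodes, dissimilar inputs should land apart.
bmus_a = {best_matching_unit(weights, x) for x in group_a}
bmus_b = {best_matching_unit(weights, x) for x in group_b}
print("Group A nodes:", sorted(bmus_a))
print("Group B nodes:", sorted(bmus_b))
```

Reading the grid positions of the two groups corresponds to the cluster analysis mentioned in the abstract: inputs that map to the same or adjacent nodes are treated as structurally similar.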
