PSD-Project Sequence Database
Document Type
Report
Date of Original Version
4-30-2006
Abstract
1. Introduction
The sequencing of nucleic acids and the subsequent analysis of their encoded genes and proteins have revolutionized the biological sciences and allowed the evolutionary origins and biochemical potential of organisms to be determined. While the technology for determining molecular sequence data has become extremely powerful in terms of both automation and throughput, sequence analysis technology has lagged behind, unable to keep up efficiently with the vast amount of sequence data being generated both in individual labs and by researchers worldwide. Robust sequence analysis depends on building and maintaining project sequence databases (PSDs) specific to the interests of the researcher. In a PSD, laboratory-generated sequence data is combined with sequences of similar origin or function from public databases, such as those maintained by the National Center for Biotechnology Information (NCBI).
Currently, one must perform a separate BLAST search on each individual sequence to obtain phylogenetically related sequences within the desired confidence threshold. Each related sequence file must then be opened independently, and the pertinent information is essentially copied and pasted by hand into a document or Excel spreadsheet. From this document, the sequences that are to make up the database must be imported into the preferred program for phylogenetic analysis. Only then can analysis of the laboratory-generated sequence be performed. Because the NCBI databases are updated continuously, with new sequences added each day, it is effectively impossible to keep a manually built PSD up to date, and the time consumed in trying to build and maintain one is unacceptable in a modern lab. The capacity to perform multiple BLAST searches at once needs to be combined with automated extraction of the relevant results into a format accessible to many subsequent applications.
In addition, as the NCBI databases are updated, the searches must be repeated and the new results added to the PSD without creating redundancy. It is also necessary to add laboratory-generated sequence data to the PSD in a format matching that of the public databases. A database management program of this type will save researchers valuable time and streamline the analysis of gene and protein sequences, especially if the software is user-friendly and intuitive enough to be used by any biologist, even a novice computer user.
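The requirement that repeated searches add new results without creating redundancy can be illustrated with a minimal sketch. It assumes each sequence record carries a unique identifier, such as a public-database accession, and that the PSD can be modeled as a mapping keyed by that identifier. The names `SequenceRecord` and `merge_hits` are hypothetical and are not taken from the PSD design itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    # Hypothetical record layout; the real PSD schema may differ.
    accession: str    # assumed unique key for a sequence
    description: str
    sequence: str


def merge_hits(psd, new_hits):
    """Add records from new_hits to psd (a dict keyed by accession).

    Records whose accession is already present are skipped, so repeating
    a search never creates duplicate entries. Returns the accessions that
    were newly added, in the order they were encountered.
    """
    added = []
    for record in new_hits:
        if record.accession not in psd:
            psd[record.accession] = record
            added.append(record.accession)
    return added
```

Running the same search twice against such a store would leave it unchanged the second time, since only records with previously unseen accessions are inserted.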
Appendices Table of Contents:
- PSD Formal Specification
- Static Diagrams
  - PSD Overall Architecture
  - psd.commobjects Class Diagram
  - psd.client.ui Class Diagram
  - psd.client.models Class Diagram
  - psd.client.si Class Diagram
  - psd.server.servlets Class Diagram
  - psd.server.registries Class Diagram
  - psd.server.sql Class Diagram
  - psd.server.blastmanager Class Diagram
  - psd.server.filemanager Class Diagram
- PSD Database Schema
- Sequence Diagrams
  - Client Initialization
  - Initializer Servlet
  - LGF Servlet
  - DF Servlet
  - SI Servlet
  - Admin Servlet
- Use Case Diagrams
- Original Project Plan