Temperature Effects on the Proteome of Ciona intestinalis

Ciona intestinalis is a solitary ascidian that lives in temperate waters around the globe. C. intestinalis in Rhode Island exhibit normal embryonic development at water temperatures between 10 °C and 18 °C. This thesis examines the changes that occur in the ovary tissue at temperatures above 18 °C and below 10 °C. A preliminary study assessing the reproductive fitness of C. intestinalis found that animals raised at 22 °C had lower embryo viability than animals raised within the normal temperature range. 22 °C was chosen to mimic the high temperature predicted in some climate change models (IPCC, 2014). During the coldest part of the winter, when temperatures drop below 8 °C, feeding slows, the production of gametes ceases, and much of the ovary tissue is resorbed (Dybern, 1965). Ovary samples were collected from animals reared at 18 °C and 22 °C, and from a non-temperature-controlled tank at 8 °C. The samples were processed and sequenced by mass spectrometry. The data showed a decrease in the number of proteins produced between the 18 °C and 22 °C samples. It also showed that more proteins were upregulated than downregulated in the 22 °C samples compared to the 18 °C samples. In the winter samples, the number of proteins was also lower than at 18 °C, and metabolic pathways were downregulated. Because methods for ecological studies dealing with proteomic data are lacking, new methods were developed for both tissue and data processing.


Introduction
Ocean warming can have a profound effect on the physiology of marine organisms. It can have dramatic effects on metabolism and reproduction in addition to causing an overall stress response (Dorts, et al., 2012). In most organisms, exposure to both chronic and acute stressors decreases reproductive capacity (Einarsson, et al., 2008). Some climate change models predict that ocean temperature will rise at least 4 degrees Celsius in the next 100 years (IPCC, 2014). Marine organisms often use temperature as a cue to spawn or begin migration. These temperature cues usually correspond with seasonal changes, which include changes in sunlight and, therefore, changes in the food availability that provides the necessary energy for reproduction. Lower temperature is also important in triggering diapause, which happens over winter (Caceres, 1997). Because of climate change, marine organisms will experience higher temperatures, and it will be important to assess the physiological effects.
The organism used for this study was C. intestinalis, a type of solitary ascidian.
Most populations of C. intestinalis are found in cold temperate water, and are present on six of the seven continents (Dybern, 1965). In the population local to Rhode Island, the highest water temperature they are likely to encounter is 25 ºC in the summer.
They survive and reproduce best at a maximum temperature of around 18 °C (Berrill, 1947). Cold-water C. intestinalis have a one- to two-year life span. The first generation will spawn during the summer months, producing many offspring that will grow to maturity by the end of summer. This second generation will go dormant over the winter months in order to reproduce in the spring. The first generation will often not survive the winter. The third generation, spawned at the end of summer, will remain in juvenile form, which is more resilient to the cold temperatures (Dybern, 1965). Most of the Mediterranean population dies off in the summer, then those that remain spawn in the spring and the fall (Dybern, 1965). If the cold-water animals were subjected to a temperature over 25 °C for an extended period, the population might die back, mimicking the life cycle of the Mediterranean population.
C. intestinalis is an invasive species that can out-compete local fauna for space. It is a common fouling organism often found on boats and fishing equipment (Carver, et al., 2006). C. intestinalis is a filter feeder eating plankton and algae floating in the water column. When the water temperature is decreased below a certain limit, C. intestinalis lowers its food intake by slowing the beating of its cilia (Petersen, et al., 1999). If the water temperature never reaches this lower limit, C. intestinalis will continue to feed and this will decrease the food supply of the other filter feeding local fauna.
C. intestinalis is also a model organism for developmental genetics, and its genome has been completely sequenced making it a good candidate for proteomics studies (Endo, et al., 2011). It is important to have a fully sequenced and annotated genome when performing a proteomics study, allowing for a larger base of predicted proteins to map peptides.
Due to new orbitrap tandem mass spectrometry technology, it is possible to assess which proteins an organism is expressing globally. This gives information about the biological processes that the animal is performing (Tomanek, 2011). Instead of first separating proteins on a 2D PAGE gel, shotgun proteomics uses total tissue lysate for analysis. After the lysate is prepared, a tryptic digest is performed to break the proteins into peptides. The peptides are sequenced. The sequences obtained should be representative of the proteins present in the tissue (Zhang, et al., 2013). By comparing differences in protein quantity using peak intensity data given by the mass spectrometer, it is possible to compare the expression level of individual proteins among experimental samples (Zhang, et al., 2013). If there is a large fold change in peak intensity between the two samples, and consistency between the biological replicates, then this protein is considered upregulated or downregulated. Knowing which proteins are up and down regulated informs which biological processes are being modified in response to the increase in temperature. Overall, this work is attempting to discover the differences between the proteomes of the animals reared at normal and elevated temperatures. The hope is to attribute these differences to changes in physiology.

Study Species:
The species used for this study is Ciona intestinalis, sp. B. All of the animals that were spawned for use in this experiment came from Point Judith Marina on Point Judith Pond, in Wakefield, Rhode Island.

Set up of Experiment:
At the URI Graduate School of Oceanography (GSO) research aquarium, two 40-gallon tanks were used to house the animals for the experiment. Any food or water that was added to the tanks was added via a head tank in order to keep the water and food levels in the tanks consistent. Raw seawater flowed from the head tank into each tank at a rate of 1 liter per minute. The Group 1 tank was set to 18 °C while the Group 2 tank was set to 22 °C. Each tank was fitted with a chiller unit and a pump. The chiller brought the seawater to the appropriate temperature, and the pump kept the flow rate consistent between the tanks. A heater unit was also used to help maintain a constant temperature. Temperatures and flow rates from each tank were recorded daily to ensure that the conditions were kept consistent between tanks and that the water was at the desired temperature. After the summer, the rate of growth slowed and it was necessary to supplement the food given to the animals. A mixture of algae and zooplankton was added once per day into the head tank to ensure that all the tanks received equal amounts of food.

Spawning and Plating:
Two sexually mature C. intestinalis (larger than 5 cm) were collected from Snug Harbor, Point Judith Pond in early July of 2015. They were placed in the dark for one to two hours, and then placed under direct light, causing them to release gametes (Lambert & Brandt, 1967) (Whittingham, 1967). The eggs were collected from both animals and placed into separate dishes. Each was cross-fertilized with the sperm from the other. After a fifteen-minute interval to allow fertilization to occur, the eggs were washed in filtered seawater to remove any excess sperm. They were incubated at 15 °C for 24 hours. After 24 hours, the eggs began to hatch and the free-swimming larvae emerged. The larvae and remaining eggs were transferred to a new dish containing a PVC or acrylic plate and fresh filtered seawater. After another 12 hours, the free-swimming larvae settled on the plates and started to metamorphose into juveniles (reabsorb the tail, become sessile, and begin to feed). The plates containing the juveniles were incubated at 15 °C for up to a week, at which point the number of juveniles settled on the plates was determined using a dissecting microscope. Plates containing 5 or more juveniles were placed into a 15-gallon tank at 15 °C. This tank was placed into an incubator and, over the course of a week, brought up to 18 °C. Half of the plates were carefully transferred to tank one and the other half were transferred to tank two. Tank two was gradually raised over the course of two weeks to its final temperature of 22 °C.

Collection and reproductive stress test:
After 3 months, the animals reached sexual maturity at about 5-6 cm in length. Four animals (two from tank 1 and two from tank 2) were collected and spawned. The eggs from the Group 1 animals were collected and placed in separate labeled dishes. The sperm from each animal was used to cross-fertilize the eggs. The same was done for Group 2. The fertilized eggs were then allowed to grow at both a normal temperature and a stress temperature. Group 1 embryos were placed at 18 °C and 22 °C. Group 2 embryos were placed at 22 °C and 26 °C (see Figure 2). After 24 hours, the larvae were fixed and scored for normal development (Morgan, 1945). This test was used to score the reproductive ability of the animals, as well as the viability of the offspring. This was repeated until six animals from tank one and six animals from tank two had been collected and spawned.

Dissection and Tissue Preparation:
Directly after spawning, the adult animals were placed in clean dishes containing filtered seawater. The ovary and a portion of the testis were collected from each animal. Each tissue type was removed and immediately frozen in liquid nitrogen.
After freezing, the labeled tubes were stored at -80 ºC.

Lysate Preparation:
Ten of the ovary samples were chosen based on spawning test results (5 from tank one and 5 from tank two); this ensured that the samples used were from reproductively healthy individuals.

Data Analysis:
The LC-MS/MS raw data was analyzed to identify proteins at a 1% false discovery rate. Three statistical tests were run on peak area intensity data. First, a q-value test for multiple hypothesis testing identified only five statistically upregulated proteins, using a significance threshold of q < 0.05 (see Appendix 1 for statistical method).
To identify more putatively differentially regulated proteins, a Student's two-tailed t-test and a Mann-Whitney U-test were performed for all proteins. Protein IDs with p-values < 0.05 by either test were examined for mean peak area intensity fold changes between Group One and Group Two greater than that of the protein with the lowest fold change among those identified by the stringent q-value test (HSP71, UniProt ID F6W8I3_CIOIN). This criterion gave a value of 2 or greater for 10 × log10(mean fold change), which was considered the minimum cutoff value for upregulated proteins.
Likewise, a 10 × log10(mean fold change) value of -2 or less was considered the cutoff for downregulated proteins.
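The cutoff described above can be illustrated with a short Python sketch. It assumes that the fold change is the ratio of the Group Two mean peak area to the Group One mean peak area and that all peak areas are positive. The function names and the peak area values are hypothetical and are not taken from the thesis data. Under these assumptions, a score of 2 corresponds to a mean fold change of about 1.58 (10 raised to the power 0.2).

```python
import math
from statistics import mean

# Cutoff from the text: 10 x log10(mean fold change) >= 2 is treated as
# upregulated and <= -2 as downregulated.
CUTOFF = 2.0

def scaled_log_fold_change(control, treated):
    """Return 10 * log10(mean(treated) / mean(control)).

    Assumes every peak area is positive; zero or missing intensities
    would need separate handling that is not shown here.
    """
    return 10 * math.log10(mean(treated) / mean(control))

def classify(control, treated, cutoff=CUTOFF):
    """Label one protein as 'up', 'down', or 'unchanged'."""
    score = scaled_log_fold_change(control, treated)
    if score >= cutoff:
        return "up"
    if score <= -cutoff:
        return "down"
    return "unchanged"

# Hypothetical peak areas for one protein, five replicates per group.
group_one = [100.0, 120.0, 110.0, 90.0, 80.0]    # 18 °C, mean 100
group_two = [200.0, 180.0, 220.0, 210.0, 190.0]  # 22 °C, mean 200

score = scaled_log_fold_change(group_one, group_two)
label = classify(group_one, group_two)
print(round(score, 2), label)
```

This sketch covers only the fold-change step. The p-value filter (t-test or Mann-Whitney U-test) would be applied before it with a statistics package.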
The differentially regulated proteins were uploaded as accession numbers to UniProtKB database in order to get a .fasta file containing the sequences.

Results and Discussion
Setting up the Experiment: Two tanks were set up at the URI GSO facility because it has access to raw seawater. One of the tanks was set to 18 °C (Group 1) and the other was set to 22 °C (Group 2). 18 °C is the upper limit of healthy development for C. intestinalis larvae from the Rhode Island population and was used as the control temperature. 22 °C is an increase of 4 degrees from the control, corresponding to the increase predicted by some climate change models (IPCC, 2014), and was used as the experimental temperature.

Reproductive Stress Test:
The C. intestinalis collected from the two tanks were spawned by placing them in the dark for 2 hours then placing them into bright light. The eggs were fertilized with sperm from another animal grown at the same temperature. These embryos were incubated at a stress temperature (22 °C for the Group 1 animals and 26 °C for the Group 2 animals) and a control temperature (18 °C and 22 °C respectively) (Figure 2).
Larvae were counted, sorted, and scored for normal development. If there were more than 20 larvae/embryos, only a random subset was scored. There were four Group 1 (18 °C) and four Group 2 (22 °C) animals tested, each with a subset of eggs at the control and at the stressed temperature. The number of normally developing offspring was averaged together by temperature. Figure 3 shows the averaged counts of normally developed larvae as a percentage of the total. The Group 1 animals produced a higher percentage of normal larvae in both 1a (18 °C) and 1b (22 °C) than the Group 2 animals in either 2a (22 °C) or 2b (26 °C), indicating a reduction in reproductive success at the higher temperature. In addition to having fewer normal larvae overall at the higher temperature, there is less difference between 2a (control) and 2b (stressed) than between 1a (control) and 1b (stressed). One possible reason for the less dramatic decrease in normally developing larvae in the higher-temperature Group 2 is that the Group 2 animals were already stressed, so stressing them further does not produce a substantial difference in response. A t-test between 1b and 2a (both incubated at 22 °C) had a p-value of 0.081, which, although not significant at the 95% confidence level, suggests that more replicates might show a statistically significant decrease in reproductive success in the group reared at the higher temperature.

Protein Sequencing Results:
After dissecting out the ovarian tissue from five replicate animals at each condition, tissue lysates were prepared and sent for shotgun LC-MS/MS. Ovary tissue was used because it should be informative of what is happening to female reproduction. It was also used because it is possible to dissect ovaries cleanly, without gut contamination.
1,616 proteins were identified in both conditions at a 1% false discovery rate.
Based on two-tailed q-value tests, allowing for multiple hypothesis testing, 5 proteins were considered significantly differentially expressed between the control and stressed conditions (see Table 1, P < 0.05). Using a heat map (see Figure 4) generated from the peak intensity data for each biological replicate, it was possible to select all proteins with at least three biological replicates up- or downregulated and no more than two with contradictory expression in the complementary condition. This resulted in 168 proteins, which were considered consistent among the biological replicates. Of these proteins, 151 were upregulated (10 × log10 fold change > 2) and 17 were downregulated (10 × log10 fold change < -2). This indicates that more genes were being activated than downregulated due to the increased temperature.

Data Analysis:
The LC-MS/MS data was received as an Excel file. The Excel file contained columns for protein name, peptide sequence, and peak intensity for each sample, fold change ratios, q values, and accession numbers for NCBI and UniProtKB. The UniProt accession numbers were used on the UniProtKB (http://www.uniprot.org/uploadlists/) website to obtain a .fasta file containing the sequences for all of the proteins. This was also done for the up and downregulated lists. UniProtKB was used because it had more protein sequence data available than the NCBI database (http://www.ncbi.nlm.nih.gov/). It has a feature that allows batch conversion of accession numbers into sequences that can be downloaded into one file.
The fasta sequences are needed for the BLAST2GO program and for Ghost KOALA.
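As a minimal sketch of how the downloaded .fasta file might be checked before it is passed to downstream programs, the Python example below reads FASTA records and reports any requested accession that has no sequence. It assumes the usual UniProt header layout (>db|ACCESSION|ENTRY_NAME description). The function name read_fasta, the second and third accessions, and all sequences shown are hypothetical placeholders, not real data.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of accession -> sequence.

    Assumes UniProt-style headers such as '>tr|F6W8I3|F6W8I3_CIOIN ...'.
    For any other header style, the first word after '>' is used as the key.
    """
    records = {}
    accession = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split("|")
            accession = parts[1] if len(parts) >= 3 else line[1:].split()[0]
            records[accession] = []
        elif accession is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[accession].append(line)
    return {acc: "".join(chunks) for acc, chunks in records.items()}

# Placeholder FASTA content; the sequences are invented for illustration.
example = """>tr|F6W8I3|F6W8I3_CIOIN Placeholder description
MKTAYIAK
QRQISFVK
>tr|X0000X|X0000X_CIOIN Placeholder second entry
MSEQ
""".splitlines()

sequences = read_fasta(example)
requested = ["F6W8I3", "X0000X", "Y1111Y"]
missing = [acc for acc in requested if acc not in sequences]
print(len(sequences), "sequences read; missing:", missing)
```

A check of this kind helps confirm that the full, upregulated, and downregulated lists each returned one sequence per accession.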
BLAST2GO (https://www.blast2go.com/) is a program that gives the gene ontology (GO) information for proteins. GO terms are a way of expressing gene function in a way that is comparable among species by using a consistent vocabulary.
Due to the nature of GO terms, they can be coded, making them readable by many different computer programs (The Gene Ontology Consortium, 2015). GO terms also allow different analyses to be performed on a data set. The main purpose of running the BLAST2GO software is to get an idea of the types of genes that are present in the data set and how they group together by functionality. Having this information is useful for identifying a specific function or group of genes. It is also useful for comparing the up- and downregulated lists to the total list in order to identify functional groups over- or underrepresented in the differentially regulated proteins (Figure 6). This chart can be used to identify GO terms (or protein types/functions) that are more abundant in either the up- or downregulated lists. Once this is identified, the individual proteins can be further investigated to see if an interesting pattern emerges. The groups that contain more than 20% of the proteins are single organism metabolic process, single organism cellular process, single-multicellular organism process, response to stress, positive regulation of metabolic process, and anatomical structure morphogenesis. The categories that contain only upregulated proteins are single organism signaling, regulation of localization, positive regulation of metabolic process, positive regulation of cellular process, negative regulation of cellular process, macromolecule localization, and cellular localization. From the BLAST2GO software, transcription factors, signaling molecules, stress proteins, and heat shock proteins were identified (Table 2). Transcription factors can give insight into which genes are activated due to the increase in temperature. In addition, signaling molecules are used to activate or turn off certain pathways.
Knowing which pathways are activated allows for the identification of the ligand or molecule that triggered the signal. Stress proteins and heat shock proteins are expected to appear during times of stress, such as the increase in temperature (Kultz, 2005).
This makes them good indicators that the 4 degree increase in temperature is actually causing a stress response.
After finding the GO terms for the proteins, PANTHER (http://www.pantherdb.org/geneListAnalysis.do) was used as a tool for performing an overrepresentation test. This test compares the list of up and down regulated proteins (or whatever list of proteins and fold change data that needs processing) to the whole genome database for a chosen organism. The over representation test works via an algorithm that counts the number of proteins in a genome that map to a particular pathway or function. It determines the degree of difference between those expected by random chance and what was found in the list. The software can compare individual proteins as well as KEGG pathways. Knowing what proteins and pathways are over or underrepresented is a good way to find important trends in the data. The reason for using PANTHER instead of other software is that it was user friendly and had the whole genome annotations for C. intestinalis built into the program. Many of the other free online-based applications limit the species available for use. In the up and down regulated list, the ubiquitin mediated proteolysis pathway was significantly overrepresented, according to the application default criteria.
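The idea behind an overrepresentation test can be sketched with a one-sided hypergeometric calculation, which is one common way to compare an observed count with the count expected by chance. This is an illustration under stated assumptions and is not necessarily the statistic that PANTHER applies with its default settings. The function names and the small example numbers are hypothetical.

```python
from math import comb

def expected_hits(genome_size, genome_in_pathway, list_size):
    """Number of list proteins expected in the pathway by random chance."""
    return list_size * genome_in_pathway / genome_size

def overrepresentation_p(genome_size, genome_in_pathway, list_size, list_in_pathway):
    """One-sided hypergeometric p-value.

    Probability of drawing at least `list_in_pathway` pathway members when
    `list_size` proteins are sampled without replacement from a genome of
    `genome_size` proteins, of which `genome_in_pathway` map to the pathway.
    """
    upper = min(genome_in_pathway, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(genome_in_pathway, k)
        * comb(genome_size - genome_in_pathway, list_size - k)
        for k in range(list_in_pathway, upper + 1)
    )
    return tail / total

# Toy numbers chosen so the result can be checked by hand:
# 10 proteins in the genome, 3 in the pathway, a list of 4, 2 of them in the pathway.
p_value = overrepresentation_p(10, 3, 4, 2)
expected = expected_hits(10, 3, 4)
print(round(expected, 2), round(p_value, 4))
```

In the toy case the expected count is 1.2 and the tail probability is 70/210, so observing two pathway members in a list of four would not be remarkable. With real data, one such test is run per pathway, so the resulting p-values would normally be corrected for multiple testing.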
After running the list through PANTHER, it was run through Ghost KOALA (http://www.kegg.jp/ghostkoala/), which provides KEGG annotations for each of the proteins and maps them to individual pathways. Ghost KOALA was designed for analyzing large datasets, thus allowing all the data to be input at once rather than searching one at a time. After running the list through the program, it generates a list of all the proteins in each pathway and a KEGG pathway diagram. The diagram shows where the proteins from the reference list are located in comparison to all other proteins mapped in the pathway. From the PANTHER data, it was determined that there was one significant overrepresented pathway, as mentioned above, which was then searched in the Ghost KOALA pathway data for a match.

Findings
It was found that there were some differences in the proteome of the animals grown at a higher temperature. For example, there was an upregulation of heat shock and other stress related proteins. Ubiquitin mediated proteolysis was the pathway that PANTHER designated as overrepresented. Ubiquitination is a process where ubiquitin tags are added to proteins in order to mark them for degradation by the proteasome.
During times of heat stress, proteins can begin to unfold or misfold. Those that do not regain their native conformation are often marked for degradation. This pathway also plays a role in degradation of damaged or short-lived proteins, which often have regulatory functions (Bachmair, et al., 1986). Heat and oxidative stress can cause protein damage, which is a probable cause for the upregulation in this pathway (Somero & Hofmann, 1995). It could be related to the upregulation in signaling molecules, which are short-lived and degraded in the same pathway. An upregulation in signaling molecules is a sign that other pathways are being signaled to perform a function due to the temperature difference (Table 2).

Summary
In summary, this work endeavored to determine if there was a noticeable difference in the proteome of C. intestinalis when grown at elevated temperatures. It was also attempting to determine what was causing decreased reproductive success when water temperatures are at their local high range. Though no proteins could be exclusively linked to reproduction, many proteins involved in stress were detected. This work also provides a method for analyzing shotgun proteomics data for non-model organisms (see figure 5), and for using proteomics for studies with an ecological aspect.

Future work:
In order to validate this data, subsequent research may involve measuring transcript levels by RT-PCR. This would be used to validate certain up- and downregulated proteins among the biological replicates, clarifying some of the biological variation and giving a better picture of what is going on in the organism. In addition, the uncharacterized proteins that are found to be consistently up- or downregulated using the heat map might be characterized using BLAST and a program called Phyre2 (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index). Phyre2 looks at the amino acid sequence and picks out homologous protein domains based on structure and folding patterns. Knowing the biological function of the peptides contained in the protein helps to identify the biological process to which it belongs. Therefore, it can be grouped with other proteins in the pathway, thereby confirming that the process is in fact upregulated.

Figure 1. Fold changes between the means of protein quantifications at 22 °C vs. 18 °C sorted from highest to lowest. 10 × log10 of fold change values were plotted against the sequential numbering of the protein. Inflection points for up- or downregulated proteins were identified visually to include the five proteins with differential fold changes with p-values < 0.05 by 2-tailed unpaired t-test (see Appendix I). The fold change values at these inflection points were used to determine which proteins were up- or downregulated. The top graph is based on the raw data and the bottom graph is based only on those proteins that were consistent among the 5 biological replicates.

Work flow for the mass spectrometer data analysis. This is a flow chart for how to analyze the raw data. Two paths can be taken. Path one takes the IDs from the Excel file and turns them into sequences. These are fed into BLAST2GO, which gives gene ontology information about each protein. This gives insight into its function. This is used when looking for a specific class of protein. The sequences are also used as input for Ghost KOALA, which gives protein pathway information. The other path uses PANTHER to determine interesting pathways and then Ghost KOALA is used to visualize all the proteins in that pathway. Both processes can be performed simultaneously using different software packages.

Introduction
Ciona intestinalis is a solitary ascidian found on most rocky shores in the Northern Hemisphere. C. intestinalis prefers the cold temperate waters of protected bays often growing on the side of docks and other manmade structures. Ciona intestinalis, which translates to "column of intestines", uses a branchial basket filled with beating cilia to filter algae and other particles from the water column (Carver, et al., 2006). As water temperature increases, so does the beat frequency of the cilia, increasing the amount of water and food passed over the branchial basket (Petersen, et al., 1999). A decrease in water temperature therefore decreases the intake of food and many of the biological processes that require energy are downregulated.
Dormancy is defined as a metabolic and/or developmental depression triggered by an ecological cue, such as temperature, and encompasses a wide range of physiological changes (Caceres, 1997). During dormancy, C. intestinalis is not actively producing gametes and no mature gametes are present in the reproductive tissue. This generally occurs during January, February, and March in the North Atlantic, when the water temperatures are at their lowest, but can vary based on local water temperature. As the water temperature rises, the adult animals that survived the winter spawn and die in early to mid spring. This produces a new population that will spawn at the end of summer and then overwinter to spawn and die in the next spring (Carver, et al., 2006). Those populations that do not experience extreme cold or hot temperatures, like those in Southern England (5 °C to 20 °C) and California, may spawn continuously because they do not reach a critical lower temperature limit or experience large fluctuations in water temperature (Carver, et al., 2006). The local population in Rhode Island can experience temperatures as low as 2 °C and as high as 25 °C over the course of the year (NOAA, 2015).
Traditionally, 2D gels were used when comparing protein expression between two environmental samples (Wright, et al., 2012). Only those spots that differ between the two samples are sent for sequencing on a mass spectrometer. This method however, may not give a clear picture of all the processes that are affected by the change in environment. Due to new mass spectrometry technology, it is possible to sequence a larger number of proteins at once, allowing a more complete proteome to be sequenced in one run (Wright, et al., 2012). Having a more complete proteome allows for an in depth analysis of biological pathways that are being affected (Tomanek, 2011). The main benefit of using proteomics, rather than transcriptomics for this study, is that it shows the protein expression directly, which may not be accurately reflected by the transcriptome (Horgan & Kenny, 2011). Therefore, it is possible to predict which physiological processes are altered during dormancy. I hypothesize that the expression level of certain genes in the ovarian tissue will decrease due to the lack of active reproduction, more specifically those dealing directly with metabolism.

Methods
Set up: C. intestinalis were spawned and settled on petri dishes. They were transported to the University of Rhode Island Bay Campus (GSO) where they were suspended in outdoor cylindrical tanks pumped with raw seawater. The tanks were covered to decrease the amount of light to prevent an overgrowth of algae in the tanks, which could cause hypoxic events.

Tissue Collection, Dissection, and Lysate Preparation:
Five animals were collected at 8 ºC in late February, and the ovaries were removed. The tissue was frozen in 1.5 ml tubes using liquid nitrogen.

Results
Because the 18 °C and 8 °C samples were not run on the same HPLC column, it was not possible to reliably compare protein quantification between the two conditions. Therefore, I compared the samples using the presence or absence of individual proteins. There were 178 proteins unique to the 18 °C samples (Table 3) and 41 proteins unique to the 8 °C samples (Table 4) (Figure 7). BLAST2GO was used to determine the functions of the individual proteins. The number of proteins that fell into each of the functional categories is listed in Table 5. The two highlighted rows in the table show the two categories in which there was the greatest or least difference in number of mapped proteins. The category with the greatest difference is "response to stimulus", which is defined as any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism (Carbon, et al., 2009). This general category encompasses any response to a hormone, chemical, or stimulus of any kind. MAP/microtubule affinity-regulating kinase, Ras-related C3 botulinum toxin substrate 1, and Niemann-Pick C2 protein are a few examples of proteins that fall into this category. The category with the least difference was "negative regulation of biological process", which is defined as any process that stops, prevents, or reduces the frequency, rate, or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification, or interaction with a protein or substrate molecule (Carbon, et al., 2009). Glucose-6-phosphate 1-dehydrogenase, calcium-transporting ATPase, and tubulin alpha chain are a few examples of proteins in this category.
The Ghost KOALA results are listed in Tables 6, 7, and 8. Table 6 lists the KEGG terms (or protein pathways) that are unique to the 8 ºC samples, Table 7 shows those unique to the 18 °C samples, and Table 8 shows those present in both. There are 166 pathways listed for 18 °C, as opposed to 9 pathways listed for 8 °C, including more at 18 °C in the category of signal transduction, body systems, and metabolism.
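The presence/absence comparison used in these results amounts to set operations on the identifiers detected in each condition. A minimal Python sketch follows. The function name compare_presence and the identifiers are hypothetical, and the same approach would apply equally to protein accessions or KEGG pathway terms.

```python
def compare_presence(ids_a, ids_b):
    """Split two identifier lists into unique-to-A, unique-to-B, and shared sets."""
    a, b = set(ids_a), set(ids_b)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}

# Hypothetical identifiers detected in each condition.
warm_ids = ["P1", "P2", "P3", "P4"]  # stands in for the 18 °C samples
cold_ids = ["P3", "P4", "P5"]        # stands in for the 8 °C samples

result = compare_presence(warm_ids, cold_ids)
print(
    len(result["only_a"]), "unique to 18 °C;",
    len(result["only_b"]), "unique to 8 °C;",
    len(result["shared"]), "shared",
)
```

Because this comparison ignores abundance, a protein that falls just below the detection limit in one condition is counted as absent, which is a limitation to keep in mind when interpreting the unique lists.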

Discussion
The sequence data was analyzed using BLAST2GO and Ghost KOALA. The purpose of this experiment was to determine the difference in the proteome of the reproductive tissues of C. intestinalis when it is actively reproducing compared to when it is reproductively dormant. As the temperature and food availability decrease, the ovary shrinks in size due to both resorption of gametes and the reallocation of energy stored for reproduction. This energy is used elsewhere, and resorbed tissue is not regenerated until favorable conditions return in the spring (Carver, et al., 2006) (Airi, et al., 2014). Below 10 °C, C. intestinalis from the Rhode Island population likely decreases its food intake and therefore must get its energy from other sources, including the breakdown of the ovary (Petersen, et al., 1999) (Dybern, 1965).
The number of proteins detected in the 8 °C samples is lower than in the 18 °C samples (1291 proteins detected in 8 °C samples compared to 1399 proteins in 18 °C samples). This is to be expected; when the animal is not actively reproducing, it does not need to produce the proteins necessary to generate mature gametes. In addition, many of the cellular processes in the ovary cells shut down when not in active use in order to conserve energy, including cell signaling, digestion, and general metabolism (based on pathway analysis, see tables). Some associated proteins found only in the 18 °C data are adversin, a signaling molecule in the FAS signaling pathway; microsomal triglyceride transfer protein, which is involved in the digestion and absorption of lipids; and phosphoenolpyruvate carboxykinase, a part of pyruvate metabolism. In many organisms, diapause or dormancy is accompanied by a decrease in metabolism in order to slow depletion of energy stores during a time of low food abundance (Caceres, 1997).
Proteins present in the transport pathways remained consistent between the samples, except for protein export and RNA transport pathways, for which representative proteins were unique to the 8 °C samples (see table 6). Therefore, it is possible that the components from the ovary cells that are not in current use are sent to other parts of the body. In addition, many of the proteins expressed in the 8 ºC samples played some role in RNA production or processing. The increase in RNA processing is likely due to the time of year that the animals were harvested. Since they were collected toward the end of winter right before the water temperatures began to rise, the animals may have begun increasing the amount of transcription in order to prepare for the production of eggs (Berrill, 1947). Although the water temperature was beginning to rise, the temperature is still below 10 ºC which is the lower limit for reproduction in C. intestinalis (Berrill, 1947). In addition, no gametes were found in the reproductive tract or organs at the time of dissection.
Based on the pathways assigned by Ghost KOALA to the proteins from the 8 °C data, the active systems were the nervous system, excretory system, endocrine system, and circulatory system, meaning that proteins were detected in these categories. There were fewer proteins present in these categories in the 8 °C samples than in the 18 °C samples. During dormancy, it is expected that many non-essential functions will be downregulated in order to conserve energy. In conclusion, there was an overall decrease in the number of proteins detected in the 8 °C samples, and many nonessential systems were turned off, including egg production. For future experiments, more samples should be taken at varying points during dormancy in order to understand the metabolic decline better and to see whether energy storage increases prior to metabolic decline. In addition, measuring food intake and respiration would help verify the extent of metabolic depression that occurs over the winter. This will help to clarify the process of dormancy in C. intestinalis.

Table 5. This table shows the number of proteins from each sample that fall into each functional category. The percentage of proteins that fell into each category was then calculated. There are more proteins in the 18 °C samples overall. The top and bottom rows show the smallest and largest difference, respectively. The largest difference was in response to stimulus, defined as any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism (Carbon, et al., 2009). The smallest difference was in negative regulation of biological processes, defined as any process that stops, prevents, or reduces the frequency, rate, or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification, or interaction with a protein or substrate molecule (Carbon, et al., 2009).
Table 6. KEGG terms unique to the cold-water data set. Overall, there are far fewer pathways represented in the cold-water data than in the control samples. This is likely due to dormancy and the lack of active reproduction. During dormancy, metabolism slows and many non-essential functions are turned off. Numbers in parentheses are the number of proteins assigned to the pathway.
Table 7. This list contains far more pathways than the cold-water data set. Therefore, it can be inferred that more pathways are functioning when the animal is actively reproducing. Numbers in parentheses are the number of proteins assigned to the pathway.

Appendix 1
Proteomics Core Rhode Island Hospital Protocol (How the samples were processed by the mass spectrometer facility)

LC-MS/MS analysis
LC/MS was performed as described previously (Nguyen, et al., 2009). Tryptic peptides were analyzed by a fully automated proteomic technology platform (Yu & Salomon, 2010). The nanoLC-MS/MS experiments were performed with an Agilent 1200 Series Quaternary HPLC system (Agilent Technologies, Santa Clara, CA) connected to a "Q Exactive Plus" mass spectrometer (Thermo Fisher Scientific, Waltham, MA). The lyophilized tryptic peptides were reconstituted in buffer A (0.1 M acetic acid) at a concentration of 1 µg/µl, and 5 µl was injected for each analysis. The electrospray ion source was operated at 2.0 kV in a split flow configuration, as described previously (Elias & Gygi, 2007). The Q Exactive was operated in data-dependent mode using a top-9 data-dependent method. Survey full scan MS spectra (m/z 400-1800) were acquired at a resolution of 70,000 with an AGC target value of 3×10^6 ions or a maximum ion injection time of 200 ms. Peptide fragmentation was performed via higher-energy collision dissociation (HCD) with the energy set at 28 NCE. The MS/MS spectra were acquired at a resolution of 17,500, with a target value of 2×10^4 ions or a maximum integration time of 250 ms. The underfill ratio, which specifies the minimum percentage of the target value likely to be reached at maximum fill time, was defined as 1.0%. The ion selection abundance threshold was set at 8.0×10^2, with charge state exclusion of unassigned, z = 1, and z = 6-8 ions and a dynamic exclusion time of 20 seconds.

Data analysis
Peptide spectrum matching of MS/MS spectra from whole cell lysate tryptic digest samples was performed against a human-specific database (UniProt; downloaded 2/1/2013) using MASCOT v. 2.4 (Matrix Science, Ltd, London W1U 7GB UK). A concatenated database containing 144,156 "target" and "decoy" sequences was employed to estimate the false discovery rate (FDR). Msconvert from ProteoWizard (v. 3.0.5047), using default parameters and with the MS2Deisotope filter on, was employed to create peak lists for Mascot. Mascot database searches were performed with the following parameters: trypsin enzyme cleavage specificity, 2 possible missed cleavages, 7 ppm mass tolerance for precursor ions, and 20 mmu mass tolerance for fragment ions. Search parameters permitted variable modification of methionine oxidation (+15.9949 Da) and static modification of carbamidomethylation (+57.0215 Da) on cysteine. The resulting peptide spectrum matches (PSMs) were reduced to sets of unique PSMs by eliminating lower scoring duplicates. Mascot results were filtered by Mowse Score (>20). Peptide assignments from the database search were filtered down to a 1% false discovery rate (FDR) by a logistic spectral score, as previously described (Ficarro, et al., 2005).
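Two of the filters named above, the 7 ppm precursor tolerance and the target-decoy FDR estimate, can be illustrated with a short sketch. This is a simplified illustration under stated assumptions and not the facility's pipeline: it estimates FDR as the ratio of decoy to target matches above a plain score cutoff, whereas the protocol filters by a logistic spectral score. All function names and the example scores are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=7.0):
    """True if the precursor mass error is inside the ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def estimate_fdr(psms, threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) pairs from a concatenated
    target-decoy search (simplified estimator, an assumption here).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if estimate_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical scores, NOT data from this study.
example_psms = [(55, False), (48, False), (40, False), (35, True), (30, False), (22, True)]
cutoff = lowest_threshold_at_fdr(example_psms, max_fdr=0.01)
print("score cutoff:", cutoff)
```

In this toy example the highest-scoring decoy has a score of 35, so the cutoff settles on the next score above it.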

Quantitation of relative peptide abundance
Relative quantification of peptide abundance was performed via calculation of selected ion chromatogram (SIC) peak areas. Retention time alignment of individual replicate analyses was performed as previously described (Demirkan, et al., 2011). Peak areas were calculated by inspection of SICs using in-house software programmed in R 3.0 based on the Scripps Center for Metabolomics' XCMS package (version 1.40.0). This approach performed multiple passes through XCMS's central wavelet transformation algorithm (implemented in the centWave function) over increasingly narrow ranges of peak widths, and used the following parameters: mass window of 10 ppm, minimum peak widths ranging from 2 to 20 seconds, maximum peak width of 80 seconds, signal to noise threshold of 10, and detection of peak limits via descent on the non-transformed data enabled. For cases when centWave did not identify an MS peak, we used the getPeaks function available in XCMS to integrate in a pre-defined region surrounding the maximum intensity signal of the SIC. SIC peak areas were determined for every peptide that was identified by MS/MS. In the case of a missing MS/MS for a particular peptide in a particular replicate, the SIC peak area was calculated according to the peptide's isolated mass and the retention time calculated from retention time alignment. A minimum SIC peak area equivalent to the typical spectral noise level of 1000 was required of all data reported for label-free quantitation. Individual SIC peak areas were normalized to the peak area of the exogenously spiked synthetic peptide DRVYHPF added prior to reversed-phase elution into the mass spectrometer. P-values were calculated from five replicates.
To select peptides that show a statistically significant change in abundance between control and treatment, two-tailed unpaired Student's t tests were performed, and q-values for multiple hypothesis testing were calculated from the resulting p-values using the R package QVALUE as previously described (Storey, 2003; Storey & Tibshirani, 2003).
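The noise-floor filter and the normalization to the spiked standard can be written out as a small sketch. It is an illustration only and not the in-house R software: the function name, the peptide names, and the peak areas are hypothetical, and applying the floor of 1000 to raw areas before normalization is an assumption.

```python
NOISE_FLOOR = 1000.0  # minimum SIC peak area stated in the protocol


def normalize_peak_areas(peak_areas, standard_area, noise_floor=NOISE_FLOOR):
    """Drop peptides below the noise floor, then divide each remaining
    raw SIC peak area by the spiked standard's peak area.

    peak_areas maps peptide -> raw SIC peak area for one replicate.
    Filtering on the raw area (before normalization) is an assumption.
    """
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return {
        peptide: area / standard_area
        for peptide, area in peak_areas.items()
        if area >= noise_floor
    }


# Hypothetical peptides and areas, NOT data from this study.
raw_areas = {"PEPTIDEA": 250000.0, "PEPTIDEB": 800.0, "PEPTIDEC": 50000.0}
normalized = normalize_peak_areas(raw_areas, standard_area=500000.0)
print(normalized)
```

Dividing by the standard's area in each run puts replicates on a common scale, so differences in injection or ionization efficiency are less likely to be read as biological change.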
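The statistical step can be sketched in outline. The example below computes the pooled-variance (Student's) t statistic for two groups and then adjusts a list of p-values with the Benjamini-Hochberg procedure. Note the substitution: Benjamini-Hochberg is used here as a simpler stand-in and is not the Storey q-value method implemented in QVALUE, and the conversion from t statistic to p-value is omitted because it needs the t distribution. All names and numbers are hypothetical.

```python
import math
import statistics


def student_t_statistic(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two
    independent samples (unpaired Student's t test)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.
    A stand-in for Storey q-values, not the same estimator."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical values, NOT data from this study (five replicates each).
t_stat, dof = student_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(t_stat, dof, adjusted)
```

With five replicates per group, as in the protocol, the test has 8 degrees of freedom.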

Introduction
Diplosoma listerianum is an invasive species of tunicate that is believed to have originated on the Northwest coast of North America. It is now found all over the world in warm, temperate, and cold waters. It is likely spread by encrusting on shipping vessels or in ballast water (Lambert, 2001). It continues to spread to new places due to its rapid growth and can displace local species. It smothers bivalves and other animals in the filter-feeding community (Dijkstra, et al., 2007). Each colony contains many small animals about 2 mm in length, and the colony itself can be up to 20 cm across. In the United Kingdom, scientists had the unusual opportunity to observe an introduction firsthand. In 2005, Diplosoma listerianum was not present in the UK, but by 2009 it had become the dominant species in the fouling community (Vance, Lauterbatch and Wahl, 2009). This showed that it takes only a few years for Diplosoma listerianum to dominate a new habitat and oust local organisms. This is why it is important to understand how this organism spreads by looking into its reproductive capabilities.
In other colonial tunicates, such as botryllid ascidians, there are multipotent and pluripotent stem cells that contribute to both regeneration of the colony and reproduction (Kurn, Rendulic and Tiozzo, 2011). However, the molecular mechanisms that control these processes are not well understood. Therefore, this study will provide more insight into the genetic basis for the claim that stem cells give rise to bud tissue (Brown and Swalla, 2012). The gene used as a marker was DlOct2, which is a transcription factor believed to be associated with stem cells (Pan, et al., 2002). DlOct2 is part of the POU gene family.
Another reason for using Diplosoma listerianum to study this phenomenon is that the didemnids have a unique method of semi-conservative budding that is not used by other ascidians (Berrill, 1935). There is controversy about which tissues give rise to buds in didemnids. For example, it could be the epithelial tissue surrounding the esophagus or the epicardial tissue adjacent to the esophagus. By performing in situ hybridization using stem cell genes, it may be possible to pinpoint the tissue from which the buds originate. The small size and delicate nature of the animal make sectioning difficult and histology impractical.
Overall, this research is important to understand the role that stem cells play in reproduction and how the DlOct2 gene is associated with stem cells in Diplosoma listerianum.

Study Organism:
Diplosoma listerianum is a colonial ascidian that is a member of the fouling community and an invasive species in Rhode Island. The animals used in this study came from Point Judith Pond, Rhode Island. These animals are clear and fragile, which makes them difficult to find and collect but makes RNA extraction and dissection of individual animals easy.

Collection and Dissection:
Diplosoma listerianum specimens were collected from the sides of docks or from plates suspended in the water. The specimens were removed from the substrate and brought back to the University of Rhode Island main campus, where they were kept alive in a closed system tank for up to 2 weeks before they were dissected. A colony was placed into a Petri dish of filtered seawater, and individual zooids were dissected under a stereoscope. The upper tunic was removed, and the animals were detached from their lower tunic using sharp tweezers and placed into a 1.5 ml tube, which was kept on ice. After the 1.5 ml tube was full, it was flash frozen in either liquid nitrogen or an ethanol/dry ice bath at -80 °C for future isolation of RNA.
Another 1.5 ml tube was used to collect individual zooids, which were fixed in 4% formaldehyde in PBS and stored at -20 °C in ethanol for in situ hybridization.

RNA Preparation:
The frozen samples were homogenized with a cell lysis buffer in 1.5 ml tubes with a sterile plastic pestle. Total RNA was extracted by adding one volume of phenol/chloroform to the homogenate, which was then vortexed for 1 minute and spun in a microcentrifuge at 13,000 g for 5 min. The aqueous phase was transferred to a new 1.5 ml tube and extracted again. Two volumes of 100% ethanol were added to the tube, which was placed in the -20 °C freezer for 10 minutes. The sample was spun at 13,000 g for 10 minutes to pellet the precipitated RNA. The 100% ethanol was removed, and 70% ethanol was added to wash the pellet. The sample was spun again for 5 minutes, the ethanol was removed, and the pellet was allowed to air dry. After drying, nuclease-free water was added to resuspend the RNA. DNase was added to eliminate any DNA molecules. The PolyATract mRNA Isolation System III from Promega was used to extract and purify the mRNA from the total RNA. cDNA was made from mRNA using the Protoscript reverse transcriptase kit from New England Biolabs with a poly-T anchor primer.

PCR and Gel Purification:
The cDNA was used as the template for PCR using degenerate primers for DlOct2, which were designed using the sequence from Ciona intestinalis. The primer sequences used were 5'-ACNCAGGGNGATGTNGGNC and 5'-GNCGNCGNTTGCARAACCA. The PCR product was run on an agarose gel to confirm the presence of a band at the expected size. The expected size was determined by locating where the primers would fall in the C. intestinalis genome and counting the nucleotides between them, giving an approximate band size.
The band was excised and the DNA purified using the MinElute Gel Extraction Kit from Qiagen. Another PCR was run using the purified gel band as template to amplify the specific sequence in order to increase the amount of DNA for later steps. A phenol/chloroform extraction followed by ethanol precipitation (as described above) purified the sample, and a portion was sent to the Genome Sequencing Center (GSC) at the University of Rhode Island for Sanger sequencing. The sequence was run through NCBI BLAST to confirm that it contained the expected gene.
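The degenerate primers and the expected-size estimate described above can be illustrated with a short sketch. It expands IUPAC degenerate codes, counts how many distinct sequences each primer pool contains, and locates both primer sites on a template to estimate the amplicon length. This is an illustration rather than the procedure actually used: it assumes the second primer listed is the reverse primer, the function names are hypothetical, and the template is a made-up sequence, not the C. intestinalis genome.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def primer_to_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(
        base if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    )


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_size(template, forward_primer, reverse_primer):
    """Length from the start of the forward site to the end of the
    reverse site on the given strand, or None if either site is missing."""
    fwd = re.search(primer_to_regex(forward_primer), template)
    if fwd is None:
        return None
    rev_pattern = re.compile(primer_to_regex(reverse_complement(reverse_primer)))
    rev = rev_pattern.search(template, fwd.end())
    if rev is None:
        return None
    return rev.end() - fwd.start()


FORWARD = "ACNCAGGGNGATGTNGGNC"
REVERSE = "GNCGNCGNTTGCARAACCA"  # assumed to be the reverse primer

# Made-up template: one concrete forward site, 20 nt of filler, and the
# reverse complement of one concrete reverse-primer sequence.
template = (
    "GG" + "ACACAGGGAGATGTAGGAC" + "ACGT" * 5
    + reverse_complement("GACGACGATTGCAAAACCA") + "TT"
)
print(degeneracy(FORWARD), degeneracy(REVERSE))
print(amplicon_size(template, FORWARD, REVERSE))
```

Each N contributes four possibilities and each R contributes two, so the degeneracy of a primer pool grows quickly with the number of ambiguous positions.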

Transformation:
The PCR product was cloned into the pUC18 vector by blunt-end ligation and transformed into E. coli strain TOP10. The PureYield Plasmid Miniprep System from Promega was used to purify the plasmid, which was sent for sequencing at the GSC.

Sequence Alignment:
The sequence returned from the sequencing reaction was trimmed to remove vector and primer sequence. BLAST was used to check that the sequence was an Oct homolog. Ciona, frog, and human homologs taken from the NCBI database (see Table 2) were used for the alignment (U.S. National Library of Medicine, 2016). ExPASy Translator was used to translate the nucleotide sequence into an amino acid sequence.
The Unipro UGENE program was used to align the sequences (see Figure 1).
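The translation step described above can be sketched as follows. This is an illustration of in-frame translation and not a reimplementation of the ExPASy tool: it assumes the standard nuclear genetic code, a known reading frame, and a hypothetical function name, and the input sequences are toy examples rather than the DlOct2 sequence.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(nucleotides, frame=0):
    """Translate a DNA sequence in the given frame (0, 1, or 2).

    Stop codons are written as '*'; codons containing ambiguous bases
    are written as 'X'. A trailing partial codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# Toy sequences, NOT the DlOct2 fragment.
print(translate("ATGGCCTAA"))
print(len(translate("GCC" * 120)))
```

Because each codon is three nucleotides, a fragment of about 360 base pairs translates to about 120 amino acids when read in frame.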

Results and Discussion
Diplosoma listerianum was collected and the individual zooids dissected, RNA extracted, and cDNA synthesized. DlOct2 was isolated from the cDNA, cloned, and the plasmid sent for Sanger sequencing. This is the first successful attempt at isolating and cloning a stem cell regulatory gene from Diplosoma listerianum. Its genome has yet to be sequenced and its genes lack functional classification or expression data. Due to the anatomy of Diplosoma listerianum, it is easy to dissect and homogenize. Therefore, the methods used in this project can be used to isolate other genes from this organism.
The DlOct2 fragment isolated is about 360 base pairs in length and encodes about 120 amino acids (Figure 2). When our DlOct2 sequence is searched against the NCBI database using BLAST, Oct 2 or POU2f2 (the name of the human homolog) is the top hit. Oct 2 is found in mammalian B-cells and is a transcription factor regulating immunoglobulin genes in mammals (Corcoran, et al., 1993). On the POU family evolutionary tree, POU 2 diverged directly before POU 5 (Gold, et al., 2014). Oct 4 or POU5f1 codes for a transcription factor that activates genes involved in inducing pluripotent stem cells in mammals (Pan, et al., 2002). However, it is unclear whether urochordates also have this gene or whether the POU genes that they possess have different functionality or classification than those found in mammals (Gold, et al., 2014).
The sequence alignment shows small areas that are highly conserved, which is consistent with previous alignments (Rosenfeld, 1991). The genes in the alignment were chosen to give a good sample of the different POU genes, including POU2f2, POU2f1, POU5f1, POU3f4, and POU4. This covers almost all of the POU groups. With this wide range, the alignment shows that DlOct2 is most similar to POU2f2 and POU2f1 but is also similar to POU5f1.

Future work:
The next step in this project is to perform in situ hybridization trials to test for the expression of DlOct2. The sequence that we obtained will be used to design a probe for use in the in situ trials. The hypothesis is that the bud tissue will show expression of the gene because of its role as a stem cell specification gene. If no expression is found in the bud tissue, it may occur in other reproductive tissues or precursors, such as epicardial tissue or the epithelium of the esophagus. Stem cells also play a role in regeneration of damaged tissue, and therefore the gene might be expressed in wound healing (Laird, et al., 2008).

The sequences used in the alignment were downloaded from the NCBI database (excluding DlOct2). Homologs of different POU genes were used for the alignment, with a representative of each of the 5 classes of POU genes taken from human, Ciona, and frog. The alignment shows that DlOct2 aligns best with Oct 2.