COMMUNITY COMPOSITION OF THE MORPHOLOGICALLY CRYPTIC DIATOM GENUS SKELETONEMA IN NARRAGANSETT BAY

It is well known that morphologically cryptic species are routinely present in planktonic communities but their role in important ecological and biogeochemical processes is poorly understood. I investigated the presence of cryptic species in the genus Skeletonema, an important bloom-forming diatom, using high-throughput genetic sequencing and examined the ecological dynamics of communities relative to environmental conditions. Samples were obtained from the Narragansett Bay Long-Term Plankton Time Series, where Skeletonema spp. can comprise >95% of microplankton cells during blooms. DNA was extracted and sequenced from monthly samples between December 2008 and December 2013, and Skeletonema specific primers were used to exclusively target and differentiate known species. Seven species of Skeletonema were found in Narragansett Bay over the sampling period, including five species that were previously undetected in this location. Skeletonema community composition was highly seasonal, and temperature had the greatest effect on composition changes over time. Winter and spring communities demonstrated less species richness than did summer and autumn communities. These data suggest that a Skeletonema community can be comprised of different species, even in one estuary, and that seasonal changes appear to have a substantial effect on community structure, and perhaps important ecological processes, over time. ACKNOWLEDGMENTS Many people were involved in the formation of this thesis. First and foremost, I must thank Dr. Tatiana Rynearson for her knowledge, advice, patience, and good humor throughout my time in graduate school. She helped me grow as a scientist and student by encouraging me to overcome all obstacles in my way and to seek out answers to even the most difficult of questions, both in science and in life. She demonstrated to me the importance of leading with passion, sincerity, and respect. I am honored to have had Tatiana as a mentor and will carry her guidance with me throughout my career and life. Many thanks to my thesis committee consisting of Dr. Candace Oviatt and Dr. Chris Lane for their thoughtful feedback and advice, as well as Dr. Bethany Jenkins for serving as defense chair and providing her own useful insight. During my time at GSO, I also worked closely with the Metcalf Institute for Marine and Environmental Reporting, and for that I must thank Dr. Sunshine Menezes, Katharine McDuffie, and Karen Southern for opening my eyes to new possibilities and allowing me to expand my horizons beyond the lab bench. GSO will always have a special place in my heart because of the many wonderful faculty, staff, and students I met there. I would like to particularly thank Dr. Susanne Menden-Deuer, Dr. David Smith, and Meredith Clark for their hard work at GSO and their personal guidance, as well as the many faculty members who contributed to my studies in oceanography. The research described in this thesis was supported by Rhode Island Sea Grant. Additional academic support was provided by the Metcalf Institute for Marine and Environmental Reporting, the Graduate School of Oceanography Alumni Fellowship, and the Lance A. Ricci Fellowship.


INTRODUCTION
Diatoms are single-celled planktonic algae that account for 40-45% of oceanic carbon production, meaning they are capable of providing nearly half of the energy required by ocean life (Field, 1998). Their incredible contribution to marine ecosystems necessitates a strong understanding of the dynamics controlling diatom populations and communities, which is tied to the understanding and long-term monitoring of ecosystem health. An estimated 12,000 species have been described, but some researchers estimate that greater than 200,000 diatom species exist, within which a vast majority are undescribed (Guiry, 2012). The discovery of undescribed diatom species can lead to improved understanding of the dynamics and interactions within phytoplankton communities, particularly when considering the important ecological role diatoms play in marine ecosystems.
A large portion of these undescribed species belong to a category called 'cryptic species', loosely defined as being morphologically indistinguishable (at least superficially) but genetically distinct (Bickford et al., 2007). The concept of cryptic speciation is not new to microbial communities, but marine systems are so diverse that they remain key targets for the study of cryptic species (Mann, 1999). Marine habitats are known for their vastly unknown diversity among not only diatoms but also many other groups and the myriad specialized interactions between unique organisms. Diatoms in particular play an important functional role in marine habitats, but how cryptic species may differ in their ecological roles is unclear.
One example of a cryptic diatom genus is Skeletonema, once considered to be the single species Skeletonema costatum with a high degree of genetic variability but is in fact a complex of at least 10 marine species. Species identified include S. menzelii (Guillard et al., 1974), S. pseudocostatum (Medlin et al., 1991), and S. subsalum (Hasle and Evensen, 1975). Sarno et al. (2005) discovered four additional species of Skeletonema using light microscopy, electron microscopy and genetic sequencing. These species included: S. dohrnii, S. marinoi, S. japonicum, and S. grethae. Other species have been described, including S. ardens and S. grevelli (Sarno et al., 2007). Few studies have addressed species composition in situ and only two species have been identified in Narragansett Bay (Kooistra et al., 2008).
Skeletonema is one of the most abundant phytoplankton genera found in Narragansett Bay. It often dominates phytoplankton blooms and is found in the bay throughout the year (Windecker, 2010). It is a cosmopolitan genus, also found in marine ecosystems globally (Kooistra et al., 2008). Skeletonema represent an average of 49% of the phytoplankton community in Narragansett Bay and can comprise up to 99% of the total phytoplankton cells during peak bloom times (Windecker, 2010).
Over time, Skeletonema abundance in Narragansett Bay has decreased from an average of 2137 cells ml -1 between the years 1959 and 1980 to 1128 cells ml -1 between 1980 and 1997, with average cell abundance remaining at approximately 1220 cells ml -1 until 2005 (Borkman and Smayda, 2009). Skeletonema abundance has changed in other locations as well. A study by Biswas et al. (2009) demonstrated that average cell abundance phytoplankton changed over a period of nearly two decades from 1990 to 2007 in the Bay of Bengal. Skeletonema increased from a relative cell abundance of 1.42% (out of the total phytoplankton population) in 1990 to 12.66% in 2000, and even as high as 18.95% in 2007. This differs from the study in Narragansett Bay not only because the relative cell abundance appears to be much lower than Narragansett Bay's average Skeletonema cell abundance of 49%, but also because the relative abundance has actually increased in recent years. While it is apparent that these changes are occurring both in Narragansett Bay and globally, a mechanism has not yet been described to explain these shifts. Furthermore, the presence of cryptic species of Skeletonema in Narragansett Bay is unknown and may account for changes in community composition based on potential physiological differences among species.
Studying changes in Skeletonema community composition over time is a first step towards understanding differences among morphologically cryptic species and the roles they play in marine ecosystems. These distinctions may also inform projections of future change in phytoplankton communities as the environment and climate undergo rapid changes over time, not only in Narragansett Bay but also in coastal ecosystems around the globe. We address three questions in this study: 1) Do multiple species of Skeletonema exist in Narragansett Bay? 2) Does Skeletonema community composition vary seasonally, or does community composition remain the same throughout the year?
3) Do Skeletonema community shifts correspond to seasonal fluctuations in environmental variables, such as temperature, irradiance, and nutrient concentration? We predict that multiple cryptic species will be detected in Narragansett Bay using novel molecular techniques, and that environmental conditions will cause community composition to change on seasonal time scales due to genetic, and possibly physiologic, differences among species.

Sample collection
Filtered field samples, cell counts, and environmental data were provided by the  (Table 1).

Method validation
In order to evaluate the utility of high-throughput sequencing to identify closelyrelated Skeletonema species, simulated communities of mixed phytoplankton species were created and included along with analysis of field samples. Three simulated communities were generated by mixing exponentially growing cultures of five Skeletonema species in equal abundances and adding this to exponentially growing cultures of four other phytoplankton species that were also mixed in equal abundances.
The three simulated communities were comprised of an 80%, 50% and 20% mixture of A total of 16.35 x 10 6 reads representing 96 amplicon library samples were analyzed. Each sample produced two .fastq files, one containing forward reads and one containing reverse reads. A series of tests were performed to select the optimal methods for merging paired-end (PE) reads. First, we compared the QIIME (Caporaso et al., 2010) join_paired_ends.py algorithm and the USEARCH (Edgar, 2010) -fastq_mergepairs algorithm. Ultimately, PE reads were merged using the USEARCH algorithm. Within the USEARCH algorithm, a quality filter was tested. Raw reads were truncated to the first base pair with a quality score of at least Q30, and this was done separately both before and after PE reads were merged. Based on the accuracy of the resulting data, we opted to perform Q30 trimming before merging PE reads. Also within the -fastq_mergepairs algorithm, PE reads were removed if one or more mismatches occurred in the overlapping sequence. Additional quality filtering and conversion to .fasta format was performed using the USEARCH fastq_filter algorithm, which further truncated sequences to a length of 315 bp to generate reads of equal length for downstream processing.
Converted .fasta files were imported to the Center for Computation and Visualization (Brown University) where QIIME was used to combine .fasta files into a single .fna file using the add_qiime_labels.py script. Sequences were assigned OTUs using QIIME's pick_open_reference_otus.py script. A minimum percent ID of 60% (which is the default value used in QIIME) was compared to a minimum percent ID of 99%, and the 99% ID filter produced more reliable results. A reference database representing a suite of Skeletonema species was created by searching NCBI for Skeletonema 28S sequences (Table 4). All sequences were clustered into OTUs based on the reference database and de novo clustering was not necessary.

Data analysis and statistical methods
For each sample, the percent sequence reads corresponding to each species was determined by dividing the number of sequence reads per species by the total number of sequence reads. Species represented by < 0.1% of the sequence reads in a sample were considered as potential sequencing errors (due to the high degree of similarity among Skeletonema 28S sequences) and removed from the data set.

Calculation of absolute species abundance for winter samples
For a subset of the samples, absolute species abundance was calculated. To generate the conversions from relative to absolute species abundance, we used the sequence data from the simulated communities. For each of three simulated communities (analyzed in triplicate), the expected number of sequence reads for each Skeletonema species was 20% of the total sequence reads since the five species were added in equal proportions. To obtain a transformation coefficient for each species, the number of expected sequence reads was divided by the observed number of sequence reads. The transformation coefficients were averaged across all 9 samples to obtain a single coefficient for each of the five species used in the simulated communities (Table 5).
Because transformation coefficients could only be determined for the five species used in the simulated communities, two species identified in Narragansett Bay (S. costatum and S. pseudocostatum) were not included in the analysis. Since we did not have an estimate of over-or underrepresentation for either of these two species, we did not consider field samples with > 1% of either or both species present. As a result, only field samples containing the five species with coefficients (and possibly minute amounts of two additional species) were considered. Transformation coefficients were first multiplied by the number of sequence reads. Next, the transformed sequence reads were converted to percent composition values. Finally, percent composition values were applied to absolute Skeletonema cell counts from each field sample to obtain individual species cell counts. Since spring, summer, and autumn samples tended to have high levels of S. costatum and S. pseudocostatum and could not be considered in our analysis, we focused on winter samples that consistently lacked these two species.

Propagation of error in DNA processing
Technical replication was evaluated at three steps of the method: extraction, amplification (PCR), and sequencing of DNA. Extraction replicates were derived from simulated communities that were filtered and extracted in triplicate. Extraction replicates were generally associated with low levels of variance, defined as a coefficient of variation (CV) less than 15%, although variance exceeded 15% for some species.
Variance exceeding 15% was associated with S. menzelii in the 50% Skeletonema community and with S. dohrnii and S. menzelii in the 20% Skeleonema community (Table 6). Among extraction triplicates, the overall contribution of a single species to a community was not associated with standard deviation, as species with both higher and lower contributions to the community experienced lower levels of variance than those that exceeded 15% variance.
Amplification and sequencing triplicates were associated with less error than extraction triplicates (Table 6). Species with low sequence read abundance were associated with high levels of variance but were rare among the samples. In contrast, species that dominated samples with high sequence read abundance were associated with low levels of variance. For example, S. pseudocostatum in the June 2010 amplification triplicates comprised 75.42% of sequence reads in that sample and demonstrated only 0.92% variance among triplicates (Table 6). Overall, technical errors associated with amplification and sequencing were low and tended to affect low abundance species much more than high abundance species.

Selection of optimal sequence analysis method
Due to the complexity of amplicon MiSeq data and the close genetic relationship among Skeletonema species, computational steps were extensively tested to identify the most accurate method of processing high-throughput sequencing data. These tests were performed on a single simulated community containing equal concentrations of five species: S. dohrnii, S. grethae, S. japonicum, S. marinoi, and S. menzelii. The final workflow consisted of quality filtering of raw sequence reads, length truncation of merged reads, and prefiltering of sequences against a set of reference sequences ( Figure   1). This section describes options that were compared at each step and how the final workflow was selected.
First, two algorithms commonly used to merge paired-end (PE) amplicon reads were compared: QIIME's join_paired_ends.py algorithm and USEARCH's -fastq_mergepairs algorithm. These two methods were similar when comparing the total number of merged sequences with a minimum base pair overlap filter implemented for a single sample ( Figure 2). The USEARCH algorithm resulted in a maximum yield of 95449 PE reads compared to a maximum yield of only 88619 PE reads in QIIME. The USEARCH -fastq_mergepairs algorithm was selected to merge forward and reverse reads.
Within the USEARCH -fastq_mergepairs algorithm, several parameters were tested to further optimize data processing. Reads were truncated inwards from each end of a read to the first base pair with a quality score of Q30; this was performed separately both before and after PE reads were merged. When Q30 truncation occurred after PE reads were joined, two species were identified in the sequence data that were known to be absent from the simulated community: S. potamos and S. pseudocostatum. A small number of reads were identified as Skeletonema sp. (0.3%) or had no BLAST hit (0.2%) when they failed to cluster with OTUs in our custom Skeletonema reference database.
Also, up to eight OTUs were classified for a single taxon using this method. When quality filtering occurred before PE reads were joined, no unexpected OTUs appeared in the data and only a single OTU was assigned to each species ( Figure 3).
After merging PE reads, two additional parameters were tested. First, a maximum length of 315 bp (10 bp short of the full fragment length) was found to improve the accuracy of results. It removed only highly conserved bps with no phylogenetic information and ensured that the length of all reads did not exceed the length of any reference sequences ( Figure 3). Second, a 99% identity filter in the OTU picking script eliminated OTUs that did not match sequences of species that were known to be present in the sample. When OTUs were required to match reference sequences at either 60% (the default value) or 99%, the 99% sequence match resulted in loss of about 80% of the sequence data compared to the default value but eliminated OTUs that were identified as Skeletonema sp. or had no BLAST hit ( Figure 3). This only had a noticeable effect when Q30 trimming was performed after merging PE reads. An additional quality filter, varying the number of mismatches allowed in the overlapping region, did not have a noticeable effect on OTU picking results ( Figure 3). Based on the results described above, a combination of USEARCH -fastq_mergepairs and -fastq_filter algorithms and QIIME pick_open_reference_otus.py algorithm was used to process sequences, and a suite of quality filters and script parameters were compared for a single sample with known concentrations of five Skeletonema species (Figure 1). The results of these tests revealed that the ideal method of processing sequence reads included a) Q30 trimming before merging PE reads and elimination of sequences containing one or more mismatches in the overlapping region using the USEARCH -fastq_mergepairs algorithm; b) contig length truncation to 315 base pairs using the USEARCH -fastq_filter script; and c) a 99% prefilter ID using the QIIME pick_open_reference_otus.py script (Figure 1).

Simulated communities
Three simulated field communities were created in triplicate by mixing known numbers of five Skeletonema species and four other phytoplankton species in culture. The Skeletonema species were included in equal proportions and combined to comprise 20, 50 or 80% of the total cells in the community. All five Skeletonema species were detected by high-throughput sequencing and none of the other phytoplankton species were detected, confirming that the primers were both specific to the genus Skeletonema and amplified a region that could be used to distinguish among species (Figure 4). The average coefficient of variation (CV) within species was 13.77% and did not exceed 15% for any species with the exception of S. dohrnii in the 20% Skeletonema community (CV=31.38%) and S. menzelii in the 80% Skeletonema community (CV=24.07%) and the 20% Skeletonema community (CV=22.02%).
Both within and among mixed communities, Skeletonema species composition did not vary significantly and was not affected by the absolute concentration of Skeletonema cells in each sample as evidenced by the consistent representation of species across communities (ANOVA, p<0.01) ( Figure 5). Although the five species were expected to contribute equally to the total Skeletonema community in all three experimental conditions (i.e. each species was expected to comprise 1/5 of the sequence data regardless of the total contribution of Skeletonema to the phytoplankton community), the observed percent composition for each Skeletonema species actually ranged from 1.97% to 53.41% and differed significantly from the expected community compositions (Chi-square goodness of fit test with 4 degrees of freedom, χ 2 >43% and p<0.001 for all three communities).

Community composition and seasonality of Skeletonema in Narragansett Bay
Total  Percent composition of sequence reads was used to analyze community composition. Since simulated communities suggested that species like S. japonicum are overrepresented while others like S. marinoi are underrepresented (Figure 4), it was not appropriate to treat percent composition of sequence reads as equivalent to percent composition of species abundance. Instead, percent composition of sequence reads was used to evaluate relative change over time in the community.
Variation in Skeletonema community composition was strongly associated with season ( Figure 6). Field samples clustered into two distinct groups, one primarily dominated by winter and spring samples and the other primarily dominated by summer and autumn samples (Figure 7). This reflects the major shift observed between May and June ( Figure 6) where low richness during winter and spring months quickly transitioned to higher richness during summer and fall months ( Figure 8). The winter-spring cluster appears to have more similar community composition among field samples than the summer-autumn cluster, which also reflects the shift to a richer and unpredictable community. Pairwise comparisons among seasons further indicate that significant differences exist between all seasons (p=0.001) with the exception of winter-spring and summer-autumn (  (Figure 9).
Principal Components Analysis showed distinct separation of seasons in two directions ( Figure 9). The first two axes of the PCA explained 72.2% of the variability in environmental data.  Table 9).

Contribution of S. marinoi to winter communities
Transformation coefficients for five species (S. dohrnii, S. grethae, S. japonicum, S. marinoi, and S. menzelii) were used convert raw sequence reads to absolute cell counts and eliminated over-and underrepresentation of species (Table 5). We focused on winter samples (December-February), which showed that S. marinoi comprised 8-100% of the community during these months. Contribution of S. marinoi generally exceeded 80% of the community and was rarely present in low abundance.
Other species were rarely abundant in winter communities. Dates that coincided with low S. marinoi abundance relative to other species appeared to be influenced by unusual conditions in the bay. For example, in December 2011 community composition was 29% S. dohrnii, 24% S. grethae, 44% S. marinoi, and 3% S. menzelii, and in January 2013 community composition was 8% S. marinoi, 90% S. dohrnii, and 2% S. menzelii.
On these dates, both Skeletonema abundance and overall phytoplankton abundance were low ( Figure 10). December 2011 also experienced a higher than normal surface temperature of 11.5°C (January 2013 surface temperatures were within the typical range).
Chlorophyll-a concentration generally followed changes in overall Skeletonema abundance and was particularly high in January 2010 (25 ug/L). This coincided with a large Skeletonema bloom that was dominated by S. marinoi.

DISCUSSION
Skeletonema is a diatom genus comprised of up to 10 morphologically cryptic species. Previously, studies have focused on genus-level identification to measure differences among Skeletonema populations, but we approached the Narragansett Bay community by identifying cryptic species and their seasonal distributions over a five year time period. We found that Skeletonema experiences distinct shifts over seasons that may relate to environmental changes, particularly in temperature.

Use of high-throughput molecular methods to evaluate community composition
Skeletonema species are closely related, and methods of determining taxonomy must be sensitive enough to detect small differences in DNA sequence and sequence read counts among such species. In this study, high-throughput sequencing was effective in Errors typically had a greater effect on species that revealed low sequence read abundance than on those that made up a majority of the community. These errors were most evident during the DNA extraction step. In contrast, downstream steps of amplification and sequencing had much lower levels of variation. Despite the presence of technical error, high-throughput sequencing is still a trustworthy method to characterize microbial communities. This allows us to focus on species that are more abundant and potentially play a more significant ecological role in marine systems.
Closely-related species like Skeletonema are difficult to study due to experimental errors that can accumulate throughout the method and cause false identification in downstream processing steps. By applying strict quality controls that target the specific needs of the study, we mitigated some of this error and eliminated many false reads and identifications. One technique that was notably different in our study compared to others was the application of Q30 trimming. It is typically recommended to perform Q30 trimming after merging PE reads (Bokulich et al., 2013). We attempted Q30 trimming both before and after merging and found spurious reads when it was performed after merging. Trimming before merging PE reads, however, eliminated OTUs associated with Skeletonema sp., Skeletonema potamos (a freshwater, not marine, species) (Duleba et al., 2014), and unidentified species. Since bps reflective of genetic variation mostly occurred in the middle of the sequence where forward and reverse reads overlapped, it is possible that Q30 trimming before merging reads eliminated low quality bps that were otherwise being included in the merged sequences. We found that developing a custom method of DNA sequence processing, tested on communities of known Skeletonema spp.
abundances, provided us with more accurate and reliable data.

Accuracy of relative and absolute abundance sequence data
Simulated communities revealed that Skeletonema species can successfully be detected and distinguished in mixed phytoplankton communities; however sequence read abundances did not match expected abundances when cell concentrations of Skeletonema spp. were known (Figure 4). For example, abundances for S. japonicum were overestimated while S. marinoi and S. grethae were underestimated. A likely cause for these biases is variation in the LSU rDNA copy number. Theoretically, a species with a greater number of target gene copies will generate a greater number of sequence reads, producing false representation of that species, and vice versa. Unique Skeletonema species were only discovered within the past couple of decades, but it is possible that species have been diverged long enough to develop significantly different numbers of copies of the LSU rDNA gene. A study by Zhu et al. (2005) determined that 18S copy number can vary from 1 to 12,000 for a variety of algal strains, including dinoflagellates and picoplankton. More specifically, within the genus Thalassiosira copy number can vary by 10-fold among species (Zhu et al., 2005). Little work has been done to investigate the potential for variable copy number among Skeletonema species, although LSU copy number in S. marinoi has been estimated to be 61 copies (Ellegaard et al., 2008). The straightforward method for quantifying copy number per cell in these studies was quantitative PCR, and this technique could be applied to Skeletonema species to verify the over-and underestimation of cell abundance based on LSU copy number.

Seasonal separation of Skeletonema communities based on environmental conditions
Studies of Skeletonema community composition have only recently begun to include distinctions of morphologically cryptic species (Yamada et al., 2014). Here, seven species of Skeletonema were detected in Narragansett Bay, five of which had previously never been detected in this area. Previous studies have confirmed the presence of S. grethae and S. japonicum in Narragansett Bay (Kooistra et al., 2008). In this study, S. costatum, S. dohrnii, S. marinoi, S. menzelii, and S. pseudocostatum were detected for the first time in Narragansett Bay. Some of these species have been found in nearby estuaries, such as S. marinoi in Long Island Sound and S. menzelii off the coast of Cape Cod (Kooistra et al., 2008). The results suggest that diversity within estuarine phytoplankton communities may be greater than previously thought.
Communities were found to be significantly different when comparing seasons, except when comparing winter and spring communities and summer and autumn communities. The most noticeable difference between these seasonal communities was the low species richness during winter and spring when S. marinoi was the primary species in Narragansett Bay compared to the high species richness and more unpredictable community composition during the summer. This phenomenon was also noted during several bloom periods that were monitored by sampling each week for the pseudocostatum during the third week of June. This suggests that summer conditions initiated a transition in the Skeletonema community that occurred rapidly within a single bloom period, while winter and spring community composition remained consistent for even up to several months.
A similar separation of communities within a single location was found for morphologically cryptic species of Pseudo-nitzschia in Puget Sound. Initially, at least 16 species of Pseudo-nitzschia were found using another molecular method of automated ribosomal intergenic spacer analysis (ARISA) (Hubbard et al., 2008). Further investigation revealed that these species are spatially separated, even over short distances within a single estuary. While P. pungens and P. delicatissima were widely spread throughout Puget Sound, others were patchy and only found in certain locations. The community composition at each of these locations was strongly associated with environmental conditions, especially temperature, salinity, and ammonium concentration (Hubbard et al., 2014). The effect of cryptic species on community composition along an estuarine gradient is similar to our observed shift in community composition over time relative to seasonal conditions. Furthermore, these studies demonstrate that molecular methods, whether sequencing or ARISA, are useful tools for investigating cryptic species in marine field samples.

Effect of physiology and environmental conditions on community composition
In addition to the phylogenetic separation of cryptic species, studies have also found that physiology can differ significantly among cryptic species, including Skeletonema. In a study by Gallagher (1980), seasonal field samples from Narragansett  (Gallagher, 1982). In these studies, differences were attributed to diversity among clonal lineages prior to the knowledge of cryptic species. Now that we have detected at least seven species of Skeletonema in Narragansett Bay, these seasonal differences could more likely be attributed to species diversity and the unique genetic and physiological characteristics associated with each species.
Different physiologies are a likely cause for shifting community composition across seasons. In a study comparing growth rates of Skeletonema species and temperature, cultures of S. marinoi grew faster than all other species at the lowest tested temperatures (10°C and 15°C) and displayed a specific growth rate of 1.0-1.6 d -1 between temperatures of 10°C and 30°C (Kaeriyama et al., 2011). This could explain the dominance of S. marinoi in winter samples. Other studies of Skeletonema physiology have focused on seasonally separated clonal lineages, such as those of S. marinoi off the coast of Sweden (Saravanan and Godhe, 2010), and significant differences in growth rate and abundance were detected. If significant differences exist among population lineages, it is likely that physiologies differ even more among species of Skeletonema.
Temperature had by far the greatest effect on Skeletonema community composition. We discovered that temperature alone is the sole environmental condition that significantly correlated with diversity. Another study of plankton community composition in a temperate lake found that species richness and diversity were highest in summer, coinciding with higher temperatures (Graham et al., 2004). This phenomenon is apparent not only in small lakes and estuaries but also globally. A study by Thomas et al. (2012) measured growth rates of 130 species of phytoplankton under varying temperatures and found that growth rates of individual species are strongly dependent on optimum temperature niches. Rapid ocean warming could alter species ranges and diversity within a community by challenging these temperature niches, but high genetic diversity within a genus can help prevent loss of ecologically important groups of organisms like Skeletonema.
On a broader scale, marine planktonic organisms occupy a latitudinal gradient that relies heavily on temperature and other environmental variables. Diversity of bacterial communities decreases with increasing latitude and strongly correlates with temperature (Fuhrman et al., 2008). In a modeled global ocean, phytoplankton experienced the same phenomenon of decreasing diversity with increasing latitude that may be dependent on temperature gradients (Barton et al., 2010). Temperature is a possible driver of diversity for marine microbes, where high temperatures result in more diverse communities.
Furthermore, transport from the Gulf Stream into Narragansett Bay has been found to affect bloom behavior of Skeletonema and could allow tropical and sub-tropical species that are more adapted to warm temperatures to enter the Bay and occupy seasonal niches (Borkman and Smayda, 2009). This could provide an opportunity for a species like S. tropicum to enter the bay. At this time it is unclear if S. tropicum is actually excluded from the Narragansett Bay Skeletonema community due to the lack of resolution between LSU sequences of S. grethae and S. tropicum. S. tropicum has been found in marine and brackish environments mostly clustered near low latitudes but extends as far south as the coastal waters of Uruguay in South America (Kooistra et al., 2008). With an optimum growth rate occurring at 25°C (Kaeriyama et al., 2011), S. tropicum could thrive in Narragansett Bay during the warm summer months, but thus far does not appear to do so.
Further clarification of the phylogenetic and physiologic differences between S. grethae and S. tropicum can help distinguish between species and their potential roles in the Narragansett Bay ecosystem.

Composition of winter communities and implications for future changes in community composition
We used the results from simulated communities of known Skeletonema spp.
proportions to calculate absolute cell abundances of those species during the winter.
Based on these calculations, we confirmed that S. marinoi is a dominant winter species and may also dominate other seasons, since simulated communities suggest that S.
marinoi is underrepresented when field samples are sequenced. The discovery of seasonally separated communities of Skeletonema and particularly winter communities that are dominated by a single species is one that could have implications for future changes to phytoplankton communities. Since temperature may partially affect the physiology of Skeletonema, changes in ocean temperature could result in a shift in growth of cryptic species and community composition. S. marinoi is an abundant primary producer during the winter and plays a significant ecological role that can be affected by such changes in the environment, so further investigation of how this particular species responds to environmental change is crucial.

Conclusions
Skeletonema is an ecologically important and cosmopolitan diatom genus that consists of several morphologically cryptic species. In this study, we used molecular tools to investigate how Skeletonema communities vary in Narragansett Bay due to the presence of cryptic species and what environmental factors may influence We conclude the following: 1) In this study, Skeletonema community composition was investigated for This study of Skeletonema community composition addresses the recently discovered cryptic species and how they are influenced by their environment. Future work can expand to include even longer time series and physiological comparisons of species isolated from Narragansett Bay. Our methods and results can also serve as a guide for researchers in other parts of the globe who wish to investigate Skeletonema community composition. Further research into Skeletonema community dynamics over time on both local and global scales will ultimately help researchers understand and project future changes to marine phytoplankton communities.  Table 2. Inter-and intraspecific variation within the targeted 325 bp fragment of the 28S gene in Skeletonema species. A dot indicates that the base pair is identical to the reference sequence and a slash indicates a deletion. Sequence numbers can be cross-referenced with those in Table 4 for the associated accession numbers.   Table 3. Skeletonema genus-specific primers and Illumina adapters used in PCR reactions to target all Skeletonema species in the mixed community and field samples. Adapters for both the forward and reverse primers were added onto the 5' end of the primer. Forward primer 5'-GTTAAAGAGTACCTGAAATTGTTAA-3' Adapter 1 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3' Adapter 2 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGT-3' Adapter 3 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTC-3' Adapter 4 5'-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGA-3' Reverse primer 5'-TGTTACTTTCATTACGCATATCA-3'

Species
Adapter 1 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3' Adapter 2 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGC-3' Adapter 3 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCA-3' Adapter 4 5'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACT-3'  Table 5. Using data from the simulated field communities, transformation coefficients were determined to convert relative percent composition data into more accurate absolute abundance data. A single sample is represented below to demonstrate how coefficients produce more reliable sequence read data. Transformation coefficients were multiplied by the number of sequence reads (before transformation) to obtain a more accurate number of sequence reads (after transformation). The target for expected number of sequence reads per species for this sample was 3665 and percent composition was 20%.  Table 6. Percent composition and standard deviation of species for five filtered field samples. Technical replication of the method was tested at three different steps: DNA extraction, amplification, and sequencing. Three sets of technical triplicates were run for DNA extraction by extracting from triplicate filters of simulated communities, where "Simulated community 1" represents 20% Skeletonema, "Simulated community 2" represents 50% Skeletonema, and "Simulated community 3" represents 80% Skeletonema.    Figure 1. Data processing consisted of three main steps-merging paired end reads, quality filtering, and picking OTUs. Due to the close relationship among Skeletonema species and the possibility for error in downstream sequence processing, we tested various options within these three steps to ensure optimal sequence identification using simulated communities of known Skeletonema species concentrations. Steps highlighted in bold represent the final workflow selected to analyze the full data set. In the first step, we determined that a maximum difference of one bp in the overlapping region was sufficient.  Figure 3. An optimal sequence analysis workflow was determined by testing a variety of parameters using data from a simulated Skeletonema community of known species composition. The first row of the x-axis denotes the number of mismatches allowed in the overlapping sequence, ranging from zero to three mismatches. The second row denotes if percent identity was pre-filtered at 60% (the default value) or 99%. The third row denotes whether or not the merged reads were truncated to 315 base pairs. The fourth row denotes whether truncation up to the first Q30 bp occurred before or after PE reads were joined. The second bar from the left denotes the combination of quality filters used in the final workflow.  Figure 4. Percent composition of Skeletonema spp sequences recovered from three simulated phytoplankton communities generated to test the sensitivity of MiSeq sequencing with low-diversity sequences. The leftmost bar represents expected percent composition for all samples. Triplicate simulated communities of 80%, 50%, and 20% Skeletonema are represented by the remaining bars. Within the Skeletonema portion of the community, five different Skeletonema species were added in equal proportions. The remainder of each community contained other diatoms and dinoflagellates. Each experiment used the same total number of 4 x 10 6 cells.  Figure 5. Simulated field communities of varying Skeletonema concentration produced unexpected results. Each of 5 species was added in equal proportions to make up 1/5 of the total Skeletonema community, but sequencing and downstream processing led to over-or underrepresentation of some species. However, triplicate measurements were precise and representation of species was consistent regardless of concentration of Skeletonema in the simulated community. This suggests that, no matter how abundant Skeletonema cells are in a mixed phytoplankton community, sequencing of environmental DNA can consistently detect present species in at least a relative manner. Error bars represent standard deviation.  A general pattern appears with fewer species occurring in the winter and spring and more species occurring in the summer and autumn, and the community structure transitions rapidly between seasons. S. marinoi appears to be the primary species in the bay during winter and spring.  Figure 7. Field samples were grouped based on Bray-Curtis similarity and categorized by season (winter=blue, spring=green, summer=orange, autumn=red). Samples are represented by boxes on the x-axis and percent similarity is represented by the y-axis. Similarity is measured from nodes joining two samples or two groups of samples, and measurement from the node to the bottom of the dendrogram such that shorter branch lengths represent more similar samples (closer to 100% similarity) and longer branch lengths represent more dissimilar samples (further from 100% similarity). Figure 8. Monthly species richness, defined as the total number of Skeletonema species present in a sample, between December 2008 and December 2013. The cluster of cool colors between January and April indicates low species richness during winter and spring, while the cluster of warm colors between June and September indicate high species richness during summer. Other months tend to demonstrate moderate species richness, which often occurs in the transition from warmer seasons to cooler seasons.

Pick OTUs
White squares indicate a month where no sample was available. Figure 9. Principal components analysis of Narragansett Bay environmental data. Variables included surface temperature (°C), surface salinity (psu), average daily PAR (mmol/m 2 ), DIN (μM), DIP (μM), and Si (μM). Values of surface salinity and DIN were log transformed. Nutrient concentrations mostly contributed to PC1 and surface temperature and PAR contributed to PC2. Along PC1, high-nutrient autumn and winter samples separated from low-nutrient spring and summer samples. Along PC2, lowtemperature winter and spring samples separated from high temperature summer and autumn samples. Surface salinity did not have an effect on seasonality. PC1 and PC2 explained 72.2% of the variance in environmental data. Figure 10. Percent composition data were converted to absolute abundance for five species by multiplying raw sequence reads by a transformation coefficient, which should provide more accurate representation of species. Winter samples (Dec-Feb) were analyzed because they tended to have low concentrations of the two species for which we did not calculate transformation coefficients. The left-hand y axis represents abundance in cells/L, which was obtained for each species by multiplying the transformed percent composition by total Skeleonema abundance. In the bottom figure, the right-hand y axis represents surface chlorophyll-a levels (solid line). S. marinoi dominated winter in both bloom and non-bloom years.