THALASSIOSIROID DIATOM RESPONSES TO SILICON STRESS AND OCEAN ACIDIFICATION

1 Atmospheric CO2 has risen dramatically since the industrial revolution. This 2 rise in atmospheric and oceanic pCO2 has perturbed ocean carbonate chemistry and 3 led to ocean acidification. Diatoms are phytoplankton that account for 40% of oceanic 4 primary production through photosynthetic carbon fixation, which is aided by their 5 carbon concentrating mechanism (CCM). The CCM uses the bicarbonate transporters 6 (BCTs) and carbonic anhydrases (CAs). Our current understanding of how diatoms 7 might respond to ocean acidification is based on experiments using model diatoms or 8 assessing the response of the bulk diatom community, rather than assessing a diversity 9 of diatoms in a complex environment. This dissertation aims to expand our knowledge 10 regarding diatom response to CO2 in ecologically important, non-model diatoms and 11 their response in laboratory experiments and field mesocosms to alterations in CO2 12 concentration. 13 Diatoms’ primary production is a function of their growth, which is 14 constrained by the availability of nutrients in the surface ocean. Silicon is a nutrient 15 that is particularly important for diatoms, as they are unique in their requirement for 16 silicon to build their cell walls. Silicon limitation has been observed in low iron high 17 nutrient low chlorophyll (HNLC) regions and the North Atlantic Ocean, although 18 these studies have focused on the whole diatom community rather than specific diatom 19 groups that may not uniformly experience silicon limitation. Genetic markers have 20 been used to probe species-specific iron status in the field, and similar molecular 21 markers of silicon status could be powerful tools to probe the silicon status of different 22 co-existing diatom species. However, current studies of silicon limitation have relied 23 on model diatoms rather than species that are likely to be found in HNLC regions or 24 the North Atlantic Ocean, limiting the ability to develop appropriate molecular 25 markers. This dissertation aimed to fill in these knowledge gaps using transcriptomic 26 studies of Thalassiosiroid diatom isolate cultures as well as incubations of mixed 27 diatom assemblages. 28 Chapter 1 of this thesis examined the transcriptomes of two closely related 29 Thalassiosira diatoms to better understand the silicon limitation gene expression 30 response of diatoms found in ecosystems where their growth may be constrained by 31 silicon availability, toward the goal of developing appropriate molecular markers of 32 silicon status. This study found a gene family encoding putative ATP-grasp domain 33 proteins that were upregulated in silicon-limited T. oceanica and T. weissflogii. The 34 upregulation of these genes is unique to silicon limitation and its expression is not 35 induced in nitrate or iron limitation conditions. The members of this gene family were 36 also upregulated in response to silicon limitation in previously published T. 37 pseudonana transcriptome studies and homologs were found across a diversity of 38 diatoms, suggesting that it might be useful as a silicon limitation marker in many 39 diatoms, in addition to Thalassiosira spp. diatoms. 40 Chapter 2 of this thesis adds to our knowledge of the diversity of CCM and 41 high CO2 response in different species of the Thalassiosira genus of diatoms. CO2 42 manipulation experiments were conducted with four Thalassiosira species – T. 43 pseudonana, T. rotula, T. weissflogii, and T. oceanica, and transcriptomes were 44 sequenced from the latter 3 species. These species displayed a range of growth rate 45 changes across CO2 conditions, from no change across conditions (T. pseudonana), 46 slower growth in lower CO2 (T. weissflogii), slower growth in higher CO2 (T. 47 oceanica), and faster growth in higher CO2 (T. rotula). The accompanying 48 transcriptome evidence for T. rotula, T. weissflogii, and T. oceanica indicate that these 49 differences in CO2 responses boil down to the ability to dynamically regulate the 50 carbon concentrating mechanism while maintaining internal redox and ion 51 homeostasis, as well as trade-offs between funneling excess CO2 into internal storage 52 pools versus a higher growth rate. 53 Chapter 3 uses a metatranscriptome approach to generalize about how the CCM 54 of Thalassiosira spp. differs from that of Skeletonema spp. diatoms. This study 55 involved CO2 manipulation experiments of a plankton assemblage collected from 56 Massachusetts Bay during March of 2014. We show evidence of distinct enzymatic 57 strategies for N and P uptake and scavenging in these two diatom genera. There is also 58 evidence of distinct strategies in their CCMs – while Thalassiosira spp. diatoms seem 59 to use the α, δ, and ζ carbonic anhydrases and the substrate-specific SLC4 and non60 specific SLC26 bicarbonate transporters, Skeletonema spp. diatoms rely on the γ and ζ 61 carbonic anhydrases and the SLC26 bicarbonate transporters or diffuse CO2 uptake. 62 This suggests that the co-existence of Skeletonema spp. and Thalassiosira spp. might 63 be partially due to a CCM that uses distinct forms of inorganic carbon (CO2 or HCO3) 64 and CA isozymes that use different metal cofactors for their enzymatic activity (γ can 65 substitute iron, δ can substitute cobalt). Furthermore, if Skeletonema does rely on 66 diffusive CO2 uptake rather than active transport of HCO3 using SLC4 or SLC26 67 transporters, Thalassiosira may have a competitive advantage in high CO2 as it 68 downregulates these active transporters. 69


INTRODUCTION Ocean acidification
Fossil fuel combustion has raised atmospheric pCO 2 from ~277 ppm in the year 1750 to the present level of ~400 ppm (Tans 2009;. Increased levels of dissolved CO 2 gas in seawater perturbs carbonate chemistry in the ocean, which results in a decreased buffering capacity, or ocean acidification (OA) . Climate models suggest that pCO 2 levels in the atmosphere may rise to ~1000 ppm by the year 2100 if no measures are taken to reduce emissions (Solomon et al. 2007). Through physical mixing and biological activity, the ocean sequesters approximately 1/3 of anthropogenic CO 2 and has been the only net sink for CO 2 in the past 200 years ).

Diatoms experience spatial and temporal fluctuations in CO 2
Diatoms are photosynthetic single-celled eukaryotes that account for 40% of oceanic primary production and approximately 20% of global primary production (Nelson et al. 1995;Field et al. 1998). Diatoms have experienced fluctuations in pCO 2 on different temporal scales throughout their evolutionary history. Molecular clock estimates suggest that diatoms arose around 240 million years ago (Sims et al. 2006;Medlin 2016). This coincides with the start of a glacial "hot-house" period, where atmospheric pCO 2 levels were as high as 6000 ppm (Kump et al. 2009). In more recent history, diatoms have also experienced low pCO 2 : pre-industrial (before 1850) atmospheric pCO 2 averaged ~270 ppm (Tans 2009). In the present day, diatoms can also experience fluctuations in pCO 2 levels that vary on much shorter time scales.
2 Coastal diatoms might experience dissolved pCO 2 concentrations greater than 850 ppm during seasonal coastal upwelling events that entrain CO 2 -rich deep water into the lit ocean (Feely et al. 2008). Diatoms' distributions in the ocean can also influence the pCO 2 conditions they experience over the course of their life histories. Relative to coastal diatoms, diatoms living in the open-ocean experience more stable pCO 2 conditions . Coastal diatoms may have unique strategies for coping in high CO 2 that are not seen in open-ocean diatoms.

Diatoms & the carbon concentrating mechanism
Diatoms' key role in global carbon cycles is aided by a carbon-concentrating mechanism (CCM), which concentrates CO 2 around RuBisCO, the enzyme that begins the CO 2 fixation process . The diatom CCM uses bicarbonate transporters (BCTs) to take up HCO 3 and carbonic anhydrases (CAs) to interconvert HCO 3 and CO 2 ). The predominant dissolved form of inorganic carbon in the ocean is bicarbonate (HCO 3 -) (Zeebe & Wolf-Gladrow 2001), which is transported into the cell by the BCTs and can be interconverted to CO 2 (which diffuses across membranes without active transport) via the activity of a suite of carbonic anhydrases (Meldrum & Roughton 1933;Stadie & O'Brien 1933). The use of a carbon concentrating mechanism and active transport of inorganic carbon requires a significant energetic investment by the cell . If diatoms can downregulate CCM components when CO 2 levels are high, this may result in the conservation of cellular energy as anthropogenic CO 2 emissions rise.
This energy conservation may confer a competitive advantage to some diatoms and other groups of phytoplankton and could influence community composition in the 3 future. The ability or inability to regulate the CCM in response to fluctuating pCO 2 may be related to how diatoms sense changes in pCO 2 .

Diversification of the diatom CCM
The diatom CCM has been most well-studied in the model diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum and includes the a, β, δ, g, q, and ζ CAs and the SLC4 and SLC26 BCTs (Tachibana et al. 2011;; . Thalassiosira pseudonana and Phaeodactylum tricornutum generally contain the same complement of CCM genes, with a few exceptions: The δ and ζ CA isozymes have not been found in P. tricornutum, while the β CAs have not been found in T. pseudonana (Tachibana et al. 2011;). The θ carbonic anhydrases have been characterized in P. tricornutum, but their expression in T. pseudonana has not been confirmed . The SLC4 transporters have been characterized in P.
tricornutum, but not T. pseudonana (Nakajima et al. 2013). The sub-cellular arrangements of the CCM components (and thus the key points of CCM regulation) also differ in these two species, suggesting that their CCMs work quite differently (Tachibana et al. 2011;). While T.
pseudonana has CAs localized to the periplastid space, the cytosol, and the chloroplast stroma, P. tricornutum does not . In contrast, P. tricornutum has CAs localized to the chloroplast endoplasmic reticulum and the pyrenoid, T.
pseudonana does not (Tachibana et al. 2011). It has been hypothesized that this diversification and re-configuration of the CCM in different diatoms is due to historic fluctuations in atmospheric CO 2 . However, there are few studies 4 examining the CO 2 -sensitivity and regulation of the CCM across a diversity of diatoms. This is particularly true of diatoms living in the open-ocean that may not have the same high CO 2 -coping mechanisms as coastal isolates (like T. pseudonana).

cAMP-mediated CO 2 sensing in diatoms
One aspect of carbon physiology that might influence how diatoms cope in high CO 2 is their ability to sense changes in CO 2 . Sensing of pCO 2 in diatoms is via the activity of cAMP secondary messengers (Harada et al. 2006;Ohno et al. 2012;. Soluble adenylate cyclases generate cAMP secondary messengers from ATP in response to signals received from bicarbonate ). The cAMP secondary messengers activate cAMP-dependent protein kinases that phosphorylate and activate bZIP domain transcription factors that in turn bind to cAMP response elements and regulate downstream gene expression (Ohno et al. 2012). Phosphodiesterases regulate cellular cAMP levels by dephosphorylating cAMP back into AMP . Studies in P. tricornutum and T. pseudonana suggest that the activity of a soluble adenylyl cyclase is increased in the presence of CO 2 /HCO 3 , and that the resulting increase in cellular cAMP levels induces the activity of bZIP transcription factors (Harada et al. 2006;Ohno et al. 2012;. In high CO 2 , the transcription factors bind to upstream CO 2 -cAMP-responsive elements and repress the expression of CCM genes, including the expression of the adenylyl cyclase itself . The pCO 2 sensitivity of the β carbonic anhydrase of P. tricornutum was repressed when cells were treated with a cAMP inhibitor IBMX (Harada et al. 2006) or when cAMP-responsive motifs were deleted upstream of the β carbonic anhydrase gene (Ohno et al. 2012). These studies of 5 cAMP-mediated CO 2 sensing have been conducted in the model diatoms T.
pseudonana  and P. tricornutum (Harada et al. 2006;Ohno et al. 2012). In light of the diversification and re-configuration of the CCM in different diatoms , it is possible that this cAMP-mediated CO 2 -sensing mechanism is not ubiquitous across diatoms.

Silicon limits diatom growth and primary production
In addition to being aided by the CCM, diatoms' primary production is also a function of their growth, which is constrained by the availability of nutrients in the ocean. Many phytoplankton share growth requirements for nutrients such as nitrate, phosphate, and iron, but diatoms are unique among the phytoplankton as they require silicon, which is used to build their ornately silicified cell walls or frustules (Lewin 1957;Lewin 1962;Jørgensen 1955).
It has been suggested that silicon ultimately limits diatom growth throughout the ocean (Dugdale et al. 1995;Yool & Tyrrell 2003) by several different mechanisms. Firstly, there is a discrepancy in the available pools of recycled nitrogen and recycled silicon in the surface ocean. Silicon recycling occurs deeper in the water column than that of other nutrients (C, N, P), leading to a propensity for silicon depletion in the surface ocean (Dugdale et al. 1995;Treguer et al. 1995;Dugdale & Wilkerson 1998). Secondly, diatoms growing in upwelled water can become silicon limited if these waters are enriched in nitrate relative to silicon, which been observed in the North Atlantic Ocean Johnson et al. 2007). Thirdly, silicon may also set the upper limit on diatom primary production in iron limited high nutrient low chlorophyll (HNLC) regions (Dugdale et al. 1995;Dugdale & Wilkerson 1998).
6 This is because diatoms growing low iron conditions (< 5 nM), take up as much as 8 times more silicon than nitrate (Franck et al. 2000). Thus, diatoms in HNLC regions may be driven into silicon limitation due to their increased silicon requirement (Hutchins & Bruland 1998;De La Rocha et al. 2000). It has also been suggested that silicon processing may require iron as a cofactor, which would lead to a biochemically-dependent iron and silicon co-limitation (Mock et al. 2008;Saito et al. 2008). Silicon and trace metal co-limitation of diatoms has been documented in the HNLC waters of the Equatorial Pacific Ocean , the Southern Ocean and sub-Antarctic Leblanc et al. 2005;Franck et al. 2000), and the sub-Arctic Pacific (Koike et al. 2001). As these studies assessed the silicon status of the bulk diatom community, it is currently not known whether these responses are uniform across particular diatom genera, species, or strains. Genetic markers have been used to probe iron status in the field, and similar molecular markers of silicon status could be powerful tools to probe the silicon status of different coexisting diatom species (Chappell et al. 2015).

Silicon uptake and deposition
Silicic acid is a small, uncharged molecule and can freely diffuse across diatom membranes when concentrations are high (Hildebrand et al. 2018), but when silicic acid concentration is low, silicon transporters are used for active transport (Kimberlee Thamatrakoln & Hildebrand 2008). Diatoms' silicified cell wall is formed inside the silicon deposition vesicle (Drum & Pankratz 1964;Reimann et al. 1966). Silicification inside the silicon deposition vesicle is aided by several classes of molecules --the silaffins (Poulsen & Kröger 2004;Kröger et al. 1999), cingulins (Scheffel et al. 2011), 7 silacidins (Wenzl et al. 2008, and long-chain polyamines (Kröger et al. 2000) all play a role in silicon precipitation and frustule formation. Pleuralins and frustulins are tightly associated with the silicified cell wall (Kröger et al. 1997;Kröger et al. 1994), but they are not thought to be directly involved in silicon deposition in the silicon deposition vesicle (Kröger & Wetherbee 2000;van de Poll et al. 1999).
Diatom silicon physiology and the cell cycle are tightly linked. In addition to requiring silicon for their cell walls (Lewin 1957;Lewin 1962;Jørgensen 1955), silicon is also required for diatoms' DNA synthesis to proceed during S phase of the cell cycle (Darley & Volcani 1969). This phase of the cell cycle also sees peak expression of silicon transporter mRNA (but the lowest expression of silicon transporter proteins) (Thamatrakoln & Hildebrand 2007). Diatom frustule valves are formed just after cell division, during G2/M phase, while girdle band synthesis happens in G1 phase (Eppley et al. 1967;Hildebrand et al. 2007). These different processes that require silicon explain why silicon limited T. weissflogii cells sometimes arrest in the G2/M phase or other times in the G1/S phase transition (Brzezinski et al. 1990;Vaulot et al. 1987).

Molecular biology of silicon limitation
Genetic studies have given us a better understanding of the molecular underpinnings of diatom silicon physiology. Silicon limited T. pseudonana upregulates silicon transporter gene copies SIT1 and SIT2 (transcripts and proteins) in silicon limitation (Mock et al. 2008;Shrestha et al. 2012;Du et al. 2014;Smith et al. 2016;) and also has a higher concentration of silacidin proteins relative to silicon replete condition (Richthammer et al. 2011), although transcriptome 8 studies in T. pseudonana did not see differential expression of silacidin genes (Mock et al. 2008;Shrestha et al. 2012). Genes for the biosynthesis of long-chain polyamine precursors (spermidine synthases)    et al. 2017). Given that diatoms also require silicon for DNA synthesis during S phase of the cell cycle (Darley & Volcani 1969), when cingulin expression increases (Shrestha et al. 2012;, it is also feasible that the cingulins play a role in signaling cellular silicon status before DNA synthesis proceeds. Notably, the studies to date have not included diatoms found in HNLC regions, where silicon sets an upper limit on diatom production (Dugdale & Wilkerson 1998;Chappell et al. 2013), or the North Atlantic Ocean where silicon limitation of the bulk diatom community has been observed Johnson et al. 2007), limiting the ability to develop appropriate molecular markers.

Thesis motivation and outline
Diatom primary production is a function of their overall growth rate, which is constrained by the availability of dissolved nutrients in the surface ocean. Many phytoplankton share growth requirements for nutrients such as nitrate, phosphate, and iron, but diatoms are unique among the phytoplankton as they require silicon, which is used to build their ornately silicified cell walls or frustules (Lewin 1957;Lewin 1962;Jørgensen 1955). Silicon growth limitation of the bulk diatom community has been observed in the North Atlantic Ocean Johnson et al. 2007) and silicon and trace metal co-limitation has been documented in the HNLC waters of the Equatorial Pacific Ocean , the Southern Ocean and sub-Antarctic Leblanc et al. 2005;Franck et al. 2000), the sub-Arctic Pacific (Koike et al. 2001), and the equatorial Pacific (Dugdale & Wilkerson 1998).
As these were studies assessing the bulk diatom community, it is unclear whether or not certain diatom species are experiencing silicon limitation while others are not.
Molecular markers have been useful for assessing the iron status of a diatom species, Thalassiosira oceanica, that is prevalent across oceanic gradients (Chappell et al. 2015), and a similar approach could prove useful for determining whether or not T.
oceanica and other co-occuring diatoms experience silicon limitation in a bulk community. However, most studies looking at the molecular and genetic response to silicon limitation have used the model diatoms T. pseudonana (Mock et al. 2008;Shrestha et al. 2012;Du et al. 2014;Smith et al. 2016; or P. tricornutum (Sapriel et al. 2009), rather than diatoms typically found in places where silicon is thought to limit diatom primary production (Dugdale & Wilkerson 1998;Johnson et al. 2007), hindering our ability to develop appropriate molecular markers.
Chapter 1 of this thesis examined the transcriptomes of two closely related Thalassiosira diatoms to better understand the silicon limitation gene expression response of diatoms found in ecosystems where their growth may be constrained by silicon availability, toward the goal of developing appropriate molecular markers of silicon status. Thalassiosira oceanica CCMP1005 has a high tolerance for iron limitation (Sunda & Huntsman 1995) and may persist in chronically iron limited HNLC regions (Chappell et al. 2013) where diatoms' silicon requirements are high (Hutchins & Bruland 1998;De La Rocha et al. 2000). Thalassiosira weissflogii CCMP1010 was isolated from the North Atlantic Ocean, an ocean basin where silicondepleted upwelling has been observed Johnson et al. 2007). We selected this genus as it allows us to make comparisons to studies in T. pseudonana which has a completely sequenced genome ) and its silicon metabolism and molecular biology is very well studied (Poulsen & Kröger 2004;Thamatrakoln & Hildebrand 2007;Mock et al. 2008;Wenzl et al. 2008;Scheffel et al. 2011;Shrestha et al. 2012;Du et al. 2014;Smith et al. 2016;).
While T. weissflogii upregulated the silicon transporters in low silicon, T. oceanica did not. We hypothesize that this is due to the lack of a SIT3-like silicon sensor in T.
oceanica. We also generally found very different silicon limitation responses in these two diatoms. We hypothesize that this is because silicon is required for cell wall synthesis (Lewin 1957;Lewin 1962;Jørgensen 1955) and DNA replication (Darley & Volcani 1969) in diatoms and that these two processes are being differentially affected in silicon limited T. oceanica and T. weissflogii. However, we did find a family of ATP-grasp domain proteins of unknown function that were upregulated in siliconlimited T. oceanica, T. weissflogii, and T. pseudonana (Smith et al. 2016;). The upregulation of these genes is unique to silicon limitation and their expression is not induced in nitrate or iron limitation conditions. It is also found across a diversity of diatoms in the MMETSP database , suggesting that they might be useful as a silicon limitation markers in many diatoms, in addition to Thalassioisra spp. diatoms.
Increased fossil fuel consumption since the industrial revolution has increased CO 2 levels in the atmosphere and in the ocean (Tans 2009;. The increased CO 2 in the ocean has perturbed carbonate chemistry and led to ocean acidification . Diatoms have experienced large fluctuations in CO 2 conditions on evolutionary time scales (Sims et al. 2006;Kump et al. 2009;Tans 2009;Medlin 2016), and it has been suggested that these historical fluctuations in CO 2 have led to a diversification and reconfiguration of the diatom CCM . Most studies of the diatom CCM have used the model diatoms T.
pseudonana and P. tricornutum, and they suggest that the CCM of these diatoms works very differently (Tachibana et al. 2011; (Feely et al. 2008;. These species displayed a range of growth rate changes across CO 2 conditions (<230 ppm CO 2 to >2900 ppm CO 2 ), from no change across conditions (T. pseudonana), slower growth in lower CO 2 (T.
weissflogii) , slower growth in higher CO 2 (T. oceanica), and faster growth in higher CO 2 (T. rotula)   . We show evidence of distinct enzymatic strategies for N and P uptake and scavenging in these two diatom genera. There is also evidence of distinct strategies in their CCMs -while Thalassiosira diatoms seem to use the α, δ, and ζ carbonic anhydrases and the substrate-specific SLC4 and non-specific SLC26 bicarbonate transporters, Skeletonema relies on the γ and ζ carbonic anhydrases and the SLC26 bicarbonate transporters or diffuse CO 2 uptake (Nakajima et al. 2013;. This suggests that the co-existence of Skeletonema spp. and were also upregulated in previously published T. pseudonana transcriptomes and homologs were found across a diversity of diatoms, suggesting that they might be useful as a silicon limitation marker in many diatoms, in addition to Thalassioisra spp. diatoms.

INTRODUCTION
Diatoms are photosynthetic single-celled eukaryotes that account for 40% of oceanic primary production and approximately 20% of global primary production (Nelson et al. 1995;Field et al. 1998). The rate and extent of diatoms' primary production is a function of their growth, which is constrained by the availability of nutrients in the ocean. Many phytoplankton share growth requirements for nutrients such as nitrate, phosphate, and iron, but diatoms are unique among the phytoplankton as they have an absolute growth requirement for silicon, which is used to build their ornately silicified cell walls or frustules (Lewin 1957;Lewin 1962;Jørgensen 1955 & Bruland 1998;De La Rocha et al. 2000). It has also been suggested that silicon processing may require iron as a cofactor, which would lead to a biochemicallydependent iron and silicon co-limitation (Mock et al. 2008;Saito et al. 2008). Silicon and trace metal co-limitation of diatoms has been documented in the HNLC waters of the Equatorial Pacific Ocean , the Southern Ocean and sub-Antarctic Leblanc et al. 2005;Franck et al. 2000), and the sub-Arctic Pacific (Koike et al. 2001). As these studies assessed the silicon status of the bulk diatom community, it is currently not known if different diatom species experience varying degrees of silicon stress.
Results from lab experiments show that silicon metabolism and physiology vary widely across diatom species. There is a wide range in diatoms' silicon quotas, which vary from less than 0.1 to greater than 200 pmol silicon per cell (Martin-Jézéquel et al. 2000). Diatoms can also vary greatly in their maximal silicon uptake rate, even amongst diatoms in the same genus. For example, the maximal uptake rate of Thalassiosira weissflogii (CCMP1336) is approximately 10 fold higher than for in silicon limitation, suggesting that distinct spermidine synthases may be optimized for LCPA synthesis in either low or high silicon concentrations or that LCPAs also play a larger role in diatom cells aside from silicon deposition Shrestha et al. 2012;Mock et al. 2008).
Other general trends regarding diatom silicon metabolism emerge from genetic studies in T. pseudonana and P. tricornutum. Firstly, these studies note that many of the silicon-sensitive transcripts are unique to diatoms or the particular diatom species tested (Mock et al. 2008;Sapriel et al. 2009;Shrestha et al. 2012). Genes involved in intracellular signaling or post-transcriptional regulation (e.g. kinases) are both up (Smith et al. 2016;Mock et al. 2008;Shrestha et al. 2012) and downregulated (Mock et al. 2008;Sapriel et al. 2009;Shrestha et al. 2012) in silicon limitation.
To date, the genetic response of diatoms to silicon limitation has not been followed in diatoms isolated from regions of the ocean where silicon sets an upper limit on diatom production, such as iron-limited, high-nutrient, low chlorophyll  (Hildebrand et al. 1997;Hildebrand et al. 1998;Durkin et al. 2016).
Although their lengths and overall structures vary across species, the biosynthesis of LCPAs involves putrescine, spermine, and spermidine precursors, and the genes for the synthesis of these precursors can be identified by homology searches of protein similarity (Kröger et al. 2000;. As global gene-based studies of silicon metabolism across diatom species have been restricted to a few species, the database of silicon-responsive genes across species is limited, particularly from diatoms found in silicon limited environments. In order to better understand the responses of diatoms found in ecosystems where their growth could be constrained by silica availability and to develop genetic markers that could be used in the environment to profile silicon sufficiency, we selected two related diatom species in the Thalassiosira genus that may be found in areas of the ocean where silicon may limit diatom growth and primary production. Thalassiosira oceanica CCMP1005 has a high tolerance for iron limitation and may persist in chronically iron limited HNLC regions where silicon requirements are high (Sunda & Huntsman 1995;Franck et al. 2000;Chappell et al. 2015). Thalassiosira weissflogii CCMP1010 was isolated from the North Atlantic Ocean, an ocean basin where silicondepleted upwelling has been observed Johnson et al. 2007). We selected this genus as it allows us to make comparisons to studies in T. pseudonana which has a completely sequenced genome  Homology-independent methods (FIMO, BioStrings, SignalP) were used to identify putative silaffins, cingulins, silacidins, and frustulins from Thalassiosira oceanica CCMP1005 and Thalassiosira weissflogii CCMP1010 and their potential utility as molecular markers of silicon stress is discussed (Grant et al. 2011;Pagès et al. 2016;Petersen et al. 2011).

Culturing, RNA extraction, and Sequencing
Cultures of Thalassiosira oceanica CCMP1005 and Thalassiosira weissflogii CCMP1010 were obtained from the National Center for Marine Algae and Microbiota (NCMA; West Boothbay Harbor, ME, USA). Culturing and sequencing was completed in a range of conditions in order to get a thorough representation of each diatoms' transcript pool for the de novo assemblies. For nutrient replete, silicon limited, iron limited, and nitrate limited culturing, T. weissflogii and T. oceanica were grown in triplicate in a modified F/2 media  35 containing 400 nM iron or 4 nM iron for the iron limited condition. A final concentration of 10 µM silicon was used for the silicon limited condition and no nitrate was added to the media for the nitrate limited condition. Macronutrients and vitamin solutions were treated with prepared Chelex-100 resin (Bio-Rad Laboratories Inc., Hercules, CA, USA) and syringe-filtration sterilized before being added to microwave sterilized Sargasso Sea water (Price et al. 1989). For inclusion in the de novo assembly, additional sequencing was performed using RNA from diatoms grown in F/2 medium at four CO 2 levels ranging from ~100 ppm to ~4000 ppm using autoclaved Woods Hole sea water amended with nutrients, vitamins, and trace metal mix that had not been Chelex-treated . For the replete and silicon limited diatom culturing, biomass was collected after 3 days of growth, when the cells in the silicon limited condition began to grow more slowly relative to the silicon replete condition. Experiments were also conducted using the diatom Thalassiosira rotula (CCMP3096) to obtain a draft transcriptome for inclusion in Silix homology networks and comparison to T. oceanica and T.

Trimming, Assembly, and Read Mapping
Initial read quality assessments were performed in FastQC ).
Raw reads were subsequently imported into CLC Workbench (CLC Bio-Qiagen, Aarhus, Denmark) with an insert size ranging from 1-400 nucleotides. Reads were trimmed in CLC Workbench with the following parameters: removal of 13 bases off the 5' end, .001 quality cut off, and no ambiguous nucleotide base calls.
Assemblies of each library were conducted in CLC Workbench, using a set of k-mers ranging from 1/2 read length to full length read after trimming. The assemblies were completed with no scaffolding, automatic bubble size selection, minimum contig length of 200, and with the auto detect paired distance options selected. To remove redundant contigs that were assembled more than once (e.g., present in multiple libraries or assemblies), the resulting assemblies were pooled using CD-HIT-EST using the following flags: c .99 -G .99 -n 10 -d 1000 -aS .75 -g 1 -p 1 -T 0 (Li & Godzik 2006;Fu et al. 2012).
ORF prediction was performed on the nucleotide sequences as follows: nucleotide sequences were run in BlastX against a custom database containing the predicted proteins (filtered set of best models per locus) from available diatom genomes from JGI (Thalassiosira pseudonana Thaps3 and Thaps3_bd, Fragiliariopsis cylindrus v1.0, Phaeodactylum tricornutum Phatr2 and Phatr2_bd, and Pseudonitzschia multiseries v1) and transcriptomes available through MMETSP as of March 2013. The assemblies and BlastX output were uploaded to ORFpredictor to determine coding (CDS) and protein sequences (Min et al. 2005).

38
Additional clustering of protein sequences was performed to remove redundant amino acid sequences that were assembled multiple times but did not cluster at the nucleotide sequence level. Proteins were clustered in CD-HIT using the following flags: -c .75 -G 0 -n 5 -d 1000 -aS .75 -g 1 -p 1 -T 0. CDS sequences corresponding to the clustered protein sequences that were greater than 150 amino acids were used for subsequent read mapping. Trimming and assembly for T. rotula were completed as described for T. oceanica and T. weissflogii, although the iron limited and nutrient replete reads were included as guide-reads only, as there was contaminating ribosomal RNA sequence in these libraries. Sequences with less than 1 RPKM (T. oceanica) or 2 RPKM (T. weissflogii and T. rotula) across all conditions (i.e., consistently very lowly expressed) were removed from the assembly as potential contaminants (e.g. organelle transcripts).

Annotation and Gene Ontologies
The translated protein sequences from each diatom transcriptome assembly were uploaded to the KEGG automatic annotation server ). The BLAST search program was used to query the "Genes" representative data set, plus the two model diatoms available in the KEGG database (T. pseudonana and P. tricornutum). An additional annotation step was performed in InterProScan (version 5.7-48.0 for T. oceanica and 5.13-52.0 for T. weissflogii) (Jones et al. 2014).

Ortholog Finding with Silix
Silix networks were used to infer orthologous transcripts across diatom species (Miele et al. 2011). This is a blast-based homology network construction approach that groups similar sequences into gene families. Two sequences are included in the same family if the high-scoring segment pair length covers at least 80% of the query protein length and if their similarity is over 35% identity. The input for this analysis was the T. rotula draft transcriptome assembly, the T. pseudonana filtered protein models from JGI (Thaps3 and Thaps3_bd), and the assembled T. oceanica and T. weissflogii transcriptomes.

Differential Expression Analysis and Hierarchical Clustering
Differential gene expression was inferred with Analysis of Sequence Counts (ASC), using the total number of reads mapped to each transcript. Transcripts with at least a 95% posterior probability of a 2-fold change were considered differentially expressed ).
To identify genes that respond specifically to silicon status rather than as part of a general stress response, hierarchical clustering was performed on the genes differentially expressed in silicon limitation. The gene expression values (as RPKM) in replete, silicon limited, nitrate limited, and iron limited conditions were the input to this analysis. Expression values were normalized by adding an offset value of 1 to all RPKM values and the log2 fold change for silicon limited, nitrate limited, and iron limited conditions was calculated relative to the nutrient replete condition.

40
The complete Euclidian distance was calculated using the dist function and hierarchical clustering was performed using hclust, both part of the stats library included in R (R Development Core Team 2013). The resulting dendograms were cut at a height of 6 (T. oceanica upregulated) or 4 (T. oceanica downregulated, T. weissflogii up and downregulated). The heights used were selected based on the ability to distinguish clusters of genes regulated by silicon from those that may be part of a general stress response (i.e., clusters of genes upregulated or downregulated in all nutrient limitation scenarios examined).
Genes were grouped according to their cluster assignment and the per-cluster median and interquartile ranges of expression values were tabulated and plotted.
Clusters of genes were considered part of a general stress response if the per-cluster interquartile range of expression values was above (or below, in the case of genes downregulated in silicon limitation) the log2 fold change 0 threshold in iron or nitrate limitation (i.e., genes upregulated in silicon limitation that are also upregulated in iron and/or nitrate limitation were considered part of a general stress response; e.g., T.
oceanica up in silicon limitation cluster #7) (Supplemental Figure 1, Supplemental Table 3). Gene clusters with non-overlapping interquartile ranges were also considered part of a silicon-specific expression response (i.e., clusters of genes that are responsive to iron and/or nitrate are still considered part of a silicon-specific response as long as their fold-change values in silicon limitation are considerably larger than the fold-change in iron and/or nitrate limitation; e.g., T. weissflogii up in silicon limitation cluster #3) (Supplemental Figure 1, Supplemental Table 3). Clusters are also considered part of a silicon-specific expression response even if their fold change is large in iron or nitrate limitation, as long as the direction of regulation in these conditions is in the opposite direction than what is seen in silicon limitation (e.g., T. oceanica up in silicon limitation cluster #6) (Supplemental Figure 1, Supplemental Table 3).

Silicon Transporter Phylogeny
Representative diatom silicon transporter (

Silaffins, cingulins, silacidins, frustulins, pleuralins, and polyamines
Homologs of other silicon processing proteins are bioinformatically difficult to predict in new species. This is because the silaffins ( KEGG annotations were searched for genes involved in the synthesis of polyamines, including those involved in polyamine metabolism as well as arginine, ornithine, and glutamate metabolism that funnel into polyamine metabolism.

Transcriptome Assembly and Annotation
In total, 36382 transcripts were assembled from T. oceanica and 24900 transcripts from T. weissflogii (Table 1). When compared to the previously sequenced T. oceanica, the number of T. oceanica transcripts assembled is less than the total number of predicted genes (37921) but more than the number of genes (29306) in the non-redundant predicted set of genes (Lommer et al. 2012

Differential Expression Analysis and Hierarchical Clustering
Differential gene expression between the silicon limited and silicon replete condition was inferred with Analysis of Sequence Counts ). Genes with a 95% posterior probability of at least a 2-fold change in silicon limitation relative to the nutrient replete condition were considered differentially expressed ).
To distinguish a silicon-specific stress response from a more general nutrient stress response, the subset of transcripts differentially expressed in silicon limitation were hierarchically clustered. The clustering input was the per-transcript log2-fold change expression values in nitrate limited, iron limited, and silicon limited conditions relative to the nutrient replete condition. Differentially expressed genes in T. oceanica were binned into 18 clusters -7 upregulated (containing 947 genes) and 11 downregulated (containing 809 genes) (Supplemental Figure 1, Supplemental Table 3). Of these T. oceanica clusters, 4 (containing 823 genes) were part of a silicon-specific upregulation and 4 (containing 539 genes) were part of a siliconspecific downregulation ( Figure 1). The differentially expressed genes in T. weissflogii were binned into 47 clusters -23 upregulated (containing 2814 genes) and 24 downregulated (containing 1467 genes) (Supplemental Figure 1, Supplemental Table   3). Of these T. weissflogii clusters, 6 (containing 557 genes) were part of a silicon- genes is a feature of diatoms with higher iron requirements rather than diatoms that are tolerant of low iron. 46

Sequence similarity to known SITs from Thalassiosira pseudonana and
Fragilariopsis cylindrus was used identify SIT sequences from T. oceanica, T.
weissflogii, and T. rotula. All of the SITs from these four Thalassiosira diatoms (T. number than the 2 obtained in a previous study (Alverson 2007) that relied on amplification methods using degenerate primers to amplify SIT gene fragments . We also recovered only a single SIT from T. rotula CCMP3096. It is possible that this diatom has additional SIT copies not captured in this assembly, as there was no silicon limited condition included in the T. rotula transcriptome to potentially induce their expression ( Figure 2).
Sequence alignment and maximum likelihood tree construction indicates that three of the SITs in T. weissflogii appear to be paralogous sequences derived from a 47 single ancestral SIT copy that forms a clade with T. pseudonana SIT1 and SIT2 ( Figure 2). Two of these sequences group with previously identified T. weissflogii SIT fragments (Alverson 2007), as illustrated by their grouping together on the tree with bootstrap support values of 100 ( Figure 2). The third T. weissflogii SIT recovered from the transcriptome also clusters with one of the previously identified SIT fragments (Alverson 2007), although with a lower support value of 89 ( Figure 2). The fourth T.
weissflogii SIT forms a clade with T. pseudonana SIT3 ( Figure 2). This is in contrast with previous studies indicating that T. weissflogii CCMP1010 didn't have a SIT3-like gene copy (Alverson 2007). Perhaps the degenerate primers used in the Alverson 2007 study could not distinguish between or amplify some of the SITs from T. weissflogii.
One of the T. weissflogii SITs (T. weissflogii SIT3, K32_12335) is one of the most highly silicon responsive genes in the dataset and is ~15 fold upregulated (log2- . The lack of a strong silicon limitation response in the T. oceanica SIT might be due to the fact that it lacks a SIT3-like gene copy that has been hypothesized to act as a silicon sensor in T. pseudonana (Thamatrakoln & Hildebrand 2007). It is also possible that the single T. oceanica SIT both senses silicon status as well as functioning as a transporter.
In addition to SIT mediated silicon transport, it has been hypothesized that T. pseudonana also contains a silicon efflux transporter. This is because silicon- pseudonana also contains a single gene from each T. oceanica and T. weissflogii (Supplemental Table 4), but they are not differentially expressed in silicon limitation (Supplemental Table 2), suggesting that this gene is not sensitive to silicon status in T.  et al. 1996). We queried the T. oceanica and T. weissflogii transcriptomes for candidate genes involved with silicon processing using a variety of homologyindependent methods.
The search for silaffins and cingulins involved a FIMO search (Grant et al. 2011) for the pentalysine cluster KXXKXXKXX-KXXK (Poulsen et al. 2013).
Proteins lacking an N-terminal ER signal peptide in SignalP were filtered out (Petersen et al. 2011;Poulsen et al. 2013), as these proteins would not be localized to the ER and silicon deposition vesicle. This approach yielded 138 proteins in T.
None of the copies in T. oceanica were differentially expressed in silicon limitation, nor do they have an N terminal ER signal peptide (Figure 3, Supplemental Table 8). In   (Richthammer et al. 2011). This suggests that the transcription of silacidin genes is not a key point of regulation for silicon processing in T. pseudonana or T. oceanica but may be post-transcriptionally or post-translationally regulated.
To identify putative frustulins in T. oceanica and T. weissflogii, a FIMO search (Grant et al. 2011) for the hexapeptide motifs CEGDCD and VPGCSG that are diagnostic of the acidic cysteine rich domains (ACR) of frustulins (Kröger et al. 1994;Kröger et al. 1996)    .

Genes involved in long-chain polyamine synthesis are not differentially regulated in T. oceanica or T. weissflogii
Long-chain polyamines (LCPAs) are additional molecules that play a role in silicon deposition in diatoms (Kröger et al. 2000). Proline, glutamate, and arginine fuel polyamine biosynthesis via their conversion to ornithine, which is converted into putrescine, spermine, and spermidine -the polyamine precursor molecules ). The enzymes spermine and spermidine synthase catalyze the conversion of putrescine to spermine and spermidine using S-adenosyl-L-methionine and S-adenosyl-metioninamine as a substrate, which is derived from methionine metabolism .
Genes involved in funneling proline, arginine, glutamate, ornithine, and methionine toward polyamine synthesis were of interest as potential markers of silicon stress in diatoms and their presence in the transcriptomes was determined by their KEGG annotations. However, we found little evidence of silicon-specific changes in 53 polyamine metabolism, with the exception of T. oceanica upregulating a proline iminopeptidase (K01259) that cleaves proline residues from peptides and T.
weissflogii upregulating an arginase (K01476), which converts arginine to ornithine ( Figure 4). This is in contrast with previous studies in T. pseudonana, where spermidine and spermine synthases were up and downregulated during silicon limitation (Shrestha et al. 2012;Mock et al. 2008). This suggests that while the synthesis of LCPA precursors may be a key point of regulation in silicon processing for T. pseudonana, that may not be the case in T. oceanica or T. weissflogii.

Identifying novel markers of silicon stress
Molecular markers of silicon stress can include novel genes or genes lacking an annotated function in a well described metabolic pathway. To determine whether or not any of the unannotated genes may be good markers of silicon stress in diatoms, we used a network-based approach to identify homologous genes that were upregulated specifically in response to silicon in both T. oceanica and T. weissflogii. Silix networks were computed using the T. rotula draft transcriptome assembly, the T.
pseudonana filtered protein models from JGI (Thaps3 and Thaps3_bd), and the assembled T. oceanica and T. weissflogii transcriptomes. (Miele et al. 2011;). Homologues were identified if they were in the same Silix gene family (e.g., were 35% identical over at least 80% query coverage per HSP). As an example, the silicon transporters cluster together in a single Silix family (FAM000927, Supplemental Table 4).

54
The Silix gene network consists of 104,848 genes grouped into 77,386 gene families (Supplemental Table 4, Supplemental Table 5). In the Silix networks, l32,857 genes (grouped into 9076 gene families) are present in 2 or more diatom species.
However, the majority of genes (71,991) are grouped into 68,310 species-specific gene families that contain representatives from 1 diatom species only (Supplemental Table 4, Supplemental Table 5). This pattern (many genes residing in species-specific gene families) also holds true for the silicon-specific differentially expressed genes. In T. oceanica, 621 of the up and 270 of downregulated genes are in in species-specific gene families (Supplemental Table 6). In T. weissflogii, 381 of the up and 613 of downregulated genes are also in species-specific gene families (Supplemental Table   6). To target potential markers of silicon stress, networks were subsequently filtered to include only Silix families with at least one silicon-specifically upregulated gene from both T. oceanica and T. weissflogii.
The search for silicon specific, upregulated homologous genes in T. oceanica and T. weissflogii revealed 4 gene families containing candidate silicon stress markers.
The most promising of these is a family of ATP-grasp domain proteins (FAM001061).
Silix gene families were also used to recover T. oceanica and T. weissflogii homologs to the T. pseudonana putative silicon efflux protein (Shrestha et al. 2012) and also highlighted the presence of an interesting gene expansion in T. oceanica.

ATP-grasp proteins are silicon sensitive and silicon specific
The most promising marker of silicon stress from this study are members of a gene family of ATP-grasp domain proteins, which contains 3 silicon-specific upregulated genes -2 from T. weissflogii and 1 from T. oceanica (EJK54127 from the genome Lommer et al. 2012). This Silix family also contains 2 genes that are upregulated in silicon limited T. pseudonana Smith et al. 2016).
Another study saw that T. pseudonana slightly upregulated 1 of these 2 genes ~7 hours after silicon resupply, but its expression values return to baseline levels at the 9 hour mark (Shrestha et al. 2012).
To determine whether or not these ATP-grasp proteins are found in other diatoms, the MMETSP database was queried for sequences similar (35% identity and 80% query coverage per HSP) to the 3 silicon-sensitive ATP grasp proteins in the T.

Identifying metabolic pathways with silicon sensitivity
In addition to the previous gene family based approach, an additional strategy for finding potential molecular markers of silicon stress is to look for common, silicon-sensitive pathways in both T. oceanica and T. weissflogii. To find siliconsensitive metabolic pathways in both T. oceanica and T. weissflogii, genes upregulated in silicon stress and not part of a general stress response were binned into broad KEGG modules or silicon metabolism gene categories ( Figure 5). There are several pathways that are upregulated in silicon limited T. oceanica and T. weissflogii, including "Cellular processes and signaling", "Membrane transport", and 57 "Xenobiotics biodegradation and metabolism" ( Figure 5). Although similar pathways are silicon-sensitive in both diatoms, the direction of expression (e.g. up or downregulated in silicon limitation) is often different in each diatom ( Figure 5).
Although there are not specific gene homologs in the "Cellular processes and signaling", "Membrane transport", and "Xenobiotics biodegradation and metabolism" categories (e.g., genes annotated with the same KEGG orthology assignment) that are differentially expressed in both diatoms, these categories contain several phosphatases, peptidases, proteases, and kinases (Supplemental Table 1, Supplemental Table 2).
Perhaps during silicon limitation, some degree of post-translational regulation of protein activity is occuring via degradation by peptidases/proteases or phosphorylation/de-phosphorylation by kinases/phosphatases. Genes involved in modifying proteins could be related to the silacidin and silaffin proteins being phosphorylated (Wenzl et al. 2008;Krog̈er et al. 2002). It could also be related to the fact that protein degradation is an important mediator of transitioning through distinct phases of the cell cycle (Glotzer et al. 1991), a process that is tightly associated with 58 silicon uptake and processing (Brzezinski et al. 1990;Vaulot et al. 1987;Thamatrakoln & Hildebrand 2007).
The contradictory trends in silicon-sensitive gene expression between T.
oceanica and T. weissflogii may be related to previous studies indicating that silicon processing is tied to DNA replication, cell division, and the cell cycle (Eppley et al. 1967;Darley & Volcani 1969;Vaulot et al. 1987;Brzezinski et al. 1990;Thamatrakoln & Hildebrand 2007;Hildebrand et al. 2007 (Eppley et al. 1967;Hildebrand et al. 2007). In addition to cell wall synthesis, diatoms also require silicon for DNA replication to proceed during the S phase of the cell cycle (Darley & Volcani 1969). These silicon requirements explain why silicon limited T.
weissflogii cells sometimes arrest in the G2/M phase or other times in the G1/S phase transition (Brzezinski et al. 1990;Vaulot et al. 1987).
As we were largely focused on identifying markers that would be applicable in asynchronous field populations, we focused on the onset of silicon limitation and made no attempt to synchronize cell cycle shifts in any of the cultures during these experiments, so we cannot speak definitively to specific aspects of the cell cycle that The different biological processes that happen during distinct phases of the cell cycle might explain the contradictory trends seen in silicon limited T. oceanica and T.
weissflogii. Silicon limited T. oceanica upregulates genes in "Replication and repair", suggesting that it may be awaiting sufficient silicon resupply for DNA synthesis to proceed ( Figure 5). Silicon limited T. weissflogii upregulates genes in "Translation", which may be indicative of the biosynthesis of proteins required for silicon processing (e.g., SITs) ( Figure 5).

Sel1/MYND zinc finger domain gene expansion in T. oceanica
One gene family (FAM006551) contained a highly responsive gene (~8 fold upregulation in silicon stress) and 7 other silicon-sensitive genes in T. oceanica (27088) (Supplemental Table 3, Supplemental rhizaria, and stramenopile lineages. In most species (including the diatoms), these genes are not highly expanded (e.g., less than 20 copies) (Supplemental Figure 5).
However, in addition to being highly expanded in T. oceanica, homologs to this gene family are also highly expanded in Cyclotella and Skeletonema diatoms (Supplemental In this context, the high copy number of MYND/Sel1-like genes in Skeletonema diatoms is particularly interesting, as these diatoms are typically found in coastal environments where iron is not thought to limit diatom growth (Kooistra et al.

CONCLUSIONS
This aim of this study was to find molecular markers of silicon stress in two ecologically relevant diatoms, Thalassiosira oceanica and Thalassiosira weissflogii.
Ideal silicon stress gene markers should be upregulated in silicon limitation and not other nutrient stressors and have homologs found across a range of diatoms. This study revealed a family of ATP-grasp domain proteins that were upregulated in both T. oceanica and T. weissflogii uniquely in the silicon limited condition. Previous studies in T. pseudonana also saw an upregulation of this gene family in silicon limitation (Smith et al. 2016;). Homologs of this gene were widespread through other diatoms, but their silicon sensitivity across these different diatoms has yet to be tested. Homology-independent methods were used to identify putative silaffins, cingulins, silacidins, and frustulin sequences (Figure 3, Supplemental Table 7,   Supplemental Table 8, Supplemental Table 9), but it is likely that most of these are not truly functioning in silicon processing because most of the genes identified are not silicon sensitive (Figure 3) and the number of genes recovered are much higher than what has been found for T. pseudonana (Supplemental Table 7, Supplemental Table 8,   Supplemental Table 9) (Poulsen & Kröger 2004;Scheffel et al. 2011;Wenzl et al. 2008;). The candidates we did identify as being silicon-sensitive with N terminal ER targeting sequences could be good candidates to test for function using diatom gene editing methods. These methods have been used to characterize the urease gene in T. pseudonana (Hopes et al. 2016).
Another approach to finding silicon stress markers was to look for similar metabolic pathways that are silicon sensitive in T. oceanica and T. weissflogii.
Although similar KEGG pathways are silicon-sensitive in the two diatoms, the direction of expression is sometimes in opposition in each diatom (e.g., upregulated in one silicon limited diatom and down in the other) ( Figure 5). This trend is most clear in the "Transcription", "Translation", and "Replication and repair" categories ( Figure   5). These different manifestations of silicon limitation may be related to the fact that silicon is required for both DNA synthesis and cell wall synthesis (Darley & Volcani 1969;Eppley et al. 1967;Hildebrand et al. 2007), so it is possible that silicon limitation is disproportionately influencing these two processes in the diatoms tested.
In this context, the argument for using the ATP-grasp domain proteins as molecular markers of silicon status is quite strong, as both manifestations of silicon limitation saw this gene upregulated.       Figure 5. Heat map of the silicon specific, differentially expressed genes from T. oceanica and T. weissflogii and their KEGG orthology or silicon metabolism gene assignment categories. Colors in the heat map indicate the relative proportion of silicon specific genes in each category. Some categories that would not apply to diatoms were excluded from this analysis (e.g., Cardiovascular diseases).   (Feely et al. 2008;. These species displayed a range of growth rate changes across CO 2 conditions (<230 ppm CO 2 to >2900 ppm CO 2 ), from no change across conditions (T. pseudonana), slower growth in lower CO 2 (T.

T . o c e a n i c a T . w e i s s f l o g i i
weissflogii) , slower growth in higher CO 2 (T. oceanica), and faster growth in higher CO 2 (T. rotula) . The accompanying transcriptome evidence for T. rotula, T. weissflogii, and T. oceanica indicate that these differences in CO 2 responses boil down to the ability to dynamically regulate the carbon concentrating mechanism

INTRODUCTION
Anthropogenic activity (fossil fuel emissions and land use changes) has led to an increase in atmospheric pCO 2 concentration from ~277 ppm in 1750 to over 400 ppm currently (Tans 2009;Scripps 2017). Atmospheric CO 2 equilibrates with the ocean, which sequesters ~20% of anthropogenic CO 2 through physical processes and biological activity and has been the only net sink for CO 2 in the past 200 years . Phytoplankton are organisms that live in the surface ocean and convert CO 2 into organic biomass via photosynthesis and serve as an important biological source of CO 2 uptake.
Phytoplankton cells that sink out of the lit ocean can export the CO 2 formerly in the atmosphere, to the deep ocean.
As global CO 2 levels rise, the increased dissolved CO 2 in the ocean disturbs carbonate chemistry and leads to a decrease in pH and buffering capacity, or ocean acidification ). Ocean acidification is thought to have negative consequences for some phytoplankton groups, like those that have calcareous skeletons comprised of calcium carbonate (e.g., Riebesell et al. 2000).
The impact of ocean acidification on other groups of non-calcified phytoplankton, like diatoms that have silicified skeletons, is less clear (e.g., . Diatoms are a phytoplankton group that perform roughly 40% of oceanic primary production via photosynthetic CO 2 fixation (Nelson et al. 1995). Diatoms have experienced fluctuations in pCO 2 on different temporal scales throughout their 85 evolutionary history. Molecular clock estimates suggest that diatoms arose around 240 million years ago (Sims et al. 2006;Medlin 2016). This coincides with the start of a glacial "hot-house" period, where atmospheric pCO 2 levels were as high as 6000 ppm (Kump et al. 2009). In more recent history, diatoms have also experienced low pCO 2 : pre-industrial (before 1850) atmospheric pCO 2 averaged ~270 ppm (Tans 2009).
Diatoms can also experience fluctuations in pCO 2 levels that vary on much shorter time scales. Coastal diatoms might experience dissolved pCO 2 concentrations greater than 850 ppm during seasonal coastal upwelling events (Feely et al. 2008 Climate models suggest that pCO 2 levels in the atmosphere may rise to ~1000 ppm by the year 2100 if no measures are taken to reduce emissions (Solomon et al. 2007). In light of the sensitivity of calcified phytoplankton to ocean acidification, diatoms may have an even larger role to play in the marine food web if no measures are taken to reduce CO 2 emissions. Within the diatoms as well, there will likely be species that are ecological "winners" in adapting to elevated pCO 2 . Physiological studies examining how diatoms respond to changing pCO 2 suggest that there is a wide range of physiological responses within the diatoms. Members of the same genera differ in their physiological response to PCO 2 concentration. This is illustrated by the differences in specific growth rates of members of the Thalassiosira genus grown in different CO 2 conditions. Some genera within this species seem to thrive in high CO 2 ; the specific growth rate of T. rotula increased by at least 13% in elevated (690 ppm) CO 2 . A similar trend is seen in T. weissflogii, which grows faster in high CO 2 and slower in low pCO 2 Shi et al. 2010). The data for T.
oceanica is less clear -in one study the specific growth rate is decreased by at least 19% in high pCO 2 (700 ppm) , while other studies indicate that the specific growth rate of this diatom is increased in high pCO 2 (Shi et al. 2010). Other Thalassiosira diatoms appear to be indifferent to CO 2 levels; the growth rate of T.
pseudonana does not change with pCO 2 (Crawfurd et al. 2011;. One aspect of phytoplankton biology that may influence which groups are winners and losers in a higher CO 2 future is their ability to regulate how much of their energy budgets they spend on inorganic carbon concentrating mechanisms (Giordano et al. 2005;Raven et al. 2014). During the dark reactions of photosynthesis, diatoms use the enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) to fix CO 2 onto ribulose bisphosphate, producing 3-phosphoglycerate and fueling the Calvin cycle. However, RuBisCO can also fix O 2 onto ribulose bisphosphate, producing 2-Pglycolate, fueling the less energetically efficient photorespiration pathway. Diatoms use carbon concentrating mechanisms (CCMs) to increase the supply of CO 2 proximal to RuBisCO, to energetically favor carbon fixation rather than photorespiration.
The use of carbon concentrating mechanisms and active transport of inorganic carbon requires a significant energetic investment by the cell   . When cAMP function is inhibited chemically (Harada et al. 2006) or by gene deletion (Ohno et al. 2012), the pCO 2 sensitivity of the β carbonic anhydrases is repressed in diatoms.
The majority of our molecular and genetic understanding regarding the metabolic adjustments diatoms may make in a high CO 2 future ocean is largely shaped by studies in two model diatoms, Thalassiosira pseudonana and Phaeodactylum tricornutum. As these diatoms show significant differences in the metabolism they use to respond to changes in pCO 2 , we wanted to probe this response in more closely related diatoms that have broad ecological distributions, albeit in different ocean basins. In this study, we probed the genetic response of Thalassiosiroid diatoms (T. rotula CCMP3096, T. oceanica CCMP1005, and T. weissflogii CCMP1010) that demonstrate distinct physiological responses to different pCO 2 concentrations using quantitative global transcriptome profiling. Our gene expression data is consistent with our previous physiological study  in that related diatom species utilize distinct complements of genes involved in their CO 2 -dependent growth response and carbon concentrating mechanisms.
Raw reads were subsequently imported into CLC Workbench (CLC Bio-Qiagen, Aarhus, Denmark) with an insert size ranging from 1-400 nucleotides. Reads were trimmed in CLC Workbench with the following parameters: removal of 13 bases off the 5' end, .001 quality cut off, and no ambiguous nucleotide base calls. Reads were mapped to de novo transcriptomes that were assembled as described elsewhere (Wallace et al., in prep). Reads were mapped using the RNAseq module in CLC Workbench with the following parameters: mismatch cost (2), insertion cost (3), deletion cost (3), length fraction (0.8), similarity fraction (0.8), paired distances (auto-detect), strand specific (both), maximum number of hits for a read (10). Annotations from the transcriptome sequences were refined by running the translated protein sequences on the KEGG KAAS automated server ). The BLAST search program and bi-directional best hit options were selected. The sequences were queried against the "genes" data set, with the addition of the diatoms Thalassiosira pseudonana (tps) and Phaeodactylum tricornutum (pti). An additional annotation step was conducted using InterProScan with the pathways and goterms options selected (Jones et al. 2014).

Differential gene expression was inferred with Analysis of Sequence Counts
(ASC) from the total numbers of reads mapped to each transcript ).
Transcript expression levels in the present-day ambient open ocean pCO 2 (320-390 ppm) were compared to expression levels in low/pre-industrial (less than 230 ppm pCO 2 ), future pCO 2 projections (690-1340 ppm pCO 2 ), and the maximum that diatoms have experienced in their evolutionary history (2900-5100 ppm pCO 2 ). Transcripts with at least a 95% posterior probability of a 2-fold change were considered differentially expressed.

Identification and comparison of orthologs
Protein sequences for putative carbonic anhydrases, bicarbonate transporters, and additional genes potentially involved in C4 metabolism were used for phylogenetic analysis. An initial BlastP search was conducted to find potential orthologous genes in the translated transcriptomes. Any sequence meeting our threshold against any of these queries (35% ID, 80% query coverage per HSP cutoff) 93 were considered an ortholog. We also included any sequence identified as a carbonic anhydrase or bicarbonate transporter by InterProScan (Jones et al. 2014).
Resulting sequences from the T. oceanica transcriptome were cross-referenced to the T. oceanica genome (Lommer et al. 2012 This data is summarized in Supplemental Table 3. For the carbonic anhydrases, the query sequences used were putative carbonic anhydrase sequences described in Thalassiosira pseudonana and Phaeodactylum tricornutum (Tachibana et al. 2011;. Also included in this analysis were the eta type carbonic anhydrases from Plasmodium falciparum and the epsilon type carbonic anhydrases from Halothiobacillus neapolitanus (Del Prete et al. 2014;So et al. 2004).
For the bicarbonate transporters, we used the SLC4 sequences from T. pseudonana and P. tricornutum (Nakajima et al. 2013). We queried the T. pseudonana JGI genome portal for "SLC26 family" keyword and used those sequences as the sequence queries and included them in the phylogeny .
For genes with a potential role in C4 metabolism, the T. pseudonana JGI genome portal was queried for genes annotated as having a key role in C4 metabolism (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014;).
All alignments were done in MAFFT (version 7) using the L-INS-i strategy with the BLOSUM62 scoring matrix, the "leave gappy regions" option selected, and the MAFFT-homologs option selected for the bicarbonate transporter proteins (Katoh & Standley 2013). FASTA alignments were converted into PHYLIP format and imported into RaxML (Felsenstein 1989;Stamatakis 2014). Maximum likelihood trees were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap replicates. The carbonic anhydrase tree was rooted using the gamma carbonic anhydrases because they are the most ancient carbonic anhydrase (Smith et al. 1999). The bicarbonate transporter tree was mid-point rooted in FigTree.

Predicted Protein Localization
SignalP (V4.1), TargetP (V1.1), and ASA-find (1.1.5) were used to determine the putative localization of differentially expressed genes and genes potentially involved in a carbon concentrating mechanism (Bendtsen et al. 2004;Emanuelsson et al. 2007;Gruber et al. 2015). TMHMM version 2.0 was used for prediction of transmembrane domains (Krogh et al. 2001). SignalP was run against the eukaryote organism group and the default cutoff values were used. TargetP was run with the plant model and "winner-takes-all" option selected.
The input for these analyses were the translated protein sequences from the transcriptome assemblies as found by ORF-Predictor and reported elsewhere (Min, Butler, Storms, & Tsang, 2005, Wallace et al., in prep).
The output of these predictions were interpreted as follows: only proteins predicted to be plastid targeted by ASA-find (low or high confidence) were considered imported across all four plastid membranes and into the stroma. If a protein was not plastid targeted but was SignalP positive, it was considered likely to be targeted across the chloroplast-ER (CER) membrane and retained in the CER compartment. If a SignalP negative protein was predicted by TargetP to be targeted to the mitochondria then that protein was considered targeted to the mitochondria (Kroth et al. 2008).
Proteins that were not predicted to have a signal cleavage site and were not predicted to be localized to the mitochondria were considered cytosolic. If the sequence was predicted to be SignalP negative, have a transmembrane domain in TMHMM, and secreted as per TargetP, the localization for this sequence is considered potentially extracellular.

RESULTS
The three Thalassiosira diatoms studied here show very different physiological responses to changes in CO 2 . Transcriptomes sequenced from the same incubations highlighted the different underlying transcriptional responses. T. oceanica, T. weissflogii, and T. rotula differentially express different numbers of genes across CO 2 conditions (relative to the present-day approximation of 320-390 ppm pCO 2 ) ( Table 1). T. oceanica shows a relatively muted response in global gene expression in <230 ppm and 690-1340 ppm pCO 2 conditions compared to the 2900-5100 ppm pCO 2 condition ( Table 1). The larger shift in expression in high pCO 2 coincides with a decreased specific growth rate in 690-1340 and 2900-5100 ppm pCO 2 that is not reflected in any changes in the molecular ratios of C:N:P across pCO 2 . T. weissflogii shows a more muted response in the 690-1340 ppm pCO 2 , differentially expressing only 156 genes (Table 1). This is in contrast to the <230 ppm and 2900-5100 ppm CO 2 conditions, where there is differential expression of 581 and 434 genes, respectively (Table 1). This corresponds to a decreased specific growth rate of T. weissflogii in <230 ppm pCO 2 that is not reflected in any changes in the molecular ratios of C:N:P, which do not change across the pCO 2 conditions tested . The specific growth rate of T. rotula increases in 690-1320 ppm pCO 2 relative to the specific growth rates in <230 and 320-390 ppm pCO 2 . There is also an increase in the C:P ratio in <230 ppm pCO 2 .
This corresponds to a stronger differential expression signal in the <230 ppm pCO 2 condition, where 988 genes are differentially expressed (Table 1). This is in contrast with the 690-1340 ppm pCO2 condition, where only 62 genes are differentially expressed (Table 1). There were no 2900-5100 ppm CO 2 cultures, as the high CO 2 and low cell density could not be maintained simultaneously .

Species-specific metabolic rearrangement, redox and ion homeostasis in low CO 2
T. rotula grown in low pCO 2 (<230 ppm) increases the expression of degradative genes (Supplemental Table 1), including fatty acid and branched chain amino acid degradation (Figure 1). There also is a low CO 2 downregulation of biosynthetic reactions, chromosomes, translation machinery, and cell cycle progression (Supplemental Table 1).
T. weissflogii grown in low pCO 2 decreases the expression of gluconeogenesis and pentose phosphate pathway genes (Figure 1). There also is a low CO 2 downregulation of genes involved in amino acid, fatty acid, and tRNA biosynthesis (Supplemental Table 1) and an increase the expression of fatty acid and branched-chain amino acid degradation genes (although not as many genes as T. rotula) (Supplemental Table 1).

The trends in T. rotula and T. weissflogii are in contrast to what is observed in
T. oceanica, where there is not a clear reorganization of metabolism with changing pCO 2 . T. oceanica does not increase the expression of genes involved in degradative processes in low CO 2 , but increased the expression of a single gene of unknown function (Supplemental Table 1). Similarly, T. oceanica does not increase the expression of genes involved in biosynthetic pathways in high CO 2 or decrease their expression in low CO 2 (Supplemental Table 1). However, in high CO 2 , T. oceanica upregulates genes involved in transporting protons and cations as well as redox homeostasis genes (Figure 1).

Copy number variation in bicarbonate transporters
The 3 Thalassiosira diatom species considered in this study have different numbers of bicarbonate transporters. T. pseudonana has 2 SLC4 transporters Nakajima et al. 2013), T. weissflogii is predicted to have 3, T.  Table 3).

Varied bicarbonate transporter localization predictions and CO 2 sensitivity
The 3 Thalassiosira diatom species in this study show distinct potential differences in localization of their bicarbonate transporters and expression patterns.
All of the putative SLC4 and SLC26 transporters recovered here have at least 1 transmembrane helix, suggesting that all are localized to a membrane (Supplemental  . In contrast, most of the SLC4 and SLC26 transporters in T. weissflogii are predicted to be in the cytosolic membrane, save for 1 plastid predicted SLC26 transporter (Supplemental Table 2). In <230 ppm pCO 2 , T. weissflogii downregulates a plastid predicted SLC26 but does not differentially express SLC4 transporters across CO 2 conditions (Figure 3, Supplemental Table 2). Interestingly, none of the SLC4 or SLC26 transporters from T.
weissflogii are predicted to be localized to the mitochondria (Supplemental Table 2).

Copy number variation in carbonic anhydrases
The copy numbers and isoform distributions of the putative carbonic anhydrases from the assembled Thalassiosira transcriptomes vary across species (Table 2). The T. weissflogii transcriptome has the largest number of putative carbonic anhydrases (19), while the T. oceanica and T. rotula transcriptomes have 14 and 13, respectively, which is on par with the 14 putative carbonic anhydrases found in the T.
pseudonana genome (Table 2) Tachibana et al. 2011;. T. weissflogii appears to be particularly enriched in its number of ζ carbonic anhydrases (4 copies), compared with the single copies found in T. pseudonana, T. oceanica, and T. rotula, while this class of carbonic anhydrase is absent entirely from P. tricornutum (Table 2) (Tachibana et al. 2011;). Three of the T. weissflogii ζ carbonic anhydrases are in the same clade as the other diatoms, while the fourth T. weissflogii ζ carbonic anhydrase is basal to all the other ζ carbonic anhydrases (Figure 4). The T. weissflogii transcriptome also has a higher number of α carbonic anhydrases -7 copies, compared to 2 in T. rotula, 3 in T. oceanica and T. pseudonana, and 5 in P. tricornutum (Table   2) (Tachibana et al. 2011;). The Thalassiosira α carbonic anhydrases are in clade separate from those in P. tricornutum (Figure 4). The T.
weissflogii β carbonic anhydrase copy groups closely with the P. tricornutum copies (support value of 98) and the T. rotula copy also clusters with this group although with poor support values (support value of 23) (Figure 4). There is no evidence of θ, η, or ε 100 carbonic anhydrases in our assembled transcriptomes (Figure 4) Del Prete et al. 2014;So et al. 2004).
The number of carbonic anhydrases recovered from the T. oceanica transcriptome are generally supported by the genomic evidence, with the exception of the δ carbonic anhydrases (Supplemental Table 3) (Lommer et al. 2012). Each of the 3 α carbonic can be aligned to the genome, although 1 copy did not have a gene prediction overlapping at its mapping location (Supplemental Table 3). The 5 γ carbonic anhydrases and the single ζ carbonic anhydrase also map to distinct genomic locations and are supported by the gene model predictions (Supplemental Table 3).
This congruence starts to break down when comparing the genome and transcriptome δ carbonic anhydrases --2 of the transcripts are potentially distinct alleles of the same gene, as they map to the same genetic location and their sequences differ only by 3 nucleotides (Supplemental Table 3). Additionally, 2 of the transcripts both map equally well to the same 3 genomic locations, although in each of these scenarios the transcript sequence only covers part of the predicted gene model (Supplemental Table   3).

Varied carbonic anhydrase localization predictions
Each diatom also has a unique pattern of predicted localization for their particular suite of carbonic anhydrases ( Figure 5, Supplemental Table 2). T. oceanica does not have any carbonic anhydrases predicted to be targeted to the plastid or the chloroplast endoplasmic reticulum but has 5 mitochondrion predicted carbonic anhydrases (two δ and three γ) and the remainder are predicted to be cytoplasmic ( Figure 5, Supplemental Table 2). T. rotula has a single plastid predicted α carbonic anhydrase, 1 mitochondrion predicted γ carbonic anhydrase, and the remainder are predicted to be cytoplasmic ( Figure 5, Supplemental Table 2). T. weissflogii has 2 chloroplast endoplasmic reticulum predicted α carbonic anhydrases, 3 mitochondrion predicted carbonic anhydrases (2γ and 1α), 1 extracellular predicted δ carbonic anhydrase, and the remainder are cytoplasmic ( Figure 5, Supplemental Table 2).
However, there also is an upregulation of the other 3 δ carbonic anhydrases in 2900-5100 ppm pCO 2 (Figure 3, Supplemental Table 2). None of the putative carbonic anhydrases in T. rotula are differentially expressed in any CO 2 condition tested ( Figure 3, Supplemental Table 2).

Putative C4 photosynthesis genes
Each of the three diatom transcriptomes contain PEPC, PEPCK, ME, and PYC --the key carboxylating and decarboxylating enzymes hypothesized to be critical for diatom C4 metabolism ( Figure 6, Supplemental Table 2) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). The localization and pCO 2 sensitivity of these enzymes varies across species, but there is little compelling evidence that C4 enzymes are playing an important role in the CCM of these Thalassiosiroid diatoms (Supplemental Table 2). The putative C4 metabolism genes recovered from the T. oceanica transcriptome are supported by the genomic sequences (Lommer et al. 2012) (Supplemental Table 3).
If PEPC mediated carboxylation is important for a functional CCM, PEPC should be localized outside the plastid and be upregulated in low pCO 2 or downregulated in high CO 2 ( Figure 6). Consistent with the suggestion that PEPC plays a role in the diatom CCM, T. oceanica downregulates a putative cytosolic PEPC in 2900-5100 ppm pCO 2 (Supplemental Table 2). Similarly, in <230 ppm pCO 2 T.
However, T. weissflogii also upregulates a PEPC that is predicted to be plastid localized in <230 ppm pCO 2 (Supplemental Table 2). Although there is a T. rotula PEPC that is predicted to be localized to the mitochondria which consistent with a potential role in a C4 CCM, it is not differentially expressed in any CO 2 condition tested here (Supplemental Table 2).
If PEPCK mediated decarboxylation is important for a functional CCM, PEPCK should be plastid localized and be upregulated in low pCO 2 or downregulated in high CO 2 ( Figure 6). The T. oceanica PEPCK is mitochondrion-predicted and is downregulated in 690-1340 ppm pCO 2 but then upregulated in the 2900-5100 ppm pCO 2 condition, suggesting it is not involved in decarboxylation steps of the CCM (Supplemental Table 2). The T. weissflogii PEPCK is upregulated in 2900-5100 ppm pCO 2 but it is also mitochondrion predicted, suggesting it is also not involved in decarboxylation steps of the CCM (Supplemental Table 2). The T. rotula PEPCK is upregulated in <230 ppm pCO 2 , but it is predicted to be cytosolic, suggesting it is also not involved in decarboxylation steps of the CCM (Supplemental Table 2).
If ME mediated decarboxylation is important for a functional CCM, ME should be plastid localized and be upregulated in low pCO 2 or downregulated in high CO 2 . T. oceanica downregulates a putative cytosolic ME in 2900-5100 ppm pCO 2 , suggesting it is not involved in the CCM because of its predicted localization (Supplemental Table 2). T. weissflogii downregulates a putative mitochondrial ME in <230 ppm pCO 2 , suggesting it is also not involved in the CCM (Supplemental Table   2). T. rotula does not differentially express ME and ME is not predicted to be plastid localized in this diatom (Supplemental Table 2), suggesting that ME decarboxylation is also not important for a functional CCM in this diatom.
If PYC mediated decarboxylation is important for a functional CCM, PYC should be plastid localized and be upregulated in low pCO 2 or downregulated in high CO 2 . T. weissflogii downregulates a PYC in <230 ppm pCO 2 and downregulates another in 2900-5100 ppm pCO 2 and both copies are predicted to be cytosolic (Supplemental Table 2). T. oceanica does not differentially express PYC and PYC is not plastid localized (Supplemental Table 2), suggesting PYC is not a key aspect of the CCM of this diatom. Although the PYC in T. rotula is not differentially expressed, it is plastid localized, so its potential role in the CCM is ambiguous (Supplemental Table 2).

cAMP secondary messengers and CO 2 /HCO 3 sensing
Like T. pseudonana, T. oceanica upregulates adenylyl cyclase (IPR001054) in high CO 2 but also downregulates 6 additional adenylyl cyclases (Supplemental Table   1) . Querying the NCBI NR database with the upregulated adenylyl cyclase indicates that the top blast hit (after the T. oceanica genome accession EJK45951.1) is the soluble adenylyl cyclase that has been suggested to play a role in CO 2 sensing in T. pseudonana (XP_002288198.1) .
Unlike T. oceanica and T. pseudonana, T. rotula and T. weissflogii do not differentially express adenylyl cyclases in response to pCO 2 (Supplemental Table 1).
In contrast to the T. pseudonana study, but consistent with maintaining high intracellular cAMP levels in high CO 2 , T. oceanica downregulates a phosphodiesterase in 2900-5100 ppm pCO 2 (Supplemental Table 1) . T. oceanica also downregulates 3 bZIP domain transcription factors (IPR004827) in 2900-5100 ppm pCO 2 , consistent with the suggestion that cAMPresponsive transcription factors downregulate clusters of genes involved in the CCM (including their own expression) in high CO 2 (Supplemental Table 1) . There is also a single cAMP-dependent kinase downregulated by T. oceanica in 2900-5100 ppm pCO 2 , but there are 3 additional cAMP-dependent kinases that are upregulated as well (Supplemental Table 1). T. weissflogii downregulates a phosphodiesterase in low pCO 2 (<230 ppm pCO 2 ), which is consistent with the  study that saw an upregulation of phosphodiesterases in T.
pseudonana in elevated CO 2 (Supplemental Table 1)  (Supplemental Table 1). In contrast to T. pseudonana (where cAMP-responsive transcription factors downregulate clusters of genes involved in the CCM, including their own expression in high CO 2 ), T. rotula and T. weissflogii downregulate bZIP domain transcription factor and upregulate cAMP-dependent protein kinases in low CO 2 (<230 ppm pCO 2 ) (Supplemental Table 1). This suggests that bZIP domain transcription factors are not increasing the expression of CCM gene clusters (including their own expression) in T. rotula and T. weissflogii in low CO 2 . This is supported by the increase of cAMP-dependent kinases in T. rotula and T.
weissflogii in low CO 2 (Supplemental Table 1), as it was suggested that cAMPresponsive elements would upregulate the CCM in low CO 2 , including their own expression .

Key point of regulation for T. oceanica carbon concentrating mechanism: SLC4mediated HCO 3 uptake into the cytoplasm
The role of carbonic anhydrases in the CCM of T. oceanica is unclear, as these enzymes are up and downregulated the 2900-5100 ppm pCO 2 condition, which approximates the highest levels diatoms have experienced over the course of their evolution (Figure 3, Supplemental Table 2) (Sims et al. 2006;Medlin 2016;Kump et al. 2009). T. pseudonana also upregulates carbonic anhydrases in high CO 2 , but the high CO 2 condition used in that study (~800 ppm) is considerably lower than the high CO 2 conditions where T. oceanica differentially expresses bicarbonate transporters or carbonic anhydrases (2900-5100 ppm) and T. oceanica doesn't differentially express bicarbonate transporters or carbonic anhydrases in the 690-1340 ppm CO 2 condition that would be the closest approximation to the Hennon 2015 study (Figure 3, Supplemental Table 2) .
The incongruence between the δ carbonic anhydrases recovered from the genome (Lommer et al. 2012) and transcriptome of T. oceanica may be explained by the observation that the δ carbonic anhydrases have undergone relatively recent, lineage-specific gene expansions ). These recent gene duplication events may lead to species-specific gene expansions that may not have had time to diverge and would encode very similar transcripts that would likely be merged in de novo assemblies.
The high CO 2 downregulation of 2 cytosolic membrane SLC4 transporters suggests that exogenous bicarbonate uptake is the key point of regulation for T.
oceanica's CCM (Figure 3, Supplemental Table 2). This is supported by the higher number of SLC4 transporters found in T. oceanica (5) relative to the 3 found in T.
weissflogii (this study) and T. pseudonana Nakajima et al. 2013) and the 4 found in T. rotula (this study).

T. oceanica decreased growth rate in high CO 2 due to ionic and redox stress
We hypothesize that T. oceanica's decreased growth rate in high CO 2 is related to oxidative stress and perturbed intracellular ionic balance (Figure 1, Supplemental Table 1) .
Ionic stress in T. oceanica grown in high CO 2 may be explained by the downregulation of SLC4 transporters in high CO 2 (Figure 3, Supplemental Table 2).
The SLC4 transporters in diatoms require Na + ions and are thought to be either Na + cotranspoters or Na + driven Cl -/ HCO 3 exchangers (Nakajima et al. 2013). Their activity requires the movement of Na + back out of the cell and/or Clback into the cell to maintain ionic gradients across cellular membranes (Nakajima et al. 2013). The downregulation of SLC4 transporters in high CO 2 may perturb the intracellular balance of Na + and/or Clin T. oceanica.
The redox stress observed in T. oceanica may be related to the carbonic anhydrases, as the carbonic anhydrases in P. tricornutum are partly redox regulated via the activity of thioredoxin . T. oceanica's regulation of redox enzymes in high pCO 2 may indicate that this diatom uses similar redox regulation of carbonic anhydrase activity (Figure 1, Supplemental Table 1).

Key point of regulation for T. weissflogii carbon concentrating mechanism:
cytosolic-predicted β, δ, and ζ carbonic anhydrases preventing CO 2 leakage In terms of the carbon concentrating mechanism, there is a tight regulation of cytosolic-predicted β, δ, and ζ carbonic anhydrases across CO 2 conditions (Figure 3, Figure 5, Supplemental Table 2). This is consistent with studies in T. pseudonana and P. tricornutum that suggest the β, δ, and ζ classes of carbonic anhydrases are upregulated in low pCO 2 and/or downregulated in high pCO 2 (Tachibana et al. 2011;Clement et al. 2017;Crawfurd et al. 2011;McGinn & Morel 2008).
None of the T. weissflogii SLC4 or SLC26 bicarbonate transporters are downregulated in 690-1340 or 2900-5100 or upregulated in <230 ppm pCO 2 ( Figure   3, Supplemental Table 2). This is consistent with proteomics studies in T. pseudonana that did not observe downregulation of SLC4 transporters in high pCO 2 conditions (Clement et al. 2017), but is in contrast with a transcriptome study where T.
There is one SLC26 transporter predicted to be localized to the plastid that is downregulated in <230 ppm pCO 2 in T. weissflogii (Figure 3, Supplemental Table 2), but it is possible that this transporter is involved in transporting other anions (e.g.,

sulfate) (Alper & Sharma 2013).
Consistent with a potential role in a C4-based CCM, T. weissflogii decreases the expression of a cytosolic PEPC gene in high CO 2 and increases its expression in low CO 2 (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). However, there also is upregulation of a plastid predicted PEPC in low CO 2 , which would potentially compete with RuBisCO for CO 2 and is in opposition to what is expected of a carbon concentrating mechanism (Supplemental Table 2, Figure   6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). PEPCK or ME mediateddecarboxylation is unlikely in T. weissflogii because both genes are predicted to be mitochondrial rather than plastid localized (Supplemental Table 2, Figure 6, (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014)). Further, they are upregulated in 2900-5100 ppm pCO 2 and downregulated in <230 ppm pCO 2 , which is counter to what would be expected of the CCM (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). Although there is a T. weissflogii PYC gene that is downregulated in 2900-5100 ppm pCO 2 , its predicted localization to the cytosol suggest that it is not playing a role as a C4 decarboxylase, which would require localization to the plastid (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014).
Their predicted localization to the cytoplasm (Figure 3, Figure 5, Supplemental Table 2) suggests that similar to T. oceanica, the key point of regulation for the T.
weissflogii carbon concentrating mechanism is the flux of inorganic through the cytoplasm. While this is mediated by SLC4 HCO 3 transporters in T. oceanica, T.
weissflogii seems to focus more on diffusive CO 2 uptake and using carbonic anhydrases to prevent CO 2 leakage out of the cell.
The presence of both β and ζ carbonic anhydrases in T. weissflogii is interesting, as neither of the two model diatoms examined thus far (T. pseudonana and P. tricornutum) have both the β and ζ carbonic anhydrases -T. pseudonana has a ζ but not a β, and the reverse is true for P. tricornutum (Tachibana et al. 2011;Samukawa et 110 al. 2014) (Table 2). It has also been noted that the β and ζ carbonic anhydrases are found in relatively few diatom transcriptomes compared to the more commonly found α, δ, and γ carbonic anhydrases, highlighting the rarity of the scenario where a diatom simultaneously has both of these carbonic anhydrases ). However, it is possible that β and ζ carbonic anhydrase are more widely distributed than it would seem from available transcriptome data if their transcripts are too lowly expressed to be recovered by assemblies that do not include a low pCO 2 condition which may be required to induce their expression.

T. weissflogii energetic trade-offs across CO 2 conditions
We hypothesize that the energetic cost of upregulating the carbonic anhydrases may be the cause of the decreased specific growth rate in <230 ppm pCO 2 (Figure 3, Supplemental Table 2) (Giordano et al. 2005;Raven et al. 2014;. T. weissflogii grown in low pCO 2 also decreases the expression of the pentose phosphate pathway (Figure 1, Supplemental Table 1), fatty acid synthesis (Supplemental Table   1), and glycolysis/gluconeogenesis (Figure 1, Supplemental Table 1) suggesting that in higher pCO 2 it may divert carbon and reducing power toward the synthesis of storage compounds like fatty acids and/or glucose/chrysolaminarin instead of an increased specific growth rate, as seen in T. rotula grown in high pCO 2 .
Key point of regulation for T. rotula carbon concentrating mechanism: SLC4mediated HCO 3 uptake into the plastid T. rotula does not differentially express any carbonic anhydrases in any CO 2 condition tested here (Figure 3, Supplemental Table 2). However, some of the δ, γ, and ζ carbonic anhydrases are highly expressed and perhaps are not differentially expressed because they are constitutively expressed across CO 2 conditions, as has been observed for a plastid localized α carbonic anhydrases of T. pseudonana ( Figure   3, Supplemental Table 2) .
T. rotula does upregulate a plastid-predicted SLC4 transporter in <230 ppm pCO 2 , but it also downregulates a cytosolic SLC4 transporter in the same condition ( Figure 3, Supplemental Table 2). This is in contrast with T. pseudonana, which downregulates a cytoplasmic SLC4 gene in high CO 2 and seems to rely on a bestrophin putative ion channel to transport HCO 3 into the plastid .
There is little evidence of a C4-like CCM in T. rotula, which does not differentially express PEPC across CO 2 conditions. Its localization to the mitochondria is also not consistent with a carboxylase role in a C4-based CCM unless it is functioning as a means to recover respired CO 2 (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). The PEPCK, ME, and PYC enzymes that could act as decarboxylases are either not CO 2 sensitive or not localized to the plastid (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014). There is a plastid localized PYC, and while lack of CO 2 sensitivity makes its role in a C4-based CCM ambiguous, its high expression suggests it may be constitutively expressed (Supplemental Table 2, Figure 6) (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014).
These data suggest that T. rotula may rely more on SLC4 uptake into the plastid for their CCM rather than uptake into the cytosol or the activity of carbonic 112 anhydrases, while the potential role of a C4-like CCM in this diatom remains ambiguous.

T. rotula relies on degradation pathways to sustain growth in low CO 2
The upregulation of genes involved in degrading amino and fatty acids in low pCO 2 suggests that T. rotula is relying on these catabolic reactions to fuel growth when inorganic carbon is less readily available (Figure 1, Supplemental Table 1). The slower growth rate in 320-390 ppm pCO 2 and <230 ppm pCO 2 might indicate that T.
rotula is becoming carbon limited in these conditions, but the increased C:P ratio in <230 ppm pCO 2 suggests that T. rotula is probably not carbon limited but is potentially phosphate limited in low pCO 2 . T. rotula does not upregulate alkaline phosphatases or phosphate transporters in low pCO 2 , genes that are indicative of phosphate limitation in T. pseudonana ), but there is an upregulation of a serine/threonine protein phosphatase protein that cleaves phosphate groups from serine and threonine residues (Supplemental Table 1). T. rotula doesn't increase the expression of genes indicative of increased carbon storage in high CO 2 (Supplemental Table 1) consistent with the suggestion that it funnels excess carbon toward an increased growth rate rather than synthesis of storage compounds as suggested for T. weissflogii .

T. rotula
Evidence suggests diatoms sense external CO 2 levels via the activity of cAMP secondary messengers (Ohno et al. 2012;. This seems to be true of T. oceanica, but there is little evidence of cAMP mediated CO 2 sensing in T. rotula and T. weissflogii. One possible complicating factor is the cAMP-mediated cross-talk between CO 2 and light sensing (P. tricornutum diatoms grown in low light are less able to upregulate β carbonic anhydrases in response to low pCO 2 (Tanaka et al. 2016)). The experiments described in this study were conducted in growth-saturating light conditions, so a low-light response is not anticipated here . However, the role of cAMP secondary messengers in diatom biology is not completely understood and there may be additional cross-talk of cAMP (or other signaling pathways) that obscure further indication of CO 2 mediated cAMP signaling in T.
rotula and T. weissflogii. An additional possibility is that T. rotula and T. weissflogii sense changes in external pCO 2 through some other mechanism that does not utilize adenylyl cylases or cAMP as CO 2 sensors.

CONCLUSIONS
The results of this study suggest that the gene complement, subcellular localization, and CO 2 sensitivity of the CCM varies not only across diatom genera (e.g., Thalassiosira versus Phaeodactylum) (Tachibana et al. 2011;), but can also very across species within the same diatom genera. This is consistent with the hypothesis that historic fluctuations in atmospheric CO 2 have led to a diversification and re-configuation of the CCM within the diatoms (Young et al. 2016;. This study suggests that this diversification has occurred even in diatoms within the same genus. Additional studies of diatom genomes or low CO 2 transcriptomes will also be important for understanding whether or not the presence of both β and ζ carbonic anhydrases in T. weissflogii (Table 2, Figure 5, Supplemental Table 2) is as anomalous as it would seem based on current available diatom genomic and transcriptomic evidence.
Different species' strategies for balancing carbon storage versus carbon spending (e.g., increased growth rate) may influence community composition and a species' ability to cope with additional stressors in a high CO 2 future. For example, a larger internal pool of fatty acids and lipids may help T. weissflogii to survive periods of phosphorus limitation in high CO 2 by allowing access to an intracellular pool of phospholipids . Additional mesocosm experiments applying multiple stressors (e.g., phosphate limitation and elevated CO 2 ) to these naturally occurring phytoplankton assemblages may give us a better understanding of how contrasting, species-specific strategies impact competitive dynamics and diatom community composition.
Many of the conclusions about the diatom CCM presented here rely on computational predictions of sub-cellular protein localization. The computational predictions of sub-cellular localization of diatom proteins do not always agree with observed localizations , so the predicted localizations presented here ( Figure 5, Supplemental Table 2) should be experimentally validated in future studies. Additionally, the protein products of putative CCM genes requires biochemical characterization to confirm their enzymatic activity.
The hypothesis that T. oceanica uses cAMP mediated CO 2 sensing (but T. rotula and T. weissflogii don't) could be tested by tracking the expression of the CCM genes (carbonic anhydrases and bicarbonate transporters) across a range of CO 2 115 conditions with and without the addition of the phosphodiesterase inhibitor 3-isobutyl-1-methylxanthine . These studies could be particularly useful if they are combined with measurements of carbon assimilation rates via photosynthetron P versus E cruves (phototosynthesis-irradiance response curves) (Lewis & Smith 1983).   tricornutum (JGI IDs or NCBI accession numbers), the transcriptomes assembled in this study (indicated by asterisks), and MAFFT homologs included for reference. All alignments were done in MAFFT (version 7) using the L-INS-i strategy with the BLOSUM62 scoring matrix, the "leave gappy regions" option selected, and the MAFFT-homologs option selected (Katoh & Standley 2013). RAxML maximum likelihood trees were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap replicates. The tree was mid-point rooted in FigTree. Support values less than 70 are not shown.

Intergovernmental Panel on Climate
126 Figure 3. Expression values of the carbonic anhydrases and bicarbonate transporters from the transcriptomes in this study. Each point represents a single transcript. The x-axis is the log 10 transformed RPKM values in the 320-390 ppm pCO 2 and the y-axis is the log 10 transformed RPKM values in the <230 ppm condition, the 690-1340 ppm pCO 2 condition, and the 2900-5100 ppm pCO 2 condition at the bottom of the figure. The black diagonal line indicates the same expression value in the two conditions compared. Circled points appearing above this line are upregulated in the condition indicated on the y-axis and circled points below it are downregulated in the condition indicated on the y-axis (>= 0.95 posterior probability of a 2-fold change in ASC) . The color of the points indicates the specific isozyme type and the shape of the point indicates from which organism the transcript originates. Figure 3. Expression values of the carbonic anhydrases and bicarbonate transporters from the transcriptomes in this study. Each point represents a single transcript. The x-axis is the log 10 transformed RPKM values in the 320-390 ppm pCO 2 and the y-axis is the log 10 transformed RPKM values in the <230 ppm condition, the 690-1340 ppm pCO 2 condition, and the 2900-5100 ppm pCO 2 condition at the bottom of the figure. The black diagonal line indicates the same expression value in the two conditions compared. Circled points appearing above this line are upregulated in the condition indicated on the yaxis and circled points below it are downregulated in the condition indicated on the y-axis (>= 0.95 posterior probability of a 2-fold change in ASC) . The color of the points indicates the specific isozyme type and the shape of the point indicates from which organism the transcript originates. Figure 4. A maximum likelihood phylogeny of the carbonic anhydrases from the model diatoms T. pseudonana (JGI Thaps3 gene IDs) and P. tricornutum (JGI IDs or NCBI accession numbers), the three Thalassiosira transcriptomes assembled in this study (indicated by asterisks), and two additional carbonic anhydrases from Halothiobacillus neapolitanus and Plasmodium falciparum for reference. All alignments were done in MAFFT (version 7) using the L-INS-i strategy with the BLOSUM62 scoring matrix, the "leave gappy regions" option selected (Katoh & Standley 2013). RAxML maximum likelihood trees were computed with the following parameters: PROTGAMMA matrix, BLOSUM62 substitution model, and 1000 bootstrap replicates. The carbonic anhydrase tree was rooted using the gamma carbonic anhydrases because they are the most ancient carbonic anhydrase (Smith et al. 1999). Support values less than 70 are not shown.  (Bendtsen et al. 2004;Emanuelsson et al. 2007;Gruber et al. 2015). SignalP was run against the eukaryote organism group and the default cutoff values were used. TargetP was run with the plant model and "winnertakes-all" option selected. PPS = periplastid space, Cyt = cytoplasm, mit = mitochondria, CER = chloroplast endoplasmic reticulum, PPC = periplastidal compartment, CEV = chloroplast envelope, Str = chloroplast stroma, Pyr = pyrenoid.  Figure 6. A schematic summarizing the enzymes thought to be involved in the diatom CCM, including variations of C4 metabolism (Reinfelder et al. 2000;Sage 2004;Kustka et al. 2014 have not been completely experimentally verified (Tachibana et al. 2011;).
T. weissflogii T. oceanica T. rotula  . Rising anthropogenic CO 2 emissions have led to an increase in atmospheric pCO 2 concentration from ~277 ppm in the year 1750 to over 400 ppm presently (Tans 2009;. Increased levels of dissolved CO 2 gas in seawater perturbs carbonate chemistry in the ocean, which results in a decreased buffering capacity, or ocean acidification (OA) .
Diatoms are a phytoplankton group responsible for ~40% of oceanic primary production via photosynthetic fixation of CO 2 (Nelson et al. 1995). The predominant dissolved form of inorganic carbon in the ocean is bicarbonate (HCO 3 -) (Zeebe & Wolf-Gladrow 2001). Diatom CO 2 fixation is aided by carbon-concentrating mechanisms (CCMs), which use bicarbonate transporters (BCTs) to take up HCO 3 and carbonic anhydrases (CAs) to interconvert HCO 3 and CO 2 adjacent to the carbon fixation enzyme RuBisCO . Carbonic anhydrases are a protein family that arose from convergent evolution  that has resulted in distinct protein isoforms. In the diatoms, these include the widespread α, δ, and γ types to rare β and ζ carbonic anhydrases, and to the newly described and relatively underexplored θ carbonic anhydrases  In natural phytoplankton communities, diatoms will generally outcompete other phytoplankton in high CO 2 (Tortell et al. 2002;Tortell et al. 2008;Bach et al. 2017), however the diatom growth response can vary across diatom genera. CO 2 manipulation experiments of natural diatom assemblages indicate that in high CO 2 , Thalassiosira spp. grow faster than Skeletonema spp. (Sett et al. 2018) while Skeletonema spp. grow faster than Nitzschia spp. (Kim et al. 2006). Species within the Thalassiosira diatom genus also display a range of growth rate changes across CO 2 conditions (<230 ppm CO 2 to >2900 ppm CO 2 ), from no change across conditions (T. pseudonana), slower growth in lower CO 2 (T. weissflogii) , slower growth in higher CO 2 (T. oceanica), and faster growth in higher CO 2 (T. rotula) .
Global gene expression profiling data (transcriptomes) from the King et al.
2015 CO 2 response experiments  indicate that differences in CO 2 responses between T. rotula, T. weissflogii, and T. oceanica boil down to their ability to dynamically regulate their carbon concentrating mechanisms while maintaining internal redox and ion homeostasis, as well as potential trade-offs between funneling excess CO 2 into internal storage pools versus a higher growth rate (Wallace et al. in prep). This range of plasticity in the diatoms' CCM is particularly important, as the use of this mechanism is energetically costly . Diatoms that do downregulate the CCM in high CO 2 conserve this energy, potentially conferring a competitive advantage over those that do not. However, diatoms that tend to store excess carbon in high CO 2 (rather than exhibit higher growth rates) would have larger internal storage pools of lipids and carbohydrates, potentially allowing those populations to persist when nutrients are scarce. These differences in CCM regulation and storage tendencies could influence diatom community composition and dynamics, but the full impact of these contrasting abilities is not captured by single-species isolate experiments that do not accurately reflect the complex and competitive environment that diatoms inhabit.
Community-level global gene expression profiling (metatranscriptome) studies are a powerful approach that capture the molecular basis for different diatoms' contrasting growth and nutrient acquisition strategies in mixed assemblages.
Metatranscriptomes sequenced from nutrient-amended waters on the coast of Narragansett Bay, Rhode Island suggest that Skeletonema and Thalassiosira diatoms co-exist because the two diatom genera may access different internal and external pools of nitrogen and phosphorus (Alexander et al. 2015), while Thalassiosira and Pseudo-nitzschia diatoms co-exist in the iron-limited Pacific Ocean by using enzymes with different metal cofactor requirements (Cohen et al. 2017). Similar approaches combining metatranscriptomes and manipulation of CO 2 levels in natural phytoplankton assemblages might give a more complete picture of how changing CO 2 levels impact different diatoms' metabolic capacities in their complex environment.
This study used mesocosm CO 2 manipulation incubations to examine the diatom community response to changes in pCO 2 . Surface seawater from Vineyard Thalassiosira diatoms seem to use the α, δ, and ζ carbonic anhydrases and the SLC4 and SLC26 bicarbonate transporters, Skeletonema uses the γ and ζ carbonic anhydrases and the SLC26 bicarbonate transporters.

Seawater collection and Incubation
On March 4th 2014, surface seawater was collected in Vineyard Sound, Massachusetts using acid-cleaned 50 liter carboys. After Nitex mesh pre-filtering to remove particles larger than 180 µm, water was distributed into 9 acid-cleaned, 10 L carboys. Target pCO 2 levels were chosen to approximate low/pre-industrial (less than 215 ppm pCO 2 ), present-day ambient pCO 2 in the open ocean (330-390 ppm pCO 2 ), and future pCO 2 projections (780-1320 ppm pCO 2 ) (Tans 2009;. Target CO 2 concentrations were obtained by sparging with CO 2 (<100 ml min -1 ). and between by gentle bubbling with an air:CO 2 mixture (<30 ml min -1 ). The macronutrients nitrate, phosphate and silicic acid were added to mimic nutrient levels in the mixed layer, aiming for final concentrations of 10 µM nitrate, 20 µM silicate, and 0.2 µM phosphate. Incubations were diluted with nutrient-amended filtered seawater pre-equilibrated to target CO 2 , on the 8 th , 11 th , 14 th , and 17 th day of incubation. On day 11, phosphate concentrations were amended to 1 µM to avoid a phosphate limitation.
All incubations were kept in an outdoor tank incubator, with temperature maintained (between -1 to 3 °C) by a flow-through water system from near-shore water. All incubations were conducted in triplicate for 20 days time to examine differences in CO 2 response while minimizing complicating factors that might arise during a longer incubation (e.g., growth on the walls of the carboys, etc.). After 20 days, biomass was collected on 3 µm pore-size polyester filters (Sterlitech Corporation, Kent, WA, USA) by gentle filtration using a peristaltic pump, flash frozen in liquid nitrogen, and stored at -80 °C until DNA and RNA could be extracted.
Particulate organic C, N, and P samples were vacuum-filtered onto precombusted 25 mm GF/C glass fiber filters (Whatman, GE HealthCare Biosciences).
Briefly, POC and PON samples were dried in an oven at 60°C, acidified with HCl fumes, and analyzed using gas chromatography with a CHNSO analyzer (Costech) (Parsons et al. 1984). POP samples were rinsed with 0.17 M Na 2 SO 4 , dried at 90°C in 0.017 M MgSO 4 , and measured using the colorimetric molybdate method (Solorzano & Sharp 1980).

DNA Extraction
DNA extractions were performed using the DNeasy Plant Kit (Qiagen, Germany) following the manufactuer's protocols with the following modifications: Filters were bead-beat in lysis buffer with 0.5 mm zircon beads and RNaseA Adapter sequences were trimmed from raw reads using CutAdapt version 1.15 (Martin 2011) with the '--minimum-length 1' flag set to omit reads with a length of 0 and the '-discard-untrimmed' flag set to discard reads that do not have the adapter sequence and a maximum 10% error rate allowed. DADA2 version 1.8.0 (Callahan et al. 2016) was used to de-noise reads (filterAndTrim default settings with the following changes: truncLen of 220 to discard reads shorter than 220 bases and maxEE of 2 to discard reads with more than 2 expected errors) and merge read pairs (mergePairs using default settings). DADA2 was also used to remove chimeras (removeBimeraDenovo using default settings) and amplicons were filtered to remove sequences less than 390 bp or greater than 410 bp. Taxonomy was assigned using the PR2 database version 4.10.0 (Guillou et al. 2013) and rarefaction curves were made in vegan version 2.5-2 (Oksanen et al. 2018) (Supplemental Figure 1).

RNA Sequencing
Equal amounts of total RNA from each triplicate were pooled and sent to the  Keeling et al. 2014). Using this custom database, an initial read mapping step was performed using blastX in DIAMOND version 0.8.37 (Buchfink et al. 2015). The mapping parameters used in DIAMOND were as follows: --top 0 -more-sensitiveevalue .00001. This allows only the top hits with an evalue cut-off of e-5 to be reported as mapped. If there are multiple top hits for a read (e.g., a read matches to multiple protein sequences equally well), the proteins that recruit this read are all given equal weight. There is no apportioning of read counts; a read that maps to only 1 protein is given equal weight relative to a read that maps to 10 proteins. Forward and reverse reads were mapped separately because DIAMOND does not support pairedend read mapping. Proteins with at least 1 read mapped were binned at the phylum, genus, and species level.
After filtering the database to include only those organisms most highly represented in the DIAMOND search (Skeletonema and Thalassiosira diatoms), a subsequent read mapping step was performed using Burrows-Wheeler Aligner (BWA-MEM) (Li & Durbin 2010). The mapping parameters used in BWA-MEM were as follows: -T 60 -aM -k 10, as has been used in other metatranscriptome studies (Alexander et al. 2015). The alignments were filtered in SAMtools (Li et al. 2009) so that only reads with the highest possible quality score (MAPQ60) were counted as mapped to each transcript.

Normalization
To account for changes in species abundance in each library, the resulting raw read counts were normalized as per-million reads mapped to each diatom species in the library using Python pandas (pandas.pydata.org). Transcripts with at least 2 normalized reads mapped in at least 1 library were included in further analysis.

Differential Transcript Expression
Any transcript that passed the expression threshold (at least 2 normalized reads mapped in at least 1 library) were ran through Analysis of Sequence Counts (ASC)  to assess differential expression in the low and high CO 2 conditions relative to the present-day CO 2 condition. The raw read counts were the input to ASC, which is an empirical Bayes approach that estimates the prior distribution based on the input data and is suitable for use in situations where biological replicates were pooled before sequencing , as was done here. The counts for each diatom species were ran individually through ASC ). If the relative abundance for a given species changed across conditions, this would be reflected by a change in library size sequenced from those conditions. These differences in library size are taken into account when ASC builds a model of the transcript counts ).
Transcripts with a greater than 0.95 posterior probability of a 2-fold change were considered differentially expressed.

Metatranscriptome rarefaction
Metatranscriptome rarefaction curves were constructed to determine sufficiency of sequencing depth. First, libraries were re-sampled (10% intervals from 10% to 100% of each library) using seqtk (Li 2018). The reads were mapped back (using BWA-MEM and MAPQ60 filtering) to the subset of genes from Skeletonema and Thalassiosira with at least 2 reads mapped in at least 1 library. The rarefaction curves shown plot the number of transcripts that had at least 2 reads mapped per resampled library (Supplemental Figure 2).

Annotation and Differential KO Expression
The associated protein sequences from the transcripts that passed the expression threshold (at least 2 normalized reads mapped in at least 1 library) were annotated using the KEGG Automatic Annotation Server (KAAS) with the following settings: the bi-directional best hit assignment method using the BLAST search program to query the GENES database plus the model diatoms T. pseudonana and P.
tricorntutum ). To assess differential expression of specific KEGG KOs across CO 2 conditions, genes were grouped first by the genus they originated from (Skeletonema or Thalassiosira) and then clustered by their KO assignment. The total number of normalized reads mapped per KO and per genus were ran through Analysis of Sequence Counts (ASC) . In addition to its use for differential expression of genes, ASC is appropriate for any counts-based measurement, as was done here to determine differential KO expression. Any KO with a greater than 0.95 posterior probability of a 2-fold change was considered differentially expressed.

Identification of carbon concentrating mechanism components
Protein sequences from carbonic anhydrases and bicarbonate transporters previously identified in Thalassiosira or Phaeodactylum diatoms (Tachibana et al. 2011;Kikutani et al. 2016, Wallace et al., in prep) were queried against a BLAST database of protein sequences corresponding to transcripts that passed the expression threshold (at least 2 normalized reads mapped in at least 1 library). Sequences that passed the similarity threshold (35% identity, 80% query coverage per HSP) were considered putative carbonic anhydrases or bicarbonate transporters.

Chlorophyll, nutrients, and carbonate chemistry
In these experiments, filtered seawater (including only the <180 µm size fraction) from Massachusetts Bay was incubated at three different CO 2 levels, approximating a low/pre-industrial (less than 215 ppm pCO 2 ), present-day ambient pCO 2 in the open ocean (330-390 ppm pCO 2 ), and future pCO 2 projections (780-1320 ppm pCO 2 ) (Tans 2009;. Temperature, chlorophyll, dissolved macronutrients, and pCO 2 were tracked over the course of the experiment, and the initial and final time points also included measurements of BSi, POC, PON, and POP.
Metatranscriptome and 18S V4 rDNA amplicons were sequenced from the final time point.
The target CO 2 levels were steadily maintained over the duration of the incubation ( Figure 1A). At the final time point, the chlorophyll in the high CO 2 bottle was comparatively lower than the present-day open ocean pCO 2 (330-390 ppm pCO 2 ) condition (1-way ANOVA; p<0.05) ( Figure 1B). This corresponds to significant NO 3 -+NO 2 -( Figure 1C) and PO 4 ( Figure 1D) left unused in the high CO 2 incubations relative to the present-day open ocean pCO 2 incubations (1-way ANOVA; p<0.05).
There was also significantly more Si(OH) 4 left in the high CO 2 incubations ( Figure   1E). This is consistent with the measurements of lower POC ( Figure 1F), PON ( Figure   1G), and POP ( Figure 1H), in the high CO 2 condition relative to the present-day open ocean pCO 2 (1-way ANOVA; p<0.05), although there was not a significant difference in the BSi measurements across CO 2 conditions ( Figure 1I) (1-way ANOVA; p<0.05).
For each of these measurements (chlorophyll, macronutrients, PON/C/P), there was no statistical difference between the low CO 2 condition and the present-day open ocean condition.
The differences observed in the high CO 2 condition relative to the present-day open ocean CO 2 condition do not track over the course of the experiment -previous to the final time point, the drawdown of NO 3 -+NO 2 - (Figure 1C), PO 4 ( Figure 1D), and Si(OH) 4 ( Figure 1E) were similar across CO 2 conditions. POC ( Figure 1F An additional possibility is that there has been some differential grazing pressure on the phytoplankton in the high CO 2 incubations, as the pre-filtering would not have removed all microzooplankton grazers. This could be significant, as micrograzers consume 67% of phytoplankton daily growth (Calbet & Landry 2004).
While this could explain the loss of chlorophyll, the surplus of dissolved nutrients ( Figure 1B, 1C, 1D, 1E) suggests that there is some other mechanism constraining phytoplankton biomass accumulation in the high CO 2 final time point. Previous mesocosm studies have also found that microzooplankton grazing rates were not influenced by pCO 2 level (Suffrian et al. 2008). One possibility is that phytoplankton viruses have increased in concentration or activity in the final time point high CO 2 condition. This could be important, as a 20% increase in viral load led to a decrease in bulk carbon fixation rates by nearly 50% (Suttle 1992). However, previous studies have reported no difference in viral infection across CO 2 conditions (Tsiola et al. 2017). An additional possibility is that high CO 2 has led to a disturbance in intracellular ion and redox homeostasis and a decreased growth rate, which has been 149 observed in the diatom T. oceanica (Wallace et al. in prep), and so seems to be the most likely explanation for the decreased chlorophyll and nutrient uptake observed in the high CO2 condition at the final time point. Rarefaction analysis from the 18S V4 rDNA amplicon sequencing are asymptotic (Supplemental Figure 1). This suggests that we have saturated the diatom V4 amplicon diversity at the species level in the PR2 database. Rarefaction of mapped reads from the metatranscriptome sequencing are approaching an asymptote (Supplemental Figure 2). This suggests that we are approaching saturating the diversity of transcripts detected (with at least 2 reads mapped) from Skeletonema spp.

Metatranscriptome reads are dominated by
and Thalassiosira spp. in the MMETSP database.  .

Diatom genera show different patterns of expression of KOs for C, N, and P acquisition and metabolism across CO 2 conditions
Based on their expression profiles, Thalassiosira spp. and Skeletonema spp.
seem to have distinct strategies for generating glutamate --Thalassiosira spp.
regenerates glutamate from glutamine (glnA and gltD) and Skeletonema spp. seems to preferentially regenerate glutamate from proline (1P5C), all of which are higher in low CO 2 relative to their expression level in present day CO 2 (Figure 3, Supplemental Table 1). The increased expression of these KOs (glnA, gltD, IP5C) in low CO 2 could be indicative of shuttling more amino acids toward glutamate, which can fuel the citrate cycle via its conversion to α-ketoglutarate (Figure 3, Supplemental Table 1).
Skeletonema spp. and Thalassiosira spp. differentially express KOs mediating their nitrogen acquisition strategies across CO 2 conditions (Figure 4, Supplemental  Figure 1C). There is no compensation in the expression of ammonium transporters (AMT) across CO 2 conditions (Figure 4, Supplemental upregulates an amino acid transporter (APA) in high CO 2 , which is not expressed in Skeletonema spp. (Figure 5, Supplemental Table 1). This is consistent with a previous metatranscriptome study that found Thalassiosira spp. highly expressed amino acid transporters, but Skeletonema spp. did not (Alexander et al. 2015). However, both diatom genera show a broad upregulation of KOs involved in the degradation of amino acids leucine, valine, and isoleucine in high CO 2 (Figure 5, Supplemental Table 1).
This suggests that these amino acids might be a source of recycled nitrogen for Skeletonema and Thalassiosira spp. The high CO 2 decrease in expression of nitrate transporters and increase in expression of organic nitrogen metabolism genes is consistent with previous studies in Thalassiosira pseudonana that saw a decrease in nitrate uptake in elevated CO 2 (Shi et al. 2015;Hennon et al. 2014).
These two diatom genera show contrasting complements of phosphate metabolism KOs. While Skeletonema spp. constitutively expresses two phosphatase genes (ACP5 and phoD) involved in scavenging phosphate from internal pools and an SLC20 sodium-dependent phosphate transporter under low CO 2 , they are not expressed in Thalassiosira spp. (Figure 6, Supplemental Table 1). Both diatoms constitutively express an SLC34 sodium-dependent phosphate co-transporter, but it is lowly expressed in Thalassiosira spp. (Figure 6, Supplemental Table 1). This is consistent with a previous metatranscriptome study that found genes involved in phosphate uptake and scavenging were more highly expressed in Skeletonema than Thalassiosira (Alexander et al. 2015).

Expression patterns in ion and redox homeostasis KOs indicate diatoms experience increased oxidative stress in high CO 2
The final incubation time points in high CO 2 saw lower chlorophyll, lower POC, PON, and POP, as well as less nutrient drawdown (1-way ANOVA; p<0.05).
( Figure 1). One possible explanation is that high CO 2 has led to a disturbance in diatoms' intracellular ion and redox homeostasis, as has been observed in the diatom  Table 1), a key mediator of oxidative stress (Noctor & Foyer 1998). Thalassiosira spp. shows a low CO 2 downregulation and a high CO 2 upregulation of glutamate-cysteine ligase (GCLC) and glutathione Stransferase (GST), while Skeletonema spp. shows a high CO 2 upregulation of glutathione synthase (GSS) and glutathione reductase (GSR) (Figure 7). The upregulation of oxidative stress genes is consistent with a previous study that saw an upregulation of these genes in Thalassiosira oceanica grown in high CO 2 (Wallace et al., in prep). In addition, the high CO 2 upregulation of glutathione reductase and glutathione synthase in Skeletonema spp. is consistent with previous studies that report 154 an upregulation of these genes in CO 2 -enriched Skeletonema marinoi cultures (Lauritano et al. 2015). Diatoms may be regulating glutathione metabolism as part of a mechanism to maintain redox homeostasis, as the carbonic anhydrases in P.
tricornutum are redox regulated and a similar redox regulation could be happening in Skeletonema spp. and Thalassiosira spp. .
Diatoms have distinct complements of carbon concentrating mechanism components that are regulated in response to CO 2 In addition to differential KO expression across genera, we also looked at differential expression on the per-gene level. Skeletonema spp. and Thalassiosira spp.
express a similar suite of carbonic anhydrases (α, δ, γ, and ζ) (Figure 8, Supplemental Table 4) and bicarbonate transporters (SLC4 and SLC26) (Figure 9, Supplemental Table 4). If a carbonic anhydrase or bicarbonate transporter were involved in the CCM, it is expected that their expression would be upregulated in low CO 2 and/or downregulated in high CO 2 . In the case of Thalassiosira, there are α, δ, and ζ carbonic anhydrases that are up in low CO 2 or down in high CO 2 (Figure 8, Supplemental Table   4). This is in contrast with what is observed for Skeletonema, in which there are γ and ζ carbonic anhydrases that are up in low CO 2 or down in high CO 2 (Figure 8, Supplemental Table 4). This suggests that while the α and δ carbonic anhydrases are important for the CCM of Thalassiosira, the γ carbonic anhydrases are important in Skeletonema. It also suggests that the ζ carbonic anhydrases are key players in the CCM of both Skeletonema and Thalassiosira. The disparity in the CO 2 -sensitivity of the carbonic anhydrases may be partially explained by the fact that α and δ carbonic anhydrases have undergone many lineage-specific expansions or gene losses and that Skeletonema and Thalassiosira diatoms contain completely distinct subsets of the δ carbonic anhydrases . It also suggests that these diatoms have different abilities to substitute the typical zinc CA cofactor for another metal --the δ CAs can substitute cobalt (Lane & Morel 2000), while the γ CAs can substitute iron (although this has only been observed in anaerobic conditions) (Tripp et al. 2004). The key role of the ζ carbonic anhydrases for these two diatom genera is particularly interesting, since this class of carbonic anhydrases is found in relatively few diatom species .
Thalassiosira spp. also dynamically regulates the SLC4 and SLC26 bicarbonate transporters in response to CO 2 ( Figure 9, Supplemental Table 4), suggesting that regulating bicarbonate transport is critical for its CCM (Figure 4). In contrast, the Skeletonema spp. SLC4 transporter response is relatively muted but the SLC26 transporters are more sensitive to CO 2 (Figure 9, Supplemental Table 4). The sensitive expression of the SLC26 transporters suggests that Skeletonema may rely on diffusive CO 2 uptake or the non-specific SLC26 anion transporters for bicarbonate uptake rather than the bicarbonate-specific SLC4 transporters .
These different strategies in inorganic carbon uptake are consistent with the hypothesis that historic fluctuations in atmospheric CO 2 have led to a diversification and reconfiguration of the CCM within the diatoms (Young et al. 2016;.

CONCLUSIONS
This study suggests that the CCM of Skeletonema and Thalassiosira use distinct CA isozymes that use different metal co-factors (Lane & Morel 2000;Tripp et al. 2004). We also show evidence suggesting distinct strategies for inorganic carbon uptake: while Thalassiosira uses substrate-specific SLC4 BCTs, Skeletonema seems to rely on diffusive CO 2 transport or non-specific SLC26-mediated transport. This might influence community composition dynamics in high CO 2 because diatoms that can downregulate the SLC4 active transporters in high CO 2 might save some energy and gain a competitive advantage. The co-existence of these diatoms might be related to their CCMs that use different metal cofactors and they might also show unique responses to different metal regimes in high CO 2 .
This study suggests that there are differences in N and P metabolism in these two diatoms. The significance of these differences in the context of ocean acidification is still unclear. Future studies should attempt to constrain the extent that transcript expression levels correlate with protein expression levels, cell physiology (e.g., cellular C:N:P ratios), and rate measurements (e.g., nitrate uptake). Additionally, the enzymatic activity of the CAs and BCTs should be characterized to confirm their putative function, as well as their flexibility of metal cofactor requirements. Results from this study showing distinct responses in nutrient metabolism is in agreement with previous transcriptome studies that suggest there are fundamental differences in N and P metabolism in these co-occuring diatoms (Alexander et al. 2015). Future experiments that combine multiple stressors (e.g., culturing in a range of CO 2 , N, and P levels) could give a more complete picture of how these differences would impact diatom community dynamics. BSi, and (J) temperature °C for each incubation carboy. The x-axis indicates the date the variable was measured and the y-axis indicates the values and variables measured. The shapes and colors indicate the CO 2 condition shown. Points and error bars indicate the mean and range of the measurement across the triplicate incubation carboys. The asterisks indicate significant differences relative to the present day pCO 2 condition (1-way ANOVA; p < 0.05). Dilution days are indicated with black arrows. Solid shapes indicate that the measurement shown was taken before dilution, empty shapes indicate the measurement was taken post-dilution.  . Expression values of the α, δ, γ, and ζ carbonic anhydrases from Skeletonema spp. and Thalassisoira spp. diatoms. Each point indicates a gene and the color indicates which genera of diatom the sequence is from. The x-axis is the expression value in present day pCO 2 and the y-axis is the expression value in either low or high pCO 2 . The black line indicates the point at which the expression of a gene is the same in both conditions. Genes that were differentially expressed (>=0.95 posterior probability of >2-fold change) are indicated by the black star) ). Figure 9. Expression values of the SLC4 and SLC26 bicarbonate transporters from Skeletonema spp. and Thalassisoira spp. diatoms. Each point indicates a gene and the color indicates which genera of diatom the sequence is from. The x-axis is the expression value in present day pCO 2 and the y-axis is the expression value in either low or high pCO 2 . The black line indicates the point at which the expression of a gene is the same in both conditions. Genes that were differentially expressed (>=0.95 posterior probability of >2-fold change) are indicated by the black star)  weissflogii showing the posterior probability of a 2-fold up or downregulation in silicon limitation relative to the nutrient replete condition, the gene length in nucleotides, and the expression values in nutrient replete (Re), silicon limited (Si), iron limited (Fe), and nitrate limited (N) conditions in RPKM and total reads mapped.
Supplemental Table 3. Hierarchical cluster assignments for the silicon-sensitive genes from T. oceanica and T. weissflogii and the log2 fold change expression value in silicon (Si), iron (Fe), or nitrate (N) limited conditions relative to the nutrient replete condition.
Supplemental Table 4. Silix gene family assignments for the translated proteins from the T. oceanica, T. weissflogii, and T. rotula transcriptomes as well as the translated proteins from the T. pseudonana genome (filtered set of gene models from Thaps3 and Thaps3_bd). Each Silix gene family assignment starts with the prefix "FAM". The genes from the Thaps3 data start with a "jgi_THApse_" prefix, followed by the gene number assigned by JGI. The genes from the Thaps3_bd data start with a "jgi_THApbd_" prefix, followed by the gene number assigned by JGI.
Supplemental Table 1. All of the significantly differentially expressed genes across conditions and transcriptomes (>= 0.95 posterior probability of a 2-fold change in ASC) . The KO and description columns are from the KEGG annotations. The gene_ID column entries are formatted such that the first 6 letters indicate which genus (capital letters) species (lower case letters) transcriptome the sequence originates and the following letters and numbers indicate the gene ID within that transcriptome. The counts and RPKM values are the output from the CLC mapping. The next six columns are the posterior probabilities of a 2-fold change compared to the 320-390 ppm pCO 2 condition ("ambient") as per ASC . "Low" refers to the <290 ppm condition, "projected" refers to the 690-1340 ppm pCO 2 condition, and "max" refers to the 2900-5100 ppm pCO 2 condition. The last 8 columns are from the InterProScan output.
Supplemental Table 2. A summary of the enzyme type, predicted localization, expression value, and ASC posterior probability of a 2 fold change  for each gene hypothesized to play a role in the diatom CCM. This includes bicarbonate transporters, carbonic anhydrases, and genes thought to be involved in diatom C4 metabolism. The gene_ID column is formatted such that the first 6 letters indicate which genus (capital letters) species (lower case letters) transcriptome the sequence originates and the following letters and numbers indicate the gene ID within that transcriptome. The additional worksheets provide the outputs from the different localization predictions and the "Predicted Localization" column is the interpretation of these results.
Supplemental Table 3. Selected putative CCM genes from the T. oceanica transcriptome (this study) and their predicted location when compared to the T. oceanica genome (Lommer et al. 2012) . Genes included in this table include bicarbonate transporters, carbonic anhydrases, and genes thought to be involved in diatom C4 metabolism.
179 APPENDIX 3 Supplemental Figure 1. A rarefaction curve of the 18S V4 rDNA amplicons. The xaxis indicates the number of amplicons sampled, and the y-axis indicates the number of species recovered. The color of the line indicates which pCO 2 condition is being shown. Each triplicate carboy is represented.