Assessment of Fluid Imaging for Determination of Diatom Assemblage Composition and Biometrics of Southern Ocean Sediment

Diatoms are important ecological indicators whose assemblage, chemistry, and valve features reflect the environmental conditions in which they grew. Fossil diatom biometrics are an emerging measurement introduced to supplement our understanding of the hydrographic history of the Southern Ocean. Here, we present a novel method to simultaneously measure fossil diatom assemblage and biometrics using a FlowCam, an instrument combining features of a flow cytometer and a microscope camera. It offers computerized automatic identification to supplement manual visual identification, yielding higher counts and more biometric measurements. To assess the viability of the FlowCam as a paleoceanographic tool, a FlowCam-measured data set was compared to previously published diatom assemblage and biometric data generated by traditional microscopic methods from a Southern Ocean sediment core. Diatom assemblages and the biometric lengths of Fragilariopsis kerguelensis measured with the FlowCam showed trends similar to those produced by traditional microscopy. The biggest difference was the relative occurrence of Eucampia antarctica, which was observed more frequently using the FlowCam. The large volume of biometric data from the FlowCam was used to derive an empirical minimum sample count and confidence intervals to guide future best practices.

South of the Antarctic Polar Front, diatoms are the main primary producers and important carriers of carbon and dissolved silica into the deep ocean (Singer and Shemesh, 1995; Cortese and Gersonde, 2007). The biological exchange of nutrients is an important factor in diatom growth dynamics and, ultimately, in the sequestration of atmospheric carbon by the ocean (Cortese and Gersonde, 2007). In turn, nutrient consumption by diatoms in the Southern Ocean has widespread impacts on the global nutrient distribution through the subsurface water masses formed there (Sarmiento et al., 2004; Cortese and Gersonde, 2007). While some diatoms are cosmopolitan, many diatom species are restricted to specific ranges of environmental conditions, including temperature, salinity, sea ice presence, and nutrient availability. As a consequence of these narrow ecological preferences, diatom assemblages are used to characterize the physical properties of the waters in which the community lived (Cermeño and Falkowski, 2009; Baas-Becking, 1934; Zielinski and Gersonde, 1997). The diatom fossil record of the Southern Ocean is an excellent environmental archive of major climatic and physical changes over time (Burckle and Cooke, 1983).
Across the Southern Ocean, the biogeographic distribution of many polar species is restricted by the temperature and salinity gradients of oceanographic fronts, by the sea ice edge, and by regions of heavy mixing or relatively stratified water (Zielinski and Gersonde, 1997). Over glacial timescales, shifting limits on species ranges are manifested as variations in the diatom assemblage. Although assemblage composition is altered during deposition in the sediment through the dissolution of specific groups of diatoms, the remaining fossil assemblage and its chemistry are representative of surface hydrography (Pichon et al., 1992; Crosta et al., 2005; Zielinski and Gersonde, 1997). Fossil diatom assemblages provide an excellent first-order assessment of environmental conditions, and when combined with geochemical methods, they are likely to make reconstructions of nutrient dynamics more quantitative as well. Emerging biogeochemical studies of nitrogen, carbon, and silicon use frustule-associated stable isotopes to understand the degree of nutrient consumption over time (Singer and Shemesh, 1995; Sigman et al., 1999; Robinson et al., 2004; De la Rocha et al., 1997; Popp et al., 1999). In the case of carbon and nitrogen, the organic matrix is thought to be naturally protected from diagenetic processes by the surrounding siliceous biomineral (Singer and Shemesh, 1995; Sigman et al., 1999; Robinson et al., 2004; De la Rocha et al., 1997; Popp et al., 1999). Assemblage data are needed in these studies to account for differences in how individual species record the isotopic signature of the water in which they grew (i.e., the degree of fractionation) (De la Rocha et al., 1997; Des Combes et al., 2008; Horn et al., 2011A; Sutton et al., 2011; Sutton et al., 2013; Studer et al., 2015).
Because size is an important factor in explaining variations in the biogeochemical parameters of diatoms (Sarthou et al., 2005), comparing volumetric contributions, rather than simple counts, should better capture the biogeochemical contribution of each diatom species.
The measurement of a diatom's shape and size, known as biometrics, is a recently developed tool for environmental reconstruction. Morphology appears to be directly related to the growth conditions and productivity of the diatom community (Crosta, 2009; Burckle and McLaughlin, 1977). The size of diatoms, specifically their relative volume and surface area, can affect their internal chemical composition and their ability to take up nutrients, making it an important consideration for understanding growth dynamics (Sarthou et al., 2005; Wilken et al., 2011). Diatoms have a two-phase reproductive cycle in which asexual reproduction progressively reduces cell size and sexual reproduction restores it to its initial size (Edlund & Stoermer, 1997).
The initiation of sexual reproduction is thought to be related to primary production, occurring earlier when production rates are high (Burckle and McLaughlin, 1977; Assmy et al., 2006). A positive relationship has been found between diatom abundance and size (Cortese and Gersonde, 2007; Crosta, 2009), consistent with the idea that growth conditions affect both. Diatom size, in turn, is an important factor in protection from predation and in interspecies competition, both of which affect assemblage and bloom dynamics (Wilken et al., 2011). Biometric studies of Southern Ocean diatoms have been shown to complement assemblage information by improving estimates of frontal position, by serving as a secondary stratigraphic indicator of glacial terminations, and by providing an additional constraint on the nutrient characteristics of the opal belt (Cortese and Gersonde, 2007; Burckle and Cooke, 1983; Jouse et al., 1962).
The biggest factor preventing biometrics from becoming a more commonly used parameter is the labor-intensive nature of measuring biometrics with traditional microscopic methods.
The traditional method for categorizing fossil diatom assemblages, slide-based light microscopy, provides an excellent standard for the identification of diatom species despite some known biases (Moore, 1973; Law, 1983). The largest of these arises not from the microscopic method itself but from slide preparation and covering (Battarbee, 1973; Moore, 1973; Drooger, 1978; Schrader and Gersonde, 1978; Law, 1983). Counting bias can also depend on size, through the conventions for counting fragmented particles, or on the counting area (Law, 1983). When 300-400 diatoms are counted, species making up less than 5% of the total may have very high errors in their relative abundance (Drooger, 1978; Schrader and Gersonde, 1978). In addition to these biases, traditional microscopic identification is labor-intensive, limiting the quantity of data that can reasonably be collected within an experiment's time frame. While this does not significantly affect studies designed to look at assemblage changes, where counts of 300 diatoms per sample are meaningful and relatively rapid (Schrader and Gersonde, 1978; Law, 1983; Zielinski and Gersonde, 1997), time-consuming supplemental analyses, such as diatom biometrics, are generally restricted in scope for practical reasons.
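As a rough illustration of why rare taxa are poorly constrained at these count sizes, the sketch below assumes a simple binomial counting model (each counted valve treated as an independent draw, an assumption that ignores the preparation biases above) and computes the relative standard error of a taxon's relative abundance. The function name `relative_counting_error` is hypothetical.

```python
import math


def relative_counting_error(p: float, n: int) -> float:
    """Relative standard error (SE / p) of a relative abundance p from n counted valves.

    Assumes a binomial model: every counted valve is an independent draw.
    This ignores the slide-preparation and fragment-counting biases noted above.
    """
    if not 0.0 < p < 1.0 or n <= 0:
        raise ValueError("p must lie in (0, 1) and n must be positive")
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return standard_error / p


# A rare taxon (5%) versus a dominant one (50%) in a 300-valve count.
for share in (0.05, 0.50):
    print(f"p = {share:.2f}, n = 300: relative error ~ {relative_counting_error(share, 300):.0%}")
```

Under this model, a taxon making up 5% of a 300-valve count carries a relative standard error of roughly 25%, versus roughly 6% for a taxon making up 50%.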
A significant drawback to using traditional microscopy for biometric analysis is that, to accommodate the time requirement, most studies limit measurements to spatial changes only, temporal changes only, or a single species. In addition, as frustule-based geochemical proxies become increasingly common, estimates of relative volumetric species contributions, rather than simple counts, become more important, increasing the need for comprehensive biometric data (Shukla et al., 2013).
To address these limitations, we present a potentially complementary method to microscopy for assessing first-order changes in a sedimentary diatom assemblage that also provides robust biometric data. This method is not meant to replace microscopy in the evaluation of diatom assemblages, but rather to provide a tool to estimate changes in the relative contributions of major groups and to provide biometric data for each valve counted.

(Orsi et al., 1995; Gersonde et al., 1999). I chose TN057-13 PC4 for its established diatom stratigraphy (Crosta, personal communication, 2010; Nielson and Hodell, 2005), high-resolution glacial/interglacial age control and opal record (Anderson et al., 2009), available biometric data for two diatom species (Shukla et al., 2013; Shukla et al., 2016), and ancillary geochemical data (Horn et al., 2011A), which I compare to the FlowCam results. The FlowCam was operated with an X10 objective lens and illuminator, a 100 µm flowcell, and a 1 ml syringe pump. This allowed an effective particle range of 5-100 µm, though particles larger than 100 µm occasionally appeared because of the flowcell's opening size. Trials were performed under the default Auto-Image mode settings for this flowcell and pump size (which determine the flow rate and imaging rate) (Fluid Imaging Technologies, 2011), except for the segmentation threshold (Supplementary C), which was set to a dark pixel value of 10, and the flash duration, which was set so that mean pixel intensity was between 160 and 180. Approximately 1-10 mg of disaggregated sample, in this case diatom frustules physically isolated and chemically cleaned following Horn et al. (2011B), was placed in ~10 ml of Milli-Q water to form a slurry, which was continuously mixed to ensure homogeneity. Before sample was introduced into the FlowCam, at least 0.5 ml of deionized water was added to the pipette tip holder. For each sample, a test trial was conducted to confirm adequate focus and particle concentration.
If particles physically blocked the flowcell, or if particle density was so high that the FlowCam could not digitally isolate all particles, samples were diluted until neither condition occurred. At the start of each sample trial, between 0.1 and 0.5 ml of sample slurry (depending on estimated concentration) was quickly pipetted into the FlowCam after imaging had begun. As fluid levels began to drop in the pipette holder, deionized water was layered on top of the sample so that the total volume of sample could be imaged. Trials were completed when particle images were no longer being captured. Flowcells were rinsed between trials to prevent cross-contamination. Samples were analyzed in replicate, with a target of 10,000 particles per sample collage.
Initially, diatoms were sorted into three major groups: centrics, Fragilariopsis spp., and Eucampia antarctica. Identifications were largely based on Scott and Marchant's Antarctic Marine Protists (2005). Ultimately, the assemblage counts and identifications were a combination of machine and operator effort. From test trials, libraries were compiled to teach the FlowCam the filter values needed to automatically distinguish diatom groups from each other. Libraries were made using at least 60 particles for each group. Centrics and Fragilariopsis spp. were chosen as good candidates for automatic analysis because both have characteristic shapes and size ranges that differ from each other (Fischer, 2002). Identifications were first made by VisualSpreadsheet and then checked by an operator, who reviewed both the machine identifications and the rest of the collage for unmarked diatoms. This procedure ensured that machine identifications were accurate and that diatoms the machine could not identify were still counted. A particle was counted as a diatom if it could be identified to one of the three diatom groups and retained more than half of its frustule. Where silicoflagellates, Rhizosolenia, or radiolarians could be identified, they were placed in separate libraries but not used in subsequent assessments. After these initial identifications were made, each group was further broken down into species between 100 to 800 cm and 2. Fragilariopsis-poor, 800 cm and deeper, which contains a lower Fragilariopsis spp. contribution (40-70%), a higher centric contribution, and a peak in Eucampia antarctica (Figure 5).
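VisualSpreadsheet's internal classification algorithm is not described here, so the sketch below is only a stand-in: a nearest-centroid classifier that learns group means from library particles and flags anything far from every library as "unassigned" for operator review, mirroring the machine-then-operator workflow above. The two features (`length_um`, `aspect_ratio`), the distance cutoff, and the toy library values are all hypothetical.

```python
import math
from statistics import mean, stdev

# Each particle is described by two hypothetical features: (length_um, aspect_ratio).
Particle = tuple[float, float]


def build_model(libraries: dict[str, list[Particle]]):
    """Per-group feature means (centroids) plus one pooled scale per feature."""
    pooled = [p for group in libraries.values() for p in group]
    scales = [stdev(column) or 1.0 for column in zip(*pooled)]
    centroids = {name: [mean(column) for column in zip(*group)]
                 for name, group in libraries.items()}
    return centroids, scales


def classify(particle: Particle, centroids, scales, max_distance: float = 2.0) -> str:
    """Return the nearest group, or 'unassigned' so that the operator reviews it."""
    best_name, best_distance = "unassigned", math.inf
    for name, centroid in centroids.items():
        # Standardized Euclidean distance to this group's centroid.
        distance = math.sqrt(sum(((x - c) / s) ** 2
                                 for x, c, s in zip(particle, centroid, scales)))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else "unassigned"


# Toy libraries (illustrative values only; the study used >= 60 particles per group).
libraries = {
    "centric": [(30, 1.0), (40, 1.05), (25, 1.1), (50, 1.0)],
    "fragilariopsis": [(35, 3.5), (28, 3.0), (45, 4.0), (30, 3.2)],
}
centroids, scales = build_model(libraries)
print(classify((38, 3.6), centroids, scales))   # fragilariopsis
print(classify((120, 8.0), centroids, scales))  # unassigned
```

Standardizing each feature by its pooled standard deviation keeps length (tens of µm) from swamping aspect ratio (order 1). The more-than-half-frustule counting rule remains an operator decision and is not modelled.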
The number of diatoms available for biometric measurements is a function of assemblage composition. While the average trial measured over 500 diatoms for biometric data, most measurements are from F. kerguelensis because it dominated downcore abundance. Centrics made up 17% of the total diatoms identified, and centric biometric measurements were available for ~15% of the diatoms identified. By contrast, F. kerguelensis biometric measurements were available for 56% of the diatoms identified, so this species has many more measurements than the centrics. Notably, the depths between 800 and 900 cm, the inferred glacial period, had the fewest identifiable diatoms in their ~10,000-particle counts, which limited the number of biometric measurements. The proportion of identified diatoms that could be measured for biometrics during this study ranged from 42% to 81% in individual samples. The unmeasured diatoms were essentially large fragments: identifiable and countable, but likely to give incorrect length/diameter measurements. A key difference in relative abundances between the FlowCam and microscope counts is the much higher abundance of Eucampia antarctica recorded by the FlowCam relative to the other diatom taxa. While E. antarctica made up less than 5% of the assemblage with traditional microscopy, its proportion in the FlowCam counts was up to an order of magnitude greater, reaching 36% of the assemblage during the deglacial transition (Figure 5). The trends in relative abundance are the same for both methods, but the overall contribution of E. antarctica is greater with the FlowCam.
The difference in E. antarctica counts is probably a result of differences in the techniques used to separate and concentrate diatom frustules. E. antarctica is more resistant to dissolution and breakage than many other diatom species (Pichon et al., 1992), so it would appear more frequently if the frustules of other species were preferentially broken during the cleaning process, which includes gentle sonication.

COMPARISON OF BIOMETRICS
Existing measurements of F. kerguelensis average length from Shukla et al., Centrics show a unique size trend, with a maximum just before the major deglacial increase of F. kerguelensis and a decrease toward the present. Neither diatom group showed a major change in size range or in the shape of its size distribution over time. The trends in both interquartile ranges roughly mimicked those of the average length/diameter, and the quartile changes were nearly proportional to the changes in length. These interquartile changes were not statistically different from what would be expected from size changes alone. Similarly, there was no distinguishable difference in skewness or kurtosis between glacial and interglacial assemblages. While the mean is not normally a meaningful metric for a skewed distribution, skewness and kurtosis do not change significantly with time, so changes in the mean reflect shifts of the whole distribution.
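The shape comparison above (interquartile ranges scaling with mean size, stable skewness and kurtosis) can be sketched with standard-library descriptors. This is an illustrative helper, `shape_summary`, not the statistical test used in the study; skewness and kurtosis are the moment coefficients g1 and g2, and the sample lengths are synthetic.

```python
import math
from statistics import mean, quantiles


def shape_summary(lengths: list[float]) -> dict[str, float]:
    """Location, spread and shape of a size distribution (lengths in micrometres)."""
    n = len(lengths)
    if n < 4:
        raise ValueError("need at least four measurements")
    m = mean(lengths)
    # Population central moments.
    m2 = sum((x - m) ** 2 for x in lengths) / n
    m3 = sum((x - m) ** 3 for x in lengths) / n
    m4 = sum((x - m) ** 4 for x in lengths) / n
    if m2 == 0:
        raise ValueError("all measurements are identical")
    q1, _, q3 = quantiles(lengths, n=4)
    return {
        "mean": m,
        "iqr": q3 - q1,
        "iqr_over_mean": (q3 - q1) / m,
        "skewness": m3 / m2 ** 1.5,           # g1; > 0 for a right-skewed sample
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # g2; 0 for a normal distribution
    }


# Rescaling every length shifts the mean and IQR proportionally
# but leaves IQR/mean, skewness and kurtosis unchanged.
sample = [22.0, 28.0, 31.0, 33.0, 35.0, 38.0, 44.0, 57.0]  # synthetic lengths
base = shape_summary(sample)
scaled = shape_summary([1.1 * x for x in sample])
print(math.isclose(base["skewness"], scaled["skewness"]))
```

Multiplying every length by 1.1 changes the mean and IQR by the same factor but leaves IQR/mean, skewness and kurtosis unchanged, which is the pattern that lets a shift in the mean stand for a shift of the whole distribution.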

ENVIRONMENTAL INTERPRETATION OF BIOMETRICS
Supplementing assemblage data with biometric data from multiple groups allows a more specific interpretation of past events than data from a single group, and in this case helps to reveal growth conditions over the stages of the last deglaciation. Comparing the biometric data from F. kerguelensis and centrics, I observe differences in the timing of their peaks, likely resulting from different environmental preferences, which are in phase with different stages of the deglaciation.
My data suggest that two major features of deglaciation at TN057-13 PC4, the retreat of sea ice and upwelling, are not synchronous events, and that biometrics helps explain the environmental conditions that existed between them. Deglaciation began and sea ice influence waned at 20 kya where E. antarctica peaks and then sharply declines replicate. The associated biometric data have the potential to provide a more representative assessment of average valve length and will allow us to define an optimal sample size. Here, I compare these two biometric datasets.
There are two goals when looking to improve the biometric assessments of F. kerguelensis: (1) ensure that the number of counts is large enough to provide a measured average valve length and standard deviation representative of the population's true values; and (2) obtain confidence intervals small enough to distinguish environmentally relevant size changes. F. kerguelensis shows a large variation in size; its valve length normally ranges from 8 to 92 µm (Shukla et al., 2013; van der Spoel et al., 1973; Fenner et al., 1976; Assmy et al., 2006). The length distribution of F. kerguelensis is right-skewed, with a mean length typically in the mid-30s (µm). The associated high standard deviation makes it difficult to determine when a representative dataset has been collected.
We suggested above that the FlowCam biometric measurements for F. kerguelensis are statistically indistinguishable from results obtained by traditional microscopy. The FlowCam's true benefit over the traditional method stems from its higher counts and, by the law of large numbers, the resulting increase in the precision of the mean. The goal of large counts is to achieve the precision needed to differentiate mean values and to minimize the effects of outliers. To obtain a first-order estimate of when these goals are met, F. kerguelensis length was plotted as a running average for multiple FlowCam trials (Figure 9). At low counts, these graphs show a random walk that eventually becomes centered on the true population mean. The count at which this random walk begins to show low fluctuation depends mainly on the population variance, which is also one of the terms used when calculating a minimum sample size. I interpret the point at which fluctuation of the running average falls to a minimum as the point at which it is closest to the population's true value and the effect of outliers is smallest. The consensus from these graphs was that this fluctuation often persisted beyond 100 counts, which means that 100 counts may not adequately define these F. kerguelensis distributions. (Nair et al., 2015), ~3 µm increase between the Hypsithermal and the Neoglacial (Crosta, 2009), ~4-6 (with a maximum of 10) µm between the last glacial and the Holocene (Cortese and Gersonde, 2007), and a total range of ~10 µm, attributed to geographic position, from samples spread across the Southern Ocean (Cortese and Gersonde, 2007). Therefore, the inferred environmentally influenced range of changes in average valve size is 3-10 µm. The question remains whether these changes are related to environmental conditions or to random variability.
To test this, I assumed that real changes were reflected in a 3 µm mean size shift and then asked how many biometric measurements are needed to narrow confidence intervals sufficiently to see such a change, given the variance in the data. In order to showing that much of the variance is preserved across methods. Confidence intervals will be reported as E (Equation 1), the half-width of the confidence interval, i.e. the distance between the estimated mean and its upper or lower limit. To capture a 3 µm change (the smallest of the recorded changes), E should ideally be no more than half of the environmentally significant value, in this case 1.5 µm. Using Crosta's (2009) extremes and requiring the measurement to be within 1.5 µm at 95% confidence, one would need to measure 93 F. kerguelensis valves in the best scenario (standard deviation = 7.4 µm) and over 400 in the worst scenario (standard deviation = 15.5 µm). Because only 100 counts were used, Crosta's (2009) 95% confidence intervals correspond to E values of 1.5-3 µm, which implies that those data cannot resolve a 3 µm change between points without other statistical techniques. This conclusion also applies to other biometric studies in which small sample sizes may not give confidence intervals narrow enough to resolve the environmental changes inferred from their data (Crosta, 2009; Cortese and Gersonde, 2007; Nair et al., 2015; Shukla et al., 2013; Shukla et al., 2016). With the FlowCam, 100 counts yield a similar E of 1.5-2.3 µm at 95% confidence, showing that this inability to resolve small changes in average length is not a consequence of the method but of the high variance of the measurements. This resolution criterion applies only when the difference between two points (or series of points) is needed. In many cases, a trend is still significant even when individual points cannot be resolved as different.
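Equation 1 is referenced above but not reproduced in this section. Assuming it is the standard large-sample (normal-approximation) margin of error for a mean, which reproduces the counts quoted above, E and its inversion for the minimum count n are given below, with z_{α/2} = 1.96 for 95% confidence and σ the standard deviation of valve length:

```latex
E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\qquad\Longleftrightarrow\qquad
n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}

n_{\sigma = 7.4} = \left(\frac{1.96 \times 7.4}{1.5}\right)^{2} \approx 93.5,
\qquad
n_{\sigma = 15.5} = \left(\frac{1.96 \times 15.5}{1.5}\right)^{2} \approx 410.2

E_{n = 100,\ \sigma = 7.4} = \frac{1.96 \times 7.4}{\sqrt{100}} \approx 1.45~\mu\mathrm{m},
\qquad
E_{n = 100,\ \sigma = 15.5} = \frac{1.96 \times 15.5}{\sqrt{100}} \approx 3.04~\mu\mathrm{m}
```

Rounding 93.5 up gives 94 valves; the value of 93 quoted above corresponds to rounding to the nearest integer. Because the F. kerguelensis length distribution is right-skewed, these normal-approximation values are first-order estimates.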
It appears that, on average, the 100-count standard cannot always represent average lengths precisely and is unlikely to yield confidence intervals small enough to resolve size changes that have been purported to be environmentally significant.
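The running-average check described for Figure 9 can be automated, as a hypothetical stand-in for judging the plots by eye: compute the running mean and report the count after which it stays within a chosen tolerance of its final value. The tolerance (1 µm here), the function names, and the synthetic lognormal lengths are assumptions, not values from the study.

```python
import random
from itertools import accumulate


def running_mean(lengths: list[float]) -> list[float]:
    """Mean of the first k measurements for k = 1..n (the curve in a running-average plot)."""
    return [total / k for k, total in enumerate(accumulate(lengths), start=1)]


def stabilization_count(lengths: list[float], tolerance_um: float = 1.0) -> int:
    """Smallest count k after which the running mean never again leaves
    +/- tolerance_um of its final value (hypothetical automated criterion)."""
    if not lengths:
        raise ValueError("no measurements")
    means = running_mean(lengths)
    final = means[-1]
    # Walk backwards to find the last excursion outside the tolerance band.
    for index in range(len(means) - 1, -1, -1):
        if abs(means[index] - final) > tolerance_um:
            return index + 2  # excursion was at count index + 1; stable from the next count
    return 1


# Synthetic right-skewed valve lengths (lognormal, mean ~35 um); not study data.
rng = random.Random(42)
lengths = [rng.lognormvariate(3.5, 0.3) for _ in range(1000)]
print(stabilization_count(lengths, tolerance_um=1.0))
```

Because the final value is itself estimated from the same measurements, this is only a first-order check, in the same spirit as the visual inspection of the running-average plots.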
Given an abundant and easily identifiable organism such as F. kerguelensis, the FlowCam is a good tool for this problem because of its lower time cost compared with traditional microscopy. Ideally, future studies would spend the least time per FlowCam sample, i.e. measure the fewest valves, while still capturing all of the information needed, including length and standard deviation, with confidence intervals small enough to resolve environmental changes. While the values in Table 1 are slightly underestimated because of the skewed F. kerguelensis length distribution, they provide a first-order assessment of how many counts are needed to narrow the confidence interval. Given the standard deviations calculated in this study, a count of 300 per sample would likely be the lowest count that provides confidence intervals below 1.5 µm (Table 1) and thus resolves 3 µm changes in average frustule length.
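The count-versus-confidence trade-off can be tabulated with the same normal approximation assumed for Equation 1. This is an illustrative sketch, not a reproduction of Table 1 (whose values are not given here); it uses only the two standard deviations quoted from Crosta (2009), and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% standard-normal quantile


def margin_of_error(sd_um: float, n: int, z: float = Z_95) -> float:
    """Half-width E of the confidence interval on a mean from n measurements."""
    return z * sd_um / math.sqrt(n)


def minimum_count(sd_um: float, target_e_um: float = 1.5, z: float = Z_95) -> int:
    """Smallest n for which E <= target_e_um (normal approximation, rounded up)."""
    return math.ceil((z * sd_um / target_e_um) ** 2)


def largest_sd(n: int, target_e_um: float = 1.5, z: float = Z_95) -> float:
    """Largest standard deviation for which n measurements still give E <= target."""
    return target_e_um * math.sqrt(n) / z


# The two standard deviations quoted from Crosta (2009) in the text.
for sd in (7.4, 15.5):
    row = ", ".join(f"n={n}: E={margin_of_error(sd, n):.2f}" for n in (100, 200, 300, 400, 500))
    print(f"sd={sd} um -> {row}; n for E<=1.5 um: {minimum_count(sd)}")
print(f"At n=300, E stays <= 1.5 um while sd <= {largest_sd(300):.1f} um")
```

Under these assumptions, 300 counts keep E at or below 1.5 µm only while the standard deviation is below about 13.3 µm; the 15.5 µm worst case would still need more than 400 valves.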

POTENTIAL IMPACTS AND FUTURE WORK
The FlowCam appears to be a useful paleoceanographic tool because it can capture first-order diatom assemblage information and provide a large number of biometric measurements for statistically more robust results. The FlowCam also has the potential to pick out indicator species, such as E. antarctica, for stratigraphic assessment. The ability of the FlowCam to measure first-order changes in diatom assemblage can be useful when paired with biogeochemical data, especially in cases like nitrate utilization, where major diatom groups have been shown to have significant impacts on the total δ15N (Horn et al., 2011A; Studer et al., 2015). Future studies could use the large number of biometric measurements from the FlowCam to calculate the volumetric contributions of different diatom groups (as a function of length and width) and use these, rather than counts, as the basis for comparison with biogeochemical records; volumetric contributions should better represent an organism's relative contribution to the chemistry.
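A volumetric weighting of this kind might look like the sketch below. FlowCam images are two-dimensional, so valve height is not measured; the height-to-width ratios and the geometric shapes (a cylinder for centrics, an elliptic prism for Fragilariopsis) are assumptions chosen for illustration, not values from this study.

```python
import math
from collections import defaultdict

# HYPOTHETICAL height-to-width ratios: FlowCam images are 2-D, so the pervalvar
# (height) axis is not measured and has to be assumed for each group.
HEIGHT_RATIO = {"centric": 0.5, "fragilariopsis": 0.8}


def biovolume_um3(group: str, length_um: float, width_um: float) -> float:
    """Approximate cell volume from 2-D measurements using assumed simple shapes."""
    height = HEIGHT_RATIO[group] * width_um
    if group == "centric":
        # Cylinder: for a centric valve, length is taken as the diameter.
        return math.pi * (length_um / 2) ** 2 * height
    # Elliptic prism (elliptical outline) for pennate Fragilariopsis valves.
    return math.pi / 4 * length_um * width_um * height


def volumetric_contributions(particles):
    """particles: iterable of (group, length_um, width_um) -> volume fraction per group."""
    totals = defaultdict(float)
    for group, length_um, width_um in particles:
        totals[group] += biovolume_um3(group, length_um, width_um)
    grand_total = sum(totals.values())
    return {group: volume / grand_total for group, volume in totals.items()}


# One small centric and one Fragilariopsis valve: equal by count, unequal by volume.
print(volumetric_contributions([("centric", 10, 10), ("fragilariopsis", 40, 10)]))
```

In this toy case the two valves are 50:50 by count but roughly 14:86 by volume, the kind of difference that could matter when weighting a bulk biogeochemical signal.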
This methodology could be improved in the future by analyzing changes within the Fragilariopsis spp. group. F. curta and F. cylindrus, seasonal sea ice zone diatoms, differ from F. kerguelensis, an open-ocean diatom, in having opposite environmental and ecological requirements for growth (Crosta, 2009). Distinguishing these species could enable assessment of annual sea ice cover and would be key in any study in which F. curta and F. cylindrus are particularly abundant.
While I was able to distinguish between these species within the Fragilariopsis spp. group with FlowCam images, identifications of F. curta and F. cylindrus had much lower abundances than are reported in the literature. This discrepancy is likely because our FlowCam methodology did not capture particles smaller than 5 µm and therefore does not fully encompass the natural size range of these species at this site. While there appeared to be more F. curta/cylindrus in glacial samples, their abundance was so low that the pattern is unlikely to be statistically robust. It would, however, be possible to capture this size range in future experiments by sieving samples to isolate a small size fraction. The smaller size fraction could then be run in the FlowCam with the 50 µm flowcell under the X20 lens to provide the magnification needed to view this size range. This method may also be useful in identifying Chaetoceros resting spores, which are a proxy for spring ice melting (Crosta, 2009). Similarly, a larger flowcell could be used to capture the full spectrum of centric sizes, above 100 µm. Using the same sample on two different flowcell sizes may be problematic because it would involve changing the flowcell between trials or saving sample for later. One option would be to use two FlowCams, passing the same sample simultaneously through different flowcells, as a step toward a method for handling mixed distributions and standardizing volume/material.
The FlowCam could also benefit from further application of structural deep learning. While VisualSpreadsheet uses the spatial parameters of diatom particles, this is not the only bioinformatic method to quantify and identify diatoms. Other techniques, such as symmetry contouring, Fourier descriptors, texture analysis, and striation descriptors, can provide digital data quantifying the physical properties of a diatom (Fischer, 2004), and many of them can be improved by machine learning.
Because the FlowCam has a large output of diatom images, it would be a useful tool

Introduction
The Auto-image mode of the FlowCam is well suited to imaging fossil diatoms.
Here, we outline a method for imaging fossil diatoms based on the use of relatively clean sedimentary diatoms, where much or all of the clay and other lithogenic and biogenic materials have been removed. The samples were also chemically cleaned to remove external organic matter.

Instrument Setup
Flow Cell Selection - The flowcell should be larger than the largest particles to be imaged (in microns). However, the illuminator, magnification lens, and syringe are chosen according to the flowcell size, and magnification is inversely related to flowcell size. This means that larger flowcells image a broader range of particle sizes, but at lower magnification.
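The selection rule can be expressed as picking the smallest flowcell that still exceeds the largest particle, which gives the highest available magnification. The sketch lists only the two pairings mentioned in this document (100 µm with the X10 objective, 50 µm with the X20 objective); other configurations exist but are not assumed, and the function name is hypothetical.

```python
# Only the two flowcell/objective pairings mentioned in this document.
CONFIGURATIONS = [(50, "X20"), (100, "X10")]  # (flowcell size in um, objective)


def choose_flowcell(largest_particle_um: float) -> tuple[int, str]:
    """Smallest listed flowcell larger than the largest particle (highest magnification)."""
    for size_um, objective in sorted(CONFIGURATIONS):
        if size_um > largest_particle_um:
            return size_um, objective
    raise ValueError("no listed flowcell is larger than the largest particle")


print(choose_flowcell(30))  # (50, 'X20')
print(choose_flowcell(80))  # (100, 'X10')
```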

Sample Preparation
Place ~10 mg of dried sample in ~10 ml of deionized water and mix to ensure homogeneity. Use a representative sample to focus the camera. The sample may need to be diluted further if the flowcell becomes jammed or if the FlowCam cannot resolve individual particles.

Camera Settings
See Important Parameters of the FlowCam for settings regarding image quality. In the camera settings, adjust Shutter Duration until the mean intensity value is between 170 and 180. Set the segmentation threshold to "Only Dark Threshold: 10" (see Segmentation Threshold Test below). Other parameters may be changed, but the default settings are appropriate for this application (see Parameterization).

Analyzing a sample
The FlowCam should be prefilled with water so that the whole flowcell and half of the sample funnel are filled. Auto-Image is the mode for running samples without fluorescence. As soon as the calibrations are over and the FlowCam begins to image, quickly pipette < 1 ml of sample into the sample funnel. As the water level begins to drop, layer Milli-Q water into the sample funnel. The end point of a trial is adjustable (particle count, sample volume, time, etc.), or the trial can be run continuously and stopped manually. Trials should end when no particles have been captured for around 30 seconds.
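The end-of-trial rule can be written as a check on particle capture times. This is a sketch assuming a list of capture timestamps in seconds; it is not a FlowCam software setting, and `stop_time` is a hypothetical name.

```python
from typing import Optional


def stop_time(capture_times_s: list[float], now_s: float,
              quiet_period_s: float = 30.0) -> Optional[float]:
    """Time at which a trial would be stopped under the '~30 s without a new particle'
    rule, or None if that has not happened yet."""
    times = sorted(t for t in capture_times_s if t <= now_s)
    # Gaps between successive captures, then the open gap from the last capture to now.
    # With no captures yet, zip() yields nothing and the trial keeps running.
    for previous, following in zip(times, times[1:] + [now_s]):
        if following - previous >= quiet_period_s:
            return previous + quiet_period_s
    return None


print(stop_time([1.0, 2.0, 5.0, 10.0], now_s=45.0))  # 40.0
```

Before the first capture the trial is never stopped, since the sample may still be travelling to the flowcell; after that, any 30 s gap ends the trial, matching the rule above.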

Parameterization
Segmentation Threshold, Light and Dark Pixels - This setting, which has two parameters, defines when the gradient in intensity between the calibrated background and a potential particle constitutes the outline of a particle. Higher values require more contrast to meet the criteria. Objects within a sample can respond quite differently to this parameter, but a suitable range can be identified from a test trial. Objects will have a dark or light "halo" around them. For the best biometric measurements, the outline should fall on the edge of the particle, inside the halo. Recommended value: Only Dark Threshold: 10.
Particle Capture, Distance to Nearest Neighbor - This parameter sets the distance between two outlines below which they are considered to belong to one particle. Recommended value: 5.
Collage Image Border Padding - This value influences the ability of the FlowCam to "fill in" the particle outline in order to calculate area. When this value is underestimated, the fill leaves gaps and area is underestimated; when it is overestimated, the fill overshoots and artificially inflates the particle outline.
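A simplified model of how the dark threshold and the nearest-neighbour distance interact is sketched below. It is an assumption-laden illustration, not FlowCam's actual segmentation algorithm: a pixel counts as particle when it is at least `dark_threshold` grey levels darker than the calibrated background, dark pixels are grouped into 8-connected blobs, and blobs whose bounding boxes are less than `max_gap` pixels apart are merged (bounding boxes stand in for outlines, and the distance unit is assumed to be pixels).

```python
from collections import deque

Box = tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def segment(image, background, dark_threshold: int = 10) -> list[Box]:
    """Bounding boxes of 8-connected pixels at least dark_threshold darker than background."""
    rows, cols = len(image), len(image[0])
    dark = [[background[r][c] - image[r][c] >= dark_threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not dark[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if 0 <= ny < rows and 0 <= nx < cols and dark[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


def box_gap(a: Box, b: Box) -> int:
    """Chebyshev distance in pixels between the nearest edges of two boxes (0 if they overlap)."""
    dr = max(0, b[0] - a[2], a[0] - b[2])
    dc = max(0, b[1] - a[3], a[1] - b[3])
    return max(dr, dc)


def merge_nearby(boxes: list[Box], max_gap: int = 5) -> list[Box]:
    """Repeatedly merge boxes that are less than max_gap pixels apart into one particle."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if box_gap(boxes[i], boxes[j]) < max_gap:
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


# Tiny synthetic frame: uniform background of 200 grey levels with a few dark objects.
background = [[200] * 20 for _ in range(10)]
image = [row[:] for row in background]
for r, c, value in [(2, 2, 180), (2, 3, 180), (3, 2, 180), (3, 3, 180),  # object A
                    (2, 7, 185), (2, 8, 185), (3, 7, 185), (3, 8, 185),  # object B, 4 px from A
                    (2, 18, 195),                                          # only 5 darker: ignored
                    (8, 15, 150), (9, 16, 150)]:                           # object C (diagonal pair)
    image[r][c] = value
print(merge_nearby(segment(image, background), max_gap=5))  # [(2, 2, 3, 8), (8, 15, 9, 16)]
```

In the synthetic frame, lowering the threshold to 4 also picks up the faint pixel, which is the kind of extraneous particle a too-low threshold produces.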

Segmentation Threshold Test
The segmentation threshold distinguishes particles from the calibrated background. When this threshold is too high, the FlowCam will not detect all particles that need to be identified. When it is too low, the FlowCam will capture and crop images where no particles exist. The goal of this methodology is to capture all identifiable diatoms while minimizing the number of extraneous (not real) particles. To find the best settings for capturing diatoms and measuring biometrics with the FlowCam, a single sample was run in the FlowCam and the data were analyzed using different settings. The images from a single FlowCam trial were saved as .RAW files. This allowed the trial to be digitally recreated in the FlowCam, so that particles could be processed with different settings. For each setting, diatoms were identified and counted. The setting "Only Dark Threshold: 10" was chosen for this study because it appeared to capture the most identifiable diatoms without creating erroneous particles. Under this setting, artifacts affecting particle outlines appeared to be generally less common than under settings with higher thresholds. This benefits biometric measurements, as the outline appeared to follow the organism's actual edge in most cases.
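The selection criterion used in this test can be expressed as a small helper: among the thresholds replayed from the saved trial, keep those with no more than an allowed number of spurious particles and choose the one with the most identifiable diatoms. The tallies are assumed to come from the operator's counts for each setting; `choose_threshold` and its parameters are hypothetical.

```python
def choose_threshold(results: dict[float, tuple[int, int]], max_spurious: int = 0) -> float:
    """Pick the threshold that keeps the most identifiable diatoms.

    results maps each dark threshold replayed from the saved .RAW trial to
    (identifiable_diatoms, spurious_particles) as tallied by the operator.
    Ties on diatom count are broken in favour of fewer spurious particles.
    """
    acceptable = {t: r for t, r in results.items() if r[1] <= max_spurious}
    if not acceptable:
        raise ValueError("every threshold produced too many spurious particles")
    return max(acceptable, key=lambda t: (acceptable[t][0], -acceptable[t][1]))
```

Allowing a small, nonzero `max_spurious` would trade a few extraneous particles for more diatoms; the study's choice corresponds to allowing none.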