IDENTIFICATION OF SECONDARY METABOLITE GENE CLUSTERS OF BACTERIA FROM SOUTH PACIFIC GYRE SUBSEAFLOOR SEDIMENT

Secondary metabolites are organic compounds that are not directly involved in the key processes (growth, reproduction and development) of an organism. They are commonly targeted in pharmaceutical science for drug discovery. Secondary metabolites that have been used in drug discovery have been derived from plants, invertebrates and microbes. Microbes, bacteria in particular, have contributed greatly and will continue to play an important role in new drug discovery. Among the bacteria from all environments, marine bacteria are a vast reservoir of many potentially useful bioactive compounds. Recent studies using marine bacteria for pharmaceutical use have mainly focused on bacteria collected from near-shore sediments. However, bacteria from deep-sea sediments remain unexplored.
The South Pacific Gyre (SPG) is the most oligotrophic region of the world ocean. Due to the low surface productivity and distance from land, sediments below the gyre accumulate very slowly and are characterized by very low organic carbon content and relatively high dissolved oxygen concentrations. Sediments from the South Pacific Gyre were found to host a living microbial community that, compared to other marine sediments, contains very low microbial biomass and has very low metabolic activity.
Thus, the goals of this study are to: (1) document the diversity of bacteria isolated in pure culture; and (2) explore the pharmaceutical potential of deep-sea bacteria from South Pacific Gyre sediment. To address this, bacteria were isolated in pure culture from sediments from seven sites of the Integrated Ocean Drilling Program (IODP) Expedition 329 in the South Pacific Gyre. 16S rRNA genes from 81 bacterial isolates from six SPG sites (U1366, U1367, U1368, U1369, U1370 and U1371) were sequenced for phylogenetic analysis using the RDP (Ribosomal Database Project). 16S rRNA genes were amplified with bacterial primers that have been proven to amplify bacterial sequences well (27F, 1392R). Whole genomes from nine Rhodococcus isolates (with two isolates sequenced in duplicate) from four SPG sites (U1366, U1367, U1370 and U1371) were sequenced for secondary metabolite gene cluster discovery. By using antiSMASH (antibiotics & Secondary Metabolite Analysis SHell), secondary metabolite biosynthesis gene clusters in the bacterial genomes were identified, annotated and analyzed.
Of the 81 16S rRNA gene clone libraries constructed, most of the clones (63%) were affiliated with the genus Bacillus, 35.8% were affiliated with the genus Rhodococcus and one clone was identified as a Corynebacterium. The phylogenetic tree indicated that all the Rhodococci were identified as Rhodococcus erythropolis. When antiSMASH was used to search the Rhodococcus genomes for secondary metabolite gene clusters, many gene clusters were found, most of which were non-ribosomal peptide synthetases (NRPS) and polyketide synthases (PKS). This study suggests that deep-sea sediments harbor bacteria with the potential to produce pharmaceutically important secondary metabolites.


Whitman et al. (1998) used ODP cell abundance data to extrapolate the global abundance of bacterial biomass in marine sediment and argued that the biomass of bacteria in marine sediments is equivalent to 30% of the total biomass on Earth. Parkes et al. (2000) suggested that the biomass of subsurface marine bacteria is equivalent to 10% of the total surface biosphere. Using a much richer dataset that considered sedimentation rate and distance from shore, Kallmeyer et al. (2012) concluded that the microbial biomass in marine sediments is equivalent to 0.6% of the total biomass on Earth.
Despite this reduction in the revised estimate, marine sediment hosts an enormous amount of microbial biomass and therefore represents a vast reservoir of genomic potential.

Microbial biosynthetic secondary metabolites
Antibiotic compounds found in drug discovery efforts have been derived from plants, invertebrates, vertebrates and microbes. Microbes have made enormous contributions to the health and well-being of people throughout the world. More than two-thirds of the antibiotics used to treat humans are microbial natural products or semi-synthetic derivatives of these (Fischbach & Walsh 2009).
Though secondary metabolic products can be utilized as pharmaceuticals (antimicrobials and anticancer agents etc.), secondary metabolism genes are often overlooked because of their evolutionary relationships to primary metabolic genes.
Advances in genetics, biochemistry, and bioinformatics have contributed to the study of antibiotics and other natural products, not only by revealing how they are synthesized but also by casting them as phenotypes encoded by gene collectives that can be studied through an evolutionary lens (Fischbach & Walsh 2009). Antibiotic-encoding gene collectives can converge evolutionarily on similar phenotypes just like other sets of genes that encode adaptive traits. Genes that contribute to the same phenotype are often grouped into a single functional unit, called a gene cluster. Examining antibiotics has provided an entry point for studying the natural roles of these natural products. Because the useful lifetime of an antibiotic is relatively short compared to the time in which clinically significant resistance emerges (Walsh 2003), the continuing and cyclical need for new antibiotics to combat the current generation of resistant pathogens compels scientists and clinicians to search for new sources of antibiotics (Clardy et al. 2006).
Rhodococcus species have been shown to have commercial potential. The wide range of chemicals transformed or degraded by Rhodococcus makes them good candidates for use in both environmental and industrial biotechnology. They play an important role in bioremediation and biodegradation of pollutants (Warhurst et al. 1994), production of biosurfactants and bioflocculants (Finnerty 1992), desulphurization of fossil fuels (Gray et al. 1996) and oil prospecting (Ashraf et al. 1994). In addition, a range of other transformations using Rhodococcus cells or enzymes for pharmaceutical use has been described. A novel and efficient biotransformation producing sec-cedrenol, a compound with potential medical value, has been described (Takigawa et al. 1993). Peters et al. (1993) report possible synthetic uses of carbonyl reductases from Rhodococcus to give a range of compounds that can be used for the synthesis of pharmaceuticals and agrochemicals.
Cholesterol oxidases of Rhodococcus rhodochrous (Warhurst et al. 1994) have been studied and could have applications in the food industry or in the production of steroid drugs (Finnerty 1992, Christodoulou et al. 1994, Kreit et al. 1994). Because of this remarkable metabolic versatility, Rhodococcus isolates were chosen for whole genome sequencing in order to explore the pharmaceutical potential of Rhodococcus isolates from South Pacific Gyre deep-sea subsurface sediment.

Integrated Ocean Drilling Program Expedition 329
In 2011

This study
This study examines the prospects for the discovery of novel microbial biosynthetic secondary metabolites from bacteria isolated from the SPG subseafloor sediment. The phylogenetic affiliations of 81 bacteria previously isolated from SPG sediments (Forschner-Dancause 2012) were determined using 16S rDNA sequence analysis. Of the 81 isolates, nine were chosen for whole genome sequencing. The genomes were then analyzed for secondary metabolite gene clusters.

Core Handling and Sample Collection
The Integrated Ocean Drilling Program (IODP) Expedition 329, "South Pacific Gyre Subseafloor Life", was conducted in October-December 2010 onboard the drillship "JOIDES Resolution". Sediment core samples were collected at seven sites along two transects: from the western side of the gyre to the center of the gyre (Site U1368) and from the center southerly, ending at a control site in the upwelling region located southwest of the gyre (Fig. 1). Once a core was retrieved, it was immediately transferred to the catwalk for labeling and cutting of sections before the next core barrel was deployed (Expedition 329 Scientists 2011). This reduced the amount of time the core remained on deck and therefore minimized warming of the samples. All core sections to be sampled for microbiological studies were transferred from the drilling platform to the Hold Deck refrigerator (~7-10 °C) as quickly as possible and kept as whole-core sections until processed. The core liner was cut with the standard IODP core cutter and the sediment was cut with an ethanol-wiped spatula. Since the core liner is not sterile and the outer surface of the core is contaminated during drilling, subsampling of whole-round cores excluded the sediment next to the core liner. Subcores were taken with sterile 5 cm³ syringes that had the luer-lock end removed. The subcores remained in the syringe barrels inside sterile bags and were stored and shipped at 4 °C for future onshore cultivation efforts (Expedition 329 Scientists 2011).

Bacterial Cultivation and Cultured Strains Purification
Bacteria were isolated in pure culture by Dr. Forschner-Dancause (Forschner-Dancause 2012). Sediment slurries were made by aseptically removing the exposed end of the subcore and then sampling approximately 0.5 cm³ of sediment from the center of the core. The samples were added to a 4 mL vial containing 1 mL sterile artificial seawater (30 g Instant Ocean per 1 L diH2O) and mixed using a vortex mixer. The resulting slurry was allowed to settle. 100 μL of overlying water was drawn off the slurry and spread on plates made of various marine media. The sealed plates were incubated at room temperature and observed daily for bacterial growth.
Individual bacterial colonies were sampled with a sterile loop and streaked out onto a fresh isolation plate of the same medium from which they were isolated. To purify the strain, a well-isolated colony was streaked from the first isolation plate onto a second one of the same medium and then onto a third one containing Yeast Peptone (YP) medium (1 g yeast extract, 5 g peptone, 22.5 g Instant Ocean and 15 g agar per 1 L diH2O). For cryopreservation into the culture collection, the strain was grown in YP broth (1 g yeast extract, 5 g peptone and 22.5 g Instant Ocean per 1 L diH2O) for 48 hrs at 24 °C, shaking at 175 rpm. For preservation of the cells, glycerol was added to the culture to a final concentration of 20% and the strain was frozen at -80 °C (Forschner-Dancause 2012).
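The glycerol addition described above is a simple dilution calculation. The following is a minimal sketch, assuming a hypothetical glycerol stock concentration (the source states only the 20% final concentration, not the stock strength or the culture volume); the function name is illustrative.

```python
def glycerol_volume_ml(culture_ml, final_fraction=0.20, stock_fraction=0.50):
    """Volume of glycerol stock to add to a culture to reach final_fraction.

    Solves stock_fraction * v / (culture_ml + v) = final_fraction for v.
    stock_fraction is an assumption; the source gives only the 20% final value.
    """
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    if not 0 < final_fraction < stock_fraction <= 1:
        raise ValueError("need 0 < final_fraction < stock_fraction <= 1")
    return culture_ml * final_fraction / (stock_fraction - final_fraction)


if __name__ == "__main__":
    # Example with a hypothetical 1 mL culture and a 50% glycerol stock.
    print(round(glycerol_volume_ml(1.0), 3), "mL of 50% glycerol")
```

With pure glycerol as the stock (stock_fraction=1.0), the same function gives one quarter of the culture volume.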

16S rDNA Sequencing
DNA Extraction
For this project, frozen bacterial isolates were revived by incubating them on YP media plates for 3 to 4 days at room temperature. Well-isolated colonies were streaked onto fresh media and allowed to grow. Colonies from the new plates were picked and the DNA was extracted using the Lyse and Go PCR Reagent (Thermo Scientific, Cat. No.: PI78882) according to the manufacturer's instructions. Each colony was suspended in 10 μL of Lyse and Go Reagent.

16S rDNA PCR
The primers used in the DNA amplification were: 10 μM B27F and 10 μM 3 μL sample were well mixed with 2 μL loading buffer and 3 μL PCR product and then added into the wells of the agarose gel. The DNA templates for sequencing were prepared in 12 μL reaction volumes containing primers and molecular biology grade water according to the following standards: (1) PCR products: 2.5 ng DNA/100 bases per reaction; (2) Plasmid templates: 300-500 ng per reaction; (3) Primers: 5 pmol per reaction. To make pipetting easier and to ensure the solutions were well mixed, the reactions were prepared in duplicate volume (24 μL) for submission to the Genomics and Sequencing Center (GSC) at URI, and when more than 16 reactions were submitted they were prepared in strip-tubes.
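The template standard for PCR products (2.5 ng DNA per 100 bases per reaction) can be turned into a small calculation. This is a sketch only: the function names, the example product length and the example DNA concentration are assumptions for illustration and do not come from the source.

```python
NG_PER_100_BASES = 2.5  # PCR product standard quoted in the text


def pcr_template_ng(product_length_bp):
    """Mass of PCR product (ng) needed for one sequencing reaction."""
    if product_length_bp <= 0:
        raise ValueError("product length must be positive")
    return NG_PER_100_BASES * product_length_bp / 100.0


def template_volume_ul(product_length_bp, conc_ng_per_ul):
    """Volume (uL) of purified product that supplies the required mass."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return pcr_template_ng(product_length_bp) / conc_ng_per_ul


if __name__ == "__main__":
    # Hypothetical 1400 bp amplicon at a hypothetical 10 ng/uL.
    print(pcr_template_ng(1400), "ng;", template_volume_ul(1400, 10.0), "uL")
```

The remaining volume of the 12 μL reaction would then be made up with primer and molecular biology grade water.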

Sequence Analysis
Purified PCR products were sequenced in the forward and reverse direction in separate reactions. The forward and reverse sequences for each sample were aligned using Geneious software 7.0.6 to obtain a composite sequence. The quality of each sequence trace was visually assessed and poor quality sequence was edited and removed manually. Samples were identified for each assay by running the

Whole Genome Extraction
Genomic DNA of nine Rhodococcus isolates (with two individual isolates sequenced in duplicate) was extracted using the Wizard® Genomic DNA Purification Kit (Promega, Cat. No.: A1120). Because Rhodococcus is a Gram-positive bacterium whose cell envelope contains a thick layer of peptidoglycan, several steps in the protocol were revised to achieve optimal results. DNA was extracted according to the steps below: (1) Add 1 ml of overnight culture to a 1.5 ml microcentrifuge tube; (2) Centrifuge at 16,000 x g for 2 minutes to pellet the cells.

Genome Sequences Analysis
The whole genome sequencing work was conducted at the RI Genomics and Sequencing Center, University of Rhode Island, using a next generation sequencing (NGS) instrument (Illumina MiSeq). Next generation sequencing is the generic term for methods that simultaneously sequence millions of small fragments of DNA prepared from an entire genome, transcriptome, or smaller targeted regions in a single run of the instrument. Genomic DNA was fragmented using a focused ultrasonicator (Covaris S220) to produce DNA fragments in the 500-700 bp range. The fragment library was then ligated to adaptors that enable the fragments to bind to a glass slide or flow cell. Eleven different adaptors were used in this experiment (Table 2). In the Illumina instruments, the immobilized templates are clonally amplified to generate millions of molecular clusters, each containing 1,000 copies of the same template. The clustered templates are then sequenced using Illumina's sequencing-by-synthesis technology.
In this process, the addition of fluorescently labeled nucleotides liberates one of four colors that are detected by laser excitation and high-resolution cameras in every run cycle (Mardis 2008).
(1) The raw sequences were edited by deleting the uncertain nucleic acid bases, which are denoted as "N" in the raw sequences. Thus all the genome sequences are afterwards composed of only four nucleic acid bases: "A", "G", "C" and "T"; (2) Sequence assembly: the forward and reverse reads of each sample were assembled using "De Novo Assembly", building a new, assembled sequence; (3) Whole genome summary reports were assembled in CLC.
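The editing step that deletes uncertain bases can be illustrated in a few lines. This is a minimal sketch, not the CLC procedure itself: the function name is hypothetical, and the check that only A, C, G and T remain is an added safeguard rather than something described in the source.

```python
def remove_uncertain_bases(sequence):
    """Delete every 'N' from a read and confirm only A, C, G, T remain."""
    cleaned = sequence.upper().replace("N", "")
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        # Other ambiguity codes (R, Y, ...) are not covered by the text,
        # so this sketch reports them instead of guessing how to treat them.
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return cleaned


if __name__ == "__main__":
    print(remove_uncertain_bases("ACGNNTAN"))
```

Note that deleting bases shortens the read; whether a tool removes or masks such positions is a design choice this sketch does not settle.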
antiSMASH (http://www.secondarymetabolites.org/) is the first freely available comprehensive software package capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). The program aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite-specific gene analysis methods in one interactive view (Medema et al. 2011). When the assembled sequence is uploaded to antiSMASH, a report is generated that lists the known classes of secondary metabolite biosynthesis gene clusters found, with detailed NRPS functional annotation and the predicted chemical structure of the NRPS products.

Ten different operational taxonomic units (OTUs), defined as ≥97% similarity between sequences, were found among the 51 Bacillus strains (Appendix A=B) and 29 Rhodococcus strains (Appendix A). Two OTUs, OTU 7 (Rhodococcus) and OTU 8 (Rhodococcus), may represent new species, since they showed less than 97% similarity to sequences previously published or submitted to the database.
Each of these sequences was found in only one strain. Eight OTUs were identified in the Rhodococcus genus, which were attributed to the species Rhodococcus erythropolis and Rhodococcus qingshengii. Two OTUs were identified in the Bacillus genus, which were attributed to the species Bacillus pumilus and Bacillus safensis.
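The idea of grouping sequences into OTUs at a 97% similarity cutoff can be sketched as follows. This is an illustration under stated assumptions, not the method used in this study: it assumes the sequences are already aligned to equal length, uses a simple greedy strategy in which the first member of each OTU serves as its representative, and all names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def greedy_otus(aligned, threshold=97.0):
    """Assign each sequence to the first OTU whose representative it matches."""
    otus = []  # list of (representative_name, [member_names])
    for name, seq in aligned.items():
        for rep, members in otus:
            if percent_identity(aligned[rep], seq) >= threshold:
                members.append(name)
                break
        else:
            otus.append((name, [name]))
    return otus


if __name__ == "__main__":
    toy = {"s1": "A" * 100, "s2": "A" * 98 + "CC", "s3": "A" * 90 + "C" * 10}
    for rep, members in greedy_otus(toy):
        print(rep, members)
```

Greedy clustering depends on input order, so dedicated tools typically sort or use other linkage rules; the sketch only shows how the threshold partitions sequences.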

Whole Genome Sequencing
Nine Rhodococcus strains' whole genomes were sequenced.
Numbers of secondary metabolite gene clusters per genome, by cluster type (nine genomes; strain labels not recoverable):
T1PKS         2  2  2  2  2  2  2  2  2
Unknown       3  3  4  3  3  3  3  3  2
NRPS-Terpene  1  1  1  1  1  1  1  1  1
Lantipeptide  1
(from Site U1367D) present the gene clusters of all five NRPS domain patterns.
NRPS domain patterns by strain: P*1 P*2 P*3 P*4 P*5; MZ1 1 1
Comparison with other marine sediments: Jamieson et al. 2013; Crozet Islands, Southern Ocean; Gamma Proteobacteria, Alpha Proteobacteria
Gamma Proteobacteria is a common and sometimes dominant class in bacterial communities from marine sediment. In addition, a recent study reported that three Gram-negative bacteria belonging to the genus Luteimonas were isolated from sediment collected during Expedition 329 in the South Pacific Gyre (Fan et al. 2013). However, neither Gamma Proteobacteria nor Luteimonas was identified among the isolates in this study. In addition, the three genera identified seem to imply that the diversity of cultivable bacteria in this deep-sea subseafloor sedimentary environment is low compared to that of bacteria isolated from marine sediment in other locations.
It is well known that only a very small minority of bacteria living in marine sediment can currently be isolated in culture. The limitations of agar plate growth media for isolating bacteria were apparent to Morita and ZoBell (1955) when they isolated organisms from the Philippine Trench. They argued that one recipe for media may support the growth of a subset of the community while another recipe could support another subset. Novel efforts to bring more marine bacteria into pure culture show great promise. Jensen et al. (1996) were able to increase the number of bacteria isolated from the surfaces of marine algae by using low nutrient agar. Zengler et al. (2002) used a combination of microencapsulation of individual cells in agar beads with low nutrient agar to bring marine bacteria into culture in a massively parallel approach. It is clear that these methods, as well as additional innovation, are needed to increase the number of marine microbes in pure culture.
It must be noted that even though we strive to increase the diversity of bacteria isolated from marine sediment, we still cannot come close to determining either the total biomass or the community diversity via culturing. Based on both microscopic evidence and environmental DNA studies, it is very clear that the vast majority of microbes in the marine environment resist our attempts to bring them into isolated cultures. Staley and Konopka (1985) used the term "the great plate count anomaly" to describe the difference in orders of magnitude between the numbers of cells from natural environments that form colonies on agar media and the numbers countable by microscopic examination (Jannasch & Jones 1959).
Although these observations apply to all parts of the ocean, cultivation conditions in the laboratory differ even more from the deep-sea sediment environment, which is salty, cold and under high pressure. The rapid environmental change may force some of the bacteria to activate protective mechanisms, such as spore formation. 16S rDNA phylogenetic analysis revealed two Rhodococcus OTUs (OTU 7 and OTU 8, each containing only one Rhodococcus strain) with less than 97% similarity to known or published sequences (refer to the phylogenetic tree). Unfortunately, neither of them was selected for whole genome sequencing, because the phylogenetic analysis results were not yet available when the isolates for whole genome sequencing were chosen. Nevertheless, these two strains may represent new Rhodococcus species, which reinforces the need for further studies of SPG deep-sea sediment.
Higher coverage led to the identification of lantipeptide gene clusters, although the assembly results did not appear to improve. This raises the questions of whether coverage affects which gene clusters are identified and how to determine the appropriate coverage for the best gene cluster identification results.
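For reference, average sequencing coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic only; the read count, read length and genome size in the example are hypothetical values, not figures from this study.

```python
def estimate_coverage(read_count, read_length_bp, genome_size_bp):
    """Average coverage = (number of reads * read length) / genome size."""
    if min(read_count, read_length_bp, genome_size_bp) <= 0:
        raise ValueError("all inputs must be positive")
    return read_count * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target average coverage."""
    if min(target_coverage, read_length_bp, genome_size_bp) <= 0:
        raise ValueError("all inputs must be positive")
    return target_coverage * genome_size_bp / read_length_bp


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 reads of 250 bp on a 6.5 Mb genome.
    print(round(estimate_coverage(1_000_000, 250, 6_500_000), 1), "x")
```

Comparing the gene clusters reported by antiSMASH for the same isolate at different estimated coverages would be one way to approach the question raised above.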
Since Tao et al. (2011) also focused on Rhodococcus erythropolis, a comparison is made in Table 9. A similar number (23) of secondary metabolites gene clusters were found in the Rhodococcus (Tao et al. 2011  Rhodococcus isolates distributed throughout four SPG sites (U1366F, U1367D, U1370F and U1371F). By chance, no bacterial isolates were sequenced from sites U1365 and U1369 in this study. Although the sedimentary environment varied from site to site, the OTUs of both Bacillus and Rhodococcus seemed to be unaffected.
The depth profile of the Bacillus strains showed that they were distributed throughout the sediment cores, from shallow to deeper sections. In contrast, the depth profile of the Rhodococcus strains showed that they tended to occur in the shallower sections of the sediment cores.
Further study, sequencing the whole genomes of 11 selected Rhodococcus isolates from three sites (U1366F, U1367D and U1371F), showed that the G+C content of the genomes was constant at 62%. Annotation of the genomes with antiSMASH further revealed various secondary metabolite gene clusters in the Rhodococcus genomes, dominated by NRPS and PKS. Five major NRPS domain patterns in total were found in all the genomes, while Pattern 2 and Pattern