Uncovering Orthologous Genes of the Ciona Intestinalis Fanconi Anemia Pathway

Fanconi Anemia is a disease caused by any of a number of mutations in a collection of DNA double-strand repair genes. In silico tests were performed to determine whether any of these genes are conserved in Ciona, whose lineage split from that of humans over 500 million years ago. Among the 22 gene products tested, evidence for 10 orthologs was discovered. Possible orthologs were seen in all three of the major groups of FA proteins.

increased risk for leukemia and other types of cancer (D'Andrea, 2003), hypersensitivity to both DNA crosslinking agents (such as mitomycin C) and DNA replication inhibitors (Howlett, 2005), as well as a host of congenital defects (Glanz, 1982). The Fanconi Anemia pathway is extensive, with fifteen genes known to cause the disease (Bogliolo, 2013) and several other proteins involved whose disruption may or may not manifest as FA. The disease is recessive, and the genes in the pathway (for the most part phylogenetically unrelated) are scattered across the genome, on autosomes and, in one case, on the X chromosome (Meetei, 2004). The pathway itself is involved in a DNA damage response mechanism (Wang, 2007), repairing instances of DNA crosslinkage (Niedernhofer, 2005) as well as double-strand breakage (Kim, 2012).
The proteins involved in the FA pathway are often subdivided into three groups.
Group I is the Fanconi Anemia core complex. This group consists of eight known FA-causing proteins (A, B, C, E, F, G, L, M) and at least two FA-associated proteins. This complex is required for the ubiquitination of FANCD2 and FANCI (Meetei et al, 2003).
When at least one of these core complex genes is disabled, downstream ubiquitination does not occur (Garcia-Higuera, 2001). FANCM is thought to serve as a scaffold for the rest of the core complex as well as for another closely related disease-causing gene, BLM (Bloom's syndrome) (Deans and West, 2009), though its role in causing FA directly is controversial (Meetei, 2005). FANCL serves as an E3 ubiquitin ligase (Meetei et al, 2003). The functions of the other Group I proteins are currently unclear. The second group consists of the aforementioned FANCD2 and FANCI proteins, which act on genes and proteins directly involved in the DNA repair process. The third group contains the remainder of the proteins, assumed to act downstream from FANCD2 and FANCI, as FA patients with mutations in these genes have normal levels of D2 and I ubiquitination (Wang, 2007). These proteins include nucleases and ATPases which function in DNA repair.
Several of the FA proteins are ubiquitous among the eukaryotes (Zhang et al, 2009). Almost every organism surveyed possessed both of the group II proteins, as well as FANCL, FANCM and an associated ubiquitin-conjugating (E2) enzyme. There is no apparent evolutionary pattern associated with the presence or absence of the other core complex proteins outside of the vertebrates, as some are found in insects, while others are seen in plants and red algae before seemingly reappearing in Nematostella and then again in the vertebrates. Echinoderms, a sister group to chordates, possess at least four of the group I proteins.
Ciona intestinalis is the closest invertebrate relative of the vertebrates (Delsuc et al, 2006). While in many cases Ciona has lost genes, reflecting adaptation to its sessile niche (Hughes and Friedman, 2005), it can still be used to model simplified pathways (Davidson, 2007 and Philips et al, 2003), as it possesses a simplified vertebrate body plan, most notably as a larva (Satoh, 2003). A previous study focusing on zebrafish (Titus, 2006) looked into the Ciona FA pathway and was unable to find most of the genes. The genes that were found were concentrated in groups II and III, making it plausible that Ciona could at the very least be used as a model for the latter two thirds of the pathway and possibly the minimal pathway for FA function in vertebrates.
(Fraser and Hirsh, 2003), which for the most part returned the same protein, but in three cases returned a Fanconi Anemia gene product which had not been found by the reciprocal best BLAST method. In instances where the two differed, the case with the higher percent of positive matches was used. In all cases this was the FA gene found by the RSD algorithm.
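The reciprocal best BLAST comparison and the tie-break rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the pipeline used in this study: the hit tables, identifiers and function names are hypothetical, hits are assumed to be already parsed into (subject, E-value) pairs, and the RSD distance calculation itself is not reproduced.

```python
def best_hit(hits):
    """Return the subject ID with the lowest E-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward, reverse):
    """Pair each human query with a Ciona protein only if each is the other's best hit.

    forward: human ID -> list of (ciona ID, E-value)   (hypothetical parsed BLAST output)
    reverse: ciona ID -> list of (human ID, E-value)
    """
    pairs = {}
    for human_id, hits in forward.items():
        ciona_id = best_hit(hits)
        if ciona_id is None:
            continue
        # Keep the pair only when the search in the other direction agrees.
        if best_hit(reverse.get(ciona_id, [])) == human_id:
            pairs[human_id] = ciona_id
    return pairs


def pick_candidate(rbb, rsd):
    """Choose between two candidates given as (ciona ID, percent positives) or None.

    Mirrors the rule in the text: when the methods differ, keep the candidate
    with the higher percentage of positive matches.
    """
    candidates = [c for c in (rbb, rsd) if c is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


# Made-up example data, for illustration only.
forward = {
    "FANCL": [("ci_1", 2e-74), ("ci_2", 1e-5)],
    "FANCA": [("ci_3", 0.5)],
}
reverse = {
    "ci_1": [("FANCL", 1e-70)],
    "ci_3": [("STAB1", 1e-10), ("FANCA", 0.4)],
}
rbh = reciprocal_best_hits(forward, reverse)
```

In this made-up example only FANCL survives the reciprocal test, because the best human match for ci_3 is a different protein than the original query.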
Protein Information: Each Ciona and human sequence was run through the European Bioinformatics Institute SAPS (Brendel et al, 1992) program for sequence analysis as well as PROSITE (Sigrist et al, 2010) and the Eukaryotic Linear Motif (ELM) database (Dinkel et al, 2013), which in turn uses SMART (Schultz, Bork, Ponting, 1998) to search for possible overarching structures and motifs. The sequences were also checked in BLAT (Kent, 2002-04; Kent, 2002), and the original genes were examined in the JGI genome portal (Nordberg et al, 2014) as well as OrthoDB (Zdobnov, 2008) to look for synteny, though none was found for any of the genes.
Using ClustalX and ClustalΩ (Larkin et al, 2007), I aligned each Ciona FA protein sequence twice: once against the human and Xenopus sequences (to serve as a vertebrate comparison), and once against the human, Danio, Xenopus, Strongylocentrotus purpuratus, and mouse Fanconi genes, when those sequences were available, together with the closest related human gene by RBB/RSD (complete listing found in Appendix, Table 3).
For the first alignment, I imported the sequences into JALVIEW (Waterhouse et al, 2009) and isolated the most closely aligned regions. I then put together hydrophobicity plots of each sequence using Biopython (Chapman, 2000) and code built and modified from Dalke Scientific (Dalke, 2011). To determine whether the results were significant, I determined the Pearson coefficients for the Ciona amino acid sequence against the human and Xenopus sequences (again using Python), derived a beta distribution for each sequence (AbouRizk, 1994), and compared the critical values to a p < 0.02 level of significance. Assuming independent tests, a standard p < 0.05 level of significance with 20 tests gives about a 64% chance of at least one false positive (Type I error), far too high a level to be acceptable; at p < 0.02 that chance falls to about 33%.
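The hydropathy comparison and the multiple-testing concern can be illustrated with a short sketch. It is written in plain Python rather than with the Biopython and Dalke Scientific code used in the study, and it makes several assumptions: the Kyte-Doolittle scale, a hypothetical window of 9 residues, gap-free sequences of equal length taken from an aligned region, and independent tests. The function names are illustrative, and the beta-distribution significance step is not reproduced.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Sliding-window mean hydropathy; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests
```

Under the independence assumption, familywise_error(0.05, 20) evaluates to roughly 0.64 and familywise_error(0.02, 20) to roughly 0.33, which is the motivation for choosing a stricter threshold when about 20 proteins are tested.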
Phylogenetic Relationships: The second set of alignments was used as input in PHYLIP (Felsenstein, 1993), where multiple data sets were bootstrapped (5000x), and Tree-Puzzle (default settings) (Schmidt et al, 2002), to determine the relation between the proteins. As there is no sequence homology between most FA genes, closer relationships among the same FA gene in different organisms should be seen than ones between different FA genes in the same organism, or between an FA gene and its second best match through RBB/RSD. In particular, Ciona FA genes should be especially close to D. rerio where available.
FANCB: FANCB is part of the FA core complex with an unknown function, and is the only known sex-linked FA gene product (Meetei, 2004

FANCL: FANCL is an E3 ubiquitin-ligase and a part of the FA core complex which serves to ubiquitinate FANCD2 and FANCI (Meetei, 2003). Both methods return a putative Ciona FANCL protein with a given E-value of 2 × 10⁻⁷⁴. Phylogenetic data agree with this finding (Figure 5), showing that the second best human match for Ciona's FancL candidate is not as closely related to any of the group L proteins as they are to each other.
Unlike most of the core complex proteins, FANCL does appear to have an orthologue in Ciona.
FANCM: FANCM is a part of the FA core complex known to function as a scaffold for several DNA repair pathways not necessarily involving FA proteins (Deans, 2009), being involved in protein degradation (Ali, Singh, Meetei, 2009) and also serving as a DNA translocase (Gari et al 2008
UBE2T: UBE2T is one of many ubiquitin-conjugating enzymes found in the human proteome, and is the specific one found in the FA pathway (Machida, 2006). The reciprocal best BLAST method returns UBE2L4 as the closest match with an E-value of 2 × 10⁻²⁴, while the reciprocal smallest distance method returns UBE2J1 with an E-value of 9 × 10⁻⁷⁶. In humans, UBE2T interacts with FANCL to ubiquitinate FANCD2. It is entirely possible that instead of using an orthologous protein, C. intestinalis has repurposed a different E2 ubiquitin-conjugating enzyme for the same job.
FAAP20: FAAP20 is a ubiquitin-binding protein known to interact with and stabilize FANCA (Ali, 2012; Leung, 2012). It is a poor match for any C. intestinalis protein; both the RBB and RSD methods return a best match of Transcription Elongation Factor SPT6, with an E-value of 0.054. FAAP20 looks unlikely to have a Ciona orthologue.
FAAP100: FAAP100 is a protein known to interact in the core complex with FANCB and FANCL that works to protect all three from degradation (Ling, 2007). Both methods return an L-fucose kinase as the closest match, though with a poor E-value (0.0002), and the tree data also lend evidence to this.
FANCO / RAD51C: RAD51C is involved in maintaining chromosome stability as well as playing a part in recombinational repair (Takata, 2001). RBB and RSD both return RAD51 in Ciona with an E-value of 1 × 10⁻⁹⁴. The human protein contains two globular domains and the Ciona protein one, and both possess an AAA-ATPase domain. The phylogenetic tree indicates the Ciona candidate is more closely related to RAD51 than to RAD51C. This will be an interesting protein to investigate further: although this particular paralog of RAD51 does not appear to be present in Ciona, a different member of the same family (of which there are several) could provide the same function.
FANCP / SLX4: FANCP is an important coordinator of nucleases (Stoepker, 2011). The two search methods return different Kelch-like proteins, Kelch-10 for RBB and Kelch-20 for RSD. Kelch domains are found in diverse families of proteins. Kelch-10 was found to be involved in mouse fertility (Yatsenko, 2006), while Kelch-20 is involved in regulating HIF-2α (Higashimura, 2011). Tree data indicate the Ciona protein more closely matches these human Kelch proteins than any Fanconi group P proteins, and is unlikely to be a FANCP orthologue.
FANCQ / ERCC4 / XPF: FANCQ is a repair endonuclease (Bogliolo, 2013) known to interact with FANCA (Sridharan, 2003). Both search methods return XPF as the most closely matching protein, with 50% identity and 64% positive matches. The tree data agree with this, and the hydrophobicity plots show a high correlation, excepting one area corresponding to aa 390-430 in Ciona and 520-560 in humans. As of this writing, no FANCQ mutations relating to this region have been found (Auerbach, 2014).
Considering the proteins listed, I expect FANCJ and RAD51 to possess the same roles downstream of FANCD2 and FANCI in Ciona that they do in vertebrates. FANCQ may not be involved in the pathway at all, as it is thought to interact with the absent FANCA; it may instead be maintained in the genome because of its interactions with ERCC1, an endonuclease still present in Ciona. Similarly, the ubiquitin-conjugating enzyme most closely matching that found in humans may not be orthologous but merely a different E2 brought into the pathway from somewhere else to fill the same function.
These data could help in determining the function of the core complex proteins absent in Ciona that have no known function. With the exception of FANCB, each of the proteins checked against human and Ciona returned a plausible function for a related protein, like human FANCC and C. intestinalis stabilin-like. It could be possible that FANCC's function is similar to that of the stabilin-like proteins, which would explain that RBB/RSD result and those like it.
There are three possibilities that could explain the absence of the majority of the core complex proteins. The first is that the FA core complex had not yet come together as a single unit before the divergence of humans and C. intestinalis. Proteins of the FA core complex can be seen in organisms as diverse as plants, Dictyostelium and arthropods (Zhang, 2009). The second is that many of the DNA repair functions are not necessary for the sessile lifestyle of the tunicate and were not retained as a result. In his paper, Titus (2006) suggested that FANCL may be able to ubiquitinate FANCD2 and FANCI without assistance from the core complex. The third possibility is that the sequences have diverged to an extent that BLAST searches cannot locate them. I feel this is the least likely scenario, as searches using different block matrices returned the same results as in
With this in silico analysis performed, the next step would be to clone these genes and test their expression patterns in living C. intestinalis. Once the genes are found to be expressed (and where), one could grow up Ciona cell cultures, expose them to DNA-crosslinking agents and see if and how the genes are upregulated. The effects of these agents on mutated versions of the Ciona genes could also be tested.
These results indicate that Ciona intestinalis could possibly serve as a model of a minimal FA pathway, and this helps establish a possible lower level at which the pathway can still function. While core complex interactions cannot be modeled in this organism (though they can still be studied in humans), Ciona is still able to model interactions downstream from the core complex, as most of the known FA effector proteins are still present in the organism.

APPENDICES
The secondary structure plot for FANCM indicates that both the human and C. intestinalis candidate possess a standard helicase as well as a DEAD-like helicase, with the DEAD-like helicase about 200 amino acids upstream in both. The hydropathy plot shows high levels of similarity (R² = 0.321 for positive matches) and the correlation is even higher towards the N-terminal end (R² = 0.446 for the first 500 amino acids). The phylogeny data indicate that the Ciona candidate is more closely related to vertebrate FANCM than it is to the second closest Ciona match through RBB/RSD.

FancI
No regions of explicit secondary structure appear on either human or Ciona FancI, though that could be a result of regions in the human gene not being in the system. The hydropathy plot shows high levels of similarity, especially at the N-terminal region.

FancQ
The ELM plot for FANCQ indicates the presence of an ERCC4 nuclease domain approximately 100 amino acids from the C-terminal end of both proteins. The phylogeny data indicates the proteins are less closely related to each other than they are to some of the vertebrate FANCQs.
Proteins that appear to be absent:
Group I:

FancA
The ELM plots for human FancA protein and the Ciona candidate FancA proteins 3494 (best match) and 1694 (second best match) show very little similarity. The best match contains a region of globularity not seen in the human protein, and the protein serving as the second closest match has several protein domains that are not found in the human protein, including a 'zona pellucida' region and two ankyrin repeats. The phylogenetic tree indicates that the Ciona candidate gene product is more closely related to the second best BLAST match for Ciona than it is to any of the known FancC gene products, giving more evidence that it is not particularly closely related to FancC.

FancF
The hydropathy plot of FANCF in humans against FANCF in Xenopus and the Ciona candidate gene shows very little correlation. The phylogenetic tree also shows the Ciona proteins being more closely related to each other than they are to other FANCF proteins.

FancG
The secondary structure plot for FANCG shows the presence of multiple TPR regions, a common scaffolding structure (Blatch, 1999). The phylogenetic data indicate the Ciona proteins are more closely related to each other than they are to vertebrate Fanconi gene products.

FancO
The ELM plot for FANCO indicates the presence of an AAA+ ATPase region in the human protein (which is actually RAD51, a different protein in the pathway) which is entirely absent from the Ciona Fanconi group O candidate.

FancP
The secondary structure data for FANCP indicate a Bric-a-Brac domain in both the human and Ciona candidate genes, which most likely accounts for RBB and RSD giving high similarity scores. The Ciona candidate gene product also possesses several Kelch motifs, another structural region. There is a region of high similarity in the amino acids around the Kelch motifs in the Ciona protein, though there is no indication of these motifs in the human FANCP. The phylogeny data indicate that the Ciona candidate gene is more closely related to its second best RBB/RSD match than it is to any of the vertebrate FANCPs.
FAAP20: FAAP24: FAAP100: