TRANSCRIPTIONAL REGULATION OF SOX5 AND WWOX BY FANCD2

Fanconi anemia (FA) is a rare genetic disease characterized by an increased risk for bone marrow failure, leukemia, and premature cancers (Alter et al., 2018). The FA pathway is involved in the repair of DNA damage such as stalled replication forks and DNA interstrand crosslinks (ICL) (Feng et al., 2019; Schlacher et al., 2012). It has previously been shown that under conditions of replication stress, FANCD2, a key protein in the FA pathway, binds to and traverses large actively transcribed genes (Okamoto et al., 2018). Using U2OS 3xFLAG cells, we asked why FANCD2 binds to these genes and whether, by doing so, it acts as a regulator of transcription. To answer this question, we used a candidate gene approach, choosing WWOX and SOX5, and examined whether their protein and transcript levels changed following treatment with aphidicolin (APH). We also examined whether the absence of FANCD2 affected their protein and transcript levels. We found that under conditions of replication stress there was little change in either the protein or transcript levels of SOX5 and WWOX. Interestingly, analysis of an RNA sequencing experiment pointed to a potential role for FANCD2 in neuronal development, as a subset of neuronal genes showed significant differences in normalized read counts in the absence of FANCD2.

Patients with FA have an increased chance of developing acute myelogenous leukemia and squamous cell carcinomas of the head and neck in comparison to the non-FA population (Scheckenback et al., 2012; Fiesco-Roa et al., 2019). Clinical characteristics of patients with FA include physical abnormalities such as short stature, microcephaly, and abnormal skin pigmentation, as well as an overall increased chance of developing cancers (Hays et al., 2014). Because clinical presentation varies among the FA subtypes, FA patients should be further subtyped.
The FA genes are inherited in an autosomal recessive manner, apart from FANCB, which is X-linked, and FANCR, which is autosomal dominant. They function in a DNA damage repair pathway that mends DNA interstrand crosslinks (ICL) and stabilizes stalled replication forks (Fig. 1) (Feng et al., 2019; Schlacher et al., 2012). The elimination of DNA damage is necessary for the maintenance of genomic integrity. Inefficient repair of DNA damage can lead to genomic instability, increasing the likelihood of cancer initiation and progression (Niraj et al., 2019). The common diagnostic test for FA determines whether cells are hypersensitive to DNA interstrand crosslinking agents such as mitomycin C (MMC). Exposure of FA cells to such an agent results in increased levels of chromosomal aberrations, including chromosomal breaks and radial formations, as well as increased cell cycle arrest at the G2/M phase (Kee & D'Andrea, 2012).
The FA protein network can be further divided into groups based on commonalities in function. FANCA, FANCB, FANCC, FANCE, FANCF, FANCG, FANCL, and FANCM function as the FA core complex and, together with the E2 ubiquitin-conjugating enzyme FANCT/UBE2T, monoubiquitinate FANCD2 and FANCI. FANCL, FANCB, and FAAP100 can also form a catalytic subcomplex that is able to support low levels of monoubiquitination, while FANCA-FANCG-FAAP20 and FANCC-FANCE-FANCF help the catalytic module associate with chromatin and sites of DNA damage (Huang et al., 2014). Mutations in any one of the eight proteins in the core complex result in loss of FANCD2/FANCI monoubiquitination (Kee & D'Andrea, 2012). FANCI and FANCD2 are paralogs and, when monoubiquitinated, form a heterodimer that will be referred to as ID2 (Joo et al., 2011). Recently, Tan et al. (2020) discovered that the monoubiquitination of ID2 promotes protein:protein interaction and helps to stabilize the ID2 heterodimer on double-stranded DNA (dsDNA). This clamping action has also been seen to require monoubiquitination of FANCD2 only, rather than of the whole dimer (Tan et al., 2020). The monoubiquitination of ID2 promotes the assembly of foci at sites of DNA damage in chromatin to further facilitate DNA repair and protect the genome. FANCI, along with the core complex, is required to produce FANCD2 foci that mark locations of DNA double-stranded breaks (DSBs), stalled replication forks, and R-loops in the nucleus in order to protect nascent DNA from degradation by nucleases (Taniguchi et al., 2002; Schwab et al., 2015). Following the exposure of DNA to damaging agents, and during S-phase of the cell cycle, the core complex monoubiquitinates FANCD2 at lysine 561 to signal its activation and translocation to nuclear foci (Garcia-Higuera et al., 2001).
Following monoubiquitination, effectors that cleave the DNA are recruited, including FANCQ (XPF) and FANCP (SLX4) (Y. Kim et al., 2011; Stoepker et al., 2011). FANCM-FAAP24-MHF also form a complex that acts upstream in the FA pathway to detect DNA damage and to initiate signal transduction pathways that promote monoubiquitination (Nepal et al., 2017). Monoubiquitination recruits the BRCA2 protein into chromatin complexes to facilitate the assembly of DNA damage-inducible RAD51 nuclear foci (Taniguchi et al., 2002; Schwab et al., 2015).
FANCD2 also coordinates ICL incision by the SLX4-XPF or FAN1 nucleases, leading to repair by homologous recombination (H. Kim & D'Andrea, 2012).
While it has been established that the monoubiquitination of FANCD2 is essential for resistance to some DNA-damaging agents, how FANCD2 confers this resistance and how it responds to DNA replication stress are not fully understood.
The replication of the human genome is an intricate process that requires the organized activation and maintenance of replication forks emanating from many origins of replication during S-phase (Schwab et al., 2015). Precise DNA replication requires numerous factors, including proteins of the FA pathway. The accurate duplication of chromosomes, followed by their even segregation during mitosis, is essential for genome stability (Mankouri et al., 2013). Errors that occur during replication can compromise both the precise copying of chromosomes and their segregation during mitosis. Replication stress is therefore one of the main sources of genome instability.
Replication stress arises from both endogenous and exogenous sources and can act either locally or globally. These sources can be categorized into alterations of origin firing, impediments to fork progression, conflicts between the DNA replication and transcription machineries, and DNA replication in unfit metabolic conditions (Magdalou et al., 2014).
Collapsed replication forks are a frequent contributor to spontaneous recombination events and genomic instability, both of which are hallmarks of cancer.
DNA replication originates at thousands of individual replication origins that form bidirectional replication forks. Prior to S-phase, every origin is prepared by combinations of replication initiation proteins that ready the chromatin for replication (Barlow et al., 2013). Once replication begins, cells must balance accuracy, speed, and the use and distribution of necessary resources, such as nucleotides and replication factors, to complete replication in a timely manner.
DNA replication must also work through parts of the genome that are more difficult to replicate because of an increased probability of breaks and gaps. Regions of the genome known as common fragile sites (CFSs) are more prone to breaks and gaps on metaphase chromosomes, likely because of an increase in stalled replication forks at these regions (Debatisse et al., 2012). Progression of the replication fork through CFSs relies on fork-restart mechanisms (Magdalou et al., 2014). When the DNA polymerase has stalled, helicases continue to unwind DNA, resulting in an accumulation of single-stranded DNA (ssDNA) at stalled forks. ssDNA is more unstable and more prone to be acted upon by nucleases (Byun et al., 2005).
During replication stress, replication forks in regions containing CFSs pause, and consequently the forks are either at a standstill or moving more slowly as they travel from flanking regions through CFSs to complete replication. This results in the inevitable collision of the replication and transcription machineries, intensifying genome instability (Debatisse et al., 2012). One result of this collision is the formation of R-loops, which are DNA:RNA hybrids that form when the nascent transcript reanneals to the corresponding template DNA strand, causing the non-template strand to be displaced as ssDNA (Hamperl & Cimprich, 2016). These hybrids can lead to DNA damage because of the exposure of the ssDNA (Sollier & Cimprich, 2015).
Previous studies show that the transcription of large genes can require more than one cell cycle, consequently leading to replication-transcription collisions and potential R-loop formation (Hamperl & Cimprich, 2016).
Replication-transcription conflicts are often inevitable, as there are many large genes in the genome that require start-restart replication as well as more than one cell cycle for transcription (Helmrich et al., 2011). Conflicts are unavoidable in the longer human genes because these genes often overlap with CFSs, resulting in longer times for transcription. Genes overlapping with CFSs are also replicated late in S-phase and are hotspots for chromosome instability (Le Tallec et al., 2014). As DNA replication continues, lagging strand synthesis exposes ssDNA. This ssDNA is more prone to fold into G-quadruplexes, which impede replication fork progression. G-quadruplexes form due to the repetitive nature of the genome and can result in genomic instability (Hänsel-Hertsch et al., 2017). They are non-canonical DNA secondary structures that form through interactions among guanines in G-rich sequences, in which the nucleotides interact by Hoogsteen hydrogen bonds that are stabilized by a cation (Maffia et al., 2020).
A frequent hindrance to replication arises when the replication fork encounters CFSs. These regions have been seen to undergo chromosomal rearrangements in tumors, but the connection between their role in cancer and their involvement in replication has only recently been uncovered (Maffia et al., 2020).
Studies have shown that these regions of the genome are sensitive to replication stress and that treatment of cells with aphidicolin, a DNA polymerase inhibitor, can cause breaks in metaphase chromosomes (Glover et al., 2017; Helmrich et al., 2011). When replication forks are confronted with obstacles during S phase, normally dormant replication origins fire to prevent or rescue the instability. CFS regions can be large, with some greater than 1 Mb in length, as seen in a recent study identifying FANCD2 accumulation at large transcriptionally active genes (Okamoto et al., 2018). The Okamoto study found that FANCD2 preferentially bound near the center of the genic region of these large genes. Large genes can have a greater distance between dormant origins, increasing the likelihood of replication stress. When replication stress is encountered at CFSs, the short supply of origins means that this rescue mechanism is lost (Shima & Pederson, 2017).

Recent studies have investigated whether there is a connection between FANCD2 and CFSs. One study found that FANCD2 repositions to large genes that comprise CFSs after replication stress (Fernandes et al., 2021). FANCD2 has also been seen to form foci at CFSs during mitosis, marking these areas as damaged (Chan et al., 2009). Because the FA pathway helps coordinate replication and transcription by preventing or resolving R-loops, a connection emerges between the failure to resolve R-loops, CFS replication, and the genomic instability commonly seen in FA patients. In addition, FANCD2 has been seen to promote CFS replication by regulating R-loop formation (Okamoto et al., 2019), consistent with an earlier report that FANCD2 may help promote the replication of CFSs (Madireddy et al., 2016).
FANCD2 has also recently been seen to accumulate on large actively transcribed genes under conditions of replication stress (Okamoto et al., 2018). Genome-wide chromatin localization of FANCD2 was analyzed using chromatin immunoprecipitation sequencing (ChIP-seq) to identify the binding footprints of FANCD2. This study found that, in cells cultured in the presence of aphidicolin, FANCD2 accumulates in the central region of large actively transcribed genes, some of which overlap with known CFSs (Okamoto et al., 2018). This study also assembled a list of the top thirty genes, of a total of 120, that FANCD2 was found to bind to under conditions of replication stress (Table 1). However, the study did not determine why FANCD2 binds to these regions, or whether by doing so it prevents transcription in these areas undergoing replication stress.
It has also been recently discovered that many of these large genes are prone to copy number variation (CNV) (Wilson et al., 2015). This is a type of mutation in which the gene copy number can vary under certain conditions. Under replication stress, there could be a deletion of one allele and a triplication of the other allele, causing the number of alleles to increase from the normal two to three. At a locus that is more prone to CNV, there can be considerable variability in copy number and consequently in the levels of transcription of that gene (Wilson et al., 2015).
[...] and WWOX in U2OS 3xFLAG cells to answer this question, as these two genes are found in the top ten of the top thirty genes list and were shown to have broad FANCD2 peaks in the ChIP-seq data (Figs. 3 & 4). Aphidicolin, as previously stated, is a DNA polymerase inhibitor and a frequent inducer of replication stress.
Using aphidicolin as the replication stress inducer, we used immunoblotting and quantitative PCR (qPCR) to analyze the protein and transcript levels of SOX5 and WWOX. These two genes were chosen because SOX5 is a transcription factor involved in breast cancer regulation, and WWOX is a member of the short-chain dehydrogenases and spans the common fragile site FRA16D (Driouch et al., 2002; Pei et al., 2014). They were also chosen because both proteins are intracellular, making them good targets for immunoblotting using whole-cell lysates. We first determined whether protein and transcript levels changed under aphidicolin treatment, hypothesizing that when FANCD2 is present under conditions of replication stress, protein and transcript levels would decrease. We then used an siRNA to knock down FANCD2, again with aphidicolin as the replication stress inducer, and analyzed the consequences for protein and transcript levels, hypothesizing that the levels would increase in the absence of FANCD2.

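Table 1. Top hits among genes bound by FANCD2 under conditions of replication stress (only fragments of the table body are recoverable).
Columns: Top hits order; Chromosome location; Gene; ChIP-seq reads.
Recoverable entries: Xq22 DIAPH2 -339; 30 2q22 THSD7B -322.
Genes with (*) are related to autism, neurodevelopmental, or psychiatric disorders. The read count is the number of ChIP-seq reads found in a gene or in a region that consists of a combined gene of CFS. SOX5 and WWOX have been highlighted. (Kalb et al., 2007).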

Replication Stress assay
U2OS 3xFLAG cells were plated at 1 x 10⁶ cells/mL and grown for 24 hours in 10 cm² dishes in a total of 8 mL DMEM media supplemented with 10% FBS, 1% (vol/vol) L-glutamine, and 1% (vol/vol) penicillin-streptomycin. Cells were then treated with 0.4 µM aphidicolin (APH) for 24 or 48 hours. One set of cells was then released from APH treatment and allowed to grow for an additional 24 hours. Cells were harvested 24 hours after APH treatment, 48 hours after APH treatment, and 24 hours after release from APH treatment. ACHT hTERT FANCD2 and ACHT hTERT FANCG cells were also subjected to the same experimental conditions. Cells were trypsinized using 0.05% trypsin for 4 minutes. Media was then added to neutralize the trypsin and cells were spun down for 4 minutes to pellet in 15 mL centrifuge tubes. Media was then aspirated, and cells were pelleted by centrifugation and resuspended in PBS.

siRNA
U2OS 3xFLAG cells were plated at 1.75 x 10⁵ cells per well in a 6-well plate and the following day treated with 20 nM FANCD2 siRNA (target sequence: AACAGCCATGGATACACTTGA) or an siControl, using Lipofectamine 2000 (Thermo Fisher) (Howlett et al., 2005). The siControl is an siRNA with a random sequence that does not target a specific gene. siFANCD2 or siControl was left on for 24 hours and then removed, and cells were allowed to recover for 24 hours. Cells were then incubated in the absence or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release to induce replication stress. Cells were harvested 24 hours or 48 hours after APH treatment. The control for this experiment was the siControl in the presence and absence of APH.

RNA Isolation
U2OS 3xFLAG cells were grown in 10 cm² dishes and treated using the replication stress assay described above. Cells were trypsinized using 0.05% trypsin for 4 minutes.
Media was then added to neutralize the trypsin and cells were spun down for 4 minutes to pellet in 15 mL centrifuge tubes. Media was then aspirated, and cells were pelleted by centrifugation and resuspended in PBS. RNA was isolated using the PureLink RNA Mini Kit (ThermoFisher), with 80 µL/sample of a ~3 U/µL DNase treatment (ThermoFisher) applied to the column provided in the kit, and RNA was quantified using a Nanodrop 2000 (ThermoFisher).

cDNA Synthesis
RNA was converted into single-stranded cDNA using the High-Capacity Reverse Transcription Kit (ThermoFisher). 1 µg of RNA was used from each sample. cDNA quantification was done using Nanodrop 2000 (ThermoFisher). 500 ng of cDNA was used in quantitative real-time PCR (qPCR). PCR products were tested on a 2% agarose gel using 1x TBST and 10 µL SYBR SAFE DNA gel stain. 2 µL of 6x gel loading dye was added to the 10 µL reaction and the total 12 µL was loaded onto the gel. Agarose gel was run at 100 V for 50 minutes on a PowerEase 90W by Life Technologies. Gel was imaged using the Bio-Rad imager in the Howlett Lab.

RNA-seq Analysis
ACHT hTERT FANCD2 and ACHT hTERT FANCG (FANCD2-/-) cells were seeded and, the following day, either treated with 0.4 µM APH for 24 hours or 48 hours or left untreated, with each condition in quadruplicate. Cells were trypsinized using 0.25% trypsin for 8 minutes. Media was then added to neutralize the trypsin and cells were spun down for 4 minutes to pellet in 15 mL centrifuge tubes. An ice-cold PBS wash was then applied, and cells were transferred to a 1.5 mL microcentrifuge tube, where they were again spun down for 4 minutes in a microcentrifuge set to 4°C. PBS was aspirated and cells were flash frozen using liquid nitrogen. [...] were generated from the Illumina HiSeq and converted into fastq files. They were then de-multiplexed using Illumina's bcl2fastq 2.17 software. One mismatch was allowed for index sequence identification.

Bioinformatics Workflow
To analyze the quality of the raw data, the sequence reads were trimmed to remove possible adapter sequences as well as nucleotides with poor quality using [...] Genes that had adjusted p-values were called as differentially expressed genes for each of the comparisons, and a gene ontology analysis was performed on the statistically significant set of genes using the software GeneSCF. The goa_human GO list was then used to cluster the set of genes.

Statistical Analysis
To determine significance, either a single-factor ANOVA or a two-factor ANOVA was used, and post-hoc analysis was performed if significance was found. Student's two-tailed paired t-test was also performed. The significance threshold was determined using the Bonferroni correction, in which the p-value threshold for each test (0.05) is divided by the number of tests performed.
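The Bonferroni adjustment described above is a single division. The following Python sketch illustrates it under stated assumptions: the function names `bonferroni_alpha` and `is_significant` are hypothetical, and the example of 12 tests mirrors the number quoted for the qPCR comparisons in this work.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float = 0.05, n_tests: int = 1) -> bool:
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(alpha, n_tests)


# Example: 0.05 divided across 12 tests gives a threshold of about 0.004.
threshold = bonferroni_alpha(0.05, 12)
print(f"Corrected threshold for 12 tests: {threshold:.4f}")
```

With 3 tests the same calculation gives a threshold of about 0.017, so a reported cutoff should always be read together with the number of tests it was corrected for.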

Figure 5. Analysis of SOX5 and WWOX gene expression in U2OS cells incubated in the absence or presence of aphidicolin (APH).
U2OS cells were incubated in the absence (No Treatment) or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release. Error bars represent the standard errors of the mean calculated from twenty-seven technical replicate measurements from three independent biological replicate experiments. Statistical significance was determined using Student's two-tailed paired t-test and single-factor ANOVA. *P < 0.001. Bonferroni correction has been applied. a. SOX5 gene expression. b. WWOX gene expression.
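The error bars in this figure are standard errors of the mean. As a minimal sketch of that calculation, assuming the sample standard deviation (n - 1 denominator) is used, and with purely hypothetical replicate values and a hypothetical function name:

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed for an SEM")
    # statistics.stdev uses the n - 1 (sample) denominator.
    sem = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), sem


# Hypothetical relative expression values, for illustration only.
replicates = [0.9, 1.0, 1.1, 1.2, 0.8]
mean, sem = mean_and_sem(replicates)
print(f"mean = {mean:.2f}, SEM = {sem:.3f}")
```

Because the SEM shrinks with the square root of the number of measurements, pooling technical replicates across biological replicates gives tighter error bars than computing the SEM over the biological replicate means alone.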

SOX5, WWOX, and GAPDH qPCR primer efficiency analysis
Our results suggested little change in the levels of expression of SOX5 and WWOX under the conditions tested. Our Ct values for SOX5 and WWOX qPCR analysis also suggested that our qPCR conditions might not be optimal. Upon consulting with our colleagues and technical support at Thermo-Scientific, we decided to thoroughly analyze the efficiency of our qPCR primers using varying amounts of quantified cDNA input. Each gene had two sets of primers designed that were tested along with the housekeeping gene, GAPDH. To test the efficiency of the primers, a standard curve of cDNA was created using quantities of 1500 ng, 150 ng, 15 ng, 1.5 ng, and 0.15 ng of cDNA.
Primers were run using the same protocol as our normal qPCR and standard curves were created for each primer set with each cDNA quantity run in duplicate.
Analysis included examining the Ct values for the different quantities of cDNA and how similar the amplification curves were between duplicates. Figures 6, 7, and 8 show the amplification curves for GAPDH, WWOX, and SOX5, respectively. The amplification curves show that the duplicate samples are very consistent and that, as the quantity of cDNA decreases, the Ct value increases (Figs. 6, 7, 8). The GAPDH standard curve shows that duplicate samples are very close together and that there was amplification for all five quantities of cDNA. The Ct values for GAPDH range from 15 at 1500 ng of cDNA to 28 at 0.15 ng of cDNA (Fig. 6). The WWOX curve also showed amplification for all five quantities of cDNA; however, the number of cycles required for amplification increased in comparison to GAPDH. The Ct values for WWOX range from 22 at 1500 ng of cDNA to 35 at 0.15 ng of cDNA (Fig. 7). For SOX5 there was no amplification for the 0.15 ng cDNA, and the 1.5 ng sample amplified late, at the 35th cycle, the cycle limit for analysis. The Ct values for SOX5 range from 25 at 1500 ng of cDNA to 35 at 1.5 ng of cDNA (Fig. 8).

Amplification curves display how similar the duplicate samples are by how close together the same-color curves are. As the amount of input cDNA decreases, the number of cycles increases.
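A standard curve of this kind is commonly summarized by regressing Ct on log10 of the input amount and converting the slope to an amplification efficiency, E = 10^(-1/slope) - 1. The Python sketch below illustrates that calculation. It is an assumption-laden illustration rather than the analysis performed here: only the endpoint Ct values quoted in the text are used, because the intermediate points are not reported, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def primer_efficiency(cdna_ng, ct_values):
    """Return (slope, efficiency) for a Ct versus log10(input) standard curve."""
    log_input = [math.log10(q) for q in cdna_ng]
    slope, _ = fit_line(log_input, ct_values)
    # Efficiency of 1.0 means the product doubles every cycle.
    return slope, 10 ** (-1.0 / slope) - 1.0


# Endpoint values quoted in the text; a full analysis would use all
# five dilutions and both duplicates.
curves = {
    "GAPDH": ([1500, 0.15], [15, 28]),
    "WWOX": ([1500, 0.15], [22, 35]),
    "SOX5": ([1500, 1.5], [25, 35]),
}
for gene, (quantities, cts) in curves.items():
    slope, eff = primer_efficiency(quantities, cts)
    print(f"{gene}: slope = {slope:.2f}, efficiency = {eff * 100:.0f}%")
```

A slope near -3.32 corresponds to 100% efficiency. A two-point estimate like this cannot reveal non-linearity at the low-input end of the curve, which is where late or absent amplification matters most.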

Optimization of primer annealing temperatures using gradient PCR analysis
To continue our strategy of qPCR analysis optimization, we next performed gradient PCR analysis to determine the optimal annealing temperatures for our qPCR primers.
Gradient PCR analysis also allowed us to visualize our PCR products to determine whether our primers were amplifying a band of the correct size and whether any nonspecific products were being generated. A primer gradient was performed using different quantities of cDNA as well as a gradient of temperatures (54.2, 55.2, 57.8, 59.5, 62.7, 64.1 °C). The PCR products were run on a 2% agarose gel for 50 minutes at 100 V. Bands did not appear for SOX5 or WWOX at 50 ng, 100 ng, or 200 ng. Faint bands were seen at 500 ng but not at 1 µg of cDNA at the tested temperatures, suggesting either that these primers were not optimally amplifying their target sequences or that these genes were being expressed at very low levels (Fig. 9).
The GAPDH primers, our positive control and housekeeping gene, showed amplification at each of the cDNA quantities as well as at all of the tested temperatures. We also saw amplification for the FANCD2 primers at 100 ng, 200 ng, 500 ng, and 1 µg of cDNA at temperatures of 57.8, 59.5, 62.7, and 64.1 °C.

Figure 9. Primer gradients using different quantities of cDNA and annealing temperatures. Different quantities of cDNA (50 ng, 100 ng, 200 ng, 500 ng, 1 µg), as well as increasing annealing temperatures (54.2, 55.2, 57.8, 59.5, 62.7, 64.1 °C), were run using a gradient PCR to determine the optimal annealing temperature for each primer set. Products were run on a 2% agarose gel for 50 minutes with a 100 bp ladder and imaged using a Bio-Rad imager. The optimal temperature and cDNA quantity were determined by the brightness of the band.

Analysis of the protein expression of SOX5 following aphidicolin exposure
To study whether APH-induced replication stress affected the protein expression of SOX5, we analyzed by immunoblotting whole-cell lysates from cells treated with APH for 24 hours, 48 hours, or 24 hours followed by a 24-hour release (Fig. 11a). APH is known to induce the monoubiquitination of FANCD2, and FANCD2 monoubiquitination was confirmed in this experiment by the upper band in the FANCD2 western blot in the 24-hour, 48-hour, and release samples (Fig. 11b). L/S ratios, the ratio between monoubiquitinated FANCD2 and non-ubiquitinated FANCD2, were calculated. Monoubiquitination increased after 48 hours of APH compared to the 24-hour APH sample and decreased in the release sample (Fig. 11b). We saw an increase in SOX5 protein expression following APH treatment in the immunoblots, as can be seen by comparing the 24-hour, 48-hour, and release samples to the no treatment sample (Fig. 11b). Tubulin was used as a loading control. Protein band intensities were measured using ImageJ and averaged across replicates (Fig. 11c). The fold change graph shows an increase in band intensity, further supporting an increase in SOX5 protein expression following APH treatment (Fig. 11c). The standard deviations shown on the fold change graph indicate that replicates had similar expression levels.

Figure 11. Effect of aphidicolin on SOX5 protein expression. a. Cartoon of experimental design. U2OS 3xFLAG cells were incubated in the absence (No Treatment; NT) or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release. b. The abundance of SOX5 and FANCD2 proteins present in whole-cell lysates was determined by immunoblot using anti-SOX5 and anti-FANCD2 antibodies. Tubulin, detected with an anti-Tubulin antibody, serves as a loading control. c. Protein band intensities were analyzed using ImageJ, and statistical significance was determined using Student's two-tailed paired t-test and single-factor ANOVA. *P < 0.01. Bonferroni correction has been applied.
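The densitometry steps described for this figure, an L/S ratio for FANCD2 and a fold change in band intensity relative to the no treatment lane, can be sketched in Python as follows. This is an illustration under assumptions rather than the exact procedure used: it assumes each target band is first normalized to its Tubulin loading control, the function names are hypothetical, and the intensity values are invented for the example.

```python
def ls_ratio(long_band: float, short_band: float) -> float:
    """Ratio of monoubiquitinated (L) to non-ubiquitinated (S) FANCD2 intensity."""
    if short_band <= 0:
        raise ValueError("short_band intensity must be positive")
    return long_band / short_band


def fold_change(target: dict, loading: dict, reference: str = "NT") -> dict:
    """Normalize each lane to its loading control, then to the reference lane."""
    normalized = {lane: target[lane] / loading[lane] for lane in target}
    ref = normalized[reference]
    return {lane: value / ref for lane, value in normalized.items()}


# Hypothetical ImageJ band intensities (arbitrary units), for illustration only.
sox5 = {"NT": 1000.0, "24h": 1500.0, "48h": 1800.0, "Release": 1600.0}
tubulin = {"NT": 2000.0, "24h": 2000.0, "48h": 2400.0, "Release": 2000.0}

for lane, fc in fold_change(sox5, tubulin).items():
    print(f"{lane}: {fc:.2f}-fold relative to NT")
```

Normalizing to the loading control before computing the fold change keeps a lane with more total protein loaded (the hypothetical 48h lane here) from being read as an increase in expression.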

Analysis of protein expression of WWOX following aphidicolin exposure
We also analyzed the protein expression of WWOX following APH treatment and observed no difference between the no treatment condition and the aphidicolin-treated samples (Fig. 12b). The band intensities of the 24-hour, 48-hour, and release samples look the same as that of the no treatment sample (Fig. 12b). Band intensity was again quantified using ImageJ. This quantification shows a minor increase in protein expression in the treated samples (Fig. 12c).

Figure 12. Effect of aphidicolin on WWOX protein expression. a. Cartoon of experimental design. U2OS 3xFLAG cells were plated at 1 million cells/mL and incubated in the absence (No Treatment; NT) or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release. b. The abundance of WWOX and FANCD2 proteins present in whole-cell lysates was determined by immunoblot using anti-WWOX and anti-FANCD2 antibodies. Tubulin, detected with an anti-Tubulin antibody, serves as a loading control. c. Protein band intensities were analyzed using ImageJ, and statistical significance was determined using single-factor ANOVA. *P < 0.01. Bonferroni correction has been applied.

Analysis of the impact/effect of loss of FANCD2 on SOX5 protein expression
We then wanted to determine what consequences the loss of FANCD2 would have on SOX5 protein expression following aphidicolin treatment. We treated U2OS 3xFLAG cells with an siFANCD2 or an siControl for 24 hours, followed by 0.4 µM APH treatment (Fig. 13a). In comparison to the siControl samples, there are no FANCD2 bands in the lanes treated with the siFANCD2 (Fig. 13b). The siControl lanes reveal an increase in SOX5 protein band intensity in the 48-hour and release samples in comparison to the no treatment sample (Fig. 13b). In samples treated with siFANCD2, there is a minor decrease in protein levels in the 24-hour and release sample lanes in comparison to the siFANCD2 no treatment lane (Fig. 13b). There does not appear to be a change in expression levels when comparing the siFANCD2 48-hour sample to the siFANCD2 no treatment sample (Fig. 13b). Changes in expression levels were confirmed by using ImageJ to quantify the intensity of the protein bands, which showed the decrease in the siFANCD2 samples and the increase in the siControl samples (Fig. 13c). The standard deviations shown on the fold expression graph indicate that replicates had similar expression levels for each of the samples tested. Bonferroni correction was applied by dividing 0.05 by the number of replicates tested (3).

Figure 13. Effects on SOX5 after siRNA knockdown of FANCD2. a. Cartoon of siRNA experimental design. U2OS 3xFLAG cells were plated at 175,000 cells/mL and treated with either an siControl or siFANCD2 for 24 hours. On Day 4, U2OS cells were incubated in the absence (No Treatment; NT) or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release. b. The abundance of SOX5 and FANCD2 proteins present in whole-cell lysates was determined by immunoblot using anti-SOX5 and anti-FANCD2 antibodies. Tubulin, detected with an anti-Tubulin antibody, serves as a loading control. c. Protein band intensities were analyzed using ImageJ, with siControl shown in blue and siFANCD2 shown in orange. Statistical significance was determined using two-factor ANOVA. *P < 0.01. Bonferroni correction has been applied.

Analysis of the impact/effect of loss of FANCD2 on WWOX protein expression
The same siRNA treatment, using the same siFANCD2 and siControl, was also performed to test WWOX, and protein expression in whole-cell extracts was examined by immunoblotting (Fig. 14a). In contrast to SOX5, the WWOX western blots do not show an increase or decrease in protein expression in the siControl lanes or the siFANCD2 lanes, except for the release sample in the siFANCD2 lanes (Fig. 14b). There does seem to be an increase in expression level when comparing the siFANCD2 release sample to the siFANCD2 no treatment sample.
ImageJ analysis of band intensities displays a slight reduction in the expression of WWOX protein in the absence of FANCD2 as well as in the siControl samples, but it is very minor (Fig. 14c). In both the siControl and siFANCD2 samples, expression increases initially from the no treatment to the 24-hour APH treatment and then decreases back to the same level as the no treatment sample. Standard deviations also show that replicates were similar (Fig. 14c). Bonferroni correction was applied by dividing 0.05 by the number of replicates tested (3).

Figure 14. a. Cartoon of siRNA experimental design. U2OS 3xFLAG cells were plated at 175,000 cells/mL and treated with either an siControl or siFANCD2 for 24 hours. On Day 4, U2OS cells were incubated in the absence (No Treatment; NT) or presence of 0.4 µM APH for 24 hours, 48 hours, or 24 hours plus 24 hours of release. b. The abundance of WWOX and FANCD2 proteins present in whole-cell lysates was determined by immunoblot using anti-WWOX and anti-FANCD2 antibodies. Tubulin, detected with an anti-Tubulin antibody, serves as a loading control. c. Protein band intensities were analyzed using ImageJ, with siControl shown in blue and siFANCD2 shown in orange. Statistical significance was determined using two-factor ANOVA. *P < 0.01. Bonferroni correction has been applied.

Analysis of the impact/effect of loss of FANCD2 on SOX5 gene expression
After analyzing SOX5 protein expression following siFANCD2 and APH treatment, we wanted to determine whether SOX5 transcript levels also changed. Samples were run in triplicate, and three biological replicates were tested and averaged, with GAPDH used as the housekeeping control gene. In the absence of FANCD2, SOX5 transcript levels in the no treatment sample are higher than in the siControl no treatment sample. After 24 hours of APH treatment, transcript levels increase in both the siControl and siFANCD2 samples. In the 48-hour samples, the siControl transcript level decreases relative to the 24-hour sample and returns to the level of the no treatment sample. The siFANCD2 transcript level at 48 hours also decreases relative to the 24-hour sample and again returns to the level of the no treatment sample (Fig. 15). Both siControl and siFANCD2 displayed an increase in transcript levels in the 24-hour release samples compared to the no treatment samples (Fig. 15). A significant difference was seen between the siControl no treatment and siFANCD2 no treatment samples (Fig. 15).
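The text does not state how crossing points were converted into relative transcript levels. One common approach for GAPDH-normalized qPCR is the 2^-ΔΔCt method, sketched below under that assumption. The function name, the sample labels, and all crossing-point values are hypothetical and are not data from this study; the method also assumes roughly 100% amplification efficiency for both primer sets.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^-ddCt method (assumed method).

    Each argument is a list of crossing-point (Ct) values from technical
    replicates: target and reference gene in the sample of interest, then
    target and reference gene in the calibrator sample.
    """
    delta_sample = mean(ct_target) - mean(ct_reference)              # dCt, sample
    delta_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)  # dCt, calibrator
    delta_delta = delta_sample - delta_calibrator                    # ddCt
    return 2 ** (-delta_delta)


# Hypothetical crossing points, for illustration only.
sox5_sifancd2 = [29.0, 29.1, 28.9]
gapdh_sifancd2 = [18.0, 18.1, 17.9]
sox5_sicontrol = [30.0, 30.1, 29.9]
gapdh_sicontrol = [18.0, 17.9, 18.1]

fold = relative_expression(sox5_sifancd2, gapdh_sifancd2,
                           sox5_sicontrol, gapdh_sicontrol)
print(f"SOX5 fold change (siFANCD2 vs siControl): {fold:.2f}")
```

With these illustrative numbers the target crosses one cycle earlier in the siFANCD2 sample after GAPDH normalization, which corresponds to an approximately two-fold higher transcript level.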
Bonferroni correction was applied by dividing 0.05 by the number of replicates tested (12).
Cells were also treated with either an siControl (blue bars) or an siFANCD2 (orange bars). Error bars represent the standard errors of the mean calculated from twenty-seven technical replicate measurements from three independent biological replicate experiments. Statistical significance was determined using Student's two-tailed paired t-test. *, P < 0.004. Bonferroni correction has been applied.
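The corrected significance threshold described above can be reproduced with a short calculation. The sketch below follows the text in dividing 0.05 by 12; the function names are illustrative and not part of the original analysis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=12):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 12)
print(f"Corrected threshold: {threshold:.4f}")  # prints 0.0042
```

The value 0.05 / 12 ≈ 0.0042 is consistent with the P < 0.004 cutoff reported in the figure legend.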

Analysis of the impact/effect of loss of FANCD2 on WWOX gene expression
After analyzing WWOX protein expression following siFANCD2 and APH treatment, we again wanted to determine whether WWOX transcript levels also changed. Samples were again run in triplicate, and three biological replicates were tested and averaged, with GAPDH used as the housekeeping control gene. In the absence of FANCD2, WWOX transcript levels in the no treatment sample are higher than in the siControl no treatment sample. After 24 hours of APH treatment, transcript levels increase in the siControl sample, whereas the siFANCD2 sample stays level with its no treatment sample. In the 48-hour samples, transcript levels decrease in both the siControl and siFANCD2 samples compared to the 24-hour and no treatment samples (Fig. 16). Both siControl and siFANCD2 displayed an increase in transcript levels in the 24-hour release samples compared to the untreated samples (Fig. 16). A significant difference was seen between the siControl no treatment and the siFANCD2 no treatment samples. Bonferroni correction was applied by dividing 0.05 by the number of replicates tested (12).
Cells were also treated with either an siControl (blue bars) or an siFANCD2 (orange bars). Error bars represent the standard errors of the mean calculated from twenty-seven technical replicate measurements from three independent biological replicate experiments. Statistical significance was determined using Student's two-tailed paired t-test. *, P < 0.004. Bonferroni correction has been applied.

RNA sequencing analysis of ACHT hTERT FANCG (FANCD2 -/-) and ACHT hTERT FANCD2 reveals no expression of SOX5 and low levels of WWOX
As previously stated, FANCD2 was found to bind to and traverse large actively transcribed genes in U2OS 3xFLAG cells under conditions of replication stress.
However, we wanted to see the effects of treatment with aphidicolin on telomerase immortalized cells lacking FANCD2 as well as telomerase immortalized cells with counts which are the number of times a read maps to a specific gene. SOX5 was not found to be expressed in the dataset, and WWOX had similar read counts in cells with and without FANCD2, with no significant difference between them (Fig. 17).
Independent of SOX5 and WWOX, the RNA-seq dataset allowed us to analyze the expression levels of all FANCD2 targets identified by ChIP-seq. In the ChIP-seq dataset generated by Okamoto et al., FANCD2 was found to bind to approximately 120 genes under conditions of replication stress. Therefore, we asked whether the expression of any of these genes differed in the absence or presence of FANCD2 and in the absence or presence of replication stress. Using the differential analysis datasets provided by Genewiz, we found that 79 of the 120 ChIP-seq genes were expressed in the RNA-seq data for the no treatment samples, and 66 of the 120 were expressed in the 24-hour APH samples. Of the 79 overlapping genes in the no treatment samples, 26 were differentially expressed, with 7 downregulated and 19 upregulated (Table 7). In these data, upregulated refers to higher expression in the ACHT hTERT FANCD2 cells and downregulated refers to higher expression in the ACHT hTERT FANCG cells. In the 24-hour APH treated samples, 22 of the 66 overlapping genes were differentially expressed, with 7 downregulated and 15 upregulated (Table 8). Interestingly, 7 of the 26 genes found in the no treatment samples and 6 of the 22 genes found in the 24-hour APH treated samples are related to autism, neurodevelopmental, or psychiatric disorders.
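The overlap and classification steps described above can be sketched as follows. This is an illustrative outline, not the Genewiz pipeline: the `DEResult` record, the function name, and the placeholder gene names are hypothetical. The default cutoffs are assumptions taken from the table legend (log2 fold change beyond ±0.5 and P < 0.05) and are left as parameters. Positive log2 fold change is taken to mean higher expression in the ACHT hTERT FANCD2 cells, following the convention stated in the text.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float  # positive = higher in +FANCD2 cells (assumed)
    p_value: float


def classify_targets(chip_genes, de_results, lfc_cutoff=0.5, p_cutoff=0.05):
    """Intersect ChIP-seq targets with RNA-seq results and split by direction."""
    chip = set(chip_genes)
    # ChIP-seq targets that appear in the RNA-seq table at all.
    expressed = [r for r in de_results if r.gene in chip]
    up = sorted(r.gene for r in expressed
                if r.p_value < p_cutoff and r.log2_fold_change > lfc_cutoff)
    down = sorted(r.gene for r in expressed
                  if r.p_value < p_cutoff and r.log2_fold_change < -lfc_cutoff)
    return {"expressed": sorted(r.gene for r in expressed), "up": up, "down": down}


# Placeholder genes and values, for illustration only.
chip_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
de_results = [
    DEResult("GENE_A", 1.2, 0.001),   # ChIP target, upregulated
    DEResult("GENE_B", -0.8, 0.01),   # ChIP target, downregulated
    DEResult("GENE_C", 0.9, 0.20),    # ChIP target, not significant
    DEResult("GENE_E", 2.0, 0.001),   # not a ChIP target, ignored
]

result = classify_targets(chip_genes, de_results)
print(result)
```

In this toy input, three of the four ChIP-seq targets are present in the RNA-seq table, one passes the cutoffs as upregulated, and one passes as downregulated, mirroring the 120 → 79 → 26 funnel reported for the no treatment samples.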
The RNA-seq data also allowed us to look for differences in gene ontology. Genewiz provided a gene ontology analysis that categorized the differentially expressed genes by gene ontology term and reported the enrichment of each term. Nervous system development was the 5th most significantly enriched gene set in the 24-hour APH treated samples (Fig. 21).
RNA-seq analysis reveals no significant differential expression in WWOX. Differential expression analysis displays the number of reads found per gene and whether there is a significant difference between the ACHT hTERT +FANCD2 cells and the ACHT hTERT +FANCG (FANCD2 -/-) cells. Cells were incubated in the absence (no treatment) or presence of 0.4 µM APH. Normalized reads were averaged between the 4 replicates for both cell lines and the standard deviation was also calculated. In both conditions, SOX5 did not exhibit differential expression.
Okamoto et al., 2018 ChIP-seq data, we observed that 17 genes from our RNA-seq analysis using ACHT hTERT +FANCD2 and ACHT hTERT +FANCG were found to have log2 fold changes either < -0.5 or > 0.5 and a P-value < 0.05. Genes with (*) are related to autism, neurodevelopmental, or psychiatric disorders.

DISCUSSION
The protection and tight regulation of our genome depends on the coordination of several DNA repair pathways, including the FA pathway, to ensure the timely repair of DNA damage. FANCD2, one of the key proteins in the FA pathway, has recently been seen to bind to and traverse large actively transcribed genes under conditions of replication stress (Okamoto et al., 2018). We sought to determine why FANCD2 binds to these genes and whether, in doing so, it has a role beyond the FA pathway as a potential regulator of transcription.
We used a candidate gene approach targeting SOX5 and WWOX, which are among the top 10 genes bound by FANCD2 in the Okamoto ChIP-seq dataset (Okamoto et al., 2018). SOX5 was chosen because it is a transcription factor that has been seen to be involved in breast cancer regulation through the transactivation of EZH2 (Sun et al., 2019). It is also involved in the regulation of embryonic development as well as the determination of cell fate (Pei et al., 2014). Choosing a gene involved in breast cancer regulation is relevant because the FA pathway primarily monitors and repairs sites of DNA damage that could otherwise lead to the development of bone marrow failure, leukemia, or premature cancers (Alter et al., 2018). WWOX was also chosen because it is a member of the short-chain dehydrogenase family and spans the common fragile site FRA16D (Driouch et al., 2002). It has also been seen that WWOX may play a role in the DNA damage response as a modulator of the DNA damage checkpoint kinase ATM. Common fragile sites are preferential targets of genomic instability, such as chromosomal breaks, in response to replicative stress (Abu-Odeh et al., 2014).
In this study, we observed an increase in the protein levels of SOX5 under conditions of replication stress, and this increase is significant in the 24-hour APH treatment compared to the no treatment sample. We also tested whether there was a difference between a 24-hour no treatment sample and a 48-hour no treatment sample for both SOX5 and WWOX and did not see differences in the protein levels (Fig. 18). However, using qPCR to analyze the transcript levels of SOX5, we did not see a significant difference when comparing the APH treated samples to the no treatment control sample. There was also little change in WWOX at both the protein and transcript levels while FANCD2 was present under the same conditions of replication stress using APH.
We also found that a large quantity of cDNA (500 ng/µL) was needed for qPCR to yield crossing points within the accepted analysis range. A possible reason for the limited changes in protein levels and the overall low abundance of transcripts is that these two genes are expressed at low levels and their proteins are very stable in U2OS 3xFLAG cells, so the mRNA is not always needed and remains in low abundance. It should also be noted that, because these genes are so large, they could be transcribed even when they are not necessarily needed, as seen in the controls. As these genes are prone to copy number variations (CNVs) and WWOX overlaps with FRA16D, their uncontrolled transcription could lead to an increase in genome instability and greater susceptibility to cancer.
Because the differences seen in our results are minor, and some are not significant, this raised the question of whether FANCD2 regulates transcription and, if not, what it is doing at these two gene loci. It could be that FANCD2 regulates replication instead of transcription and helps to stabilize the DNA at sites that are more prone to DNA damage. It has been previously seen that in the absence of FANCD2, replication forks stall within the AT-rich fragility core of the FRA16D CFS, and this can lead to dormant origin activation (Madireddy et al., 2016). It has also been previously seen that under conditions of replication stress, FANCD2 is able to promote DNA replication (Lossaint et al., 2013).
It has also been seen that FANCD2 may play a role in the regulation of R-loops. R-loops, which are DNA:RNA hybrids, are hazards to the genome and can increase the probability of disruption of chromatin organization. The displaced ssDNA in R-loops also poses a threat to the stability of the genome, as ssDNA is more prone to attracting nucleases (Sollier & Cimprich, 2015). However, R-loops can also be nonhazardous in the genome, as they are required for immunoglobulin class switching.
To study the role of FANCD2 in transcriptional regulation at a genome-wide level, we also performed RNA-seq analysis using ACHT hTERT FANCD2 and ACHT hTERT FANCG (FANCD2-/-) cells. We chose to use these cells because they are FA patient-derived skin fibroblasts, unlike U2OS cells, which are derived from an osteosarcoma. ACHT cells therefore more closely resemble those of an FA patient.
Using RNA sequencing, we were able to analyze not only SOX5 and WWOX but also the other top gene hits among the 120 reported by Okamoto et al. (2018). We observed that SOX5 was not expressed and that WWOX was expressed at a low level, with no significant difference in the presence or absence of FANCD2 (Fig. 17).
Upon further analysis of our RNA sequencing data, we found that of the 120 genes in the Okamoto et al., 2018 dataset, 79 were expressed in the RNA-seq data in the no treatment samples and 66 in the 24-hour APH treated samples. Of the 79 overlapping genes, 26 were found to have log2 fold changes less than -0.05 or greater than 0.05 in the no treatment samples, as were 22 genes in the 24-hour APH treated samples (Tables 7 & 8). Of the 26 genes, 7 were downregulated, meaning that expression is higher in the ACHT hTERT FANCG cells, and 19 were upregulated, meaning that expression is higher in the ACHT hTERT FANCD2 cells (Table 9). In the APH treated samples, 7 genes were downregulated and 15 were upregulated (Table 10). Interestingly, 7 of the 26 genes in the no treatment samples and 6 of the 22 genes in the APH treated samples are related to autism, neurodevelopmental, or psychiatric disorders. These findings support the notion that FANCD2 could be playing a role in the transcriptional regulation of some genes.
Furthermore, a portion of the significantly differentially expressed genes are related to nervous system development or nervous system conditions, suggesting that FANCD2 could also contribute to the regulation of neuronal genes.
We also entered the genes with significant differences in expression into the STRING database to determine their relationship to one another for both the no treatment samples and the APH treated samples (Figs. 19 & 20). A line between two nodes indicates an association between the proteins; the absence of a line indicates that no association was found. The color of the line represents the type of association: light blue is a known interaction from a curated database, pink is experimentally determined, green is gene neighborhood, red is gene fusion, blue is gene co-occurrence, yellow is textmining, black is co-expression, and purple is protein homology. Many of the proteins displaying an interaction are associated with neuronal development, such as the link between CTNND2, PTPRD, ASTN2, and NLGN1 in the no treatment samples (Fig. 19). All four of these proteins play a role in neuronal development. In the 24-hour APH samples, there is also a link between LSAMP, NEGR1, and MDGA1, which are likewise related to neuronal functions (Fig. 20).
This analysis, combining the STRING database with our RNA sequencing data, reveals that proteins whose genes are differentially expressed between the ACHT hTERT +FANCD2 and ACHT hTERT +FANCG samples are directly linked to neurological development functions. The STRING database integrates all known and predicted associations between proteins, including both physical and functional interactions. This is interesting because it further raises the possibility that FANCD2 has a role distinct from ICL repair in the FA pathway and that this role could involve regulating neuronal development.
We also observed that, among the top differentially expressed genes clustered by gene ontology, nervous system development was the 5th most differentially expressed category after 24 hours of APH treatment (Fig. 21). Gene ontology analysis aims to identify the biological processes, cellular locations, and molecular functions that are affected in the condition studied. It reduces the complexity of genome-wide expression data and highlights the biological processes involved. Identifying nervous system development as one of the affected systems in a comparison of two cell lines, one of which lacks FANCD2, further supports the hypothesis that FANCD2 assists with neuronal development and has a purpose apart from the FA pathway.

Future Experiments
We have made significant progress in determining the consequences of replication stress for WWOX and SOX5 in the presence and absence of FANCD2 and whether FANCD2 could be involved in the transcriptional regulation of these two genes. However, these were only two of the 120 genes that FANCD2 was found to bind to in the ChIP-seq data (Okamoto et al., 2018). Here, we propose several experiments to further address why FANCD2 binds to and traverses large actively transcribed genes under replication stress conditions.
We would further like to determine whether FANCD2 is also involved in the regulation of R-loops, as these, as previously stated, also pose a threat to genomic stability. The S9.6 antibody can be used to detect the presence of R-loops, and it has been previously seen that members of the FA pathway (BRCA2/FANCD1) accumulate at R-loops (García-Rubio et al., 2015). Because the FA pathway is involved in the repair of interstrand crosslinks, which block replication fork progression, and because R-loops can also block replication fork progression, R-loops may contribute to genomic instability in FA cells (Schwab et al., 2015). Schwab et al. (2015) also found that in cells with downregulated FANCD2, there was a significant increase in asymmetric sister forks compared to control cells, which suggests a deviation from the normal replication program. This also suggests that FANCD2 may play a role in the regulation of replication. A previous study also found that cells deficient in FANCD2 and FANCA had an increased number of R-loops, suggesting that these two proteins play a role in R-loop regulation (García-Rubio et al., 2015). It has also been previously found that the FANCD2:FANCI heterodimer (ID2) preferentially binds ssRNA but not RNA:DNA hybrids, which suggests that ID2 binds to the ssRNA that is displaced in R-loops (Liang et al., 2019).
We would also like to further investigate other genes, separate from the Okamoto dataset, that are found in our RNA sequencing data and have significantly different normalized read counts. For the no treatment conditions, we have examined the data and created a list of potential gene targets from different cell pathways and families, such as the retinaldehyde dehydrogenases. We will perform western blots and qPCR on a subset of the genes with significant differences in normalized read counts to confirm the differences seen in the RNA sequencing data. We are also planning to perform ChIP-seq on the ACHT hTERT +FANCD2 and ACHT hTERT +FANCG cells under APH conditions for comparison with the U2OS 3xFLAG ChIP-seq data. This will be important for establishing how different the two cell lines are and for determining whether the genes that FANCD2 was found to bind to in the U2OS 3xFLAG cells are similar or different in ACHT cells.
We would also like to use a neurological stem cell line and, using our siFANCD2, determine the consequences of the absence of FANCD2 for a number of genes found in the RNA sequencing data. In our RNA sequencing data, the development of neurological systems ranked fifth among differentially expressed gene ontology categories, and of the 22 genes that overlap between the ChIP sequencing data and our RNA sequencing data under APH conditions, 6 were found to be related to autism, neurodevelopmental, or psychiatric disorders (Table 8). Using the neurological stem cell line would help us understand whether FANCD2 is an integral part of maintaining neurological functions.
We would use qPCR as well as western blotting to detect whether there are differences in the presence or absence of FANCD2 as well as in the presence or absence of APH.
Abundance of SOX5 and WWOX proteins present in whole-cell lysates was determined by immunoblot using anti-SOX5 and anti-WWOX antibodies. Tubulin, detected with an anti-Tubulin antibody, serves as a loading control.
Figure 19. RNA sequencing reveals potential link between FANCD2 and neurological development in no treatment samples. Cells were harvested, flash frozen using liquid nitrogen, and sent to Genewiz for RNA sequencing analysis with 150 bp paired-end sequencing at 60 million reads per sample. Using the differential analysis datasets for the no treatment samples provided by Genewiz, it was found that 26 of the 120 ChIP-seq genes to which FANCD2 was found to bind (Okamoto et al., 2018) overlap with the RNA-seq data for the no treatment samples. Of the 26 genes, 7 are related to autism, neurodevelopmental, or psychiatric disorders. The 26 genes were entered into the STRING database to see if they connect to one another. The color of the line represents the type of association: light blue is a known interaction from a curated database, pink is experimentally determined, green is gene neighborhood, red is gene fusion, blue is gene co-occurrence, yellow is textmining, black is co-expression, and purple is protein homology.
Figure 20. RNA sequencing reveals potential link between FANCD2 and neurological development in aphidicolin treated samples. Cells were harvested, flash frozen using liquid nitrogen, and sent to Genewiz for RNA sequencing analysis with 150 bp paired-end sequencing at 60 million reads per sample. Using the differential analysis datasets for the 24-hour APH samples provided by Genewiz, it was found that 22 of the 120 ChIP-seq genes to which FANCD2 was found to bind (Okamoto et al., 2018) overlap with the RNA-seq data for the 24-hour APH samples. Of the 22 genes, 6 are related to autism, neurodevelopmental, or psychiatric disorders. The 22 genes were entered into the STRING database to observe their relationship to one another. The color of the line represents the type of association: light blue is a known interaction from a curated database, pink is experimentally determined, green is gene neighborhood, red is gene fusion, blue is gene co-occurrence, yellow is textmining, black is co-expression, and purple is protein homology.
Cells were harvested, flash frozen using liquid nitrogen, and sent to Genewiz for RNA sequencing analysis with 150 bp paired-end sequencing at 60 million reads per sample. The Genewiz analysis provided a table of differentially expressed genes by gene ontology, in which nervous system development was the 5th most differentially expressed category in the 24-hour APH samples.