Sequence Effects on the Repair and Replication of Arylamine-DNA Adducts

Cancer is a collection of diseases defined by uncontrolled cell growth and the spread of abnormal cells. It is the second leading cause of death in the United States. Exposure to chemicals, both endogenous and exogenous, is one of the major causes of cancer initiation. The role of chemicals in carcinogenesis was recognized as early as the eighteenth century, when the surgeon Percivall Pott observed a large number of scrotal cancer cases in chimney sweeps and attributed them to soot exposure. Research in chemical carcinogenesis was renewed when Yamagiwa and Ichikawa reported the development of malignant tumors on rabbit ears after repeated application of coal tar. The polycyclic aromatic hydrocarbon benzo[a]pyrene present in coal tar was later found to be responsible for its tumorigenicity. Since then, researchers have made significant advances in the field of chemical carcinogenesis. The majority of chemical carcinogens are genotoxic and exert their effects primarily by reacting with DNA. These chemicals are either directly reactive or undergo enzymatic activation to electrophilic species that form covalent adducts with electron-rich sites in DNA. DNA-adduct formation is therefore considered a hallmark of chemical carcinogenesis. DNA adducts form continually in cells because genomic DNA is under constant attack by various chemicals. Cells have evolved repair mechanisms, such as nucleotide excision repair (NER) and base excision repair (BER), to protect DNA from the effects of these adducts. Despite the effectiveness of these defenses, some adducts evade repair. Unrepaired adducts can persist into DNA replication, where they can induce various kinds of mutations. Mutations in genes that control cell growth may trigger cancer initiation. 
For example, mutations of the tumor suppressor gene p53 are frequently observed in sporadic human cancers. Arylamines are an important group of chemical carcinogens implicated in the etiology of human cancers. 2-Naphthylamine, benzidine, 4-aminobiphenyl, and 2-acetylaminofluorene are well-known arylamine carcinogens. 2-Acetylaminofluorene was originally developed as a pesticide but was banned because of its liver carcinogenicity in animals. It nevertheless continues to be used by researchers as a model carcinogen for studying chemical carcinogenesis. Arylamines are not reactive per se but undergo metabolic activation in the body to highly reactive nitrenium ions, which bind to DNA specifically at the C-8 position of guanine to yield two major C8-substituted dG adducts: N-(2'-deoxyguanosin-8-yl)-2-aminofluorene (dG-C8-AF) and N-(2'-deoxyguanosin-8-yl)-2-acetylaminofluorene (dG-C8-AAF). Similarly, the bladder carcinogen 4-aminobiphenyl forms N-(2'-deoxyguanosin-8-yl)-4-aminobiphenyl (dG-C8-ABP). NMR studies of ABP-, AF-, and AAF-adducted duplexes showed that these adducts exist as a mixture of two prototype conformers: the anti-glycosidic B-conformer, in which the aromatic carcinogen moiety lies in the major groove of the double helix without disrupting Watson-Crick base pairing, and the syn-glycosidic stacked conformer (S), in which the carcinogen is inserted into the duplex and the modified guanine is flipped out. In addition to these two conformers, AAF adopts a syn-glycosidic wedge conformer (W), with the fluorene moiety in the narrow minor groove of the duplex. The populations of these conformers depend on the nature of the lesion and the bases surrounding it. Arylamine-DNA adducts are good substrates for NER, the pathway responsible for removing bulky lesions. The repair efficiency of NER proteins is affected by various chemical, structural, and conformational factors. 
The nature of the bases surrounding the lesion site, generally referred to as the sequence effect, is an important factor. The NER of arylamine adducts exhibits dramatic sequence effects. One of the most striking examples was observed for repair of the AAF adduct by the E. coli UvrABC and human excinuclease NER systems, with the lesion placed separately at each of the three guanines of the NarI sequence (5'-...CG1G2CG3CC...-3'). The relative repair efficiencies of AAF at G1, G2, and G3 were 100:18:66 in E. coli and 38:100:68 with the human excinuclease. However, the basis for this sequence-dependent modulation of NER is not clear. We hypothesize that adduct-induced conformational heterogeneity is modulated by the bases around the lesion, and that the thermal/thermodynamic stability of lesion-containing DNA duplexes is an important determinant of their repair efficiencies. In Manuscript I (published in Nucleic Acids Research), the objective was to test this hypothesis on the sequence-dependent NER of AAF in the NarI hot spot sequence. To that end, we prepared three 16-mer oligonucleotide duplexes site-specifically modified by FAAF (a fluorinated derivative of AAF) at G1, G2, or G3 of the NarI sequence (5'-CTCTCG1G2CG3CCATCAC-3'). We used 19F NMR and circular dichroism (CD) to determine the conformational profiles of FAAF in the three duplexes, and UV melting and differential scanning calorimetry (DSC) for thermodynamic analyses. In addition, we carried out NER assays of these duplexes using the E. coli UvrABC system. Repair assays were also performed on duplexes modified with FAF (the fluorinated derivative of the N-deacetylated AF), whose conformational and thermodynamic properties were already known. 
Our 19F NMR/CD data showed that FAAF at G1 and G3 adopts mainly the syn-glycosidic S- and W-conformers, whereas the anti-glycosidic B-conformer is favored at G2. Thermodynamic data from UV melting and DSC indicate that the S- and W-conformers induce greater distortion and thermodynamic destabilization of the modified duplexes. Interestingly, repair of FAAF occurred in a conformation-specific manner: the highly S/W-conformeric adducts at G3 and G1 were incised more efficiently than the B-type adduct at G2 (G3 ~ G1 > G2). The N-deacetylated FAF adducts in the same NarI sequence were repaired 2- to 3-fold less efficiently than the bulky N-acetylated FAAF, and in a different order (G2 ~ G1 > G3), roughly the reverse of that for FAAF. We attribute this difference to the N-acetyl group, which we believe raises the conformational barrier of FAAF relative to FAF. Overall, the results provide conformational and thermodynamic insight into the sequence-dependent UvrABC incision of bulky aminofluorene DNA adducts. In Manuscript II (to be submitted to Chemical Research in Toxicology), we extended the mono-adduct NER work of Manuscript I to FAAF-derived di-adducts formed in the NarI sequence. The objective was to understand how clustered adducts are processed by NER proteins and to investigate any sequence effects on the repair efficiency of arylamine di-adducts. To do so, we prepared modified oligonucleotides containing FAAF simultaneously at two (G1G2, G1G3, or G2G3) of the three guanines (G1, G2, and G3) of the NarI sequence (5'-CTCTCG1G2CG3CCATCAC-3'). We conducted NER assays on the three di-adducts using the E. coli UvrABC proteins, and performed spectroscopic (19F NMR and CD) and melting experiments for conformational and thermodynamic analyses, respectively. The bulky FAAF di-adducts were repaired with twice the efficiency of the mono-adducts. 
Our structural data linked the enhanced reparability to the greater thermal and thermodynamic destabilization of the di-adduct duplexes. In addition, the three di-adduct duplexes exhibited dramatic sequence effects on repair efficiency. Although the 19F NMR data were not sufficient to establish their conformational profiles, the corresponding mono-adduct data (Manuscript I) suggest that repair occurs in a conformation-specific manner. The NarI-G2G3 and -G1G3 di-adducts, which are repaired more efficiently, both contain FAAF at G3, where FAAF is known to exist largely in the S/W-conformation. In contrast, the NarI-G1G2 di-adduct contains FAAF at the highly B-conformeric G2 position. Duplex distortion through destacking and destabilization appears to be important, as repair efficiency and the extent of destacking/destabilization followed the same order: G2G3 > G1G3 > G1G2. This study provides the first report on the structure-repair relationships of FAAF cluster adducts. Arylamine adducts are also capable of inducing mutations. For example, AF produces point mutations, whereas AAF produces a mixture of point and frameshift mutations. The sequence effects described above for the NarI sequence are not limited to repair but also extend to mutational outcome. AF and AAF adducts have previously been reported to yield a significantly higher frequency of -2 frameshift mutations only when bound to the third guanine (G3) of the NarI sequence (5'-...G1G2CG3CN...-3'). In addition, the mutation rate was found to depend on the base at position N: cytosine (C) at N resulted in a higher mutation rate than thymine (T). Structural studies from our laboratory have shown that the FAF adduct adopts the stacked (S) conformer when N = C but displays conformational heterogeneity when C is replaced by T. 
In Manuscript III (published in Nucleic Acids Research), our focus was to determine whether this 3'-next flanking base (N) effect on conformation is limited to the AF-modified NarI sequence or also occurs in other G*CN duplexes modified by different arylamines (ABP, AF, AAF). We prepared two 11-mer duplexes (5'-CCATCGCNACC-3', N = T or A) with the same G*CN sequence context as the NarI sequence and modified them with


MANUSCRIPT-IV
Deficiencies in NER are closely associated with several genetic diseases, such as xeroderma pigmentosum, which increases the risk of skin cancer because of heightened sensitivity to sunlight (6).
The NER pathway in Escherichia coli involves the UvrABC nuclease system and has been studied extensively to understand DNA damage recognition and incision. E. coli NER is initiated when the damage is recognized by a dimeric UvrA protein. UvrB then joins at the damage site, forming a trimer, and verifies the damage. Dissociation of UvrA from the resulting complex allows recruitment of UvrC, which incises the damaged strand on both sides of the lesion, and UvrD, which removes the lesion-bearing oligonucleotide. Finally, DNA polymerase I synthesizes a new patch and DNA ligase seals it to complete the repair process (5,7).
Arylamines are an important class of environmental pollutants implicated in the etiology of human cancers, especially of the bladder and liver (1). 2-Acetylaminofluorene was originally developed as an agricultural insecticide but was later banned because of its strong tumorigenic activity in rat livers (8). It has been used extensively as a model compound for studying chemical carcinogenesis. In vivo, metabolic activation of AAF produces a highly electrophilic nitrenium ion, which reacts with DNA to produce two major C8-substituted dG adducts, AAF and AF (Figure 1a) (8,9). In vitro, the N-acetylated AAF adduct blocks high-fidelity polymerases and requires bypass polymerases for translesion synthesis (TLS), whereas AF only slows replication (10). In general, the bulky AAF adduct is more susceptible to NER than AF (11), which is known to exist in a sequence-dependent equilibrium between the anti B-conformer and the syn S-conformer (Figure 1c) (10,12-14). We recently reported that AAF adducts also adopt a sequence-dependent S/B/W-conformational equilibrium (Figure 1d) (15).
Local sequence context plays an important role in the repair of arylamine-DNA adducts (16). Fuchs et al. constructed DNA sequences modified with AAF at each of the three guanines of the most frequently studied mutational hotspot, the NarI sequence (5'-...CG1G2CG3CC...-3'), and tested their repair in the E. coli UvrABC and human excinuclease systems. In E. coli, the three AAF adducts were repaired in a sequence-dependent manner, with relative repair efficiencies for G1:G2:G3 of 100:18:66 (17,18). However, the human excinuclease repaired the same lesions with different relative efficiencies: 38:100:68 for G1, G2, and G3, respectively (18). AAF at G3 of the NarI sequence induces -2 frameshift (-2 deletion) mutations at a ~100-fold greater frequency, even though the three guanines exhibit similar chemical reactivities (19). We have shown that the FAF-modified NarI -2 deletion duplex in the 5'-...CG1G2CG3*CC...-3' context adopts a single looped-out bulge structure, whereas the 5'-...CG1G2CG3*CT...-3' context results in local conformational heterogeneity (20). These results support the importance of the nucleotide 3'-next to the lesion in modulating mutation efficiency. These studies showed that the conformational stability of a slipped mutagenic intermediate is a critical determinant of the hotness (up to 30- to 50-fold) of G3 in the NarI sequence for -2 frameshift mutation (20-22). Mekovich et al. found a greater incision rate in the E. coli system when AAF was located at G3 of the NarI sequence (5'-...CG1G2CG3*CC...-3') than in a non-NarI sequence (5'-...GATG*ATA...-3') (23). Zou et al. reported that UvrABC incision efficiency is 70% higher in the TG*T than in the CG*C sequence context for both AF and AAF lesions (24).
The NER pathway is characterized by its unique ability to excise a wide array of structurally diverse DNA lesions. For triggering an NER response, the structure of an individual adduct per se is less important than the local distortion and destabilization it induces. Examples include disruption of Watson-Crick hydrogen bonding, DNA bending, thermodynamic destabilization, local conformational flexibility, and flipped-out bases in the unmodified complementary strand (25,26). However, the mechanisms by which sequence context controls NER efficiency remain elusive. The local sequence effects described in the previous paragraph for AF and AAF may arise from differences in the extent of distortion, which in turn depends on the conformation adopted in a particular sequence context.
In the present work, we investigated the role of conformational heterogeneity in the structure-repair relationships of AF and AAF. These two adducts are structurally similar but differ by the absence (AF) or presence (AAF) of an N-acetyl group on the central nitrogen. We prepared oligonucleotides site-specifically modified by the fluorine models FAF and FAAF at each of the three guanines (G1, G2, and G3) of the NarI recognition sequence (Figure 1a,b). We conducted spectroscopic and melting experiments for conformational and thermodynamic analyses, and performed NER studies of these adducts using the E. coli UvrABC system. The results provide strong structural and thermodynamic evidence for the differential NER efficiencies of AF and AAF at different guanines of the NarI sequence.

Materials and Methods
Caution: 2-Aminofluorene derivatives are mutagens and suspected human carcinogens and must be handled with care.

Preparation and Characterization of FAF- and FAAF-Modified ODNs
We previously reported the preparation of 12-mer NarI ODNs (5'-CTCG1G2CG3CCATC-3') in which each of the three guanines was site-specifically modified by FAF (20). We have also demonstrated that incorporation of a fluorine atom at position 7, on the long axis of the fluorene, does not affect the overall conformational and thermal/thermodynamic profiles of AF- or AAF-modified duplexes (14,20). The three FAF-modified NarI sequences were each annealed with a complementary 12-mer (5'-GATGGCGCCGAG-3') to form the fully paired NarI-G1-FAF, NarI-G2-FAF, and NarI-G3-FAF duplexes, respectively. These duplexes were thoroughly characterized by 19F NMR, CD, and UV melting experiments (20).
The FAAF-modified oligomers eluting between 28 and 85 min were separated and purified to >97% purity by repeated injections. The HPLC system consisted of a Hitachi EZChrom Elite HPLC unit with an L2450 diode array detector and a Phenomenex Luna C18 column (150 × 10 mm, 5.0 µm). We used a gradient of 3-15% acetonitrile over 40 min, followed by 15-20% over 20 min and 20-35% over 40 min, in 100 mM ammonium acetate buffer (pH 7.0) at a flow rate of 2.0 mL/min. The three FAAF-modified 16-mer sequences were each annealed with the complementary sequence (5'-GTGATGGCGCCGAGAG-3') to form the fully paired NarI-G1-FAAF, NarI-G2-FAAF, and NarI-G3-FAAF duplexes used for UV melting, DSC, CD, and dynamic 19F NMR experiments.
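The three-segment acetonitrile gradient above can be written as a piecewise-linear function of time. The sketch below assumes the segments run back to back with no isocratic holds (none are mentioned) and reports only the programmed composition; it does not model dwell volume, so the composition actually reaching the column will lag slightly. The function name is hypothetical.

```python
# Programmed acetonitrile gradient described in the Methods:
# 3->15% over 40 min, then 15->20% over 20 min, then 20->35% over 40 min.
# Assumption: segments follow one another directly, with no hold steps.
SEGMENTS = [
    # (start_min, end_min, start_pct, end_pct)
    (0.0, 40.0, 3.0, 15.0),
    (40.0, 60.0, 15.0, 20.0),
    (60.0, 100.0, 20.0, 35.0),
]


def acetonitrile_percent(t_min: float) -> float:
    """Return the programmed % acetonitrile at time t_min (in minutes)."""
    for t0, t1, p0, p1 in SEGMENTS:
        if t0 <= t_min <= t1:
            # Linear interpolation within the active segment
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise ValueError(f"{t_min} min is outside the 0-100 min gradient program")


if __name__ == "__main__":
    # Programmed composition at the edges of the FAAF-adduct elution window
    for t in (28.0, 60.0, 85.0):
        print(f"{t:5.1f} min: {acetonitrile_percent(t):.2f}% acetonitrile")
```

Keeping the program as a table of segments makes it easy to compare against the retention-time zones reported for the mono-, di-, and tri-adducts.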

LC/MS Characterization of FAAF-modified ODNs
Electrospray ionization quadrupole time-of-flight mass spectrometry (ESI-QTOF-MS) was used to verify the molecular weights of the three oligomers and the position of FAAF attachment.
The 16-mer ODNs were sequenced using 3'→5' or 5'→3' exonucleases as described previously for modified 12-mers (28). Typically, 1 µg of ODN was combined with 0.01 units of exonuclease in 1 mM MgCl2 and incubated for several hours. The digests were separated on a Phenomenex Aqua C18 column (1.0 × 50 mm, 5 µm, 120 Å). Solvent A was 5 mM ammonium acetate and 5 mM dimethylbutylamine, adjusted to pH 7.0 with acetic acid. Solvent B was 0.1% formic acid in acetonitrile. The flow rate was 100 µL/min, and the total run time was 20 min. All LC/MS spectra were acquired on a Waters SYNAPT quadrupole time-of-flight mass spectrometer (Milford, MA) operated in negative ion and V modes. The measured molecular masses of all three isomeric ODNs were within 0.1 Da of their theoretical monoisotopic mass (5016.9 Da).

UV Melting
UV melting data were obtained on a Cary 100 Bio UV/Vis spectrophotometer equipped with a 6 × 6 multicell block and 1.0 cm path-length cells. Sample temperatures were controlled by a built-in Peltier temperature controller. Oligonucleotide duplexes at concentrations of 0.4-6.4 µM were prepared in 0.2 M NaCl, 10 mM sodium phosphate, and 0.2 mM EDTA (pH 7.0). Melting curves were recorded by ramping the sample temperature at 1 °C/min while monitoring absorbance at 260 nm. A typical melting experiment consisted of forward and reverse scans and was repeated five times. Thermodynamic parameters were calculated with MELTWIN version 3.5 as described previously (12).
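Measuring Tm over the 0.4-6.4 µM range is what allows two-state thermodynamic parameters to be extracted: for a non-self-complementary duplex with equimolar strands, 1/Tm = (R/ΔH°) ln(Ct/4) + ΔS°/ΔH°, where Ct is the total strand concentration. The sketch below fits that line by ordinary least squares with the Python standard library. It illustrates the type of 1/Tm vs. ln Ct analysis that programs such as MELTWIN perform under the two-state assumption; it is not MELTWIN's code, the function names are hypothetical, and the ΔH° and ΔS° in the usage example are invented only to generate self-consistent test data.

```python
import math

R = 1.98720  # gas constant, cal/(mol*K)


def tm_two_state(dH_kcal, dS_eu, ct_molar):
    """Two-state Tm (K) of a non-self-complementary duplex at total strand conc. ct_molar."""
    return dH_kcal * 1000.0 / (dS_eu + R * math.log(ct_molar / 4.0))


def vant_hoff_fit(ct_molar, tm_kelvin):
    """Fit 1/Tm = (R/dH) ln(Ct/4) + dS/dH by ordinary least squares.

    Returns (dH kcal/mol, dS eu, dG37 kcal/mol, r_squared) for duplex formation.
    """
    x = [math.log(c / 4.0) for c in ct_molar]
    y = [1.0 / t for t in tm_kelvin]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    dH = R / slope             # cal/mol
    dS = intercept * dH        # cal/(mol*K)
    dG37 = (dH - 310.15 * dS) / 1000.0
    return dH / 1000.0, dS, dG37, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Synthetic check with hypothetical parameters (not data from this study)
    cts = [0.4e-6, 0.8e-6, 1.6e-6, 3.2e-6, 6.4e-6]
    tms = [tm_two_state(-100.0, -280.0, c) for c in cts]
    dH, dS, dG37, r2 = vant_hoff_fit(cts, tms)
    print(f"dH={dH:.1f} kcal/mol, dS={dS:.1f} eu, dG37={dG37:.2f} kcal/mol, R^2={r2:.3f}")
```

The R² returned here corresponds to the linearity criterion (R² > 0.9 between 1/Tm and ln Ct) cited for the FAAF-NarI duplexes.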

Circular Dichroism (CD)
CD measurements were performed on a Jasco J-810 spectropolarimeter equipped with a Peltier temperature controller. Typically, 2 OD units of each strand were annealed with an equimolar amount of the complementary sequence. The samples were dissolved in 400 µL of neutral buffer (0.2 M NaCl, 10 mM sodium phosphate, 0.2 mM EDTA) and placed in a 1.0 mm path-length cell. The samples were heated at 85 °C for 5 min and then cooled to 15 °C over 10 min to ensure complete duplex formation. Spectra were acquired from 200 to 400 nm every 0.2 nm with a 2 s response time at a scan rate of 50 nm/min, averaged over 10 accumulations, and smoothed using the 17-point adaptive smoothing algorithm provided by Jasco.

Dynamic 19F NMR
Approximately 20 OD units of pure FAAF-modified 16-mer ODN were annealed with an equimolar amount of the complementary sequence to produce a fully paired duplex (Figure 1b). The samples were dissolved in 300 µL of pH 7.0 NMR buffer (10% D2O/90% H2O, 100 mM NaCl, 10 mM sodium phosphate, 100 µM EDTA) and filtered into a Shigemi tube through a 0.2 µm membrane filter. All 1H and 19F NMR spectra were recorded with a dedicated 5 mm 19F/1H dual probe on a Bruker DPX400 Avance spectrometer operating at 400.0 and 376.5 MHz, respectively, using acquisition parameters described previously (14,20,29). Imino proton spectra at 5 °C were obtained with a phase-sensitive jump-return sequence and referenced to DSS. 19F NMR spectra were acquired in 1H-decoupled mode and referenced to CFCl3 by setting external C6F6 in C6D6 to -164.9 ppm. One- and two-dimensional 19F NMR spectra were measured between 5 and 60 °C in 5-10 °C increments.
Temperatures were maintained by a Bruker VT unit with the aid of controlled boiling of liquid N2 in the probe. Line shape simulations were performed as described previously

DSC Experiments
Calorimetric measurements on the three FAAF-modified 16-mer duplexes were performed with a Nano-DSC from TA Instruments (Lindon, Utah). Prior to temperature scanning, samples were degassed for at least 10 min under house vacuum in a closed vessel. Solutions were loaded into the sample and reference cells with a pipette fitted with a short piece of silicone tubing at the tip, and the cells were purged several times to remove air bubbles. After both cells were filled, they were capped and a slight external pressure (~3 atm) was applied to prevent evaporation of the sample solution. Raw data were collected as microwatts vs. temperature. Template-primer solutions were prepared by dissolving desalted samples in pH 7.0 buffer consisting of 20 mM sodium phosphate and 0.1 M NaCl. In a typical scan, a 0.1 mM template-primer solution (the concentration used for all samples) was scanned against buffer from 15 to 90 °C at a rate of 0.75 °C/min. At least five repetitions were obtained. A buffer vs. buffer scan was used as a control; it was subtracted from the sample scan, and the result was normalized for heating rate, yielding baseline-corrected ΔCp,ex versus temperature curves. Each transition showed negligible change in heat capacity between the initial and final states, so ΔΔCp,ex was assumed to be zero. The area under the resulting curve is proportional to the transition heat, which, when normalized for the number of moles of sample, equals the transition enthalpy ΔH (i.e., ΔH = ∫ΔCp,ex dT). Tm was taken as the temperature at half the peak area. ΔG and ΔS values were determined according to the procedures described by Chakrabarti et al. (31).
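The DSC reduction described above (integrating the baseline-corrected, per-mole excess heat capacity to obtain ΔH and taking Tm at half the peak area) can be sketched with a cumulative trapezoidal integral. The code assumes, as the text does, that ΔCp between initial and final states is zero, so that ΔS = ΔH/Tm and ΔG(T) = ΔH(1 - T/Tm). This is a common textbook route and may differ in detail from the Chakrabarti et al. procedure actually cited; the function name and the synthetic Gaussian peak in the usage example are hypothetical.

```python
import math


def dsc_analysis(temps_c, cp_ex):
    """Reduce a baseline-corrected DSC trace.

    temps_c: ascending temperatures (deg C)
    cp_ex:   excess heat capacity per mole of duplex, kcal/(mol*K)
    Returns (dH kcal/mol, Tm deg C, dS cal/(mol*K), dG37 kcal/mol) for melting.
    """
    # Cumulative trapezoidal integral of Cp_ex over temperature
    cum = [0.0]
    for i in range(1, len(temps_c)):
        dT = temps_c[i] - temps_c[i - 1]
        cum.append(cum[-1] + 0.5 * (cp_ex[i] + cp_ex[i - 1]) * dT)
    dH = cum[-1]  # calorimetric enthalpy = total peak area
    if dH <= 0:
        raise ValueError("expected a positive (endothermic) melting peak")

    # Tm: temperature at which the cumulative area reaches half of dH
    half = dH / 2.0
    tm_c = temps_c[-1]
    for i in range(1, len(cum)):
        if cum[i] >= half:
            frac = (half - cum[i - 1]) / (cum[i] - cum[i - 1])
            tm_c = temps_c[i - 1] + frac * (temps_c[i] - temps_c[i - 1])
            break

    tm_k = tm_c + 273.15
    # Assumption: delta Cp ~ 0 between initial and final states
    dS = dH * 1000.0 / tm_k              # cal/(mol*K), i.e. eu
    dG37 = dH * (1.0 - 310.15 / tm_k)    # kcal/mol at 37 deg C
    return dH, tm_c, dS, dG37


if __name__ == "__main__":
    # Synthetic Gaussian peak (hypothetical): area 100 kcal/mol centred at 60 deg C
    sigma = 4.0
    amp = 100.0 / (sigma * math.sqrt(2.0 * math.pi))
    temps = [15.0 + 0.1 * i for i in range(751)]  # 15-90 deg C, as in the scans
    cp = [amp * math.exp(-((t - 60.0) ** 2) / (2.0 * sigma ** 2)) for t in temps]
    dH, tm, dS, dG37 = dsc_analysis(temps, cp)
    print(f"dH={dH:.1f} kcal/mol, Tm={tm:.2f} C, dS={dS:.1f} eu, dG37={dG37:.2f} kcal/mol")
```

Taking Tm at half the cumulative area, rather than at the peak maximum, matches the definition given in the text and is less sensitive to asymmetric peaks.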
UvrA, UvrB, and UvrC proteins were overexpressed in E. coli and purified as previously described (33). The estimated purity of each protein was greater than 95%. Protein concentrations were determined with the Bio-Rad Protein Assay, using BSA as the standard, following the manufacturer's recommended procedure.
The Uvr proteins were diluted and premixed in Uvr storage buffer before being added to the reaction. Aliquots were collected at 0, 5, 10, 15, and 20 min. The reaction was terminated by heating at 95 °C for 5 min. The products were denatured by adding formamide loading buffer and heating at 95 °C for 5 min, followed by quick chilling on ice. The incision products were then analyzed by electrophoresis on a denaturing 12% polyacrylamide sequencing gel in TBE buffer.
To quantify the incision products, radioactivity was measured with a Fuji FLA-5000 image scanner and MultiGauge V3.0 software. The amount of DNA incised (fmol) by UvrABC was calculated from the total amount of DNA in each reaction and the ratio of the radioactivity of the incision products to the total radioactivity of the DNA. Incision rates were determined from at least three independent experiments.
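The quantification described above converts phosphorimager counts into femtomoles incised; an incision rate can then be taken as the slope of fmol incised vs. time over the 0-20 min time course. The sketch below assumes the reaction stays in its initial, approximately linear phase and uses ordinary least squares. The function names, the 20 fmol of DNA per reaction, and the counts are hypothetical. The last helper expresses rates relative to the fastest substrate (set to 100), the normalization behind the 100:18:66 and 38:100:68 ratios quoted in the introduction.

```python
def fmol_incised(total_dna_fmol, product_counts, total_counts):
    """DNA incised (fmol) = total DNA (fmol) x product radioactivity / total radioactivity."""
    return total_dna_fmol * product_counts / total_counts


def slope(xs, ys):
    """Ordinary least-squares slope of ys vs. xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def relative_efficiency(rates):
    """Scale incision rates so that the fastest substrate equals 100."""
    top = max(rates.values())
    return {name: 100.0 * r / top for name, r in rates.items()}


if __name__ == "__main__":
    times = [0, 5, 10, 15, 20]  # min, as in the assay
    # Hypothetical (product counts, total counts) for one substrate
    counts = [(0, 10000), (400, 10000), (800, 10000), (1200, 10000), (1600, 10000)]
    incised = [fmol_incised(20.0, p, t) for p, t in counts]  # assumes 20 fmol DNA
    print("rate (fmol/min):", slope(times, incised))
    print(relative_efficiency({"G1": 1.0, "G2": 0.18, "G3": 0.66}))
```

If later time points begin to plateau, restricting the fit to the early points keeps the slope an initial-rate estimate.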

Model Sequences
We previously used FAF-modified 12-mer duplexes (5'-CTCG1G2CG3*CNATC-3', N = C or T) to probe the impact of flanking and 3'-next flanking sequences on NarI-induced frameshift mutagenesis (20). We initially tried to use the same 12-mer NarI sequence for FAAF modification for the sake of comparison and consistency; however, the sequence proved unsuitable for FAAF. Although FAAF adduction of the 12-mer NarI sequence was facile, the resulting reaction mixture was difficult to purify by reverse-phase HPLC (see asterisked peaks in Supplementary Figure S2). Moreover, ligation efficiencies of the FAAF-modified 12-mers, particularly at the G1 and G2 positions, were very low. Accordingly, the length of DNA was increased to 16 ( Unreacted control ODN appeared at 21 min, followed by seven FAAF-modified ODNs in three retention-time zones: Peaks 1-3 at 28-35 min, Peaks 4-6 at 42-60 min, and Peak 7 at 84 min. The on-line UV spectra (Figure 2b) of the modified ODNs displayed a small shoulder in the 290-320 nm range, with relative absorption intensities (290-320 nm) of approximately 1:2:3 for the three peak groups. This finding is reminiscent of the AF- or FAF-induced absorption shoulders observed at 290-350 nm, whose intensities correlate consistently with the number of adducts (28). Accordingly, Peaks 1-3 and 4-6 were assigned as mono- and di-adducts, respectively, and Peak 7 as a tri-adduct. These adducts were characterized by exonuclease digestion/ESI-TOF-MS/MS analyses, as described below. Depending on the location of FAAF, the mono-adducts were designated NarI-G1, NarI-G2, or NarI-G3, where G1, G2, and G3 denote the FAAF-modified guanine. Details of the structural characterization and repair of the di- and tri-FAAF adducts will be published separately.
The HPLC elution profile of the FAAF-modified 16-mer ODNs in the present study is similar to that of the AAF-modified 15-mer NarI sequence (5'-TCCTCG1G2CG3CCTCTC-3') reported by Tan et al. (34). These results indicate that AAF and FAAF are chromatographically comparable, irrespective of sequence length, as long as the common NarI core is included in the sequences. This is not surprising, since the conformational and thermodynamic compatibility of the fluorine-containing AF and AAF models has been well documented (20,30,35). A similar elution pattern was observed for the FAF-modified 12-mer and FAAF-modified 16-mer NarI sequences; however, the order of elution of G1 and G3 was reversed (compare

ESI-QTOF-MS Characterization
The molecular weights of all three FAAF-modified ODNs were measured by ESI-QTOF-MS prior to sequence verification by exonuclease digestion. ODNs normally ionize by loss of a proton from a phosphate group in the backbone.
As the number of nucleotides in an ODN increases, the average charge state observed in the full-scan mass spectra increases as well (28). As shown in Supplementary Figure S3, the 16-mer ODNs predominantly form (M-4H)4- ions upon electrospray, unlike the 12-mers studied previously, which primarily form (M-3H)3- ions (28).
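The charge-state behavior above follows from simple arithmetic: an (M-zH)z- ion appears at m/z = (M - z·mp)/z, where mp is the proton mass. The short sketch below applies this to the theoretical monoisotopic mass of 5016.9 Da quoted in the Methods. It ignores isotope distributions and cation adducts (e.g., Na+ or NH4+), so it only locates the monoisotopic peak of each charge state; the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_deprotonated(neutral_mass, z):
    """m/z of the (M - zH)^z- ion formed by loss of z protons."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass - z * PROTON_MASS) / z


if __name__ == "__main__":
    M = 5016.9  # theoretical monoisotopic mass of the FAAF-modified 16-mers (Da)
    for z in range(3, 7):
        print(f"(M-{z}H){z}-: m/z {mz_deprotonated(M, z):.2f}")
```

The same function can be applied to the calculated mass of each exonuclease fragment to predict where it should appear in the digest spectra.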
Exonucleases cleave terminal nucleotides from the ODN chain until the FAAF-modified nucleotide is exposed at the end of the chain, at which point digestion slows down significantly. The position of modification is identified (in this case) when fragments formed by loss of the unmodified guanine nucleotides are observed in the LC/MS spectra. This is shown in Figure S4 for the 3' digest of the - 3- ions at m/z 1604.3 and m/z 1069.2, whose masses are consistent with the 5'-G2(FAAF)CG3CCATCAC-3' fragment. No ions formed by loss of two guanine nucleotides were observed in any of the mass spectra. The observation of ODN fragments with two G's in both 3' and 5' digests confirms that the last singly modified ODN to elute from the reaction mixture is modified on the middle G.

Figure 3a shows an overlay of the CD spectra of the three FAAF-modified NarI-G1, -G2, and -G3 duplexes relative to the unmodified control (red). Both unmodified and FAAF-adducted duplexes displayed positive and negative ellipticity at around 270 and 250 nm, respectively, an S-shaped curve characteristic of B-form DNA.

Circular Dichroism
Relative to the unmodified duplex, the modified duplexes displayed significant blue shifts, NarI-G3 (6 nm) >> NarI-G1 ~ G2 (3 nm), indicating adduct-induced DNA bending. A concomitant increase in the positive intensity around 270 nm was noted in the order G3 ~ G1 > G2, which could be due to interaction of the intercalated S-conformeric FAAF with neighboring bases. We noted a similar blue shift and hyperchromic effect for the highly (75%) S-conformeric FAF-modified NarI-G3 duplex (Figure 3b and Table 1 in More importantly, the FAAF-modified duplexes exhibited sequence-dependent induced CD in the 290-320 nm range (ICD290-320 nm). This finding is reminiscent of ICD290-350 nm, which has been used as a sensitive marker of FAF-induced S/B/W-conformational heterogeneity (positive for S- and W- and negative for B-conformer) (12,15,20,29,36). In the present case, however, the FAAF-modified NarI duplexes exhibited negative dips, with the NarI-G2 duplex showing a greater dip than the G1 or G3 duplexes (Figure 3a). This result could be due to the higher proportion (57%) of B-conformer in the NarI-G2 duplex.

UV Melting Experiments
Supplementary Figure S8 shows the UV melting profiles of the three FAAF-NarI duplexes and an unmodified control duplex, all at 6.4 µM. All duplexes showed typical monophasic, sigmoidal helix-coil transitions with a strong linear correlation (R² > 0.9) between 1/Tm and ln Ct. Thermal and thermodynamic parameters calculated from UV melting are summarized in Supplementary Table S1. As expected, the modified duplexes were destabilized thermally and thermodynamically relative to the control duplex. The magnitude of thermal (ΔTm) and thermodynamic (ΔΔG) destabilization was in the order NarI-G2 ~ NarI-G3 (-8.7 to -8.8 °C and 3.3 to 3.7 kcal/mol, respectively) > NarI-G1 (-4.6 °C and 2.0 kcal/mol, respectively).

Figure 3b shows differential scanning calorimetry (DSC) plots of excess heat capacity (Cp,ex) vs. temperature for the FAAF-NarI duplexes relative to the unmodified control. compensates for the enthalpy, thus resulting in an overall free energy loss of ΔΔG37°C = 4.7 kcal/mol (37). In contrast, the NarI-G1 and -G2 duplexes have higher populations of B-conformer (46% and 57%, respectively) and thus exhibit smaller differences in enthalpy (ΔΔH = 21.9 and 18.6 kcal/mol, respectively) (Table 2). The relatively small enthalpy differences observed for the G1 and G2 duplexes could be attributed to the presence of S- and W-conformers in addition to the B-conformer. As expected, entropy compensation was smaller in these two duplexes (NarI-G1, ΔΔS = 58.9 eu; NarI-G2, ΔΔS = 47.0 eu), yielding similar overall free energy changes (NarI-G1, ΔΔG = 3.7 kcal/mol; NarI-G2, ΔΔG = 4.1 kcal/mol).

Figure 4a shows the 19F NMR spectra of the FAAF-NarI 16-mer G1, G2, and G3 duplexes measured at 5 °C, where the 19F signals are in slow chemical exchange. These duplexes exhibited three to five 19F signals, each representing a particular conformation.
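The DSC-derived differences for NarI-G1 and -G2 are linked by ΔΔG37 = ΔΔH - T·ΔΔS with T = 310.15 K. The short calculation below recomputes the free-energy differences from the ΔΔH (kcal/mol) and ΔΔS (eu, i.e., cal mol⁻¹ K⁻¹) values quoted in the text. It gives about 3.6 and 4.0 kcal/mol, close to the reported 3.7 and 4.1 kcal/mol; the small differences presumably reflect rounding of the tabulated ΔΔH and ΔΔS values. The function name is hypothetical.

```python
T37 = 310.15  # K


def ddG(ddH_kcal, ddS_eu, temp_k=T37):
    """Free-energy difference (kcal/mol) from enthalpy (kcal/mol) and entropy (eu) differences."""
    return ddH_kcal - temp_k * ddS_eu / 1000.0


if __name__ == "__main__":
    # ddH and ddS values quoted in the text for the FAAF-modified NarI duplexes
    for name, ddH, ddS in [("NarI-G1", 21.9, 58.9), ("NarI-G2", 18.6, 47.0)]:
        print(f"{name}: ddG37 = {ddG(ddH, ddS):.2f} kcal/mol")
```

The division by 1000 converts the entropy term from cal/mol to kcal/mol so that both terms share units.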

S/B/W Conformational Heterogeneity
The percent population ratios shown were calculated from line-shape simulations (Supplementary Figure S9). Assignment of the different 19F signals of each duplex was necessary for meaningful structure-activity relationship studies.
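In slow exchange, each conformer gives its own 19F resonance, and its population is proportional to the integrated area of that resonance. The sketch below assumes each simulated signal is a Lorentzian with peak height h and half-width at half-maximum γ, whose area is π·h·γ, and converts fitted parameters into percent populations. It is not the line-shape simulation program used in the study; the function names and the peak parameters in the usage example are hypothetical, except that the B-signal position (-115.5 ppm) is taken from the text.

```python
import math


def lorentzian(x, center, height, hwhm):
    """Lorentzian line with the given peak height and half-width at half-maximum."""
    return height * hwhm ** 2 / ((x - center) ** 2 + hwhm ** 2)


def percent_populations(peaks):
    """peaks: {label: (center_ppm, height, hwhm_ppm)} -> {label: percent of total area}."""
    # The integral of a Lorentzian over all x is pi * height * hwhm
    areas = {label: math.pi * h * w for label, (_, h, w) in peaks.items()}
    total = sum(areas.values())
    return {label: 100.0 * a / total for label, a in areas.items()}


if __name__ == "__main__":
    # Hypothetical fitted parameters for a three-signal S/B/W spectrum
    peaks = {
        "B": (-115.5, 1.0, 0.04),
        "S": (-116.4, 0.9, 0.05),
        "W": (-117.2, 0.5, 0.06),
    }
    for label, pct in percent_populations(peaks).items():
        print(f"{label}: {pct:.0f}%")
```

Comparing areas rather than peak heights matters here because exchange broadening can make a conformer's signal wider and shorter without changing its population.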
The initial assignments (Figure 4a) were made on the basis of chemical exchange, ring current effects, and chemical shift pattern recognition, as has been done for a number of FAAF and FAF adducts in various sequence contexts (30,35). AF and AAF adducts have been shown to adopt S/B- and S/B/W-conformational equilibria, respectively (Figure 1c,d), and their 19F chemical shifts are independent of overall sequence and length but depend strongly on the bases flanking the lesion (15,30). The major 19 To complement the 19F signal assignments, we additionally conducted a set of comparative spectral analyses using three FAAF-modified 12-mer duplexes in non-NarI sequences (Figure 4b) with otherwise identical flanking sequence contexts (CG*G, GG*C, and CG*C for G1, G2, and G3, respectively). The top trace in Figure 4b is the 19F NMR spectrum of an FAAF-modified 12-mer duplex (5'-CTTCTCG*CCCTC-3') whose S/B/W conformational profile has been well characterized (15). Note that this non-NarI 12-mer duplex contains the same CG*C flanking sequence context as the 16-mer NarI-G3-FAAF duplex. Comparison of the two spectra (i.e., the top traces of Figure 4a and 4b) revealed a parallel trend in both chemical shifts and population ratios (Table 2 and Supplementary Table S2), supporting the conformational assignments. This is consistent with our previous findings that the electronic environment of the 19F signals of AF and AAF adducts is strongly modulated by the flanking bases (15,30). Similarly, we prepared two additional FAAF-modified non-NarI 12-mer duplexes (5'-CTTCTCG*GCCTC-3' and 5'-CTTCTCGG*CCTC-3') with the same flanking base contexts (CG*G and GG*C) as the NarI-G1 and -G2 duplexes, respectively. Figure 4a,b compares the 19F NMR spectra of all three NarI 16-mer and non-NarI 12-mer duplexes side by side. 
The 19F signal profiles, indicated by dotted lines (pink, B; red, S; green, W) for G1, G2, and G3 of each sequence context, match quite well overall despite slight variations in chemical shifts and population ratios, particularly for the GG*C sequence context. Whereas the S- and W-conformer signals were prone to shift, the B-conformer signal remained steady at -115.5 ppm.
This trend is more apparent in Supplementary Figure S11, in which the two FAAF-modified sequence series (NarI 16-mer vs. non-NarI 12-mer) are compared pairwise for each of the -CG*G-, -GG*C-, and -CG*C- sequence contexts. It is plausible that the carcinogen moiety, which lies in the major groove in the B-conformer, is not subject to the ring-current effect that the S- and W-conformers experience (14). We were unable to identify the minor signals (asterisked) in the 16-mer NarI-G2 (<19%) and NarI-G3 FAAF duplexes, although their downfield shifts relative to the B-conformer signal imply B-like conformers in which the fluorine-containing carcinogen moiety is exposed. Figure 4c shows the 19F NMR spectra of FAF-modified 12-mer duplexes with the same NarI sequence contexts, which have been thoroughly characterized (20). The B/S conformer population ratios were determined to be 42%:58%, 69%:31%, and 35%:65% for FAF-modified NarI-G1, G2, and G3, respectively, at 5 °C (Table 2).

Dynamic 19F NMR
Figure 5 shows the 19F NMR spectra of the three FAAF-NarI duplexes as a function of temperature (5-60 °C). Whereas the three 19F signals in each duplex were in slow exchange at 5 °C, the two downfield B- and S-signals became exchange-broadened and coalesced at around 30, 40, and 25 °C for G1, G2, and G3, respectively. In all cases, the merged signals coalesced with the upfield W-signal at around 60 °C. All three NarI duplexes showed relatively strong off-diagonal cross peaks between the major signals in the exchange spectra (data not shown), confirming chemical exchange among them.
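The coalescence temperatures reported above can be turned into rough exchange barriers. This is a minimal sketch assuming two-site exchange between equally populated, uncoupled signals, which is only an approximation here because the B/S populations are unequal: the rate at coalescence is k_c = πΔν/√2, and the Eyring equation gives ΔG‡ = RT_c ln(k_B T_c / (h k_c)). The 0.5 ppm B/S separation is hypothetical; the 476.5 MHz 19F frequency (NMR methods) and the ~30 °C coalescence for G1 come from the text.

```python
import math

K_B = 1.380649e-23            # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol K)


def shift_difference_hz(delta_ppm: float, spectrometer_mhz: float = 476.5) -> float:
    """Convert a 19F shift separation from ppm to Hz (ppm x MHz = Hz)."""
    return delta_ppm * spectrometer_mhz


def rate_at_coalescence(delta_nu_hz: float) -> float:
    """Exchange rate constant k_c = pi * dnu / sqrt(2) for two equal, uncoupled sites."""
    return math.pi * delta_nu_hz / math.sqrt(2.0)


def eyring_barrier_kcal(k: float, temp_k: float) -> float:
    """Free energy of activation (kcal/mol) from the Eyring equation."""
    return R_KCAL * temp_k * math.log(K_B * temp_k / (H_PLANCK * k))


# Hypothetical 0.5 ppm B/S separation; coalescence near 30 °C as reported for G1.
dnu = shift_difference_hz(0.5)
k_c = rate_at_coalescence(dnu)
dg = eyring_barrier_kcal(k_c, 30.0 + 273.15)
print(f"dnu = {dnu:.1f} Hz, k_c = {k_c:.0f} 1/s, dG_act = {dg:.1f} kcal/mol")
```

For a fixed shift separation, a higher coalescence temperature implies a higher barrier, which is the qualitative point needed to compare G1, G2, and G3.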

UvrABC Incisions of FAAF-adducts on NarI sequence
Figure 6 shows the kinetic assay results, in which 55-mer FAAF-modified DNA duplex substrates were incised by UvrABC nuclease. These substrates were radioactively labeled at the 5ꞌ-end of the adducted strand. The major incision products can be seen as 18-mer (NarI-G1), 19-mer (NarI-G2), or 21-mer (NarI-G3) fragments separated on a urea-PAGE gel under denaturing conditions (Supplementary Figure S12). The incision occurred at the 8th phosphodiester bond 5ꞌ to the modified nucleotide, which is consistent with previously reported results of UvrABC incision (11,32).
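The fragment lengths above follow from the substrate design. This sketch assumes the 55-mer consists of the 5ꞌ-labelled 20-mer flank, the NarI 16-mer, and the 19-mer flank joined in that order (sequences from the substrate construction methods), and that cutting the 8th phosphodiester bond 5ꞌ to the lesion at position p leaves nucleotides 1 to p - 8 on the labelled side. The helper names are illustrative only.

```python
# Sequences from the Methods: 5'-flank (20-mer), NarI 16-mer, 3'-flank (19-mer).
FLANK_5P = "GACTACGTACTGTTACGGCT"
NARI_16MER = "CTCTCGGCGCCATCAC"
FLANK_3P = "GCAATCAGGCCAGATCTGC"
SUBSTRATE_55MER = FLANK_5P + NARI_16MER + FLANK_3P

# 1-based positions of G1, G2 and G3 within 5'-CTCTCG1G2CG3CCATCAC-3'.
NARI_G_POSITIONS = {"G1": 6, "G2": 7, "G3": 9}


def lesion_position(site: str) -> int:
    """1-based position of the modified guanine in the 5'-labelled 55-mer."""
    pos = len(FLANK_5P) + NARI_G_POSITIONS[site]
    assert SUBSTRATE_55MER[pos - 1] == "G", "lesion must sit on a guanine"
    return pos


def labelled_5p_incision_length(site: str, bonds_5p: int = 8) -> int:
    """Length of the 32P-labelled fragment released by the 5' incision.

    Cutting the n-th phosphodiester bond 5' to nucleotide p leaves
    nucleotides 1 .. p - n on the labelled side.
    """
    return lesion_position(site) - bonds_5p


for site in ("G1", "G2", "G3"):
    print(site, labelled_5p_incision_length(site))
```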
Quantitative analysis of the incision products indicated that the substrates were incised with different efficiencies, depending on where the damage site was located in the sequence (Figure 6).
Specifically, the N-acetylated FAAF adducts at NarI-G1 and NarI-G3 displayed similar rates of incision, whereas NarI-G2 had a much lower rate of incision: G3 (100%) ≥ G1 (93%) > G2 (32%) (Figure 6c, Table 2). For comparison, we also determined the UvrABC incision of FAF adducts in the same NarI sequence context. As shown in Figure 6b,c, the N-deacetylated FAF adducts were incised 2- to 3-fold less efficiently than FAAF. Despite having B/S-conformer profiles similar to those of FAAF (Figure 4 and Table 2), the FAF adducts at the three sites in the NarI sequence were incised in the order G1 (44%) ≈ G2 (43%) > G3 (25%), where the percentages are relative to FAAF NarI-G3 (the most efficiently incised substrate).

Discussion
It is well known that DNA sequence is a major determinant of the repair outcomes of site-specifically modified bulky DNA lesions. In this study, we examined the conformational heterogeneity and thermodynamics of FAAF and FAF at three different guanine positions (G1, G2, and G3) of the well-known NarI recognition sequence.
Moreover, we obtained nucleotide excision repair (NER) data for these adducts using the E. coli UvrABC system.

FAAF-induced B/S/W-conformational heterogeneity
Our combined 19F NMR/ICD results show that the FAAF adduct in the well-known mutational hotspot NarI sequence exists as a mixture of B/S/W conformers with varying populations (Figure 4a, Table 2). Greater populations of the syn-glycosidic S- (61%) and W- (26%) conformers were observed in NarI-G3, in which the lesion is flanked by C on both the 5ꞌ and 3ꞌ sides (-CG3*C-). This result is consistent with the preferred syn-conformation in the same -CG*C- context in either NarI or non-NarI sequences (20,38,39,41). The mostly syn NarI-G3 duplex appeared to be distorted or bent, or possibly formed a B-Z junction, as evidenced by a significant blue shift and hyperchromic effect in CD (Figure 3a) (42). The latter was probably due to π-π stacking interactions between the intercalated aminofluorene and the flanking base pairs. On the other hand, the NarI-G2 duplex (-G1G2*C-) exhibited largely the anti B-conformer (57%), along with S- (15%), W- (9%), and two unidentified minor conformers (~19%). Compared with NarI-G3, the NarI-G2 duplex exhibited a smaller blue shift and hyperchromic effect (Figure 3a), suggesting less disturbance of the double-helical DNA structure.
These results indicate the heterogeneous nature of AAF in the NarI sequence and are consistent with a previous CD study that showed a major DNA distortion for the AAF adduct at G3 compared with G1 and G2 (42). Similarly, Veaute et al. (43) conducted a DNase I footprinting study on the NarI sequence and showed that AAF at the G2 position inhibits DNase I digestion of DNA over up to 5 bases in the modified strand and 4 bases in the complementary strand. In contrast, inhibition at the G1 and G3 positions extended to 8 and 6 bases, respectively, in the modified strand.
The G3 and G2 duplexes are chemically isomeric, differing only in the orientation of the G:C base pair on the 5ꞌ side of the lesion (i.e., C:G → G:C). This polarity swap accounts for the rather dramatic conformational shifts in the S- (61% to 15%), B- (13% to 57%), and W-conformer (26% to 9%) populations (Table 2). A similar polarity switch on the 3ꞌ side, comparing NarI-G3 with NarI-G1, resulted in smaller shifts in the S- (61% to 34%), B- (13% to 46%), and W-conformer (26% to 20%) populations.
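To put these population shifts on an energy scale, a two-state Boltzmann relation, ΔG = -RT ln(p_X/p_B), converts each population ratio into an apparent free-energy difference relative to the B-conformer. This sketch rests on two assumptions not stated in the text: that the Table 2 populations were measured at 5 °C, and that the ~19% of unassigned NarI-G2 signals can simply be left out. It describes equilibrium populations only, not the barriers between conformers.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol K)

# FAAF conformer populations (%) quoted in the text (Table 2); assumed measured at 5 °C.
POPULATIONS = {
    "G1": {"B": 46, "S": 34, "W": 20},
    "G2": {"B": 57, "S": 15, "W": 9},   # ~19% unassigned minor signals omitted
    "G3": {"B": 13, "S": 61, "W": 26},
}


def delta_g_vs_b(pops: dict, conformer: str, temp_k: float = 278.15) -> float:
    """Apparent dG (kcal/mol) of `conformer` relative to B: -RT ln(p_X / p_B).

    Negative values mean the conformer is favoured over the B-conformer.
    """
    return -R_KCAL * temp_k * math.log(pops[conformer] / pops["B"])


for site, pops in POPULATIONS.items():
    s = delta_g_vs_b(pops, "S")
    w = delta_g_vs_b(pops, "W")
    print(f"{site}: dG(S-B) = {s:+.2f}, dG(W-B) = {w:+.2f} kcal/mol")
```

The resulting differences are all below 1 kcal/mol, which illustrates how modest energetic changes from a single flanking base-pair swap can redistribute the conformer populations substantially.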

Conformation-specific Nucleotide Excision Repair
The E. coli UvrABC system displayed significant differences in the repair of the FAAF adduct at each guanine position (G1, G2, and G3) of the NarI sequence. The NarI-G2 duplex was incised considerably less efficiently than NarI-G1 or NarI-G3 [G3 (100%) ≥ G1 (93%) > G2 (32%)] (Figure 6). As shown in Table 2, these NER results follow the order of the S-conformer population [G3 (61%) > G1 (34%) > G2 (15%)] but are in exactly the reverse order of the B-conformer population [G2 (57%) > G1 (46%) > G3 (13%)]. These data suggest that the S-conformer is recognized and incised by E. coli NER preferentially over the B-conformer. We previously reported similar conformation-specific NER results for a series of FAF-modified duplexes (16,30).
For comparison, we also determined the UvrABC incisions of FAF adducts in the same NarI sequence context. The two lesions showed similar B/S conformational heterogeneity in the NarI sequence context (Figure 4). We therefore expected FAF to show an NER profile similar to that of FAAF, i.e., with the S-/W-conformers promoting NER over the B-conformer. However, the incision efficiency was in the order G1 ≈ G2 > G3 (Figure 6, Table 2). At first glance, this result appears to parallel the B-conformer population. Note that FAF is consistently repaired 2- to 3-fold less efficiently than FAAF (Figure 6c, Table 2). This agrees with a general trend reported in the literature, although much greater differences in incision efficiency between AF and AAF have been noted (11,44). Within the FAF series, the difference between G1 and G2 is not statistically significant (P = 0.83), whereas their difference from G3 is significant (P < 0.0001).
The incision differences between FAAF and FAF suggest that, in addition to sequence-dependent adduct conformation, the acetyl group in FAAF may play a role in DNA damage recognition by UvrABC. The only structural difference between FAF and FAAF is the absence of the bulky acetyl group on the linking nitrogen in the former (Figure 1a). It has been documented that N-acetylated FAAF adducts in fully paired duplexes adopt a complex mixture of S/B/W-conformers, whereas N-deacetylated FAF adopts a simple, exchangeable S/B equilibrium (15,30). Thus, the N-acetyl group is evidently responsible for generating up to 26% W-conformer in the NarI sequence (Figure 4). The bulkiness of the acetyl group, together with possible cis/trans rotamer transitions about the amide bond (14,15,45), may facilitate repositioning of the fluorenyl rings from the S conformation into the minor groove. This conformational rearrangement is relatively straightforward because it does not require a change in the glycosidic bond, which is syn in both cases. We observed a good correlation between the proportion of W-conformer and the NER efficiency of FAAF.
Moreover, although FAF and FAAF have similar S/B-conformational profiles (Figure 4), the N-acetyl group in the latter could act as a "conformational locker" that raises the energy barriers among conformers. Such a scenario, i.e., higher energy barriers for FAAF than for FAF, is plausible and might contribute to greater disturbance of the DNA and thus greater repair. By contrast, the N-deacetylated FAF adopts a facile, readily interchangeable B/S equilibrium (<2 kcal/mol), which may result in weaker binding by the damage-recognition protein UvrA. A recent crystal structure study indicated that the UvrA dimer does not contact the lesion site directly but rather binds DNA regions on both sides of the modification and primarily recognizes adduct-induced unwinding, bending, and deformation of the overall DNA structure (25). Furthermore, DNA damage recognition in E. coli NER is achieved through a sequential two-step mechanism (46). The first step recognizes the adduct-induced distortion of the DNA structure. After strand opening at the damage site, the adduct structure is further recognized or verified in a second step, which may facilitate flipping of the adducted nucleotide (47,48). Therefore, it is possible that the second recognition step plays a more important role than the first for FAF, whereas the first step dominates recognition of FAAF.
The order of NER efficiencies described here is roughly consistent with bacterial NER data on AAF adducts embedded in a similar NarI sequence (17): G1 (100%), G3 (66%), and G2 (18%). Sequence dependence was also found in human NER of AAF adducts in the NarI sequence (18). In contrast to the E. coli NER data, however, the AAF adduct at G2 (100%) was found to be the most repairable, followed by G3 (68%) and G1 (38%).
Despite differences in the proteins involved in prokaryotic and eukaryotic NER, the two systems show similar involvement of β-hairpin intrusion in damage recognition (49). Liu et al. (50) found a general qualitative trend toward similar relative NER incision efficiencies in the two systems for 65% of the bulky benzo[a]pyrene- and equine estrogen-derived substrates examined.
Similar to bacterial UvrA, Rad4 (XPC) in yeast also recognizes helical distortion to sense DNA damage; unlike bacteria, however, yeast use a base-flipping mechanism for repair (26).
Therefore, the efficiency of repair depends not only on the damage recognition step, but also on other factors, such as ease of base flipping.
In summary, our structural and thermodynamic data provide valuable conformational insights into the sequence-dependent UvrABC incision of the bulky FAF and FAAF adducts in the NarI sequence context. Repair of the bulky N-acetylated FAAF adduct appears to occur in a conformation-specific manner, i.e., the highly S/W-conformeric G3 and G1 duplexes were incised considerably more efficiently than the G2 duplex (G3 ~ G1 > G2) (Table 2). These results were supported by melting and thermodynamic data. Not surprisingly, FAF was repaired 2- to 3-fold less efficiently than FAAF; however, its order of incision efficiencies (G1 ≈ G2 > G3) was largely reversed relative to FAAF. We considered the so-called "N-acetyl factor" and lesion-specific recognition mechanisms to account for the different orders of incision for FAF and FAAF. Finally, the temperature dependence of the S/B/W-conformational equilibria of the FAAF adducts in the NarI sequence could provide valuable opportunities for conformation-specific NER studies using thermophilic UvrABC proteins (51). Taken
Table 1:
Table 2:

INTRODUCTION
The human genome is susceptible to various chemical assaults from external and internal sources, including UV radiation and carcinogens. Adduct formation occurs throughout the genome; some sites show high levels of adduct formation yet little or no mutation induction, and vice versa. [1][2][3] So-called locally multiply damaged sites (LMDS), or clustered lesions, are defined as two or more lesions occurring within a short stretch (<24 bp) of DNA. 4 In general, clustered lesions are considered to be more genotoxic than single lesions. It has also been hypothesized that clustered lesions are less repairable than individual ones and that the mutagenicity of one lesion is synergized by the presence of another lesion in its vicinity. 5,6 Studies have shown that the efficiency of repair of clustered oxidative damage depends on the nature of the chemical changes, the inter-lesion distance, the sequence context, and the relative orientation of the lesions. 7 Kalam et al. 5 evaluated the mutagenic potential of 8-oxo-G adjacent to a uracil in simian kidney cells, on the assumption that the uracil would be excised by uracil glycosylase to produce an abasic (AP) site. A substantial fraction of tandem mutations was detected when the uracil was adjacent to 8-oxo-G. The presence of an AP site either 5ꞌ or 3ꞌ to 8-oxo-G increased the frequency of 8-oxo-G to T transversions, with a much greater effect when the AP site was 3ꞌ. It has also been suggested that clustered oxidative lesions could be converted into double-strand breaks during the repair process, highlighting the complexity of cluster-induced mutagenesis. 8 Evidence suggests that similar clustering occurs with bulky DNA lesions and may be responsible for their sequence-dependent repair and mutational outcomes. Fuchs and coworkers have shown that the binding spectrum of 2-acetylaminofluorene (AAF) is essentially random and that all guanine residues exhibit equal reactivity towards N-AcO-AAF. 9,10
As with oxidative damage, GC-rich regions and runs of contiguous guanines in the genome have a higher probability of producing clustered adducts. 11 Arylamines are implicated in the etiology of human breast, liver, and bladder cancers. 14 The prototype AAF and its 2-amino and 2-nitro derivatives produce two major DNA adducts upon in vivo activation: N-(2-deoxyguanosin-8-yl)-2-aminofluorene (dG-C8-AF) and N- (Fig. 1a). 15 Both AF and AAF adducts exist in an equilibrium of the external-binding anti B-type and the syn stacked S-conformers. In the bulky N-acetylated AAF, however, the syn-adduct can also exist in the minor-groove-binding "wedge" (W) conformer (Fig. 1c). 16,17 The population ratios of these conformations were found to depend strongly on the nature of the neighboring sequence context. 16 We reported that FAAF exhibits conformation-specific repair, in which the highly S/W-conformeric G3 and G1 duplexes were incised considerably more efficiently than the highly B-conformeric G2 duplex (G3 ~ G1 > G2). The NER efficiency of the N-deacetylated FAF was 2- to 3-fold lower, but it too displayed strong sequence dependence. 16 factor, more so than the local S/B-conformational heterogeneity that was observed previously for FAF and FAAF in certain sequence contexts. 16,21,27 This work represents a novel 3ꞌ-next flanking sequence effect as a unique NER factor in E. coli.
Virtually nothing is known about the structure and repair consequences of clustered bulky lesions. As elaborated above, previous studies have mainly focused on single site-specifically modified sequences, primarily because a single lesion is technically easier to handle than multiple adducts. 28 We hypothesize that clustered arylamine lesions could form wherever multiple guanines are present in the genome, with major structural/conformational consequences for repair and mutational outcomes.
To that end, we prepared duplex NER substrates in which bulky FAAF lesions are located on two (G1G2, G2G3, and G1G3) of the three guanines (G1, G2, and G3) of the NarI recognition sequence (Fig. 1b). We conducted spectroscopic and melting experiments for conformational and thermodynamic analyses and performed NER measurements using the E. coli UvrABC system. The results constitute the first report on the NER of bulky clustered FAAF lesions and of its differential efficiency at different guanine pairs (G1G2, G2G3, and G1G3) of the NarI sequence.

Caution: 2-Aminofluorene derivatives are mutagens and suspected human carcinogens and therefore must be handled with caution.

Preparation and Characterization of FAAF-modified di-adduct ODNs
We reported previously the preparation of 16-mer NarI ODNs (5'- , in which each of the three guanines was adducted by FAAF. 16 We have also documented that incorporating a fluorine atom along the long axis does not affect the overall conformational and thermodynamic profiles of aminofluorene-modified duplexes (e.g., FAF, FAAF). 19,29 FAAF-modified 16-mer di-adduct ODNs were prepared similarly, but with a longer reaction time. 16 The three di-FAAF-modified 16-mer sequences were each annealed with the complementary sequence (5'-GTGATGGCGCCGAGAG-3') to form fully paired duplexes, designated NarI-G1G2, NarI-G2G3, and NarI-G1G3 according to the locations of the FAAF modifications (Fig. 1b).
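A quick way to confirm that the complementary strand forms a fully paired duplex with the NarI 16-mer (sequence given under Model Sequences) is to compare it with the reverse complement of the modified strand. The helper below is a generic sketch, not part of the published protocol.

```python
# Watson-Crick pairing table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the 5'->3' sequence of the strand that pairs with `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


NARI_16MER = "CTCTCGGCGCCATCAC"     # modified strand, 5'-CTCTCG1G2CG3CCATCAC-3'
COMPLEMENTARY = "GTGATGGCGCCGAGAG"  # complementary strand used for annealing

print(reverse_complement(NARI_16MER) == COMPLEMENTARY)
```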

MALDI-TOF characterization of di-adduct ODNs
The modified 16-mer ODNs were characterized following the general procedures for 3ꞌ→5ꞌ and 5ꞌ→3ꞌ exonuclease digestion/MALDI-TOF (time-of-flight) mass spectrometry. 31,32 In general, 1 µL of sample containing approximately 100-250 pmol of modified ODN was used for the digestion. In 5ꞌ→3ꞌ digestion, 0.01 unit of bovine spleen phosphodiesterase
Approximately 10 ODs of a di-FAAF 16-mer ODN was annealed with an equimolar amount of the complementary sequence (5'-GTGATGGCGCCGAGAG-3') to produce a fully paired duplex (Fig. 1b). The duplex samples were then dissolved in 300 µL of a pH 7.0 NMR buffer containing 10% D2O/90% H2O, 100 mM NaCl, 10 mM sodium phosphate, and 100 µM EDTA, and filtered into a Shigemi tube through a 0.2 µm membrane filter. All 1H and 19F NMR spectra were recorded using an HFC probe on a Varian NMR spectrometer operating at 500.0 and 476.5 MHz, respectively, with the usual acquisition parameters described previously. 19,29,33 Imino proton spectra were obtained at 5 °C using a WET-1D pulse sequence and referenced to DSS. 19F NMR spectra were acquired in 1H-decoupled mode and referenced to CFCl3 by assigning external C6F6 in C6D6 to -164.9 ppm. 19F NMR spectra were measured between 5 and 65 °C in increments of 5-10 °C. Temperatures were maintained by a Varian FTS control system.

Substrate Construction and UvrABC Protein Purification
DNA substrates of 55 bp containing two FAAF adducts on two of the three guanine residues were constructed as previously described. 34,35 Briefly, the NarI-G1G2, NarI-G2G3, and NarI-G1G3 16-mer ODNs were individually ligated with a flanking 20-mer ODN (5ꞌ-GACTACGTACTGTTACGGCT-3ꞌ) at the 5ꞌ-end and a 19-mer ODN (5ꞌ-GCAATCAGGCCAGATCTGC-3ꞌ) at the 3ꞌ-end. The 20-mer was 5ꞌ-terminally labeled with 32P. The ligation product was purified by urea-PAGE under denaturing conditions. Following purification, the substrate was annealed to the corresponding complementary strand and then purified on an 8% native polyacrylamide gel.
UvrA, UvrB, and UvrC proteins were overexpressed in E. coli and purified as previously described. 36 The estimated purity of each of the three proteins was greater than 95%.
Protein concentrations were determined with the Bio-Rad Protein Assay, using BSA as the standard, according to the manufacturer's recommended procedure.

Nucleotide Excision Assay and Quantification of Incision Products
The 5ꞌ-terminally labeled DNA substrates were incised by UvrABC as previously described. 34,35 Briefly, the DNA substrates (2 nM
To quantify the incision products, radioactivity was measured using a Fuji FLA-5000 image scanner with MultiGauge V3.0 software. The amount of DNA incised (in fmol) by UvrABC was calculated from the total molar amount of DNA used in each reaction and the ratio of the radioactivity of the incision products to the total radioactivity of the DNA. At least three independent experiments were performed to determine the rates of incision.
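The quantification described above amounts to scaling the total substrate by the fraction of radioactivity found in the incision products. As an illustrative assumption, the sketch takes rates as least-squares slopes of incised DNA versus time and expresses them relative to a reference substrate, in the way the Results quote percentages relative to FAAF NarI-G3. The time points, counts, 20 fmol of substrate, and reference rate are all hypothetical, since the reaction volume is not given here.

```python
def incised_fmol(total_dna_fmol: float, product_counts: float, total_counts: float) -> float:
    """DNA incised = total DNA x (radioactivity in incision products / total radioactivity)."""
    if total_counts <= 0:
        raise ValueError("total radioactivity must be positive")
    return total_dna_fmol * product_counts / total_counts


def least_squares_slope(xs, ys) -> float:
    """Ordinary least-squares slope of ys against xs (e.g. fmol incised per min)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def relative_percent(rate: float, reference_rate: float) -> float:
    """Incision rate as a percentage of a reference substrate's rate."""
    return 100.0 * rate / reference_rate


# Hypothetical time course for one substrate (20 fmol per reaction assumed).
times_min = [0, 5, 10, 15, 20]
product_counts = [0, 800, 1650, 2400, 3250]
total_counts = 40000
incised = [incised_fmol(20.0, c, total_counts) for c in product_counts]
rate = least_squares_slope(times_min, incised)
print(f"rate = {rate:.3f} fmol/min")
print(f"relative to a hypothetical 0.10 fmol/min reference: {relative_percent(rate, 0.10):.0f}%")
```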

Model Sequences
We previously reported that simply mixing the NarI 16-mer ODN (5ꞌ-CTCTCG1G2CG3CCATCAC-3ꞌ) with the chemically reactive FAAF at room temperature produced an approximately equal mixture of mono-, di-, and tri-adducts. 16 A longer reaction time increased the production of di- and tri-adducts, which were clearly separable by RP-HPLC, as shown in Figure 2a. We previously grouped the peaks into three sets, Peaks 1-3 (28-35 min), 4-6 (42-60 min), and 7 (84 min), as mono-, di-, and tri-adducts, respectively, on the basis of the relative absorption intensity (1:2:3, respectively) of the adduct-induced shoulder in the 290-320 nm range. 16 Moreover, we assigned Peaks 1, 2, and 3 as G1, G3, and G2, respectively, using the well-known exonuclease-LC/MS method. 16 In the present study, we characterized Peaks 4, 5, and 6 using MALDI-TOF as described below. The di-adducts were designated NarI-G1G2, NarI-G2G3, or NarI-G1G3, in which the numbers indicate the positions of the FAAF-modified guanines (Fig. 1b).

MALDI-TOF Characterization
The molecular weights of the di-FAAF-modified 16-mers were measured by MALDI-TOF prior to sequence verification by exonuclease digestion. Ionization of ODNs normally occurs through addition of a proton to a phosphate group in the ODN backbone, giving [M+H]+ ions.
Exonucleases cleave terminal deoxynucleotides from the ODN chain until the FAAF-modified dG is exposed at the end of the chain, at which point the digestion slows down significantly. In this case, the position of modification was identified when fragments formed by the loss of an unmodified guanine nucleotide were observed in the MALDI-TOF spectra. Figure 3 (see Fig. 4 for the complete time range) represents the 3ꞌ→5ꞌ (Fig. 3a) and 5ꞌ→3ꞌ (Fig. 3b) ... that the digestion slowed down one base prior to the modification. Taken together, Peak 4 was assigned as NarI-G1G3 (5ꞌ-CTCTCG1(FAAF)G2CG3(FAAF)CCATCAC-3ꞌ).
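The ion assignments for the digestion fragments can be rationalized with average-mass arithmetic. This sketch assumes 5ꞌ-OH/3ꞌ-OH fragments (sum of residue masses minus 61.96 Da), standard average nucleotide residue masses, and a net FAAF addition of C15H10FNO (about 239.25 Da) per modified guanine; these constants are our estimates, not values from the text. For the two doubly modified Peak 5 fragments, the calculated [M+H]+ values fall within about 1 Da of the observed m/z 3452.6 and 3169.5.

```python
# Average residue masses (Da) of 2'-deoxynucleotides in a DNA chain (dNMP minus H2O).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
TERMINI_OH_OH = -61.96  # + H2O - HPO3: residue sum -> 5'-OH, 3'-OH oligonucleotide
FAAF_ADDITION = 239.25  # assumed net C15H10FNO added per FAAF-modified dG
PROTON = 1.007


def mh_plus(seq: str, n_faaf: int = 0) -> float:
    """Average m/z of [M+H]+ for a 5'-OH/3'-OH fragment carrying n_faaf adducts."""
    neutral = sum(RESIDUE_MASS[base] for base in seq) + TERMINI_OH_OH
    return neutral + n_faaf * FAAF_ADDITION + PROTON


def assign_ladder_step(delta_mass: float, tol: float = 2.0):
    """Name the unmodified residue whose mass matches a digestion-ladder step, if any."""
    best = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - delta_mass))
    return best if abs(RESIDUE_MASS[best] - delta_mass) <= tol else None


# Peak 5 (NarI-G2G3) fragments and the m/z values quoted in the text.
PEAK5 = {"GCGCCATCAC": 3452.6, "CTCTCGGCG": 3169.5}
for fragment, observed in PEAK5.items():
    calc = mh_plus(fragment, n_faaf=2)
    print(f"{fragment}: calc {calc:.1f}, obs {observed}, diff {observed - calc:+.1f}")
```

The ladder helper reflects the logic described above: each successive exonuclease step removes one residue, so the mass difference between neighbouring ladder peaks identifies the nucleotide lost.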
Peaks 5 and 6 were characterized in the same manner, and the results are shown in Figures 5 and 6, respectively. The 5ꞌ- and 3ꞌ-digests of Peak 5 exhibited ions at m/z 3452.6 (Fig. 5a) and m/z 3169.5 (Fig. 5b), which correspond to the [M+H]+ ions of the doubly modified 5ꞌ-G2CG3CCATCAC-3ꞌ and 5ꞌ-CTCTCG1G2CG3-3ꞌ fragments, respectively. Therefore, Peak 5 was assigned as NarI-G2G3. The 5ꞌ- and 3ꞌ-digests of Peak 6 exhibited ions at m/z 3777.8 (Fig. 6a) and m/z 2551.6 (Fig. 6b) ...

Circular Dichroism
Figure 7a shows an overlay of the CD spectra for the three FAAF di-adduct NarI-G1G3,- ... increase, the NarI-G2G3 and -G1G3 duplexes actually displayed a decrease in intensity, with the greater impact in the former (Fig. 7a).
In addition, sequence-dependent induced CD in the 290-320 nm range (ICD290-320nm) was seen in all three di-adduct duplexes. Previously, we reported that the FAAF-modified NarI mono-adduct duplexes displayed a negative dip, with the NarI-G2 duplex exhibiting a greater dip than the G1 or G3 duplexes. 16 A similar ICD290-320nm was observed in the di-adduct duplexes: the NarI-G1G3 and -G2G3 duplexes showed a bigger dip than the NarI-G1G2 duplex.

UV Melting Experiments
Figure 7b shows the UV melting profiles of the three di-FAAF duplexes in comparison with the unmodified control at 4.8 µM. All duplexes exhibited typical monophasic, sigmoidal helix-to-coil transitions with an excellent linear correlation (R2 > 0.9) between 1/Tm and ln CT. All modified duplexes were thermally and thermodynamically destabilized relative to the control. Table 1 summarizes the thermal and thermodynamic parameters calculated from the UV-melting experiments, and the results are plotted as the thermodynamic chart in Figure 8. The NarI-G2G3 duplex was the most destabilized, with ΔΔG37°C = 7.8 kcal/mol, ΔΔH = 23.3 kcal/mol, and ΔTm = -17.9 °C. The ΔTm of NarI-G1G3 was -14.1 °C, approximately 4 °C smaller in magnitude than that of NarI-G2G3, but its ΔΔH (24.1 kcal/mol) was slightly higher. However, the large entropy term (ΔΔS = 56.0 eu) compensated for the enthalpy (Fig. 8, Table 1), resulting in an overall free energy destabilization of ΔΔG37°C = 6.8 kcal/mol. In comparison, the stability of NarI-G1G2 was least affected (ΔTm = -10 °C): the weak entropy compensation (ΔΔS = 6.0 eu) for the enthalpy loss (ΔΔH = 6.2 kcal/mol) resulted in a free energy destabilization of ΔΔG37°C = 4.3 kcal/mol (Fig. 8, Table 1).
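The linear relation between 1/Tm and ln CT underlies the standard van't Hoff analysis of UV-melting data. For a non-self-complementary, two-state duplex, 1/Tm = (R/ΔH°) ln(CT/4) + ΔS°/ΔH°, so a straight-line fit yields ΔH° and ΔS°, and ΔG°37 = ΔH° - (310.15 K)ΔS°. The sketch assumes this standard model; the melting data are synthetic, generated from hypothetical parameters. The helper ddg37 applies the same relation to the reported differences and gives about 6.7 kcal/mol for NarI-G1G3 and 4.3 kcal/mol for NarI-G1G2, close to the 6.8 and 4.3 kcal/mol reported.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol K)
T37_K = 310.15                # 37 °C in kelvin


def tm_two_state(dh: float, ds: float, ct: float) -> float:
    """Tm (K) of a non-self-complementary two-state duplex at total strand conc. ct (M)."""
    return dh / (ds + R_KCAL * math.log(ct / 4.0))


def vant_hoff_fit(cts, tms):
    """Fit 1/Tm = (R/dH) ln(CT/4) + dS/dH; return dH (kcal/mol), dS (kcal/(mol K)), dG37."""
    xs = [math.log(c / 4.0) for c in cts]
    ys = [1.0 / t for t in tms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    dh = R_KCAL / slope
    ds = intercept * dh
    return dh, ds, dh - T37_K * ds


def ddg37(ddh: float, dds_eu: float) -> float:
    """Destabilization ddG(37 °C) from ddH (kcal/mol) and ddS (eu = cal/(mol K))."""
    return ddh - T37_K * dds_eu / 1000.0


# Synthetic melting data from hypothetical parameters (dH = -100 kcal/mol, dS = -0.28 kcal/(mol K)).
cts = [1e-6, 2e-6, 4.8e-6, 1e-5, 2e-5]
tms = [tm_two_state(-100.0, -0.28, c) for c in cts]
dh, ds, dg37 = vant_hoff_fit(cts, tms)
print(f"dH = {dh:.1f} kcal/mol, dS = {ds * 1000:.1f} eu, dG37 = {dg37:.2f} kcal/mol")

# Differences quoted in the text for NarI-G1G3 and NarI-G1G2.
print(f"G1G3: {ddg37(24.1, 56.0):.1f} kcal/mol; G1G2: {ddg37(6.2, 6.0):.1f} kcal/mol")
```

The same relation explains the entropy compensation described above: a large positive ΔΔS offsets much of a large ΔΔH, so NarI-G1G3 ends up less destabilized in ΔΔG37°C than its enthalpy alone would suggest.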
Overall, the magnitude of thermal (ΔTm) and thermodynamic (ΔΔG) destabilization was in the order NarI-G2G3 > NarI-G1G3 > NarI-G1G2.

Dynamic 19F NMR Spectroscopy
... was very difficult to assign signals as we had done for the mono-adducts. 16 Nonetheless, all di-adducts exhibited typical dynamic NMR characteristics, i.e., exchange broadening of the 19F signals with increasing temperature and eventual coalescence into a single signal at 65 °C and above, indicating thermal denaturation of the duplex (Fig. 9). In addition, the overall signal patterns were found to depend on the flanking sequence. Figure 10a compares the 19F NMR spectra of NarI-G2G3 with those of its mono-FAAF counterparts (NarI-G2 and -G3) at two temperatures (5 and 20 °C). Similar comparisons were made for NarI-G1G3 and NarI-G1G2 (Fig. 10b and c). We expected the signal pattern of each di-adduct duplex to be the sum of those of the respective mono-adducts, but this turned out not to be the case. It is possible that the two nearby FAAF adducts interact conformationally with each other in complex ways, influencing their conformational profiles.
The signals of NarI-G1G3 and -G2G3 were broad even at 5 °C, indicating conformational exchange. This observation suggests conformational flexibility at the lesion site, consistent with their high entropy values (Fig. 8, Table 1). In contrast, the 19F signals of NarI-G1G2 were comparatively sharp, in line with its low entropy (Table 1). In all three duplexes, increasing the temperature resulted in coalescence into a single signal (~-115 ppm), which represents the free modified single strand.
The duplex melting signal of NarI-G1G2 appeared at ~65 °C, compared with ~60 °C for the other two duplexes, consistent with the UV-melting data above. The 19F conformational complexity was supported by the imino proton spectra (Fig. 11). The NarI-G1G3 and NarI-G1G2 duplexes exhibited multiple signals in the 11-12 ppm region associated with non-Watson-Crick base pairing, which is known to arise from the imino protons of modified guanines. In NarI-G2G3, however, these signals were shifted downfield to 12.2-12.3 ppm.

UvrABC Incisions of NarI FAAF di-adducts
The NER kinetic assay results are shown in Figures 12 and 13, in which the 55-mer FAAF di-adduct duplex substrates were incised by UvrABC nuclease. These substrates were radioactively labeled at the 5ꞌ-end of the adducted strand. The major incision products can be seen as ~18- to 20-mer fragments separated on a urea-PAGE gel under denaturing conditions (Fig. 12). The incision occurred at the 8th phosphodiester bond 5ꞌ to the modified nucleotide, consistent with previously reported results of UvrABC incision. 34,35 Quantitative analysis of the incision products suggests that the di-adduct substrates were incised 1.5-2 times more efficiently than the previously reported mono-adducts (Fig. 13). 16 Moreover, the substrates were incised with different efficiencies depending on where the two damage sites were located in the sequence (Fig. 13).

Impact of sequence context on duplex structure and thermodynamics
The CD results indicate that the overall DNA structure was greatly influenced by how the two bulky FAAF lesions are arranged in the NarI sequence (Fig. 7a). In particular, base stacking, reflected in the positive ellipticity at 270 nm, was sequence-dependent: a significant decrease was seen for NarI-G2G3 and NarI-G1G3, whereas an increase in intensity was noted for NarI-G1G2. A similar sequence effect was observed in the thermal and thermodynamic instability (Fig. 8, Table 1): NarI-G2G3 > NarI-G1G3 > NarI-G1G2. This trend is not surprising, since base stacking is the hallmark of the FAAF-modified S-conformation and is linked to thermal and thermodynamic instability, especially in fully paired duplexes. 19,20 Conversely, the stability of NarI-G1G2 can be attributed to the higher B-conformer populations at G1 (46%) and G2 (57%).
Assuming that each lesion can exist in either the B or the S conformation, there are four major conformational possibilities (BB, BS, SB, and SS), along with numerous intermediate possibilities. Consequently, the 19F NMR spectra (Figs. 9 and 10) of the di-FAAF adducts are complex, and we were unable to make specific conformational assignments.
However, this argument is not consistent with the greater thermodynamic stability observed for the former (Fig. 8, Table 1). It is possible that the closer proximity of the two FAAF lesions in NarI-G2G3 (only one base apart) compared with NarI-G1G3 (two bases apart) induces greater DNA distortion. Note, however, that the most thermodynamically stable duplex, NarI-G1G2, has no base between the lesions. Thus, assessing the overall thermodynamic stability of di-adducts appears to be complicated by various conformational and configurational factors.
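The spacing argument and the BB/BS/SB/SS enumeration above can be made explicit. The sketch lists the number of bases separating each pair of NarI guanines and, as a null model, the two-lesion state populations expected if the lesions behaved independently with their mono-adduct B/S/W populations (shown for NarI-G1G3, giving nine states once W is included). The 19F comparison with the mono-adducts indicates that the lesions do not behave independently, so these numbers are only a baseline; the independence assumption and the helper names are ours.

```python
from itertools import combinations, product

NARI_16MER = "CTCTCGGCGCCATCAC"
G_SITES = {"G1": 6, "G2": 7, "G3": 9}  # 1-based positions in 5'-CTCTCG1G2CG3CCATCAC-3'
assert all(NARI_16MER[p - 1] == "G" for p in G_SITES.values())

# FAAF mono-adduct conformer populations (%) quoted in the text.
MONO_POPS = {
    "G1": {"B": 46, "S": 34, "W": 20},
    "G3": {"B": 13, "S": 61, "W": 26},
}


def intervening_bases(site_a: str, site_b: str) -> int:
    """Number of nucleotides between two modified guanines."""
    return abs(G_SITES[site_a] - G_SITES[site_b]) - 1


def independent_pair_states(site_a: str, site_b: str) -> dict:
    """Null model: % of each two-lesion state if the two lesions did not interact."""
    pa, pb = MONO_POPS[site_a], MONO_POPS[site_b]
    return {ca + cb: pa[ca] * pb[cb] / 100.0 for ca, cb in product(pa, pb)}


for a, b in combinations(G_SITES, 2):
    print(f"NarI-{a}{b}: {intervening_bases(a, b)} base(s) between lesions")

states = independent_pair_states("G1", "G3")
for state, pct in sorted(states.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {pct:.1f}%")
```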

Conformation-specific Nucleotide Excision Repair
As shown in Figure 13, the E. coli UvrABC system incised the di-adducts (G2G3, G1G3, and G1G2) approximately 1.5- to 2.0-fold more efficiently than the mono-adducts (G1, G2, and G3) in the same NarI sequence. This is not surprising considering the extent of thermal/thermodynamic destabilization of the di- vs. mono-FAAF duplexes. It is well known that NER efficiency is highly modulated by the damage recognition (UvrA2) and verification/recognition (UvrB) steps. Jaciuk et al. have shown that UvrA2 does not interact with the DNA lesion directly but senses the structural/conformational perturbations induced by loss of Watson-Crick hydrogen bonding and by thermodynamic destabilization. 23 Moreover, thermodynamic destabilization of the duplex assists UvrB in inserting its β-hairpin into the duplex, which is required for lesion verification/recognition. 24 We reported that the repair of FAAF mono-adducts in the NarI sequence occurs in a conformation-specific manner, i.e., the highly S/W-conformeric G3 and G1 duplexes produced greater DNA distortion and thermodynamic destabilization and thus were incised more efficiently than the B-type G2 duplex (G3 ~ G1 > G2). 16 Therefore, the presence of two FAAF lesions in the same NarI sequence is expected to have a synergistic destabilizing effect. Indeed, as shown in Table 1, the destabilization by the di-adducts (ΔΔG37°C = 4.3 to 7.8 kcal/mol, ΔTm = -10.0 to -17.9 °C) is 1.5- to 2.0-fold greater than that by the mono-adducts (ΔΔG37°C = 3.7 to 4.7 kcal/mol, ΔTm = -5.3 to -8.3 °C). 16 These results contrast with recent reports on oxidative DNA damage, in which reparability is compromised by clustering, leading to enhanced genotoxicity. 7 This dissimilarity may well reflect the complicated processing of oxidative cluster damage, which involves base excision repair, non-homologous end joining, and homologous recombination proteins.
In summary, to the best of our knowledge, the present study represents the first report on the structure-NER relationships of bulky cluster lesions. The di-FAAF adducts were repaired up to 2-fold more efficiently than the corresponding mono-adducts in the NarI sequence, primarily because of their greater thermal and thermodynamic destabilization. Incision of the di-FAAF adducts also occurred in a conformation-specific manner, i.e. the highly syn-conformeric NarI-G2G3 and NarI-G1G3 were repaired more efficiently than the highly anti-conformeric NarI-G1G2. Taken together, these results indicate the importance of carcinogen-induced destacking and the related thermal/thermodynamic destabilization in the NER of di-FAAF adducts.

Funding:
This work was supported by the National Institutes of Health (grant numbers CA098296 to BC and CA86927 to YZ), the National Science Foundation/RI-EPSCoR (0554548) and the RI-INBRE core facility supported by the National Institutes of Health (P20 RR016457).

[...] (3). Metabolic activation of these amines in vivo produces C8-substituted dG as the major DNA adducts (4). For example, the human bladder carcinogen 4-aminobiphenyl produces ABP (Fig. 1a). Similarly, AF and AAF are the major DNA adducts derived from 2-aminofluorene, 2-nitrofluorene, and 2-acetylaminofluorene (Fig. 1a). The ABP and AF adducts in fully paired duplex DNA have been shown to adopt an equilibrium of two prototype conformers: "B-type", in which the carcinogen resides in the major groove of a relatively unperturbed double helix; and "stacked (S)", in which the carcinogen is base-displaced and the glycosidic linkage of the modified guanine is syn (Fig. 1c) (5,6). The aromatic moieties of ABP are not coplanar as they are in AF, which results in a much lower S-state population than for AF. The AF-induced S/B-heterogeneity depends on the flanking sequence, which modulates mutational and repair outcomes (6,7). AAF is chemically identical to AF except for a single acetyl group on the central nitrogen (Fig. 1a), which leads to sampling of an additional W-conformation, in which the fluorene moiety lies in the minor groove with a syn glycosidic linkage (Fig. 1c) (19). We recently conducted E. coli UvrABC NER studies on NarI sequence duplexes (5ꞌ-G1G2CG3CC-3ꞌ) in which the guanines are modified by either AF or AAF (7). The results showed that the bulky AAF adducts are repaired in a conformation-specific manner, with the highly S/W-conformeric G3 and G1 duplexes incised considerably more efficiently than the highly B-conformeric G2 duplex (G3 ~ G1 > G2).

On the other hand, the repair rate of N-deacetylated AF was 2- to 3-fold lower than that of AAF, and the order of incision efficiencies was the opposite of that observed for AAF. We have coined the term "N-acetyl factor" to describe the complexity of NER recognition of AF vs. AAF (7). Here
An identical set of unmodified duplexes was also prepared as controls.

Differential Scanning Calorimetry (DSC):
DSC measurements were performed using a VP-DSC microcalorimeter from Microcal Inc. (Northampton, MA) according to previously published procedures (22). All sample solutions were at 0.12 mM. Tm was taken as the temperature at half the peak area. ΔG and ΔS values were determined by the procedures of Chakrabarti et al. (25). The uncertainties in Tm, ΔH, ΔG and ΔS represent the random errors inherent in the DSC measurements.
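The relation between a DSC peak, Tm (temperature at half the peak area) and the derived parameters can be illustrated with a minimal sketch. It assumes a baseline-subtracted excess heat-capacity curve and a two-state, unimolecular transition with temperature-independent ΔH (ΔCp = 0), so that ΔS = ΔH/Tm and ΔG37°C = ΔH - 310.15·ΔS. The procedure of Chakrabarti et al. (25) is not reproduced here and may include concentration-dependent terms for bimolecular duplex melting; the function name dsc_two_state is hypothetical.

```python
def dsc_two_state(temps_c, excess_cp):
    """Estimate Tm, dH, dS and dG37 from a baseline-subtracted DSC curve.

    temps_c   : ascending temperatures in deg C
    excess_cp : excess heat capacity at each temperature, kcal/(mol*K)
    Assumes a two-state, unimolecular transition with dCp = 0 (illustrative only).
    """
    # Cumulative peak area (trapezoidal rule); the total area is the calorimetric dH.
    cum = [0.0]
    for i in range(1, len(temps_c)):
        dt = temps_c[i] - temps_c[i - 1]
        cum.append(cum[-1] + 0.5 * (excess_cp[i] + excess_cp[i - 1]) * dt)
    dh = cum[-1]
    if dh <= 0:
        raise ValueError("no positive transition peak")

    # Tm = temperature at half the peak area, by linear interpolation of the cumulative area.
    half = dh / 2.0
    for i in range(1, len(cum)):
        if cum[i] >= half:
            frac = (half - cum[i - 1]) / (cum[i] - cum[i - 1])
            tm_c = temps_c[i - 1] + frac * (temps_c[i] - temps_c[i - 1])
            break

    tm_k = tm_c + 273.15
    ds = dh / tm_k                      # kcal/(mol*K), since dG(Tm) = 0 in this model
    dg37 = dh - 310.15 * ds             # dG at 37 deg C, kcal/mol
    return tm_c, dh, ds * 1000.0, dg37  # dS returned in eu (cal/(mol*K))
```

A call such as `tm, dh, ds, dg = dsc_two_state(temps, cp)` returns Tm in °C, ΔH and ΔG37°C in kcal/mol and ΔS in eu; differences between modified and control duplexes then give ΔTm, ΔΔH, ΔΔS and ΔΔG.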

Dynamic 19F NMR:
Duplex samples (about 20-30 ODs) were dissolved in 300 μL of pH 7.0 buffer (100 mM NaCl, 10 mM Na3PO4, and 100 μM EDTA in 10% D2O/90% H2O) and filtered through a 0.2 μm membrane filter into a Shigemi tube. All 1H and 19F NMR experiments were conducted using a dedicated 5-mm 19F/1H dual probe on a Bruker DPX400 Avance spectrometer operating at 400.0 and 376.5 MHz, respectively. Imino proton spectra were obtained at 5 °C using a phase-sensitive jump-return sequence. 19F NMR spectra were acquired in the 1H-decoupled mode and referenced to CFCl3 by assigning external C6F6 in C6D6 at -164.90 ppm. Temperature-dependent spectra were processed as previously reported (20,27).

EMSA assay:
The FABP-, FAF-, and FAAF-modified 19-mer GCT and GCA sequences (100 nM each) were annealed with an equimolar amount of the complementary strand, which had been 5ꞌ-end-labeled with γ-32P using T4 polynucleotide kinase and [γ-32P] ATP (Perkin-Elmer Radiochemical, Boston, MA), in a buffer containing 25 mM NaCl and 25 mM Tris/HCl. The mixture was heated at 95 ˚C for 5 minutes and then cooled to room temperature overnight. The duplexes were subjected to 15% non-denaturing polyacrylamide (acrylamide:bisacrylamide, 29:1, w/w) gel electrophoresis at 1,800 V, and the temperature was maintained at 4-8 ˚C by regularly replacing the running buffer with ice-cold TBE buffer. Gels were exposed to a Kodak phosphor imaging screen overnight and scanned on a Typhoon 9410.

Nucleotide Excision Assay:
DNA substrates (58 bp) containing a single FABP, FAF, or FAAF adduct in either the G*CT or G*CA sequence context were constructed as previously described (28,29).

Model Systems
As model systems, 11-mer DNA duplexes [d(5ꞌ-CCATCG*CNACC-3ꞌ)·d(5ꞌ-GGTNGCGATGG-3ꞌ)] were prepared, in which G* is FABP, FAF or FAAF and N is either dA or dT (designated the G*CA and G*CT duplexes, respectively) (Fig. 1b). The two sequences are chemically isomeric, differing only in the polarity of the 3'-next flanking base pair (A:T vs. T:A). The utility of fluorine-tagged lesions as effective structural probes has been documented (32). Both the G*CA and G*CT sequences have been used previously in studies of bulky adducts (22,33). [...] duplexes at 20°C for the G*CT and G*CA sequence contexts (see Fig. S1 for full temperature ranges). 19F signal assignments were made based on the H/D solvent effect, exchange spectroscopy, adduct-induced CD (ICD290-350 nm), and chemical shifts as previously described (6,26,32,34).

FABP-Duplexes:
A clear conformational difference exists between the two isomeric FABP-modified G*CA and G*CT duplexes (Fig. 2a). The single signal at -116.9 ppm for FABP-G*CA has been previously assigned to the B-conformer (22). In contrast, FABP-G*CT exhibited two signals, at -116.9 (B) and -118.0 (S) ppm, in a 40:60 ratio, which underwent two-site exchange (EXSY spectra at 5 and 17°C, inset, Fig. S1a). The large chemical shift gap (~1 ppm) between the two signals suggests distinct electronic environments. In addition, upon increasing the D2O content from 10 to 100%, the -116.9 ppm signal showed a larger H/D effect (+0.24 ppm) than the -118.0 ppm signal (+0.14 ppm) (data not shown). These results indicate that the fluorine atom is solvent-exposed in the B-conformer, as observed in the MD/PMF simulations (Fig. S2).
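A 19F population ratio such as the 40:60 B:S ratio above can be turned into an apparent free-energy difference if the two signals are treated as a simple two-state B ⇌ S equilibrium in slow exchange, with K = [S]/[B] and ΔG = -RT ln K. The minimal sketch below does this under that assumption; the function names are hypothetical, and the temperature (20 °C) and the resulting number are illustrative only, not a value reported in this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def conformer_populations(integrals):
    """Normalize 19F peak integrals {conformer: area} to fractional populations."""
    total = sum(integrals.values())
    return {name: area / total for name, area in integrals.items()}

def dg_b_to_s(pop_b, pop_s, temp_c):
    """Apparent dG (kcal/mol) for B -> S, from K = [S]/[B] and dG = -RT ln K."""
    k_eq = pop_s / pop_b
    return -R_KCAL * (temp_c + 273.15) * math.log(k_eq)

# Illustration with the 40:60 B:S integrals, assuming a two-state model at 20 deg C
pops = conformer_populations({"B": 40.0, "S": 60.0})
print(pops, round(dg_b_to_s(pops["B"], pops["S"], 20.0), 2))
```

Under these assumptions the result is about -0.24 kcal/mol, i.e. the S-conformer is only slightly favored; a negative value means S dominates and a positive value means B dominates.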

FAF-Duplexes:
Though not as dramatic, a similar sequence effect was observed for FAF.

Induced Circular Dichroism (ICD)
Figure 2d-f show the CD spectra of the modified G*CA and G*CT duplexes. We have reported that B- and S-conformers are characterized by negative and positive ICD290-350 nm, respectively (26). Accordingly, the B-conformeric FABP-G*CA displayed a strongly negative ICD290-350 nm, whereas an S-shaped curve was observed for the S/B-mixture G*CT duplex (Fig. 2d). These results are in good agreement with the 19F NMR results (Fig. 2a). Unlike FABP, FAF in both sequences exhibited a strong positive ICD290-350 nm, with the effect much greater for G*CT (Fig. 2e), consistent with the greater S-conformer population determined by 19F NMR (Fig. 2b). The ICD of FAAF (Fig. 2f), which is confined to the narrow 290-320 nm range, is not as clearly defined as that of FAF (8).
In addition, the modified duplexes displayed significant blue shifts relative to their respective control duplexes (Fig. S5a,b and Table 1). All except FAF-G*CT exhibited significant blue shifts of up to 8 nm. The bulky N-acetylated FAAF exhibited greater shifts (ΔλG*-G = 4-8 nm) than FAF and FABP (ΔλG*-G = 0-4 nm). The GCA sequences, which are prone to the B-conformer, displayed greater blue shifts than the GCT sequences (ΔλG*CA-G*CT = 2-4 nm, Table 1). It is well known that protein-induced DNA bending produces a significant shift in the 275 nm CD band of regular B-type DNA (36-38). For instance, the HMG box protein SOX-5 bends DNA by ~74˚ upon binding, which results in a significant blue shift in CD (37). These reports suggest that the blue shifts observed in the present study result from distortion of the DNA backbone, particularly bending.

Gel Mobility Assay
Two 19-mer sequences (5ꞌ-CTTACCATCGCNACCATTC-3ꞌ, N = T or A) were used to investigate the impact of the A/T polarity swap at position N on the gel mobility of the modified duplexes. The 11-mer sequences above were tried initially, but they denatured in the 15% native polyacrylamide gel at 1,800 V (data not shown). Figure 3 compares the electrophoretic mobility of the 19-mer GCA and GCT sequences with and without modifications. The differential mobility of the single-stranded (ss) and double-stranded (ds) oligonucleotides confirmed the integrity of the duplexes (Fig. 3). The modified duplexes exhibited lesion-dependent retardation in mobility. In both sequences, the greatest retardation was observed for FAAF, followed by FABP, whereas no retardation was observed for FAF, results consistent with the CD blue-shift data above [...] (41).

Thermodynamics
Thermodynamic parameters from UV melting (Table S1) are comparable to those from differential scanning calorimetry (DSC; Table 2), which does not depend on melting patterns or stoichiometry (22). Figure S6 shows the DSC thermograms of the modified duplexes in the G*CA and G*CT sequences relative to the unmodified controls. These curves were transformed into the corresponding thermodynamic charts (Fig. 4a,b), and the results are tabulated in Table 2. According to the NMR results (Fig. 2), FABP and FAF display an S/B-equilibrium, whereas FAAF produces a more complex S/B/W-equilibrium; they are therefore compared separately.

FABP/FAF-DNA Duplexes:
Both FABP and FAF destabilized the duplexes (Fig. 4, Table 2). FABP lowered the Tm of the G*CA and G*CT duplexes by 10.2 and 11.2 °C, respectively, with ΔΔG37°C values of 2.4 and 2.9 kcal/mol. The G*CA/G*CT transition had a major effect on ΔΔH (2.7 vs. 10.8 kcal/mol) and ΔΔS (0.8 vs. 25.6 eu), consistent with a significant increase in the S-conformer population from 0 to ~60% (Fig. 2a). The B-conformer FABP-G*CA is expected to involve only a small entropy compensation; consequently, the enthalpy reduction dominated the free-energy destabilization (Fig. 4a).
As expected, the structural disturbance in the S/B-mixture FABP-G*CT duplex leads to a considerable reduction in melting enthalpy; however, most of it is compensated by entropy (Fig. 4b).
FAF modification resulted in a similar destabilization: ΔTm of -9.1 and -9.6 °C and ΔΔG37°C of 2.7 and 2.9 kcal/mol for G*CA and G*CT, respectively. However, [...] (Table S2).

FAAF-Duplexes:
FAAF modification resulted in the largest reductions, with ΔTm of -21.8 and -23.3 °C and ΔΔH of 39.4 and 41.1 kcal/mol for the G*CA and G*CT sequence contexts, respectively (Fig. 4a, 4b and Table 2). This is attributed to the bulky acetyl group of FAAF (Fig. S2). As with FABP and FAF, however, entropy compensation offset much of the enthalpic loss, giving ΔΔG37°C = 5.6 and 5.4 kcal/mol for G*CA and G*CT, respectively.
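The entropy-enthalpy compensation discussed above follows from ΔΔG37°C = ΔΔH - TΔΔS with T = 310.15 K. The sketch below assumes all ΔΔ values are defined as control minus modified (positive = destabilizing) and uses hypothetical function names; it recomputes the FABP ΔΔG37°C values from the quoted ΔΔH and ΔΔS, and back-calculates the entropic term implied by the quoted FAAF-G*CA ΔΔH and ΔΔG.

```python
T37 = 310.15  # 37 deg C in kelvin

def ddg37(ddh_kcal, dds_eu):
    """ddG at 37 C (kcal/mol) = ddH - T*ddS, with ddS in eu (cal/(mol*K))."""
    return ddh_kcal - T37 * dds_eu / 1000.0

def implied_t_dds(ddh_kcal, ddg_kcal):
    """Entropic term T*ddS (kcal/mol) implied by a measured ddH and ddG at 37 C."""
    return ddh_kcal - ddg_kcal

# FABP values quoted in the text: (label, ddH in kcal/mol, ddS in eu)
for label, ddh, dds in [("FABP-G*CA", 2.7, 0.8), ("FABP-G*CT", 10.8, 25.6)]:
    print(label, "ddG37 =", round(ddg37(ddh, dds), 2), "kcal/mol")

# FAAF-G*CA: ddH = 39.4 and ddG37 = 5.6 kcal/mol
t_dds = implied_t_dds(39.4, 5.6)
print("FAAF-G*CA T*ddS =", round(t_dds, 1), "kcal/mol;",
      "fraction of ddH compensated =", round(t_dds / 39.4, 2))
```

The FABP values come out at about 2.45 and 2.86 kcal/mol, in line with the tabulated 2.4 and 2.9 kcal/mol given rounding of the inputs; for FAAF-G*CA, roughly 86% of the 39.4 kcal/mol enthalpic loss is offset by the entropic term.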

Molecular Dynamics (MD)/Potential of Mean Force (PMF) Calculations
To further understand the impact of lesion modification on the G*CA and G*CT duplexes, MD simulations were performed in combination with PMF calculations. The PMF calculations yield the free energy as a function of the extent of flipping of the modified G* base (Fig. S7). Individual PMFs were determined with G* in either the anti or syn orientation, while only the anti orientation was studied for the unmodified duplexes. [...] with the conformational thermodynamic data discussed above. As such, the flipped state has its lowest free energy with FAAF (Table 2), and syn FAAF may sample a wider range of conformations than FABP and FAF, consistent with the 19F NMR data (Fig. 2). Figure S15 shows the bending probability distributions for the B-, S- and W-states of the three lesions in both the G*CA and G*CT contexts. In the B- and S-states, the extent of bending is significantly larger with FAAF (cyan) than with FABP (red) and FAF (blue).
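The link between a PMF and the sampled distribution of a flipping coordinate can be illustrated by Boltzmann inversion, F(x) = -kT ln P(x). This is a minimal sketch that assumes unbiased, well-converged sampling; base-flipping PMFs such as those described above are normally obtained with biased methods (for example umbrella sampling combined with WHAM), which this example does not implement. The function name and the choice of coordinate are hypothetical.

```python
import math
from collections import Counter

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol*K)

def pmf_from_samples(samples, bin_width, temp_k=300.0):
    """Free-energy profile F(x) = -kT ln P(x), shifted so that its minimum is zero.

    samples: values of a collective variable (e.g. a hypothetical base-flipping
    angle in degrees) from an unbiased simulation. Returns {bin_center: F}.
    Empty bins are omitted because their free energy is undefined.
    """
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = len(samples)
    kt = KB_KCAL * temp_k
    pmf = {(b + 0.5) * bin_width: -kt * math.log(n / total) for b, n in counts.items()}
    f_min = min(pmf.values())
    return {x: f - f_min for x, f in sorted(pmf.items())}
```

Bins visited four times as often as another bin end up kT ln 4 (about 0.83 kcal/mol at 300 K) lower in free energy, which is the sense in which a lower syn-G* PMF corresponds to a more populated flipped state.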
These results are consistent with the greater CD blue shifts observed experimentally (Fig. S5, Table 1), although in G*CA the changes occur only in the S-state (Fig. S15). In addition, the simulations indicate that the extent of bending for FABP is similar to that for FAF. The significantly greater bending with FAAF is consistent with the greater duplex destabilization caused by this lesion (Table 2).
Regarding bending, calculation of local helicoidal parameters revealed significant differences in twist and tilt at base pair 8, where the A/T switch occurs (Table S7). Twist values are systematically larger and tilt values less negative in GCT than in GCA sequences. These differences suggest that the local structural alteration associated with the A/T switch propagates to the overall helix.

E. coli UvrABC incision
DNA substrates containing lesions in the defined sequences were incised by the E. coli UvrABC system in a kinetic assay. The substrates were radioactively labeled at the 5ꞌ-end of the adducted strand, and the major incision products were separated on a denaturing urea-PAGE gel (Fig. S16). Incision occurred at the 8th phosphodiester bond 5ꞌ to G*, consistent with the currently accepted mechanism of UvrABC-based NER (28,29). The incision efficiency depended not only on the type of DNA adduct but also on the sequence context (Fig. 6). Specifically, the G*CA substrates were incised ~2-fold faster than the G*CT substrates, and the order of incision rates for both sequences was FAAF > FAF ≈ FABP, with FAAF incised 2- to 3-fold more efficiently than the other lesions.
It should be noted that the 5′-incision products appeared as doublet bands (Fig. S16).
Similar incision products have been observed previously for this type of lesion (35,46,47).
This is likely due either to the chemical nature of arylamine lesions or to the structural heterogeneity of these lesions demonstrated in the present and previous studies, which suggests that UvrABC may make the 5′-incision at sites differing by one nucleotide for the different conformers of an arylamine lesion.
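The ~2-fold and 2- to 3-fold comparisons above are ratios of incision rates. The text does not state how rates were extracted from the gels, so the following is only a sketch under two assumptions: the fraction incised is computed from background-corrected band intensities (summing both bands of a 5′-incision doublet), and the rate is the slope of the early, linear part of the time course. Function names are hypothetical and the numbers in the usage lines are toy values, not data from this study.

```python
def fraction_incised(incised_bands, intact):
    """Fraction of substrate incised in one lane.

    incised_bands: intensities of the incision product(s), e.g. both bands of a
    5'-incision doublet; intact: intensity of the uncut substrate band.
    """
    cut = sum(incised_bands)
    return cut / (cut + intact)

def initial_rate(times, fractions):
    """Least-squares slope through the origin of fraction incised vs time."""
    sxy = sum(t * f for t, f in zip(times, fractions))
    sxx = sum(t * t for t in times)
    return sxy / sxx

# Toy usage: a fold difference is the ratio of two initial rates.
rate_a = initial_rate([0, 5, 10, 20], [0.0, 0.05, 0.10, 0.20])
rate_b = initial_rate([0, 5, 10, 20], [0.0, 0.025, 0.05, 0.10])
print(round(rate_a / rate_b, 2))
```

Fitting through the origin reflects the assumption that no incision is present at t = 0; if the data curve at later times, a first-order fit to the full time course would be the more appropriate choice.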

Conformational and Thermodynamic Effects on the G*CA/G*CT Transition
The NMR/ICD results indicate that lesion stacking is considerably affected by a polarity swap at the 3'-next flanking base (GCA → GCT). The effect was most significant for FABP, which showed a dramatic increase in the S-conformer population (0% to 60%) (Fig. 2a). This is an extraordinary DNA sequence effect. A similar trend was observed for FAF, although the S-conformer population was only 24% greater in G*CT than in G*CA (Fig. 2b). Unlike FABP and FAF, FAAF was minimally affected by the A/T swap; specifically, the combined syn-glycosidic S- and W-conformer population remained relatively unchanged (~78% to ~83%) (Fig. 2c). Interestingly, the increase in the W-conformer (14% to 30%) appeared to be compensated by a concomitant decrease in the S-conformer (64% to 53%). These data indicate that the N-acetyl group in FAAF can push the low-energy syn-S/W-equilibrium towards W (see "N-acetyl factor" below). Overall, these results indicate that the A/T swap has the largest impact on the most stable system, while the least stable FAAF lesion is the least affected.
As expected, all modified duplexes were destabilized relative to the controls (Fig. 4, Table 2), in the order FAAF > FAF ≈ FABP. The G*CA/G*CT transition led to further destabilization, associated with increased lesion stacking (greater S/W populations) for all three lesions. A higher population of the syn-S-/W-conformer states is expected to disrupt the double helix, significantly reducing the enthalpy with a compensatory increase in entropy (22).
Moreover, the magnitudes of change in conformer populations correlated well with the changes in incision efficiency among the lesions. In FABP, the A/T polarity swap caused a 60% change in the S-conformer proportion and a 3-fold reduction in repair efficiency. The changes were significantly smaller for FAF and FAAF (24% and 5-15%, respectively), as were the reductions in repair efficiency (2.0- and 1.8-fold, respectively). At first glance, the results suggest that B-conformers are more reparable than S-conformers.
This trend clearly contrasts with that observed previously for AF and AAF in certain sequence contexts, i.e., the S-conformer is more reparable than the B-conformer (7,35). Such conformation-specific repair is not restricted to arylamines but also applies to other bulky lesions. For instance, Geacintov et al have reported that the base-displaced cis-N2-dG adducts of benzo[a]pyrene are incised more efficiently than the minor-groove-oriented trans-N2-dG adducts (33).
The E. coli repair results in the present study, however, correlate well with adduct-induced DNA bending/distortion, as evidenced by CD blue shifts (Table 1) and retarded mobility in EMSA (Fig. 3). Slowed mobility suggests flexibility at the lesion site, as observed by Tsao et al for the (+)-trans-anti-[BP]-N2-dG lesion in the TG*T sequence context with concomitant thermal destabilization (41,48). Similarly, the bulky N-acetylated FAAF exhibited significantly slower electrophoretic mobility than FAF and FABP in the same sequence context. As for sequence, the G*CA duplex consistently exhibited greater bending than its G*CT counterpart, with the effect significantly greater for FAAF than for FAF and FABP. A similar CD pattern has been reported for AAF-modified NarI duplexes and related to the formation of a B-Z junction (49).
Clearly, the A/T swap shifts the conformational equilibrium from anti (B-) to syn (S- or W-).
It should be noted that the G*CA (--TCG*CAA--) sequence contains a stretch of alternating pyrimidine:purine bases, which are predisposed to DNA bending (50-52). In contrast, this stretch is interrupted in the highly S-conformeric G*CT (--TCG*CTA--) sequence. The B-conformer may facilitate DNA bending because the carcinogen moiety is exposed to the hydrophilic environment of the major groove. In both sequences, the largest effect was observed with FAAF, followed by FABP and FAF (Fig. 3). The MD/PMF simulations indicate, however, that the major changes in G*CA occur in the S-state. Also, unlike the CD data, the electrophoretic mobility did not differ significantly between the two sequences (Fig. 3). The reason for the inconsistency among the mobility, CD, and MD data regarding the sequence effect is not apparent, but the greater bending and flexibility of FAAF relative to FABP or FAF is in good agreement with the observed repair efficiencies (FAAF >> FAF ≈ FABP; Fig. 6).
The repair results in Figure 6, together with previously reported work on polycyclic aromatic hydrocarbons (41,53) and arylamines (7), indicate that lesion-induced DNA destabilization is a major determinant of repair. However, these lesions were consistently repaired 2-3 times more efficiently in G*CA than in G*CT, which is not consistent with their relative thermodynamic stabilities. The inconsistency is likely due to the second step of damage recognition (54), which becomes much more significant for FAAF than for FABP and FAF within a given sequence. Unlike the initial damage recognition step by UvrA2, which depends on DNA conformation and sequence, the second recognition step is well known to involve direct interaction of UvrB with the adduct itself upon DNA strand opening (47,54-56). In other words, the structure and chemistry of the lesion matter more to UvrB than to UvrA2. Recently, Liu et al reported the NER incision efficiencies of bulky benzo[a]pyrene and equine estrogen substrates using human HeLa cell extracts and bacterial UvrABC proteins (53). They demonstrated that, despite their differences, the eukaryotic XPC-RAD23B and prokaryotic UvrB proteins share a common feature of β-hairpin intrusion for damage recognition. In addition, they found that local thermodynamic destabilization near the lesion site assists insertion of the β-hairpin and thus recognition.
The present study shows that thermodynamic destabilization of the DNA duplex, along with lesion flexibility, promotes strand opening and thus the second step of damage recognition. The N-acetyl group (see below) may make FAAF more efficiently recognized than FAF and FABP at the second step, owing to its flexibility and greater destabilization of the double helix. As for the G*CA/G*CT transition, the initial recognition step by UvrA2 should be the major determinant, since the same type of lesion is expected to be recognized with the same efficiency at the second step. Thus, bending appears to be an important factor in DNA damage recognition. Indeed, a recent crystal structure study by Jaciuk et al (10) found that, in the active site of UvrA, the fluorescein-modified duplexes were bent by ~15˚, a structure related to the kinked structures of psoralen and PAH adducts determined by NMR (57).
They concluded that UvrA2 does not make direct chemical contacts with the lesion per se, but indirectly senses the overall helical distortion (unwinding and bending) (10). Since energy is required for bending, formation of pre-bent DNA [...] including the 3'-next flanking base, are identical to those used in the present study, they did not consider the structural and repair consequences of the G*CA/G*CT swap.

N-Acetyl Factor
Though a relatively small modification, the N-acetyl group has important structural consequences. As shown in Figure 7, the absence of the acetyl moiety in G*CT-FAF allows the G* moiety (red licorice representation) to point away from the sugar and remain in the plane of the GC base pair, with the N-H bond directed towards the sugar. In G*CT-FAAF, however, the acetyl group sterically clashes with the sugar moiety of G* (black arrow), forcing the fluorene moiety (cyan) to lie perpendicular to the G* ring system. This persistent "perpendicular" lesion orientation is predicted to disrupt the DNA duplex more. A similar difference in the orientation of AF and AAF was reported by Mu et al (19), who conducted an NER study of these lesions in human HeLa cells. MD simulations in that work indicated that the greater repair susceptibility of AAF stems from steric hindrance by the acetyl group, which significantly diminishes the stabilizing adduct-base van der Waals stacking interactions relative to AF. The persistent "perpendicular" FAAF orientation could raise the conformational barriers of FAAF, resulting in overall lower free energies in the syn-G* PMFs for FAAF compared with FABP and FAF. In other words, the N-acetyl group in FAAF could act as a "conformational locker" (7) that orients the adduct in a position leading to greater destabilization of the DNA duplex (Fig. 4, Table 2), as well as the increased bending observed in the CD (Table 1) and mobility assays (Fig. 3). As a result, FAAF lesions are repaired at a significantly greater rate than the FABP and FAF lesions (7).

Conclusion
The A to T polarity swap in the arylamine-modified G*CA/G*CT transition produced a dramatic increase in the destabilized stacked conformation but, unexpectedly, a 2- to 3-fold lower NER efficiency. These results are consistent with lesion-induced DNA bending/distortion. Among the lesions, FAAF was repaired 3-4 times more efficiently than FABP and FAF, consistent with the extent of bending and helix destabilization, as well as the steric constraint in the duplex ("N-acetyl factor") (7). A number of damage recognition parameters have been implicated in the molecular mechanisms of NER (9,55,58), and thermal/thermodynamic destabilization and DNA distortion/bending are known to be important factors for damage recognition by repair proteins (9,39,53). The present results show that lesion-induced DNA bending/thermodynamic destabilization is a more important NER factor than the S/B conformational heterogeneity that was found to govern repair of AF and AAF in certain sequence contexts (7,35). The present work identifies a novel 3'-next flanking sequence effect as a unique NER factor for bulky arylamine lesions in E. coli.

[...] respectively. 5,6 In duplex DNA, ABP and AF adducts exist in an equilibrium of two major conformers: the anti-glycosidic "B-type", in which the adduct lies in the major groove of the duplex, and the syn-glycosidic "stacked (S)", in which the carcinogen is base-displaced at the adduct site (Fig. 1b, AF as an example). 7-10 However, because of the non-coplanarity of its biphenyl moiety, ABP does not stack as well as AF and therefore predominantly adopts the B-conformation. 8 The B- and S-conformeric orientation of arylamine adducts is influenced not only by the planarity of the adduct structure but also by the nature of the bases surrounding the lesion.
7,9-14 Strong evidence in the literature has bolstered the hypothesis that the conformational profile of an adduct in a particular sequence context plays a significant role in determining its biological fate, such as mutation and repair. 9,11,15-17 For instance, we have recently reported that arylamine lesions undergo nucleotide excision repair (NER) in a conformation-specific manner, in which the more distortive syn-glycosidic S-conformeric FAAF is repaired more efficiently than the less distortive B-conformer. 11 The involvement of structural and conformational factors is not limited to repair but has also been reported for mutational outcomes. For instance, AF and AAF are known to induce different mutational spectra: the former induces solely point mutations, whereas the latter results in a mixture of point and frameshift mutations. 18 These mutagenic dissimilarities have been attributed to their structural and conformational differences.
Furthermore, the mutagenicity of a particular adduct is also affected by its location within the DNA template, i.e. the neighboring base sequence context. One of the most striking reports of a sequence effect involves the NarI sequence (5ꞌ-G1G2CG3CN-3ꞌ), where AF and AAF adducts yield a significantly higher frequency of -2 frameshift mutations only when bound to the third guanine (G3). 19 Moreover, the mutational frequency was found to be governed by the base at position N: C at position N resulted in a higher mutation rate than T, suggesting the involvement of the next-flanking base. 20 Structural studies have shown that the AF adduct adopts the stacked (S) conformer when N = C, but displays conformational heterogeneity when C is replaced by T. 9 Recently, we discovered that the aforementioned 3ꞌ-next flanking T effect on adduct conformation is not limited to AF or the NarI sequence, but also persists in FABP or FAAF [...]

Previously, we studied chemically simulated TLS, although in different sequence contexts (CG*A and TG*A), to deduce the conformational behavior of the FAF adduct at the replication fork. 21 We found that the conformational heterogeneity exhibited by FAF in the duplex setting also exists at the replication fork and that the S-conformeric FAF in the CG*A sequence context favors the insertion of A over C at the lesion site. In the present work, we carried out a similar simulated TLS study on the aforementioned G*CT and G*CA duplexes to investigate the impact of the 3ꞌ-next flanking base on the conformation of [...]

Approximately 15 ODs of a modified template were annealed with an equimolar amount of a complementary primer sequence to produce different duplexes (Fig. 1c and 1d) [...] Avance spectrometer operating at 400.0 and 376.5 MHz, respectively, using acquisition parameters described previously.
9,10,24,25 Imino proton spectra were obtained at 5°C using a phase-sensitive jump-return sequence and referenced to DSS. 19F NMR spectra were acquired in the 1H-decoupled mode and referenced to CFCl3 by assigning external C6F6 in C6D6 at -164.9 ppm. 19F NMR spectra were measured between 5 and 60 °C in increments of 5-10 °C. Temperatures were maintained by a Bruker-VT unit with the aid of controlled boiling liquid N2 in the probe.

Primer Extension Assay:
Standing start experiment: Steady-state kinetic experiments were performed as described previously. 21
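Steady-state standing-start data are conventionally analyzed by fitting initial velocities at varying dNTP concentrations to the Michaelis-Menten equation and comparing insertion efficiencies (Vmax/Km) of an incorrect vs. the correct nucleotide. Because the method is only cited here (21), the following is a sketch under that assumption: it uses a Hanes-Woolf linearization ([S]/v vs [S]) for simplicity, whereas nonlinear least-squares fitting is generally preferable for real data, and all names are hypothetical.

```python
def fit_michaelis_menten(conc, rates):
    """Estimate (Vmax, Km) from a Hanes-Woolf plot: [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(conc)
    ys = [s / v for s, v in zip(conc, rates)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares line through ([S], [S]/v)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km

def insertion_frequency(vmax_wrong, km_wrong, vmax_right, km_right):
    """f_ins = (Vmax/Km)_incorrect / (Vmax/Km)_correct."""
    return (vmax_wrong / km_wrong) / (vmax_right / km_right)

# Self-check with velocities generated from Vmax = 2.0, Km = 5.0 (illustrative only)
conc = [1.0, 2.0, 5.0, 10.0, 20.0]
rates = [2.0 * s / (5.0 + s) for s in conc]
print(fit_michaelis_menten(conc, rates))
```

For example, comparing dATP and dCTP insertion opposite G* would give f_ins for A relative to C; a value above 1 would indicate preferential insertion of A.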

Results:
Model Systems: The original 11-mer G*CT sequence was modified (Fig. 1c) to study the effect of flanking bases on the conformational heterogeneity induced by the 3ꞌ-next flanking T. As shown in Fig. 1c, the flanking cytosines were systematically replaced with G, T or A. The templates were modified with FABP and annealed with their respective complementary sequences for 19F NMR and CD. Figure 1d shows the G*CN (N = A or T) sequences designed for the simulated TLS system. Here, the original sequences were extended from 11-mer (Fig. 1c) to 19-mer (Fig. 1d) to impart stability to the n-1 and n duplexes. Four model template−primer duplexes (Fig. 1d) were prepared by annealing the FABP- or FAF-modified G*CN 19-mer strand with primers of variable length (n − 1, n, n + 3, and n + 9, where n is the lesion site).

Flanking base effect:
The objective of this work was to investigate whether the presence of a T adjacent to the lesion site (G*) would alter the conformational heterogeneity (% S-conformer) induced by the 3ꞌ-next flanking T in the G*CT sequence context. We were also interested in the role of the 3ꞌ-flanking C in the G*CT sequence and therefore changed it to A, T or G (Fig. 2). We used the FABP adduct for this study because it exhibited the most dramatic difference in conformer ratio between the two sequences (Fig. 3): G*CT (60% S-conformer) and G*CA (100% B-conformer). Figure 2 shows the 19F NMR spectra of the different FABP-modified sequences at different temperatures. In all cases, one major signal was observed at ~-117 ppm, which falls in the chemical shift range of the B-conformer reported previously. 22 The B-conformeric nature of FABP was also confirmed by CD, where negative ellipticity at 290-350 nm (ICD290-350 nm) (Fig. 2a) is the signature of the B-conformation. 22 In addition, all the duplexes exhibited one minor signal in the upfield S-conformer range (-117.5 to -118.0 ppm), except G*AT, which had a minor downfield signal (~-116.8 ppm). The results clearly indicate that the presence of T next to the lesion site does not produce a synergistic effect but rather shifts the equilibrium back toward the B-conformer (compared in Fig. 3). Replacing the 3ꞌ-flanking C with A or G partially maintained the conformational heterogeneity, as evidenced by the 19F NMR and imino proton spectra (Fig. 3b, asterisk), but in general favored the B-conformation.
Overall, these observations indicate that CG*CT is a unique sequence context in which the next-flanking T produces the maximum effect.

Simulated TLS
FABP: Figures 4a and 4b show the dynamic (5−50 °C) 19F NMR spectra of the G*CT and G*CA TLS systems, respectively, at various primer positions (n-1 to full). The assignments of the 19F signals in these spectra have been reported previously. 21,25 The full G*CT and G*CA duplexes (Fig. 1d), which represent the end point of the TLS process, displayed clear conformational differences (Fig. 4a and 4b) similar to those previously observed in the respective 11-mer duplexes (Fig. 3). FABP in the G*CT duplex exhibited a mixture of B- and S-conformers, whereas in G*CA it exclusively adopted the B-conformation (Fig. 4). Although the conformational heterogeneity was retained in the 19-mer G*CT duplex, the S-conformer population dropped from 60% in the original 11-mer to 33% in the 19-mer. At the n+3 primer position, similar conformational differences between the two sequences were retained, as evidenced by their 19F signals. In contrast to n+3 and full, the two sequences exhibited very similar 19F NMR characteristics at (n) and before (n-1) the lesion site, both showing two prominent 19F signals. We assigned these signals as anti-type B* and syn-type S* conformers based on their similarity to the B- and S-conformer chemical shift ranges. The only noticeable difference between G*CT and G*CA (at n and n-1) was in the B*/S* populations (~12%), with the former having a relatively higher S* population (Table 1). This was surprising for G*CA, which exhibited predominantly B-conformeric signals at n+3 and in the full duplex (Fig. 4b). The FABP-induced conformers and their population ratios at the different primer positions of the two templates are summarized in Table 1.

FAF:
Figures 5a and 5b show the dynamic 19F NMR spectra of the FAF-adducted G*CT and G*CA TLS systems (n-1 to full), respectively. The full G*CT and G*CA duplexes (Fig. 1d) displayed conformational profiles (Fig. 5a and 5b) similar to those of their respective 11-mer duplexes reported previously. In both sequences, FAF exhibited B- and S-conformational heterogeneity, with G*CT having ~16% more S-conformer than G*CA. However, extension of the template from 11-mer to 19-mer reduced the G*CT S-conformer population (from 90% to 80%), as observed for FABP above. At the n+3 primer position, the conformational gap between the two sequences increased to ~23%, owing specifically to an increase in the B-conformer population in G*CA (Fig. 5b).
Moving from n+3 to the n and n-1 positions, FAF exhibited two broad 19F signals at low temperatures (5 and 10 °C), which became well resolved at 15 °C and above.
Again, based on their chemical shifts, the signals were assigned to B* and S* conformers.
The gap between the two sequences increased further to ~34% at the lesion site (n), owing to a significant increase in the G*CA B* population to ~58% (Fig. 5b and Table 1). However, at the n-1 position the two sequences displayed very similar 19F characteristics, with approximately equal populations of B* and S*. The FAF TLS system is summarized in Table 1.
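The conformer populations and sequence gaps described above reduce to simple arithmetic on the integrated 19F signals. The Python sketch below is illustrative only: it assumes each conformer's population is proportional to its integrated signal area (a two-state model with no overlap correction), and `s_percent` and `conformational_gap` are hypothetical helper names. The FAF percentages are those quoted from Table 1 in the Discussion.

```python
def s_percent(b_area: float, s_area: float) -> float:
    """Percentage of S (or S*) conformer from integrated B and S 19F signal areas."""
    total = b_area + s_area
    if total <= 0:
        raise ValueError("signal areas must sum to a positive value")
    return 100.0 * s_area / total


def conformational_gap(s_pops: dict[str, float]) -> float:
    """Sequence effect as the S-population difference, G*CT minus G*CA (percentage points)."""
    return s_pops["G*CT"] - s_pops["G*CA"]


# FAF S/S* populations (%) quoted in the text from Table 1.
faf_s_percent = {
    "n": {"G*CA": 42.0, "G*CT": 76.0},
    "n+3": {"G*CA": 58.0, "G*CT": 81.0},
}

for position, s_pops in faf_s_percent.items():
    print(f"{position}: gap = {conformational_gap(s_pops):.0f} percentage points")
```

With these values the sketch reproduces the ~34% gap at the lesion site and the ~23% gap at n+3.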

Primer extension assay:
Running start: In the present study, running start experiments were performed on the 44-mer G*CN (N = A or T) templates modified with either FABP or FAF. A short 32P-labeled primer (25-mer) was annealed to the templates, and primer extension was carried out in the presence of all four dNTPs, DNA polymerase (KF (exo-) or Dpo4) and the appropriate reaction buffer, as described above in Materials and Methods. An adduct on the template strand is known to retard DNA polymerase-mediated primer extension. Our focus was to investigate how the conformational differences induced by the 3ꞌ-next flanking base affect primer extension across the modified templates (Figs. 7 and 8).
Primer extension across FABP-modified template: a) KF (exo-): Figure 7a shows the products of KF (exo-) mediated primer extension across the FABP-modified templates at different time points at room temperature.
KF (exo-) extended the primer to the full-length product (44-mer) across both G*CT and G*CA as early as 2-5 min after the start of the reaction. However, extension was blocked at two sites, one before the lesion site (n-1, 30-mer) and the other at the lesion site (n, 31-mer). During the first few minutes, the blockage at the lesion site was the stronger of the two.
Most of the full-length product appears to have formed from the primer blocked at the lesion site, because primers stalled at the n-1 site persisted even after 60 min of reaction. Although both sequence contexts showed a similar blockage pattern near the lesion site, the duration of the blockage differed. In the first few minutes, G*CT displayed a significant increase in blockage at the lesion site (green arrow), which began to decline once extension to the full-length product started. G*CA showed a similar increase, but for a shorter period than G*CT. The prolonged blockage at the lesion site thus appears to carry over into full-length extension, with G*CA showing a faster rate of extension than G*CT (see inset in Fig. 7a).
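Stalling in running start gels such as Fig. 7a is commonly quantified as the share of the total lane signal found in each band. A minimal Python sketch, assuming background-subtracted band intensities are available; the intensities below are hypothetical (not values read from Fig. 7a), and `band_percentages` is an illustrative name. Band lengths follow the text: 25-mer primer, 30-mer (n-1), 31-mer (n) and 44-mer full-length product.

```python
def band_percentages(intensities: dict[int, float]) -> dict[int, float]:
    """Express each band (keyed by product length in nt) as % of the total lane signal."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    return {length: 100.0 * value / total for length, value in intensities.items()}


# Hypothetical background-subtracted intensities (arbitrary units) for one lane.
lane = {25: 10.0, 30: 15.0, 31: 50.0, 44: 25.0}
pct = band_percentages(lane)
print(f"stalled at n-1: {pct[30]:.0f}%, at n: {pct[31]:.0f}%, full length: {pct[44]:.0f}%")
```

Tracking `pct[31]` across time points would show the rise and decline of lesion-site blockage described for G*CT and G*CA.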

b) Dpo4: Similar experiments were performed with Dpo4 across the FABP-modified templates (G*CT and G*CA), as shown in Figure 8a. Like KF (exo-), Dpo4 extended the primer to the full-length product (44-mer) in both sequences, and blocks were seen at the same two sites, n-1 and n. However, in contrast to KF (exo-), Dpo4 produced much less full-length product, even at high enzyme concentration. Furthermore, the major blockage occurred one base before the lesion site (n-1), as opposed to the lesion site (n) observed in the KF (exo-) case (32-mer). Despite these differences in extension efficiency and blockage sites, both polymerases exhibited similar sequence effects. Here too, G*CT showed prolonged blockage at the lesion site (green arrow), resulting in slower full-length extension (see inset in Fig. 8a). These results indicate that KF (exo-) and Dpo4 experience similar conformational resistance during primer extension.
Primer extension across FAF-modified template: a) KF (exo-): As depicted in Figure 7b, the study was extended to the FAF lesion to determine whether it experiences a similar sequence effect. Here, the primers were blocked at three sites: a major block at the lesion site and two minor blocks, one base before (30-mer) and one base after (32-mer) the lesion site. Despite the extra blocking site, KF (exo-) extended across FAF more efficiently than across FABP. In addition, blunt-end addition was observed here, whereas no such addition was seen in the FABP case.
Notably, FAF showed the same sequence effect, with G*CA showing a faster rate of extension than G*CT (see inset in Fig. 7b).

b) Dpo4: The results of Dpo4-mediated primer extension are shown in Figure 8b.
Here, there were two blocking sites, with the major blockage at the lesion site (n). Unlike with KF (exo-), no blockage after the lesion site (32-mer) was observed. Extension was again faster across G*CA than across G*CT, but the difference was small. Overall, the running start results show that both DNA polymerases experience sequence effects when extending the primer across these two lesions.
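The faster extension across G*CA seen in the insets of Figs. 7 and 8 can be expressed as the initial slope of full-length product formation. A minimal Python sketch, assuming the percentage of 44-mer product has already been quantified at early time points and that accumulation is roughly linear over that window; the time course below is hypothetical (not data from Figs. 7 or 8), and `initial_rate` is an illustrative name.

```python
def initial_rate(times: list[float], percent_full: list[float]) -> float:
    """Least-squares slope (% full-length product per min) over early time points."""
    n = len(times)
    if n < 2 or n != len(percent_full):
        raise ValueError("need at least two paired time points")
    mean_t = sum(times) / n
    mean_y = sum(percent_full) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, percent_full))
    return sxy / sxx


# Hypothetical early time course: minutes and % full-length 44-mer.
times = [2.0, 5.0, 10.0]
gca = [4.0, 10.0, 20.0]
gct = [2.0, 5.0, 10.0]
ratio = initial_rate(times, gca) / initial_rate(times, gct)
print(f"G*CA / G*CT initial extension rate: {ratio:.1f}")
```

Restricting the fit to the early, roughly linear phase avoids underestimating the rate once product accumulation begins to level off.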

Steady-state kinetics:
We performed steady-state kinetic experiments to investigate the influence of FABP- and FAF-induced conformational heterogeneity on nucleotide insertion kinetics around the lesion. To do so, the lesion was positioned one nucleotide downstream of the templating base (n − 1), at the templating position (n), one nucleotide upstream (n + 1), and three base pairs upstream of the primer terminus (n + 3) (Fig. 9). We also examined the role of sequence effects in insertion efficiency (kcat/Km). The results of the steady-state kinetic experiments are presented in Tables 2-10. (Table 2), FAF; f_ins = 0.40 and 0.31 (Table 6)] for the G*CA and G*CT sequences, respectively, compared to the controls. For FABP, the f_ins of dCTP opposite the lesion (at n) was further reduced to 0.11 and 0.09 for G*CA and G*CT, respectively, a ~10-fold reduction relative to the controls (Table 3). Interestingly, the insertion frequency of the incorrect nucleotide dATP opposite the adducted guanine was reduced even further relative to the control, to 3 × 10⁻⁴ in the G*CA context and 2 × 10⁻⁴ in the G*CT context. This suggests that dCMP insertion was preferred over dAMP by ~367-fold in G*CA and by ~450-fold in G*CT (Table

Discussion:
The role of neighboring bases in modulating the mutational and repair outcomes of bulky DNA lesions is well reported.9,11,15-17 Previously, we determined the effect of the 3ꞌ-next flanking base on the NER efficiency of FABP, FAF and FAAF adducts. Our NMR study of the flanking base effect (Figs. 2 and 3) clearly demonstrates that G*CA and G*CT are the unique sequence contexts that induce this effect. Cai et al.27 and insertion (n) sites of both the sequences, the FABP adopted a dynamic mixture of anti-B* and syn-S* conformers with a small conformational difference of ~12% (G*CT having more syn-S* conformer).
This conformationally heterogeneous behavior of FABP was expected in the G*CT sequence, where it adopts a mixture of B (40%) and S (60%) conformers in the 11-mer full duplex, as reported previously. However, the similar heterogeneity observed in G*CA was unanticipated, because FABP is known to adopt exclusively the B-conformer in this sequence context. It appears that at these positions the open structure of the incompletely formed duplex gives the lesion the flexibility to move readily into (S*) and out of (B*) the duplex. We previously observed a similar scenario in FAF-modified CG*A and TG*A sequences, where despite dramatic differences in the n+3 and full duplexes, FAF exhibited similar conformational profiles at the n-1 and n sites.21,28 Likewise, here the conformational difference between the two sequences started to build up at the n+3 site, with FABP adopting a mixture of B (65%) and S (35%) conformers in the G*CT sequence but exclusively the B-conformer in G*CA (Fig. 4, Table 1). These differences were then carried through to the full duplexes.
These data suggest that nucleotide insertion three bases upstream of the damage site and beyond restricts the movement of the lesion, allowing it to adopt more defined, stable conformations. Notably, extending the original 11-mer to a 19-mer did not affect the conformational profile of FABP in the G*CA sequence, but reduced the S-conformer population (60% to 35%) in the G*CT sequence. The reason for this effect of template length on the S-conformer population is not clear. Overall, the structural data suggest that the 3ꞌ-next flanking base effect observed previously in 11-mer duplexes persists in the simulated TLS system with 19-mer templates, but is significantly diminished near the lesion site (n-1 and n).

FAF:
The conformational behavior of FAF resembled that of FABP at the n-1 and n positions, as it too adopted a dynamic mixture of anti-B* and syn-S* conformers. Compared with FABP, however, FAF exhibited a much higher percentage of S-conformer, which can be attributed to the planar nature of the fluorene moiety. In addition, the sequence effect (G*CA vs. G*CT) at the n-1 position was negligible (~1%, vs. 12% for FABP) (Table 1). Surprisingly, pronounced conformational differences were seen at the lesion site (n), with G*CT exhibiting ~34% more S-conformer population than G*CA (Table 1). This effect was much more dramatic than that for FABP (~12%, Table 1) and than that previously observed in FAF-modified 11-mer full duplexes (~24%). The transition from the n to the n+3 position increased the S-conformer population in both G*CA (42% to 58%, Table 1) and G*CT (76% to 81%, Table 1). However, because the increases differed in magnitude, the overall conformational difference was reduced from 34% to 23%. Completion of the duplex further increased the G*CA S-conformer population to 64%, whereas that of G*CT was largely unaffected (~81%). Again, extension from 11-mer to 19-mer had a negligible effect on the conformational profile of FAF in G*CA, but significantly reduced the S-conformer population (90% to 80%) in G*CT, as in the FABP case. Nevertheless, the structural data show that the 3ꞌ-next flanking base effect noticed previously in 11-mer duplexes persists in the simulated TLS, with the maximum effect at the lesion site followed by the n+3 and full duplexes, but essentially no effect at the preinsertion site (n-1).

Structural insight into the primer extension across the lesion:
The TLS (running start, Fig. 7 reported the crystal structures of AF-modified dG undergoing accurate replication by a high-fidelity DNA polymerase. They found that dG-AF adopts a syn conformation at the preinsertion site and transitions to an anti conformation at the insertion site, which allows it to pair with an incoming dCTP. As shown in Table 1, at the preinsertion site FAF exhibited more of the syn-type S* conformer, whereas FABP adopted more of the anti-type B* conformer. This suggests that the anti conformation of FABP-modified dG at the preinsertion site prevents efficient incorporation of the base opposite the lesion and thus blocks the process at the n-1 site. Similar effects were observed in FAF-modified CG*A and TG*A sequences, where the highly B-conformeric TG*A (full duplex) exhibited stronger blockage at the n-1 site, whereas the mostly S-conformeric CG*A (full duplex) blocked at the lesion site (n). In terms of sequence effects, both sequences (with FAF and FABP) displayed similar blocking patterns, as expected from their similar conformational profiles around the lesion site (Figs. 7 and 8, Table 1). However, full-length extension across the G*CA sequence was relatively faster than across G*CT for both lesions and with both DNA polymerases (insets in Figs. 7 and 8). As mentioned above, the polymerase prefers the modified dG in the anti conformation at the lesion site for successful replication across the lesion. Therefore, the relatively slower extension across G*CT might be due to the higher percentage of the syn-S* conformer at the lesion site.
The sequence effect on the insertion efficiency (f_ins) of the correct base (dGTP) at the preinsertion site is negligible for both lesions (Fig. 10). This is in line with the structural data (summarized in Table 1), where the 3ꞌ-next flanking base barely affected the conformational patterns of FABP and FAF. A similar trend was seen at the lesion site, where both the conformational characteristics of the lesions (FABP and FAF) and the insertion efficiency of the correct base (dCTP) were comparable between the two sequences.
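The insertion efficiencies compared above are kcat/Km values from steady-state kinetics, and f_ins compares a lesion template with its control. A minimal Python sketch, assuming initial velocities were measured at several dNTP concentrations; it fits the Michaelis-Menten parameters by a Hanes-Woolf linearization ([dNTP]/v against [dNTP]), which is one common choice and not necessarily the fitting method used in this study. The function names and the synthetic data are hypothetical.

```python
def hanes_woolf_fit(conc: list[float], rate: list[float]) -> tuple[float, float]:
    """Return (Vmax, Km) from a least-squares line through [S]/v versus [S]."""
    if len(conc) < 2 or len(conc) != len(rate):
        raise ValueError("need at least two paired points")
    y = [s / v for s, v in zip(conc, rate)]
    n = len(conc)
    mean_x, mean_y = sum(conc) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (b - mean_y) for x, b in zip(conc, y))
    slope = sxy / sxx                     # equals 1 / Vmax
    intercept = mean_y - slope * mean_x   # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def efficiency(vmax: float, km: float, enzyme: float) -> float:
    """Insertion efficiency kcat/Km, with kcat = Vmax / [E] (units follow the inputs)."""
    return (vmax / enzyme) / km


# Synthetic rates from Vmax = 10 (control) or 1 (hypothetical lesion), Km = 2, [E] = 1.
dntp = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
control_v = [10.0 * s / (2.0 + s) for s in dntp]
lesion_v = [1.0 * s / (2.0 + s) for s in dntp]

eff_control = efficiency(*hanes_woolf_fit(dntp, control_v), enzyme=1.0)
eff_lesion = efficiency(*hanes_woolf_fit(dntp, lesion_v), enzyme=1.0)
f_ins = eff_lesion / eff_control  # insertion frequency relative to the control
print(f"f_ins = {f_ins:.2f}")
```

Because the synthetic data follow the Michaelis-Menten equation exactly, the fit recovers Vmax and Km and gives f_ins of about 0.10; with real, noisy data a direct nonlinear fit is often preferred over linearization.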
Although both lesions favor insertion of the correct base opposite them, the preference for the correct base was more prominent for FABP than for FAF (see steady-state kinetics in the Results section). This could be due to the relatively higher population of the syn-S* conformer of FAF (Table 1), which may make it harder for the polymerase to select the base to insert opposite the lesion. For FABP, dATP insertion does not exhibit any sequence effect, as expected from its structural data. For FAF, however, the G*CT sequence favored dATP insertion about 11-fold more than G*CA, which could be attributed to the relatively higher (~34%) syn-S* conformer population in G*CT.
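The fold preferences quoted in the Results follow directly from the reported f_ins values. A minimal Python sketch, assuming the dCTP and dATP f_ins values at position n are normalized to the same control efficiency, so that their ratio equals the ratio of the two kcat/Km values; `selectivity` is an illustrative name.

```python
# FABP f_ins values opposite the lesion (position n) as reported in the Results.
fabp_fins = {
    "G*CA": {"dCTP": 0.11, "dATP": 3e-4},
    "G*CT": {"dCTP": 0.09, "dATP": 2e-4},
}


def selectivity(correct: float, incorrect: float) -> float:
    """Fold preference for inserting the correct over the incorrect nucleotide."""
    return correct / incorrect


for seq, fins in fabp_fins.items():
    fold_vs_control = 1.0 / fins["dCTP"]  # reduction of correct insertion vs. control
    print(f"{seq}: dCMP/dAMP = {selectivity(fins['dCTP'], fins['dATP']):.0f}-fold, "
          f"dCTP reduced {fold_vs_control:.1f}-fold vs. control")
```

This reproduces the ~367-fold (G*CA) and ~450-fold (G*CT) preferences and the roughly 10-fold reduction in correct insertion stated in the Results.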
Extension of the match series at the n+1 site, immediately after the lesion (FAF and FABP), was significantly reduced owing to the prolonged blockage at the lesion site, as evident from the running start experiments. We were unable to determine the extension efficiency of the FABP mismatch series because of the extremely low insertion rate at the n+1 site. In contrast, the FAF mismatch series showed extension beyond the lesion site, but with extremely low efficiency. These results indicate that neither FAF nor FABP tolerated the incorrectly inserted dATP. In addition, no significant sequence effects were observed at this position in either the match or mismatch series. At the n+3 site, where the 19F NMR data showed well-defined conformations, the extension efficiency also exhibited some sequence effects. For FABP, G*CA shows exclusively the B-conformation at this position (versus a B/S mixture in G*CT) and correspondingly 6- and 10-fold higher extension efficiency than G*CT for the match and mismatch series, respectively. In contrast, FAF, which shows only a 16% difference in conformational profile between the two sequences, exhibited only a 2- to 2.5-fold difference in the extension efficiency of the match and mismatch series.
In summary, our results revealed that (a) AF/ABP in the G*CA and G*CT sequences adopt similar conformational profiles at the replication fork (n-1 and n sites), (b) owing to these similar conformational profiles, no substantial 3ꞌ-next flanking base effect was observed in base insertion efficiency at the replication fork (n-1, n and n+1), (c) significant stalling occurred at both the prelesion (n − 1) and lesion (n) sites but the anti-conformeric