CHARACTERIZATION OF A PUTATIVE TRANSCRIPTION FACTOR

Basic helix-loop-helix (bHLH) proteins belong to a large family of transcription factors that are known to play important roles in cell proliferation, differentiation and oncogenesis. These proteins are characterized by a bHLH motif, which is responsible for protein dimerization and sequence-specific DNA binding (e.g., to the E-box). Recently we isolated a cDNA from a human liver library by a gene trapping method. Based on the Kozak rule, this cDNA encodes a protein of 415 amino acids, hereafter designated CCAF. The objective of this thesis is to establish the molecular mass of this protein and to test the hypothesis that CCAF is a transcriptional modulator involved in the regulation of cell cycle events. To establish the molecular mass, CCAF was translated in vitro with TNT reticulocyte lysate and analyzed by autoradiography. Addition of the CCAF cDNA to the reaction mixture yielded a single product with a molecular weight of 52 kDa. This mass is consistent with the estimated weight and suggests that the Kozak-favorable sequence indeed harbors the codon for translation initiation. To determine whether CCAF undergoes posttranslational modifications, immunochemical experiments were performed. An antibody was raised against a peptide derived from CCAF and purified by affinity chromatography. This antibody detected a 52-kDa protein in the CCAF cDNA-transfected cells but not in the control cells. These results further support the notion that the functional CCAF is a 52-kDa protein and undergoes little posttranslational modification.


LIST OF TABLES
Table 2. Classification based on structural features
Table 3. Classification based on evolutionary relationship
Table 4. Abbreviation of bHLH proteins
Table 5. Sequence of oligonucleotides for cDNA trapping
Table 6. cDNA trapping of human cytochrome P450 enzymes and reductase
Table 7. cDNA trapping of PI-4-phosphate 5-kinase and inositol 1,3,4-triphosphate 5/6-kinase
antibody detected the 52-kDa protein in the cells plated at all densities.
These studies were further extended to human colon carcinomas. Both Northern and Western blotting analyses detected abundant expression of CCAF in the carcinomas but not in the nearby normal tissues.
These findings suggest that CCAF is involved in the regulation of cell cycle events and contributes to oncogenic pathogenesis.
To determine the activity of CCAF in transcription regulation, transient cotransfection experiments were conducted with an E-box reporter. CCAF alone caused little change in the reporter enzyme activity. However, CCAF antagonized by 30% the transactivation activity conferred by E47, a bHLH protein that is known to transactivate the E-box reporter. The antagonism of E47-mediated transactivation and the differential expression related to cell growth and oncogenic states support the hypothesis that CCAF is a transcriptional modulator that is involved in the regulation of cell cycle events and plays a role in oncogenic pathology.

INTRODUCTION
Transcription factors are proteins that initiate and modulate transcription rate by interacting with specific DNA recognition sequences in the target genes. As shown in Fig. 1, these DNA-binding transcription factors are structurally classified into four major classes: helix-turn-helix homeodomain (e.g., PBX1), C2H2 zinc finger (e.g., Sp1), helix-loop-helix (e.g., c-myc) and leucine zipper (e.g., c-fos and c-jun). Many other transcription regulators fall into none of these classes.

I. Common Features of bHLH Proteins
1. Structural characteristics of bHLH proteins
bHLH proteins are distinguished by their bHLH domain, which was first identified in an immunoglobulin enhancer-binding polypeptide and several other proteins (e.g., Daughterless, MyoD and myc) (Murre et al., 1989). The bHLH domain is divided into two functional subdomains: a helix-loop-helix (HLH) subdomain and an adjacent basic region. The HLH subdomain consists of two short amphipathic α-helices separated by a non-conserved loop of variable length and primarily mediates dimerization between HLH proteins (Voronova et al., 1990). The basic region consists of a cluster of 10-20 amino acids rich in lysine and arginine residues and is responsible for sequence-specific DNA binding (Burley, 1994).
In addition to the bHLH domain, some proteins contain other structures that are functionally important (Fig. 2). The myc oncoproteins contain a leucine zipper (LZ) motif that is responsible for dimerization (Penn et al., 1990). Drosophila hairy, E(spl) and the mammalian homologues (e.g., HES) contain an orange domain, a Drosophila C-terminal binding protein (dCtBP) motif (PLSLV) and a Groucho motif (WRPW). The orange domain mediates transcription repression (Dawson et al., 1995). The PLSLV and WRPW motifs mediate the recruitment of the transcription corepressors dCtBP and Groucho, respectively (Poortinga et al., 1998, Sewalt et al., 1999, Zhang et al., 1999). A group of bHLH proteins (e.g., AHR and ARNT) contains a PAS domain that is responsible for dimerization between PAS proteins, xenobiotic binding and interaction with non-PAS proteins (Gradin et al., 1996). A conserved domain C in E proteins is also required for in vivo dimerization (Goldfarb et al., 1998).
are currently used for the classification of these proteins. Each of these will be discussed individually.

1 Based on tissue distribution and dimerization
In this classification system, bHLH proteins in mammals are grouped into two classes based on their expression pattern (Table 1). Class I, also known as the E-protein family, consists of bHLH proteins encoded by the E2A, HEB and E2-2 genes (Little et al., 1998). E2A encodes E12 and E47, which are produced by alternative splicing (Sun et al., 1991). Class I proteins are ubiquitously expressed and capable of forming transcriptionally active homo- and/or heterodimers. Class II proteins display a tissue- or cell-restricted expression pattern. For example, myogenic bHLH proteins (e.g., MyoD) are present specifically in the muscle cell lineage, while neurogenic bHLH proteins (e.g., NeuroD) are present mostly in the nervous system. Class II proteins can exist as homodimers, but the active form is predominantly the heterodimer formed with class I proteins (Murre et al., 1994, Little et al., 1998).
(Table 2). Class I includes bHLH proteins (e.g., E47 and MyoD) that have a basic region adjacent to the N-terminus of the HLH domain. Class II includes bHLHZ proteins (e.g., myc and Max) that contain an additional LZ dimerization domain immediately C-terminal to the bHLH domain.
Class III includes HLH proteins that lack the DNA-binding domain due to the loss of the basic region (e.g., Id). These proteins act as negative regulators by forming inactive, non-DNA-binding heterodimers with other bHLH proteins. Although mammalian homologues of E(spl) such as HES have an intact basic region, they also belong to this class because a proline residue in their basic region leads to a preference for an N-box instead of an E-box site. Binding to the N-box results in transcription repression.
, 1997). In this classification, phylogenetic analysis of amino acid sequences is used to describe the patterns of evolutionary change within the motif and to define the evolutionary lineages.
These evolutionary lineages are well-known functional groups of proteins that can be further arranged into five classes based on DNA binding (E-box), the amino acid patterns in the basic region, and the presence or absence of an LZ (Table 3). The hypothesized ancestral amino acid sequence for the bHLH transcription family is given together with the ancestral sequences of the subclasses.
(French et al., 1991). bHLH proteins exhibit different biological functions by heterodimerizing with different partners. For example, heterodimerization of Da with members of the Achaete-Scute class leads to the formation of neuronal precursors (Cabrera et al., 1991). On the other hand, heterodimerization of Da with the Atonal protein leads to the formation of different, nonoverlapping sense organs and photoreceptors.

2 DNA binding
After dimerization, bHLH proteins usually bind to cis-acting DNA elements present in the target genes, resulting in a change in gene expression.
These DNA elements contain the core sequences CANNTG or CACNAG. CANNTG, commonly known as the E-box, was first identified in the immunoglobulin heavy-chain (IgH) intronic enhancer and has been found in a large number of pancreatic-, lymphoid-, and muscle-specific promoter and enhancer elements (Little et al., 1998). CACNAG, commonly referred to as the N-box, is present in the promoters of genes such as the HES gene. Most bHLH proteins (e.g., MyoD, E12) bind as dimers to the E-box, while some bHLH proteins (e.g., Hairy and HES) prefer to bind to the N-box (Dawson et al., 1995). The binding preference is specified by the sequence in the basic region of bHLH proteins.
Generally, a proline-containing basic region has a higher affinity for the N-box, whereas a basic region without a proline preferentially recognizes the E-box. Some bHLH heterodimers recognize core DNA sequences other than the E- or N-box. For example, the AHR-ARNT complex usually binds to the dioxin response element TNGCGTG (Bacsi et al., 1995).
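As a minimal sketch of how the core sequences named above could be located in a DNA sequence, the following Python snippet translates CANNTG (E-box), CACNAG (N-box) and TNGCGTG (dioxin response element) into regular expressions, treating N as any base. The function names and the example sequence are illustrative assumptions and are not part of the experiments in this thesis; only the forward strand is scanned.

```python
import re

# Core sequences quoted in the text; N stands for any base.
MOTIFS = {
    "E-box": "CANNTG",
    "N-box": "CACNAG",
    "DRE": "TNGCGTG",
}


def motif_to_regex(motif):
    """Convert a motif with N wildcards into a compiled regular expression."""
    pattern = motif.upper().replace("N", "[ACGT]")
    # A lookahead makes overlapping occurrences visible to finditer.
    return re.compile("(?=(" + pattern + "))")


def find_motifs(sequence):
    """Return {motif name: [(0-based start, matched site), ...]} for the forward strand."""
    sequence = sequence.upper()
    hits = {}
    for name, motif in MOTIFS.items():
        regex = motif_to_regex(motif)
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
    return hits


# Hypothetical sequence used only to exercise the function.
example = "GGCACGTGTTCACGAGAATTGCGTGCC"
for name, sites in find_motifs(example).items():
    print(name, sites)
```

A fuller scan would also search the reverse complement, since a site can lie on either strand; that step is omitted here to keep the sketch short.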

3 Transcriptional regulation
Binding to specific DNA elements by bHLH proteins leads to transcriptional activation or repression. MyoD and its related myogenic bHLH proteins, for example, bind to the E-box and activate transcription of myogenic genes (Weintraub et al., 1991). Drosophila Hairy and its related proteins E(spl) and HES, however, bind to the N-box and inhibit the transcription of neurogenic genes (Dawson et al., 1995). This transcription repression process is outlined in such as Da/Scute. This binding also leads to the recruitment of Groucho, which is a transcription corepressor. Other than N-box binding, Drosophila Hairy and its related proteins have been shown to form non-functional heterodimers with E-box-binding bHLH transactivation proteins (Sasai et al., 1992). Therefore, transcriptional repression by bHLH proteins is achieved in two distinct manners: binding to the N-box and/or titrating other bHLH proteins (Dawson et al., 1995).
Both transcription activation and repression mediated by bHLH proteins are essential for organ development. Lack of either mechanism results in developmental defects. For example, Mash1 promotes neuronal differentiation. The absence of Mash1 in mice results in death at birth accompanied by the loss of olfactory and autonomic neurons. HES proteins, however, suppress neuronal differentiation. The absence of HES proteins (e.g., HES1) accelerates neuronal differentiation, resulting in severe defects such as anencephaly and eye anomalies (Ishibashi et al., 1995).

Other features of bHLH proteins
In addition to the characteristics described above, bHLH proteins also have other important properties in organ developmental processes. Some bHLH proteins compensate functionally for each other and are subject to auto- and cross-regulation. For example:
1. Myf5, a myogenic bHLH protein, was initially found to be indispensable for normal rib cage development. In a later experiment, however, the insertion of the myogenin gene, a homologue of myf5, into the myf5 locus (simultaneously disrupting myf5 function) was found to give rise to mice with a normal rib cage (Wang et al., 1996).
2. In vertebrate myogenesis, MyoD, Myf5 and myogenin all up-regulate their own expression. They are also able to regulate the expression of the others (Fig. 4). MyoD is required for the expression of myogenin. Myf5 induces MyoD, and Id is inactivated by MyoD.
A similar complex network also occurs in bHLH proteins involved in neurogenesis and sex determination.

II. Significance of bHLH Proteins in Oncogenesis
The bHLH proteins play important roles in the control of cellular proliferation and differentiation in various lineages, from invertebrates to mammals. An imbalance between cell proliferation and differentiation caused by bHLH proteins may have oncogenic significance. Generally, proliferation-promoting action is oncogenic, while differentiation-promoting action is tumor-suppressing. Myc proteins, for example, are known to promote cell proliferation and inhibit differentiation (Penn et al., 1990, Chin et al., 1995). Several members of the myc family, in cooperation with an activated ras oncogene, have transformed primary rat embryonic cells in culture. Transgenic mice with enforced c-myc expression also exhibit a significantly higher incidence of malignancy than control mice. The oncogenic mechanisms of bHLH proteins remain to be established, but the deregulation of gene expression by these proteins plays a major role in tumor formation. For example, the p53 tumor suppressor gene contains an essential CACGTG motif within the promoter region. Ectopic c-myc can specifically bind to this motif and activate the expression of mutant p53, leading to oncogenic transformation (Popescu et al., 1998). Other than gene regulation, protein-protein interaction is also found to be responsible for the oncogenesis of some bHLH proteins. For example, in the presence of overexpressed LMO1, the enforced expression of an amino-terminally truncated Tal1, which lacks the transactivation domain, leads to aggressive T-cell malignancies in transgenic mice (Aplan et al., 1993). In this case, Tal1 acts not by transactivation of target genes but through a protein-protein interaction.

III. Statement of Purpose
A full-length cDNA was recently isolated in our lab from a human liver library by a gene trapping method. Sequence alignment reveals that this cDNA encodes a protein highly similar to bHLH proteins such as human DEC1 (~92%), rat SHARP (~80%) and mouse strate 8, particularly in the regions encoding functional structures such as the bHLH domain (Fig. 5). DEC1 is a Bt2cAMP-inducible bHLH protein that may function as a transcription regulator in chondrogenesis (Shen et al., 1997). SHARP proteins are mammalian E(spl)- and hairy-related bHLH proteins that play essential roles in neurogenesis (Rossner et al., 1997).
performed. The CCAF cDNA-transfected cell lysates were subjected to SDS-PAGE and then immunochemically detected by an antibody, which was raised against a peptide derived from CCAF and purified via affinity chromatography.
This antibody was also used to determine the cellular localization of CCAF.
bHLH proteins are known to play important roles in the control of cellular proliferation and differentiation. The experiments in this thesis were designed to determine the expression of CCAF in different cell growth states. DLD cells derived from colon carcinomas were seeded at different densities, inducing different cell growth states. The CCAF expression in each state was detected by the antibody prepared against CCAF. DLD cells were also seeded at a certain density and, after reaching confluence, were maintained in the same medium or changed to 0.25% medium for 4 additional days. The expression of CCAF was detected at each daily time point by the same antibody.
The deregulated or ectopic expression of some bHLH proteins (e.g., myc and Tal1) is known to be closely related to oncogenic pathogenesis. The experiments in this thesis were designed to determine the expression of CCAF in human carcinomas and the nearby normal tissues. Northern blotting with radiolabeled CCAF cDNA as the probe was performed to detect CCAF gene expression in cancer and nearby normal tissues. The same antibody against CCAF was also used to detect CCAF protein expression in these tissues.
In addition to the bHLH domain, CCAF contains an orange domain and a modified dCtBP motif (PLSLV), which are present in transcription-repressive bHLH proteins such as Drosophila Hairy and mammalian HES (Fig. 6).

Antibody purification
A polyclonal antibody specific to the peptide (CSQALKPIPPLNLETKD) derived from the C-terminus of CCAF was raised in New Zealand White rabbits in our lab.
To diminish nonspecific binding, the antibody was purified by immunoaffinity chromatography as described by Harlow and Lane (Harlow et al., 1988). First, the peptide (1 mg) was covalently coupled to 2 ml of SulfoLink gel (PIERCE, Rockford, IL) by incubation at room temperature for 30 min in a PD-10 column (Pharmacia Biotech, Sweden). Cysteine (50 mM, 2 ml) was then added to block the nonspecific binding sites. Prior to applying the antiserum, this antigen-coupled gel column was sequentially pretreated with 20 ml of Tris (pH 7.5), 20 ml of 100 mM glycine (pH 2.5) and 20 ml of 100 mM triethylamine (pH 11.5, fresh). The antiserum (5 ml) was diluted 1:1 with 10 mM Tris (pH 7.5) and applied to the column three times to ensure complete binding.
The column was washed with 40 ml of 10 mM Tris (pH 7.5) followed by 40 ml of 500 mM NaCl in 10 mM Tris (pH 7.5). The antigen-specific antibody was then eluted from the column with 20 ml of 100 mM glycine (pH 2.5) and collected in a tube containing 5 ml of 1 M Tris-HCl (pH 8.0). The eluate was dialyzed in tubing against PBS (containing 0.02% sodium azide) at 4°C overnight with stirring, and then aliquoted and stored at -20°C.
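The elution step combines 20 ml of 100 mM glycine with 5 ml of 1 M Tris in the collection tube. The short sketch below works out the nominal concentrations in the neutralized eluate from those volumes by simple dilution (C1 x V1 = C2 x V2). It assumes the volumes are additive and ignores the antibody itself; the function name is illustrative.

```python
def final_concentration_mM(stock_mM, stock_volume_ml, total_volume_ml):
    """Concentration of a component after dilution to total_volume_ml (C1*V1 = C2*V2)."""
    if total_volume_ml <= 0:
        raise ValueError("total volume must be positive")
    return stock_mM * stock_volume_ml / total_volume_ml


# Volumes and stock concentrations taken from the elution step in the text.
glycine_ml, glycine_mM = 20.0, 100.0   # 100 mM glycine, pH 2.5
tris_ml, tris_mM = 5.0, 1000.0         # 1 M Tris, pH 8.0

total_ml = glycine_ml + tris_ml        # assumption: volumes are additive
glycine_final = final_concentration_mM(glycine_mM, glycine_ml, total_ml)
tris_final = final_concentration_mM(tris_mM, tris_ml, total_ml)

print(f"total volume: {total_ml:.0f} ml")
print(f"glycine: {glycine_final:.0f} mM, Tris: {tris_final:.0f} mM")
```

The Tris is therefore in molar excess over the glycine in the collected fraction, which is consistent with its role of neutralizing the acidic eluate as it leaves the column.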

Western immunoblotting
The transfected cell lysate was subjected to SDS-PAGE as described previously.
containing cytosol proteins were stored in 0.5 ml plastic tubes while the nuclear proteins in the pellet were redissolved in 70 µl of 1% SDS lysis buffer. All the samples prepared above were separated via SDS-PAGE followed by immunoblotting with the antibody against CCAF as described previously.

CCAF is a 52-kDa protein.
To establish the molecular weight, CCAF was translated in vitro with TNT reticulocyte lysate and analyzed by autoradiography. Addition of the CCAF cDNA to the reaction mixture yielded a single product with a molecular weight of 52 kDa (Fig. 7). This mass is consistent with the estimated weight. To determine whether CCAF undergoes posttranslational modifications, lysates of CCAF cDNA-transfected COS7 cells were size-separated via SDS-PAGE and analyzed by Coomassie blue staining and Western immunoblotting, respectively.
As determined by Coomassie blue staining, an extra band with a molecular weight of 52 kDa was detected in the CCAF-transfected cells, and the level of protein expression increased with the CCAF plasmid concentration (Fig. 8, lanes 2 and 3). An antibody was raised against a peptide derived from CCAF and purified by affinity chromatography. This antibody detected the 52-kDa protein in the CCAF cDNA-transfected cells but not in the control cells (Fig. 9). Thus, three independent assays all demonstrated that the isolated full-length cDNA encodes a protein with a molecular weight of 52 kDa.
CCAF expression is related to the cell growth states.
To determine CCAF expression in different cell growth states, DLD colon cancer cells were seeded at different densities, and the cell lysates were fractionated via SDS-PAGE and then analyzed by Western immunoblotting. As shown in Fig. 10, the antibody against CCAF detected a 52-kDa protein in DLD cells at all seeding densities, but the expression of this protein increased with plating density. The level of this protein in cells seeded at high density (lanes 3, 4 and 5) was ~3- to 5-fold higher than that in cells seeded at low density (lanes 1 and 2).
To further confirm this expression pattern, DLD cells were seeded at a certain density and, after reaching confluence, were maintained in the same medium or changed to 0.25% medium for 4 additional days. The expression of CCAF was detected at each daily time point by the same antibody against CCAF. As shown in Fig. 11, CCAF expression was hardly detectable before the cells reached confluence (lanes 1 and 2), while after confluence the level of CCAF protein increased about 3- to 5-fold (lanes 3-6). The level of CCAF expression was also positively related to the number of days of starvation (Fig. 12).
CCAF is abundantly expressed in human colon carcinomas but not in the nearby normal tissues.
Northern blotting with the radiolabeled full-length cDNA was used to screen CCAF mRNA expression in various human carcinomas. Except in kidney and ovarian cancers, CCAF gene expression was markedly higher in some carcinomas, such as lung and breast, than in normal tissues (data not shown).
In particular, this cDNA detected abundant expression of CCAF in all 5 individual colon cancer cases but not in the nearby normal tissues (Fig. 13). The same results were obtained when immunoblotting was used to detect CCAF protein expression in colon carcinomas. As shown in Fig. 14, the antibody specific to CCAF recognized a strong band with a molecular weight of 52 kDa in samples from 3 individual colon cancer patients (lanes 3, 4 and 5).
In contrast, no bands were detected by the same antibody in nearby normal tissues (lanes 6 and 7).
CCAF is localized in the cell nucleus.
DLD cytosol proteins were separated from nuclear proteins, and both were subjected to Western immunoblotting analysis. As shown in Fig. 15, the antibody against CCAF recognized a protein with a molecular weight of 52 kDa in the nucleus but not in the cytosol, suggesting that CCAF is a nuclear protein.
CCAF inhibits the transactivation activity of E47 on an E-box reporter.
To test the activity of CCAF in gene regulation, CCAF plasmids were cotransfected with an E-box luciferase reporter, and the induction of the reporter enzyme was determined via a dual-luciferase assay system. As shown in Fig. 16, E47, a bHLH protein that is known to transactivate the E-box reporter, promoted a 380-fold increase in luciferase activity. In contrast, CCAF by itself caused little activation of luciferase expression but antagonized by 30% the enzyme induction conferred by E47.
which is responsible for protein dimerization and sequence-specific DNA binding (e.g., E-box). Recently we isolated a cDNA from a human liver library by a gene trapping method. Sequence alignment indicates that this cDNA encodes a putative bHLH protein termed CCAF. The objective of this thesis is to establish the molecular mass of this protein and to test the hypothesis that CCAF is a transcriptional modulator involved in the regulation of cell cycle events.
CCAF is a 52-kDa protein with little posttranslational modification.
Based on the Kozak rule, the largest protein encoded by the isolated cDNA is 415 amino acids long with a calculated molecular mass of 52 kDa. One of the studies described in this thesis was to determine the molecular weight of CCAF. This was achieved by measuring the mass of CCAF translated in vitro and in CCAF cDNA-transfected cells. With both methods, a protein with a molecular weight of 52 kDa was produced. The consistency with the calculated mass suggests that CCAF undergoes little posttranslational modification.
CCAF is a cell cycle regulator.
The expression of CCAF is detected in different cell growth states induced by either contact inhibition or serum starvation. Proliferating cells express a lower level of CCAF than growth-arrested cells. This differential expression is closely related to the regulation of the cell cycle. Myc proteins are highly expressed in proliferating cells but less so in differentiated cells, and these proteins are known to promote cell proliferation but inhibit differentiation (Penn et al., 1990, Chin et al., 1995). The low level of CCAF in proliferating cells suggests that CCAF is a cell cycle regulator that has little proliferation-promoting activity but plays an important role in rescuing cells from death (anti-apoptotic).
The expression pattern of CCAF in different cell growth states is similar to that of the oncoprotein junD, which is found to cooperate with the ras oncogene in transforming rat embryo fibroblasts (Vandel et al., 1996). This similarity suggests that CCAF has oncogenic significance. Moreover, abundant CCAF is found in human colon carcinomas but not in the nearby tissues. Ectopic and deregulated expression of several bHLH proteins is related to oncogenesis. Tal1 and other closely related bHLH proteins (Tal2 and LYL1) are normally not expressed in T cells but are constitutively expressed in >60% of T-cell acute lymphoblastic leukemia (T-ALL) (Alphan et al., 1992, Bash et al., 1995). In transgenic mice, overexpression of the TAL1 gene in cooperation with a misexpressed LMO1 protein induces aggressive T-cell malignancies (Aplan et al., 1993).
It has been proposed that the progression of a tumor might not only be a function of cell proliferation but also result from inappropriate suppression of apoptosis (Marx, 1993). Some oncogenic bHLH proteins are known to suppress apoptosis. For example, the TAL1 bHLH oncoprotein was recently found to be antiapoptotic. A Jurkat leukemic T cell subline expressing a C-terminally truncated mutant TAL1 undergoes rapid apoptosis upon medium depletion.
Transfection with wild-type TAL1 reverses this process, suggesting that TAL1 inhibits apoptotic signaling in the absence of survival factors (Leroy-Viard et al., 1995). Overexpression of TAL1 significantly blocks granulopoietic and monocytic cell apoptosis induced by chemotherapeutic agents (Bernard et al., 1998). The oncogenic pathology of CCAF is likely due to its antiapoptotic activity.
CCAF is a gene transcription regulator.
Not all proteins that possess bHLH domains are transcription factors. For example, the calcium-binding proteins with the bHLH domain are just components of the calcium-signaling pathway (Marsden et al., 1990). Thus the final purpose was to determine whether CCAF is a transcription modulator. First, the nuclear localization provides evidence that CCAF is a nuclear protein. Second, the antagonism of E47-mediated transactivation by CCAF further supports the notion that CCAF is a transcription regulator. The molecular mechanisms for such an antagonism remain to be determined. There are three hypotheses: (1) CCAF binds to the E-box, but the binding does not activate transcription; (2) CCAF sequesters E47 by forming inactive, non-DNA-binding heterodimers; (3) CCAF dimerizes with E47, but the heterodimers are less active than E47 homodimers.
E47 is also known to be a tumor-suppressing protein. Repression of E47 may therefore have oncogenic significance. This has been confirmed for oncoproteins such as Tal1, which forms heterodimers with E47 (Steven et al., 1998).
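The reporter figures discussed in this section can be related to one another with a short sketch. It assumes that the 30% antagonism is expressed relative to the 380-fold induction by E47 alone, in which case cotransfection with CCAF would correspond to roughly a 266-fold induction; that figure is derived here for illustration and is not reported in the text. The helper names and the raw luminescence readings are hypothetical, and normalization of firefly to Renilla signal is assumed as the usual dual-luciferase convention.

```python
def normalized_activity(firefly, renilla):
    """Firefly signal normalized to the Renilla control reading."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_induction(sample, baseline):
    """Normalized activity of a sample relative to the reporter-only baseline."""
    return sample / baseline


def percent_antagonism(fold_activator, fold_with_antagonist):
    """Percentage by which the activator-driven induction is reduced."""
    return 100.0 * (fold_activator - fold_with_antagonist) / fold_activator


# Hypothetical raw readings (firefly, Renilla), chosen only to reproduce the
# 380-fold induction stated in the text and a 30% reduction of it.
reporter_only = normalized_activity(1_000, 10_000)
e47_alone = normalized_activity(380_000, 10_000)
e47_plus_ccaf = normalized_activity(266_000, 10_000)

fold_e47 = fold_induction(e47_alone, reporter_only)
fold_both = fold_induction(e47_plus_ccaf, reporter_only)
print(round(fold_e47), round(fold_both), round(percent_antagonism(fold_e47, fold_both)))
```

Expressing the effect as a percentage of the E47-driven induction, rather than as a difference in raw luminescence, keeps the comparison independent of transfection efficiency as long as the Renilla control behaves similarly across samples.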
In summary, the results in this thesis support our overall hypothesis that CCAF is a transcriptional modulator involved in the regulation of cell cycle events, and that the expression of CCAF is related to oncogenic pathology. However, further experiments need to be conducted to fully understand the structure, the function and the molecular basis of CCAF. The electrophoretic gel mobility assay (EGMA) can be used to study whether DNA binding is required for the transcription repression (E-box and/or N-box transactivation). The gene array can be used to study the target genes regulated by CCAF, and the yeast two-hybrid system can be used to study CCAF-mediated protein-protein interactions. These studies will likely define whether CCAF is a marker for colon tumor diagnosis and whether CCAF is a potential target for therapeutic intervention.
Screening of a cDNA library is primarily conducted with nucleic acid hybridization or antibody staining. These procedures usually involve library plating, colony transfer to a membrane, and clone identification. Because of the high density at which clones are plated, isolation of pure clones requires secondary and multiple screenings at decreasing clone densities. In addition, screening of a meaningful number (10^5-10^6) of clones may take weeks or even months (Sambrook et al., 1989).
Recently a cDNA trapping method was developed (Li et al., 1995). In this method, a complex population of double-stranded phagemid DNA is isolated from a cDNA library and converted to single-stranded DNA (ssDNA).
Results and discussion
Oligonucleotides in the P450 group were targeted to human cytochrome P4502C8, P4502E1, and P450 reductase, respectively. They were designed to anneal to the 5'-coding region of each cDNA. The trapping experiment with these oligonucleotides resulted in the isolation of ~500 individual clones. Of these, 32 individual clones were cultured and the corresponding phagemid DNAs were isolated. Sequencing of the 5' end (~350 bases long) demonstrated that they all contained full-length cDNA inserts.
Among them, 6 clones contained the cDNA encoding P4502C8, 4 clones contained the cDNA encoding P4502C18, and 16 clones contained the cDNA encoding P4502E1, as shown in Table 2. Therefore, half of the sequenced clones contained the cDNA encoding P4502E1.
The P4502C subfamily has four members: 2C8, 2C9, 2C18, and 2C19. The oligonucleotide used in this experiment for this subfamily matched the sequence of 2C8 perfectly but differed from the sequences of 2C9, 2C18, and 2C19 by two bases. As expected, the trapping experiment isolated the cDNA encoding 2C8. Surprisingly, however, it also isolated the cDNA encoding 2C18 but not those encoding 2C9 and 2C19. This was probably due to the relative abundance of the cDNAs in this library; namely, the cDNA encoding 2C18 was more abundant in this library than those encoding either 2C9 or 2C19. Similarly, a relatively high abundance of 2E1 cDNA in the library was probably the reason that 50% of the clones sequenced in the P450 group contained the cDNA encoding this enzyme.
Compared with other P450 enzymes, P4502E1 is not a very abundant enzyme in normal liver (only ~9% of total P450). However, the level of P4502E1 can be significantly increased by environmental factors such as alcohol consumption and by disease states such as diabetes. It is likely that the individual whose liver mRNA was used for the construction of this library had previously been exposed to one or more of these factors.
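The clone counts reported for the P450 group can be tallied with a few lines of Python. The sketch below uses only the numbers given in the text (32 sequenced clones; 6, 4 and 16 clones for 2C8, 2C18 and 2E1) and makes no assumption about the identity of the remaining clones, which the passage does not itemize. Variable names are illustrative.

```python
# Clone counts reported for the P450 trapping experiment.
sequenced_total = 32
clones = {"P4502C8": 6, "P4502C18": 4, "P4502E1": 16}

assigned = sum(clones.values())
unassigned = sequenced_total - assigned  # identity not itemized in the passage

# Fraction of all sequenced clones accounted for by each cDNA.
fractions = {name: count / sequenced_total for name, count in clones.items()}
for name, fraction in fractions.items():
    print(f"{name}: {clones[name]} clones ({fraction:.1%})")
print(f"not itemized here: {unassigned} clones")
```

The tally makes the "half of the sequenced clones" statement explicit: 16 of 32 clones carry the 2E1 cDNA.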
In contrast, the trapping experiment with the oligonucleotides in the kinase group yielded only 31 individual clones. As shown in Table 3, 2 of these clones encoded mouse type I PI4P5-kinase α; 6 encoded mouse type I PI4P5-kinase β; 4 encoded a recently identified type I PI4P5-kinase, termed γ; and 7 encoded human Ins(1,3,4)P3 5/6-kinase. The rest of these clones encoded other proteins. Form γ exhibits a high degree of sequence identity (20 of 21 bases) with form β in the region from which the oligonucleotide was derived. Thus, the cDNA encoding form γ was probably captured by the oligonucleotide for form β. It should be noted that the human liver and mouse brain libraries were constructed with vectors that have different flanking sequences at the polylinker region, which made it possible to establish by the polymerase chain reaction which library was the source of each isolated cDNA.
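Statements such as "20 of 21 bases" or "differed by two bases" amount to counting mismatches between an oligonucleotide and the aligned region of a candidate cDNA. The sketch below shows one way to do that count. The oligonucleotide is the 21-mer listed for the second type I kinase form, with spaces removed and apparent character-recognition errors corrected (an assumption); the target site is hypothetical, because the sequence of the third form is not given in the text.

```python
def count_mismatches(oligo, target_site):
    """Number of positions at which two equal-length sequences differ."""
    oligo, target_site = oligo.lower(), target_site.lower()
    if len(oligo) != len(target_site):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(oligo, target_site) if a != b)


def identity(oligo, target_site):
    """Return (matching bases, total bases) for an ungapped comparison."""
    mismatches = count_mismatches(oligo, target_site)
    return len(oligo) - mismatches, len(oligo)


# 21-base oligonucleotide for the beta form, spaces removed.
oligo_beta = "caagacctatgcacctgttgc"
# Hypothetical target site differing at one position (index 10, g -> a);
# the real sequence of the related form is not given in the text.
site_related = "caagacctatacacctgttgc"

print(identity(oligo_beta, oligo_beta))
print(identity(oligo_beta, site_related))
```

This is an ungapped, position-by-position comparison; it says nothing about hybridization stability, which also depends on where the mismatch falls and on the wash conditions.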
Among the cDNAs encoding other proteins in the kinase group, half coded for vitamin D-binding protein (Table 3), and the rest apparently coded for as yet unidentified proteins based on a BLAST search in GenBank. These unidentified proteins may actually share significant sequence identity with the oligonucleotides used for the trapping experiment (only a ~350-bp sequence at the 5' end was determined), which probably contributed to their isolation. In contrast, the cDNA encoding vitamin D-binding protein shows no significant sequence identity with the oligonucleotides; the reason for its isolation remains to be determined. It is likely that the abundance of the cDNAs coding for the kinases in these libraries was so low that an excess of the labeled oligonucleotides was available to hybridize with unrelated cDNAs and yield false positives. This notion is supported by the fact that all of the sequenced clones in the P450 trapping experiment contained target cDNAs, because P450 enzymes, the primary phase I drug-metabolizing enzymes, are very abundant in the liver (Shephard et al., 1992).
We report here a modified cDNA trapping technique that can accommodate multiple oligonucleotides and more than one library at the same time. This method has all the features of the original method, such as screening of a large number of clones within a week, rapid isolation of full-length cDNA, and identification of related sequences. Compared with the original method, this modified method saves time and significantly reduces the cost, as the reagents used for cDNA trapping are expensive. It is particularly effective for isolating multiple forms of cDNAs.

Human cytochrome P4502C8                  5'-gtc tat ggt cct gtg ttc acc-3'   5
Human cytochrome P4502E1                  5'-gtg gtg atg cac ggc tac aag-3'   11
Human cytochrome P450 reductase           5'-cag cat gac ggc cat gat tct-3'   12
Mouse PI-4-phosphate 5-kinase α           5'-caa tgg gag gca ttc cag cta-3'   8
Mouse PI-4-phosphate 5-kinase β           5'-caa gac cta tgc acc tgt tgc-3'   9
Inositol 1,3,4-triphosphate 5/6-kinase    5'-atc atc cac aag ctg act gac-3'   10