<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-6-r97</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Identification of motifs that function in the splicing of non-canonical introns</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Murray</snm>
               <mi>I</mi>
               <fnm>Jill</fnm>
               <insr iid="I1"/>
               <email>jill.i.murray@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Voelker</snm>
               <mi>B</mi>
               <fnm>Rodger</fnm>
               <insr iid="I1"/>
               <email>rvoelker@molbio.uoregon.edu</email>
            </au>
            <au id="A3">
               <snm>Henscheid</snm>
               <mi>L</mi>
               <fnm>Kristy</fnm>
               <insr iid="I1"/>
               <email>henscheid@molbio.uoregon.edu</email>
            </au>
            <au id="A4">
               <snm>Warf</snm>
               <fnm>M Bryan</fnm>
               <insr iid="I1"/>
               <email>mwarf@molbio.uoregon.edu</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Berglund</snm>
               <fnm>J Andrew</fnm>
               <insr iid="I1"/>
               <email>aberglund@molbio.uoregon.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institute of Molecular Biology and Department of Chemistry, University of Oregon, Eugene, Oregon, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>6</issue>
         <fpage>R97</fpage>
         <url>http://genomebiology.com/2008/9/6/R97</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18549497</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-6-r97</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>27</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>12</day>
               <month>6</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>12</day>
               <month>06</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Murray et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Non-canonical intronic motifs</p>
      </shorttitle>
      <shortabs>
         <p>The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests a novel mechanism for intron recognition that compensates for a weakened canonical pre-mRNA splicing motif.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>While the current model of pre-mRNA splicing is based on the recognition of four canonical intronic motifs (5' splice site, branchpoint sequence, polypyrimidine (PY) tract and 3' splice site), it is becoming increasingly clear that splicing is regulated by both canonical and non-canonical splicing signals located in the RNA sequence of introns and exons that act to recruit the spliceosome and associated splicing factors. The diversity of human intronic sequences suggests the existence of novel recognition pathways for non-canonical introns. This study addresses the recognition and splicing of human introns that lack a canonical PY tract. The PY tract is a uridine-rich region at the 3' end of introns that acts as a binding site for U2AF65, a key factor in splicing machinery recruitment.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Human introns were classified computationally into low- and high-scoring PY tracts by scoring the likely U2AF65 binding site strength. Biochemical studies confirmed that low-scoring PY tracts are weak U2AF65 binding sites while high-scoring PY tracts are strong U2AF65 binding sites. A large population of human introns contains weak PY tracts. Computational analysis revealed many families of motifs, including C-rich and G-rich motifs, that are enriched upstream of weak PY tracts. <it>In vivo </it>splicing studies show that C-rich and G-rich motifs function as intronic splicing enhancers in a combinatorial manner to compensate for weak PY tracts.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The enrichment of specific intronic splicing enhancers upstream of weak PY tracts suggests that a novel mechanism for intron recognition exists, which compensates for a weakened canonical pre-mRNA splicing motif.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Pre-mRNA splicing is an essential processing step where non-coding intervening sequences (introns) are removed from the initial RNA transcript and coding sequences (exons) are ligated together to produce mature mRNA. Pre-mRNA splicing is mediated by the spliceosome, a multi-component complex composed of small nuclear ribonucleoproteins (snRNPs) and over 100 accessory proteins <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The splicing machinery assembles on the pre-mRNA in a highly regulated fashion to carry out the process of removing the intron and ligating the two adjoining exons <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Pre-mRNA splicing relies on the accurate recognition of the splice junctions that define introns and exons. This is underlined by the observation that incorrect pre-mRNA splicing is a major contributor to human genetic diseases <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Not only is splicing a crucial step in the accurate transfer of genetic information from DNA to RNA to protein, it is also a step that allows for regulation of gene expression as well as increased protein diversity through alternative splicing decisions <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>Several canonical intronic sequences define an intron and recruit the spliceosome to the pre-mRNA: the 5' splice site (5'ss, AG/GURAGU), the branchpoint sequence (CURAY), the polypyrimidine (PY) tract (a run of pyrimidines located between the 3' splice site and the branchpoint), and the 3' splice site (3'ss, YAG). These four canonical intronic sequences are recognized by specific components of the spliceosome or associated splicing factors. In the initial stage of splicing, when the decision to remove an intron is made, the U1 snRNP recognizes the 5'ss <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, splicing factor 1 (SF1, also known as BBP) recognizes the branchpoint sequence <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, and U2AF65 (U2AF (U2 snRNP auxiliary factor), 65 kDa subunit) recognizes the PY tract <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp> while its heterodimer partner U2AF35 (U2AF 35 kDa subunit) recognizes the 3'ss <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. After these initial recognition events, U2AF65 interacts with the U2 snRNP in order to recruit it to the branchpoint sequence, where it displaces SF1 <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
         <p>Although canonical splice elements are located within the intron, the exon is generally considered to be the unit that is first recognized and defined by the spliceosome. This is known as exon definition and is thought to be a dominant mode of recognition in human genes where the exons are small and the introns are large <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In the exon definition model, the exon and flanking upstream and downstream splice junctions are recognized and bridging interactions across the exon are important for accurate splicing. Conversely, according to the intron definition model, the splice junctions within the intron are recognized and bridging interactions across the intron mediate accurate splicing <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Intron definition is proposed to be the dominant mode of recognition for small introns <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
         <p>It has become clear that the four canonical splice elements do not contain adequate sequence information to ensure accurate splicing <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Additional <it>cis</it>-elements appear to be essential for accurate identification of many splice sites, and various <it>cis</it>-splicing elements have been identified in both exonic and intronic regions. Based upon their locations and effects upon splicing, these have been categorized as exonic and intronic splicing enhancers (ESEs and ISEs, respectively) or exonic and intronic splicing silencers (ESSs and ISSs, respectively) (for reviews see <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>).</p>
         <p>We are interested in the question of how introns that lack a canonical splice element are recognized and spliced. We have focused on introns that lack a canonical PY tract. In humans, U2AF65 binding to the PY tract is believed to be critical for intron recognition and splicing. <it>In vitro </it>selection studies have determined that U2AF65 binds with highest affinity to continuous runs of uridines interrupted by cytidines <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. This agrees with the general observation that good PY tracts contain runs of uridines. We have observed that many human introns lack these canonical PY tracts. This leads to the question of how introns lacking strong U2AF65 binding sites are recognized and are able to recruit the U2 snRNP.</p>
         <p>One model predicts that U2AF65 is not essential for the splicing of these introns. Several human introns have been shown to be spliced when U2AF65 levels are significantly reduced by RNA interference <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. U2AF65 may not be required because another splicing factor is functioning to recognize the PY tract region. For example, PUF60 has been shown to substitute for U2AF65 <it>in vitro </it>for some substrates <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. There is the potential that other, yet unidentified, U2AF65-like proteins may function to promote 3'ss selection of non-canonical PY tracts. In a second model, U2AF65 is required for splicing but strong U2AF65-PY tract interactions are not. It has recently been observed in fission yeast that introns lacking PY tracts require U2AF for splicing <it>in vivo </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Alternative pathways for U2AF65 recruitment may function in introns lacking strong PY tracts. For example, additional <it>cis</it>-elements present in the intron could alleviate the need for strong U2AF65-RNA interactions. These <it>cis</it>-elements could include the branchpoint sequence and 3'ss, which recruit SF1 and U2AF35, respectively, both of which can bind U2AF65 cooperatively through protein-protein interactions <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Auxiliary <it>cis</it>-elements such as ESEs and ISEs could function in the recognition of introns containing weak PY tracts. Previous studies have indicated that ESEs located in the downstream exon are able to compensate for weak PY tracts <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. 
In this model, the ESEs are recognized by SR (serine/arginine-rich) proteins that interact with the U2AF65/35 heterodimer to help recruit U2AF65 to the 3' end of the intron <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. We propose that a similar mechanism exists where ISEs in the region upstream of the PY tract function to compensate for weak U2AF65 binding by helping to recruit either U2AF65 or U2AF65-recruiting proteins or bypassing the need for U2AF65 in recruiting the U2 snRNP to the intron.</p>
         <p>We have used a computational approach to classify human introns in terms of their U2AF65 binding site strength. We conclude that a significant population of human introns does not contain a strong U2AF65 binding site in the PY tract region. This classification of human PY tract strength enabled us to computationally identify intronic motifs over-represented upstream of weak PY tracts. We propose that these over-represented motifs are putative ISEs that are important for the splicing of introns containing weak PY tracts.</p>
         <p>LCAT (lecithin cholesterol acyltransferase) intron 4 is a short (83 nucleotide) constitutively spliced intron with a weak PY tract. A U to C mutation in the branchpoint sequence (C<ul>U</ul>GAC) is known to result in intron retention, causing familial LCAT deficiency (complete deficiency) or fish-eye disease (partial deficiency), which can lead to premature atherosclerosis <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Intron retention, rather than skipping, suggests an intron definition model of recognition <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Therefore, we expected that ISEs might be involved in the recognition of this intron. We present results showing that G-rich and C-rich motifs, similar to those predicted by our computational approach to be enriched upstream of weak PY tracts, are ISEs important for the splicing of LCAT intron 4, which has a weak PY tract. Furthermore, we have observed that the G-rich and C-rich ISEs function in a combinatorial manner to promote the recognition of a weak PY tract-containing intron. Finally, we show another example of an intron, GNPTG (N-acetylglucosamine-1-phosphotransferase gamma subunit) intron 2, in which C-rich ISEs again appear to be compensating for a weak PY tract.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Computational analysis of human intron PY tracts using a U2AF65 binding site scoring method</p>
            </st>
            <p>U2AF65 plays an important role during splicing and is known to bind to the PY tract region located between the branchpoint sequence and the acceptor splice junction <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Visual inspection of human introns reveals that, although the PY tract region is enriched in uridines in general, there is a great deal of sequence variation between introns. This degeneracy, at least in part, appears to reflect the low RNA site specificity that U2AF65 displays compared to other RNA binding proteins that evolved to recognize highly specific targets. U2AF65 binds with high affinity to contiguous runs of uridines but appears to tolerate moderate interruptions of other nucleotides <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. Despite the ability of U2AF65 to bind to degenerate sites, an effective binding site must still be composed primarily of uridines <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. However, many thousands of human introns contain PY tracts that do not contain any sequences that are likely to be effective binding sites (shown below). Many of these PY tracts either contain contiguous runs of cytidines or contain numerous purines, neither of which is likely to represent a binding site for U2AF65 <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. Therefore, it is likely that individual human intronic PY tracts possess a wide range of affinities towards U2AF65, and that many may possess only weak binding sites for it. It is possible that additional <it>cis</it>-sequence elements augment the role of the PY tract during splicing, and that such elements play crucial roles in splicing in the absence of a strong U2AF65 binding site.</p>
            <p>Many human introns have been shown to be enriched in motifs containing GGG in the region upstream of the PY tract <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> (Figure <figr fid="F1">1a</figr>). This observation demonstrates that this region is under compositional selection. G-triples located upstream of a weak PY tract have been shown to affect splice site usage <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. We hypothesized that other <it>cis</it>-elements may also be located upstream of the PY tract and may compensate for PY tracts containing weak U2AF65 binding sites. To explore this possibility, we performed a computational analysis to determine whether the region upstream of the PY tract is enriched in specific motifs when the PY tract does not contain a strong U2AF65 binding site.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Computational analysis of human intron PY tracts</p>
               </caption>
               <text>
                  <p>Computational analysis of human intron PY tracts. <b>(a) </b>Distribution of intronic motifs (branchpoint (BPS), G-triples (GGG) and U2AF65 binding sites (U2AF65)) adjacent to the 3' end of human introns. The BPS curve is a composite of the distribution of all pentamers containing YTRAC (Y = T or C, R = A or G). The G-triple curve is the composite for all pentamers containing GGG. The U2AF65 curve is a composite of the occurrence of the ten most abundant pentamers found in the U2AF65 SELEX sequences <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B39">39</abbr></abbrgrp> (Additional data file 1). The distributions were determined over all human introns, and for each curve the total area under the curve was normalized to unity. The two regions used in this study are depicted below the curves. The PY tract region consisted of the region from -30 to -3, and the upstream PY (UPY) tract region was defined to be from -80 to -30 (relative to the acceptor splice-junction (SJ)). <b>(b) </b>Distribution of U2AF65 binding site scores (S<sub>65 </sub>scores) for all human introns (filled blue) and for the U2AF65 SELEX sequences used as the training set for the binding site score (vertical solid black lines). The distributions were generated using a bin size of 0.02, and the total area under the curves was normalized to unity. The median (used as the cutoff for 'weak' and 'strong' binding sites) is depicted as a vertical dashed line.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-1"/>
            </fig>
            <p>In order to carry out this analysis, we first needed to correlate the composition of the PY tract of introns with likely affinities towards U2AF65. Several theoretical models have been presented that describe the relationship between binding site composition and the &#916;G of binding between nucleic acids and nucleic acid binding proteins <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. These models require the use of a positional frequency model representing the preferred binding site. <it>In vitro </it>selection (SELEX) experiments using human U2AF65 did not reveal a well-defined consensus motif shared by high affinity RNAs <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B39">39</abbr></abbrgrp>. Several computational methods have been developed to define a degenerate consensus motif from a population of sequences that are thought to contain a common, but unknown, motif <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>. Though such methods have proven useful, each has its own weaknesses, and all such predictive methods introduce an added level of uncertainty. We decided to develop a computational method to predict the affinity between a short RNA sequence and U2AF65 that is independent of knowledge of a particular consensus binding motif. We refer to this score as an S<sub>65 </sub>score. The S<sub>65 </sub>score, for a given intron, is the average degree to which all pentamers (using a sliding window) found in the PY tract region (-30 to -3 relative to the acceptor splice-junction) are themselves enriched within the SELEX derived sequences (see Materials and methods for a complete description).</p>
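            <p>As a rough illustration of the sliding-window idea behind the S<sub>65 </sub>score, the following Python sketch averages a per-pentamer enrichment value over the -30 to -3 region and then applies a cutoff. The enrichment table, the function names, the default value for pentamers absent from the table and the handling of a score exactly equal to the cutoff are assumptions made for illustration; the actual scoring procedure is the one described in Materials and methods.</p>

```python
def pentamers(seq, k=5):
    """Yield every k-mer of seq using a sliding window with step 1."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def py_tract(intron_3prime, start=-30, end=-3):
    """Return positions start..end (inclusive) relative to the acceptor junction.

    intron_3prime is the 3' end of the intron; its last base is position -1.
    """
    n = len(intron_3prime)
    if -start > n:
        raise ValueError("sequence is shorter than the requested region")
    # Position -30 maps to index n - 30; position -3 maps to index n - 3.
    return intron_3prime[n + start:n + end + 1]


def s65_like_score(intron_3prime, enrichment, default=0.0):
    """Average the enrichment value of all pentamers in the PY tract region.

    enrichment is a hypothetical dict mapping pentamer -> enrichment value.
    """
    region = py_tract(intron_3prime)
    values = [enrichment.get(p, default) for p in pentamers(region)]
    return sum(values) / len(values) if values else 0.0


def classify(score, cutoff=0.811):
    """Label a score relative to a cutoff (assumption: ties count as weak)."""
    return "strong" if score > cutoff else "weak"
```

            <p>With a 28 nucleotide region and a window of five, each intron contributes 24 overlapping pentamers to the average, so a single unfavorable base lowers the values of up to five windows rather than only one.</p>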
            <p>For this analysis, the PY tract was defined as the region from -30 to -3 (relative to the acceptor splice junction). This region is highly enriched in the pentamers that are most abundant within the U2AF65 selected sequences (Figure <figr fid="F1">1a</figr> and data not shown). Although a small number of introns are thought to possess functional U2AF65 binding sites upstream of this region <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, the general enrichment for uridines in this region (Figure <figr fid="F1">1a</figr>) is consistent with the premise that the bulk of U2AF65 functional binding sites are located adjacent to the acceptor splice-junction.</p>
            <p>The S<sub>65 </sub>scores for the SELEX RNAs appear to be normally distributed with a mean of 1.5 (Figure <figr fid="F1">1b</figr>). In contrast, the S<sub>65 </sub>scores for human PY tracts display a slightly skewed distribution with a mean of 0.877 and a median of 0.811. These are shifted significantly to the left (that is, weaker) relative to the scores for the U2AF65 selected RNAs, suggesting that a large portion of human PY tracts represent weaker than optimal U2AF65 binding sites.</p>
            <p>We chose to classify PY tracts that score below the median of 0.811 as 'weak' PY tracts and those above 0.811 as 'strong' PY tracts or likely to have high affinity U2AF65 binding sites. Using this designation, only a single SELEX-derived sequence scores as 'weak'. We are therefore asking whether there are statistically significant differences in the composition of the -80 to -30 region of two types of introns: ones that contain a PY tract with affinities similar to those derived using SELEX, and those with PY tracts with lower affinities.</p>
         </sec>
         <sec>
            <st>
               <p>Binding of U2AF65 to low-scoring PY tracts</p>
            </st>
            <p>In order to assess the relationship between the S<sub>65 </sub>score and observed U2AF65 binding affinities, we evaluated the binding of recombinant human U2AF65 to several human PY tracts of varying S<sub>65 </sub>scores using gel mobility shift assays (Figure <figr fid="F2">2</figr>). We chose one PY tract that had a very low score (MBNL1 intron 6, S<sub>65 </sub>= 0.0750). This PY tract is interrupted by several purines that are expected to impair U2AF65 binding. We also evaluated three other low-scoring PY tracts that have scores closer to the median and therefore correspond to the more 'typical' human PY tract: BRUNOL4 intron 9 (S<sub>65 </sub>= 0.3602), ITGB4 intron 31 (S<sub>65 </sub>= 0.3608), and LCAT intron 4 (S<sub>65 </sub>= 0.5068). All three of these are cytidine-enriched. In addition, we tested three high-scoring PY tracts that had scores spanning the higher range of the distribution: INSR intron 10 (S<sub>65 </sub>= 0.9593), U2AF2 intron 6 (S<sub>65 </sub>= 1.1787), and SR140 intron 9 (S<sub>65 </sub>= 1.8434), and an altered version of the LCAT intron 4 in which the central region was modified to contain an eight nucleotide poly-uridine run (LCATmut with an S<sub>65 </sub>of 1.2060). All four of these high-scoring sequences are uridine-enriched. Binding data were also obtained using two sequences derived from the PY tract of the adenovirus major late (ADML) pre-mRNA, similar to previously studied ADML PY tracts <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B49">49</abbr></abbrgrp>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Binding of U2AF65 to human PY tracts validates the U2AF65 SELEX scoring system</p>
               </caption>
               <text>
                  <p>Binding of U2AF65 to human PY tracts validates the U2AF65 SELEX scoring system. <b>(a) </b>Gel shift of human U2AF65 with human PY tract RNA oligonucleotides. <b>(b) </b>RNA sequences used for binding studies. The gene and intron (IVS) of origin are indicated. The K<sub>d </sub>values are the average of triplicate experiments. K<sub>d </sub>values marked with an asterisk are estimated since the levels of protein required to reach saturation exceed the capacity of the experiment. <b>(c) </b>Linear regression of the observed U2AF65 affinities versus the predicted S<sub>65 </sub>score.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-2"/>
            </fig>
            <p>We expected the MBNL1 intron 6 PY tract to represent the weakest U2AF65 binding target and observed no detectable levels of U2AF65 binding at the protein concentrations tested (Figure <figr fid="F2">2</figr>). Meanwhile, all three of the cytidine-rich sequences with moderate S<sub>65 </sub>scores demonstrated moderate affinities in the binding assay. In contrast, three of the uridine-rich sequences (with high S<sub>65 </sub>scores) bound with high affinity. An interesting exception was the INSR-derived sequence, which bound U2AF65 more weakly than the more cytidine-rich LCAT-derived sequence. Importantly, for both LCAT and ADML, the binding of the mutant versions correlates well with the predicted affinities based upon the S<sub>65 </sub>score.</p>
            <p>Overall, there is a good agreement between the observed binding affinities for U2AF65 and the predicted affinities based upon the S<sub>65 </sub>score. Plotting the observed K<sub>d </sub>values versus the predicted S<sub>65 </sub>score revealed that the ln of the K<sub>d </sub>appears to be linearly related to the S<sub>65 </sub>score (Figure <figr fid="F2">2c</figr>). Since &#916;G is related to K<sub>d </sub>according to the equation &#916;<it>G</it>&#176; = -<it>RT</it>ln(<it>K</it><sub><it>d</it></sub>), this is consistent with the supposition that S<sub>65 </sub>is linearly related to &#916;G. Linear regression of the observed affinities and S<sub>65 </sub>scores demonstrates that these values are strongly correlated (R<sup>2 </sup>= 0.77; Figure <figr fid="F2">2c</figr>). Some of the observed deviations may be due to influences of RNA secondary structures present in some of the templates. Such secondary structure could greatly influence U2AF65 interactions, but this parameter is not addressed in the S<sub>65 </sub>score. Since U2AF65 is known to have a strong preference for uridines, it is possible that the observed binding affinities simply reflect overall uridine content. However, linear regression analysis of the uridine content versus binding affinities demonstrates that these values are not well correlated (R<sup>2 </sup>= 0.27, data not shown). Therefore, the S<sub>65 </sub>score is a better predictor of binding affinity than uridine content alone and suggests that U2AF65 is recognizing sequence features more complex than the simple presence or absence of contiguous runs of uridines.</p>
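            <p>The relationship between K<sub>d </sub>and free energy used in this argument can be sketched numerically. The Python example below converts a dissociation constant into a standard free energy using the same sign convention as the equation in the text, and computes R<sup>2 </sup>for a least-squares line through paired values. The temperature, the units and the function names are illustrative assumptions; no values from this study are reproduced.</p>

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k=298.15):
    """Standard free energy (kcal/mol) from Kd in mol/L, as -RT ln(Kd).

    The temperature default is an assumption, not a value from the study.
    """
    return -R_KCAL * temp_k * math.log(kd_molar)


def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line through (xs, ys).

    For simple linear regression this equals the squared Pearson correlation.
    Both xs and ys must vary, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)
```

            <p>Because the free energy is a linear function of ln(K<sub>d</sub>) at a fixed temperature, a linear fit of ln(K<sub>d</sub>) against a score gives the same R<sup>2 </sup>as a fit of the free energy against that score.</p>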
         </sec>
         <sec>
            <st>
               <p>Introns containing weak PY tracts are enriched in specific motifs upstream of the PY tract</p>
            </st>
            <p>It is possible that introns containing weak U2AF65 binding sites might be enriched in specific sequences that can compensate for the lack of a well-defined PY tract. In order to identify such motifs, we first characterized the relative enrichment of all 4-7 nucleotide n-mers in the 50 nucleotide region from -80 to -30 (relative to the splice-junction) for introns with PY tracts categorized as 'weak' (S<sub>65 </sub>scores less than 0.811; see Materials and methods) relative to the set of all introns. We were specifically interested in identifying sequences located in the region upstream of the branchpoint itself. Since most branchpoints are located between -17 and -30 (Figure <figr fid="F1">1a</figr>), the region evaluated would exclude the majority of branchpoint-like sequences.</p>
            <p>Human introns have been shown to fall into two classes based upon GC or AT content <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. In order to be sure that we were not merely measuring compositional biases between AT-rich and GC-rich introns, we classified introns according to the GC content of the last 100 bases. Introns with greater than 50% GC content were categorized as GC-rich while those with less than 50% GC were categorized as AT-rich. As measured using our criteria, 37% of AT-rich introns were found to have 'weak' PY tracts, and 72% of GC-rich introns were determined to have 'weak' PY tracts.</p>
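            <p>The GC-based classification described above can be sketched in a few lines of Python. The function names are hypothetical, and the treatment of an intron with exactly 50% GC content is an assumption, since the text assigns only introns above or below 50%.</p>

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_class(intron, window=100):
    """Classify an intron by the GC content of its last `window` bases."""
    frac = gc_fraction(intron[-window:])
    if frac > 0.5:
        return "GC-rich"
    if 0.5 > frac:
        return "AT-rich"
    # Exactly 50% GC is not assigned by the text; left unclassified here.
    return "unclassified"
```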
            <p>Enrichment of n-mers in the -80 to -30 region for introns with weak PY tracts versus all GC or AT-rich introns was determined (see Materials and methods). The entire list of enriched n-mers used in this study is available in Additional data files 2 and 3. According to this analysis, 99 n-mers were determined to be significantly enriched (<it>P </it>&lt; 0.01) in the AT-rich class, and 349 n-mers were determined to be significantly enriched in the GC-rich class. For comparison, we drew random samples of the same size as the corresponding weak PY tract class for both the AT-rich and GC-rich introns, and determined enrichment using the same method as above. The average number of n-mers (four to seven nucleotides) that were determined to be significantly enriched in the randomly drawn samples was ten for the AT-rich and zero for the GC-rich class. Therefore, the enrichment measured appears to be strongly correlated with the composition of the PY tract as measured by the S<sub>65 </sub>score.</p>
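            <p>To make the kind of comparison described above concrete, the Python sketch below counts all four to seven nucleotide n-mers in the 50 nucleotide upstream region and scores one n-mer in a subset of introns against a background set. The binomial Z-score used here is an assumed stand-in chosen for simplicity, and all function names are hypothetical; the statistic actually used in this study is the one given in Materials and methods.</p>

```python
from collections import Counter
import math


def upstream_region(intron_3prime):
    """Return the 50 nt region from -80 up to (not including) -30."""
    return intron_3prime[-80:-30]


def count_nmers(seqs, n_min=4, n_max=7):
    """Count all overlapping n-mers of length n_min..n_max across seqs."""
    counts = Counter()
    for seq in seqs:
        for n in range(n_min, n_max + 1):
            for i in range(len(seq) - n + 1):
                counts[seq[i:i + n]] += 1
    return counts


def enrichment_z(nmer, subset_counts, background_counts):
    """Binomial Z-score of nmer in the subset relative to the background.

    Assumption: the background frequency of the n-mer, among n-mers of the
    same length, is taken as the expected frequency in the subset.
    """
    n = len(nmer)
    sub_total = sum(c for m, c in subset_counts.items() if len(m) == n)
    bg_total = sum(c for m, c in background_counts.items() if len(m) == n)
    if sub_total == 0 or bg_total == 0:
        return 0.0
    p = background_counts[nmer] / bg_total
    expected = sub_total * p
    sd = math.sqrt(sub_total * p * (1.0 - p))
    return (subset_counts[nmer] - expected) / sd if sd > 0 else 0.0
```

            <p>Drawing random subsets of the same size as the weak PY tract class and repeating the calculation, as described above, gives a direct estimate of how many n-mers would pass a significance threshold by chance.</p>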
            <p>It has been proposed that signals that govern splicing of shorter (&lt;200 nucleotides) introns may differ from those governing splicing of longer introns <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Therefore, we also evaluated short (&lt;200 nucleotides) and long (&#8805; 200 nucleotides) AT-rich and GC-rich introns as independent classes. We found that enrichment was similar for both short and long GC-rich introns as evidenced by the observation that the enrichment score for n-mers correlated between these groups (Additional data file 6a). Meanwhile, little correlation was seen between the enrichment scores for long versus short AT-rich introns (Additional data file 6b). This is likely due to the fact that few n-mers were actually determined to be significantly enriched in the short AT-rich population (Additional data file 6b, and data not shown). Together, these data suggest that the compositional biases seen in the region upstream of the PY tract correlate with the potential for U2AF65 binding, especially for GC-rich introns, and that the bias is similar for both long and short introns.</p>
            <p>To determine motifs, the enriched n-mers were clustered using the graph clustering method and software presented by Voelker and Berglund <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Clustering of the n-mers derived from the GC-rich introns yielded 25 clusters (Additional data file 4). These were manually separated into eight groups of compositionally similar motifs (Figure <figr fid="F3">3a</figr>). The n-mers derived from the AT-rich introns yielded eight clusters, of which the three most significant are shown in Figure <figr fid="F3">3b</figr>.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Introns containing weak PY tracts are enriched in specific motifs upstream of the PY tract</p>
               </caption>
               <text>
                  <p>Introns containing weak PY tracts are enriched in specific motifs upstream of the PY tract. Shown are representative motifs derived from n-mers enriched in the region upstream of weak PY tracts (see Materials and methods for details of motif construction). The complete list of motifs is available in Additional data files 4 and 5. The average Z-score for enrichment of all of the n-mers that compose the motif is shown to the right. <b>(a) </b>Motifs over-represented upstream of weak PY tracts for GC-rich human introns. <b>(b) </b>Motifs over-represented upstream of weak PY tracts for AT-rich human introns.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-3"/>
            </fig>
            <p>Motifs containing three to four contiguous guanosines are greatly enriched upstream of weak PY tracts for both AT-rich and GC-rich introns (Figure <figr fid="F3">3</figr>, motifs GC2-GC8 and AT1-AT2). Similar G-rich motifs have been previously shown to be enriched in this region <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. G-rich intronic tracts have been shown to play important roles as splicing signals <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, and several heterogeneous nuclear ribonucleoproteins (hnRNPs), including hnRNPs A1, A2, F, and H, have been shown to bind G-rich RNA motifs <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr><abbr bid="B59">59</abbr></abbrgrp>. The majority of the G-rich motifs appear to contain a common substring of three to four contiguous Gs separated by one to two nucleotides, and the preferred di-nucleotide spacers appear to be CT, CC, and CA.</p>
            <p>In addition, we observed that C-rich motifs (containing three to four contiguous cytidines) are enriched upstream of weak GC-rich PY tracts (Figure <figr fid="F3">3</figr>, motif GC1). Using different computational methods, similar C-rich motifs have been predicted to be ISEs <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Our analysis provides additional evidence suggesting that C-rich motifs, located upstream of the PY tract, may play important roles in splicing.</p>
            <p>We also observed that AT-rich introns with weak PY tracts were enriched in motifs similar to a motif recognized by the protein CUG-BP1 (Figure <figr fid="F3">3</figr>, motif AT3) <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. It is interesting that these motifs did not appear in the GC-rich class. This may be due to compositional biases in the GC-rich class that preclude their identification using the computational methods that we employed, or it may imply that these motifs are, in fact, more abundantly represented in the AT-rich class.</p>
            <p>These analyses demonstrate that certain motifs are statistically over-represented upstream of human introns containing weak PY tracts. We also wanted to assess how prevalent these motifs are among introns in general, and to determine the relative level of enrichment between introns with strong versus weak U2AF65 binding sites. Therefore, for each intron, we determined the percentage of the region from -80 to -30 that matched one or more of the n-mers determined to be enriched in introns with weak PY tracts relative to those with strong PY tracts (see above). We refer to this value as the percent coverage. As an example, 80% coverage indicates that 80% of the -80 to -30 region (or 40 of the 50 nucleotides) matches one or more of the enriched n-mers. This analysis (Additional data file 7) revealed that most introns have at least one match to an enriched n-mer. This is not surprising considering that the n-mers are only four to seven nucleotides in length and, therefore, are expected to occur by chance with fairly high frequency. However, this analysis also revealed that introns with weak PY tracts are likely to have greater coverage than introns with strong PY tracts. This is especially true of the GC-rich class of introns. For instance, while only 10% of GC-rich introns with strong PY tracts have 80-100% coverage, 23% of GC-rich introns with weak PY tracts have this level of coverage (Additional data file 7). A smaller difference in coverage is seen between AT-rich introns with strong and weak PY tracts; however, the overall trend is the same (Additional data file 7). In both cases, the enriched n-mers tend to make up a greater portion of the -80 to -30 region for introns with weak PY tracts. Together, these observations indicate that the sequences represented by the enriched n-mers are rather common, but that they tend to cluster in introns with weak PY tracts.</p>
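            <p>The percent coverage calculation described above can be illustrated with a minimal sketch. This is not the code used in this study; the function name <it>percent_coverage</it> and the matching rule are assumptions made for illustration. Here a nucleotide is assumed to count as covered if it lies within any exact occurrence of an enriched n-mer, and positions covered by overlapping matches are counted once.</p>

```python
def percent_coverage(region, nmers):
    """Percentage of positions in `region` that fall inside an n-mer match.

    `region` stands for the -80 to -30 window of one intron and `nmers`
    for the list of enriched n-mers. Both names are hypothetical, and the
    exact-match rule is an assumption, not the published implementation.
    """
    region = region.upper()
    if not region:
        return 0.0
    covered = [False] * len(region)
    for nmer in nmers:
        nmer = nmer.upper()
        if not nmer:
            continue  # ignore empty strings rather than matching everywhere
        start = region.find(nmer)
        while start != -1:
            # Mark every position spanned by this occurrence as covered.
            for i in range(start, start + len(nmer)):
                covered[i] = True
            # Advance by one position so overlapping occurrences are found.
            start = region.find(nmer, start + 1)
    return 100.0 * sum(covered) / len(region)


# A toy 50-nucleotide window in which 40 positions match the n-mer GGGC.
example_region = "GGGC" * 10 + "A" * 10
example_coverage = percent_coverage(example_region, ["GGGC"])
```

            <p>In this toy window, 40 of the 50 nucleotides lie within a match, which corresponds to the 80% coverage example given above. A real analysis would apply the same calculation to each intron and then compare the resulting distributions for introns with strong and weak PY tracts.</p>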
         </sec>
         <sec>
            <st>
               <p>C-rich and G-rich motifs act as ISEs in an intron containing a weak polypyrimidine tract</p>
            </st>
            <p>LCAT intron 4 contains both C-rich and G-rich motifs upstream of the PY tract; these motifs are similar to those we identified computationally and are highly conserved. In contrast, the PY tract of LCAT intron 4 is low scoring and is not well conserved. To investigate the role of the C-rich and G-rich motifs present in LCAT intron 4, we used a mini-gene system. We created a mini-gene that contains the last 50 nucleotides of LCAT intron 3, LCAT exon 4, LCAT intron 4, LCAT exon 5 and the first 50 nucleotides of LCAT intron 5. We included the downstream and upstream flanking introns in order to allow exon definition to occur, although short introns are often observed to function by intron definition <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Mutation of the G-rich motifs</p>
            </st>
            <p>We examined the role of two G-rich motifs (G-rich motif (GRM)1 and GRM2) present upstream of the PY tract of LCAT intron 4 (Figure <figr fid="F4">4a</figr>). The wild-type (WT) LCAT intron 4 mini-gene splices such that 5 &#177; 1% pre-mRNA is observed (Figure <figr fid="F4">4b</figr>, lane 1, and 4c). Mutation of GRM1 to AAA (MUT 3, Figure <figr fid="F4">4a</figr>) had a strong effect, and increased the unspliced product to 19 &#177; 5% (Figure <figr fid="F4">4b</figr>, lane 2, and 4c). Mutation of GRM2 to AAA (MUT 4, Figure <figr fid="F4">4a</figr>) had slightly less of an effect than MUT 3, resulting in 14 &#177; 3% pre-mRNA (Figure <figr fid="F4">4b</figr>, lane 3, and 4c). Mutation of both GRM1 and GRM2 (MUT 7, Figure <figr fid="F4">4a</figr>) had an effect similar to that of mutating GRM1 alone (Figure <figr fid="F4">4b</figr>, lane 4, and 4c), suggesting that the two GRMs do not function additively towards recognition of LCAT intron 4. We also mutated a region that was neither a G-rich motif nor a C-rich motif (MUT 5, Figure <figr fid="F4">4a</figr>) to be sure that the AAA motif we were inserting was not acting as an ISS. MUT 5 spliced similarly to WT (Figure <figr fid="F4">4b</figr>, compare lanes 1 and 5; Figure <figr fid="F4">4c</figr>), suggesting that the presence of the mutant AAA sequence in that region of LCAT intron 4 does not act as an ISS. These results suggest that GRM1 and GRM2 are ISEs important for the splicing of LCAT intron 4.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>G-rich motifs function as ISEs in LCAT intron 4 splicing</p>
               </caption>
               <text>
                  <p>G-rich motifs function as ISEs in LCAT intron 4 splicing. <b>(a) </b>LCAT intron 4 with the mutations shown in blue above the WT sequence. BPS, branchpoint. <b>(b) </b>Splicing of the LCAT intron 4 mini-genes (WT, MUT3, MUT4, MUT7 and MUT 5) in HeLa cells. Splicing products (isolated from HeLa, reverse-transcribed and amplified with radioactive PCR) were resolved on an 8% non-denaturing gel and scanned using a phosphorimager. The pre-mRNA (top) is a 472 bp product and the mRNA (bottom) is a 389 bp product. The average quantification and standard deviation of the percent pre-mRNA (pre-mRNA divided by total RNA) for at least triplicate reactions is reported below each lane. <b>(c) </b>Graphical representation of the percent pre-mRNA for each LCAT mini-gene. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Mutation of the C-rich motifs</p>
            </st>
            <p>To determine whether the C-rich motifs function as ISEs, we mutated two C-rich motifs: C-rich motif (CRM)1 and CRM2 (Figure <figr fid="F5">5a</figr>), which are present upstream of the PY tract in LCAT intron 4. Mutation of CRM1 to AAA (MUT 1, Figure <figr fid="F5">5a</figr>) did not have a significant effect on splicing (Figure <figr fid="F5">5b</figr>, lane 2, and 5c). We also created a CRM1 mutant where we mutated CCC to AUA (MUT 1b, Figure <figr fid="F5">5a</figr>) and observed the same level of splicing as the AAA mutant (Figure <figr fid="F5">5b</figr>, compare lanes 2 and 3; Figure <figr fid="F5">5c</figr>). Similarly, mutation of CRM2 to AAA (MUT 2, Figure <figr fid="F5">5a</figr>) did not have a significant effect on splicing (Figure <figr fid="F5">5b</figr>, lane 4, and 5c). However, mutation of both CRM1 and CRM2 (MUT 6, Figure <figr fid="F5">5a</figr>) resulted in a decrease in splicing to 19 &#177; 3% pre-mRNA (Figure <figr fid="F5">5b</figr>, lane 5). These results suggest that while CRM1 and CRM2 do not individually contribute significantly to the splicing of LCAT intron 4, mutation of multiple C-rich motifs has a combinatorial effect.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>C-rich motifs function as ISEs in LCAT intron 4 splicing</p>
               </caption>
               <text>
                  <p>C-rich motifs function as ISEs in LCAT intron 4 splicing. <b>(a) </b>LCAT intron 4 with the mutations shown in blue above the WT sequence. BPS, branchpoint.<b> (b) </b>Splicing of the LCAT intron 4 mini-genes (WT, MUT1, MUT1b, MUT2, MUT 6 and MUT 5) in HeLa cells. Analysis was performed as in Figure 4. <b>(c) </b>Graphical representation of the percent pre-mRNA for each LCAT mini-gene. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Cumulative mutation of the G-rich and C-rich motifs</p>
            </st>
            <p>We hypothesized that the G-rich motifs and C-rich motifs could be functioning together in the recognition of LCAT intron 4. We have observed that there are many examples of introns where the G-rich and C-rich motifs are both present (data not shown). Mutation of both GRM1 and CRM1 (MUT 24, Figure <figr fid="F6">6a</figr>) resulted in a greater decrease in splicing (shown as an increase in percent pre-mRNA) than mutation of either motif alone (Figure <figr fid="F6">6b</figr>, compare MUT 24, lane 5, to MUT 1, lane 2, or MUT 3, lane 3; Figure <figr fid="F6">6c</figr>). An even greater decrease in splicing was observed for the combined mutation of GRM1, CRM1 and CRM2 (MUT 25, Figure <figr fid="F6">6b</figr>, compare MUT 25, lane 6, to MUT 3, lane 3 or MUT 6, lane 4; Figure <figr fid="F6">6c</figr>). These results suggest that the G-rich motifs and C-rich motifs function in combination to promote the splicing of LCAT intron 4.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>G-rich and C-rich motifs function combinatorially in LCAT intron 4 splicing</p>
               </caption>
               <text>
                  <p>G-rich and C-rich motifs function combinatorially in LCAT intron 4 splicing. <b>(a) </b>LCAT intron 4 with the mutations shown in blue above the WT sequence. BPS, branchpoint. <b>(b) </b>Splicing of the LCAT intron 4 mini-genes (WT, MUT1, MUT3, MUT6, MUT 24 and MUT 25) in HeLa cells. Analysis was performed as in Figure 4. <b>(c) </b>Graphical representation of the percent pre-mRNA for each LCAT mini-gene. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>G-rich and C-rich motifs can functionally replace one another as ISEs</p>
            </st>
            <p>We examined whether the C-rich motifs could function in the place of the G-rich motifs. Mutation of GRM1 to CCC (MUT 27, Figure <figr fid="F7">7a</figr>) resulted in a smaller decrease in splicing compared to that observed for mutation of GRM1 to AAA (Figure <figr fid="F7">7b</figr>, compare MUT 27, lane 5, to MUT 3, lane 2; Figure <figr fid="F7">7c</figr>). Mutation of GRM1 and GRM2 to C-rich motifs (MUT 28, Figure <figr fid="F7">7a</figr>) also resulted in a smaller decrease in splicing compared to mutating GRM1 and GRM2 to AAA (Figure <figr fid="F7">7b</figr>, compare MUT 28, lane 6, to MUT 7, lane 3). We observed that both the single and double GRM to CRM mutations resulted in similar effects on splicing (Figure <figr fid="F7">7b</figr>, compare MUT 27, lane 5, to MUT 28, lane 6). These results suggest that a C-rich motif can partially compensate for a G-rich motif in this location. Furthermore, it appears that a C-rich motif followed by a G-rich motif (MUT 27) functions as effectively as two C-rich motifs (MUT 28). Mutation of CRM1 and CRM2 to G-rich motifs (MUT 29, Figure <figr fid="F7">7a</figr>) resulted in splicing similar to WT (Figure <figr fid="F7">7b</figr>, compare MUT 29, lane 7, to WT, lane 1; Figure <figr fid="F7">7c</figr>). We conclude that G-rich motifs can fully compensate for, and function in the place of, C-rich motifs, while C-rich motifs can only partially compensate for G-rich motifs.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>G-rich and C-rich motifs can functionally replace one another as ISEs</p>
               </caption>
               <text>
                  <p>G-rich and C-rich motifs can functionally replace one another as ISEs. <b>(a) </b>LCAT intron 4 with the mutations shown in blue above the WT sequence. BPS, branchpoint.<b> (b) </b>Splicing of the LCAT intron 4 mini-genes (WT, MUT3, MUT7, MUT6, MUT 27, MUT 28 and MUT 29) in HeLa cells. Analysis was performed as in Figure 4. <b>(c) </b>Graphical representation of the percent pre-mRNA for each LCAT mini-gene. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Strengthening the PY tract eliminates the role of the C-rich motifs</p>
            </st>
            <p>We next investigated the role of the PY tract in LCAT intron 4 splicing. We mutated the PY tract to determine whether the C-rich sequences in the PY tract were also being recognized. Mutation of a C-rich sequence in the PY tract (CRM3, MUT 16B, Figure <figr fid="F8">8a</figr>) resulted in a minor decrease in splicing (MUT 16B, Figure <figr fid="F8">8b</figr>, lane 9, and 8c), indicating that CRM3 is not singly making a major contribution to the recognition of LCAT intron 4. However, the minor decrease in splicing does suggest that the PY tract may be playing a role. Strengthening the PY tract by mutating the sequence to include a run of eight uridines (MUT 17, Figure <figr fid="F8">8a</figr>) resulted in similar splicing to WT (Figure <figr fid="F8">8b</figr>, compare WT, lane 1, to MUT 17, lane 5). However, in the context of this strengthened PY tract, mutation of CRM1 and CRM2 (MUT 20, Figure <figr fid="F8">8a</figr>) did not result in decreased splicing (Figure <figr fid="F8">8b</figr>, compare MUT 20, lane 6, to MUT 6, lane 2; Figure <figr fid="F8">8c</figr>). Furthermore, the cumulative mutation of GRM1 and CRM1 (MUT 48, Figure <figr fid="F8">8a</figr>) or GRM1, CRM1 and CRM2 (MUT 49, Figure <figr fid="F8">8a</figr>) did not affect splicing in the presence of the strengthened PY tract (Figure <figr fid="F8">8b</figr>, compare MUT 48 to MUT 24 and MUT 49 to MUT 25). This result suggests that, in the context of a strengthened PY tract, the C-rich motifs and G-rich motifs are no longer necessary for recognition, while in the WT context the C-rich motifs and G-rich motifs function as ISEs to compensate for the weak LCAT intron 4 PY tract.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Strengthening the PY tract eliminates the role of the C-rich motifs</p>
               </caption>
               <text>
                  <p>Strengthening the PY tract eliminates the role of the C-rich motifs. <b>(a) </b>LCAT intron 4 with the mutations shown in blue above the WT sequence. BPS, branchpoint.<b> (b) </b>Splicing of the LCAT intron 4 mini-genes (WT, MUT6, MUT24, MUT25, MUT17, MUT20, MUT48, MUT49 and MUT 16B) in HeLa cells. Analysis was performed as in Figure 4. <b>(c) </b>Graphical representation of the percent pre-mRNA for each LCAT mini-gene. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-8"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>C-rich motifs are ISEs in an additional intron containing a weak PY tract</p>
            </st>
            <p>GNPTG intron 2 is an alternatively spliced (intron retention) short intron containing multiple C-rich motifs upstream of a low scoring PY tract (Figure <figr fid="F9">9a</figr>, S<sub>65 </sub>score = 0.536). In order to test the function of the three C-rich motifs, we created a mini-gene containing exon 2, intron 2 and exon 3. The WT GNPTG intron 2 mini-gene splices such that 29 &#177; 6% pre-mRNA is observed (Figure <figr fid="F9">9b,c</figr>). Mutation of the three C-rich motifs upstream of the PY tract (Figure <figr fid="F9">9a</figr>) had a significant effect on splicing, resulting in 63 &#177; 5% pre-mRNA (Figure <figr fid="F9">9b,c</figr>). This result provides an additional example of C-rich motifs functioning as ISEs in an intron containing a weak PY tract.</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>C-rich motifs function as ISEs in GNPTG intron 2</p>
               </caption>
               <text>
                  <p>C-rich motifs function as ISEs in GNPTG intron 2. <b>(a) </b>GNPTG intron 2 with the mutations shown in blue above the WT sequence and the putative branchpoint sequence shown in bold.<b> (b) </b>Splicing of the GNPTG intron 2 mini-genes (WT and MUT) in HeLa cells. Splicing products (isolated from HeLa, reverse-transcribed and amplified with radioactive PCR) were resolved on a 10% non-denaturing gel and scanned using a phosphorimager. The average quantification and standard deviation of the percent pre-mRNA (pre-mRNA divided by total RNA) for triplicate reactions is reported below each lane. <b>(c) </b>Graphical representation of the percent pre-mRNA for the WT and MUT GNPTG mini-genes. Error bars represent standard deviation of replicate experiments.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-9"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The present model of pre-mRNA splicing is based on the recognition of the four canonical intronic motifs (5'ss, branchpoint sequence, PY tract and 3'ss) <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, many introns lack one or more of these motifs and yet they are spliced. The diversity of human intronic sequences suggests that novel recognition pathways exist for non-canonical introns. Using an experimentally validated computational approach, introns lacking a canonical PY tract were isolated and analyzed to identify putative ISEs that functionally compensate in splicing when the PY tract is weak.</p>
         <sec>
            <st>
               <p>U2AF65 binding to PY tracts confirms the U2AF65 SELEX scoring system</p>
            </st>
            <p>Our U2AF65 binding studies using various human intron PY tracts (Figure <figr fid="F2">2</figr>) confirm that the computational prediction can generally delineate strong and weak U2AF65 binding sites. Two caveats to our scoring system are: it is based solely on the U2AF65 SELEX data and, therefore, does not take into account nucleotide substitutions that are particularly deleterious for U2AF65 binding; and it cannot account for RNA secondary structure. Each of these parameters can contribute to lower than predicted binding affinities and may partially explain the deviations observed between predicted and observed binding strengths. Nevertheless, the S<sub>65 </sub>score is generally able to distinguish between sequences displaying strong and weak interactions with U2AF65, and it is more accurate than using simple uridine content alone.</p>
            <p>For this analysis we also assume that the PY tract is located in the last 30 nucleotides of the intron. While this is a fair assumption for the vast majority of human introns, there are examples of introns where the PY tract and branchpoint sequence are located a further distance from the 3'ss AG <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B62">62</abbr><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>. Some of the human introns that score as having low scoring PY tracts may actually have high scoring PY tracts that are distally located. Although there are caveats to our scoring system, the S<sub>65 </sub>score generally distinguishes low and high affinity U2AF65 binding sites, allowing us to ask questions about the population of human introns with low affinity U2AF65 binding sites.</p>
         </sec>
         <sec>
            <st>
               <p>Intronic motifs enriched upstream of weak PY tracts</p>
            </st>
            <p>We have identified families of motifs that are over-represented upstream of weak PY tracts but not upstream of strong PY tracts (Figure <figr fid="F3">3</figr>). Our evidence, combined with previous observations, suggests that these motifs function as ISEs that appear to compensate for weakened U2AF65-PY tract interactions. While we chose to focus our attention on the G-rich and C-rich triplet motifs, our study identified at least one additional motif that may represent binding sites for members of the CELF family of proteins. However, additional experimental evidence will need to be obtained to verify the functional significance of the other motifs identified by our study.</p>
            <p>The experimental work presented here has focused on two relatively short introns, but our computational analysis found that the same families of motifs were over-represented in both short and long human introns (Additional data file 6). Although LCAT intron 4 is constitutively spliced, expressed sequence tag data suggest that GNPTG intron 2 is alternatively spliced, with some expressed sequence tags containing a retained intron 2. We expect to find examples where these motifs may play important roles in both constitutive and alternative splicing for both short and long introns.</p>
         </sec>
         <sec>
            <st>
               <p>Interplay of G-rich and C-rich ISEs in the splicing of LCAT intron 4</p>
            </st>
            <p>G-rich motifs have been shown to be enriched in short mammalian introns <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B65">65</abbr></abbrgrp>. The G-rich motif GRM1 is the strongest ISE we have observed in LCAT intron 4 (Figure <figr fid="F4">4</figr>). Double mutation of the two sequential G-rich motifs does not result in an additive effect on splicing. G-rich motifs have been shown to function in a combinatorial manner to promote splicing <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B56">56</abbr></abbrgrp>, although the spacing between G-rich motifs was greater (for example, 8-10 nucleotides <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>), than in LCAT intron 4, where only a single nucleotide separates the two G-rich motifs. Our studies confirm that G-rich sequences play an important role in promoting the recognition of GC-rich introns with weak PY tracts as previously observed <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>Our results also show that C-rich motifs can act as ISEs like the G-rich motifs, but that the C-rich motifs may play more of an ancillary role to the G-rich motifs, at least in the case of LCAT intron 4 (Figure <figr fid="F5">5</figr>). C-rich motifs have been shown to function as an ISE in a chicken intron near the 5'ss <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>, and as an ISS in a human intron near the 3'ss <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. The single C-rich motif mutational studies presented here suggest that the C-rich motifs present in LCAT intron 4 have little individual effect on LCAT intron 4 splicing. However, we have observed that the C-rich motifs function additively, and that mutating both C-rich motifs (CRM1 and CRM2) is equivalent to mutating the one stronger G-rich motif (GRM1). Furthermore, we show that the mutation of multiple C-rich motifs in GNPTG intron 2 has a significant effect on splicing. This provides an additional example of C-rich motifs functioning as ISEs in an intron with a weak PY tract. C-rich motifs have not been previously shown to function as ISEs in human, nor to function in tandem to produce an additive effect on splicing.</p>
            <p>Interestingly, the C-rich and G-rich motifs together function additively (Figure <figr fid="F6">6</figr>). This suggests a model where the combinatorial recognition of two separate ISEs promotes LCAT intron 4 splicing. It is intriguing to consider that interactions between the two ISEs could exist and represent an opportunity for protein-protein interactions between the C-rich and G-rich trans-factors, which could enhance intron recognition to a greater extent than either ISE (or trans-factor) alone. Combinatorial recognition of G-rich repeats with each other has been reported <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, as has the combinatorial function of a G-rich ISS with upstream ESSs <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Here we show that G-rich motifs can function in conjunction with C-rich ISEs to promote splicing, showing the flexibility of the G-rich motifs to function in different contexts.</p>
            <p>We have also observed that the C-rich motifs can only partially compensate in the place of G-rich motifs in LCAT intron 4 splicing, while G-rich motifs appear to fully compensate in the place of C-rich motifs (Figure <figr fid="F7">7</figr>). It may be that the C-rich motifs have a positional dependence while the G-rich motifs do not. G-rich motifs appear to be the dominant ISE in LCAT intron 4 splicing and appear capable of alleviating the need for C-rich motifs. The observation that C-rich motifs only partially rescue splicing reinforces the model that the C-rich motifs do not have the same enhancer strength as the G-rich motifs.</p>
            <p>An examination of the primary sequence and predicted secondary structure (using mfold <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>) of LCAT intron 4 suggests that the intron could be folding into a stem-loop structure with the G-rich and C-rich sequences base-pairing (data not shown). While this is an intriguing model, when the C-rich motifs are replaced with G-rich motifs (a mutation that would abolish stem-loop formation; MUT 29, Figure <figr fid="F7">7</figr>), we observe splicing similar to WT levels, suggesting that a stem-loop structure is not contributing to the splicing of LCAT intron 4.</p>
         </sec>
         <sec>
            <st>
               <p>Candidate protein factors for the G-rich and C-rich ISEs</p>
            </st>
            <p>There are multiple candidate proteins that could be recognizing the G-rich and C-rich motifs present in LCAT intron 4. A G-rich motif trans-factor, hnRNP H, has been identified and shown to bind G-rich sequences and regulate splicing both positively and negatively <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. Several additional hnRNP proteins, including hnRNPs A1, A2, and F, have also been shown to bind G-rich RNA sequences <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr><abbr bid="B59">59</abbr></abbrgrp>. An alternative model for G-rich sequence recognition involves RNA-RNA interactions to promote U1 snRNP binding. G-triplets near the 5'ss have been shown to bind the U1 snRNP by interacting with the U1 snRNA and this interaction was shown to be important for human alpha globin splicing <it>in vivo </it><abbrgrp><abbr bid="B69">69</abbr></abbrgrp>.</p>
            <p>hnRNP K and the &#945;-CP proteins are the major poly-C-binding proteins identified in mammalian cells <abbrgrp><abbr bid="B70">70</abbr><abbr bid="B71">71</abbr></abbrgrp>. Both hnRNP K and several &#945;-CP isoforms have been implicated in post-transcriptional control <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. There have also been two studies that have implicated these proteins in splicing. hnRNP K was shown to enhance the splicing of a chicken &#946;-tropomyosin intron by binding a C-rich motif near the 5'ss <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. A recent study has shown that &#945;-CP2 binds a C-rich patch upstream of a weak PY tract in the human &#945;-globin intron 1 transcript and inhibits splicing of this intron <it>in vitro </it><abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. This is in contrast to our results with LCAT intron 4 where the C-rich motifs function as splicing enhancers, not silencers. Several <it>cis</it>-elements, including G-rich ISEs, have been shown to act as both splicing enhancers and silencers <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr><abbr bid="B72">72</abbr></abbrgrp>. The C-rich motifs and their trans-factor may also possess the flexibility to function as silencers and enhancers.</p>
         </sec>
         <sec>
            <st>
               <p>Role of the PY tract in splicing</p>
            </st>
            <p>Our results suggest a model in which ISEs present upstream of a weak PY tract compensate for a weakened U2AF65-RNA interaction (Figure <figr fid="F10">10</figr>). In the case of LCAT intron 4, the G-rich and C-rich motifs and the branchpoint sequence are highly conserved, yet the PY tract is not well conserved. It is possible that the presence of strong enhancers upstream of the PY tract has allowed for greater degeneracy in the PY tract region. In support of this model, we have observed that when the PY tract is strengthened to include a run of eight uridines, mutation of both C-rich motifs or the cumulative mutation of the G-rich and C-rich motifs no longer has an effect on LCAT intron 4 splicing (Figure <figr fid="F8">8</figr>). The G-rich and C-rich motifs appear dispensable in the presence of a strong PY tract. G-rich motifs have previously been shown to be dispensable for maximal splicing in the presence of a strengthened PY tract <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. These results suggest that strong U2AF65-PY tract interactions alleviate the need for upstream ISEs.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>ISEs compensate for a weakened PY tract</p>
               </caption>
               <text>
                  <p>ISEs compensate for a weakened PY tract. The four factors present in the early (E) complex (U1 snRNP, SF1, U2AF65 and U2AF35) recognize the four canonical intronic splicing elements (the 5' splice site, the branchpoint sequence (BPS), the PY tract and the 3' splice site). During A complex formation, which follows E complex, the U2 snRNP is recruited by U2AF65 and replaces SF1 at the branchpoint. There are presumably multiple redundant pathways that compensate for weak U2AF65-PY tract interactions, including bridging interactions between SF1, U2AF65 and U2AF35, alternative PY tract binding proteins (shown here as factor 'P'), and pathways involving additional non-canonical motifs such as ESEs or ISEs. We propose that ISEs in the region upstream of a weak PY tract (nucleotides -30 to -80) are important for recognizing introns with weak PY tracts. Specifically, we have shown that G-rich and C-rich motifs are ISEs that compensate for weakened U2AF65-PY tract interactions. Factors X and Y represent proteins binding the compensating ISEs. We propose that ISE-factor X/Y interactions can compensate for weak PY tract-U2AF65 interactions and help recruit the U2 snRNP to the branchpoint. The double slash (//) indicates the variable length between the 5' splice site and the 3' end of the intron.</p>
               </text>
               <graphic file="gb-2008-9-6-r97-10"/>
            </fig>
            <p>An alternative model to weakened U2AF65-RNA interactions is that a splicing factor other than U2AF65 binds the weakened PY tracts. Recent work has shown that PUF60 plays a role in splicing by interacting with the PY tract <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Our observation that many of the weak PY tracts are particularly C-rich leads us to propose that a C-rich binding protein may function in this region. When we mutated a C-rich motif in the LCAT intron 4 PY tract we observed a small effect on splicing, suggesting that the C-rich motifs in the PY tract itself are recognized by a trans-factor (Figure <figr fid="F8">8</figr>). It is possible that such a splicing factor, be it U2AF65 or another protein, could be functioning in conjunction with the factor(s) that recognize the G-rich and C-rich ISEs upstream of the PY tract.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <sec>
            <st>
               <p>Novel mechanisms of intron recognition promote splicing of introns with non-canonical PY tracts</p>
            </st>
            <p>The pool of introns containing low-scoring U2AF65 binding sites represents a significant class of human introns lacking a canonical splicing element. The ISEs identified and validated here suggest that novel mechanisms exist in the cell for coping with weakened U2AF65-RNA interactions. Specifically, we have observed that the interplay of multiple <it>cis</it>-elements, in this case the G-rich and C-rich motifs, appears to be crucial for the recognition of non-canonical introns. In the future we plan to explore additional ISEs identified by this study to gain a broader picture of how the splicing machinery functions in the recognition of introns with weak PY tracts. While we have focused our attention here on a single key canonical splicing element, the PY tract, we plan on extending our analyses and expect to find that similar strategies exist for the recognition of other classes of non-canonical pre-mRNAs.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Computational prediction of strength of U2AF65 binding sites in PY tracts</p>
            </st>
            <p>The March 2006 human reference sequence (NCBI Build 36.1) in conjunction with the UCSC KnownGenes (hg18) annotation database (Release 8 April 2007) <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> was used to create a non-redundant database of human intronic sequences. After excluding annotated introns that did not begin with G [T/C] [A/G] and end with AG, or that were less than 60 bases in length, we were left with 171,475 unique acceptor ends.</p>
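            <p>The filtering step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the tuple layout of the input records, and the use of the acceptor coordinate as the key for non-redundancy are all hypothetical.</p>
            <p><![CDATA[
```python
import re

# Criteria taken from the text: intron begins with G[T/C][A/G], ends with AG,
# and is at least 60 bases long.
TERMINI = re.compile(r"^G[TC][AG].*AG$")
MIN_LENGTH = 60


def passes_filter(intron):
    """Return True if an intron sequence meets the stated inclusion criteria."""
    seq = intron.upper()
    return len(seq) >= MIN_LENGTH and TERMINI.match(seq) is not None


def unique_acceptor_ends(introns):
    """Collapse introns that share an acceptor end.

    `introns` is an iterable of (chrom, strand, acceptor_coordinate, sequence)
    tuples. This record layout is an assumption made for illustration only.
    The first intron seen for each acceptor end is kept.
    """
    kept = {}
    for chrom, strand, acceptor, seq in introns:
        if passes_filter(seq):
            kept.setdefault((chrom, strand, acceptor), seq)
    return kept
```
]]></p>
            <p>Keying on the acceptor coordinate means that introns sharing a 3' splice site but differing at the 5' splice site are counted once, which is one plausible reading of 'unique acceptor ends'.</p>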
            <p>In order to score PY tracts according to their likely affinity towards U2AF65, we developed a score that reflects the level of similarity of the PY tract sequence to the sequences that were enriched in RNAs derived from <it>in vitro </it>SELEX experiments using human U2AF65. In particular, if we let the frequency of occurrence of an n-mer <it>n </it>of length <it>k </it>within the SELEX sequences be represented by <it>f</it><sub><it>n</it></sub>, then for a subject sequence of length <it>L </it>the S<sub>65 </sub>score is determined according to:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2008-9-6-r97-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mrow>
                                 <m:mn>65</m:mn>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munderover>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mi>L</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>k</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mn>1</m:mn>
                                       </m:mrow>
                                    </m:munderover>
                                    <m:mrow>
                                       <m:mi>ln</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mfrac>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>f</m:mi>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>n</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:mfrac bevelled="true">
                                                      <m:mn>1</m:mn>
                                                      <m:mrow>
                                                         <m:msup>
                                                            <m:mn>4</m:mn>
                                                            <m:mi>k</m:mi>
                                                         </m:msup>
                                                      </m:mrow>
                                                   </m:mfrac>
                                                </m:mrow>
                                             </m:mfrac>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>L</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciGacaGaaeqabaqabeGadaaakeaacaWGtbWaaSbaaSqaaiaaiAdacaaI1aaabeaakiabg2da9KqbaoaalaaabaWaaabCaeaaciGGSbGaaiOBamaabmaabaWaaSaaaeaacaWGMbWaaSbaaeaacaWGUbWaaSbaaeaacaWGPbaabeaaaeqaaaqaamaaliaabaGaaGymaaqaaiaaisdadaahaaqabeaacaWGRbaaaaaaaaaacaGLOaGaayzkaaaabaGaamyAaiabg2da9iaaicdaaeaacaWGPbGaeyypa0JaamitaiabgkHiTiaadUgacqGHRaWkcaaIXaaacqGHris5aaqaaiaadYeacqGHsislcaWGRbGaey4kaSIaaGymaaaaaaa@4D00@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>n</it><sub><it>i </it></sub>denotes the n-mer found at position <it>i </it>in the subject sequence, so that each term of the sum uses the frequency of that n-mer within the SELEX population.</p>
            <p>The term <inline-formula><m:math name="gb-2008-9-6-r97-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>ln</m:mi><m:mo>&#8289;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:mfrac><m:mrow><m:msub><m:mi>f</m:mi><m:mrow><m:msub><m:mi>n</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:mrow><m:mfrac bevelled="true"><m:mn>1</m:mn><m:mrow><m:msup><m:mn>4</m:mn><m:mi>k</m:mi></m:msup></m:mrow></m:mfrac></m:mrow></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaciiBaiaac6gadaqadaqcfayaamaalaaabaGaamOzamaaBaaabaGaamOBamaaBaaabaGaamyAaaqabaaabeaaaeaadaWccaqaaiaaigdaaeaacaaI0aWaaWbaaeqabaGaam4AaaaaaaaaaaGccaGLOaGaayzkaaaaaa@3963@</m:annotation></m:semantics></m:math></inline-formula> is a log-odds representation of the degree to which the particular n-mer was enriched within the SELEX sequences. Since the SELEX experiment began with uniformly random sequences, the denominator is simply the expectation for random occurrence of an n-mer of length <it>k</it>. For this study we chose the n-mer length to be five and the SELEX data were those reported in Singh <it>et al</it>. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and both SELEX experiments reported in Banerjee <it>et al</it>. <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The frequency of occurrence for all pentamers within these sequences is shown in Additional data file 1. Introns with 'strong' PY tracts (that is, expected to have high affinity for U2AF65) were defined to be those that are above the median value for all introns (0.811). All but one of the RNAs derived from <it>in vitro </it>SELEX had S<sub>65 </sub>scores above this value.</p>
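            <p>As a hedged illustration of the scoring scheme described above, the sketch below averages the log-odds enrichment over all sliding windows of length five in a subject sequence. The function names are hypothetical, and the handling of n-mers never observed in the SELEX pool (the <it>unseen_freq </it>floor) is an assumption, since the text does not state how zero frequencies were treated.</p>
            <p><![CDATA[
```python
import math
from collections import Counter

K = 5               # n-mer length chosen in the text (pentamers)
MEDIAN_S65 = 0.811  # median score over all introns, as reported in the text


def kmer_frequencies(selex_seqs, k=K):
    """Sliding-window frequency of every k-mer across the SELEX sequences."""
    counts = Counter()
    for seq in selex_seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def s65_score(subject, freqs, k=K, unseen_freq=1e-6):
    """Mean log-odds enrichment over the L - k + 1 windows of `subject`.

    `unseen_freq` is an assumed floor for k-mers absent from `freqs`;
    it is not taken from the paper.
    """
    subject = subject.upper()
    n_windows = len(subject) - k + 1
    if n_windows < 1:
        raise ValueError("subject sequence is shorter than k")
    background = 1.0 / 4 ** k  # expectation for a uniformly random pool
    total = 0.0
    for i in range(n_windows):
        f = freqs.get(subject[i:i + k], unseen_freq)
        total += math.log(f / background)
    return total / n_windows


def has_strong_py_tract(score, threshold=MEDIAN_S65):
    """'Strong' PY tracts are those scoring above the median, per the text."""
    return score > threshold
```
]]></p>
            <p>Dividing by the number of windows makes scores comparable between subject sequences of different length. The SELEX sequences and the subject sequence must use the same alphabet (either U or T throughout), because the sketch performs no conversion.</p>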
         </sec>
         <sec>
            <st>
               <p>Identification of intronic motifs over-represented upstream of weak PY tracts</p>
            </st>
            <p>In order to avoid biases due to long interspersed repetitive elements (LINEs) and short interspersed repetitive elements (SINEs), repetitive elements in the intronic sequence database (obtained as described above) were masked using the masking coordinates associated with the UCSC hg18 annotation database (Release 8 April 2007) <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. However, simple repeats (many of which resemble known hnRNP binding sites) were not masked. The intronic acceptor sequences were then separated according to their GC content within the last 100 bases (or last half if the intron was less than 200 bases in length). AT-rich introns were defined to be introns containing less than 50% GC content. GC-rich introns were defined to be those containing greater than or equal to 50% GC content.</p>
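            <p>The GC-content partition described above can be sketched as follows. The function names are hypothetical, and two details are assumptions not specified in the text: the 'last half' of a short intron is rounded down, and masked positions are left in the denominator.</p>
            <p><![CDATA[
```python
def acceptor_window(intron, max_len=100, short_cutoff=200):
    """Return the last 100 bases, or the last half for introns under 200 bases."""
    seq = intron.upper()
    if len(seq) < short_cutoff:
        n = len(seq) // 2  # rounding down is an assumption
    else:
        n = max_len
    return seq[len(seq) - n:]


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_class(intron):
    """GC-rich if the acceptor window is at least 50% GC, otherwise AT-rich."""
    if gc_fraction(acceptor_window(intron)) >= 0.5:
        return "GC-rich"
    return "AT-rich"
```
]]></p>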
            <p>For each of these data sets, the occurrences of all n-mers (4-7 nucleotides) in the 50 nucleotide region from -80 to -30 (relative to the acceptor splice-junction) were determined using a sliding window. These counts were used to determine the background expectations for each n-mer. The occurrence of each 4-7 nucleotide n-mer within the equivalent region for all introns possessing 'weak' PY tracts (defined as above) was determined using a sliding window. From these values, n-mers that are enriched upstream of the branchpoint region for introns possessing weak PY tracts were determined using the binomial confidence interval method described in Voelker and Berglund <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. For the AT-rich class, 99 n-mers were determined to be significantly enriched (<it>P </it>&lt; 0.01), and 349 n-mers were determined to be significantly enriched for the GC-rich class. Enriched n-mers and corresponding counts and statistics are available in Additional data files 2 and 3. Enriched n-mers were used to construct motifs as in Voelker and Berglund <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. All of the derived motifs and the identities and occurrences of all n-mers that were used to construct the motifs are available in Additional data files 4 and 5.</p>
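            <p>The sliding-window counting described above can be sketched as follows. The function names are hypothetical, and skipping windows that contain a masked base (N) is an assumption. The <it>enrichment_z </it>function is a simple normal-approximation Z-score included only as a stand-in; it is not the binomial confidence interval method of Voelker and Berglund, which is described in the cited reference.</p>
            <p><![CDATA[
```python
import math
from collections import Counter


def upstream_region(intron, start=-80, end=-30):
    """Slice the region from -80 to -30 relative to the acceptor junction.

    Introns shorter than 80 bases yield a correspondingly shorter region.
    """
    return intron.upper()[start:end]


def count_nmers(regions, sizes=range(4, 8)):
    """Count every 4- to 7-nucleotide n-mer in each region with a sliding window."""
    counts = Counter()
    for region in regions:
        for k in sizes:
            for i in range(len(region) - k + 1):
                nmer = region[i:i + k]
                if "N" not in nmer:  # skip masked positions (assumption)
                    counts[nmer] += 1
    return counts


def enrichment_z(nmer, weak_counts, background_counts):
    """Illustrative Z-score of an n-mer in the weak set against the background."""
    k = len(nmer)
    bg_total = sum(c for m, c in background_counts.items() if len(m) == k)
    weak_total = sum(c for m, c in weak_counts.items() if len(m) == k)
    if bg_total == 0:
        raise ValueError("no background n-mers of this length")
    p = background_counts.get(nmer, 0) / bg_total
    sd = math.sqrt(weak_total * p * (1 - p))
    if sd == 0:
        raise ValueError("Z-score undefined for this n-mer")
    return (weak_counts.get(nmer, 0) - weak_total * p) / sd
```
]]></p>
            <p>In this sketch the background counts would come from all introns in a GC class and the weak counts from the subset with low S<sub>65 </sub>scores, with n-mers of each length normalized separately.</p>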
         </sec>
         <sec>
            <st>
               <p>U2AF65 binding</p>
            </st>
            <p>RNA oligonucleotides (listed in Figure <figr fid="F2">2b</figr>, IDT, Integrated DNA Technologies, San Diego, CA, USA) for U2AF65 binding assays were 5' end-labeled with &#947;-<sup>32</sup>P ATP using T4 polynucleotide kinase (NEB, Ipswich, MA, USA) for 30 minutes at 37&#176;C. The RNAs were then gel purified using an 8% denaturing gel, eluted from the gel in 0.3 M sodium acetate and ethanol precipitated. The resulting pellet was resuspended in nanopure water and purified with a Bio-spin 6 column (BioRad, Hercules, CA, USA) equilibrated with nanopure water. The radioactivity level of the purified RNA solution was determined by scintillation. Gel-shift binding assays were performed using varying concentrations of recombinant human U2AF65 with constant amounts of radiolabeled RNA oligonucleotides as previously described <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. The Ensembl gene accession numbers for the genes addressed in this study are: BRUNOL4 [ENSEMBL: ENSG00000101489], INSR [ENSEMBL: ENSG00000171105], LCAT [ENSEMBL: ENSG00000124067], MBNL1 [ENSEMBL: ENSG00000152601], SR140 [ENSEMBL: ENSG00000163714], and U2AF2 [ENSEMBL: ENSG00000063244].</p>
         </sec>
         <sec>
            <st>
               <p>Cloning of mini-genes and mutants</p>
            </st>
            <p>The WT LCAT intron 4 mini-gene was cloned from HeLa genomic DNA using primers to amplify the region from the last 50 nucleotides of LCAT intron 3 to the first 50 nucleotides of LCAT intron 5 (502 nucleotides). The forward primer included a <it>Bam</it>HI site and the reverse primer included an <it>Eco</it>RI site. The amplified genomic DNA was cut with <it>Bam</it>HI and <it>Eco</it>RI, inserted into pcDNA3 and sequenced. LCAT intron 4 mutants were made by PCR using the WT LCAT intron 4 mini-gene as template and primers containing the mutation of interest. LCAT intron 4 mutants were also cloned into pcDNA3 using <it>Bam</it>HI and <it>Eco</it>RI and sequenced. The WT and mutant GNPTG [ENSEMBL: ENSG00000090581] intron 2 mini-genes were cloned using overlapping primers to create a sequence containing exon 2, intron 2 and exon 3. This sequence was flanked by <it>Hin</it>dIII and <it>Not</it>I sites, cloned into pcDNA3 and sequenced.</p>
         </sec>
         <sec>
            <st>
               <p><it>In vivo </it>splicing assays: cell culture, transfection, and harvesting</p>
            </st>
            <p>HeLa cells were grown in monolayers in DMEM with GLUTAMAX (Invitrogen, Carlsbad, CA, USA) supplemented with 10% fetal bovine serum (GIBCO). For the LCAT splicing experiments, 1.5 (&#177; 0.2) &#215; 10<sup>5 </sup>cells were plated in 6-well plates and transfected 18-20 h later at approximately 70% confluency. Plasmid (1 &#956;g) was transfected into each well of cells using 5 &#956;l of Lipofectin (Invitrogen, Carlsbad, CA, USA) and 10 &#956;l of Plus reagent (Invitrogen) according to the manufacturer's protocols. For the GNPTG splicing experiments, 2 &#215; 10<sup>5 </sup>cells were plated in 6-well plates and transfected with 1 &#956;g plasmid 18-20 h later using 5 &#956;l of Lipofectamine 2000. Cells were harvested 24 h (LCAT experiments) or 16 h (GNPTG experiments) after transfection using TrypLE (GIBCO) and then pelleted by centrifugation. RNA was isolated from the cell pellets using an RNeasy kit (QIAGEN, Valencia, CA, USA).</p>
         </sec>
         <sec>
            <st>
               <p><it>In vivo </it>splicing assays: DNase treatment, reverse transcription, PCR, and quantifying percent mRNA</p>
            </st>
            <p>Isolated RNA (500 ng) was incubated with 1 unit of RQ1 DNase (Promega, Madison, WI, USA) in a 10 &#956;l reaction for 2 h (LCAT experiments) or 1 h (GNPTG experiments) according to the manufacturer's protocol. DNase-treated RNA (2 &#956;l, 100 ng) was reverse transcribed in a 10 &#956;l reaction (1:5 dilution) using Superscript II and an LCAT-specific reverse primer or, for the GNPTG experiments, a reverse primer to the pcDNA3 SP6 sequence, according to the manufacturer's protocols with the exception that we used half the recommended amount of Superscript II (Invitrogen, Carlsbad, CA, USA). For the LCAT splicing experiments, 2 &#956;l of the reverse transcription reaction was subjected to 20 rounds of PCR amplification in a 20 &#956;l reaction (1:10 dilution) using LCAT-specific primers spiked with a kinased LCAT forward primer (0.4 nM). Twenty rounds of PCR were found to be within the linear range for this PCR experiment (data not shown). The resulting PCR products were run on an 8% (19:1) polyacrylamide native gel. For the GNPTG splicing experiments, 2 &#956;l of the reverse transcription reaction was subjected to 27 rounds of PCR amplification in a 20 &#956;l reaction (1:10 dilution) using primers specific to the T7 (forward) and SP6 (reverse) sequences of the pcDNA3 plasmid spiked with kinased T7 forward primer. Twenty-seven rounds of PCR were found to be within the linear range for this PCR experiment (data not shown). The resulting PCR products were run on a 10% (19:1) polyacrylamide gel. The gels were dried and exposed overnight to a phosphorimager screen. Quantification of the radioactive bands was performed using ImageQuant software (GE Healthcare, London, UK). The percent pre-mRNA was calculated by dividing the amount of the pre-mRNA band by the total amount of the pre-mRNA and mRNA bands and multiplying by 100%.</p>
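            <p>The percent pre-mRNA calculation described above amounts to the following arithmetic. This is a minimal sketch with a hypothetical function name. The inputs stand for the quantified band intensities, in whatever units the quantification software reports.</p>
            <p><![CDATA[
```python
def percent_pre_mrna(pre_mrna_signal, mrna_signal):
    """Percent pre-mRNA = pre-mRNA / (pre-mRNA + mRNA) x 100."""
    total = pre_mrna_signal + mrna_signal
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * pre_mrna_signal / total
```
]]></p>
            <p>For example, band intensities of 25 (pre-mRNA) and 75 (mRNA) give 25% pre-mRNA, and the corresponding percent mRNA is the remainder, 75%.</p>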
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ADML, adenovirus major late; CRM, C-rich motif; ESE, exonic splicing enhancer; ESS, exonic splicing silencer; GNPTG, N-acetylglucosamine-1-phosphotransferase gamma subunit; GRM, G-rich motif; hnRNP, heterogeneous nuclear ribonucleoprotein; ISE, intronic splicing enhancer; ISS, intronic splicing silencer; LCAT, lecithin cholesterol acyltransferase; PY tract, polypyrimidine tract; S<sub>65 </sub>score, U2AF65 binding site score; SF, splicing factor; snRNP, small nuclear ribonucleoprotein; ss, splice site; U2AF, U2 snRNP auxiliary factor; WT, wild-type.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JIM and RBV designed experiments, performed experiments, analyzed data and wrote the paper. KLH and MBW performed experiments and analyzed data. JAB designed experiments, analyzed data and wrote the paper.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data files are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a table listing the probability of occurrence for pentamers found in U2AF65 SELEX derived sequences. Additional data file <supplr sid="S2">2</supplr> is a table listing the n-mers enriched upstream of weak PY tracts from GC-rich introns. Additional data file <supplr sid="S3">3</supplr> is a table listing the n-mers enriched upstream of weak PY tracts from AT-rich introns. Additional data file <supplr sid="S4">4</supplr> is a table listing the clusters enriched upstream of weak PY tracts from GC-rich introns. Additional data file <supplr sid="S5">5</supplr> is a table listing the clusters enriched upstream of weak PY tracts from AT-rich introns. Additional data file <supplr sid="S6">6</supplr> is a figure of the scatterplots of Z-scores for enrichment upstream of weak PY tracts for long versus short introns. Additional data file <supplr sid="S7">7</supplr> is a histogram showing the percentage of introns possessing specific n-mers that are enriched upstream of weak PY tracts.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Probability of occurrence for pentamers found in U2AF65 SELEX derived sequences</p>
            </caption>
            <text>
               <p>Table listing the count and the probability of occurrence (using a sliding window) for all pentamers found in the sequences reported in Singh <it>et al</it>. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and both SELEX experiments reported in Banerjee <it>et al</it>. <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
            </text>
            <file name="gb-2008-9-6-r97-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>N-mers enriched upstream of weak PY tracts from GC-rich introns</p>
            </caption>
            <text>
               <p>Associated statistics and listing of n-mers (4-7 nucleotides) determined to be enriched in the 50 nucleotide region upstream of weak PY tracts from GC-rich introns.</p>
            </text>
            <file name="gb-2008-9-6-r97-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>N-mers enriched upstream of weak PY tracts from AT-rich introns</p>
            </caption>
            <text>
               <p>Associated statistics and listing of n-mers (4-7 nucleotides) determined to be enriched in the 50 nucleotide region upstream of weak PY tracts from AT-rich introns.</p>
            </text>
            <file name="gb-2008-9-6-r97-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Clusters enriched upstream of weak PY tracts from GC-rich introns</p>
            </caption>
            <text>
               <p>Listing of all clusters derived from n-mers enriched in the 50 nucleotide region upstream of weak PY tracts from GC-rich introns. Included are the individual n-mers and associated statistics used to produce each motif.</p>
            </text>
            <file name="gb-2008-9-6-r97-S4.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Clusters enriched upstream of weak PY tracts from AT-rich introns</p>
            </caption>
            <text>
               <p>Listing of all clusters derived from n-mers enriched in the 50 nucleotide region upstream of weak PY tracts from AT-rich introns. Included are the individual n-mers and associated statistics used to produce each motif.</p>
            </text>
            <file name="gb-2008-9-6-r97-S5.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>Scatterplots of Z-scores for enrichment upstream of weak PY tracts for long versus short introns</p>
            </caption>
            <text>
               <p>The Z-scores for enrichment of all 4-7 nucleotide n-mers in the intronic region upstream (-80 to -30 relative to the acceptor splice-junction) of PY tracts with low S<sub>65 </sub>scores for short (&lt;200 nucleotide) introns are plotted versus those for long (&#8805; 200 nucleotide) introns. <b>(a) </b>Data for GC-rich introns. <b>(b) </b>Data for AT-rich introns.</p>
            </text>
            <file name="gb-2008-9-6-r97-S6.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>Histograms showing the percentage of the -80 to -30 region of introns that matches one or more enriched n-mers</p>
            </caption>
            <text>
               <p>The portion of the sequence corresponding to the -80 to -30 region matching one or more of the n-mers enriched in the same region for introns with weak PY tracts (Additional data files 2 and 3) was determined. These values (referred to as the percent coverage) were binned as indicated along the x-axis.</p>
            </text>
            <file name="gb-2008-9-6-r97-S7.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>JIM was supported by an AHA Pacific Mountain Affiliate pre-doctoral fellowship. KLH was supported by an NSF graduate research fellowship. MBW was supported by NIH training grant GM-07759 to the Institute of Molecular Biology at the University of Oregon. This work was supported by NSF grant 0616264-MCB to JAB. We thank members of the Berglund lab for helpful discussion.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Spliceosome structure and function.</p>
            </title>
            <aug>
               <au>
                  <snm>Will</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>L&#252;hrmann</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>The RNA World</source>
            <publisher>Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press</publisher>
            <editor>Gesteland R, Cech T, Atkins J</editor>
            <edition>3</edition>
            <pubdate>2006</pubdate>
            <fpage>369</fpage>
            <lpage>400</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Mechanical devices of the spliceosome: motors, clocks, springs, and things.</p>
            </title>
            <aug>
               <au>
                  <snm>Staley</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Guthrie</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>92</volume>
            <fpage>315</fpage>
            <lpage>326</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)80925-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">9476892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Splicing of precursors to mRNA by the spliceosome.</p>
            </title>
            <aug>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>The RNA World</source>
            <publisher>Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press</publisher>
            <editor>Gesteland R, Cech T, Atkins J</editor>
            <edition>2</edition>
            <pubdate>1999</pubdate>
            <fpage>525</fpage>
            <lpage>560</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Krawczak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reiss</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>DN</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>1992</pubdate>
            <volume>90</volume>
            <fpage>41</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00210743</pubid>
                  <pubid idtype="pmpid">1427786</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Pre-mRNA splicing and human disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Faustino</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <fpage>419</fpage>
            <lpage>437</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gad.1048803</pubid>
                  <pubid idtype="pmpid" link="fulltext">12600935</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Splicing in disease: disruption of the splicing code and the decoding machinery.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>749</fpage>
            <lpage>761</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg2164</pubid>
                  <pubid idtype="pmpid" link="fulltext">17726481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Mechanisms of alternative pre-messenger RNA splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Black</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>2003</pubdate>
            <volume>72</volume>
            <fpage>291</fpage>
            <lpage>336</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biochem.72.121801.161720</pubid>
                  <pubid idtype="pmpid" link="fulltext">12626338</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The U1 small nuclear RNA-protein complex selectively binds a 5' splice site <it>in vitro</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Mount</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Pettersson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hinterberger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Karmas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1983</pubdate>
            <volume>33</volume>
            <fpage>509</fpage>
            <lpage>518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(83)90432-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">6190573</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Identification of functional U1 snRNA-pre-mRNA complexes committed to spliceosome assembly and splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Seraphin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rosbash</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1989</pubdate>
            <volume>59</volume>
            <fpage>349</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(89)90296-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">2529976</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC.</p>
            </title>
            <aug>
               <au>
                  <snm>Berglund</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Chua</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Abovich</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rosbash</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1997</pubdate>
            <volume>89</volume>
            <fpage>781</fpage>
            <lpage>787</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)80261-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">9182766</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition.</p>
            </title>
            <aug>
               <au>
                  <snm>Berglund</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Abovich</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Rosbash</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1998</pubdate>
            <volume>12</volume>
            <fpage>858</fpage>
            <lpage>867</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316625</pubid>
                  <pubid idtype="pmpid" link="fulltext">9512519</pubid>
                  <pubid idtype="doi">10.1101/gad.12.6.858</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Identification, purification, and biochemical characterization of U2 small nuclear ribonucleoprotein auxiliary factor.</p>
            </title>
            <aug>
               <au>
                  <snm>Zamore</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1989</pubdate>
            <volume>86</volume>
            <fpage>9243</fpage>
            <lpage>9247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">298470</pubid>
                  <pubid idtype="pmpid" link="fulltext">2531895</pubid>
                  <pubid idtype="doi">10.1073/pnas.86.23.9243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Biochemical characterization of U2 snRNP auxiliary factor: an essential pre-mRNA splicing factor with a novel intranuclear distribution.</p>
            </title>
            <aug>
               <au>
                  <snm>Zamore</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1991</pubdate>
            <volume>10</volume>
            <fpage>207</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">452631</pubid>
                  <pubid idtype="pmpid">1824937</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Both subunits of U2AF recognize the 3' splice site in <it>Caenorhabditis elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Zorio</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Blumenthal</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>835</fpage>
            <lpage>838</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/45597</pubid>
                  <pubid idtype="pmpid" link="fulltext">10617207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Functional recognition of the 3' splice site AG by the splicing factor U2AF35.</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Romfo</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Nilsen</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>832</fpage>
            <lpage>835</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/45996</pubid>
                  <pubid idtype="pmpid" link="fulltext">10617206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Inhibition of msl-2 splicing by Sex-lethal reveals interaction between U2AF35 and the 3' splice site AG.</p>
            </title>
            <aug>
               <au>
                  <snm>Merendino</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Guth</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bilbao</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valcarcel</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>838</fpage>
            <lpage>841</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/45602</pubid>
                  <pubid idtype="pmpid" link="fulltext">10617208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A potential role for U2AF-SAP 155 interactions in recruiting U2 snRNP to the branch site.</p>
            </title>
            <aug>
               <au>
                  <snm>Gozani</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Potashkin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1998</pubdate>
            <volume>18</volume>
            <fpage>4752</fpage>
            <lpage>4760</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">109061</pubid>
                  <pubid idtype="pmpid" link="fulltext">9671485</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The SF3b155 N-terminal domain is a scaffold important for splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Cass</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Berglund</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2006</pubdate>
            <volume>45</volume>
            <fpage>10092</fpage>
            <lpage>10101</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi060429o</pubid>
                  <pubid idtype="pmpid" link="fulltext">16906767</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Exon recognition in vertebrate splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>2411</fpage>
            <lpage>2414</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7852296</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>G triplets located throughout a class of small vertebrate introns enforce intron borders and regulate splice site selection.</p>
            </title>
            <aug>
               <au>
                  <snm>McCullough</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Berget</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1997</pubdate>
            <volume>17</volume>
            <fpage>4562</fpage>
            <lpage>4571</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">232310</pubid>
                  <pubid idtype="pmpid" link="fulltext">9234714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases.</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>106</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(00)01549-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">10694877</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Listening to silence and understanding nonsense: exonic mutations that affect splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Cartegni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chew</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>285</fpage>
            <lpage>298</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg775</pubid>
                  <pubid idtype="pmpid" link="fulltext">11967553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Finding signals that regulate alternative splicing in the post-genomic era.</p>
            </title>
            <aug>
               <au>
                  <snm>Ladd</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>reviews0008</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">244920</pubid>
                  <pubid idtype="pmpid" link="fulltext">12429065</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-11-reviews0008</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Understanding alternative splicing: towards a cellular code.</p>
            </title>
            <aug>
               <au>
                  <snm>Matlin</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>386</fpage>
            <lpage>398</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15956978</pubid>
                  <pubid idtype="doi">10.1038/nrm1645</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Silencers regulate both constitutive and alternative splicing events in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Pozzoli</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Sironi</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>2005</pubdate>
            <volume>62</volume>
            <fpage>1579</fpage>
            <lpage>1604</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00018-005-5030-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">15905961</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>ZM</fnm>
               </au>
            </aug>
            <source>J Biomed Sci</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>278</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02254432</pubid>
                  <pubid idtype="pmpid">15067211</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Singh</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Valcarcel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>268</volume>
            <fpage>1173</fpage>
            <lpage>1176</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.7761834</pubid>
                  <pubid idtype="pmpid" link="fulltext">7761834</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p><it>In vivo</it> requirement of the small subunit of U2AF for recognition of a weak 3' splice site.</p>
            </title>
            <aug>
               <au>
                  <snm>Pacheco</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Coelho</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Desterro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Mollet</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Carmo-Fonseca</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2006</pubdate>
            <volume>26</volume>
            <fpage>8183</fpage>
            <lpage>8190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1636752</pubid>
                  <pubid idtype="pmpid" link="fulltext">16940179</pubid>
                  <pubid idtype="doi">10.1128/MCB.00350-06</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Control of pre-mRNA splicing by the general splicing factors PUF60 and U2AF65.</p>
            </title>
            <aug>
               <au>
                  <snm>Hastings</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Allemand</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Duelli</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Krainer</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2007</pubdate>
            <volume>2</volume>
            <fpage>e538</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1888729</pubid>
                  <pubid idtype="pmpid" link="fulltext">17579712</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0000538</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>A conditional role of U2AF in splicing of introns with unconventional polypyrimidine tracts.</p>
            </title>
            <aug>
               <au>
                  <snm>Sridharan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Singh</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2007</pubdate>
            <volume>27</volume>
            <fpage>7334</fpage>
            <lpage>7344</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2168890</pubid>
                  <pubid idtype="pmpid" link="fulltext">17709389</pubid>
                  <pubid idtype="doi">10.1128/MCB.00627-07</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Pre-spliceosome formation in <it>S. pombe</it> requires a stable complex of SF1-U2AF(59)-U2AF(23).</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vilardell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Query</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2002</pubdate>
            <volume>21</volume>
            <fpage>5516</fpage>
            <lpage>5526</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">129087</pubid>
                  <pubid idtype="pmpid">12374752</pubid>
                  <pubid idtype="doi">10.1093/emboj/cdf555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Alternative modes of binding by U2AF65 at the polypyrimidine tract.</p>
            </title>
            <aug>
               <au>
                  <snm>Henscheid</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Voelker</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Berglund</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>2008</pubdate>
            <volume>47</volume>
            <fpage>449</fpage>
            <lpage>459</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi701240t</pubid>
                  <pubid idtype="pmpid" link="fulltext">18067274</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Sorting out the complexity of SR protein functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Graveley</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2000</pubdate>
            <volume>6</volume>
            <fpage>1197</fpage>
            <lpage>1211</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369994</pubid>
                  <pubid idtype="pmpid" link="fulltext">10999598</pubid>
                  <pubid idtype="doi">10.1017/S1355838200000960</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>A splicing enhancer complex controls alternative splicing of doublesex pre-mRNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Tian</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>74</volume>
            <fpage>105</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90298-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">8334698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Specific interactions between proteins implicated in splice site selection and regulated alternative splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>75</volume>
            <fpage>1061</fpage>
            <lpage>1070</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2001174</pubid>
                  <pubid idtype="pmpid" link="fulltext">8261509</pubid>
                  <pubid idtype="doi">10.1016/0092-8674(93)90316-I</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The splicing factor U2AF35 mediates critical protein-protein interactions in constitutive and enhancer-dependent splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Zuo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Maniatis</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1996</pubdate>
            <volume>10</volume>
            <fpage>1356</fpage>
            <lpage>1368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gad.10.11.1356</pubid>
                  <pubid idtype="pmpid" link="fulltext">8647433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>An intronic mutation in a lariat branchpoint sequence is a direct cause of an inherited human disorder (fish-eye disease).</p>
            </title>
            <aug>
               <au>
                  <snm>Kuivenhoven</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Weibusch</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pritchard</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Funke</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Benne</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Assmann</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kastelein</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>J Clin Invest</source>
            <pubdate>1996</pubdate>
            <volume>98</volume>
            <fpage>358</fpage>
            <lpage>364</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">507438</pubid>
                  <pubid idtype="pmpid" link="fulltext">8755645</pubid>
                  <pubid idtype="doi">10.1172/JCI118800</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A factor, U2AF, is required for U2 snRNP binding and splicing complex assembly.</p>
            </title>
            <aug>
               <au>
                  <snm>Ruskin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zamore</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>MR</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1988</pubdate>
            <volume>52</volume>
            <fpage>207</fpage>
            <lpage>219</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(88)90509-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">2963698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The conserved RNA recognition motif 3 of U2 snRNA auxiliary factor (U2AF 65) is essential <it>in vivo</it> but dispensable for activity <it>in vitro</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Banerjee</snm>
                  <fnm>H</fnm>
               </au>
               <au>
    