<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2010-11-3-r29</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Genome-wide functional analysis of human 5' untranslated region introns</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Cenik</snm>
               <fnm>Can</fnm>
               <insr iid="I1"/>
               <email>cancenik@fas.harvard.edu</email>
            </au>
            <au id="A2">
               <snm>Derti</snm>
               <fnm>Adnan</fnm>
               <insr iid="I1"/>
               <email>adnan_derti@hms.harvard.edu</email>
            </au>
            <au id="A3">
               <snm>Mellor</snm>
               <mi>C</mi>
               <fnm>Joseph</fnm>
               <insr iid="I1"/>
               <email>joseph_mellor@hms.harvard.edu</email>
            </au>
            <au id="A4">
               <snm>Berriz</snm>
               <mi>F</mi>
               <fnm>Gabriel</fnm>
               <insr iid="I1"/>
               <email>gberriz@hms.harvard.edu</email>
            </au>
            <au ca="yes" id="A5">
               <snm>Roth</snm>
               <mi>P</mi>
               <fnm>Frederick</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>fritz_roth@hms.harvard.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Harvard Medical School, Department of Biological Chemistry and Molecular Pharmacology, 250 Longwood Avenue, SGMB-322, Boston, MA 02115, USA</p>
            </ins>
            <ins id="I2">
               <p>Center for Cancer Systems Biology, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>3</issue>
         <fpage>R29</fpage>
         <url>http://genomebiology.com/2010/11/3/R29</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2010-11-3-r29</pubid>
               <pubid idtype="pmpid">20222956</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>22</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>11</day>
               <month>3</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>11</day>
               <month>3</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Cenik et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>5'UTR introns</p>
      </shorttitle>
      <shortabs>
         <p>Genes with short 5'UTR introns have higher expression than genes with no or long 5'UTR introns. Complex evolutionary forces act on these introns.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation in 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression-regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched in having 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTK), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
         <classification id="30010008" subtype="man_spc_id" type="BMC">Evolution</classification>
         <classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The advent, evolution and functional significance of introns in eukaryotes have been topics of intense debate over the past 30 years (reviewed in <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>). There are two major opposing views on when introns arose in evolution; this 'introns-early' versus 'introns-late' controversy is reviewed in <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Also, debate exists on what causes their frequent losses and gains <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> and whether they have any adaptive significance.</p>
         <p>Neutral or nearly neutral population genetic processes under general, non-adaptive conditions have been suggested to result in dynamic gains and losses of introns. Such neutral processes could account for some of the observed patterns of intron presence <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, but do not rule out the possibility that adaptive processes are simultaneously contributing to the maintenance of some introns. Introns have been suggested to confer adaptive advantages by functioning in diverse mechanisms ranging from modifying recombination rates to increasing the efficacy of natural selection <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, and even to protecting exons from deleterious R-loops <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A relatively well-understood functional role of introns is to facilitate the production of distinct forms of mature mRNA through alternative splicing <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Recent genome-wide analyses suggest that nearly 95% of all human genes are alternatively spliced <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Many alternative splicing events are tissue-specific, and functional regulatory elements in exons and introns are associated with tissue specificity of these variants <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Therefore, introns can contribute to gene regulation.</p>
         <p>Most of the theoretical and empirical work on the evolution of introns has focused on those found in coding regions, yet an appreciable fraction of human genes (approximately 35%) contain introns in their 5'UTRs <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Introns in 5'UTRs are twice as long as those in coding regions, on average, and moderately lower in density, such that 5'UTRs contain a lower percentage of intronic bases than do coding regions <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. By contrast, 3'UTRs are typically much longer than 5'UTRs, but a study in human, mouse, fruit fly and mustard weed has shown that relatively few 3'UTRs (&lt;5%) contain introns <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. This observation is partly explained by nonsense-mediated decay, given that an intron downstream of the stop codon would typically mark a transcript for degradation by this pathway <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. In addition, splicing signals within 3'UTRs have been suggested to have reduced maintaining selection and, therefore, 3'UTRs tend to be longer and contain fewer introns compared to 5'UTRs <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. In summary, these differences suggest that introns in different regions of genes constitute distinct functional classes with unique evolutionary histories.</p>
         <p>As 5'UTR introns (5UIs) are unusually long and can considerably increase the total number of bases transcribed for a given gene, it is useful to consider the two main adaptationist theories about the functional consequences of intron length. The first model argues that it is energetically costly for cells to transcribe long stretches of DNA that do not encode protein <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. By this reasoning, total intronic length should be relatively low in highly expressed genes. Consistent with this prediction, the most highly expressed genes tend to have shorter introns in both humans and the worm <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, and there seem to be additional selective pressures towards having shorter proteins and more biased codon usage <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. However, an opposite effect is observed in <it>Oryza </it>and <it>Arabidopsis</it>, such that highly expressed genes have more and longer introns <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. If the selection against longer introns in highly expressed genes minimizes the energetic cost of unnecessary transcription, this observation is unexpected, as we would expect the model to hold across all taxa.</p>
         <p>The second model, termed 'genome design', posits that the pressure to maintain many intronic regulatory elements favors longer introns in tissue-specific genes <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The main supporting observation for this hypothesis is that human 'housekeeping' genes tend to be compact, with fewer and shorter introns as well as shorter coding regions relative to tissue-specific genes <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. Tissue-specific genes, on the other hand, tend to have longer and more conserved introns, perhaps because their functional complexity requires a more stringent level of regulation <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Furthermore, genes with higher functional complexity tend to be longer and seem to be under more complex regulation <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. However, analyses of human antisense genes contradict the claims of the genome design hypothesis <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. These studies showed that antisense genes, which need to be expressed rapidly, are compact but can be tissue-specific regulators <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Curiously, some studies supporting the genome design hypothesis explicitly disregard 5UIs (see methods in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>) even though these introns might be expected to include regulatory elements, being closer to transcription and often to translation start sites <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>.</p>
         <p>Neither of these two principal theories addresses the possible role of 5UIs and the evolutionary pressures acting on them; therefore, the functional significance, if any, of their frequent occurrence remains unclear. Given that splicing of these sequences seemingly has no effect on the amino acid sequence of the encoded protein, it is unclear what selective benefit might accompany their removal from the mature mRNA. The reduced splice-site conservation and high variability in length of 5UIs have led to the suggestion that they contract and expand without significant functional consequences <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. However, an exception to the trend of reduced splice-site conservation is observed in <it>Cryptococcus</it>, an intron-rich fungus with longer 5' and 3' UTR introns than coding region introns <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and high conservation near UTR intron boundaries <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <p>Given these conflicting results and the scarcity of studies regarding the evolution of UTR introns, it is worthwhile to consider a functional perspective. An analysis of functional trends among human genes with 5UIs could lead to a better understanding of their evolution and also potentially to the detection of novel mechanisms of regulation mediated by these introns. Here, we analyze expression profiles of genes with 5UIs and examine the distribution of these introns in different functional categories of genes.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Characterization of a set of genes with 5'UTR introns</p>
            </st>
            <p>To investigate the functional properties of human 5UIs, we used NCBI's Reference Sequence (RefSeq) collection. These are curated, full-length sequences with annotated UTR boundaries, and expression data are available for many of them. The lack of a translation reading frame makes the computational prediction of splice sites in 5'UTRs inherently more difficult <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, necessitating the choice of such a validated set. In humans, approximately 8,500 (35%) of 24,500 RefSeq mRNAs contained at least one intron in their 5'UTR (Additional file <supplr sid="S1">1</supplr>). Previous estimates of the percentage of genes with 5UIs in humans ranged from 22 to 26% <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> to 38% <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, suggesting that the RefSeq collection had no major bias in terms of presence or absence of 5UIs compared to other previously used datasets. The distribution of total 5'UTR intronic length for genes in our dataset was also similar to that observed previously (Figure <figr fid="F1">1a</figr>). The inter-quartile range of total length of 5UIs within each gene was approximately 1.3 to 16 kb. Some 5UIs were extremely long: 16% were longer than 27 kb, the length of the average protein coding gene in the human genome <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, and 5% were longer than 76 kb (Figure <figr fid="F1">1a</figr>). As previously reported <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, most genes had few 5UIs. More than 90% had a single intron, and the percentage of genes with two or more introns decreased exponentially (Figure <figr fid="F1">1b</figr>).</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p>Complete list of RefSeq mRNA IDs that have 5'UTR introns. This file contains the genomic coordinates and RefSeq IDs for all transcripts with 5'UTR introns. '+' and '-' represent the forward and reverse strands, respectively.</p>
               </text>
               <file name="gb-2010-11-3-r29-S1.TXT">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Characterization of fundamental properties of 5'UTR introns</p>
               </caption>
               <text>
                  <p><b>Characterization of fundamental properties of 5'UTR introns</b>. <b>(a) </b>Histogram of the total 5'UTR intron length. A well-annotated set of RefSeq transcript IDs is used in this analysis, and the histogram shows the distribution of the log<sub>10 </sub>of the total number of intronic nucleotides in the 5'UTR. <b>(b) </b>Distribution of the number of introns in the 5'UTR. The log<sub>10 </sub>of the number of transcripts that have a given number of introns in their 5'UTR is shown. The number of transcripts with a given number of 5'UTR introns decreases exponentially. <b>(c) </b>Heat map depicting the relationship between total lengths of 5'UTR introns and 5'UTR exons. <b>(d) </b>Heat map depicting the relationship between total lengths of 5'UTR introns and non-5'UTR introns. In both heat maps, darker shades of gray indicate more transcripts.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-1"/>
            </fig>
            <p>We next considered the relationship between the total lengths of 5'UTR exons and of 5UIs. Even though there was a correlation between the lengths of 5UIs and 5'UTR exons overall, this correlation was slight and was driven by the genes with the longest 5UIs (Figure <figr fid="F1">1c</figr>; Pearson correlation coefficient (PCC) = 0.21, <it>P </it>&lt; 2.2e-16). In fact, when genes with 5UI lengths in the lowest 25th percentile were analyzed, the correlation was no longer significant (Figure <figr fid="F1">1c</figr>; PCC = -0.005, <it>P </it>= 0.84). A statistically significant, albeit slight, correlation was found for genes with 5UI length below the median (Figure <figr fid="F1">1c</figr>; PCC = 0.07, <it>P </it>= 8.4e-05). Among the genes with 5UIs, a similar relationship was evident between the total length of 5UIs and the total length of the remaining introns (Figure <figr fid="F1">1d</figr>). Although these two variables were significantly correlated (Figure <figr fid="F1">1d</figr>; PCC = 0.18, <it>P </it>&lt; 2.2e-16), the relationship was clearly driven by the genes with longer 5UIs. When genes with 5UI lengths either in the lowest 25th or 50th percentile were considered, correlation was negligible (Figure <figr fid="F1">1d</figr>; PCC = -0.02 and 0.04, <it>P </it>= 0.53 and 0.04, respectively).</p>
            <p>Thus, genes with long 5UIs tend to have a high total intronic length and longer 5'UTR exons. While this tendency holds in genes with additional introns, several genes with total 5UI lengths greater than 10 kb lack any coding-region or 3'UTR introns (Figure <figr fid="F1">1d</figr>). On the other hand, amongst genes with short 5UIs, the total length of 5UIs is uncorrelated with the lengths of either 5'UTR exons or the remaining introns.</p>
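            <p>The subset correlations reported in this section can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy lengths and the use of untransformed lengths are assumptions made only for illustration.</p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def pcc_below_quantile(intron_len, other_len, q):
    """PCC restricted to the genes whose total 5UI length is in the lowest fraction q."""
    order = sorted(range(len(intron_len)), key=lambda i: intron_len[i])
    keep = order[: max(2, int(len(order) * q))]
    return pearson([intron_len[i] for i in keep], [other_len[i] for i in keep])

# Hypothetical toy data: total 5UI length and total 5'UTR exon length (nucleotides).
intron_len = [100, 200, 300, 400, 50000, 80000, 120000, 200000]
exon_len = [150, 120, 160, 130, 300, 500, 700, 900]

all_genes = pearson(intron_len, exon_len)                        # all genes together
shorter_half = pcc_below_quantile(intron_len, exon_len, 0.5)     # below-median 5UI length
```

            <p>In this toy example the overall correlation is strong only because of the four genes with very long 5UIs; restricting the calculation to the shorter half removes it, which mirrors the pattern described for Figure 1c and 1d.</p>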
         </sec>
         <sec>
            <st>
               <p>Gene expression analysis</p>
            </st>
            <p>We next examined gene expression-related predictions of the two principal models of intron evolution. Previous studies have suggested that the genes with the highest expression levels are selected to have shorter introns <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. If a similar selective pressure were acting on 5UIs (in conjunction with neutral evolutionary processes <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>), one would expect a tendency towards reduced gene expression level as a function of increased 5UI length in a subset of genes. We therefore compared gene expression from 79 tissues as a function of the total 5'UTR intronic length. We divided 5UI-containing genes into three categories with respect to the total 5'UTR intronic length (short, 0 to 25%; intermediate, 25 to 75%; long, 75 to 100% in length). The short 5UI-containing genes were highly overrepresented in the top 1% of mean expression level for the genes with 5UIs (Fisher's exact test, <it>P </it>= 3.3e-15) and also in the top 5% (Fisher's exact test, <it>P </it>= 1.7e-14) (Figure <figr fid="F2">2a</figr>). These genes were 12.7 times more likely than all other genes with 5UIs to be in the highest 1% of mean expression and 3 times more likely to be in the highest 5% of mean expression. There was also a global trend for genes with short 5UIs to be expressed at a higher level compared to genes with longer 5UIs (25 to 100 percentile in length; one-sided Wilcoxon rank sum test, <it>P </it>= 2.98e-05; Figure <figr fid="F2">2a</figr>).</p>
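            <p>The enrichment of short 5UIs among the most highly expressed genes rests on Fisher's exact test applied to a 2 x 2 table. The sketch below shows one way such a test could be computed. The one-sided alternative, the sample odds ratio, the function name and the toy counts are assumptions; the text does not state which variant of the test was used.</p>

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2 x 2 table.

    Table layout (gene counts):
                       top expression   not top
        short 5UI            a             b
        other 5UI            c             d
    Returns (sample odds ratio, probability of a count >= a in the top-left cell).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    # Hypergeometric tail: sum over all tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    p_value = tail / comb(n, col1)
    odds = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds, p_value

# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
odds_ratio, p_value = fisher_exact_greater(3, 1, 1, 3)
```

            <p>With real data, the four counts would come from cross-tabulating the 5UI length category (short versus other) against membership in the top 1% or top 5% of mean expression.</p>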
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Expression analysis as a function of total 5'UTR intron length</p>
               </caption>
               <text>
                  <p><b>Expression analysis as a function of total 5'UTR intron length</b>. <b>(a) </b>Heat map of the mean expression level versus the total 5'UTR intron length. The shade of gray represents the number of transcripts in each bin with darker shades implying more transcripts. The overrepresentation of short 5'UTR-intron-containing genes among the highest expression levels is apparent. <b>(b) </b>Quantile-quantile plot of total 5'UTR intron length of short 5'UTR intron-containing genes divided into highly expressed (top 5%) and other genes. The most highly expressed genes tend to have shorter 5'UTR introns. <b>(c) </b>Smoothed histogram of the mean expression level with respect to presence/absence of 5'UTR intron and its length. A kernel density estimator was fitted to the expression data and the corresponding probability density is plotted as a function of the mean expression level. The black line corresponds to the probability density for transcripts without any 5'UTR introns. Genes with long 5'UTR introns are represented by the red line while genes with short 5'UTR introns are represented by the blue line. The vertical line represents the top 5% of mean expression level of all genes. <b>(d) </b>Total 5'UTR intron length of genes in different expression level categories. The width of the boxes represents the relative number of data points in each category. Transcripts in the top 1% and top 5% in expression level tend to have shorter 5'UTR introns.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-2"/>
            </fig>
            <p>The enrichment for high expression in genes with short 5UIs held even when genes with the longest 25% of 5UIs were removed. In this case, the genes with the highest 1% and 5% expression were, respectively, 9.5 times and 2.5 times more likely to have short 5UIs as opposed to intermediate length 5UIs (25 to 75 percentile in length; Fisher's exact test, <it>P </it>= 1.53e-11 and <it>P </it>= 3.21e-10, respectively).</p>
            <p>The most highly expressed 5UI-bearing genes show a striking tendency to harbor short 5UIs. Of all 5UI-containing genes, 26% had a total 5UI length below 1.3 kb. By contrast, the corresponding fractions for genes in the top 5% and 1% by expression were 50% and 83%, respectively. We then separated short 5UI-containing genes into two groups: the most highly expressed genes (top 5% in expression); and the remaining genes. For the most highly expressed genes, the inter-quartile range of total 5UI length was 215 to 734 nucleotides compared with 289 to 870 nucleotides for the remaining genes (Figure <figr fid="F2">2b</figr>). Thus, the most highly expressed genes in humans are very strongly enriched for short 5UIs.</p>
            <p>Interestingly, no expression dependence was observed among genes with intermediate or long 5UIs: genes with long 5UIs (top 25th percentile in length) did not tend to be expressed less than those with the intermediate length 5UIs (Wilcoxon rank sum test, <it>P </it>= 0.25). Also, no statistically significant depletion for the long 5UI category was observed in either the top 1% or the top 5% expression group (Fisher's exact test, <it>P </it>= 0.29, odds ratio = 0.25, and <it>P </it>= 0.017, odds ratio = 0.58, respectively). Thus, we did not observe the inverse relationship between expression and total 5UI length that might have been expected under the energetic cost model.</p>
            <p>Next, we considered all RefSeq genes and asked whether having an intron in the 5'UTR has an effect on overall expression. We found no differences in 5UI representation in the top 1% or the top 5% of the mean expression groups. Furthermore, no difference was detected in the distribution of mean expression between genes with and without 5UIs (two-sided Wilcoxon rank sum test, <it>P </it>= 0.17). However, genes with short 5UIs were 1.8 times more likely to be in the top 5% and 3.3 times more likely to be in the top 1% in overall expression level than genes with no 5UIs (Fisher's exact test, <it>P </it>= 3.15e-08 and <it>P </it>= 7.57e-07, respectively; Figure <figr fid="F2">2c</figr>). Thus, the presence of short 5UIs is correlated with high mean expression.</p>
            <p>The observed expression trends could reflect the influence of genomic features other than 5UIs. Yet, short 5UIs do not seem to predict a short total length of either non-5'UTR introns or 5'UTR exons (Figure <figr fid="F1">1c, d</figr>). Furthermore, when genes in the top 5% in mean expression were divided into two groups with respect to 5UI presence or absence, we observed no differences in total non-5'UTR intron length between genes with 5UIs and those that lack these introns (Wilcoxon rank sum test, <it>P </it>= 0.20, data not shown). Therefore, the tendency of highly expressed genes to have short 5UIs is unlikely to be confounded by the effects of 5'UTR exons or the remaining introns.</p>
            <p>For genes with the highest expression levels, these results are in contrast to the neutral model of 5UI evolution, which predicts that 5'UTR intronic length should not depend on expression level. These results are also not explained by the energetic cost hypothesis, which would predict that genes with the highest expression levels should be less likely to have 5UIs. In stark contrast to the predictions of each model, we found the most highly expressed genes to be significantly enriched in short 5UIs. Furthermore, the energetic cost hypothesis would also predict a linear decrease in the total 5UI length as a function of increasing gene expression. Yet, we found no overall differences with respect to 5UI length except for the most highly expressed genes. Even though a neutral model of 5UI evolution is plausible for most genes, our results for the most highly expressed genes are inconsistent with both neutral and energetic cost models (Figure <figr fid="F2">2d</figr>).</p>
            <p>We next used expression to assess the applicability to 5UIs of the other major hypothesis of intron evolution, the 'genome design model', which predicts that intermediate or long introns should be enriched in tissue-specific genes as a consequence of complex regulation. As originally outlined, the genome design model explicitly disregards 5UIs <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>; however, a direct corollary of this hypothesis is that genes with higher variance in expression across tissues should have intermediate or long introns in their 5'UTRs as well.</p>
            <p>We sought to address two potential sources of bias. First, gene expression levels vary greatly and variance is strongly correlated with mean expression. Therefore, we calculated the standard deviation-to-mean ratio (coefficient of variation or CV) <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, a normalized measure of dispersion, for each gene across all tissues. Second, due to technological limitations of expression arrays, precise measurement of expression level is more difficult for genes with low or no expression in a given tissue; therefore, artificially high variance in expression might be observed for genes with low mean expression across all tissues. We therefore calculated a robust measure of dispersion that minimizes this effect:</p>
            <p>
               <display-formula>
                  <graphic file="gb-2010-11-3-r29-i1.gif"/>
               </display-formula>
            </p>
            <p>where <it>CV</it><sub><it>x </it></sub>is the CV of expression of gene <it>x </it>across all tissues, <b><it>y</it></b><sub><it>x </it></sub>represents the vector of CV values for all 201 genes in a window centered around gene <it>x</it>, while <it>&#956;</it><sub>1/2 </sub>and <it>MAD </it>represent the median and median absolute deviation, respectively. As expected, genes with low expression tended to have much more variability across tissues (Figure <figr fid="F3">3a</figr>). Based on the observed trend line, the genes with the lowest 25% expression were removed from further analysis (Figure <figr fid="F3">3a</figr>). The remaining genes were sorted into three categories with respect to the total intronic 5'UTR length as before (short, 0 to 25%; intermediate, 25 to 75%; long, 75 to 100%). We found no significant differences between these groups with respect to inter-tissue variability as measured by the coefficient of variation (Figure <figr fid="F3">3b</figr>; Kruskal-Wallis rank sum test, df = 2, <it>P </it>= 0.23). We then examined the lengths of the introns as a function of variability in expression (Figure <figr fid="F3">3c</figr>). The genes with the highest 5% variability across tissues did not differ from the other genes with respect to their 5UI lengths (Wilcoxon rank sum test, <it>P </it>= 0.07, 95% confidence interval between -0.008 and 0.25), but the genes with highest 1% across-tissue variability tended to have slightly shorter 5UIs (Wilcoxon rank sum test, <it>P </it>= 0.006, 95% confidence interval between -0.67 and -0.11). Genes with short 5UIs were also overrepresented in the top 1% across-tissue variability category (Fisher's Exact Test, <it>P </it>= 0.005, odds-ratio = 2.7). Our results suggested that length of the 5UI was not a major factor in determining across-tissue variability but there was a preference for shorter 5UIs in the most variable genes.</p>
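            <p>The dispersion measure defined above is given only as an image, so the following sketch relies on assumptions: it takes the measure to be the CV of gene <it>x </it>centered by the median and scaled by the MAD of the CV values in its window, it assumes that genes are ordered by mean expression before windowing, and it truncates windows at the ends of the list. Function names and toy values are hypothetical.</p>

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation-to-mean ratio of one gene's expression across tissues.

    The sample standard deviation is used here; this choice is an assumption.
    """
    return statistics.stdev(values) / statistics.mean(values)

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def robust_dispersion(cvs, half_window=100):
    """Center and scale each CV by the median and MAD of its neighbouring genes.

    cvs: CV values of genes, assumed to be ordered by mean expression.
    half_window=100 gives the 201-gene window described in the text.
    """
    n = len(cvs)
    scores = []
    for i, cv in enumerate(cvs):
        window = cvs[max(0, i - half_window): min(n, i + half_window + 1)]
        scores.append((cv - statistics.median(window)) / mad(window))
    return scores

# Hypothetical CV values, already ordered by mean expression.
toy_cvs = [1.0, 2.0, 4.0, 7.0, 11.0]
toy_scores = robust_dispersion(toy_cvs, half_window=1)
```

            <p>Comparing each gene only with genes of similar mean expression is what makes such a score robust to the inflated variability of weakly expressed genes. A window whose MAD is zero would need special handling, which this sketch omits.</p>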
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Analysis of variability in expression across tissues as a function of the total 5'UTR intron length</p>
               </caption>
               <text>
                  <p><b>Analysis of variability in expression across tissues as a function of the total 5'UTR intron length</b>. <b>(a) </b>Transcripts with low mean expression have higher normalized expression variability. A standardized measure of the variability in gene expression across tissues was calculated and plotted against the natural logarithm of mean expression level. The black vertical line represents the lowest 25th percentile in mean expression. Since transcripts with low levels of mean expression tend to exhibit an artificially high variability in expression, they are removed from further analysis. <b>(b) </b>Boxplot of the coefficient of variation (standard deviation-to-mean ratio) of genes grouped by the total length of 5'UTR intron. The width of the boxes represents the relative number of data points in each category. There are no apparent differences between the three groups. <b>(c) </b>Boxplot of log<sub>10 </sub>of total 5'UTR intron length of genes grouped by their across-tissue variability. Genes are divided into six categories depending on their coefficient of variation. Error bars correspond to standard deviation of the mean. No obvious dependence of expression variability on total 5UI length can be observed except for the most highly variable genes, which tend to have slightly shorter 5'UTR introns. <b>(d) </b>Boxplot of log<sub>10 </sub>of total 5'UTR intron length for gene groups defined by the number of tissues in which expression of each gene was detected. A gene was defined to have detectable expression in a given tissue if its expression was higher than the 25th percentile of mean expression of all genes. We found no differences in total 5'UTR intron length amongst the different gene groups. <b>(e) </b>Histogram of number of genes divided by the presence of 5'UTR introns and by the number of tissues in which expression was detected. The number of tissues in which expression was detected was independent of the presence of 5'UTR introns.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-3"/>
            </fig>
            <p>Although our approach reliably captures across-tissue variability in gene expression, it disregards any potential effects of 5UI presence or length on how widely a gene is expressed. To consider the potential impact of such effects, we calculated the number of tissues in which expression was detected for each gene. Based on our analysis presented in Figure <figr fid="F3">3a</figr>, we defined a given gene as 'present' in a given tissue if its expression was greater than the 25th percentile in the distribution of mean expression over all tissues, calculated for all genes. Genes were placed into one of five classes according to the number of tissues in which they were present. No significant difference was detected amongst the corresponding five distributions of total 5UI length (Figure <figr fid="F3">3d</figr>; Kruskal-Wallis rank sum test, df = 4, <it>P </it>= 0.19). Furthermore, the distribution of number of tissues in which each gene was present did not differ between genes containing and lacking 5UIs (Figure <figr fid="F3">3e</figr>). These results clearly contradict predictions of the 'genome design' hypothesis, in that narrowly expressed genes did not show a greater tendency to contain 5UIs nor did they tend to have longer 5UIs. These results strongly suggest that the evolution of 5UIs is not driven primarily by the selective pressures proposed by the 'genome design' hypothesis.</p>
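            <p>The tissue-presence rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function name <it>tissues_present</it>, the dictionary input layout and the inclusive quantile interpolation are all assumptions made for the example.</p>

```python
from statistics import mean, quantiles


def tissues_present(expression):
    """Count, for each gene, the tissues in which it is 'present'.

    expression: dict mapping a gene name to a list of per-tissue
    expression values (hypothetical layout chosen for this sketch).
    """
    # Mean expression over all tissues, one value per gene.
    gene_means = [mean(values) for values in expression.values()]
    # 25th percentile of that distribution; the interpolation method
    # is an assumption, the source does not state which one was used.
    threshold = quantiles(gene_means, n=4, method="inclusive")[0]
    # A gene is 'present' in a tissue if its expression exceeds the threshold.
    return {
        gene: sum(1 for value in values if value > threshold)
        for gene, values in expression.items()
    }
```

            <p>Under these assumptions, the returned counts could then be binned into the five presence classes, and the total 5UI lengths of the classes compared with a rank-based test such as the Kruskal-Wallis test reported above.</p>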
         </sec>
         <sec>
            <st>
               <p>Functional enrichment of Gene Ontology categories</p>
            </st>
            <p>Under the neutral model, genes with 5UIs should be uniformly distributed across functional groups. We used Gene Ontology (GO) function annotations to determine which groups of genes are enriched or depleted in 5UIs, if any. Two popular functional trend analysis tools, FuncAssociate <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> and GoStat <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, were used for this analysis. One key challenge was the translation of the gene identifiers from RefSeq RNA IDs to those used in the GO database. There are different approaches to this problem and the two software packages differ from each other in this respect. FuncAssociate uses the Synergizer <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> software to resolve the problem of synonyms while GoStat uses definitions in the UniGene database as well as the information provided in the GO databases. Both software packages yielded very similar results, suggesting that our general conclusions were independent of the methods of synonym resolution or enrichment calculation.</p>
            <p>A significant overrepresentation of genes with 5UIs was found in many regulatory pathways (Table <tblr tid="T1">1</tblr>). Non-receptor protein tyrosine kinases (NRTKs) formed the most highly overrepresented group, followed by genes involved in the regulation of actin organization, transcriptional regulators, and zinc ion binding proteins (Table <tblr tid="T1">1</tblr>). NRTKs lack transmembrane domains and therefore do not recognize extracellular ligands, unlike the majority of protein tyrosine kinases. Nevertheless, they play crucial roles in nearly all aspects of biology and are implicated in many cancers (reviewed in <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>). Among NRTKs, genes harboring 5UIs encode key regulatory kinases, such as the proto-oncogene tyrosine kinase <it>SRC</it>, c-src tyrosine kinase (<it>CSK</it>), janus kinases (<it>JAK</it>), spleen tyrosine kinase (<it>SYK</it>), tec protein tyrosine kinase (<it>TEC</it>), and Bruton agammaglobulinemia tyrosine kinase (<it>BTK</it>) among others.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Overrepresented Gene Ontology attributes for genes with 5'UTR introns</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>N</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>X</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>LOD</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>P</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>P-adj</it>
                           </b>
                        </p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>
                           <b>Gene Ontology attribute</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>0.650</p>
                     </c>
                     <c ca="center">
                        <p>1.4e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0153</p>
                     </c>
                     <c ca="center">
                        <p>GO:0004715:</p>
                     </c>
                     <c ca="left">
                        <p>non-membrane spanning protein tyrosine kinase activity</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>0.644</p>
                     </c>
                     <c ca="center">
                        <p>7.5e-06</p>
                     </c>
                     <c ca="center">
                        <p>0.0073</p>
                     </c>
                     <c ca="center">
                        <p>GO:0051261:</p>
                     </c>
                     <c ca="left">
                        <p>protein depolymerization</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>31</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>0.633</p>
                     </c>
                     <c ca="center">
                        <p>2.1e-06</p>
                     </c>
                     <c ca="center">
                        <p>0.0017</p>
                     </c>
                     <c ca="center">
                        <p>GO:0051494:</p>
                     </c>
                     <c ca="left">
                        <p>negative regulation of cytoskeleton organization and biogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>0.560</p>
                     </c>
                     <c ca="center">
                        <p>9.2e-06</p>
                     </c>
                     <c ca="center">
                        <p>0.0085</p>
                     </c>
                     <c ca="center">
                        <p>GO:0032956:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of actin cytoskeleton organization and biogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>0.534</p>
                     </c>
                     <c ca="center">
                        <p>1.8e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0193</p>
                     </c>
                     <c ca="center">
                        <p>GO:0032970:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of actin filament-based process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>0.497</p>
                     </c>
                     <c ca="center">
                        <p>6.6e-07</p>
                     </c>
                     <c ca="center">
                        <p>0.0004</p>
                     </c>
                     <c ca="center">
                        <p>GO:0051493:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of cytoskeleton organization and biogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>0.491</p>
                     </c>
                     <c ca="center">
                        <p>8.3e-06</p>
                     </c>
                     <c ca="center">
                        <p>0.0078</p>
                     </c>
                     <c ca="center">
                        <p>GO:0016459:</p>
                     </c>
                     <c ca="left">
                        <p>myosin complex</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>0.449</p>
                     </c>
                     <c ca="center">
                        <p>1.2e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0120</p>
                     </c>
                     <c ca="center">
                        <p>GO:0051129:</p>
                     </c>
                     <c ca="left">
                        <p>negative regulation of cellular component organization and biogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>0.404</p>
                     </c>
                     <c ca="center">
                        <p>1.1e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0114</p>
                     </c>
                     <c ca="center">
                        <p>GO:0033043:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of organelle organization and biogenesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>216</p>
                     </c>
                     <c ca="center">
                        <p>0.243</p>
                     </c>
                     <c ca="center">
                        <p>3.5e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0398</p>
                     </c>
                     <c ca="center">
                        <p>GO:0015629:</p>
                     </c>
                     <c ca="left">
                        <p>actin cytoskeleton</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1094</p>
                     </c>
                     <c ca="center">
                        <p>2356</p>
                     </c>
                     <c ca="center">
                        <p>0.232</p>
                     </c>
                     <c ca="center">
                        <p>5.7e-33</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0008270:</p>
                     </c>
                     <c ca="left">
                        <p>zinc ion binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>139</p>
                     </c>
                     <c ca="center">
                        <p>294</p>
                     </c>
                     <c ca="center">
                        <p>0.220</p>
                     </c>
                     <c ca="center">
                        <p>1.3e-05</p>
                     </c>
                     <c ca="center">
                        <p>0.0139</p>
                     </c>
                     <c ca="center">
                        <p>GO:0003779:</p>
                     </c>
                     <c ca="left">
                        <p>actin binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>996</p>
                     </c>
                     <c ca="center">
                        <p>2218</p>
                     </c>
                     <c ca="center">
                        <p>0.199</p>
                     </c>
                     <c ca="center">
                        <p>1.4e-23</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0006355:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of transcription, DNA-dependent</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1000</p>
                     </c>
                     <c ca="center">
                        <p>2233</p>
                     </c>
                     <c ca="center">
                        <p>0.197</p>
                     </c>
                     <c ca="center">
                        <p>3.4e-23</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0051252:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of RNA metabolic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1061</p>
                     </c>
                     <c ca="center">
                        <p>2380</p>
                     </c>
                     <c ca="center">
                        <p>0.195</p>
                     </c>
                     <c ca="center">
                        <p>7.5e-24</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0045449:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of transcription</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1013</p>
                     </c>
                     <c ca="center">
                        <p>2273</p>
                     </c>
                     <c ca="center">
                        <p>0.193</p>
                     </c>
                     <c ca="center">
                        <p>1.2e-22</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0006351:</p>
                     </c>
                     <c ca="left">
                        <p>transcription, DNA-dependent</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1015</p>
                     </c>
                     <c ca="center">
                        <p>2277</p>
                     </c>
                     <c ca="center">
                        <p>0.193</p>
                     </c>
                     <c ca="center">
                        <p>9.5e-23</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0032774:</p>
                     </c>
                     <c ca="left">
                        <p>RNA biosynthetic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>191</p>
                     </c>
                     <c ca="center">
                        <p>420</p>
                     </c>
                     <c ca="center">
                        <p>0.190</p>
                     </c>
                     <c ca="center">
                        <p>8.3e-06</p>
                     </c>
                     <c ca="center">
                        <p>0.0077</p>
                     </c>
                     <c ca="center">
                        <p>GO:0008092:</p>
                     </c>
                     <c ca="left">
                        <p>cytoskeletal protein binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1078</p>
                     </c>
                     <c ca="center">
                        <p>2436</p>
                     </c>
                     <c ca="center">
                        <p>0.189</p>
                     </c>
                     <c ca="center">
                        <p>6.6e-23</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0019219:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1106</p>
                     </c>
                     <c ca="center">
                        <p>2512</p>
                     </c>
                     <c ca="center">
                        <p>0.185</p>
                     </c>
                     <c ca="center">
                        <p>1.3e-22</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0010468:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of gene expression</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1189</p>
                     </c>
                     <c ca="center">
                        <p>2713</p>
                     </c>
                     <c ca="center">
                        <p>0.183</p>
                     </c>
                     <c ca="center">
                        <p>1.6e-23</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0031323:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of cellular metabolic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1088</p>
                     </c>
                     <c ca="center">
                        <p>2477</p>
                     </c>
                     <c ca="center">
                        <p>0.182</p>
                     </c>
                     <c ca="center">
                        <p>8.6e-22</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0006350:</p>
                     </c>
                     <c ca="left">
                        <p>transcription</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1211</p>
                     </c>
                     <c ca="center">
                        <p>2791</p>
                     </c>
                     <c ca="center">
                        <p>0.175</p>
                     </c>
                     <c ca="center">
                        <p>4.7e-22</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0019222:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of metabolic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>989</p>
                     </c>
                     <c ca="center">
                        <p>2267</p>
                     </c>
                     <c ca="center">
                        <p>0.174</p>
                     </c>
                     <c ca="center">
                        <p>1.2e-18</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0003677:</p>
                     </c>
                     <c ca="left">
                        <p>DNA binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1507</p>
                     </c>
                     <c ca="center">
                        <p>3515</p>
                     </c>
                     <c ca="center">
                        <p>0.172</p>
                     </c>
                     <c ca="center">
                        <p>2.9e-25</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0003676:</p>
                     </c>
                     <c ca="left">
                        <p>nucleic acid binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1212</p>
                     </c>
                     <c ca="center">
                        <p>2825</p>
                     </c>
                     <c ca="center">
                        <p>0.165</p>
                     </c>
                     <c ca="center">
                        <p>5.5e-20</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0046914:</p>
                     </c>
                     <c ca="left">
                        <p>transition metal ion binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1682</p>
                     </c>
                     <c ca="center">
                        <p>4053</p>
                     </c>
                     <c ca="center">
                        <p>0.147</p>
                     </c>
                     <c ca="center">
                        <p>1e-20</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0050794:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of cellular process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1157</p>
                     </c>
                     <c ca="center">
                        <p>2784</p>
                     </c>
                     <c ca="center">
                        <p>0.136</p>
                     </c>
                     <c ca="center">
                        <p>5.6e-14</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0016070:</p>
                     </c>
                     <c ca="left">
                        <p>RNA metabolic process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1758</p>
                     </c>
                     <c ca="center">
                        <p>4305</p>
                     </c>
                     <c ca="center">
                        <p>0.134</p>
                     </c>
                     <c ca="center">
                        <p>3.7e-18</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0050789:</p>
                     </c>
                     <c ca="left">
                        <p>regulation of biological process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1772</p>
                     </c>
                     <c ca="center">
                        <p>4364</p>
                     </c>
                     <c ca="center">
                        <p>0.129</p>
                     </c>
                     <c ca="center">
                        <p>4.2e-17</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0005634:</p>
                     </c>
                     <c ca="left">
                        <p>nucleus</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1463</p>
                     </c>
                     <c ca="center">
                        <p>3584</p>
                     </c>
                     <c ca="center">
                        <p>0.127</p>
                     </c>
                     <c ca="center">
                        <p>1.1e-14</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>GO:0006139:</p>
                     </c>
                     <c ca="left">
                        <p>nucleobase, nucleoside, nucleotide and nucleic acid metabolic process</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>N </it>represents the number of transcripts in the RefSeq collection that have both a 5'UTR intron and a given GO attribute; <it>X </it>represents the total number of transcripts having that GO attribute. For each attribute, <it>P </it>is the nominal <it>P</it>-value obtained from a one-tailed Fisher's Exact Test that calculates the probability that at least <it>N </it>transcripts have the particular attribute given the number of genes with 5'UTR introns. This nominal <it>P</it>-value is adjusted for multiple hypothesis testing to yield <it>P-adj </it>using a resampling approach that accounts for dependencies among the tested hypotheses (see <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> for the precise procedure). The table is sorted in descending order by the log<sub>10 </sub>of the odds ratio (<it>LOD </it>score), where <inline-formula><graphic file="gb-2010-11-3-r29-i2.gif"/></inline-formula> and <it>M </it>is the number of all genes, <it>e </it>is a pseudocount of 0.5 and <it>q </it>is the query set size. All attributes with <it>LOD </it>&gt; 0.125 and a <it>P-adj </it>&lt; 0.05 are reported.</p>
               </tblfn>
            </tbl>
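            <p>The quantities reported in Table 1 can be illustrated with a short Python sketch. The one-tailed Fisher's Exact Test is written here as the upper tail of the hypergeometric distribution. The LOD expression is given only as an image in the table footnote, so the form used below (the log<sub>10 </sub>odds ratio of the 2x2 contingency table, with the pseudocount <it>e </it>added to every cell) is an assumption, as are the function names <it>enrichment_p </it>and <it>lod_score</it>. The resampling-based adjustment that yields <it>P-adj </it>is not reproduced.</p>

```python
from math import comb, log10


def enrichment_p(N, X, M, q):
    """Nominal one-tailed P-value: probability that N or more of the
    q query transcripts carry an attribute that X of the M transcripts
    have overall (upper tail of the hypergeometric distribution)."""
    upper = min(X, q)
    tail = sum(comb(X, k) * comb(M - X, q - k) for k in range(N, upper + 1))
    return tail / comb(M, q)


def lod_score(N, X, M, q, e=0.5):
    """log10 odds ratio with pseudocount e in each cell of the 2x2 table.
    Assumed form; the source gives the formula only as a graphic."""
    odds_in_query = (N + e) / (q - N + e)
    odds_outside_query = (X - N + e) / (M - q - X + N + e)
    return log10(odds_in_query / odds_outside_query)
```

            <p>In this sketch, <it>q </it>plays the role of the number of transcripts with 5'UTR introns and <it>M </it>the size of the whole collection; a positive LOD indicates that the attribute is overrepresented among the query transcripts.</p>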
            <p>To gain insight into the evolution of NRTK 5UIs, we identified orthologous genes in mouse and rat genomes corresponding to each human NRTK. We collected 5'UTR features for these genes in each genome using RefSeq annotations (Additional file <supplr sid="S2">2</supplr>). More widely studied organisms tend to have more accurate transcript structures and include many more splice variants in the RefSeq collection. For example, 18 human genes were represented by more than one transcript, while only four mouse and no rat NRTKs had more than one splice variant. The paucity of transcripts in some mammalian species is more likely to have arisen from limited testing than from biology, given recent studies suggesting that alternative splicing is ubiquitous across several taxa <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p>Complete list of 5'UTR intron lengths for the human non-receptor tyrosine kinases and their orthologs in mouse and rat genomes. This file contains the RefSeq IDs and gene symbols for all human NRTKs and their mouse and rat orthologs. For all transcripts, 5'UTR intron lengths are given.</p>
               </text>
               <file name="gb-2010-11-3-r29-S2.TXT">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>UTRs are also generally less well defined in less intensively studied organisms. For example, <it>ABL2</it>, <it>BTK</it>, <it>FRK </it>and <it>SRC </it>all lack defined 5'UTR boundaries in the rat RefSeq collection, even though EST evidence suggests that <it>SRC</it>, <it>BTK </it>and <it>ABL2 </it>all have 5'UTR-containing transcripts (data not shown). Another current limitation is ambiguity in identifying the specific branch in which a given deletion or insertion event took place. Despite these shortcomings, a comparison of orthologs already provides insight into the dynamics of the evolution of 5UIs in NRTK genes.</p>
            <p>When every ortholog of a given NRTK had at least one annotated 5UI, the lengths of those introns were generally highly correlated (Figure <figr fid="F4">4a</figr>). Because many human genes have several splice variants, we calculated the 5UI length for each human gene in three different ways: as the mean length across splice variants with non-zero 5UI lengths, as the length in the variant with the longest 5UIs, or as the length in the variant whose 5UI length was closest to that of its ortholog in the mouse or rat genome. All three measures resulted in high overall correlation between 5UI lengths across species (PCC ranged between 89 and 91% for human-mouse and between 79 and 89% for human-rat comparisons; <it>P </it>&lt; 0.0001 for all; Figure <figr fid="F4">4a</figr>). As expected from evolutionary distances, the highest correlation in 5UI lengths was observed between rat and mouse orthologs of NRTKs (PCC = 93%, <it>P </it>= 1.4e-07).</p>
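            <p>The three length summaries and the correlation described above can be illustrated with a minimal sketch. The function names, the input layout and the use of raw (untransformed) lengths are assumptions made for illustration only; the analysis reported here was performed in R, and this is not the code that was used.</p>
            <p>
```python
from math import sqrt


def summarize_human_5ui(variant_lengths, ortholog_length):
    """Summarize total 5UI length for one human gene in three ways.

    variant_lengths: total 5UI length of each splice variant (hypothetical input).
    ortholog_length: total 5UI length of the mouse or rat ortholog.
    Returns (mean, longest, closest), or None if no variant has a 5UI.
    """
    nonzero = [n for n in variant_lengths if n > 0]
    if not nonzero:
        return None
    mean_len = sum(nonzero) / len(nonzero)          # mean over variants with a 5UI
    longest = max(nonzero)                          # variant with the longest 5UIs
    closest = min(nonzero, key=lambda n: abs(n - ortholog_length))  # nearest to ortholog
    return mean_len, longest, closest


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```
            </p>
            <p>In this sketch, one summary value per gene would be collected for each species pair and passed to <it>pearson</it>; genes lacking an annotated 5UI in either species are excluded beforehand, as in the comparisons above.</p>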
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Comparative genomics of 5'UTR introns within non-receptor tyrosine kinases</p>
               </caption>
               <text>
                  <p><b>Comparative genomics of 5'UTR introns within non-receptor tyrosine kinases</b>. Several human NRTKs have multiple splice isoforms and for these we used three different methods for calculating total 5'UTR intron length: mean of 5'UTR intron length for isoforms with 5'UTR introns (HS_Mean); longest total 5'UTR intron length (HS_Longest); 5'UTR intron length most similar to its ortholog in the genome of interest (HS_Closest). <b>(a) </b>Heatmap of length correlation (considering genes with non-zero 5'UTR intron lengths) was plotted for the specified comparisons. As expected from the evolutionary distances between the analyzed species, the highest correlation (93%) was observed between mouse and rat NRTKs. <b>(b) </b>For each mouse ortholog of a human NRTK, the heatmap depicts the changes in total 5'UTR intron length (color reflects log<sub>10 </sub>of total 5'UTR intron length). The histogram above the color scale summarizes the distribution of changes in 5'UTR intron length. A 5'UTR intron may be present in mouse but not in the compared species (light blue) or vice versa (dark blue). Comparisons require an annotated 5'UTR for each ortholog, and were therefore not possible in some cases (white). <b>(c) </b>Same as (b) but substituting 'rat' for 'mouse'. <b>(d) </b>Human genomic region containing the 5'UTR and first few coding exons (UCSC Genome Browser view). '7X Regulatory Potential', for which higher scores indicate a greater potential for harboring regulatory sequence elements, was calculated using alignments of seven mammalian genomes as previously described <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-4"/>
            </fig>
            <p>Despite a generally strong correlation in 5UI length among orthologs, some sets of orthologs showed a wide range of length changes. Whereas the total 5UI length of <it>FES </it>changed by fewer than five nucleotides in all possible comparisons, the rat <it>PTK2 </it>and mouse <it>PTK2 </it>5UIs differed by approximately 63.5 kb (Figure <figr fid="F4">4b, c</figr>). The length conservation observed for the <it>FES </it>5UI is consistent with the high regulatory potential previously calculated for this 5UI <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> (Figure <figr fid="F4">4d</figr>). More broadly, introns containing regulatory regions might be expected to have high length conservation.</p>
            <p>When each orthologous group of NRTKs was analyzed, we found variability with respect to the presence or absence of 5UIs in some of these groups. For example, <it>STYK1 </it>and <it>WEE1 </it>both had 5UIs in humans, but not in mouse or rat (Figure <figr fid="F4">4b, c</figr>). In the case of human <it>WEE1</it>, two transcripts were identified in the human RefSeq collection: one variant had a 512-nucleotide 5UI, whereas the other variant lacked 5UIs entirely. This observation suggested the possibility that intron-containing variants might be present in mouse and rat without being represented in the RefSeq transcript collection. Indeed, we found EST evidence that rat <it>WEE1 </it>has a splice variant that includes a 5UI [GenBank:<ext-link ext-link-id="CK603528.1" ext-link-type="gen">CK603528.1</ext-link>]. On the other hand, mouse <it>FRK </it>(Figure <figr fid="F4">4b</figr>) and rat <it>TXK </it>(Figure <figr fid="F4">4c</figr>) had 5UIs while their orthologs did not. We also observed several NRTKs that had 5UIs in two of the species but not in the third. For example, both human and mouse orthologs of <it>LCK</it>, <it>BTK</it>, <it>CSK</it>, <it>TNK1</it>, and <it>YES1 </it>had annotated 5UIs, while both human and rat orthologs of <it>JAK3 </it>and <it>TEC </it>had annotated 5UIs (Figure <figr fid="F4">4b, c</figr>). Our results suggest that NRTK 5UIs are frequently conserved, a conclusion that would be further strengthened should the apparent gain/loss events be attributable to incomplete transcript annotation.</p>
            <p>The presence of 5UIs in most human NRTKs (Table <tblr tid="T1">1</tblr>) suggested the potential for a common regulatory mechanism acting via shared motifs. To search for shared and conserved motifs in these introns, human NRTK 5UI sequences were located in human-to-mouse and human-to-rat genome alignments. For 37 out of 42 human NRTKs, more than 10% of the 5UIs could be aligned to both genomes; only these conserved fragments were used for motif finding. Overrepresented RNA and DNA motifs were sought in these aligned sequences using the PhyloGibbs software <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. In our search for overrepresented RNA elements, we identified two motifs that were complementary to each other, which suggests that the motif in these 5UIs is more likely to be relevant at the DNA level. A representative DNA motif (Figure <figr fid="F5">5a</figr>) with the highest log-posterior-probability was compared to the TRANSFAC v11.3 database of known transcription factor binding sites and to a list of conserved human predicted motifs <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> using the STAMP website <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> (Figure <figr fid="F5">5b, c</figr>). In both comparisons, the known binding site motif of the MAZ transcription factor was the most likely match. However, this does not rule out the possibility that this motif is the target of another DNA binding protein.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Characterization of an 8-nucleotide DNA motif in the 5'UTR of human NRTKs</p>
               </caption>
               <text>
                  <p><b>Characterization of an 8-nucleotide DNA motif in the 5'UTR of human NRTKs</b>. <b>(a) </b>Representative motif and its reverse complement. <b>(b) </b>Comparison of the representative motif to the TRANSFAC v11.3 database of known transcription factor binding sites. <b>(c) </b>Comparison of the representative motif to a list of conserved human predicted motifs <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The STAMP website was used for the comparisons <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. The default ungapped Smith-Waterman alignment was used and the <it>P</it>-value was calculated using the methods of Sandelin and Wasserman <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison between 5'UTR and 5'-proximal coding introns</p>
            </st>
            <p>5UIs are, by definition, the most 5'-proximal introns in their transcript. However, not all 5'-proximal introns need lie within the 5'UTR. We sought to understand whether the observed functional properties of 5UIs were shared with 5'-proximal coding region introns (5PCIs). Given that the median position of the first 5UI was approximately 130 nucleotides away from the transcription start site regardless of the number of 5UIs <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, we defined the genes without a 5UI but with a coding region intron within 150 nucleotides of the transcription start site as 5PCI-containing genes. This criterion resulted in 24% of 5UI-lacking genes having a coding region intron that was deemed to be a 5PCI.</p>
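            <p>The classification rule above can be written as a short sketch. The function name, the argument layout and the inclusive treatment of the 150-nucleotide boundary are assumptions for illustration; the text does not specify how the boundary case was handled.</p>
            <p>
```python
PROXIMITY_CUTOFF = 150  # nucleotides from the transcription start site, as stated in the text


def has_5pci(utr5_intron_count, first_coding_intron_offset):
    """Return True if a transcript counts as 5PCI-containing.

    utr5_intron_count: number of annotated 5'UTR introns in the transcript.
    first_coding_intron_offset: distance in nucleotides from the transcription
        start site to the start of the first coding-region intron, or None if
        the transcript has no coding-region intron (hypothetical input layout).
    """
    if utr5_intron_count > 0:
        return False  # 5PCIs are defined only for genes without a 5UI
    if first_coding_intron_offset is None:
        return False  # no coding-region intron at all
    # Assumption: a coding intron starting exactly at the cutoff is included.
    return PROXIMITY_CUTOFF >= first_coding_intron_offset
```
            </p>
            <p>Because a transcript without a 5UI has no intron upstream of its first coding-region intron, the offset passed to this function is simply the length of the first exon.</p>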
            <p>We next used GO annotations to compare the functional properties of 5UI-lacking genes with 5PCIs to those without 5PCIs. We observed the strongest enrichment of 5PCIs among genes in the following functional groups: MHC protein complex 1, cytosolic ribosome, hemoglobin complex, glutathione transferase activity, and transmembrane transporters (Additional file <supplr sid="S3">3</supplr>). This result contrasts with the observed enrichment of 5UIs in regulatory genes. The differences in the enrichment profiles suggest that distinct functional groups of genes prefer early introns in either the 5'UTR or the coding region but not in both.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <text>
                  <p>Overrepresented GO attributes for genes with 5'-proximal coding introns. This file contains the table of overrepresented GO attributes for genes with 5'-proximal coding introns. The methods and legend are the same as in Table <tblr tid="T1">1</tblr>.</p>
               </text>
               <file name="gb-2010-11-3-r29-S3.PDF">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>To assess the possible effect of 5' proximity on gene expression, we analyzed microarray data from the human gene expression atlas for 5UI-lacking genes. We found that genes with 5PCIs were more highly expressed on average (one-sided Wilcoxon rank sum test, <it>P </it>= 6e-08; Figure <figr fid="F6">6</figr>). We also observed a 2.3- and 3.7-fold enrichment for genes with 5PCIs among the most highly expressed top 5% and 1% of genes, respectively (Fisher's Exact Test, <it>P </it>= 4e-15 and <it>P </it>= 4e-09, respectively; Figure <figr fid="F6">6</figr>). The correlation between high expression and 5PCI presence was evident without any consideration of these introns' lengths. In contrast, no expression difference was observed between genes with or without 5UIs, on average, but short 5UIs were highly enriched among the most highly expressed genes (Figure <figr fid="F2">2c</figr>). These results suggest that early introns (both 5PCIs and 5UIs) are associated with the most highly expressed genes, but that this correlation is limited to short introns for 5UIs.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>The effect of 5'-proximal coding intron presence on gene expression</p>
               </caption>
               <text>
                  <p><b>The effect of 5'-proximal coding intron presence on gene expression</b>. <b>(a) </b>Smoothed histogram of the mean expression level with respect to presence/absence of 5'-proximal coding region introns (5PCIs). A kernel density estimator was fitted to the expression data and the corresponding probability density is plotted as a function of the mean expression level. The black line corresponds to the probability density for transcripts without any 5'UTR introns or any 5PCIs. The red line represents the probability density for 5'UTR intronless transcripts that have 5PCIs. The vertical line represents the top 5% of mean expression level of all genes without 5'UTR introns.</p>
               </text>
               <graphic file="gb-2010-11-3-r29-6"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We compared the expression patterns and functional annotations of genes with and without 5UIs. We found that the most highly expressed genes show a strong enrichment for short 5UIs as opposed to either no 5UIs or longer 5UIs. This effect was specific to genes with the highest expression levels, and no relationship between length and expression level was observed for genes with intermediate or long introns (Figure <figr fid="F2">2d</figr>). These results are contrary to the energetic cost model <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, which predicts that genes with no 5UIs will be more highly represented among those with the highest expression levels. Because expression reflects both production and degradation rates of mRNAs, our results suggest that short 5UIs tend to either enhance transcription or stabilize mature mRNAs.</p>
         <p>The prevalence and the significance of these intron-dependent mechanisms of transcriptional enhancement at a genome-wide level are poorly understood in mammalian systems. There are a few examples in mammals of increased transcription due to the proximity of an intron to the transcription start site <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>, and these can be divided into two major categories with respect to the mechanism of enhanced transcription. The first mechanism is at the DNA level and involves the presence of activating transcription regulatory elements in the intron or the modulation of nucleosome positioning to make the promoter more accessible <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Similarly, 5UIs and other 5'-proximal introns in plants were shown to enhance gene expression at the transcriptional level in a position-specific manner <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>. The second mechanism is at the mRNA level and is related to splicing. <it>In vitro </it>studies have linked position-specific splicing and transcription enhancement mechanistically by demonstrating a direct interaction between the spliceosomal U small nuclear ribonucleoproteins and transcription elongation factors <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
         <p>Our study thus suggests a distinction between 5UIs and 5PCIs with respect to their effects on gene expression. A splicing-dependent explanation might be the most compatible with the overall higher expression of genes with early coding-region introns compared to those without such introns. In contrast, even though a splicing-dependent effect may exist for 5UIs as well, the most highly expressed genes are highly enriched for short 5UIs (less than approximately 1 kb in length), whereas 5UI presence or absence alone (without considering 5UI length) does not correlate with gene expression. Therefore, for 5UIs, short intron length seems to be a more important predictor of a high expression level than the presence or absence of 5UIs.</p>
         <p>Given the inconsistency between our observations and the energetic cost hypothesis, we suggest two alternative models of 5UIs' effect on gene expression. The first model is that splicing-dependent enhancement in gene expression is influenced not only by the position of an intron, but also its size. The second model is that transcriptional regulatory proteins are recruited as a result of the presence of DNA elements, which in turn enhance expression level. This process could be restricted spatially, such that if the distance between the regulatory element and the transcription start site is long, then the enhancement should be less pronounced. Hence the genes with the highest expression levels might be under selective pressure to keep their introns short in order to retain their enhancer elements closer to the transcription start site. In this scenario, one can further imagine these elements to function in a tissue-specific regulatory mechanism if the recruited factors are themselves tissue-specific. Such an enhancer, located in the first intron of the mammalian acetylcholinesterase gene, was previously found to mediate the tissue-specific expression of this gene <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Another example of tissue-specific gene expression enhancement mediated by a 5UI was reported for the rice gene <it>rubi3 </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp>.</p>
         <p>The pressure to maintain regulatory elements in introns is also the central idea of the genome design model, and we tested the applicability of this hypothesis to 5UIs by analyzing genes with tissue-dependent variability in gene expression. As the most proximal intron to the transcription start site has been shown to contain more regulatory elements <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, the genome design model might be expected to apply to 5UIs as well as coding region introns. Specifically, the genome design hypothesis predicts that tissue-specific or highly variable genes contain many regulatory elements in their introns and hence have longer introns in general <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. However, we found no relationship between variability in expression across tissues and the length of the 5UI (Figure <figr fid="F3">3a, b</figr>). Furthermore, neither 5'UTR presence nor length was correlated with how widely a gene was expressed. Most known nucleotide-level regulatory elements are short (&lt;15 nucleotides), and most known <it>cis</it>-regulatory modules could be contained within even a short (&lt;1 kb) 5UI. Therefore, 5UIs need not be particularly long to enable complex and conserved regulation via <it>cis</it>-regulatory elements. Our results support the idea that the genome design model is not likely to be the most useful guide for understanding the evolved lengths of 5UIs.</p>
         <p>Finally, we considered whether certain classes of genes preferentially include 5UIs, and whether 5UIs contain regulatory elements. We found that genes with regulatory functions are enriched for 5UIs. The non-receptor tyrosine kinases, which play fundamental roles in all aspects of cell biology and signal transduction, were the most strongly enriched gene category. We identified a conserved DNA motif in the 5UIs of many non-receptor tyrosine kinases that could function by recruiting transcription factors. This recruitment might lead to tissue- or condition-specific regulation of NRTKs. For example, in the gene encoding Bruton's tyrosine kinase (a non-receptor tyrosine kinase), an SP1 transcription factor binding site was identified within the 5UI <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. Furthermore, a point mutation in the 5UI region was shown to be associated with X-linked agammaglobulinemia, suggesting a functional role for this intron <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>.</p>
         <p>It is worth considering other forms of selection pressure that might affect 5'UTRs and therefore 5UIs. Upstream AUGs (uAUGs) tend to decrease translational efficiency, so that highly expressed genes should tend to avoid uAUGs in exons. On the other hand, intronic uAUGs are spliced out before the mature message encounters the cytoplasmic translation machinery; hence, they should not have a similar effect. The negative selection pressure against exonic uAUGs that tends to favor increased intronic sequence content within 5'UTRs <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> should be expected to be most pronounced for the most highly expressed genes. Our observation that the most highly expressed genes are enriched in having short 5UIs runs contrary to this expectation. Furthermore, shorter 5UIs did not imply shorter 5'UTR exon lengths, which might complicate our expectation for uAUG effects. Thus, models based solely on uAUG-based selection cannot explain the overrepresentation of short 5UIs among the most highly expressed genes.</p>
         <p>Alternative splicing has emerged as a fundamental mechanism of regulation and expansion of the proteome, with nearly 95% of all genes thought to be alternatively spliced in mammals <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Tissue-dependent alternative splicing within 5'UTRs is common and can be functionally important. For example, aberrant splicing of 5'UTRs of <it>BRCA1 </it>and <it>ER&#946; </it>was recently implicated in carcinogenesis <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. Whether these different splice variants play any regulatory role is unknown in all but a few cases. A plausible mechanism for the potential impact of alternative splicing in 5'UTRs is an effect on translation efficiency through differential inclusion of uAUGs.</p>
         <p>The functional importance of alternative splicing in 5'UTRs is exemplified by human <it>NOD2</it>, which is associated with Crohn's disease. Only a subset of <it>NOD2</it>'s multiple splice variants include the uAUGs in the mature mRNA, and these have decreased translation efficiency <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Alternative splicing of 5'UTRs can also affect mRNA secondary structure. In the ETS domain transcription factor <it>ELK1</it>, for example, a facultative secondary structure modulates translation initiation <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. Yet another connection between splicing and translation is the deposition of the exon junction complex following splicing, which induces translation through an interaction with the mammalian target of rapamycin (mTOR) signaling pathway <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The position or the sequence composition of the intron could potentially affect this splicing-dependent enhancement of translation efficiency by the mTOR pathway. These mechanisms of additional regulation by alternative splicing of 5UIs may underlie our observation that these introns are enriched in regulatory genes. Given that regulatory genes must themselves be precisely governed, additional means of regulation may allow for greater control, flexibility or complexity. Future work will need to address the full genome-wide functional implications and importance of alternative splicing of 5UIs.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Our results highlight the functional importance of 5'UTR introns. Existing models predicting selective effects, such as avoidance of uAUGs, minimization of transcriptional cost, or accumulation of regulatory elements, do not suffice to explain results from our genome-scale analysis of 5UIs. Given 5UI enrichment and depletion in specific functional categories of genes, and the potential ability of 5UIs to enhance gene expression, a complex interplay of multiple selective forces appears to have influenced the evolution of this distinct class of introns.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>A collection of genes with 5'UTR introns</p>
            </st>
            <p>NCBI's human Reference Gene Collection (RefSeq) <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> and the associated annotation table were downloaded from the UCSC genome browser <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>, genome assembly of May 2004. The annotation table was parsed using the Galaxy website <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> (as of June 2007) to obtain 5UI coordinates. Specifically, we extracted all introns annotated to lie between two 5'UTR exons. Then we removed all the cases where another splice variant was present in the RefSeq collection such that any sequence within the intron was part of the coding region. Hence, all the introns in our final dataset were strictly present in the 5'UTR according to the annotation of RefSeq genes. 5'UTR exon coordinates were similarly retrieved as of June 2007. Recent studies suggest that nearly all human genes are alternatively spliced <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. However, it is not clear what fraction of these events have biological significance as opposed to reflecting random noise associated with the less than perfect fidelity of the splicing machinery. Only when multiple independent sources of evidence support tissue-dependent alternative splicing can we be confident that these variants have real biological significance. Therefore, we used RefSeq transcripts, which are (unlike ESTs) manually curated and supported by multiple sources of evidence. For the comparisons between total lengths of 5UIs and the rest of the introns, we extracted coordinates of all non-5'UTR introns from the RefSeq annotation table (as of May 2009). A complete list of the genomic coordinates of 5UIs examined in this study is available as Additional file <supplr sid="S1">1</supplr>.</p>
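            <p>As an illustration of the extraction step described above, the following sketch identifies introns that lie between two 5'UTR exons from the fields of a single gene annotation row. It assumes 0-based, half-open coordinates with exon start and end lists and coding-region boundaries, in the style of the UCSC gene tables; the function and argument names are hypothetical, and the sketch does not reproduce the Galaxy workflow that was actually used. The subsequent removal of introns that overlap the coding region of another splice variant is not shown.</p>
            <p>
```python
def utr5_introns(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return (start, end) intervals of introns flanked by two 5'UTR exons.

    Coordinates are assumed to be 0-based and half-open, with exons listed
    in increasing genomic order regardless of strand.
    """
    if cds_start == cds_end:
        return []  # assumption: equal boundaries mark a non-coding transcript
    # Each intron spans from the end of one exon to the start of the next.
    introns = list(zip(exon_ends[:-1], exon_starts[1:]))
    if strand == "+":
        # The intron must end at or before the first coding base.
        return [(s, e) for s, e in introns if cds_start >= e]
    # On the minus strand the 5'UTR lies at the high-coordinate end.
    return [(s, e) for s, e in introns if s >= cds_end]
```
            </p>
            <p>The total 5UI length of a transcript is then the sum of end minus start over the returned intervals.</p>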
         </sec>
         <sec>
            <st>
               <p>Microarray data and analysis</p>
            </st>
            <p>The microarray data were downloaded from Gene Expression Atlas, which included expression data from 79 different tissues in humans <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. We used the gcRMA-normalized data from the Affymetrix U133a and GNF1H arrays. Synergizer <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> was used to associate RefSeq genes with probe sets on the U133a array and custom Perl v5.8.8 scripts were used to parse the GNF1H annotation table (available on the Gene Expression Atlas website). The resulting correspondences of RefSeq IDs to probe sets on the GNF1H and U133a microarrays were merged to obtain a final mapping. Where multiple probe sets corresponded to a single RefSeq ID, the arithmetic mean of the expression values of all the probes was used to obtain a representative expression level for that RefSeq ID in each tissue. A single region of the genome can correspond to more than one RefSeq ID due to alternative splice variants and/or alternative promoters, and there were cases of a single probe set corresponding to multiple RefSeq IDs. To avoid overweighting such regions, we removed RefSeq IDs such that there were no duplicates. The representative RefSeq ID from each such probe set was chosen uniformly at random. For each gene with a 5UI, we calculated the mean expression level across all tissues and divided the genes into three groups according to the percentile of their total 5'UTR intronic length: short, 0 to 25th percentile; intermediate, 25th to 75th percentile; long, 75th to 100th percentile. All expression analysis was performed using the R software package v2.6.0. In addition, the 'hexbin' <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> and 'zoo' <abbrgrp><abbr bid="B68">68</abbr></abbrgrp> packages for the R platform were used.</p>
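            <p>The probe-set averaging and the grouping by 5UI length described above can be sketched as follows. This is an illustrative Python version rather than the R and Perl code that was used; the function names, the dictionary-based input and the handling of values that fall exactly on a quartile boundary are assumptions.</p>
            <p>
```python
from statistics import mean, quantiles


def representative_expression(probe_set_values):
    """Arithmetic mean over all probe sets mapped to one RefSeq ID in one tissue."""
    return mean(probe_set_values)


def length_groups(total_5ui_length):
    """Assign each RefSeq ID to 'short', 'intermediate' or 'long'.

    total_5ui_length: dict mapping a RefSeq ID to its total 5UI length.
    Groups correspond to the bottom quartile, the middle two quartiles and
    the top quartile of length, respectively.
    """
    q1, _median, q3 = quantiles(total_5ui_length.values(), n=4)
    groups = {}
    for refseq_id, length in total_5ui_length.items():
        if q1 >= length:
            groups[refseq_id] = "short"         # at or below the first quartile
        elif length > q3:
            groups[refseq_id] = "long"          # above the third quartile
        else:
            groups[refseq_id] = "intermediate"
    return groups
```
            </p>
            <p>The quartile cut points here use the default interpolation of the Python standard library, which may differ slightly from the quantile definition applied in R.</p>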
         </sec>
         <sec>
            <st>
               <p>Functional enrichment of Gene Ontology categories</p>
            </st>
            <p>GoSTAT <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and FuncAssociate <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> were used for functional trend analysis. We restricted the space of genes to all genes in the RefSeq collection because we used annotations in this collection to determine the set of genes with 5UIs. We used the RefSeq IDs as input for analysis with both programs. FuncAssociate uses Synergizer <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> to resolve the synonyms using Ensembl as the authority. To quantify the effect size, all the statistically significant GO categories that are enriched in the genes with introns are sorted according to their log<sub>10 </sub>odds ratio. All reported log odds ratios were obtained from FuncAssociate. Similar results were obtained using GoSTAT (data not shown).</p>
         </sec>
         <sec>
            <st>
               <p>Comparative genomic analysis of non-receptor tyrosine kinases</p>
            </st>
            <p>To study the evolution of 5UI presence and length among NRTKs, we first identified orthologs of human NRTKs in the mouse and rat genomes. We used NCBI's Homologene Release 64 <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> (as of September 2009) to identify 'true' orthologous genes. Based on a recent evaluation of different approaches, Homologene showed greater specificity than other comparable orthology sources for the purposes of detailed phylogenetic and functional analysis <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. We extracted the corresponding RefSeq IDs for each of the human NRTKs, and their mouse and rat orthologs. Then, we downloaded the RefSeq annotation tables for current genome builds (hg19, mm9, and rn4; as of September 2009) and used these annotations to determine 5UI lengths. All statistical analyses were performed using R software package v2.6.0. The raw data used in this analysis of human NRTKs are provided in Additional file <supplr sid="S2">2</supplr>.</p>
         </sec>
         <sec>
            <st>
               <p>Motif discovery</p>
            </st>
            <p>The coordinates for the non-receptor tyrosine kinase genes that harbor introns were converted to human genome build hg18 using the LiftOver utility tool obtained from the UCSC Genome Browser website <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. If there were known alternative splice variants in the RefSeq database, the longest intron was used for motif discovery purposes. Multiple alignment blocks for the human, mouse, and rat genomes (builds hg18, mm8, and rn4, respectively) were extracted from the 17-way multiZ alignment at the UCSC Genome Browser. These alignment blocks were merged using the Stitch MAF blocks utility on the Galaxy website <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> to obtain a final alignment of the human non-receptor tyrosine kinases to the mouse and rat orthologs. We obtained alignments that covered more than 10% of the length of the 5UIs for 37 human NRTKs, and excluded the other five introns from the subsequent motif discovery steps.</p>
            <p>PhyloGibbs v1.2 was used in motif finding <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B72">72</abbr></abbrgrp>. Different phylogenetic trees were tested but they did not significantly affect the results (not shown); therefore, all the results we report here were generated using the (hg18:0.5,(mm8:0.8, rn4:0.9):0.6) phylogeny specified in Newick tree format. Both RNA and DNA motifs (that is, forward strand only and both strands, respectively) were searched and the intronic sequences were used to define the background nucleotide distribution of the region to account for differences in nucleotide composition of 5UIs. The resulting motifs were represented by position-specific scoring matrices. The STAMP <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> web site was used to find similar motifs in the TRANSFAC v11.3 database as well as in a comparative genomics study in humans <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Default parameters were used in all comparisons.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the total exonic/intronic length and 5PCIs</p>
            </st>
            <p>To determine the lengths and positions of various genomic features, we first compiled a list of all RefSeq IDs. A single ID can correspond to multiple transcripts that are expressed from either the same or different regions of the genome. Such IDs can be associated with different transcript structures, and were therefore removed from further analysis. RefSeq IDs corresponding to genes in the hypervariable HLA locus were similarly represented multiple times in the RefSeq collection. In these cases, only the version in the reference genome was retained for further analysis.</p>
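            <p>These initial filters can be sketched as follows. The sketch assumes an annotation table with one row per alignment of a RefSeq ID, and it assumes that alternate-haplotype copies can be recognized by a "_hap" substring in the chromosome name, as in the UCSC hg18 haplotype assemblies; the function name and the example identifiers are hypothetical.</p>

```python
from collections import Counter


def filter_refseq_rows(rows):
    """Apply the two initial filters to (refseq_id, chrom) annotation rows."""
    # Keep only alignments on the reference chromosomes; copies placed on
    # alternate-haplotype assemblies (assumed to contain "_hap") are dropped.
    reference_rows = [(rid, chrom) for rid, chrom in rows if "_hap" not in chrom]
    # Remove every ID that still maps to more than one place in the genome.
    counts = Counter(rid for rid, _ in reference_rows)
    return [(rid, chrom) for rid, chrom in reference_rows if counts[rid] == 1]


# Hypothetical rows: id2 has an extra haplotype copy, id3 maps to two loci.
example_rows = [
    ("id1", "chr1"),
    ("id2", "chr6"),
    ("id2", "chr6_cox_hap1"),
    ("id3", "chr2"),
    ("id3", "chrX"),
]
kept = filter_refseq_rows(example_rows)
```

            <p>Here id2 is kept with its reference-genome copy only, whereas id3 is removed entirely because it maps to two reference locations.</p>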
            <p>After these initial filters, we calculated total lengths of 5UIs, 5'UTR exons, and other introns for each remaining RefSeq transcript. The position of the first coding intron was determined using the coordinates of all introns from the RefSeq annotation table, retrieved in May 2009. The RefSeq collection contained multiple identifiers for different splice variants transcribed from the same genomic location. To avoid any systematic biases, we compared three different approaches to selecting RefSeq transcripts for further analysis. First, we kept all transcripts regardless of how many were transcribed from a given locus. Second, we determined equivalence classes of RefSeq transcripts, such that two IDs were in the same set if their transcription intervals (from start to stop position) overlapped by more than 20 base pairs. We then randomly removed RefSeq transcripts such that only a single representative transcript remained for each equivalence class. Third, exact duplicates with respect to the 5'UTR were removed. Specifically, if two or more RefSeq IDs had exactly the same 5'UTR, a single identifier was selected as a representative for that particular region. Splice variants that differ in their 5'UTR were not removed because these provide additional information about the lengths of 5'UTR introns and exons. All three methods yielded similar results and led to identical conclusions. Therefore, only one representative method is shown in the figures. The third method conveys the most information when discussing total 5UI lengths and hence was used in Figure <figr fid="F1">1a</figr>. By contrast, considering one representative from each transcriptional unit is more relevant when analyzing the correlation between two genomic features. Hence, the second method was used for Figures <figr fid="F1">1c, d</figr>.</p>
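            <p>The second selection approach can be sketched as below. This is an illustration under stated assumptions, not the original implementation: the overlap relation is applied transitively with a union-find structure, transcripts are compared only within a chromosome, and intervals are half-open. The function names, the seed, and the example transcripts are hypothetical.</p>

```python
import random
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def equivalence_classes(transcripts, min_overlap=20):
    """Group transcripts whose intervals overlap by more than min_overlap bp.

    transcripts: list of (transcript_id, chrom, start, end) tuples.
    Overlap is applied transitively, so the groups are equivalence classes.
    """
    parent = {t[0]: t[0] for t in transcripts}

    def find(x):
        # Follow parent links to the class root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    by_chrom = defaultdict(list)
    for t in transcripts:
        by_chrom[t[1]].append(t)

    # Pairwise comparison within each chromosome (quadratic, fine for a sketch).
    for items in by_chrom.values():
        for i, (id_a, _, s_a, e_a) in enumerate(items):
            for id_b, _, s_b, e_b in items[i + 1:]:
                if overlap(s_a, e_a, s_b, e_b) > min_overlap:
                    parent[find(id_a)] = find(id_b)

    classes = defaultdict(list)
    for t in transcripts:
        classes[find(t[0])].append(t[0])
    return sorted(sorted(members) for members in classes.values())


def pick_representatives(classes, seed=0):
    """Randomly keep one transcript per equivalence class."""
    rng = random.Random(seed)
    return [rng.choice(members) for members in classes]


# Hypothetical transcripts: A/B overlap by 50 bp, B/C by only 10 bp.
example = [
    ("tx_A", "chr1", 100, 500),
    ("tx_B", "chr1", 450, 900),
    ("tx_C", "chr1", 890, 1200),
    ("tx_D", "chr1", 1150, 1300),
    ("tx_E", "chr2", 100, 500),
]
classes = equivalence_classes(example)
representatives = pick_representatives(classes, seed=1)
```

            <p>Fixing the random seed makes the choice of representatives reproducible; a sweep over sorted intervals would scale better than the pairwise comparison for a genome-wide table.</p>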
            <p>For the specific GO categories used in our analysis, all the genes in a given category were retrieved from the human GOA database <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. The corresponding RefSeq identifiers were determined using the Synergizer software <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Total exonic length and intronic length were calculated for all these genes as described above.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>5PCI: 5' proximal coding region intron; 5UI: 5'UTR intron; CV: coefficient of variation; EST: expressed sequence tag; GO: Gene Ontology; kb: kilobase; NRTK: non-receptor protein tyrosine kinase; PCC: Pearson Correlation Coefficient; uAUG: upstream AUG; UTR: untranslated region.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CC carried out all analyses, designed the study and drafted the manuscript. AD contributed to the generation of the 5UI dataset, provided guidance with all the analyses and contributed to the writing of the manuscript. JCM participated in the design of the study. GFB helped with functional enrichment analysis and contributed to the writing of the manuscript. FPR conceived and supervised the study, and contributed to the writing of the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank the West Quad Computing Group at Harvard Medical School for research computing support, and Murat Ta&#351;an and Murat &#199;okol for helpful discussions. FPR was supported in part by the National Institutes of Health (NIH; grants MH087394, HG003224, HG007115, HL081341, NS035611, and HG004233) and the Keck Foundation, and by a Fellowship from the Canadian Institute for Advanced Research. JCM was supported by an NIH National Research Service Award Fellowship from the National Human Genome Research Institute (HG004825).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Origins and evolution of spliceosomal introns.</p>
            </title>
            <aug>
               <au>
                  <snm>Rodriguez-Trelles</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Tarrio</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ayala</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>40</volume>
            <fpage>47</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.40.110405.090625</pubid>
                  <pubid idtype="pmpid" link="fulltext">17094737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The evolution of spliceosomal introns: patterns, puzzles and progress.</p>
            </title>
            <aug>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>211</fpage>
            <lpage>221</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16485020</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Sorokin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Mirkin</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1512</fpage>
            <lpage>1517</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(03)00558-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12956953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Patterns of intron gain and conservation in eukaryotic genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Carmel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2007</pubdate>
            <volume>7</volume>
            <fpage>192</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-7-192</pubid>
                  <pubid idtype="pmcid">2151770</pubid>
                  <pubid idtype="pmpid" link="fulltext">17935625</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The origins of genome complexity.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>1401</fpage>
            <lpage>1404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089370</pubid>
                  <pubid idtype="pmpid" link="fulltext">14631042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The correlation between intron length and recombination in Drosophila. Dynamic equilibrium between mutational and selective forces.</p>
            </title>
            <aug>
               <au>
                  <snm>Comeron</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Kreitman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>156</volume>
            <fpage>1175</fpage>
            <lpage>1190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461334</pubid>
                  <pubid idtype="pmpid">11063693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Why do genes have introns? Recombination might add a new piece to the puzzle.</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>172</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(01)02236-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">11275306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Protecting exons from deleterious R-loops: a potential advantage of having introns.</p>
            </title>
            <aug>
               <au>
                  <snm>Niu</snm>
                  <fnm>DK</fnm>
               </au>
            </aug>
            <source>Biol Direct</source>
            <pubdate>2007</pubdate>
            <volume>2</volume>
            <fpage>11</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1745-6150-2-11</pubid>
                  <pubid idtype="pmcid">1863416</pubid>
                  <pubid idtype="pmpid" link="fulltext">17459149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Alternative splicing: new insights from global analyses.</p>
            </title>
            <aug>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2006</pubdate>
            <volume>126</volume>
            <fpage>37</fpage>
            <lpage>47</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2006.06.023</pubid>
                  <pubid idtype="pmpid" link="fulltext">16839875</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Alternative splicing and RNA selection pressure - evolutionary consequences for eukaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Xing</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>499</fpage>
            <lpage>510</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1896</pubid>
                  <pubid idtype="pmpid" link="fulltext">16770337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Understanding alternative splicing: towards a cellular code.</p>
            </title>
            <aug>
               <au>
                  <snm>Matlin</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>386</fpage>
            <lpage>398</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1645</pubid>
                  <pubid idtype="pmpid" link="fulltext">15956978</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Garrett-Engele</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Loerch</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>2141</fpage>
            <lpage>2144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090100</pubid>
                  <pubid idtype="pmpid" link="fulltext">14684825</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Alternative isoform regulation in human tissue transcriptomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khrebtukova</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mayr</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kingsmore</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Schroth</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2008</pubdate>
            <volume>456</volume>
            <fpage>470</fpage>
            <lpage>476</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature07509</pubid>
                  <pubid idtype="pmcid">2593745</pubid>
                  <pubid idtype="pmpid" link="fulltext">18978772</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines.</p>
            </title>
            <aug>
               <au>
                  <snm>Castle</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Kulkarni</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Kalsotra</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2008</pubdate>
            <volume>40</volume>
            <fpage>1416</fpage>
            <lpage>1425</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.264</pubid>
                  <pubid idtype="pmpid" link="fulltext">18978788</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing.</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Shai</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2008</pubdate>
            <volume>40</volume>
            <fpage>1413</fpage>
            <lpage>1415</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.259</pubid>
                  <pubid idtype="pmpid" link="fulltext">18978789</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Splicing regulation: from a parts list of regulatory elements to an integrated splicing code.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2008</pubdate>
            <volume>14</volume>
            <fpage>802</fpage>
            <lpage>813</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1261/rna.876308</pubid>
                  <pubid idtype="pmcid">2327353</pubid>
                  <pubid idtype="pmpid" link="fulltext">18369186</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Unusual intron conservation near tissue-regulated exons found by splicing microarrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cline</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kulp</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Blume</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ares</snm>
                  <fnm>M</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e4</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020004</pubid>
                  <pubid idtype="pmcid">1331982</pubid>
                  <pubid idtype="pmpid" link="fulltext">16424921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Structural and functional features of eukaryotic mRNA untranslated regions.</p>
            </title>
            <aug>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mignone</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gissi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Grillo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Licciulli</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Liuni</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>73</fpage>
            <lpage>81</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11591473</pubid>
                  <pubid idtype="doi">10.1016/S0378-1119(01)00674-6</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Intron size, abundance, and distribution within untranslated regions of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Hong</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Scofield</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>2392</fpage>
            <lpage>2404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msl111</pubid>
                  <pubid idtype="pmpid" link="fulltext">16980575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The nonsense-mediated decay RNA surveillance pathway.</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>YF</fnm>
               </au>
               <au>
                  <snm>Imam</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>MF</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>2007</pubdate>
            <volume>76</volume>
            <fpage>51</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biochem.76.050106.093909</pubid>
                  <pubid idtype="pmpid" link="fulltext">17352659</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Nonsense-mediated mRNA decay in mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Maquat</snm>
                  <fnm>LE</fnm>
               </au>
            </aug>
            <source>J Cell Sci</source>
            <pubdate>2005</pubdate>
            <volume>118</volume>
            <fpage>1773</fpage>
            <lpage>1776</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/jcs.01701</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860725</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Position of the final intron in full-length transcripts: determined by NMD?</p>
            </title>
            <aug>
               <au>
                  <snm>Scofield</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>896</fpage>
            <lpage>899</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm010</pubid>
                  <pubid idtype="pmpid" link="fulltext">17244600</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Selection for short introns in highly expressed genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Castillo-Davis</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Mekhedov</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Kondrashov</snm>
                  <fnm>FA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <fpage>415</fpage>
            <lpage>418</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12134150</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The signature of selection mediated by expression on human genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Urrutia</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2260</fpage>
            <lpage>2264</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.641103</pubid>
                  <pubid idtype="pmcid">403694</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975314</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Expression pattern and, surprisingly, gene length, shape codon usage in <it>Caenorhabditis, Drosophila</it>, and <it>Arabidopsis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4482</fpage>
            <lpage>4487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.96.8.4482</pubid>
                  <pubid idtype="pmcid">16358</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200288</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>In plants, highly expressed genes are the least compact.</p>
            </title>
            <aug>
               <au>
                  <snm>Ren</snm>
                  <fnm>X-Y</fnm>
               </au>
               <au>
                  <snm>Vorst</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Fiers</snm>
                  <fnm>MWEJ</fnm>
               </au>
               <au>
                  <snm>Stiekema</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Nap</snm>
                  <fnm>J-P</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>528</fpage>
            <lpage>532</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2006.08.008</pubid>
                  <pubid idtype="pmpid" link="fulltext">16934358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>'Genome design' model and multicellular complexity: golden middle.</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>5906</fpage>
            <lpage>5914</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl773</pubid>
                  <pubid idtype="pmcid">1635334</pubid>
                  <pubid idtype="pmpid" link="fulltext">17062620</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Human housekeeping genes are compact.</p>
            </title>
            <aug>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Levanon</snm>
                  <fnm>EY</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>362</fpage>
            <lpage>366</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00140-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12850439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Compactness of human housekeeping genes: selection for economy or genomic design?</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>248</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.03.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15109779</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>'Genome design' model: evidence from conserved intronic sequence in human-mouse comparison.</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>347</fpage>
            <lpage>354</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.4318206</pubid>
                  <pubid idtype="pmcid">1415212</pubid>
                  <pubid idtype="pmpid" link="fulltext">16461636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Human antisense genes have unusually short introns: evidence for selection for rapid transcription.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>203</fpage>
            <lpage>207</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.02.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15797613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The small introns of antisense genes are better explained by selection for rapid transcription than by 'genomic design'.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2005</pubdate>
            <volume>171</volume>
            <fpage>2151</fpage>
            <lpage>2155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1534/genetics.105.048066</pubid>
                  <pubid idtype="pmcid">1456133</pubid>
                  <pubid idtype="pmpid">16143605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage.</p>
            </title>
            <aug>
               <au>
                  <snm>Chamary</snm>
                  <fnm>J-V</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>1014</fpage>
            <lpage>1023</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh087</pubid>
                  <pubid idtype="pmpid" link="fulltext">15014158</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Distribution and characterization of regulatory elements in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Majewski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ott</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1827</fpage>
            <lpage>1836</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.606402</pubid>
                  <pubid idtype="pmcid">187578</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466286</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Complex selection on intron size in <it>Cryptococcus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Buckley</snm>
                  <fnm>CO</fnm>
               </au>
               <au>
                  <snm>Neafsey</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2008</pubdate>
            <volume>25</volume>
            <fpage>247</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm220</pubid>
                  <pubid idtype="pmpid" link="fulltext">18171915</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Evolutionary conservation of UTR intron boundaries in <it>Cryptococcus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Roy</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Neafsey</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>1140</fpage>
            <lpage>1148</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm045</pubid>
                  <pubid idtype="pmpid" link="fulltext">17374879</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Analysis and recognition of 5'UTR intron splice sites in human pre-mRNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Eden</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1131</fpage>
            <lpage>1142</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkh273</pubid>
                  <pubid idtype="pmcid">373407</pubid>
                  <pubid idtype="pmpid" link="fulltext">14960723</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>The sequence of the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Mural</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Amanatides</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ballew</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Skupski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gabor Miklos</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Broder</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Nadeau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McKusick</snm>
                  <fnm>VA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>291</volume>
            <fpage>1304</fpage>
            <lpage>1351</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1058040</pubid>
                  <pubid idtype="pmpid" link="fulltext">11181995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Intrinsic noise in gene regulatory networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Thattai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van Oudenaarden</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>8614</fpage>
            <lpage>8619</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.151588598</pubid>
                  <pubid idtype="pmcid">37484</pubid>
                  <pubid idtype="pmpid" link="fulltext">11438714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Characterizing gene sets with FuncAssociate.</p>
            </title>
            <aug>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>OD</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2502</fpage>
            <lpage>2504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg363</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>GOstat: find statistically overrepresented Gene Ontologies within a group of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bei&#223;barth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1464</fpage>
            <lpage>1465</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth088</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962934</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The Synergizer service for translating gene, protein and other biological identifiers.</p>
            </title>
            <aug>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>2272</fpage>
            <lpage>2273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn424</pubid>
                  <pubid idtype="pmcid">2553440</pubid>
                  <pubid idtype="pmpid" link="fulltext">18697767</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Non-receptor protein tyrosine kinases.</p>
            </title>
            <aug>
               <au>
                  <snm>Tsygankov</snm>
                  <fnm>AY</fnm>
               </au>
            </aug>
            <source>Front Biosci</source>
            <pubdate>2003</pubdate>
            <volume>8</volume>
            <fpage>s595</fpage>
            <lpage>635</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2741/1106</pubid>
                  <pubid idtype="pmpid">12700079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>King</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chiaromonte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1051</fpage>
            <lpage>1060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.3642605</pubid>
                  <pubid idtype="pmcid">1182217</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024817</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.</p>
            </title>
            <aug>
               <au>
                  <snm>Siddharthan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Siggia</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>van Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e67</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0010067</pubid>
                  <pubid idtype="pmcid">1309704</pubid>
                  <pubid idtype="pmpid" link="fulltext">16477324</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals.</p>
            </title>
            <aug>
               <au>
                  <snm>Xie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>434</volume>
            <fpage>338</fpage>
            <lpage>345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03441</pubid>
                  <pubid idtype="pmpid" link="fulltext">15735639</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>STAMP: a web tool for exploring DNA-binding motif similarities.</p>
            </title>
            <aug>
               <au>
                  <snm>Mahony</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Benos</snm>
                  <fnm>PV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>W253</fpage>
            <lpage>258</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm272</pubid>
                  <pubid idtype="pmcid">1933206</pubid>
                  <pubid idtype="pmpid" link="fulltext">17478497</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Promoter proximal splice sites enhance transcription.</p>
            </title>
            <aug>
               <au>
                  <snm>Furger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>O'Sullivan</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Binnie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2002</pubdate>
            <volume>16</volume>
            <fpage>2792</fpage>
            <lpage>2799</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gad.983602</pubid>
                  <pubid idtype="pmcid">187476</pubid>
                  <pubid idtype="pmpid" link="fulltext">12414732</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Introns increase transcriptional efficiency in transgenic mice.</p>
            </title>
            <aug>
               <au>
                  <snm>Brinster</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Behringer</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Gelinas</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Palmiter</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>836</fpage>
            <lpage>840</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.85.3.836</pubid>
                  <pubid idtype="pmcid">279650</pubid>
                  <pubid idtype="pmpid">3422466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Heterologous introns can enhance expression of transgenes in mice.</p>
            </title>
            <aug>
               <au>
                  <snm>Palmiter</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sandgren</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Avarbock</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Brinster</snm>
                  <fnm>RL</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1991</pubdate>
            <volume>88</volume>
            <fpage>478</fpage>
            <lpage>482</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.88.2.478</pubid>
                  <pubid idtype="pmcid">50834</pubid>
                  <pubid idtype="pmpid">1988947</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Intron requirement for expression of the human purine nucleoside phosphorylase gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Jonsson</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Foresman</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>McIvor</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1992</pubdate>
            <volume>20</volume>
            <fpage>3191</fpage>
            <lpage>3198</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/20.12.3191</pubid>
                  <pubid idtype="pmcid">312458</pubid>
                  <pubid idtype="pmpid" link="fulltext">1620616</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>How introns influence and enhance eukaryotic gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Le Hir</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nott</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2003</pubdate>
            <volume>28</volume>
            <fpage>215</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(03)00052-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12713906</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>The effect of intron location on intron-mediated enhancement of gene expression in <it>Arabidopsis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Rose</snm>
                  <fnm>AB</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2004</pubdate>
            <volume>40</volume>
            <fpage>744</fpage>
            <lpage>751</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-313X.2004.02247.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15546357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Promoter-proximal introns in <it>Arabidopsis thaliana </it>are enriched in dispersed signals that elevate gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Rose</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Elfersi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Parra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Plant Cell</source>
            <pubdate>2008</pubdate>
            <volume>20</volume>
            <fpage>543</fpage>
            <lpage>551</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1105/tpc.107.057190</pubid>
                  <pubid idtype="pmcid">2329928</pubid>
                  <pubid idtype="pmpid" link="fulltext">18319396</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Stimulatory effect of splicing factors on transcriptional elongation.</p>
            </title>
            <aug>
               <au>
                  <snm>Fong</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Q</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>414</volume>
            <fpage>929</fpage>
            <lpage>933</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/414929a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11780068</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>An intronic enhancer containing an N-box motif is required for synapse- and tissue-specific expression of the acetylcholinesterase gene in skeletal muscle fibers.</p>
            </title>
            <aug>
               <au>
                  <snm>Chan</snm>
                  <fnm>RY</fnm>
               </au>
               <au>
                  <snm>Boudreau-Lariviere</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Angus</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Mankal</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>Jasmin</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4627</fpage>
            <lpage>4632</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.96.8.4627</pubid>
                  <pubid idtype="pmcid">16383</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200313</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Gene expression enhancement mediated by the 5' UTR intron of the rice rubi3 gene varied remarkably among tissues in transgenic rice plants.</p>
            </title>
            <aug>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sivamani</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Azhakanandam</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Samadder</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Qu</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Genet Genomics</source>
            <pubdate>2008</pubdate>
            <volume>279</volume>
            <fpage>563</fpage>
            <lpage>572</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00438-008-0333-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">18320227</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Transcriptional regulatory elements within the first intron of Bruton's tyrosine kinase.</p>
            </title>
            <aug>
               <au>
                  <snm>Rohrer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Conley</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>1998</pubdate>
            <volume>91</volume>
            <fpage>214</fpage>
            <lpage>221</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9414287</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Post-transcriptional regulation of gene expression by alternative 5'-untranslated regions in carcinogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Biochem Soc Trans</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>708</fpage>
            <lpage>711</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1042/BST0360708</pubid>
                  <pubid idtype="pmpid" link="fulltext">18631145</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Functional characterization of two novel 5' untranslated exons reveals a complex regulation of NOD2 protein expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Rosenstiel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huse</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Franke</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hampe</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reichwald</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Platzer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Mathew</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Platzer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>472</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-8-472</pubid>
                  <pubid idtype="pmcid">2228316</pubid>
                  <pubid idtype="pmpid" link="fulltext">18096043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Alternatively spliced isoforms of the human elk-1 mRNA within the 5' UTR: implications for ELK-1 expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Araud</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Genolet</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jaquier-Gubler</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Curran</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>4649</fpage>
            <lpage>4663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm482</pubid>
                  <pubid idtype="pmcid">1950554</pubid>
                  <pubid idtype="pmpid" link="fulltext">17591614</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>SKAR links pre-mRNA splicing to mTOR/S6K1-mediated enhanced translation efficiency of spliced mRNAs.</p>
            </title>
            <aug>
               <au>
                  <snm>Ma</snm>
                  <fnm>XM</fnm>
               </au>
               <au>
                  <snm>Yoon</snm>
                  <fnm>S-O</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Julich</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Blenis</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2008</pubdate>
            <volume>133</volume>
            <fpage>303</fpage>
            <lpage>313</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2008.02.031</pubid>
                  <pubid idtype="pmpid" link="fulltext">18423201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D61</fpage>
            <lpage>D65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>The UCSC Genome Browser database: update 2010.</p>
            </title>
            <aug>
               <au>
                  <snm>Rhead</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kuhn</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Zweig</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Fujita</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Raney</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>LR</fnm>
               </au>
               <au>
                  <snm>Learned</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hillman-Jackson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Giardine</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dreszer</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2010</pubdate>
            <volume>38</volume>
            <fpage>D613</fpage>
            <lpage>D619</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkp939</pubid>
                  <pubid idtype="pmcid">2808870</pubid>
                  <pubid idtype="pmpid" link="fulltext">19906737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Galaxy: a platform for interactive large-scale genome analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Giardine</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Burhans</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Blankenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Nekrutenko</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1451</fpage>
            <lpage>1455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.4086505</pubid>
                  <pubid idtype="pmcid">1240089</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169926</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>A gene atlas of the mouse and human protein-encoding transcriptomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Wiltshire</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soden</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hayakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kreiman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cooke</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Hogenesch</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>6062</fpage>
            <lpage>6067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0400782101</pubid>
                  <pubid idtype="pmcid">395923</pubid>
                  <pubid idtype="pmpid" link="fulltext">15075390</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>hexbin: Hexagonal Binning Routines. R package version 1.18.0</p>
            </title>
            <url>http://www.bioconductor.org/packages/bioc/html/hexbin.html</url>
         </bibl>
         <bibl id="B68">
            <title>
               <p>zoo: S3 Infrastructure for Regular and Irregular Time Series.</p>
            </title>
            <aug>
               <au>
                  <snm>Zeileis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Grothendieck</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Stat Software</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>1</fpage>
            <lpage>27</lpage>
         </bibl>
         <bibl id="B69">
            <title>
               <p>HomoloGene</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/homologene</url>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Phylogenetic and functional assessment of orthologs inference projects and methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Altenhoff</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Dessimoz</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2009</pubdate>
            <volume>5</volume>
            <fpage>e1000262</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.1000262</pubid>
                  <pubid idtype="pmcid">2612752</pubid>
                  <pubid idtype="pmpid" link="fulltext">19148271</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>UCSC Genome Browser LiftOver Utility</p>
            </title>
            <url>http://genome.ucsc.edu/cgi-bin/hgLiftOver</url>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Detecting regulatory sites using PhyloGibbs.</p>
            </title>
            <aug>
               <au>
                  <snm>Siddharthan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Methods in Molecular Biology</source>
            <publisher>Bergman NH: Humana Press</publisher>
            <pubdate>2007</pubdate>
            <fpage>382</fpage>
            <lpage>402</lpage>
         </bibl>
         <bibl id="B73">
            <title>
               <p>The GOA database in 2009 - an integrated Gene Ontology Annotation resource.</p>
            </title>
            <aug>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huntley</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <fpage>D396</fpage>
            <lpage>D403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkn803</pubid>
                  <pubid idtype="pmcid">2686469</pubid>
                  <pubid idtype="pmpid" link="fulltext">18957448</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics.</p>
            </title>
            <aug>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>338</volume>
            <fpage>207</fpage>
            <lpage>215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.02.048</pubid>
                  <pubid idtype="pmpid" link="fulltext">15066426</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
