<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-7-r140</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Housekeeping genes tend to show reduced upstream sequence conservation</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Farr&#233;</snm>
               <fnm>Dom&#232;nec</fnm>
               <insr iid="I1"/>
               <email>domenec.farre@crg.es</email>
            </au>
            <au id="A2">
               <snm>Bellora</snm>
               <fnm>Nicol&#225;s</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>nicolas.bellora@upf.edu</email>
            </au>
            <au id="A3">
               <snm>Mularoni</snm>
               <fnm>Loris</fnm>
               <insr iid="I3"/>
               <email>lmularoni@imim.es</email>
            </au>
            <au id="A4">
               <snm>Messeguer</snm>
               <fnm>Xavier</fnm>
               <insr iid="I4"/>
               <email>peypoch@lsi.upc.es</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Alb&#224;</snm>
               <fnm>M Mar</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I5"/>
               <email>malba@imim.es</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 08003, Spain</p>
            </ins>
            <ins id="I2">
               <p>Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain</p>
            </ins>
            <ins id="I3">
               <p>Fundaci&#243; Institut Municipal d'Investigaci&#243; M&#232;dica, Dr Aiguader 88, Barcelona 08003, Spain</p>
            </ins>
            <ins id="I4">
               <p>Universitat Polit&#232;cnica de Catalunya, Jordi Girona 1-3, Barcelona 08034, Spain</p>
            </ins>
            <ins id="I5">
               <p>Catalan Institution for Research and Advanced Studies, Pg Lluis Companys 23, Barcelona 08010, Spain</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>7</issue>
         <fpage>R140</fpage>
         <url>http://genomebiology.com/2007/8/7/R140</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17626644</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-7-r140</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>13</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>13</day>
               <month>07</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Farr&#233; et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Upstream sequence conservation and expression</p>
      </shorttitle>
      <shortabs>
         <p>Mammalian housekeeping genes show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Understanding the constraints that operate in mammalian gene promoter sequences is key to understanding the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is significantly related to the extent of promoter sequence conservation.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We show that mammalian housekeeping genes, expressed in all or nearly all tissues, have significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. In addition, we evaluate the effect of gene function, CpG island content and protein evolutionary rate on promoter sequence conservation. Finally, we identify a subset of transcription factors that bind to motifs that are specifically over-represented in housekeeping gene promoters.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This is the first report that the promoters of housekeeping genes show reduced sequence conservation relative to those of genes expressed in a more tissue-restricted manner. This is likely to be related to simpler gene expression, requiring a smaller number of functional <it>cis</it>-regulatory motifs.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The correct functioning of multicellular organisms depends on a complex orchestration of gene regulatory events, which ensure that genes are expressed at the right time, place and level. Much of this regulation occurs at the level of gene transcription, and is mediated by specific interactions between transcription factors and <it>cis</it>-regulatory DNA motifs. Regulatory motifs concentrate in sequences upstream of the transcription start site (TSS), the region known as the gene promoter (for a recent review, see <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>).</p>
         <p>Changes in gene expression patterns can cause important phenotypic modifications. Mutations in <it>cis</it>-regulatory motifs can alter the binding affinity of transcription factors and affect the expression of a gene. However, the evolutionary dynamics of promoter sequences are still poorly understood. A commonly used approach to assess the existence of evolutionary constraints and identify regulatory motifs is the identification of conserved non-coding sequences across orthologues. This rationale is behind several described 'phylogenetic footprinting' methods to discover functional regulatory sequences <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Unlike coding sequences, gene expression regulatory sequences do not have well-defined boundaries. A region spanning approximately 100 base pairs (bp) upstream of the TSS, known as the basal promoter, plays a fundamental part in the assembly of the transcription initiation complex. Further upstream regulatory sequences are of variable length depending on the particular gene <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Nevertheless, a recent study has shown that, at distances greater than 2 Kb from the TSS, the similarity between orthologous promoters drops drastically, indicating that most of the functional elements concentrate in the 2 Kb promoter region <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Accordingly, about 85% of the known mouse transcription regulatory motifs are located within 2 Kb of the gene promoter region <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and functional assays have shown that a region spanning -500 to +50 relative to the TSS is sufficient to drive transcription in cultured cells for most human genes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>Promoter sequence comparisons across different species have shed light on the different constraints exhibited by promoters of different types of genes. In particular, it has been observed that the promoters of genes encoding regulatory proteins, such as transcription factors and/or developmental proteins, tend to show remarkably strong sequence conservation <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, suggesting that the expression of this class of genes requires a relatively large number of <it>cis</it>-regulatory motifs.</p>
         <p>Another important factor that may be related to promoter sequence conservation is the number of tissues in which a gene is expressed. In the adult organism, some genes show high tissue-specificity while others show little or no tissue expression restrictions (ubiquitous expression). The effect of expression breadth on promoter conservation has not been addressed previously. Here we provide evidence that, in mammals, the simple expression patterns exhibited by housekeeping genes - expressed in all or nearly all tissues - are often associated with limited promoter sequence conservation, while tissue expression restrictions are associated with increasingly high promoter conservation. This defines an important new property of mammalian gene promoters.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Divergence of orthologous human and mouse promoter sequences</p>
            </st>
            <p>The promoters of different genes exhibit varying degrees of sequence divergence <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. In genes from nematodes <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and yeast <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, the level of promoter sequence divergence is positively correlated with the evolutionary rate of the encoded protein. An interesting question is whether such a correspondence also exists in mammals. We collected human and mouse orthologous promoters (6,698 pairs, 2 Kb from the transcription start site) and applied different measures of sequence divergence. We aimed to quantify promoter sequence divergence, evaluate the strength of selection and identify any significant relationship between the divergence of promoter and coding sequences.</p>
            <p>First, we calculated the fraction of the promoter sequence that failed to align between human and mouse orthologues. We used the local pairwise sequence alignment program described in Castillo-Davis <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, which provides a score, d<sub>SM </sub>(shared motif divergence), that corresponds to the fraction of non-aligned sequence. The average value was 0.701, which means that, on average, 29.9% of the 2 Kb promoter sequence was successfully aligned. From the promoter alignments, we estimated the number of nucleotide substitutions per site using PAML <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. This promoter substitution rate, which we term Kp, was, on average, 0.334 substitutions per site.</p>
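            <p>The relationship between d<sub>SM </sub>and the percentage of aligned promoter sequence can be illustrated with a minimal sketch. It assumes only that d<sub>SM </sub>is the non-aligned fraction, as stated above; the function name is hypothetical and is not part of the alignment program used in this study.</p>
            <p>
```python
def percent_conserved(d_sm):
    """Return the percentage of promoter sequence that aligned.

    d_sm is the shared motif divergence, i.e. the fraction of the
    promoter that failed to align, so conservation is its complement.
    (Hypothetical helper for illustration only.)
    """
    if d_sm > 1.0 or 0.0 > d_sm:
        raise ValueError("d_SM must lie between 0 and 1")
    return round((1.0 - d_sm) * 100.0, 1)


# Average d_SM reported for the 6,698 human-mouse promoter pairs.
print(percent_conserved(0.701))
```
            </p>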
            <p>Next we estimated the synonymous (Ks) and non-synonymous (Ka) substitution rates of the corresponding gene coding sequences using PAML. In mammals, Ks can be used to account for the background mutation level. Ka, in contrast, corresponds to changes at the amino acid level and reflects the strength of selection on the protein. In the orthologous dataset, the average Ks was 0.709 and the average Ka 0.084. The approximately two-fold difference between Kp and Ks (0.334 and 0.709, respectively) indicates stronger negative or purifying selection in the evolution of promoter sequences than at synonymous sites in coding regions.</p>
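            <p>The two-fold figure follows directly from the reported averages, as the short calculation below shows. The variable names are illustrative; the values are the dataset means given in the text.</p>
            <p>
```python
# Dataset means reported in the text (substitutions per site).
kp_mean = 0.334  # promoter substitution rate within aligned regions
ks_mean = 0.709  # synonymous substitution rate in coding sequences

# Ratio of the background (synonymous) rate to the promoter rate.
ks_to_kp = ks_mean / kp_mean
print(round(ks_to_kp, 2))
```
            </p>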
            <p>We subsequently addressed the question of whether the level of promoter sequence divergence is related to the evolutionary rate in the corresponding coding sequence in mammals. Interestingly, we found a modest although significant positive correlation between the promoter divergence (d<sub>SM</sub>) and the coding sequence substitution rate (d<sub>SM </sub>and Ka, r = 0.20, <it>p </it>&lt; 10<sup>-58</sup>; d<sub>SM </sub>and Ka/Ks, r = 0.14, <it>p </it>&lt; 10<sup>-29</sup>; d<sub>SM </sub>and Ks, r = 0.18, <it>p </it>&lt; 10<sup>-48</sup>). That is, in general, proteins that showed high divergence between human and mouse (high Ka or Ka/Ks) showed a tendency to be encoded by genes with reduced promoter sequence conservation.</p>
         </sec>
         <sec>
            <st>
               <p>Gene expression breadth</p>
            </st>
            <p>We used mouse transcriptome microarray data from Zhang <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to classify the previously defined genes into different groups according to their expression in 55 mouse organs and tissues (see Supplementary table S5 in Additional data file 1). The orthologous dataset with expression data contained 3,893 genes. The tissue distribution profile in five-tissue bins (Figure <figr fid="F1">1</figr>) showed a bimodal shape with a moderate excess of genes expressed in a few tissues and a more acute excess of genes expressed in a very large number of tissues. Genes with expression restricted to 1-10 tissues were classified as 'restricted' (986 genes), those with ubiquitous or nearly ubiquitous expression (51-55 tissues) as 'housekeeping' (HK; 1,018 genes), and the rest, expressed in 11-50 tissues, as 'intermediate' (1,889 genes).</p>
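            <p>The three-way grouping by expression breadth can be sketched as follows. This is an illustrative reconstruction of the thresholds stated above (1-10, 11-50 and 51-55 tissues), not the code used in the study; the function name and the decision to reject counts outside 1-55 are assumptions.</p>
            <p>
```python
def expression_group(n_tissues):
    """Assign a gene to an expression breadth group.

    n_tissues is the number of the 55 mouse tissues in which the gene
    is expressed. Thresholds follow the text; the helper itself is a
    hypothetical illustration.
    """
    if n_tissues not in range(1, 56):
        raise ValueError("expected a tissue count between 1 and 55")
    if n_tissues >= 51:
        return "housekeeping"
    if n_tissues >= 11:
        return "intermediate"
    return "restricted"


# Group sizes reported in the text add up to the 3,893 genes analysed.
group_sizes = {"restricted": 986, "intermediate": 1889, "housekeeping": 1018}
print(sum(group_sizes.values()))
```
            </p>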
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Mouse tissue expression distribution</p>
               </caption>
               <text>
                  <p>Mouse tissue expression distribution. We define three groups: low expression breadth (Restricted; 1-10 tissues), intermediate expression breadth (Intermediate; 11-50 tissues), high expression breadth (Housekeeping; 51-55 tissues).</p>
               </text>
               <graphic file="gb-2007-8-7-r140-1"/>
            </fig>
            <p>We compared d<sub>SM</sub>, Kp, Ka and Ks values for genes classified in the three different expression groups (Table <tblr tid="T1">1</tblr>). We observed that the average d<sub>SM </sub>score, which corresponds to the fraction of the 2 Kb promoter that cannot be aligned, consistently increased with the expression breadth. The average d<sub>SM </sub>in HK genes was 0.732 (26.8% promoter conservation), whereas in genes with 'restricted' expression it was 0.688 (31.2% promoter conservation). The d<sub>SM </sub>values were significantly different between HK genes and the other non-HK groups (Wilcoxon-Mann-Whitney and Kruskal-Wallis tests, <it>p </it>&lt; 10<sup>-5</sup>). The nucleotide substitution rate within aligned regions, Kp, was, instead, not significantly different across the different datasets. Kp also showed decreased variability with respect to Ks, with about three times lower standard deviation values (Table <tblr tid="T1">1</tblr>). In contrast to promoter divergence, both Ka and Ka/Ks in coding sequences were significantly lower in HK genes than in the other groups (Table <tblr tid="T1">1</tblr>). In fact, we observed a negative correlation between expression breadth and Ka (r = -0.31, <it>p </it>&lt; 10<sup>-87</sup>), in accordance with previous results <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Therefore, while at the promoter level the constraints appeared to be weaker in HK genes than in the rest of the genes, at the level of the protein sequence the situation was reversed.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Sequence divergence versus tissue expression breadth</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>No. of tissues</p>
                     </c>
                     <c ca="right">
                        <p>N (total = 3,893)</p>
                     </c>
                     <c ca="right">
                        <p>d<sub>SM</sub></p>
                     </c>
                     <c ca="right">
                        <p>Kp</p>
                     </c>
                     <c ca="right">
                        <p>Ka</p>
                     </c>
                     <c ca="right">
                        <p>Ks</p>
                     </c>
                     <c ca="right">
                        <p>Ka/Ks</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>01-10</p>
                     </c>
                     <c ca="right">
                        <p>986</p>
                     </c>
                     <c ca="right">
                        <p>0.688</p>
                     </c>
                     <c ca="right">
                        <p>0.337</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.107</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.733</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.150</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.735</p>
                     </c>
                     <c ca="right">
                        <p>0.328</p>
                     </c>
                     <c ca="right">
                        <p>0.084</p>
                     </c>
                     <c ca="right">
                        <p>0.673</p>
                     </c>
                     <c ca="right">
                        <p>0.119</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.221</p>
                     </c>
                     <c ca="right">
                        <p>0.110</p>
                     </c>
                     <c ca="right">
                        <p>0.093</p>
                     </c>
                     <c ca="right">
                        <p>0.299</p>
                     </c>
                     <c ca="right">
                        <p>0.122</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>11-50</p>
                     </c>
                     <c ca="right">
                        <p>1,889</p>
                     </c>
                     <c ca="right">
                        <p>0.701</p>
                     </c>
                     <c ca="right">
                        <p>0.333</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.079</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>0.708</p>
                     </c>
                     <c ca="right">
                        <p>0.116</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.752</p>
                     </c>
                     <c ca="right">
                        <p>0.328</p>
                     </c>
                     <c ca="right">
                        <p>0.058</p>
                     </c>
                     <c ca="right">
                        <p>0.633</p>
                     </c>
                     <c ca="right">
                        <p>0.089</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.216</p>
                     </c>
                     <c ca="right">
                        <p>0.093</p>
                     </c>
                     <c ca="right">
                        <p>0.073</p>
                     </c>
                     <c ca="right">
                        <p>0.307</p>
                     </c>
                     <c ca="right">
                        <p>0.103</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>51-55</p>
                     </c>
                     <c ca="right">
                        <p>1,018</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.732</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>0.328</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.050</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.639</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>0.079</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.791</p>
                     </c>
                     <c ca="right">
                        <p>0.323</p>
                     </c>
                     <c ca="right">
                        <p>0.031</p>
                     </c>
                     <c ca="right">
                        <p>0.572</p>
                     </c>
                     <c ca="right">
                        <p>0.054</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>0.208</p>
                     </c>
                     <c ca="right">
                        <p>0.079</p>
                     </c>
                     <c ca="right">
                        <p>0.057</p>
                     </c>
                     <c ca="right">
                        <p>0.305</p>
                     </c>
                     <c ca="right">
                        <p>0.085</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p><it>p </it>value (K-W test)</p>
                     </c>
                     <c ca="right">
                        <p>&lt;10<sup>-5</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.226</p>
                     </c>
                     <c ca="right">
                        <p>&lt;10<sup>-75</sup></p>
                     </c>
                     <c ca="right">
                        <p>&lt;10<sup>-18</sup></p>
                     </c>
                     <c ca="right">
                        <p>&lt;10<sup>-62</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>N, number of genes; d<sub>SM</sub>, promoter divergence (see text); Kp, promoter substitution rate; Ka, non-synonymous substitution rate; Ks, synonymous substitution rate. Mean (top), median (middle), and standard deviation (bottom) are indicated for each variable. Numbers in bold indicate significant differences at <it>p </it>&lt; 0.001 in each expression group with respect to the rest (two-sample Wilcoxon-Mann-Whitney test). The last row shows the <it>p </it>value of Kruskal-Wallis (K-W) test that evaluates differences between the three tissue expression breadth groups.</p>
               </tblfn>
            </tbl>
            <p>Additional support for the results was obtained using human gene expression data. We mapped the orthologous genes to the eVOC database (anatomical system and cell type) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, based on expressed sequence tag data, and to Gene Atlas <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The results obtained using these datasets were in strong agreement with the results presented in Table <tblr tid="T1">1</tblr> (see Supplementary tables S1, S2 and S3, respectively, in Additional data file 1). That is, the fraction of human genes with the broadest tissue expression (HK genes) always showed significantly higher promoter divergence values.</p>
            <p>The next question we addressed was whether the reduced sequence conservation observed in HK genes was uniformly distributed along the 2 Kb upstream sequence or, alternatively, it could be mapped to a particular region of the promoter. Considering the complete 2 Kb sequences, d<sub>SM </sub>differences between HK and non-HK datasets were significant at <it>p </it>&lt; 10<sup>-6 </sup>(Wilcoxon-Mann-Whitney test). Then, we calculated the average sequence conservation (1 - d<sub>SM</sub>) in 100 nucleotide overlapping sequence windows (bins) along the 2 Kb promoter sequence in HK and non-HK genes (Figure <figr fid="F2">2</figr>, top row, left). We found that the region spanning from the TSS to position -100 showed the highest level of sequence conservation (average 1 - d<sub>SM </sub>0.576, or 57.6% promoter conservation). Further upstream, the sequence conservation gradually dropped, with a stronger decay in HK than in non-HK genes (Figure <figr fid="F2">2</figr>, top row, left). If we considered only the proximal promoter region, from the TSS to position -500, we did not detect statistically significant differences (<it>p </it>= 0.0633). However, using the region from the TSS to -600, differences became significant at <it>p </it>&lt; 0.05 (<it>p </it>= 0.0195). On the other hand, when we considered the distal promoter region only, from -500 to -2,000, the gap between the two types of sequences regarding promoter divergence increased (<it>p </it>&lt; 10<sup>-8</sup>). Therefore, we concluded that the observed lower promoter sequence conservation of HK genes concentrated in regions upstream from position -500.</p>
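            <p>A per-window conservation profile of this kind can be sketched as follows. The sketch assumes a hypothetical per-nucleotide alignment mask (True where a promoter position lies inside an aligned segment), ordered from the TSS outwards, and it uses consecutive 100 nucleotide bins; the input format and function name are illustrative and are not taken from the original analysis.</p>
            <p>
```python
def window_conservation(aligned, window=100):
    """Percent of aligned positions in consecutive windows.

    aligned: sequence of booleans, index 0 being the position closest
    to the TSS. Returns one percentage per window, which corresponds
    to (1 - d_SM) x 100 computed within that window.
    (Hypothetical helper; the mask representation is an assumption.)
    """
    percents = []
    for start in range(0, len(aligned), window):
        chunk = aligned[start:start + window]
        percents.append(100.0 * sum(chunk) / len(chunk))
    return percents


# Toy mask: first bin fully aligned, second not aligned, third half aligned.
toy_mask = [True] * 100 + [False] * 100 + [True] * 50 + [False] * 50
print(window_conservation(toy_mask))
```
            </p>
            <p>Averaging such profiles separately over HK and non-HK genes would give curves comparable in form to those described above, under the stated assumptions.</p>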
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Promoter sequence conservation in HK and non-HK genes</p>
               </caption>
               <text>
                  <p>Promoter sequence conservation in HK and non-HK genes. The x-axis shows 100 nucleotide bins along 2 Kb upstream of the TSS. The y-axis shows percent conservation ((1 - d<sub>SM</sub>) &#215; 100). Genes were grouped according to the presence or absence of a CpG island and Ka/Ks values. Significant <it>p </it>values for 2 Kb promoter sequence divergence comparisons are indicated below the curves. Beneath these, the <it>p </it>values obtained for regions -2,000 to -500 (left), and -500 to the TSS (right), are given in smaller font size.</p>
               </text>
               <graphic file="gb-2007-8-7-r140-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Functions of encoded gene products</p>
            </st>
            <p>Our data show that HK genes contained poorly conserved promoters, particularly in the promoter distal part (upstream from -500). Other studies reported differences in the conservation of promoter sequences in relation to the function of the protein <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. As HK genes encode proteins with biased function composition <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, we measured the over- and under-representation of different Gene Ontology (GO) terms <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> in the group of HK genes. We also assessed whether the functional biases in HK genes could alone explain the differences observed in promoter sequence conservation.</p>
            <p>We determined which GO classes were over- or under-represented among HK genes (<it>p </it>&lt; 0.01, &#967;<sup>2 </sup>test), using the 'molecular function', 'biological process', and 'cellular component' classification systems (Supplementary table S4 in Additional data file 1). As expected, a substantial fraction of the classes statistically over-represented among HK genes showed significantly high promoter sequence divergence. For example, in genes classified as 'structural constituent of ribosome' and 'mitochondrion', the average promoter sequence conservation was only 23% (d<sub>SM </sub>= 0.77). On the other hand, many classes under-represented among HK genes showed significantly high promoter sequence conservation (low d<sub>SM</sub>). For example, genes annotated as 'transcription factor activity' or 'nervous system development' showed an average promoter conservation of 42% (d<sub>SM </sub>= 0.58), and genes annotated as 'cell differentiation' showed an average promoter conservation of 43% (d<sub>SM </sub>= 0.57).</p>
            <p>Given the promoter sequence divergence differences among gene functional classes, one possibility was that the functional class bias in HK genes could fully explain the differences found between HK and non-HK genes. For this reason we tested whether there were any d<sub>SM </sub>differences between HK and non-HK genes within the same GO class. For statistical robustness we considered only GO classes with a minimum of 150 genes (22 classes; Table <tblr tid="T2">2</tblr>). In 19 of these classes, the average d<sub>SM </sub>of HK genes was higher than that of non-HK genes. For example, transcription factors with HK expression had an average d<sub>SM </sub>of 0.673 (32.7% promoter conservation), while those without HK expression had an average d<sub>SM </sub>of 0.602 (39.8% promoter conservation). Of the 19 classes, 9 showed significant d<sub>SM </sub>differences between HK and non-HK genes (<it>p </it>&lt; 0.05). On the other hand, in the three classes with higher average d<sub>SM </sub>scores in non-HK than in HK genes, the differences were not significant (<it>p </it>&gt; 0.64). Therefore, we concluded that the promoter sequence divergence differences between HK and non-HK genes were essentially maintained within the different GO classes.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Average promoter divergence values (d<sub>SM</sub>) for HK and non-HK genes classified in different GO classes</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>All</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>CpG+</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>CpG-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO term</p>
                     </c>
                     <c ca="left">
                        <p>Description</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(HK)</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(nonHK)</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(HK)</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(nonHK)</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(HK)</p>
                     </c>
                     <c ca="center">
                        <p>d<sub>SM </sub>(nonHK)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Molecular function</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0000166</p>
                     </c>
                     <c ca="left">
                        <p>Nucleotide binding</p>
                     </c>
                     <c ca="center">
                        <p>464</p>
                     </c>
                     <c ca="center">
                        <p>0.727</p>
                     </c>
                     <c ca="center">
                        <p>0.699</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>363</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.732</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.698</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>101</p>
                     </c>
                     <c ca="center">
                        <p>0.684</p>
                     </c>
                     <c ca="center">
                        <p>0.700</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0004872</p>
                     </c>
                     <c ca="left">
                        <p>Receptor activity</p>
                     </c>
                     <c ca="center">
                        <p>259</p>
                     </c>
                     <c ca="center">
                        <p>0.734</p>
                     </c>
                     <c ca="center">
                        <p>0.675</p>
                     </c>
                     <c ca="center">
                        <p>131</p>
                     </c>
                     <c ca="center">
                        <p>0.747</p>
                     </c>
                     <c ca="center">
                        <p>0.656</p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>0.655</p>
                     </c>
                     <c ca="center">
                        <p>0.692</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0004871</p>
                     </c>
                     <c ca="left">
                        <p>Signal transducer activity</p>
                     </c>
                     <c ca="center">
                        <p>440</p>
                     </c>
                     <c ca="center">
                        <p>0.689</p>
                     </c>
                     <c ca="center">
                        <p>0.658</p>
                     </c>
                     <c ca="center">
                        <p>246</p>
                     </c>
                     <c ca="center">
                        <p>0.692</p>
                     </c>
                     <c ca="center">
                        <p>0.656</p>
                     </c>
                     <c ca="center">
                        <p>194</p>
                     </c>
                     <c ca="center">
                        <p>0.663</p>
                     </c>
                     <c ca="center">
                        <p>0.661</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0003700</p>
                     </c>
                     <c ca="left">
                        <p>Transcription factor activity</p>
                     </c>
                     <c ca="center">
                        <p>183</p>
                     </c>
                     <c ca="center">
                        <p>0.673</p>
                     </c>
                     <c ca="center">
                        <p>0.602</p>
                     </c>
                     <c ca="center">
                        <p>113</p>
                     </c>
                     <c ca="center">
                        <p>0.657</p>
                     </c>
                     <c ca="center">
                        <p>0.600</p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="center">
                        <p>0.766</p>
                     </c>
                     <c ca="center">
                        <p>0.605</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0043169</p>
                     </c>
                     <c ca="left">
                        <p>Cation binding</p>
                     </c>
                     <c ca="center">
                        <p>485</p>
                     </c>
                     <c ca="center">
                        <p>0.711</p>
                     </c>
                     <c ca="center">
                        <p>0.671</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>308</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.732</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.670</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>0.582</p>
                     </c>
                     <c ca="center">
                        <p>0.671</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Biological process</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0044249</p>
                     </c>
                     <c ca="left">
                        <p>Cellular biosynthesis</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>256</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.765</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.735</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>183</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.781</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.729</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>0.629</p>
                     </c>
                     <c ca="center">
                        <p>0.741</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0045184</p>
                     </c>
                     <c ca="left">
                        <p>Establishment of protein transport</p>
                     </c>
                     <c ca="center">
                        <p>162</p>
                     </c>
                     <c ca="center">
                        <p>0.720</p>
                     </c>
                     <c ca="center">
                        <p>0.737</p>
                     </c>
                     <c ca="center">
                        <p>138</p>
                     </c>
                     <c ca="center">
                        <p>0.723</p>
                     </c>
                     <c ca="center">
                        <p>0.731</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>0.677</p>
                     </c>
                     <c ca="center">
                        <p>0.760</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0007049</p>
                     </c>
                     <c ca="left">
                        <p>Cell cycle</p>
                     </c>
                     <c ca="center">
                        <p>188</p>
                     </c>
                     <c ca="center">
                        <p>0.697</p>
                     </c>
                     <c ca="center">
                        <p>0.706</p>
                     </c>
                     <c ca="center">
                        <p>152</p>
                     </c>
                     <c ca="center">
                        <p>0.703</p>
                     </c>
                     <c ca="center">
                        <p>0.724</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>0.656</p>
                     </c>
                     <c ca="center">
                        <p>0.646</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0019538</p>
                     </c>
                     <c ca="left">
                        <p>Protein metabolism</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>700</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.748</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.703</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>523</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.755</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.698</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>0.682</p>
                     </c>
                     <c ca="center">
                        <p>0.713</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0044260</p>
                     </c>
                     <c ca="left">
                        <p>Cellular macromolecule metabolism</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>761</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.748</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.705</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>560</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.755</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.700</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>201</p>
                     </c>
                     <c ca="center">
                        <p>0.686</p>
                     </c>
                     <c ca="center">
                        <p>0.713</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0050874</p>
                     </c>
                     <c ca="left">
                        <p>Organismal physiological process</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>292</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.795</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.681</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>109</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.813</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.675</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>183</p>
                     </c>
                     <c ca="center">
                        <p>0.756</p>
                     </c>
                     <c ca="center">
                        <p>0.685</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0009605</p>
                     </c>
                     <c ca="left">
                        <p>Response to external stimulus</p>
                     </c>
                     <c ca="center">
                        <p>209</p>
                     </c>
                     <c ca="center">
                        <p>0.676</p>
                     </c>
                     <c ca="center">
                        <p>0.711</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>0.758</p>
                     </c>
                     <c ca="center">
                        <p>0.699</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>124</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.538</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.718</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0007166</p>
                     </c>
                     <c ca="left">
                        <p>Cell surface receptor linked signal transduction</p>
                     </c>
                     <c ca="center">
                        <p>221</p>
                     </c>
                     <c ca="center">
                        <p>0.683</p>
                     </c>
                     <c ca="center">
                        <p>0.626</p>
                     </c>
                     <c ca="center">
                        <p>113</p>
                     </c>
                     <c ca="center">
                        <p>0.659</p>
                     </c>
                     <c ca="center">
                        <p>0.645</p>
                     </c>
                     <c ca="center">
                        <p>108</p>
                     </c>
                     <c ca="center">
                        <p>0.762</p>
                     </c>
                     <c ca="center">
                        <p>0.609</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0048513</p>
                     </c>
                     <c ca="left">
                        <p>Organ development</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>214</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.677</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.566</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>103</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.699</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.528</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>111</p>
                     </c>
                     <c ca="center">
                        <p>0.633</p>
                     </c>
                     <c ca="center">
                        <p>0.598</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0009653</p>
                     </c>
                     <c ca="left">
                        <p>Morphogenesis</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>262</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.679</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.584</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>132</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.685</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.549</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>130</p>
                     </c>
                     <c ca="center">
                        <p>0.664</p>
                     </c>
                     <c ca="center">
                        <p>0.615</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0009607</p>
                     </c>
                     <c ca="left">
                        <p>Response to biotic stimulus</p>
                     </c>
                     <c ca="center">
                        <p>166</p>
                     </c>
                     <c ca="center">
                        <p>0.761</p>
                     </c>
                     <c ca="center">
                        <p>0.723</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>74</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.783</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.686</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>0.680</p>
                     </c>
                     <c ca="center">
                        <p>0.745</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0007165</p>
                     </c>
                     <c ca="left">
                        <p>Signal transduction</p>
                     </c>
                     <c ca="center">
                        <p>563</p>
                     </c>
                     <c ca="center">
                        <p>0.684</p>
                     </c>
                     <c ca="center">
                        <p>0.656</p>
                     </c>
                     <c ca="center">
                        <p>342</p>
                     </c>
                     <c ca="center">
                        <p>0.687</p>
                     </c>
                     <c ca="center">
                        <p>0.668</p>
                     </c>
                     <c ca="center">
                        <p>221</p>
                     </c>
                     <c ca="center">
                        <p>0.666</p>
                     </c>
                     <c ca="center">
                        <p>0.643</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cellular component</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0005739</p>
                     </c>
                     <c ca="left">
                        <p>Mitochondrion</p>
                     </c>
                     <c ca="center">
                        <p>171</p>
                     </c>
                     <c ca="center">
                        <p>0.785</p>
                     </c>
                     <c ca="center">
                        <p>0.756</p>
                     </c>
                     <c ca="center">
                        <p>148</p>
                     </c>
                     <c ca="center">
                        <p>0.780</p>
                     </c>
                     <c ca="center">
                        <p>0.770</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>0.869</p>
                     </c>
                     <c ca="center">
                        <p>0.707</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0005737</p>
                     </c>
                     <c ca="left">
                        <p>Cytoplasm</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>773</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.756</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.719</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>579</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.759</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.727</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>194</p>
                     </c>
                     <c ca="center">
                        <p>0.728</p>
                     </c>
                     <c ca="center">
                        <p>0.707</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0005783</p>
                     </c>
                     <c ca="left">
                        <p>Endoplasmic reticulum</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>153</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.791</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.713</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>112</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.776</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.712</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>41</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.881</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.713</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0005576</p>
                     </c>
                     <c ca="left">
                        <p>Extracellular region</p>
                     </c>
                     <c ca="center">
                        <p>219</p>
                     </c>
                     <c ca="center">
                        <p>0.653</p>
                     </c>
                     <c ca="center">
                        <p>0.621</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>0.718</p>
                     </c>
                     <c ca="center">
                        <p>0.591</p>
                     </c>
                     <c ca="center">
                        <p>142</p>
                     </c>
                     <c ca="center">
                        <p>0.523</p>
                     </c>
                     <c ca="center">
                        <p>0.635</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>GO:0005886</p>
                     </c>
                     <c ca="left">
                        <p>Plasma membrane</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>373</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.720</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.661</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>189</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.735</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.656</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>184</p>
                     </c>
                     <c ca="center">
                        <p>0.663</p>
                     </c>
                     <c ca="center">
                        <p>0.666</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Entries in bold are those that have a significantly different d<sub>SM </sub>distribution (<it>p </it>&lt; 0.05). The number of genes (N) is indicated for each GO class. Results for CpG+ and CpG- genes are shown.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>CpG island content and coding sequence evolutionary rate</p>
            </st>
            <p>The promoters of HK genes are rich in CpG islands <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. This could potentially influence the level of conservation of promoter sequences. Therefore, we divided the gene dataset into genes containing CpG islands (CpG+) and genes not containing CpG islands (CpG-), according to the presence or absence of a CpG island in the region -100 to +100 (see Materials and methods), and analyzed the two groups separately. Of the mouse genes, 65% were classified as CpG+ (91% of the human orthologs of these were also CpG+). Among the genes classified as HK, this number went up to 88%. The length of CpG islands was not significantly different in HK and non-HK genes.</p>
            <p>Within CpG+ genes, we observed the previously described positive relationship between promoter sequence divergence (d<sub>SM</sub>) and expression breadth. HK genes (expressed in 51-55 tissues) had an average d<sub>SM </sub>of 0.739, whereas genes expressed in an intermediate number of tissues (11-50) and those with restricted expression (1-10 tissues) had average d<sub>SM </sub>scores of 0.708 and 0.679, respectively. These scores are comparable to those obtained previously (Table <tblr tid="T1">1</tblr>) and the differences between HK and non-HK genes were highly significant (<it>p </it>&lt; 10<sup>-4</sup>; Figure <figr fid="F2">2</figr>, top row, middle). Similar results were obtained with other gene expression datasets (Figures S1, S2 and S3 in Additional data file 2).</p>
            <p>In contrast, in CpG- genes the differences between HK and non-HK genes were smaller, and did not reach statistical significance in the mouse gene dataset (Figure <figr fid="F2">2</figr>, top row, right). Indeed, HK genes that did not contain CpG islands (12% of the HK genes) showed average promoter sequence divergence similar to that of non-HK genes (around 0.69). Thus, this minority of HK genes with no CpG islands appeared to have increased sequence evolutionary constraints in relation to the rest of the HK genes.</p>
            <p>We also assessed whether the presence or absence of CpG islands influenced d<sub>SM </sub>differences between HK and non-HK genes within the same GO class. In CpG+ genes the differences between HK and non-HK genes were even more marked than in the complete dataset, and three additional GO functions showed statistically significant differences (Table <tblr tid="T2">2</tblr>). In CpG- genes, by contrast, the differences between HK and non-HK genes per GO class were, in almost all cases, not significant.</p>
            <p>We had previously described a positive correlation between the non-synonymous substitution rate, Ka (or Ka/Ks), and promoter sequence divergence (d<sub>SM</sub>). That is, many rapidly evolving coding sequences were associated with poorly conserved promoters. This seemed at first to contradict the finding that HK genes, with typically low Ka values, tended to have highly divergent promoters. To disentangle the effects of coding sequence evolutionary rate and expression breadth on promoter sequence evolution, we divided the gene dataset into two groups: genes with Ka/Ks &lt; 0.06, a fraction representing about one-third of the genes and highly enriched in HK genes, and the rest of the genes, with Ka/Ks &#8805; 0.06.</p>
            <p>The first observation was that, in agreement with the general correlation, genes with more slowly evolving coding sequences (Ka/Ks &lt; 0.06) showed higher promoter conservation than those with Ka/Ks &#8805; 0.06 (average d<sub>SM </sub>of 0.663 and 0.722, respectively). However, this was mostly due to genes that were not HK genes (Figure <figr fid="F2">2</figr>, middle row, left), which explained the apparent contradiction mentioned above. Among genes with Ka/Ks &lt; 0.06, the average d<sub>SM </sub>was 0.72 for HK genes, but 0.65 for non-HK genes. Not surprisingly, we found that the previously observed correlation between d<sub>SM </sub>and Ka/Ks was stronger in non-HK genes (r = 0.17, <it>p </it>&lt; 10<sup>-19</sup>) than in HK genes (r = 0.10, <it>p </it>&lt; 0.002).</p>
         </sec>
         <sec>
            <st>
               <p><it>Cis</it>-regulatory motif content in housekeeping gene promoters</p>
            </st>
            <p>The differences in promoter sequence divergence associated with expression tissue distribution are likely to reflect the presence of different functional regulatory motifs in genes with diverse expression patterns. Among the expression groups previously defined (restricted, intermediate and HK) only the HK gene group probably represents a rather homogeneous class from a gene expression regulatory perspective. Other groups include genes that are active in diverse tissues and that are likely to be regulated by very different factors. We thus investigated whether the promoters of HK genes were enriched in specific transcription factor binding motifs.</p>
            <p>First, we mapped all experimentally verified transcription factor binding sites (TFBSs) from TRANSFAC <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> onto the human and mouse promoter sequences. We observed that approximately 75% of mapped TFBSs fell into conserved regions, which occupy only approximately 30% of the sequence analyzed. However, as fewer than 2% of the genes in the dataset contained known TFBSs, we could not infer any statistically significant biases from these data. For this reason, we decided to use motifs predicted by weight matrices representing known TFBSs. We performed separate analyses with the vertebrate TFBS weight matrix collections available from TRANSFAC and PROMO <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. We identified nine motifs that were consistently over-represented in the aligned parts of HK gene promoters using the two weight matrix datasets (&#967;<sup>2 </sup>test, <it>p </it>&lt; 10<sup>-5</sup>; Table <tblr tid="T3">3</tblr>). The motifs were recognized by particular transcription factors or families of transcription factors, according to data in TRANSFAC and PROMO. Among them were commonly found regulators such as Sp1, or members of the ATF (activating transcription factor) family. We also analyzed HK motif over-representation separately in aligned regions located either downstream or upstream of position -500. Whereas in the region from the TSS to -500 the nine distinct motifs became even more strongly over-represented than in the 2 Kb promoter, in the more distal promoter region, upstream of -500, four of the motifs - ATF, CREB, NRF1/2 and USF - were no longer significant. We next determined the expression class of the transcription factors that could bind to the nine motif types, using the previously defined three expression groups. Importantly, all transcription factors showed HK or intermediate expression patterns (Table <tblr tid="T3">3</tblr>), and none showed tissue-restricted expression, which is consistent with a putative role in the regulation of HK genes. Therefore, we could define a group of factors that, mainly through interactions with HK proximal promoter regions, are likely to play important roles in the maintenance of adequate levels of expression of this type of gene.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Transcription factors with predicted binding motifs over-represented in HK gene promoters</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>Transcription factor</p>
                     </c>
                     <c ca="left">
                        <p>Description</p>
                     </c>
                     <c ca="left">
                        <p>Expression breadth</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AHR and ARNT</p>
                     </c>
                     <c ca="left">
                        <p>Aryl hydrocarbon receptor; it can interact with ARNT (AHR:ARNT heterodimer)</p>
                     </c>
                     <c ca="left">
                        <p>INT</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ATF family</p>
                     </c>
                     <c ca="left">
                        <p>Activating transcription factor</p>
                     </c>
                     <c ca="left">
                        <p>HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CREB family</p>
                     </c>
                     <c ca="left">
                        <p>cAMP responsive element binding protein</p>
                     </c>
                     <c ca="left">
                        <p>INT</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>E2F family</p>
                     </c>
                     <c ca="left">
                        <p>E2F transcription factor</p>
                     </c>
                     <c ca="left">
                        <p>INT and HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HIF1A</p>
                     </c>
                     <c ca="left">
                        <p>Hypoxia inducible factor 1, alpha subunit; as AHR, it can interact with ARNT</p>
                     </c>
                     <c ca="left">
                        <p>HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MYC and MAX</p>
                     </c>
                     <c ca="left">
                        <p>Proto-oncogene protein c-myc and MYC associated factor X; they can form MYC:MAX heterodimers</p>
                     </c>
                     <c ca="left">
                        <p>INT and HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NRF1 and NRF2</p>
                     </c>
                     <c ca="left">
                        <p>Nuclear respiratory factor 1 and 2</p>
                     </c>
                     <c ca="left">
                        <p>INT and HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP1</p>
                     </c>
                     <c ca="left">
                        <p>SP1 transcription factor</p>
                     </c>
                     <c ca="left">
                        <p>HK</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>USF</p>
                     </c>
                     <c ca="left">
                        <p>Upstream transcription factor (USF1 and USF2)</p>
                     </c>
                     <c ca="left">
                        <p>INT</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>HK, housekeeping; INT, intermediate.</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>To our knowledge, this work presents the first evidence of a relationship between promoter sequence divergence and gene expression breadth. We have observed that the promoters of HK genes tend to be less conserved than those of non-HK genes, especially in the distal promoter region, upstream of position -500. Given the strong conservation of HK gene expression patterns across organisms <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, high promoter sequence divergence is likely to reflect weak functional constraints rather than sequence diversification driven by the acquisition of new functionalities. These observations raise the interesting possibility that HK genes have shorter functional promoters. Interestingly, other features of HK genes also tend toward shortness; in particular, they have been described to have shorter coding, intronic, and intergenic sequences <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. As a consequence, and with the exception of plants <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, transcripts of HK genes tend to be short. One hypothesis put forward to explain this observation is selection for economy in transcription and translation <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. An alternative hypothesis, called 'genome design', is that tissue-specific genes require a greater amount of non-coding DNA due to their more complex regulation <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Our results show that HK genes contain more divergent distal promoter sequences than non-HK genes. In line with the 'genome design' hypothesis, this may be due to their relatively simple expression patterns, which require fewer regulatory sequences.</p>
         <p>In mammals, conservation of a gene's upstream sequence is related to the function of the encoded protein <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Iwama and Gojobori <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> found that genes encoding transcription factors and developmental proteins showed high gene upstream sequence conservation. Similarly, Lee <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> showed that genes involved in complex and adaptive processes, such as development, cell communication, neural function, and signaling, were associated with higher promoter sequence conservation despite their relatively recent emergence during evolution. In contrast, genes involved in basic processes, such as metabolism and ribosomal function, contained poorly conserved promoters. Our study is consistent with these findings, as the former genes are under-represented in HK genes, while the latter are over-represented. However, by directly relating promoter conservation to mode of expression, we are able to propose a more direct explanation for the differences in promoter sequence conservation between genes that perform basic housekeeping functions, and which are simply regulated, and genes that are important for tissue- or organ-specific processes, which may require more complex regulation. In addition, function alone cannot explain the differences across genes, as the reduced promoter sequence conservation in HK genes with respect to non-HK genes is essentially maintained within different functional (GO) classes.</p>
         <p>The existence of a positive correlation between the speed of evolution of regulatory sequences and that of coding sequences in orthologous genes is suggestive of a link between rapid diversification of a protein and its expression pattern. We have found that in mammals there is a weak but significant correlation between these two factors, in accordance with previous observations in nematodes <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and yeast <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Interestingly, we have observed that this relationship is especially relevant for non-HK genes, while in HK genes the effect is practically negligible.</p>
         <p>The CpG island gene classification and association with expression breadth observed here is consistent with other reports <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B24">24</abbr></abbrgrp>. The majority of mammalian promoters contain CpG islands and HK genes are particularly rich in this type of sequence. Our study shows that promoters that do not contain CpG islands are more strongly conserved than those that do, and even more so if the genes encode slowly evolving proteins. Promoters with no CpG islands correspond to classical TATA-containing promoters and it has been recently shown in a large-scale analysis that they are particularly well-conserved across different mammalian species <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         <p>We identify nine different motifs, corresponding to known transcription factor binding sites, that are significantly over-represented in HK genes. Most of the transcription factors that bind to these sites are themselves encoded by HK genes and the rest are encoded by genes classified as of intermediate expression breadth. Five of the motifs (binding Sp1, USF, NRF1, CREB, or ATF) show high frequency peaks in the vicinity of the TSS (-200 to -1) in a large collection of human promoters, and the combination of two of them (binding Sp1 and NRF1) is over-represented in HK gene promoters <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Some of the motifs identified are bound by known regulators of HK genes; examples are Sp1 and USF for the APEX nuclease gene <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> or Sp1 and HIF-1 for the endoglin gene <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
         <p>Of note, besides HK genes, we also find differences between the groups of genes with restricted expression (1-10 tissues) and intermediate expression (11-50 tissues). 'Restricted' genes tend to show higher promoter conservation than 'intermediate' genes (Table <tblr tid="T1">1</tblr>; Supplementary Tables S1, S2, and S3 in Additional data file 1). These results may seem counter-intuitive, as one could argue that genes expressed in only a few tissues should have simpler regulation than genes expressed in an intermediate number of tissues. However, one possibility is that 'restricted' genes contain a larger number of negative regulatory elements. Interestingly, gene reporter assays of promoter activity in ENCODE regions (approximately 1% of the genome) have shown that negative elements appear to be present from 1,000 to 500 nucleotides upstream of the TSS in 55% of genes <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. This indicates that motifs for inhibitory transcription factors may be present in a substantial fraction of genes. One expects that such regions will be more common in tissue-specific 'restricted' genes, which would be consistent with the observed stronger distal promoter sequence conservation.</p>
         <p>It has been observed that metazoan-specific proteins tend to be more tissue-specific than universal eukaryotic proteins <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In other words, HK genes are enriched for proteins of ancient origin. Old eukaryotic proteins typically evolve more slowly and are longer than proteins of a more recent origin, probably due to increased functional constraints <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. However, at the level of gene expression regulatory regions they may be simpler and less constrained than genes that represent innovations in multi-cellular organisms. Cross-species comparisons will be used in future studies to gain further insight into these questions.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We show that genes with housekeeping expression contain more divergent promoters than genes with a more restricted tissue expression. Importantly, this property cannot be fully explained by the functional class of the encoded gene products, or by a higher prevalence of CpG islands in HK gene promoters. In addition, we have identified a number of transcription factors that are likely to play a predominant role in the control of HK gene expression. We argue that the lower promoter conservation observed in HK genes could be due to simpler regulation of gene transcription.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Sequence retrieval and alignment</p>
            </st>
            <p>We identified human and mouse orthologous genes using the Ensembl database (release 34) <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. We considered only orthology relationships of type UBRH (unique best reciprocal hit): 17,620 records of human genes with orthologous mouse genes (human-mouse dataset) and 12,868 of mouse genes with orthologous human genes (mouse-human dataset). We extracted the promoter sequences from these genes, comprising 2 Kb upstream of the TSS, from the UCSC database (hg17 and mm6 releases) <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, excluding genes with multiple TSSs, discarding duplicates, and considering only gene pairs with human-mouse and mouse-human orthology data that were both available and congruent. The resulting dataset contained 8,972 orthologous promoter sequence pairs. We discarded repeats from alignments using RepeatMasker (release 1.1.65) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. We aligned the sequences with the local pairwise sequence alignment program described in Castillo-Davis <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, using a minimum alignment length of 16 nucleotides. For each orthologous pair we obtained the promoter sequence divergence score (d<sub>SM</sub>; shared motif divergence), which is the fraction of the sequence that does not align, taking the average between the human and mouse promoter sequences. The fraction of sequence aligned was then 1 - d<sub>SM</sub>. We calculated the average 1 - d<sub>SM </sub>in 100 nucleotide sequence windows overlapping by 20 nucleotides. Failure to align portions of the promoter may be due to very high divergence or the occurrence of insertions/deletions. To obtain an estimate of the d<sub>SM </sub>random expectation we aligned, with the same program, 1,000,000 pairs of 2 Kb random sequences and calculated their d<sub>SM </sub>scores. 
We discarded orthologous pairs with an overall average d<sub>SM </sub>> 0.97 (random expectation &#8805;0.01), obtaining 7,330 orthologous promoter sequence pairs. Coding sequences were extracted from the Ensembl database (release 34) and aligned with ClustalW <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
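            <p>As an illustration of the d<sub>SM</sub> definition given above, the following minimal Python sketch computes the score and a windowed profile of the aligned fraction from lists of aligned blocks. It is not the alignment program of Castillo-Davis <it>et al</it>.; the function names, the half-open block coordinates, the toy blocks, and the reading of "overlapping by 20 nucleotides" as a step of 80 nucleotides are assumptions made for the example.</p>
```python
def coverage_mask(blocks, length):
    """Mark each position covered by at least one aligned block."""
    mask = [False] * length
    for start, end in blocks:
        for pos in range(max(start, 0), min(end, length)):
            mask[pos] = True
    return mask


def d_sm(mask_a, mask_b):
    """Unaligned fraction of each promoter, averaged over the two species."""
    unaligned_a = 1.0 - sum(mask_a) / len(mask_a)
    unaligned_b = 1.0 - sum(mask_b) / len(mask_b)
    return (unaligned_a + unaligned_b) / 2.0


def window_profile(mask, size=100, step=80):
    """Aligned fraction (1 - d_SM) per window; step 80 gives a 20-nt overlap."""
    starts = range(0, len(mask) - size + 1, step)
    return [sum(mask[s:s + size]) / size for s in starts]


# Toy example: hypothetical 2,000-nt promoters and aligned blocks.
human_mask = coverage_mask([(0, 300), (1000, 1200)], 2000)   # 500 nt aligned
mouse_mask = coverage_mask([(0, 300), (900, 1200)], 2000)    # 600 nt aligned

score = d_sm(human_mask, mouse_mask)
profile = window_profile(human_mask)
print(round(score, 3), len(profile), profile[0])
```
            <p>In this toy case the human promoter is 75% unaligned and the mouse promoter 70% unaligned, so the pair would be scored as the mean of the two fractions.</p>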
         </sec>
         <sec>
            <st>
               <p>Substitution rate estimation</p>
            </st>
            <p>Synonymous (Ks) and non-synonymous (Ka) substitution rates were estimated with the codeml program in PAML <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. We also estimated, for each gene, the number of nucleotide substitutions per site in the concatenated promoter sequence alignment, using the baseml program in PAML with the Hasegawa, Kishino and Yano (1985) model <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. This substitution rate was termed Kp. From the 7,330 orthologous pairs, 6,698 remained after discarding those with Ka &#8805; 0.5, Ks &#8805; 2.0, or Kp &#8805; 2.0 (saturated pairs).</p>
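            <p>The saturation filter, together with the Ka/Ks split at 0.06 used in the Results, can be sketched as follows. The thresholds are those stated in the text; the record layout, the function names and the four example genes are hypothetical and serve only to show the logic.</p>
```python
# Thresholds taken from the text; the records below are made-up examples.
KA_MAX, KS_MAX, KP_MAX = 0.5, 2.0, 2.0
KAKS_SPLIT = 0.06


def is_saturated(pair):
    """True if any substitution rate reaches its saturation threshold."""
    return pair["ka"] >= KA_MAX or pair["ks"] >= KS_MAX or pair["kp"] >= KP_MAX


def kaks_group(pair):
    """Label a pair as slowly evolving (Ka/Ks below 0.06) or not."""
    ratio = pair["ka"] / pair["ks"]
    return "low" if KAKS_SPLIT > ratio else "high"


pairs = [
    {"gene": "gene_a", "ka": 0.02, "ks": 0.60, "kp": 0.45},
    {"gene": "gene_b", "ka": 0.12, "ks": 0.55, "kp": 0.50},
    {"gene": "gene_c", "ka": 0.55, "ks": 0.70, "kp": 0.40},  # Ka saturated
    {"gene": "gene_d", "ka": 0.10, "ks": 2.30, "kp": 0.60},  # Ks saturated
]

kept = [p for p in pairs if not is_saturated(p)]
groups = {p["gene"]: kaks_group(p) for p in kept}
print(groups)
```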
         </sec>
         <sec>
            <st>
               <p>Gene expression datasets</p>
            </st>
            <p>We used mouse transcriptome microarray data from Zhang <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to classify the previously defined genes into different groups according to their expression in 55 different mouse organs and tissues (see Supplementary table S5 in Additional data file 1). Zhang <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> considered genes to be expressed only if their intensity exceeded the 99th percentile of intensities from the negative controls.</p>
            <p>In addition, we used human gene expression data from Gene Atlas (GNF1H), based on transcriptome microarray data <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and human gene expression data from the eVOC database (anatomical system and cell type ontologies, release 2.7), based on expressed sequence tag data <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. We considered genes to be expressed in a tissue according to Gene Atlas data only if the expression level was &#8805;200. Gene Atlas covers 79 human organs and tissues (see Supplementary table S5 in Additional data file 1). For eVOC anatomical systems and cell types we discarded classes with a very small number of genes (&lt;1,000) and large classes with high redundancy (>90% of genes shared with other classes). This resulted in 57 anatomical systems and 10 cell types (see Supplementary table S5 in Additional data file 1). HK, intermediate and restricted expression groups were defined following criteria similar to those used for the mouse transcriptome data.</p>
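            <p>For the 55-tissue mouse dataset, the expression groups used in the Results (restricted, 1-10 tissues; intermediate, 11-50 tissues; HK, 51-55 tissues) can be assigned as in the short sketch below. The function names and the boolean expression profiles are hypothetical; the equivalent cut-offs for the human datasets are not restated in this section and are therefore not encoded here.</p>
```python
def count_tissues(profile):
    """Number of tissues in which a gene is called expressed."""
    return sum(1 for expressed in profile if expressed)


def expression_group(n_tissues):
    """Expression breadth class for the 55-tissue mouse dataset."""
    if n_tissues >= 51:
        return "HK"
    if n_tissues >= 11:
        return "intermediate"
    if n_tissues >= 1:
        return "restricted"
    return "not expressed"


# Hypothetical presence/absence calls across 55 tissues.
profiles = {
    "gene_a": [True] * 53 + [False] * 2,
    "gene_b": [True] * 20 + [False] * 35,
    "gene_c": [True] * 3 + [False] * 52,
}

labels = {gene: expression_group(count_tissues(p)) for gene, p in profiles.items()}
print(labels)
```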
            <p>Complete sequence divergence data for the different expression groups are available in Additional data file 3.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical tests and correlations</p>
            </st>
            <p>Correlations were calculated with the Spearman rank correlation method. The two-sample Wilcoxon-Mann-Whitney test was used to assess differences between groups unless otherwise stated. All analyses were performed with the R statistical package <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Gene Ontology functions</p>
            </st>
            <p>GO annotations were extracted from Ensembl (release 34) <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. We used the GO term definitions of 30 March 2005 <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Over-representation and under-representation of HK genes in different GO classes were assessed with the chi-square test (<it>p </it>&lt; 0.01), using expected values calculated from the percentage of HK genes in the root GO term of each ontology (GO:0003674, molecular function; GO:0008150, biological process; GO:0005575, cellular component). Only GO terms containing between 50 and 1,000 genes, inclusive, were considered. Some GO terms were discarded to reduce redundancy.</p>
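            <p>The over-representation test can be sketched as a goodness-of-fit chi-square test with one degree of freedom, in which the expected number of HK genes in a GO term is derived from the fraction of HK genes in the root term. The exact form of the test used in the study is not specified here, so this formulation, the function names and the counts in the demonstration are assumptions.</p>

```python
import math


def term_size_ok(n_genes):
    """GO terms are considered only if they hold 50 to 1,000 genes, inclusive."""
    return n_genes in range(50, 1001)


def hk_enrichment(hk_in_term, total_in_term, hk_in_root, total_in_root, alpha=0.01):
    """Chi-square goodness-of-fit test (1 degree of freedom) for one GO term.

    Expected counts come from the fraction of HK genes in the root GO term.
    Returns (direction, chi2, p_value); direction is 'over', 'under' or 'ns'.
    """
    root_fraction = hk_in_root / total_in_root
    expected_hk = total_in_term * root_fraction
    expected_other = total_in_term - expected_hk
    observed_other = total_in_term - hk_in_term
    chi2 = ((hk_in_term - expected_hk) ** 2 / expected_hk
            + (observed_other - expected_other) ** 2 / expected_other)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    if p_value >= alpha:
        return "ns", chi2, p_value
    direction = "over" if hk_in_term > expected_hk else "under"
    return direction, chi2, p_value


if __name__ == "__main__":
    # Made-up counts: 40 HK genes among 100 in the term, 2,000 among 10,000 in the root.
    print(hk_enrichment(40, 100, 2000, 10000))
```

            <p>The closed-form tail probability holds only for one degree of freedom; a test over more categories would need a general chi-square distribution function.</p>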
         </sec>
         <sec>
            <st>
               <p>Transcription factor binding site predictions</p>
            </st>
            <p>We used weight matrices from PROMO (release 3) <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B45">45</abbr></abbrgrp> and TRANSFAC (release 7.0) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> to predict transcription factor binding sites. Motif searches were carried out with a similarity cut-off of 0.85. We selected motifs consistently predicted by both matrix collections that were over-represented in HK genes versus all the genes taken together using the chi-square test.</p>
         </sec>
         <sec>
            <st>
               <p>CpG islands</p>
            </st>
            <p>We extracted sequences from -100 to +100 with respect to the TSS. We classified genes as CpG+ (CpG island-positive near the TSS) when the G+C content exceeded 0.55 and the CpG score (observed CpG/expected CpG) exceeded 0.65 in the -100 to +100 region, and as CpG- (CpG island-negative near the TSS) otherwise. This classification is similar to that used by Yamashita <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, but with more stringent values for CpG+ determination, in line with the CpG island definition proposed by Takai and Jones <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. To study differences in CpG island sequence conservation between HK and non-HK genes, we extended the CpG islands upstream for as long as the G+C content exceeded 0.55 and the CpG score exceeded 0.65, and took the resulting position as the 5' end point of each CpG island.</p>
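            <p>The CpG+/CpG- classification can be sketched as follows. The text gives the CpG score only as observed CpG/expected CpG; the sketch assumes the widely used definition in which the expected number of CpG dinucleotides is (number of C x number of G) / sequence length. The function names are illustrative, and the upstream extension of the islands is not covered.</p>

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_score(seq):
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G)).

    This expected-count formula is an assumption; the text does not spell it out.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def classify_cpg(window, min_gc=0.55, min_score=0.65):
    """Label a TSS-centred window (-100 to +100 in the text) as 'CpG+' or 'CpG-'."""
    if gc_content(window) > min_gc and cpg_score(window) > min_score:
        return "CpG+"
    return "CpG-"


if __name__ == "__main__":
    # Artificial sequences, for demonstration only.
    print(classify_cpg("CG" * 100))
    print(classify_cpg("AT" * 100))
```

            <p>Both conditions must hold for a CpG+ call, so a G+C-rich window that lacks CpG dinucleotides is still classified as CpG-.</p>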
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this manuscript. Additional data file <supplr sid="S1">1</supplr> contains Supplementary tables S1-S5: Table S1 lists human gene sequence divergence values in expression groups according to Gene Atlas (GNF1H); Table S2 lists human gene sequence divergence values in expression groups according to the eVOC anatomical system classification; Table S3 lists human gene sequence divergence values in expression groups according to the eVOC cell type classification; Table S4 lists GO terms over-represented and under-represented in HK genes with their average d<sub>SM </sub>values; and Table S5 lists the organs, tissues, and cell types considered in each expression dataset. Additional data file <supplr sid="S2">2</supplr> contains figures plotting promoter sequence conservation along 2 Kb upstream of the TSS in HK and non-HK genes considering expression groups according to Gene Atlas GNF1H (Figure S1), the eVOC anatomical system classification (Figure S2), and the eVOC cell type classification (Figure S3). Additional data file <supplr sid="S3">3</supplr> contains the complete sequence divergence and expression group data used in this manuscript. Additional data file <supplr sid="S4">4</supplr> contains human 2 Kb upstream sequences (human promoters), in fasta format. Additional data file <supplr sid="S5">5</supplr> contains mouse 2 Kb upstream sequences (mouse promoters), in fasta format.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Supplementary tables S1-S5</p>
            </caption>
            <text>
               <p>Table S1: human gene sequence divergence values in expression groups according to Gene Atlas (GNF1H). Table S2: human gene sequence divergence values in expression groups according to the eVOC anatomical system classification. Table S3: human gene sequence divergence values in expression groups according to the eVOC cell type classification. Table S4: GO terms over-represented and under-represented in HK genes with their average d<sub>SM </sub>values. Table S5: the organs, tissues, and cell types considered in each expression dataset.</p>
            </text>
            <file name="gb-2007-8-7-r140-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Plots of promoter sequence conservation along 2 Kb upstream of the TSS in HK and non-HK genes</p>
            </caption>
            <text>
               <p>Expression groups are according to Gene Atlas GNF1H (Figure S1), the eVOC anatomical system classification (Figure S2), and the eVOC cell type classification (Figure S3).</p>
            </text>
            <file name="gb-2007-8-7-r140-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Complete sequence divergence and expression group data used in this manuscript</p>
            </caption>
            <text>
               <p>Complete sequence divergence and expression group data used in this manuscript.</p>
            </text>
            <file name="gb-2007-8-7-r140-S3.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Human 2 Kb upstream sequences (human promoters), in fasta format</p>
            </caption>
            <text>
               <p>Human 2 Kb upstream sequences (human promoters), in fasta format.</p>
            </text>
            <file name="gb-2007-8-7-r140-S4.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Mouse 2 Kb upstream sequences (mouse promoters), in fasta format</p>
            </caption>
            <text>
               <p>Mouse 2 Kb upstream sequences (mouse promoters), in fasta format.</p>
            </text>
            <file name="gb-2007-8-7-r140-S5.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Miriam Subirats and Neus Xivill&#233;, and members of the Computational Genomics group at GRIB/CRG, for helpful comments. We acknowledge support from Instituto Nacional de Bioinform&#225;tica, Fundaci&#243;n Banco Bilbao Vizcaya Argentaria, Plan Nacional de I+D Ministerio de Educaci&#243;n y Ciencia (BIO2006-07120/BMC), European Commission Infobiomed NoE, and Fundaci&#243; ICREA.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The evolution of transcriptional regulation in eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Wray</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Hahn</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Abouheif</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Balhoff</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Pizer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rockman</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Romano</snm>
                  <fnm>LA</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>1377</fpage>
            <lpage>1419</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg140</pubid>
                  <pubid idtype="pmpid" link="fulltext">12777501</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Embryonic epsilon and gamma globin genes of a prosimian primate (<it>Galago crassicaudatus</it>). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints.</p>
            </title>
            <aug>
               <au>
                  <snm>Tagle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Koop</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Goodman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Slightom</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>203</volume>
            <fpage>439</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90011-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">3199442</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Identification of conserved regulatory elements by comparative genome analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mendoza</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jareborg</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>WW</fnm>
               </au>
            </aug>
            <source>J Biol</source>
            <pubdate>2003</pubdate>
            <volume>2</volume>
            <fpage>13</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193685</pubid>
                  <pubid idtype="pmpid" link="fulltext">12760745</pubid>
                  <pubid idtype="doi">10.1186/1475-4924-2-13</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover.</p>
            </title>
            <aug>
               <au>
                  <snm>Dermitzakis</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>1114</fpage>
            <lpage>1121</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12082130</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Evidence for widespread degradation of gene control regions in hominid genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Keightley</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e42</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">544929</pubid>
                  <pubid idtype="pmpid" link="fulltext">15678168</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Waterston</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Agarwala</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ainscough</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Alexandersson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>An</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Identification and functional analysis of human transcriptional promoters.</p>
            </title>
            <aug>
               <au>
                  <snm>Trinklein</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Aldred</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Saldanha</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>308</fpage>
            <lpage>312</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420378</pubid>
                  <pubid idtype="pmpid" link="fulltext">12566409</pubid>
                  <pubid idtype="doi">10.1101/gr.794803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kohane</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>168</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1310621</pubid>
                  <pubid idtype="pmpid" link="fulltext">16309559</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-6-168</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Highly conserved upstream sequences for transcription factor genes and implications for the regulatory network.</p>
            </title>
            <aug>
               <au>
                  <snm>Iwama</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>17156</fpage>
            <lpage>17161</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534610</pubid>
                  <pubid idtype="pmpid" link="fulltext">15572454</pubid>
                  <pubid idtype="doi">10.1073/pnas.0407670101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions.</p>
            </title>
            <aug>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shirota</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakakibara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mizushima-Sugano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1711</fpage>
            <lpage>1718</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">515316</pubid>
                  <pubid idtype="pmpid" link="fulltext">15342556</pubid>
                  <pubid idtype="doi">10.1101/gr.2435604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p><it>cis</it>-Regulatory and protein evolution in orthologous and duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Castillo-Davis</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Achaz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1530</fpage>
            <lpage>1536</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">509261</pubid>
                  <pubid idtype="pmpid" link="fulltext">15256508</pubid>
                  <pubid idtype="doi">10.1101/gr.2662504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Genome-wide regulatory complexity in yeast promoters: separation of functionally conserved and neutral sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Chin</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Chuang</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>205</fpage>
            <lpage>213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">546519</pubid>
                  <pubid idtype="pmpid" link="fulltext">15653830</pubid>
                  <pubid idtype="doi">10.1101/gr.3243305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The functional landscape of mouse gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>QD</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shai</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bakowski</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mitsakakis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mohammad</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Zirngibl</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Biol</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <fpage>21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549719</pubid>
                  <pubid idtype="pmpid" link="fulltext">15588312</pubid>
                  <pubid idtype="doi">10.1186/jbiol16</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mammalian housekeeping genes evolve more slowly than tissue-specific genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>236</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh010</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate.</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>68</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666707</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>eVOC: a controlled vocabulary for unifying gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kelso</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Visagie</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Theiler</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bardien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smedley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Otgaar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Greyling</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>MI</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1222</fpage>
            <lpage>1230</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403650</pubid>
                  <pubid idtype="pmpid" link="fulltext">12799354</pubid>
                  <pubid idtype="doi">10.1101/gr.985203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>A gene atlas of the mouse and human protein-encoding transcriptomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Wiltshire</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soden</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hayakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kreiman</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>6062</fpage>
            <lpage>6067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395923</pubid>
                  <pubid idtype="pmpid" link="fulltext">15075390</pubid>
                  <pubid idtype="doi">10.1073/pnas.0400782101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Protein domains enriched in mammalian tissue-specific or widely expressed genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>468</fpage>
            <lpage>472</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.08.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15363898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Relationship between the tissue-specificity of mouse gene expression and the evolutionary origin and function of the proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Freilich</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Massingham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bhattacharyya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ponsting</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lyons</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R56</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1175987</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998445</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-7-r56</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium.</p>
            </title>
            <