<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2001-2-9-research0035</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A functional update of the <it>Escherichia coli</it> K-12 genome</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Serres</snm>
               <mi>H</mi>
               <fnm>Margrethe</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Gopal</snm>
               <fnm>Shuba</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A3">
               <snm>Nahum</snm>
               <mi>A</mi>
               <fnm>Laila</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A4">
               <snm>Liang</snm>
               <fnm>Ping</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A5">
               <snm>Gaasterland</snm>
               <fnm>Terry</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A6" ca="yes">
               <snm>Riley</snm>
               <fnm>Monica</fnm>
               <insr iid="I1"/>
               <email>mriley@mbl.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA</p>
            </ins>
            <ins id="I2">
               <p>Laboratory of Computational Genomics, Rockefeller University, New York, NY 10021, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2001</pubdate>
         <volume>2</volume>
         <issue>9</issue>
         <fpage>research0035.1</fpage>
         <lpage>research0035.7</lpage>
         <url>http://genomebiology.com/2001/2/9/research/0035</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2001-2-9-research0035</pubid>
               <pubid idtype="pmpid">11574054</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>2</day>
               <month>4</month>
               <year>2001</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>8</day>
               <month>6</month>
               <year>2001</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>10</day>
               <month>7</month>
               <year>2001</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>20</day>
               <month>8</month>
               <year>2001</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2001</year>
         <collab>Serres et al., licensee BioMed Central Ltd</collab>
      </cpyrt>
      <shortabs>
         <p>The genome of <it>Escherichia coli</it> K-12 has been reannotated with the aid of new biological characterization and newly available functions of sequence-similar proteins. The coding sequences are represented by modules (protein elements of at least 100 amino acids). Of these, 48.9% have been characterized, 29.5% have an imputed function, 2.1% have a phenotype and 19.5% have no function assignment. Only 7% of the modules appear unique to <it>E. coli</it>.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Since the genome of <it>Escherichia coli</it> K-12 was initially annotated in 1997, additional functional information based on biological characterization and functions of sequence-similar proteins has become available. On the basis of this new information, an updated version of the annotated chromosome has been generated.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The <it>E. coli</it> K-12 chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,285 proteins. The gene boundaries given in GenBank accession U00096 were used. Some protein-coding sequences are compound and encode multimodular proteins. The coding sequences (CDSs) are represented by modules (protein elements of at least 100 amino acids with biological activity and independent evolutionary history). There are 4,616 identified modules in the 4,285 proteins. Of these, 48.9% have been characterized, 29.5% have an imputed function, 2.1% have a phenotype and 19.5% have no function assignment. Only 7% of the modules appear unique to <it>E. coli</it>, and this number is expected to fall as more genome data become available. The imputed functions were assigned on the basis of manual evaluation of functions predicted by BLAST and DARWIN analyses and by the MAGPIE genome annotation system.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Much knowledge has been gained about functions encoded by the <it>E. coli</it> K-12 genome since the 1997 annotation was published. The data presented here should be useful for analysis of <it>E. coli</it> gene products as well as gene products encoded by other genomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The field of genomics has been expanding at a rapid pace since the annotated <it>Escherichia coli</it> K-12 genome was published in 1997 [<abbr bid="B1">1</abbr>], with the current number of published genomes exceeding 66 and with another 364 on their way according to the Genomes OnLine Database (GOLD) [<abbr bid="B2">2</abbr>]. Deciphering the functions encoded by all gene products of the genomes is the next big challenge in the field. Function attributions through experimental, biochemical and genetic analyses and through bioinformatic studies are continuing, and microarray technology is shedding additional light on the functions associated with the gene products of the organism in question. The wealth of biological information on <it>E. coli</it> is still increasing [<abbr bid="B3">3</abbr>] and is contributing to a better understanding of this organism as well as of functions encoded in other organisms. It is therefore important that the most up-to-date information on <it>E. coli</it> gene products is available and used by researchers.</p>
         <p>Several databases have been assembled for various areas of knowledge about the <it>E. coli</it> genome [<abbr bid="B4">4</abbr>,<abbr bid="B5">5</abbr>,<abbr bid="B6">6</abbr>,<abbr bid="B7">7</abbr>,<abbr bid="B8">8</abbr>,<abbr bid="B9">9</abbr>]. Each compilation has a different emphasis and collects different sets of information related to the function of the gene products. In the GenProtEC database, we have been curating information on physiological function and modular construction of gene products. Other databases most closely related to ours include EcoCyc, with emphasis on metabolic pathways [<abbr bid="B6">6</abbr>], the CGSC database, with information on the genotypes and phenotypes of mutant strains [<abbr bid="B8">8</abbr>], and EcoGene, which includes information on gene reconstructions, alternative gene boundaries and verified amino-terminal amino-acid sequences of the mature proteins [<abbr bid="B5">5</abbr>]. The <it>E. coli</it> genome project at the University of Wisconsin-Madison presents genome data on <it>E. coli</it> K-12 and pathogenic enterobacteria [<abbr bid="B9">9</abbr>].</p>
         <p>We present a functional update for <it>E. coli</it> K-12 gene products that incorporates information from the literature and referenced databases obtained since the 1997 GenBank deposit. Our focus has been the biological function of the gene products. Coding sequences (CDSs) encoding proteins whose function was previously imputed or unknown were re-evaluated, and putative functions were assigned by manually evaluating the results from BLAST and DARWIN (data analysis and retrieval with indexed nucleotide/peptide sequences) analyses. The MAGPIE (multipurpose automated genome project investigation environment) genome annotation system [<abbr bid="B10">10</abbr>] was also applied. MAGPIE detected alternative boundaries for some of the open reading frames (ORFs).</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Number of genes in the <it>E. coli</it> K-12 genome</p>
            </st>
            <p>For the initial annotation of the <it>E. coli</it> K-12 genome [<abbr bid="B1">1</abbr>], 4,404 genes were identified with Blattner numbers (Bnums). Among the genes, 4,288 were believed to encode proteins and 116 to encode RNAs. Since then, six Bnums have been retired: b0322, b0395, b0663, b0667, b0669 and b0671 (G. Plunkett, personal communication). In addition, three new genes have been identified and assigned Bnums. These include the protein-coding b4406 (<it>yaeP</it>, SWISS-PROT P52099) and b4407 (<it>thiS</it>, SWISS-PROT O32583) and the RNA-encoding b4408. The current number of <it>E. coli</it> genes is 4,401, with 4,285 encoding proteins and 116 encoding RNAs.</p>
            <p>MAGPIE identified 5,527 candidate CDSs, each of which was assigned a MAGPIE identifier (Magnum) (see MAGPIE [<abbr bid="B11">11</abbr>] for details). The 4,285 CDSs identified by Bnums were also assigned Magnums. For 1,077 of these CDSs, the start or stop position differed, resulting in encoded proteins that differ in length by 1 to 147 amino acids, the latter in PtsA (Bnum b3947, Magnum ec_6103). The other Magnum-identified candidate CDSs include retired Bnums (six Magnums), CDSs located between the boundaries of Bnums (506 Magnums), and CDSs overlapping existing Bnums (730 Magnums). Among the Magnums located between the boundaries of Bnums are 21 CDSs that encode proteins of 80 or more amino acids. One CDS located between Bnum boundaries (Magnum ec_2510) lies between b1624 and b1625 and encodes a protein of 66 amino acids. The carboxy-terminal 41 amino acids of this CDS are identical to the amino-acid sequence of the recently characterized beta-lactam resistance protein Blr (SWISS-PROT P56976) located at the same position [<abbr bid="B12">12</abbr>]. Other Magnums located between Bnum boundaries may correspond to short <it>E. coli</it> proteins.</p>
         </sec>
         <sec>
            <st>
               <p>Functional annotation of <it>E. coli</it> K-12 gene products</p>
            </st>
            <p>The functional assignments of the <it>E. coli</it> gene products in the November 1997 GenBank U00096 deposit represented an accumulation of information retrieved from the literature (collected in the GenProtEC and EcoCyc databases) as well as imputed functions based on similarity of a known protein to the translated sequences [<abbr bid="B1">1</abbr>]. Since the deposit to GenBank was made, our database GenProtEC has been continually updated with knowledge on <it>E. coli</it> gene products appearing in the literature [<abbr bid="B3">3</abbr>,<abbr bid="B13">13</abbr>]. Information on transcriptional regulators has been incorporated from the work of J. Collado-Vides [<abbr bid="B14">14</abbr>,<abbr bid="B15">15</abbr>], and transport protein information has been adapted from the work of M.H. Saier and I.T. Paulsen [<abbr bid="B16">16</abbr>,<abbr bid="B17">17</abbr>]. GenProtEC also contains imputed function assignments based on sequence similarity to orthologous or paralogous proteins, on gene (operon) location and on phenotypes of mutants [<abbr bid="B18">18</abbr>].</p>
            <p>Gene products whose functions were known were not considered further for the functional update. The remaining 2,294 CDSs whose gene products had a putative or unknown function assignment were analyzed using BLAST and DARWIN. BLAST analyses were carried out for both the Bnum- and the Magnum-derived protein sequences. The results for the Bnum-derived protein sequences and the automatic functions predicted by MAGPIE or HERON (human-emulated reasoning for objective notations) were manually evaluated and imputed functions were assigned. Although the manual annotation step could not compete with the speed of the automatic annotation process of HERON, it provided us with more useful function descriptions. A comparison of the manually assigned putative functions with the HERON predicted functions showed that when leaving aside issues of specificity, a nearly equivalent function was predicted in 46% of the cases, whereas in 52% of the cases less information was obtained with HERON.</p>
            <p>After the function update of the 2,294 CDSs, 1,306 gene products were assigned a putative function and 126 gene products were described by a phenotype. The remaining gene products were given one of the following three assignments: 'conserved protein', where sequence-similar matches were found but the function could not be determined because the functions reported for the matching sequences were not consistent; 'conserved hypothetical protein', where sequence-similar matches existed but these had no associated function; and 'unknown CDS', where the translated sequence had no known sequence match outside <it>E. coli</it>. The current function description includes 256 conserved proteins, 282 conserved hypothetical proteins and 324 unknown CDSs. The 862 gene products with no function assignment represent 19.6% of the <it>E. coli</it> chromosomal genes, and the unknown CDSs at this time represent 7.4% of <it>E. coli</it> genes.</p>
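            <p>The category counts in the preceding paragraph can be checked with simple arithmetic. The Python sketch below is illustrative only and is not part of the original analysis: the variable and function names are hypothetical, and every number is copied from the text.</p>
            <p>
```python
# Counts reported in the text for the re-evaluated CDSs (April 2001).
# All names here are hypothetical; only the numbers come from the article.
TOTAL_GENES = 4401   # current number of E. coli K-12 genes
REEVALUATED = 2294   # CDSs with a putative or unknown function before the update

counts = {
    "putative function": 1306,
    "phenotype": 126,
    "conserved protein": 256,
    "conserved hypothetical protein": 282,
    "unknown CDS": 324,
}

# Gene products with no function assignment: the three 'o' categories.
no_function = (counts["conserved protein"]
               + counts["conserved hypothetical protein"]
               + counts["unknown CDS"])


def percent_of_genes(n):
    """Share of all chromosomal genes, rounded to one decimal place."""
    return round(100 * n / TOTAL_GENES, 1)


# The five categories should account for every re-evaluated CDS.
print(sum(counts.values()) == REEVALUATED)
print(no_function, percent_of_genes(no_function))
print(percent_of_genes(counts["unknown CDS"]))
```
            </p>
            <p>Under these assumptions, the five categories sum to the 2,294 re-evaluated CDSs, and the percentages are computed against all 4,401 genes rather than against the protein-coding genes alone.</p>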
            <p>A sample of the annotated <it>E. coli</it> K-12 genes is shown in Table <tblr tid="T1">1</tblr>. Each gene is identified by a Bnum, Magnum, gene product type and gene product (Function April 2001). A complete table of the current 4,401 Bnums is available as an additional data file online and at MAGPIE [<abbr bid="B11">11</abbr>]. In this table the genes are identified by their Bnum, Bnum_module, Bnum start and stop position, Magnum, Magnum start and stop position, and gene product type. The functions of the gene products are described by the currently annotated function (Function April 2001) and by the function in the GenBank deposit (Function November 1997). A continually updated table that contains the functions of <it>E. coli</it> gene products is available through GenProtEC [<abbr bid="B4">4</abbr>].</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>A sample of annotated <it>E. coli</it> K-12 genes</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Bnum</p>
                     </c>
                     <c ca="center">
                        <p>Magnum</p>
                     </c>
                     <c ca="center">
                        <p>Gene</p>
                     </c>
                     <c ca="center">
                        <p>Gene product type<sup>*</sup></p>
                     </c>
                     <c ca="left">
                        <p>Gene product<sup>&#8224;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0038</p>
                     </c>
                     <c ca="center">
                        <p>ec_0059</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>caiB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>L-carnitine dehydratase, NAD(P)-binding</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0039</p>
                     </c>
                     <c ca="center">
                        <p>ec_0061</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>caiA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pe</p>
                     </c>
                     <c ca="left">
                        <p>Putative acyl-CoA dehydrogenase</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0019</p>
                     </c>
                     <c ca="center">
                        <p>ec_0026</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>nhaA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>Na<sup>+</sup>/H<sup>+</sup> antiporter, NhaA family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0040</p>
                     </c>
                     <c ca="center">
                        <p>ec_0062</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>caiT</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pt</p>
                     </c>
                     <c ca="left">
                        <p>Putative betaine/carnitine/choline transport protein, BCCT family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0064</p>
                     </c>
                     <c ca="center">
                        <p>ec_0098</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>araC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>r</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator of arabinose catabolism, AraC/XylS family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0076</p>
                     </c>
                     <c ca="center">
                        <p>ec_0116</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>leuO</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pr</p>
                     </c>
                     <c ca="left">
                        <p>Putative transcriptional regulator of leucine biosynthesis, LysR family</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0814</p>
                     </c>
                     <c ca="center">
                        <p>ec_1234</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ompX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>m</p>
                     </c>
                     <c ca="left">
                        <p>Outer membrane protease, receptor for phage OX2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0117</p>
                     </c>
                     <c ca="center">
                        <p>ec_0171</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>yacH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pm</p>
                     </c>
                     <c ca="left">
                        <p>Putative membrane protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0170</p>
                     </c>
                     <c ca="center">
                        <p>ec_0246</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>tsf</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>f</p>
                     </c>
                     <c ca="left">
                        <p>Protein chain elongation factor EF-Ts</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0236</p>
                     </c>
                     <c ca="center">
                        <p>ec_0334</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>prfH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pf</p>
                     </c>
                     <c ca="left">
                        <p>Putative peptide chain release factor</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0023</p>
                     </c>
                     <c ca="center">
                        <p>ec_0031</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rpsT</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>s</p>
                     </c>
                     <c ca="left">
                        <p>30S ribosomal subunit protein S20</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0138</p>
                     </c>
                     <c ca="center">
                        <p>ec_0200</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>yadM</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>ps</p>
                     </c>
                     <c ca="left">
                        <p>Putative fimbrial-like protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0684</p>
                     </c>
                     <c ca="center">
                        <p>ec_1032</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>fldA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>c</p>
                     </c>
                     <c ca="left">
                        <p>Flavodoxin 1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1697</p>
                     </c>
                     <c ca="center">
                        <p>ec_2618</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ydiQ</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pc</p>
                     </c>
                     <c ca="left">
                        <p>Putative electron transfer flavoprotein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0251</p>
                     </c>
                     <c ca="center">
                        <p>ec_0359</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>yafY</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>h</p>
                     </c>
                     <c ca="left">
                        <p>CP4-6 prophage</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0054</p>
                     </c>
                     <c ca="center">
                        <p>ec_0083</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>imp</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>ph</p>
                     </c>
                     <c ca="left">
                        <p>Organic solvent tolerance</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0201</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rrsH</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>n</p>
                     </c>
                     <c ca="left">
                        <p>16S rRNA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0001</p>
                     </c>
                     <c ca="center">
                        <p>ec_G0001</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>thrL</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>l</p>
                     </c>
                     <c ca="left">
                        <p><it>thr</it> operon leader peptide</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0050</p>
                     </c>
                     <c ca="center">
                        <p>ec_0078</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>apaG</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>o</p>
                     </c>
                     <c ca="left">
                        <p>Conserved protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0081</p>
                     </c>
                     <c ca="center">
                        <p>ec_0123</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>mraZ</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>o</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0005</p>
                     </c>
                     <c ca="center">
                        <p>ec_G0005</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>yaaX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>o</p>
                     </c>
                     <c ca="left">
                        <p>Unknown CDS</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>*</sup>Gene product type: c, carrier; e, enzyme; f, factor; h, extrachromosomal origin; l, leader peptide; m, membrane component; n, RNA; o, ORF of unknown function; pc, putative carrier; pe, putative enzyme; pf, putative factor; ph, phenotype; pm, putative membrane component; pr, putative regulator; ps, putative structure; pt, putative transporter; r, regulator; s, structure; t, transporter. <sup>&#8224;</sup>Gene products consisting of one identified module.</p>
               </tblfn>
            </tbl>
            <p>Many changes are evident when comparing the updated annotation to that of 1997. The number of CDSs without function assignment has been reduced from 1,354 to 862. This reduction is due to functions being experimentally determined (77 CDSs), assignment of putative functions (367 CDSs), phenotype-associated functions (14 CDSs), and genes identified as belonging to phages (138 CDSs). In addition, inferred function assignments were withdrawn for 104 CDS-coded proteins whose functions remain unknown.</p>
            <p>The number of gene products with putative function assignments has changed from 1,120 to 1,306. New functions were inferred for 473 CDSs. Putative function assignments were also removed as a result of new experimental data (175 CDSs), assignment of phenotype (8 CDSs) or reassessment of putative function assignments (104 CDSs).</p>
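            <p>The changes listed in the two preceding paragraphs can be reconciled with the 1997 and 2001 totals. The following Python sketch is a bookkeeping check under the assumption that each listed change applies exactly once to the category concerned; the variable names are hypothetical and the numbers are taken from the text.</p>
            <p>
```python
# Reconcile the November 1997 and April 2001 totals reported in the text.
# Variable names are hypothetical; each term is one change listed in the article.
no_function_1997 = 1354
no_function_2001 = (no_function_1997
                    - 77     # functions experimentally determined
                    - 367    # putative functions assigned
                    - 14     # phenotype-associated functions
                    - 138    # genes identified as belonging to phages
                    + 104)   # inferred function assignments withdrawn

putative_1997 = 1120
putative_2001 = (putative_1997
                 + 473    # new functions inferred
                 - 175    # replaced as a result of new experimental data
                 - 8      # replaced by a phenotype
                 - 104)   # withdrawn on reassessment

print(no_function_2001, putative_2001)
```
            </p>
            <p>Both running totals match the figures given in the text (862 CDSs without a function assignment and 1,306 with a putative function).</p>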
         </sec>
         <sec>
            <st>
               <p>Proteins as modular entities</p>
            </st>
            <p>Some of the proteins encoded in the <it>E. coli</it> genome have arisen through fusion of two or more genes. Examples of such gene fusions are the multifunctional enzymes Aas (2-acylglycerophospho-ethanolamine acyl transferase and acyl-acyl carrier protein synthetase) and GlmU (<it>N</it>-acetyl glucosamine-1-phosphate uridyltransferase and glucosamine-1-phosphate acetyl transferase) [<abbr bid="B19">19</abbr>,<abbr bid="B20">20</abbr>]. We have chosen to deal with proteins as modular entities where a module is defined as a protein element that has at least 100 amino-acid residues, carries a biological function and is presumed to have an independent evolutionary history [<abbr bid="B21">21</abbr>]. Most modules in <it>E. coli</it> are individual proteins. They can, however, also be part of a protein where multiple modules have been joined by gene fusion, as is the case for Aas and GlmU. Other protein types in <it>E. coli</it> such as transporters and regulators also involve gene fusion events. The current modular assignments are based on analysis of protein sequences within <it>E. coli</it> K-12 (P. Liang and M. Riley, unpublished data).</p>
            <p>There are at present 287 compound genes identified in the <it>E. coli</it> genome, each containing two to four modules. Table <tblr tid="T2">2</tblr> contains a list of multimodular proteins where each module encodes a distinct function. Enzymes, transporters and regulators are all present in the list. The majority of modular proteins, 217, contain modules belonging to different paralogous groups (data not shown). Other multimodular proteins appear to be a result of internal duplication (56 genes) or a combination of gene fusion and duplication (14 genes). The <it>E. coli</it> chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,616 protein modules. Additional modules are expected to be identified upon analysis of protein sequences from other genomes (P. Liang and M. Riley, unpublished data). Examples are the bifunctional proteins ThrA (aspartokinase I and homoserine dehydrogenase I) and MetL (aspartokinase II and homoserine dehydrogenase II) where only the amino-terminal modules representing the kinase activities have been identified on the basis of their sequence similarity to the <it>E. coli</it> unimodular aspartokinase III (LysC). Both the amino-terminal aspartokinase and the carboxy-terminal homoserine dehydrogenase activities of ThrA and MetL have been verified with biochemical and genetic tools [<abbr bid="B22">22</abbr>,<abbr bid="B23">23</abbr>]. The module representing the dehydrogenase activity has not been identified by matching internal paralogs as <it>E. coli</it> itself does not contain a unimodular sequence-similar dehydrogenase. <it>Saccharomyces cerevisiae</it>, however, does contain a unimodular sequence-similar homoserine dehydrogenase (DhoM, SWISS-PROT P31116), which can be used in identifying the carboxy-terminal module. Thus, by detecting orthologous matches to parts of genes we will be able to identify additional multimodular proteins.</p>
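            <p>The counts quoted above can be cross-checked with a minimal sketch. It assumes that the 4,401 genes include the 116 RNA genes and that every gene other than the 287 compound genes contributes exactly one module; both are our reading of the text, and the names used are illustrative.</p>

```python
# Compound (multimodular) genes by presumed origin, as quoted in the text.
compound_genes = {
    "different paralogous groups": 217,
    "internal duplication": 56,
    "fusion plus duplication": 14,
}
total_compound = sum(compound_genes.values())

genes = 4401            # assumed to include the RNA genes
rna_genes = 116
protein_modules = 4616
total_gene_products = rna_genes + protein_modules

# Modules in excess of one per gene must come from compound genes.
extra_modules = total_gene_products - genes

# Each compound gene holds two to four modules, i.e. one to three extra.
min_extra = total_compound * 1
max_extra = total_compound * 3
plausible = extra_modules in range(min_extra, max_extra + 1)

print(total_compound, total_gene_products, extra_modules, plausible)
```

            <p>On these assumptions the three categories sum to the 287 compound genes, and the surplus of modules over genes falls within the range permitted by two to four modules per compound gene.</p>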
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>A sample of multimodular gene products of <it>E. coli</it> K-12</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Module</p>
                     </c>
                     <c ca="center">
                        <p>Magnum</p>
                     </c>
                     <c ca="center">
                        <p>Gene</p>
                     </c>
                     <c ca="center">
                        <p>Gene product type<sup>*</sup></p>
                     </c>
                     <c ca="left">
                        <p>Gene product</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0149_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_0214</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>mrcB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Glycosyl transferase of penicillin-binding protein 1b (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0149_3</p>
                     </c>
                     <c ca="center">
                        <p>ec_0214</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>mrcB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Transpeptidase of penicillin-binding protein 1b (3rd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0679_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_1018</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>nagE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>PTS family enzyme IIC, <it>N</it>-acetylglucosamine-specific (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0679_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_1018</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>nagE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>PTS family enzyme IIB, <it>N</it>-acetylglucosamine-specific (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0679_3</p>
                     </c>
                     <c ca="center">
                        <p>ec_1018</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>nagE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>PTS family enzyme IIA, <it>N</it>-acetylglucosamine-specific (3rd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0886_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_1338</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>cydC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>ABC superfamily (membrane) cytochrome-related transporter (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b0886_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_1338</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>cydC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>ABC superfamily (atp_bind) cytochrome-related transporter (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1241_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_1883</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>adhE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Acetaldehyde-CoA dehydrogenase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1241_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_1883</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>adhE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Iron-dependent alcohol dehydrogenase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1439_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_2214</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ydcR</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pr</p>
                     </c>
                     <c ca="left">
                        <p>Putative transcriptional regulator, GntR family (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1439_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_2214</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ydcR</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pt</p>
                     </c>
                     <c ca="left">
                        <p>Putative ATP-binding component of a transport system (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1621_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_2503</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>malX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>PTS family enzyme IIC, maltose and glucose-specific (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b1621_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_2503</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>malX</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>PTS family enzyme IIB, maltose and glucose-specific (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2463_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_3778</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>maeB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pe</p>
                     </c>
                     <c ca="left">
                        <p>Putative malic oxidoreductase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2463_3</p>
                     </c>
                     <c ca="center">
                        <p>ec_3778</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>maeB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pe</p>
                     </c>
                     <c ca="left">
                        <p>Putative phosphate acetyl transferase (3rd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2537_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_3905</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>hcaR</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>r</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional activator of hca cluster, LysR family (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2537_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_3905</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>hcaR</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>pe</p>
                     </c>
                     <c ca="left">
                        <p>Putative oxidoreductase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2836_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_4348</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>aas</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>2-acylglycerophospho-ethanolamine acyl transferase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b2836_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_4348</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>aas</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Acyl-acyl carrier protein synthetase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3464_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_5327</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ftsY</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>m</p>
                     </c>
                     <c ca="left">
                        <p>Membrane-binding component of cell division protein (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3464_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_5327</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ftsY</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>GTPase component of cell division membrane protein (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3692_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_5701</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>dgoA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>2-dehydro-3-deoxygalactonate 6-phosphate aldolase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3692_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_5701</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>dgoA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Galactonate dehydratase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3730_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_5762</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>glmU</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p><it>N</it>-acetyl glucosamine-1-phosphate uridyltransferase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3730_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_5762</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>glmU</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>Glucosamine-1-phosphate acetyl transferase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3846_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_5942</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>fadB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>3-hydroxybutyryl-CoA epimerase; delta(3)-cis-delta(2)-trans-enoyl-CoA isomerase; enoyl-CoA hydratase (1st module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b3846_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_5942</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>fadB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>e</p>
                     </c>
                     <c ca="left">
                        <p>3-hydroxyacyl-CoA dehydrogenase (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b4035_2</p>
                     </c>
                     <c ca="center">
                        <p>ec_6229</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>malK</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>r</p>
                     </c>
                     <c ca="left">
                        <p>Phenotypic repressor of mal operon (2nd module)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>b4035_1</p>
                     </c>
                     <c ca="center">
                        <p>ec_6229</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>malK</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>t</p>
                     </c>
                     <c ca="left">
                        <p>ABC superfamily (atp_bind) maltose transport protein (1st module)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>*</sup>Gene product type: e, enzyme; m, membrane; pe, putative enzyme; pr, putative regulator; pt, putative transporter; r, regulator; t, transporter.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Current status</p>
            </st>
            <p>Table <tblr tid="T3">3</tblr> presents a summary of the gene products encoded in the <it>E. coli</it> K-12 genome represented as modular entities. Half of the modules have been experimentally characterized. Enzymes are the largest gene product type, representing 43.9% of the characterized gene products and 34.2% of the total gene products. Other major gene product types are transporters and regulators. Among the remaining modules, almost 60% have function predictions. The gene products without a function assignment still constitute a significant portion of the <it>E. coli</it> genome (19% of modules). A summary of the development of information on <it>E. coli</it> gene products over the past eight years is shown in Table <tblr tid="T4">4</tblr>. It is evident that much knowledge has been gained since these analyses began in 1993 [<abbr bid="B3">3</abbr>,<abbr bid="B24">24</abbr>].</p>
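            <p>The proportions cited in this paragraph can be recomputed from the totals in Table <tblr tid="T3">3</tblr>. The sketch below is illustrative: it assumes that the "remaining modules" are all modules other than the characterized ones, and the helper name is ours.</p>

```python
# Column totals and selected cells copied from Table 3.
total_modules = 4732
characterized = 2375
putative = 1360
enzymes_characterized = 1042
enzymes_total = 1620
unassigned_orfs = 899


def pct(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "characterized, of all modules": pct(characterized, total_modules),
    "enzymes, of characterized": pct(enzymes_characterized, characterized),
    "enzymes, of all modules": pct(enzymes_total, total_modules),
    # Assumption: "remaining" means every module that is not characterized.
    "putative, of remaining": pct(putative, total_modules - characterized),
    "unassigned ORFs, of all modules": pct(unassigned_orfs, total_modules),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

            <p>Under this reading, the characterized share is 50.2%, the enzyme shares are 43.9% and 34.2%, the unassigned share is 19.0%, and the fraction of remaining modules with function predictions comes to 57.7%, slightly below the roughly 60% given in the text.</p>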
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Gene products encoded by the <it>E. coli</it> K-12 chromosome</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Gene product type</p>
                     </c>
                     <c ca="center">
                        <p>Characterized</p>
                     </c>
                     <c ca="center">
                        <p>Putative assignment</p>
                     </c>
                     <c ca="center">
                        <p>Total (%)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Enzyme</p>
                     </c>
                     <c ca="center">
                        <p>1,042</p>
                     </c>
                     <c ca="center">
                        <p>578</p>
                     </c>
                     <c ca="center">
                        <p>1,620 (34.2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transporter</p>
                     </c>
                     <c ca="center">
                        <p>382</p>
                     </c>
                     <c ca="center">
                        <p>364</p>
                     </c>
                     <c ca="center">
                        <p>746 (15.8)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Regulator</p>
                     </c>
                     <c ca="center">
                        <p>238</p>
                     </c>
                     <c ca="center">
                        <p>167</p>
                     </c>
                     <c ca="center">
                        <p>405 (8.5)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Membrane</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                     <c ca="center">
                        <p>158</p>
                     </c>
                     <c ca="center">
                        <p>211 (4.4)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Factor</p>
                     </c>
                     <c ca="center">
                        <p>117</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>150 (3.2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Structure</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>127 (2.7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Carrier</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>60 (1.3)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Extrachromosomal<sup>*</sup></p>
                     </c>
                     <c ca="center">
                        <p>288</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>288 (6.1)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phenotype</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>98 (2.1)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RNA</p>
                     </c>
                     <c ca="center">
                        <p>116</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>116 (2.4)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Leader</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>12 (0.3)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ORF</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>899 (19.0)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>2,375</p>
                     </c>
                     <c ca="center">
                        <p>1,360</p>
                     </c>
                     <c ca="center">
                        <p>4,732 (100.0)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Proteins are represented as modules. <sup>*</sup>Extrachromosomal origin.</p>
               </tblfn>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>History of distribution of gene product types for <it>E. coli</it> K-12</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Gene product type</p>
                     </c>
                     <c ca="center">
                        <p>1993<sup>*</sup></p>
                     </c>
                     <c ca="center">
                        <p>1998<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>2001</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Enzyme</p>
                     </c>
                     <c ca="center">
                        <p>748<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>906</p>
                     </c>
                     <c ca="center">
                        <p>990</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative enzyme</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>452</p>
                     </c>
                     <c ca="center">
                        <p>550</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transporter</p>
                     </c>
                     <c ca="center">
                        <p>221</p>
                     </c>
                     <c ca="center">
                        <p>257</p>
                     </c>
                     <c ca="center">
                        <p>310</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative transporter</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>281</p>
                     </c>
                     <c ca="center">
                        <p>298</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Regulator</p>
                     </c>
                     <c ca="center">
                        <p>164</p>
                     </c>
                     <c ca="center">
                        <p>204</p>
                     </c>
                     <c ca="center">
                        <p>213</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative regulator</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>168</p>
                     </c>
                     <c ca="center">
                        <p>151</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Membrane</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative membrane</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>132</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Factor</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>109</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative factor</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Structure</p>
                     </c>
                     <c ca="center">
                        <p>113<sup>&#167;</sup></p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative structure</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Carrier</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative carrier</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Extrachromosomal origin</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="center">
                        <p>282</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative extrachromosomal</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phenotype</p>
                     </c>
                     <c ca="center">
                        <p>314</p>
                     </c>
                     <c ca="center">
                        <p>148</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RNA</p>
                     </c>
                     <c ca="center">
                        <p>104</p>
                     </c>
                     <c ca="center">
                        <p>112</p>
                     </c>
                     <c ca="center">
                        <p>116</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Putative RNA</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Leader</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ORF</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1,413</p>
                     </c>
                     <c ca="center">
                        <p>886</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>1,700</p>
                     </c>
                     <c ca="center">
                        <p>4,404</p>
                     </c>
                     <c ca="center">
                        <p>4,412<sup>&#182;</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>*</sup>Adapted from Riley [24]. <sup>&#8224;</sup>Data from July 1998 record of GenProtEC. <sup>&#8225;</sup>Includes enzymes, leader peptides and enzyme activity. <sup>&#167;</sup>Includes membrane components. <sup>&#182;</sup>This number includes overlap situations where modules of a gene belong to different gene product type categories. The total number of genes is 4,401.</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>An updated version of the function assignments for <it>E. coli</it> K-12 gene products has been presented using the genes identified in the GenBank U00096 deposit. Alternative gene boundaries were produced by MAGPIE. The MAGPIE genome annotation system also identified candidate CDSs that may represent gene products not identified in the GenBank U00096 deposit. Small ORFs with biological activity are likely to be abundant in the organism but await verification by biological data. Undoubtedly, the intergenic regions of <it>E. coli</it> K-12, as studied by Rudd [<abbr bid="B25">25</abbr>] and Bachellier <it>et al.</it> [<abbr bid="B26">26</abbr>], are also important for the function and regulation of gene products.</p>
         <p>The percentage of identified chromosomal gene products without a function assignment is decreasing and is currently 19.6%. Only 7.4% of <it>E. coli</it> genes have no match in current sequence databases. This number will be further reduced with the release of the annotated genomes of <it>Salmonella</it>, <it>Shigella</it> and other closely related organisms. Preliminary data show that the number of unknown CDSs (ORFs encoding proteins without sequence-similar matches) will be fewer than 170 after data on the <it>Salmonella typhimurium</it> genome are included (M.H.S., unpublished data).</p>
         <p>The function assignments presented here mainly represent the molecular functions of the gene products. With the generation of microarray data, gene products will also be characterized to a greater degree by the role they play in the cell under specific conditions. We have recently developed a classification system for cellular functions of <it>E. coli</it> K-12 gene products and have assigned more than one cellular role to some gene products where this is appropriate [<abbr bid="B27">27</abbr>]. There is also a need for a more uniform way of describing both the molecular and cellular roles of gene products among diverse organisms, and this issue is currently being addressed by the Gene Ontology Consortium [<abbr bid="B28">28</abbr>].</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have presented a functional update of the gene products encoded by the genes of <it>E. coli</it> K-12 identified in the GenBank Accession U00096 deposit. The <it>E. coli</it> proteins were treated as modular entities where a module is at least 100 amino acids, carries a biological function, and has an independent evolutionary history. The functional update was performed by manual evaluation of the data obtained from GenProtEC, BLAST and DARWIN analyses, and MAGPIE annotation. A table containing the updated function assignments of <it>E. coli</it> K-12 gene products is available as an additional data file online, and at GenProtEC [<abbr bid="B4">4</abbr>] and MAGPIE [<abbr bid="B11">11</abbr>]. We believe these data will be valuable for analysis of <it>E. coli</it> K-12 itself as well as for the analysis of gene products encoded by other genomes.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Automated annotation</p>
            </st>
            <sec>
               <st>
                  <p>MAGPIE ORF prediction</p>
               </st>
               <p>A three-step approach to ORF prediction was taken to prepare the MAGPIE project for <it>E. coli</it>. GLIMMER 2.0 with a minimum ORF length of 80 nucleotides was initially used to create the base set of predictions [<abbr bid="B29">29</abbr>]. GLIMMER 2.0 was run with all default parameters, as recommended in the documentation [<abbr bid="B29">29</abbr>], and trained on the annotated set of ORFs from the Blattner <it>et al.</it> release of 1997 [<abbr bid="B1">1</abbr>]. Because GLIMMER selectively identifies ORFs that match a statistical model of a gene for the organism [<abbr bid="B29">29</abbr>], it may miss genes that were laterally transferred or acquired more recently from other genomes. We therefore chose to combine the GLIMMER predictions with those of a syntactic tool encoded within MAGPIE. This tool identifies stop codons and then 'backtracks' to the farthest upstream acceptable in-frame start codon and defines this as the ORF [<abbr bid="B10">10</abbr>]. A non-redundant set of all GLIMMER ORFs plus syntactic ORFs between GLIMMER ORFs was generated. Finally, ORFs annotated by Blattner <it>et al.</it> that were not present in the non-redundant set were added to the MAGPIE project.</p>
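               <p>The backtracking rule of the syntactic tool can be illustrated with a minimal Python sketch. This is not the MAGPIE implementation: the function name syntactic_orfs, the choice of ATG, GTG and TTG as acceptable start codons, the restriction to the forward strand, the assumption that backtracking stops at the previous in-frame stop codon, and the reuse of the 80-nucleotide minimum length are all made for illustration only. The sketch scans each reading frame once and remembers the first start codon seen since the previous in-frame stop, which is equivalent to walking back from each stop codon to the farthest upstream in-frame start.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: the text does not list which start codons count as "acceptable".
START_CODONS = {"ATG", "GTG", "TTG"}


def syntactic_orfs(seq, min_len=80):
    """Return sorted (start, end) pairs, 0-based half-open, forward strand only.

    Each ORF runs from the farthest upstream in-frame start codon that is not
    separated from the stop by another in-frame stop, through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        farthest_start = None  # first start codon seen since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                if farthest_start is not None:
                    end = pos + 3  # include the stop codon itself
                    if end - farthest_start >= min_len:
                        orfs.append((farthest_start, end))
                farthest_start = None  # a stop closes the current open frame
            elif codon in START_CODONS and farthest_start is None:
                farthest_start = pos
    return sorted(orfs)
```

               <p>Under these assumptions, calling the function on the toy sequence CCATGAAAATGCCCTAAGG with min_len=0 reports a single ORF that begins at the first ATG rather than the second, which is the behaviour the backtracking description implies.</p>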
            </sec>
            <sec>
               <st>
                  <p>BLAST analysis</p>
               </st>
               <p>The CDSs were compared to the NCBI nucleotide (nt) and non-redundant protein (nr) databases using gapped BLAST [<abbr bid="B30">30</abbr>]. Protein-sequence motifs were identified by PROSITE [<abbr bid="B31">31</abbr>]. A search against the MAGPIE-predicted proteins of over 40 completed genomes, including the previously annotated <it>E. coli</it> set, was also performed.</p>
            </sec>
            <sec>
               <st>
                  <p>Functional annotation</p>
               </st>
               <p>Automated function annotation was provided using HERON. Description lines with low information content (for example, descriptions containing words such as "hypothetical" or "putative") were filtered out. HERON then calculated word frequencies in the remaining descriptions, identified the top three most common words, and selected the description of the highest-scoring sequence match (for homology comparisons) with one or more high-frequency words. The selected description became the automated annotation for the coding region.</p>
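               <p>The selection procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the HERON code: the function name heron_like_annotation, the representation of matches as (score, description) pairs, whitespace tokenization, and the two-word low-information list are assumptions. A real system would probably also discard uninformative common words such as "protein" before counting.</p>

```python
from collections import Counter

# Assumption: the text gives only these two words as examples.
LOW_INFO_WORDS = {"hypothetical", "putative"}


def heron_like_annotation(hits):
    """Pick one description from a list of (score, description) matches.

    Returns None when every description is filtered out as low-information.
    """
    # Step 1: drop description lines with low information content.
    informative = [
        (score, desc) for score, desc in hits
        if not LOW_INFO_WORDS.intersection(desc.lower().split())
    ]
    if not informative:
        return None
    # Step 2: word frequencies across the remaining descriptions.
    counts = Counter(
        word for _, desc in informative for word in desc.lower().split()
    )
    top_words = {word for word, _ in counts.most_common(3)}
    # Step 3: best-scoring match that contains one or more top words.
    for _, desc in sorted(informative, key=lambda hit: hit[0], reverse=True):
        if top_words.intersection(desc.lower().split()):
            return desc
    return None
```

               <p>Note that when several words share the same frequency, the choice of the three top words depends on how ties are broken; the text does not say how HERON resolves such ties.</p>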
            </sec>
         </sec>
         <sec>
            <st>
               <p>Manual annotation</p>
            </st>
            <sec>
               <st>
                  <p>BLAST analysis</p>
               </st>
               <p>The protein sequences collected from GenBank Accession U00096 were compared to the nr database using gapped BLAST [<abbr bid="B30">30</abbr>].</p>
            </sec>
            <sec>
               <st>
                  <p>DARWIN analysis</p>
               </st>
               <p>DARWIN (version 2.0) was used to detect sequence-similar proteins within <it>E. coli</it> K-12 and in 20 additional microbial genomes [<abbr bid="B32">32</abbr>] (P. Liang and M. Riley, unpublished data). In addition to orthologous matches, groups of paralogous proteins of <it>E. coli</it> K-12 were generated on the basis of the DARWIN results. In our hands, DARWIN is particularly successful in identifying distant sequence similarities, no doubt a consequence of its application of multiple substitution matrices optimized for the organism and for each sequence pair.</p>
            </sec>
            <sec>
               <st>
                  <p>Functional annotation</p>
               </st>
               <p>Functions were assigned to gene products on the basis of a manual evaluation of the results from the BLAST and DARWIN analyses. The automatic function prediction was also taken into account. In addition to incorporating recent experimental information, a substantial amount of human judgment was brought to bear.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>A complete table of the current <supplr sid="S1">4,401 Bnums</supplr> is provided as an Excel file.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Table of the current 4,401 Bnums</p>
            </caption>
            <text>
               <p>Table of the current 4,401 Bnums</p>
            </text>
            <file name="gb-2001-2-9-research0035-S1.xls">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported by NIH grant R01 RR07861, the NASA Astrobiology Institute grant NCC2-1054, grants from the Edward Mallinckrodt, Jr Foundation and the Sinsheimer Foundation, and NSF grants DBI-9984882 and IIS-9996304. We thank Alastair Kerr for help with data retrieval and Edward A. Adelberg for help with monitoring the <it>E. coli</it> literature. We thank Guy Plunkett 3rd for information on Blattner number status, Mark Schroeder for assistance with the MAGPIE analysis, and Peter Karp and Stefan Bekiranov for suggestions regarding the design and implementation of HERON.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The complete genome sequence of <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Bloch</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rode</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>277</volume>
            <fpage>1453</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.277.5331.1453</pubid>
                  <pubid idtype="pmpid" link="fulltext">9278503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>GOLD: Genomes OnLine Database homepage</p>
            </title>
            <url>http://igweb.integratedgenomics.com/GOLD/</url>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Interim report on genomics of <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Serres</snm>
                  <fnm>MH</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>54</volume>
            <fpage>341</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.micro.54.1.341</pubid>
                  <pubid idtype="pmpid" link="fulltext">11018132</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>GenProtEC database</p>
            </title>
            <url>http://genprotec.mbl.edu/</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>EcoGene: a genome sequence database for <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Rudd</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>60</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102481</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592181</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.60</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The EcoCyc and MetaCyc databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Karp</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Paley</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Pellegrini-Toole</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>56</fpage>
            <lpage>59</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102475</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592180</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.56</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Completing the <it>E. coli</it> proteome: a database of gene products characterised since the completion of the genome sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Thomas</snm>
                  <fnm>GH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>860</fpage>
            <lpage>861</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.10.860</pubid>
                  <pubid idtype="pmpid" link="fulltext">10705439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>CGSC: <it>E. coli</it> Genetic Stock Center</p>
            </title>
            <url>http://cgsc.biology.yale.edu/</url>
         </bibl>
         <bibl id="B9">
            <title>
               <p><it>E. coli</it> genome project University of Wisconsin-Madison</p>
            </title>
            <url>http://www.genome.wisc.edu/</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture.</p>
            </title>
            <aug>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sensen</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Biochimie</source>
            <pubdate>1996</pubdate>
            <volume>78</volume>
            <fpage>302</fpage>
            <lpage>310</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0300-9084(96)84761-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">8905148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>MAGPIE automated genome project investigation environment</p>
            </title>
            <url>http://genomes.rockefeller.edu/magpie/ecoli/</url>
         </bibl>
         <bibl id="B12">
            <title>
               <p>'Intergenic' blr gene in <it>Escherichia coli</it> encodes a 41-residue membrane protein affecting intrinsic susceptibility to certain inhibitors of peptidoglycan synthesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Wong</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>McMurry</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>37</volume>
            <fpage>364</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2000.01998.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10931331</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Genomics and metabolism in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Serres</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>In The Prokaryotes: An Evolving Electronic Database for the Microbiological Community. Edited by Dworkin M, et al. New York: Springer-Verlag,</source>
            <pubdate>2000</pubdate>
            <url>http://www.prokaryotes.com</url>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The repertoire of DNA-binding transcriptional regulators in <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Perez-Rueda</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>1838</fpage>
            <lpage>1847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102813</pubid>
                  <pubid idtype="pmpid" link="fulltext">10734204</pubid>
                  <pubid idtype="doi">10.1093/nar/28.8.1838</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>RegulonDB</p>
            </title>
            <url>http://www.cifn.unam.mx/regulondb/</url>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A functional-phylogenetic classification system for transmembrane solute transporters.</p>
            </title>
            <aug>
               <au>
                  <snm>Saier</snm>
                  <fnm>MH</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>2000</pubdate>
            <volume>64</volume>
            <fpage>354</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98997</pubid>
                  <pubid idtype="pmpid" link="fulltext">10839820</pubid>
                  <pubid idtype="doi">10.1128/MMBR.64.2.354-411.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genomic Comparisons of Membrane Transport Systems</p>
            </title>
            <url>http://www.biology.ucsd.edu/~ipaulsen/transport/</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Genes and proteins of <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>54</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147169</pubid>
                  <pubid idtype="pmpid" link="fulltext">9399799</pubid>
                  <pubid idtype="doi">10.1093/nar/26.1.54</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Sequence and function of the <it>aas</it> gene in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Jackowski</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Rock</snm>
                  <fnm>CO</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1994</pubdate>
            <volume>269</volume>
            <fpage>2921</fpage>
            <lpage>2928</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8300626</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Copurification of glucosamine-1-phosphate acetyltransferase and N-acetylglucosamine-1-phosphate uridyltransferase activities of <it>Escherichia coli</it>: characterization of the <it>glmU</it> gene product as a bifunctional enzyme catalyzing two subsequent steps in the pathway for UDP-N-acetylglucosamine synthesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Mengin-Lecreulx</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>van Heijenoort</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1994</pubdate>
            <volume>176</volume>
            <fpage>5788</fpage>
            <lpage>5795</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196783</pubid>
                  <pubid idtype="pmpid">8083170</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Protein evolution viewed through <it>Escherichia coli</it> protein sequences: introducing the notion of a structural segment of homology, the module.</p>
            </title>
            <aug>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Labedan</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>857</fpage>
            <lpage>868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1003</pubid>
                  <pubid idtype="pmpid" link="fulltext">9180377</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Aspartokinase I-homoserine dehydrogenase I of <it>Escherichia coli</it> K12. Concentration-dependent dissociation to dimers in the presence of L-threonine.</p>
            </title>
            <aug>
               <au>
                  <snm>Vickers</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Ackers</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Ogilvie</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1978</pubdate>
            <volume>253</volume>
            <fpage>2155</fpage>
            <lpage>2160</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">204643</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The threonine-sensitive homoserine dehydrogenase and aspartokinase activities of <it>Escherichia coli</it> K-12. Subunit structure of the protein catalyzing the two activities.</p>
            </title>
            <aug>
               <au>
                  <snm>Truffa-Bachi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Van Rapenbusch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gros</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Janin</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Eur J Biochem</source>
            <pubdate>1969</pubdate>
            <volume>7</volume>
            <fpage>401</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4893089</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Functions of the gene products of <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Microbiol Rev</source>
            <pubdate>1993</pubdate>
            <volume>57</volume>
            <fpage>862</fpage>
            <lpage>952</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7508076</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Novel intergenic repeats of <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Rudd</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Res Microbiol</source>
            <pubdate>1999</pubdate>
            <volume>150</volume>
            <fpage>653</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0923-2508(99)00126-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">10673004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Short palindromic repetitive DNA elements in enterobacteria: a survey.</p>
            </title>
            <aug>
               <au>
                  <snm>Bachellier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clement</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Hofnung</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Res Microbiol</source>
            <pubdate>1999</pubdate>
            <volume>150</volume>
            <fpage>627</fpage>
            <lpage>639</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0923-2508(99)00128-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10673002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>MultiFun, a multifunctional classification scheme for <it>Escherichia coli</it> K-12 gene products.</p>
            </title>
            <aug>
               <au>
                  <snm>Serres</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Microb Comp Genomics</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>205</fpage>
            <lpage>222</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11471834</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</p>
            </title>
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT et al</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Improved microbial gene identification with GLIMMER.</p>
            </title>
            <aug>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Harmon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>4636</fpage>
            <lpage>4641</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148753</pubid>
                  <pubid idtype="pmpid" link="fulltext">10556321</pubid>
                  <pubid idtype="doi">10.1093/nar/27.23.4636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>PROSITE: a dictionary of sites and patterns in proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1992</pubdate>
            <volume>Suppl 20</volume>
            <fpage>2013</fpage>
            <lpage>2018</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1598232</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Exhaustive matching of the entire protein sequence database.</p>
            </title>
            <aug>
               <au>
                  <snm>Gonnet</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Benner</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1992</pubdate>
            <volume>256</volume>
            <fpage>1443</fpage>
            <lpage>1445</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1604319</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
