<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-1-r20</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>The tryptophan pathway genes of the Sargasso Sea metagenome: new operon structures and the prevalence of non-operon organization</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Kagan</snm>
               <fnm>Juliana</fnm>
               <insr iid="I1"/>
               <email>kjuliana@tx.technion.ac.il</email>
            </au>
            <au id="A2">
               <snm>Sharon</snm>
               <fnm>Itai</fnm>
               <insr iid="I2"/>
               <email>itaish@cs.technion.ac.il</email>
            </au>
            <au id="A3">
               <snm>Beja</snm>
               <fnm>Oded</fnm>
               <insr iid="I1"/>
               <email>beja@tx.technion.ac.il</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Kuhn</snm>
               <mi>C</mi>
               <fnm>Jonathan</fnm>
               <insr iid="I1"/>
               <email>jkuhn@tx.technion.ac.il</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Faculty of Biology, Technion, Israel Institute of Technology, Haifa, Israel 32000</p>
            </ins>
            <ins id="I2">
               <p>Computer Science Department, Technion, Israel Institute of Technology, Haifa, Israel 32000</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>R20</fpage>
         <url>http://genomebiology.com/2008/9/1/R20</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18221558</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-1-r20</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>11</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>17</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>27</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>01</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Kagan et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Sargasso Sea metagenome tryptophan pathway genes</p>
      </shorttitle>
      <shortabs>
         <p>An analysis of the seven genes of the tryptophan pathway in the Sargasso Sea metagenome shows that the majority of contigs and scaffolds contain whole or split operons that are similar to previously analyzed trp gene organizations.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The enormous database of microbial DNA generated from the Sargasso Sea metagenome provides a unique opportunity to locate genes participating in different biosynthetic pathways and to attempt to understand the relationship and evolution of those genes. In this article, an analysis of the Sargasso Sea metagenome is made with respect to the seven genes of the tryptophan pathway.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>At least 5% of all the genes that are related to amino acid biosynthesis are tryptophan (<it>trp</it>) genes. Many contigs and scaffolds contain whole or split operons that are similar to previously analyzed <it>trp </it>gene organizations. Only two scaffolds discovered in this analysis possess an operon organization of tryptophan pathway genes that differs from those previously known. Many marine organisms lack an operon-type organization of these genes or have mini-operons containing only two <it>trp </it>genes. In addition, the <it>trpB </it>genes from this search reveal that the dichotomous division between <it>trpB</it>_1 and <it>trpB</it>_2 also occurs in organisms from the Sargasso Sea. One cluster was found to contain <it>trpB </it>sequences that were closely related to each other but distinct from most known <it>trpB </it>sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The data show that <it>trp </it>genes are widely dispersed within this metagenome. The novel organization of these genes and an unusual group of <it>trpB</it>_1 sequences that were found among some of these Sargasso Sea bacteria indicate that there is much to be discovered about both the reason for certain gene orders and the regulation of tryptophan biosynthesis in marine bacteria.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010001">Biochemistry and structural biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The tryptophan pathway and the organization of the <it>trp </it>genes involved in tryptophan synthesis have served as a model system for many years, and these genes continue to receive attention <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. With the availability of extensive DNA sequences, it has become clear that <it>trp </it>genes are not identically organized in all organisms. The classical structure of the <it>trp </it>operon contains genes for all seven catalytic domains in the following order: promoter, <it>trpE, trpG, trpD</it>, <it>trpC</it>, <it>trpF</it>, <it>trpB </it>and <it>trpA</it>. In some organisms each catalytic domain is encoded by a different gene. As shown in Figure <figr fid="F1">1</figr>, seven catalytic domains carry out the reactions that convert chorismate and L-glutamine to L-tryptophan.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>The biochemical pathway of tryptophan biosynthesis</p>
            </caption>
            <text>
               <p><b>The biochemical pathway of tryptophan biosynthesis</b>. The genetic nomenclature for the seven genes that encode the enzymes is that used for <it>Bacillus subtilis</it>. PR-Anth, N-(5'-phosphoribosyl)-anthranilate; CdRP, 1-(o-carboxy-phenylamino)-1-deoxyribulose-5-phosphate; InGP, indole 3-glycerol phosphate. <it>trpE </it>encodes the large aminase subunit of anthranilate synthase; <it>trpG </it>encodes the small glutamine-binding subunit of anthranilate synthase, which catalyzes the glutaminase reaction; <it>trpD </it>encodes anthranilate-phosphoribosyl transferase; <it>trpF </it>encodes phosphoribosyl-anthranilate isomerase; <it>trpC </it>encodes indoleglycerol phosphate synthase; <it>trpA </it>encodes the &#945; subunit of tryptophan synthase, which converts InGP to indole and glyceraldehyde-3-phosphate; <it>trpB </it>encodes the &#946; subunit of tryptophan synthase, which converts indole and serine to tryptophan.</p>
            </text>
            <graphic file="gb-2008-9-1-r20-1"/>
         </fig>
         <p>To date, several deviations from the classical structure have been reported. Gene fusion may result in a single polypeptide carrying two or more catalytic domains. The most extreme exception is found in the eukaryote <it>Euglena</it>, in which a single gene encodes a polypeptide with five catalytic domains <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In split operons, the <it>trp </it>genes are organized into two or more sub-operons <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Other events include gene reshuffling, gene insertions and gene deletions. An analysis of more than 100 genomes showed that the evolution of the <it>trp </it>operon is the result of both vertical genealogy and lateral gene transfer. It has been found that, if events of lateral gene transfer and paralogy can be sorted out, the vertical transfer of the <it>trp </it>genes becomes apparent <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>As a result of the publication of the Sargasso Sea metagenome by Venter <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, it may be possible to deduce the evolutionary relationships between the <it>trp </it>genes of different marine organisms from the Sargasso Sea. This metagenome is composed of more than one million non-redundant sequences, or reads, that have been estimated to derive from 1,800 different genomes, including 148 phylotypes. These sequences were assembled and scanned for the presence of open reading frames, which were then annotated and analyzed <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Overall, more than 1.2 million putative genes were identified, including 37,118 genes for amino acid biosynthesis. Tryptophan pathway genes should be widely represented among these sequences. A vast amount of information about the <it>trp </it>genes from various bacterial species exists in the literature, and the Sargasso Sea metagenome data should contribute much to our knowledge of the evolution and organizational diversity of these important genes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, in particular those from a marine environment. Marine bacteria live in an exacting environment whose selective demands on its inhabitants differ considerably from those of the terrestrial environment.</p>
         <p>We have made an extensive search for tryptophan pathway genes within the metagenome data. Our major goal was to determine whether the classical structure of the <it>trp </it>operon predominates in marine microorganisms and whether novel structures are present. This information should help address questions about the origin of the <it>trp </it>genes and the genetic and selective processes that have acted on them, including their lateral transfer between different bacterial species.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Computer search for tryptophan pathway genes</p>
            </st>
            <p>Contigs and scaffolds from the Sargasso Sea metagenome were screened for <it>trp </it>genes. The search was run seven times, each using the amino acid sequence of a different <it>Bacillus subtilis trp </it>gene. Among contigs and scaffolds, we found 2,926 that had <it>trp </it>genes. Of these, 879 contained two or more <it>trp </it>genes and 2,047 contained only a single <it>trp </it>gene. After removing repeats resulting from sequences carrying several <it>trp </it>genes, we found 1,928 <it>trp </it>genes that were associated with at least one other <it>trp </it>gene, which makes it very likely that these are genuine <it>trp </it>genes. A total of 4,009 <it>trp</it>-like genes were found, but some of these might be pseudogenes. Thus, a minimum of 5% of all the genes for amino acid biosynthesis (37,118 genes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>) are <it>trp</it>-like genes.</p>
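            <p>The 5% figure can be checked with simple arithmetic. The short Python sketch below is illustrative only: it assumes that the lower bound is obtained from the 1,928 <it>trp </it>genes found together with at least one other <it>trp </it>gene, a derivation that the text does not state explicitly, and all variable and function names are hypothetical.</p>
            <p>
```python
# Counts quoted in the text; the names below are illustrative, not from the study.
AMINO_ACID_BIOSYNTHESIS_GENES = 37118  # genes for amino acid biosynthesis (ref. 6)
TRP_GENES_WITH_NEIGHBOUR = 1928        # trp genes found with at least one other trp gene
TRP_LIKE_GENES_TOTAL = 4009            # all trp-like hits, some possibly pseudogenes


def percent_of_biosynthesis_genes(count, total=AMINO_ACID_BIOSYNTHESIS_GENES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumption: the conservative estimate uses only the well-supported genes.
conservative = percent_of_biosynthesis_genes(TRP_GENES_WITH_NEIGHBOUR)
all_hits = percent_of_biosynthesis_genes(TRP_LIKE_GENES_TOTAL)

print(f"conservative estimate: {conservative}%")
print(f"all trp-like hits: {all_hits}%")
```
            </p>
            <p>Under this assumption the conservative estimate is about 5.2%, whereas counting all 4,009 <it>trp</it>-like hits would give about 10.8%.</p>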
            <p>The gene order <it>E-G-D-C-F-B-A </it>was taken as the prototype for complete operons. For "split-operons", the prototypes used were <it>E-G-D-C </it>and <it>F-B-A</it>. Table <tblr tid="T1">1</tblr> shows the distribution of the contigs for different <it>trp </it>genes. The assembly of important scaffolds and contigs (see Table <tblr tid="T2">2</tblr>) was verified by re-assembling their reads using the SEQUENCHER program version 4.1.2 by Gene Codes Corporation (Ann Arbor, MI, USA). The resulting assembly was found to be consistent with that previously generated by the Celera Assembler <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The amount of coverage, which was determined for each contig, gives an estimate of the frequency of that contig within the population of organisms sampled. The results of this search are presented in Table <tblr tid="T2">2</tblr>. Full and split operons with a classical structure are widely represented.</p>
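            <p>The prototype gene orders given above can be used to label the gene order observed on a contig. The following Python sketch is a hypothetical illustration rather than the procedure used in this study: it assumes that a gene order is written as single-letter gene names joined by hyphens, and it only tests for an exact match against the three prototypes. The function name and labels are invented for the example.</p>
            <p>
```python
# Prototype gene orders taken from the text; everything else is illustrative.
FULL_OPERON = ("E", "G", "D", "C", "F", "B", "A")
SPLIT_PROTOTYPES = (("E", "G", "D", "C"), ("F", "B", "A"))


def classify_gene_order(order):
    """Label a hyphen-separated gene order by exact match to a prototype."""
    genes = tuple(order.split("-"))
    if genes == FULL_OPERON:
        return "full operon"
    if genes in SPLIT_PROTOTYPES:
        return "split operon"
    # Anything else (partial, rearranged or interrupted orders) is left unlabeled.
    return "other"


examples = ["E-G-D-C-F-B-A", "E-G-D-C", "F-B-A", "B-A-E-G-D-C"]
labels = {order: classify_gene_order(order) for order in examples}
for order, label in labels.items():
    print(order, label)
```
            </p>
            <p>An exact-match rule is deliberately strict: partial operons at contig ends and rearranged orders such as <it>B-A-E-G-D-C </it>fall into the "other" class and would still need manual inspection.</p>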
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Distribution of <it>trp </it>gene appearances on scaffolds and contigs in the Sargasso Sea metagenome</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Gene</p>
                     </c>
                     <c ca="center">
                        <p>Total number of copies*</p>
                     </c>
                     <c ca="center">
                        <p>With other <it>trp </it>genes&#8224;</p>
                     </c>
                     <c ca="center">
                        <p>Alone&#8225;</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpE</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>663</p>
                     </c>
                     <c ca="center">
                        <p>277</p>
                     </c>
                     <c ca="center">
                        <p>386</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpG</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>826</p>
                     </c>
                     <c ca="center">
                        <p>396</p>
                     </c>
                     <c ca="center">
                        <p>430</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpD</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>426</p>
                     </c>
                     <c ca="center">
                        <p>278</p>
                     </c>
                     <c ca="center">
                        <p>148</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>382</p>
                     </c>
                     <c ca="center">
                        <p>153</p>
                     </c>
                     <c ca="center">
                        <p>229</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpF</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>378</p>
                     </c>
                     <c ca="center">
                        <p>235</p>
                     </c>
                     <c ca="center">
                        <p>143</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpB</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>892</p>
                     </c>
                     <c ca="center">
                        <p>408</p>
                     </c>
                     <c ca="center">
                        <p>484</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>trpA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>442</p>
                     </c>
                     <c ca="center">
                        <p>215</p>
                     </c>
                     <c ca="center">
                        <p>227</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>4,009</p>
                     </c>
                     <c ca="center">
                        <p>879</p>
                     </c>
                     <c ca="center">
                        <p>2,047</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Total number of copies, number of occurrences of the gene in the Sargasso Sea metagenome. &#8224; With other <it>trp </it>genes, number of occurrences on scaffolds and contigs containing more than one <it>trp </it>gene. &#8225; Alone, number of occurrences on scaffolds and contigs with no other <it>trp </it>genes.</p>
               </tblfn>
            </tbl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Coverage and gene order of different contigs and scaffolds</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Contig/Scaffold</p>
                     </c>
                     <c ca="center">
                        <p>Actual length*</p>
                     </c>
                     <c ca="center">
                        <p>Coverage&#8224;</p>
                     </c>
                     <c ca="center">
                        <p>Gene order&#8225;</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01037482</p>
                     </c>
                     <c ca="center">
                        <p>5934</p>
                     </c>
                     <c ca="center">
                        <p>10.81</p>
                     </c>
                     <c ca="center">
                        <p>D&#8594;C&#8594;F&#8594;B&#8594;A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01011678</p>
                     </c>
                     <c ca="center">
                        <p>5668</p>
                     </c>
                     <c ca="center">
                        <p>10.66</p>
                     </c>
                     <c ca="center">
                        <p>Full operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH026811</p>
                     </c>
                     <c ca="center">
                        <p>14769</p>
                     </c>
                     <c ca="center">
                        <p>8.78</p>
                     </c>
                     <c ca="center">
                        <p>Full operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01096779</p>
                     </c>
                     <c ca="center">
                        <p>10932</p>
                     </c>
                     <c ca="center">
                        <p>8.69</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01096698</p>
                     </c>
                     <c ca="center">
                        <p>2822</p>
                     </c>
                     <c ca="center">
                        <p>8.51</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01104100</p>
                     </c>
                     <c ca="center">
                        <p>6690</p>
                     </c>
                     <c ca="center">
                        <p>8.21</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C&#8594;B&#8594;A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01008961</p>
                     </c>
                     <c ca="center">
                        <p>7081</p>
                     </c>
                     <c ca="center">
                        <p>7.36</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01117014</p>
                     </c>
                     <c ca="center">
                        <p>7301</p>
                     </c>
                     <c ca="center">
                        <p>5.94</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01092457</p>
                     </c>
                     <c ca="center">
                        <p>4603</p>
                     </c>
                     <c ca="center">
                        <p>4.45</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01074747</p>
                     </c>
                     <c ca="center">
                        <p>3876</p>
                     </c>
                     <c ca="center">
                        <p>4.26</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;PLPDE_IV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01046473</p>
                     </c>
                     <c ca="center">
                        <p>3887</p>
                     </c>
                     <c ca="center">
                        <p>3.96</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01056517</p>
                     </c>
                     <c ca="center">
                        <p>4373</p>
                     </c>
                     <c ca="center">
                        <p>3.85</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH025535</p>
                     </c>
                     <c ca="center">
                        <p>76373</p>
                     </c>
                     <c ca="center">
                        <p>3.72</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C&#8594;F&#8594;B&#8594;X&#8594;A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01039569</p>
                     </c>
                     <c ca="center">
                        <p>5041</p>
                     </c>
                     <c ca="center">
                        <p>3.45</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01065695</p>
                     </c>
                     <c ca="center">
                        <p>3747</p>
                     </c>
                     <c ca="center">
                        <p>3.37</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01088195</p>
                     </c>
                     <c ca="center">
                        <p>7958</p>
                     </c>
                     <c ca="center">
                        <p>3.27</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH020599</p>
                     </c>
                     <c ca="center">
                        <p>17648</p>
                     </c>
                     <c ca="center">
                        <p>3.18</p>
                     </c>
                     <c ca="center">
                        <p>G&#8594;D&#8594;C&#8594;F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01010663</p>
                     </c>
                     <c ca="center">
                        <p>3644</p>
                     </c>
                     <c ca="center">
                        <p>3.17</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH006047</p>
                     </c>
                     <c ca="center">
                        <p>9399</p>
                     </c>
                     <c ca="center">
                        <p>3.03</p>
                     </c>
                     <c ca="center">
                        <p>Full operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01056487</p>
                     </c>
                     <c ca="center">
                        <p>4038</p>
                     </c>
                     <c ca="center">
                        <p>2.91</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH025058</p>
                     </c>
                     <c ca="center">
                        <p>36150</p>
                     </c>
                     <c ca="center">
                        <p>2.69</p>
                     </c>
                     <c ca="center">
                        <p>B&#8594;A&#8594;E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH025585</p>
                     </c>
                     <c ca="center">
                        <p>10777</p>
                     </c>
                     <c ca="center">
                        <p>2.59</p>
                     </c>
                     <c ca="center">
                        <p>Full operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH006071</p>
                     </c>
                     <c ca="center">
                        <p>68188</p>
                     </c>
                     <c ca="center">
                        <p>2.53</p>
                     </c>
                     <c ca="center">
                        <p>Full operon</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01110889</p>
                     </c>
                     <c ca="center">
                        <p>4437</p>
                     </c>
                     <c ca="center">
                        <p>2.43</p>
                     </c>
                     <c ca="center">
                        <p>F&#8594;(EG)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01063516</p>
                     </c>
                     <c ca="center">
                        <p>4094</p>
                     </c>
                     <c ca="center">
                        <p>2.35</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01027084</p>
                     </c>
                     <c ca="center">
                        <p>3981</p>
                     </c>
                     <c ca="center">
                        <p>2.21</p>
                     </c>
                     <c ca="center">
                        <p>D&#8594;C&#8594;F&#8594;B&#8594;A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01064621</p>
                     </c>
                     <c ca="center">
                        <p>5161</p>
                     </c>
                     <c ca="center">
                        <p>2.02</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01052709</p>
                     </c>
                     <c ca="center">
                        <p>2451</p>
                     </c>
                     <c ca="center">
                        <p>2.00</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01079380</p>
                     </c>
                     <c ca="center">
                        <p>1515</p>
                     </c>
                     <c ca="center">
                        <p>1.89</p>
                     </c>
                     <c ca="center">
                        <p>G&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01015506</p>
                     </c>
                     <c ca="center">
                        <p>2202</p>
                     </c>
                     <c ca="center">
                        <p>1.35</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH200199</p>
                     </c>
                     <c ca="center">
                        <p>1879</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH199785</p>
                     </c>
                     <c ca="center">
                        <p>1823</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH174161</p>
                     </c>
                     <c ca="center">
                        <p>1722</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;G&#8594;D&#8594;C</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Actual length, number of known nucleotides; &#8224;Coverage, average number of reads covering each nucleotide; &#8225;Gene order, of different contigs and scaffolds.</p>
               </tblfn>
            </tbl>
            <p>Table <tblr tid="T1">1</tblr> also gives the results for each separate gene. It shows that the genes are not represented with equal frequency: <it>trpE, trpG </it>and <it>trpB </it>are over-represented. A possible explanation is that <it>trpE </it>and <it>trpG </it>homologues take part in other biochemical pathways, such as the pathway for para-aminobenzoic acid <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, and have been incorrectly identified as <it>trp </it>genes.</p>
            <p>A computer search of this type cannot determine the actual enzymatic activity of a particular coding region, and this can lead to an over-representation of certain genes. An analysis of the <it>trpG </it>and <it>pabA </it>genes, which almost certainly have a common origin, showed that they cannot be distinguished from one another unless they are adjacent to a <it>trp </it>gene (for <it>trpG</it>) or a <it>pab </it>gene (for <it>pabA</it>). Where their identity was unambiguous, the two genes from the same organism were often more closely related to each other than to their counterparts in other organisms (data not shown). An analysis of the <it>trpE </it>and <it>pabB </it>genes, which also have a common origin, gave similar results. Gene duplication could also cause an apparent over-representation; this is discussed below in reference to the occurrence of the two kinds of <it>trpB </it>genes. Genes encoding enzymes that catalyze similar reactions in more than one pathway can either appear in the searches for two different pathways or fail to appear in either search. An example of this phenomenon is the <it>trpF </it>gene, which is discussed below.</p>
            <p>In order to determine the extent of coverage by this search method, an analysis of the <it>trpE</it>, <it>trpD </it>and <it>trpA </it>genes was made using the genes from the ten different organisms listed in Table <tblr tid="T3">3</tblr> as probes. The results of these searches for <it>trpD </it>and <it>trpA </it>are shown in Table <tblr tid="T3">3</tblr>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Search for <it>trpD </it>and <it>trpA </it>genes using multiple probes</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>Species and strain*</p>
                     </c>
                     <c ca="center">
                        <p>Matches&#8224;</p>
                     </c>
                     <c ca="center">
                        <p>Both&#8225;</p>
                     </c>
                     <c ca="center">
                        <p>Probe only&#167;</p>
                     </c>
                     <c ca="center">
                        <p><it>Bacillus </it>only&#182;</p>
                     </c>
                     <c ca="center">
                        <p>% new&#165;</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>trpD</it>
                           </b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Sulfolobus solfataricus </it>P2</p>
                     </c>
                     <c ca="center">
                        <p>454</p>
                     </c>
                     <c ca="center">
                        <p>444</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>2</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Thermoplasma acidophilum </it>DSM 1728</p>
                     </c>
                     <c ca="center">
                        <p>409</p>
                     </c>
                     <c ca="center">
                        <p>404</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>64</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Nostoc </it>sp. PCC 7120</p>
                     </c>
                     <c ca="center">
                        <p>436</p>
                     </c>
                     <c ca="center">
                        <p>430</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Thermoanaerobacter tengcongensis </it>MB4</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>467</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>6</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Rhodopirellula baltica </it>SH 1</p>
                     </c>
                     <c ca="center">
                        <p>448</p>
                     </c>
                     <c ca="center">
                        <p>442</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Bacteroides fragilis </it>NCTC 9343</p>
                     </c>
                     <c ca="center">
                        <p>424</p>
                     </c>
                     <c ca="center">
                        <p>419</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Corynebacterium jeikeium </it>K411</p>
                     </c>
                     <c ca="center">
                        <p>443</p>
                     </c>
                     <c ca="center">
                        <p>433</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>2</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Methanosphaera stadtmanae </it>DSM 3091</p>
                     </c>
                     <c ca="center">
                        <p>441</p>
                     </c>
                     <c ca="center">
                        <p>433</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>2</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Neisseria meningitidis </it>FAM18</p>
                     </c>
                     <c ca="center">
                        <p>474</p>
                     </c>
                     <c ca="center">
                        <p>458</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>3</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Clostridium kluyveri </it>DSM 555</p>
                     </c>
                     <c ca="center">
                        <p>492</p>
                     </c>
                     <c ca="center">
                        <p>464</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>6</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>All#</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>514</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>468</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>46</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>10</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>trpA</it>
                           </b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Sulfolobus solfataricus </it>P2</p>
                     </c>
                     <c ca="center">
                        <p>222</p>
                     </c>
                     <c ca="center">
                        <p>222</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>241</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Nostoc </it>sp. PCC 7120</p>
                     </c>
                     <c ca="center">
                        <p>471</p>
                     </c>
                     <c ca="center">
                        <p>445</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>6</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Pseudomonas putida </it>KT2440</p>
                     </c>
                     <c ca="center">
                        <p>498</p>
                     </c>
                     <c ca="center">
                        <p>457</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>9</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Rhodopirellula baltica </it>SH 1</p>
                     </c>
                     <c ca="center">
                        <p>478</p>
                     </c>
                     <c ca="center">
                        <p>456</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>5</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Corynebacterium jeikeium </it>K411</p>
                     </c>
                     <c ca="center">
                        <p>463</p>
                     </c>
                     <c ca="center">
                        <p>432</p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>7</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Bacteroides fragilis </it>NCTC 9343</p>
                     </c>
                     <c ca="center">
                        <p>437</p>
                     </c>
                     <c ca="center">
                        <p>431</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Clostridium kluyveri </it>DSM 555</p>
                     </c>
                     <c ca="center">
                        <p>475</p>
                     </c>
                     <c ca="center">
                        <p>443</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>7</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Thermoplasma acidophilum </it>DSM 1728</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>438</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Neisseria meningitidis </it>053442</p>
                     </c>
                     <c ca="center">
                        <p>479</p>
                     </c>
                     <c ca="center">
                        <p>452</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>6</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Leptospira biflexa </it>serovar Patoc</p>
                     </c>
                     <c ca="center">
                        <p>474</p>
                     </c>
                     <c ca="center">
                        <p>451</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>5</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>All#</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>517</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>463</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>12</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Species and strain, those used to probe the database; &#8224; Matches, number of genes detected using the specific probe; &#8225; Both, genes detected by both the specific probe and that from <it>Bacillus</it>; &#167; Probe only, sequences detected by the specific probe but not by that from <it>Bacillus</it>; &#182; <it>Bacillus </it>only, sequences detected by the <it>Bacillus </it>probe but not by the specific probe; &#165; % new, per cent of new sequences not detected by the <it>Bacillus </it>probe, that is, the number of new sequences divided by the number of <it>Bacillus </it>sequences times 100; # All, in column order: the total number of sequences found by all probes; those common to <it>Bacillus </it>and one or more of the specific probes; the number of genes found with specific probes but not by that from <it>Bacillus </it>(new sequences); those found by the <it>Bacillus </it>probe but not by the others; and the per cent of new sequences. The data in the table are raw: somewhat doubtful sequences were not eliminated, because the aim of this table is to expand the search parameters as far as possible.</p>
               </tblfn>
            </tbl>
            <p>The analysis of <it>trpE </it>sequences is complicated by the concomitant detection of <it>pabB </it>sequences. New <it>trpE </it>sequences were uncovered, and these usually amounted to about 10% of those detected using the <it>Bacillus </it>probe. Searching for <it>trpD </it>with probes from ten species uncovered, on average, about 3% new sequences per probe. However, because many of the new genes appear in more than one search, the ten probes together added only 10% (46/468) new <it>trpD </it>genes. Table <tblr tid="T3">3</tblr> also presents the data for <it>trpA</it>, another gene for which little ambiguity is anticipated. That search again uncovered new genes (an average of 4.5% per search), but the total of new <it>trpA </it>genes from the ten probes was again only 12% (54/463). Therefore, the coverage provided by the <it>Bacillus </it>probes, while not complete, gives a fairly accurate picture of the <it>trp </it>genes in the Sargasso Sea metagenome database. We would expect the use of additional probes to yield diminishing returns.</p>
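            <p>The percentages quoted above can be rechecked from the <it>trpD </it>rows of Table <tblr tid="T3">3</tblr>. The following is a minimal sketch in Python and is not part of the original analysis; the names TRPD_ROWS and percent_new are assumptions made for illustration, and the denominator follows the table footnote (the number of <it>Bacillus </it>sequences, taken here as "both" plus "<it>Bacillus </it>only").</p>
            <p><![CDATA[
```python
# Recompute the "% new" column of Table 3 for trpD (illustrative only).
# Each value is (both, probe_only, bacillus_only), copied from the table.
TRPD_ROWS = {
    "Sulfolobus solfataricus P2": (444, 10, 24),
    "Thermoplasma acidophilum DSM 1728": (404, 5, 64),
    "Nostoc sp. PCC 7120": (430, 6, 38),
    "Thermoanaerobacter tengcongensis MB4": (467, 26, 1),
    "Rhodopirellula baltica SH 1": (442, 6, 26),
    "Bacteroides fragilis NCTC 9343": (419, 5, 49),
    "Corynebacterium jeikeium K411": (433, 10, 35),
    "Methanosphaera stadtmanae DSM 3091": (433, 8, 35),
    "Neisseria meningitidis FAM18": (458, 16, 10),
    "Clostridium kluyveri DSM 555": (464, 28, 4),
}


def percent_new(both, probe_only, bacillus_only):
    """Sequences found only by the specific probe, as a percentage of
    all sequences found by the Bacillus probe (both + bacillus_only)."""
    bacillus_total = both + bacillus_only
    return 100.0 * probe_only / bacillus_total


per_probe = {name: percent_new(*row) for name, row in TRPD_ROWS.items()}
mean_per_probe = sum(per_probe.values()) / len(per_probe)

# "All" row: 468 shared, 46 new, 0 found by the Bacillus probe alone.
pooled = percent_new(468, 46, 0)

print(f"mean per probe: {mean_per_probe:.1f}%")  # 2.6%, i.e. about 3%
print(f"all probes pooled: {pooled:.1f}%")       # 9.8%, i.e. about 10%
```
]]></p>
            <p>The pooled value is much smaller than the sum of the per-probe values because the same new sequence is often detected by several probes.</p>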
         </sec>
         <sec>
            <st>
               <p>Operon structures</p>
            </st>
            <p>Table <tblr tid="T4">4</tblr> summarizes the number of scaffolds and contigs that contain several <it>trp </it>genes. Some scaffolds have all seven <it>trp </it>genes grouped together. Descriptions of several scaffolds of particular interest are presented in Table <tblr tid="T5">5</tblr>. Eleven of the 24 scaffolds and contigs containing four <it>trp </it>genes lacked flanking sequences and therefore could not be classified as split operons. The other 13 had genes unrelated to the <it>trp </it>operon at both ends, or at least after the <it>trpC </it>gene (for split operons of the <it>EGDC </it>type), and therefore fit the definition of split operons. Of the 61 scaffolds and contigs that have three genes together, only 16 contain <it>trp </it>genes flanked by unrelated genes and can be unambiguously denoted as split operons. The following previously described split operons were found: <it>E&#8594;G&#8594;D&#8594;C, F&#8594;B&#8594;A, F&#8594;B&#8594;X&#8594;A</it>. Calculations of the frequencies of gene pairs (Figure <figr fid="F2">2</figr>) suggest that the first two split operons are the most abundant within the Sargasso Sea metagenome, while other organizations, including the classical full operon, are much less abundant. This conclusion is consistent with the very small number of <it>C&#8594;F </it>pairs found, the pair expected where the two halves are joined in a full operon.</p>
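            <p>The idea of counting neighboring gene pairs can be illustrated with a short Python sketch. It is not the procedure used to produce Figure <figr fid="F2">2</figr>; the input list toy_orders is a made-up example built from gene orders mentioned in the text, the function name adjacent_pairs is hypothetical, and "->" is used as an ASCII stand-in for the arrows.</p>
            <p><![CDATA[
```python
from collections import Counter


def adjacent_pairs(order):
    """Split a gene-order string such as 'E->G->D->C' into its
    neighboring pairs: [('E', 'G'), ('G', 'D'), ('D', 'C')]."""
    genes = order.split("->")
    return list(zip(genes, genes[1:]))


# Toy input only: one string per contig or scaffold (not real counts).
toy_orders = ["E->G->D->C", "E->G->D->C", "F->B->A", "D->C->F->B->A"]

# Tally every neighboring pair over all contigs and scaffolds.
pair_counts = Counter(
    pair for order in toy_orders for pair in adjacent_pairs(order)
)

for (left, right), count in pair_counts.most_common():
    print(f"{left}->{right}: {count}")
```
]]></p>
            <p>In this toy input the pair C->F is counted once, whereas D->C is counted three times; a low C->F count relative to the pairs within each half is the pattern that points to split rather than full operons.</p>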
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Distribution of neighboring genes involving at least one <it>trp </it>gene</p>
               </caption>
               <text>
                  <p><b>Distribution of neighboring genes involving at least one <it>trp </it>gene</b>. <b>(a) </b>Each arrow connects neighboring genes; its size and color represent the number of pairs found in the Sargasso Sea metagenome (see legend; only pairs observed more than 30 times are shown). Pairs of genes composing the two split operons E&#8594;G&#8594;D&#8594;C and F&#8594;B&#8594;A are abundant, while the pair C&#8594;F was rarely found. This suggests that the <it>trp </it>genes are usually organized as split operons rather than as full operons. <b>(b) </b>The representation of classical full and split <it>trp </it>operons.</p>
               </text>
               <graphic file="gb-2008-9-1-r20-2"/>
            </fig>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Number of contigs and scaffolds containing multiple <it>trp </it>genes</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>No. of <it>trp </it>genes</p>
                     </c>
                     <c ca="center">
                        <p>No. of contigs and scaffolds</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>780</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2,046</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Description of selected scaffolds</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Scaffold</p>
                     </c>
                     <c ca="center">
                        <p>No. of <it>trp </it>genes in the scaffold</p>
                     </c>
                     <c ca="center">
                        <p>Gene order</p>
                     </c>
                     <c ca="center">
                        <p>Comments</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH027495</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>EGD(CF)B</p>
                     </c>
                     <c ca="left">
                        <p>Lacks <it>trpA</it>. A gap of unsequenced DNA between <it>trpB </it>and the neighboring genes that are unrelated to <it>trp </it>may contain <it>trpA</it>.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH027608</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>DCFBA</p>
                     </c>
                     <c ca="left">
                        <p>Lack of <it>trpE </it>and <it>trpG </it>genes. However, the region between <it>trpD </it>and genes unrelated to <it>trp </it>is missing.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH011919</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>EGDCBA</p>
                     </c>
                     <c ca="left">
                        <p>Lacks <it>trpF</it>. There is a gap in the sequence between two neighboring contigs that contain E-G-D-C on the one hand and B-A on the other. Until the connecting pieces are found in both these cases, no decision can be made as to whether the missing genes are separate from the other <it>trp </it>genes.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH005689</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>EGDFB</p>
                     </c>
                     <c ca="left">
                        <p>Lacks both <it>trpC </it>and <it>trpA</it>. While the absence of <it>trpC </it>is not in doubt because <it>trpD </it>is adjacent to <it>trpF</it>, and on the same contig, <it>trpA </it>is probably missing due to the incompleteness of the sequence.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH026313</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>DCFB</p>
                     </c>
                     <c ca="left">
                        <p>Lacks <it>trpE</it>, <it>trpG </it>and <it>trpA</it>. It is not definite that this is a split operon because of gaps between <it>trpD</it>/<it>trpB </it>and their neighboring genes. Moreover, the gap between <it>trpD </it>and <it>trpC </it>calls the correctness of the assembly into question.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>AACY01051805 AACY01049273</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>EGDCFBA</p>
                     </c>
                     <c ca="left">
                        <p><it>Shewanella oneidensis</it>, SAR-1 and SAR-2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CH004526 CH004459</p>
                     </c>
                     <c ca="center">
                        <p>Split operon: 4 and 3</p>
                     </c>
                     <c ca="center">
                        <p>EGDC FBXA</p>
                     </c>
                     <c ca="left">
                        <p>One interesting feature of the <it>trp </it>genes of <it>Burkholderia </it>SAR-1 should be mentioned. In all previously known genomes of <it>Burkholderia </it>sp., the split operons contain <it>F&#8594;B&#8594;X&#8594;A</it>, where "X" is unrelated to known <it>trp </it>genes. The <it>Burkholderia</it>-like SAR-1 sequence from the Sargasso Sea metagenome was annotated as containing an <it>F&#8594;X&#8594;A </it>split operon, because the computer program used by Venter and colleagues failed to identify a <it>trpB </it>gene within the sequence. However, when a search was made using the <it>Burkholderia trpB </it>sequence as a probe, a <it>trpB </it>gene was detected between <it>trpF </it>and X, as is true for all other <it>Burkholderia </it>species, and there were no non-<it>trp </it>genes between <it>trpF </it>and <it>trpB</it>.</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>As illustrated in Figure <figr fid="F3">3</figr>, most of the complete and incomplete <it>trp </it>gene clusters maintain the structure of the prototype <it>trp </it>operon. All genes within these clusters have the same direction of transcription and the same gene order. Two of the split operons, [GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01080023">AACY01080023</ext-link>] and [GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01120345">AACY01120345</ext-link>], seem to be from the genome of <it>Burkholderia </it>SAR-1, while two full operons described in Table <tblr tid="T5">5</tblr> seem to come from <it>Shewanella </it>SAR-1 and SAR-2. Because these sequences do not differ from those found earlier for these organisms, and because their probable source is contamination of a filter, as has been stated in several papers <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, they were not taken into account in our calculations.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Alignment of <it>trp </it>sequences from different contigs and scaffolds</p>
               </caption>
               <text>
                  <p><b>Alignment of <it>trp </it>sequences from different contigs and scaffolds</b>. The following abbreviations are used: E, <it>trpE</it>; G, <it>trpG </it>(or sequences with a high similarity to <it>pabA</it>); C, <it>trpC</it>; D, <it>trpD</it>; F, <it>trpF</it>; B, <it>trpB</it>; A, <it>trpA</it>; Unk, an ORF with unknown function; <it>truA</it>, the tRNA pseudouridine synthase; <it>moaC</it>, a protein related to the molybdenum cofactor; <it>SSL22</it>, DNA or RNA helicases of superfamily II; <it>lexA</it>, the SOS-response transcriptional repressor.</p>
               </text>
               <graphic file="gb-2008-9-1-r20-3"/>
            </fig>
            <p>Two contigs show a type of organization different from that generally found in bacteria. In one contig [GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01110889">AACY01110889</ext-link>] <it>trpF </it>is followed by a gene that is a fusion between <it>trpE </it>and <it>trpG</it>. This contig is a part of a scaffold, [GenBank: <ext-link ext-link-type="gen" ext-link-id="CH022404">CH022404</ext-link>], which shows no similarity to any known bacterium with regard to <it>trpE </it>and <it>trpG</it>. While the fusion of <it>trpG </it>and <it>trpE </it>has been found in bacteria such as <it>Legionella pneumophila, Rhodopseudomonas palustris, Thermomonospora fusca, Anabaena sp</it>. and <it>Nostoc punctiforme</it>, none of them contains the gene order <it>F-(E-G)</it>. The gene order <it>trpF</it>-<it>trpE-trpG </it>has been found in some <it>Archaea </it>such as <it>Halobacterium sp., Methanosarcina barkeri </it>and <it>Ferroplasma acidarmanus</it>, but in these species <it>trpE </it>and <it>trpG </it>are separate genes. In a second contig [GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01079380">AACY01079380</ext-link>] the gene order <it>trpG-trpC </it>was observed. This gene order has already been described for <it>Archaea </it>such as <it>Thermoplasma acidophilum, Thermoplasma volcanium, Ferroplasma acidarmanus </it>and <it>Sulfolobus solfataricus </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
            <p>The orders of adjacent <it>trp </it>genes within two scaffolds, [GenBank: <ext-link ext-link-type="gen" ext-link-id="CH025058">CH025058</ext-link>] (gene order: <it>B-A-E-G-D-C</it>) and [GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01110889">AACY01110889</ext-link>] (gene order: <it>F-(EG)</it>), have not been observed to date. Both have a relatively high coverage in the database, which confirms the importance and abundance of these gene orders in marine populations. An analysis of other, non-<it>trp </it>genes within these scaffolds failed to reveal any significant similarity between them and known genomes.</p>
            <p>A phylogenetic analysis of some of these complete and split operons was made against operons from known organisms. The results are presented in Figure <figr fid="F4">4</figr>. All the full operons are more closely related to the full operons of known organisms than they are to the split operons of other known species. The figure also shows that most of the split operons are grouped with split operons from known organisms. The four exceptions to this rule are probably due to incomplete sequences, and these are likely to be full operons. This analysis also supports our hypothesis that split operons are more prevalent than full operons in the Sargasso Sea metagenome (Figure <figr fid="F2">2</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Phylogenetic analysis of scaffolds and contigs containing full and split operons</p>
               </caption>
               <text>
                  <p><b>Phylogenetic analysis of scaffolds and contigs containing full and split operons</b>. The concatenated amino acid sequences from genes <it>trpE</it>, <it>trpG</it>, <it>trpD</it>, and <it>trpC </it>were used to analyze the relationships among both known species and those from the Sargasso Sea metagenome. Full operons are written in bold whereas split operons are not.</p>
               </text>
               <graphic file="gb-2008-9-1-r20-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Non-operon organization</p>
            </st>
            <p>As shown in Table <tblr tid="T4">4</tblr>, 70% of the contigs and scaffolds detected have a single <it>trp </it>gene. Those with two <it>trp </it>genes are also very prevalent (26%) even though some of these are probably partial segments of larger operons. As shown in Table <tblr tid="T6">6</tblr>, 133 scaffolds and contigs carry one or two <it>trp </it>genes enclosed between non-<it>trp </it>genes. While <it>trpE </it>and <it>trpG </it>may be overrepresented due to the existence of homologous genes as mentioned above, other <it>trp </it>genes are also observed in a "detached" manner. This indicates that the <it>trp </it>genes of marine organisms are frequently detached or occur as pairs.</p>
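            <p>The proportions quoted above can be recomputed from the counts in Table <tblr tid="T4">4</tblr>. The following is a minimal sketch of that arithmetic in Python, given purely for illustration; the variable names are hypothetical and the script is not part of the original analysis.</p>

```python
# Counts transcribed from Table 4:
# key = number of trp genes, value = number of contigs and scaffolds.
contigs_by_trp_count = {7: 8, 6: 3, 5: 3, 4: 24, 3: 61, 2: 780, 1: 2046}

# Total number of contigs and scaffolds carrying at least one trp gene.
total = sum(contigs_by_trp_count.values())

# Share of each class, expressed as a percentage of the total.
shares = {n: 100 * count / total for n, count in contigs_by_trp_count.items()}

for n in sorted(shares):
    count = contigs_by_trp_count[n]
    print(f"{n} trp gene(s): {count:5d} contigs/scaffolds, {shares[n]:.1f}%")
```

            <p>With these counts the single-gene class comes to about 70% of the 2,925 contigs and scaffolds, and the two-gene class to just under 27%, close to the 26% quoted above.</p>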
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Frequency of scaffolds and contigs containing unusual organizations of <it>trp </it>genes</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Gene order (enclosed)*</p>
                     </c>
                     <c ca="center">
                        <p>No. of occurrences&#8224;</p>
                     </c>
                     <c ca="center">
                        <p>Gene order (partial) &#8225;</p>
                     </c>
                     <c ca="center">
                        <p>No. of occurrences</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;E&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>E&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;G&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="center">
                        <p>X&#8594;G</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;D&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>G&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>108</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;C&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>X&#8594;D</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;F&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>D&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;B&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>X&#8594;C</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;A&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>F&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>X&#8594;B&#8594;A(&#8594;X)</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>X&#8594;B</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>(X&#8594;) E&#8594;G&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>B&#8594;X</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>X&#8594;A</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>Total</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>133</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>431</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Gene order (enclosed), organizations of one or two <it>trp </it>genes enclosed between non-<it>trp </it>genes; &#8224; Number of occurrences, number of contigs and scaffolds carrying the organization; &#8225; Gene order (partial), pairs of <it>trp </it>and non-<it>trp </it>genes that are inconsistent with classical organization.</p>
               </tblfn>
            </tbl>
            <p>The existence of pairs of <it>trp </it>genes makes good sense biochemically. Anthranilate synthase is composed of an equal number of <it>trpE</it>- and <it>trpD</it>-encoded subunits. Tryptophan synthase contains two subunits each of the polypeptides from the <it>trpA </it>and <it>trpB </it>genes. When <it>trpG </it>is not fused to <it>trpE </it>or <it>trpD</it>, it encodes a polypeptide that is also found in amounts equimolar to those from <it>trpE </it>and <it>trpD</it>. Organizing these specific genes in pairs would seem to ensure that they are transcribed together and yield the proper amounts of the translation products.</p>
            <p>The occurrence of detached <it>trp </it>genes is apparently an adaptation to the particular environment in which marine organisms are found. Most of the bacteria previously analyzed probably encounter periods of feast and famine with regard to tryptophan. Therefore they need to respond to external conditions that vary. The existence of transport systems for concentrating externally found tryptophan and the organization of the <it>trp </it>biosynthetic genes into operons almost certainly reflect their environmental challenges. In contrast, marine organisms exist in a rather constant environment with respect to tryptophan. It is unlikely that tryptophan from external sources is available and this amino acid must be synthesized entirely within the bacterial cell. The main regulation of the pathway is expected to be at the level of feedback inhibition and it is probable that <it>trp </it>gene expression is constitutive rather than controlled by the mechanism of repression-derepression. The level of expression of a detached <it>trp </it>gene can be controlled simply by modifying the strength of the associated promoter. A <it>trp </it>repressor or repressors and attenuation become superfluous under such circumstances. This should extend to most or all of the other genes involved in amino acid biosynthesis. Therefore axenic cultures of some of these marine organisms are eagerly awaited.</p>
         </sec>
         <sec>
            <st>
               <p>Conserved non-<it>trp </it>flanking genes</p>
            </st>
            <p>Another way of examining the evolution of the <it>trp </it>genes and the relationships between various species is the analysis of genes not involved in tryptophan biosynthesis that either neighbor the <it>trp </it>genes or are inserted between them. Xie and colleagues have reported that <it>trpF</it>, <it>trpB </it>and <it>trpA </it>in split-pathway operons are flanked by conserved genes that are unrelated to tryptophan biosynthesis <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. They found genes that encode the &#946;-subunit of acetyl-coenzyme A carboxylase (<it>accD</it>), folylpolyglutamate synthase/dihydrofolate synthase (<it>folC</it>), fimbria V protein (<it>lysM</it>) and the tRNA pseudouridine synthase (<it>truA</it>). In most cases the genes <it>accD </it>and <it>folC </it>follow <it>trpA</it>. For the <it>Thiobacillus-Pseudomonas</it>-<it>Azotobacter </it>cluster and others, the <it>trpF</it>-<it>trpB</it>-<it>trpA </it>operon is flanked on the <it>trpF </it>side by <it>lysM </it>and <it>truA</it>. The presence of particular genes near the <it>trp </it>genes was examined using the Sargasso Sea metagenome data, and the results of this analysis are shown in Table <tblr tid="T7">7</tblr>.</p>
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Genes flanking the <it>trp </it>operon</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="center">
                        <p>Gene</p>
                     </c>
                     <c ca="center">
                        <p>Number of times in the metagenome</p>
                     </c>
                     <c ca="left">
                        <p>Percent found near <it>trp </it>genes</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>truA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>93% are adjacent and before <it>trpF</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>accD</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                     <c ca="left">
                        <p>86.8% are adjacent and after <it>trpA</it></p>
                        <p>9.4% are adjacent and after <it>trpB</it>; <it>trpA </it>is elsewhere</p>
                        <p>3.8% are adjacent and after <it>trpF</it>; <it>trpB </it>and <it>A </it>are absent</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>folC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="left">
                        <p>77% occur as <it>trpA-accD-folC</it></p>
                        <p>23% occur in the order of <it>trpB-accD-folC</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>pyrF</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="left">
                        <p>77% are before <it>trpF </it>in split operons</p>
                        <p>23% are before <it>trpB</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>lexA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="left">
                        <p>100% are adjacent and after <it>trpC </it>when <it>trpF </it>is elsewhere</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>moaC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>100% neighbor and are after <it>trpC</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>PLPDE_IV</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="left">
                        <p>57% adjacent and after <it>trpE</it></p>
                        <p>38% adjacent and after <it>trpG</it></p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The first three rows of Table <tblr tid="T7">7</tblr> confirm previous publications. In addition, four other genes, not previously noted, were found with high frequencies near the <it>trp </it>genes of the Sargasso Sea metagenome: <it>pyrF </it>(orotidine-5'-phosphate decarboxylase), <it>lexA </it>(the SOS-response transcriptional repressor), <it>moaC </it>(a protein related to the molybdenum cofactor) and <it>PLPDE_IV </it>(the class of amino acid aminotransferases). It should be mentioned that <it>PLPDE_IV </it>is the only gene, besides <it>aroG </it>and <it>aroH </it>(see below), found near the <it>trp </it>genes that can be logically connected to tryptophan biosynthesis. This class of aminotransferases includes some D-amino acid transferases, pyridoxal-5'-phosphate-dependent enzymes such as tryptophanase, and others. If the cell is in fact able to use D-tryptophan as a source of L-tryptophan via a D-amino acid transferase, then the inclusion of a gene encoding such an activity among the <it>trp </it>genes would make sense, as this gene would undergo derepression in coordination with those involved in L-tryptophan biosynthesis.</p>
            <p>It is clear that specific neighboring genes are very prevalent when a split <it>trp </it>operon occurs. It seems unlikely that the same event has occurred many times: strains with these particular flanking genes are most likely derived from a common ancestor.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of <it>trpB </it>genes</p>
            </st>
            <p>Surprisingly, it has been found that a significant number of organisms possess more than one <it>trpB </it>gene encoding the &#946;-chain of tryptophan synthase. Usually, but not always, the 'extra' gene is unlinked to the <it>trpA </it>gene encoding the &#945;-chain of this enzyme. These extra <it>trpB </it>genes belong to a distinct subgroup encoding the &#946;-chain, which is termed <it>trpB</it>_2. This subgroup has been recognized in the COGs database as "alternative tryptophan synthase" (COG1350) <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, while the major group is denoted <it>trpB</it>_1 and includes the well-studied polypeptides from such organisms as <it>Escherichia coli, Salmonella typhimurium </it>and <it>Bacillus subtilis</it>. The minor <it>trpB</it>_2 group includes mostly, but not exclusively, archaeal species. The evolution and properties of <it>trpB</it>_2 have been analyzed and discussed in a number of recent articles <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
            <p>The three-dimensional structure of tryptophan synthase from <it>Salmonella typhimurium </it>has been elucidated by X-ray crystallography to a resolution of 2.5 angstroms <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The enzyme is an &#945;&#946;&#946;&#945; complex that forms an internal hydrophobic tunnel into which indole, produced by the &#945; subunit, enters and then reaches the active site of the &#946; subunit. The &#945; monomers and &#946; dimers contact one another via a highly specific mechanism of recognition. In addition, the genes encoding these two subunits are almost always closely linked and their expression is frequently translationally coupled <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
            <p>The data collected from the Sargasso Sea metagenome were examined to determine whether the <it>trpB </it>sequences from the Sargasso Sea differ from those of known organisms and whether both <it>trpB</it>_1 and <it>trpB</it>_2 exist in this sample. A phylogenetic analysis of the <it>trpB </it>genes found in the present survey showed that the majority of these (Figure <figr fid="F5">5</figr>) fall into the <it>trpB</it>_1 group, while a few <it>trpB</it>_2 genes also occur. Among the <it>trpB</it>_1 genes, one cluster is quite distinct and probably split off from the major type at a relatively early stage. Genes in this cluster have a high similarity to the <it>trpB </it>of the marine bacterium <it>Pelagibacter ubique </it>(Candidatus) HTCC1062 (SAR11): their sequence identity to <it>P. ubique </it>at the amino acid level was between 64% and 87%, while the genes neighboring some of these <it>trpB</it>s showed an even higher identity to their counterparts from SAR11. One of the most remarkable features of <it>P. ubique </it>is its extremely small genome that lacks any pseudogenes or recent gene duplications. It has only one copy of <it>trpB</it>, and therefore it can be concluded that this gene must be functional in tryptophan biosynthesis and not a pseudogene. <it>P. ubique </it>contains two split operons: <it>trpE-trpG-trpD-trpC </it>and <it>trpF</it>-<it>trpB</it>-<it>trpA</it>. Including the neighboring genes unrelated to <it>trp</it>, the gene order around the second split operon is <it>himD-pyrF-trpF-trpB-trpA-accD-folC</it>. The <it>himD </it>gene, not mentioned above, encodes a sequence-specific DNA-binding transcriptional activator. Comparison of the gene order between contigs containing SAR11-like <it>trpB </it>from the Sargasso Sea metagenome showed that most of the contigs have a gene order that is similar to that of SAR11. 
Three of the 37 contigs lack <it>trpF</it>, and two contigs contain only a <it>trpB </it>gene flanked by genes that are unrelated to <it>trp </it>but similar in sequence and order to those of SAR11. This indicates that most or all of these <it>trpB </it>genes are part of the SAR11 group. Since the <it>trpB </it>of SAR11 is more closely related to <it>trpB_1 </it>than to <it>trpB_2 </it><abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, it seems that the genes from this particular cluster should probably be considered to be of the <it>trpB</it>_1 type.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Representation of Sargasso metagenome <it>trpB </it>sequences and those from known bacteria with respect to genetic distance</p>
               </caption>
               <text>
                  <p><b>Representation of Sargasso metagenome <it>trpB </it>sequences and those from known bacteria with respect to genetic distance</b>. Forty representatives of the <it>trpB </it>sequences analyzed here were chosen for this analysis. The constructed tree shows two distinct groups; however, a third group appears that consists only of environmental sequences and the Ple (<it>Pelagibacter ubique </it>(Candidatus)) sequence. The abbreviations of <it>trpB </it>genes from known bacteria are listed in Table 8. The environmental <it>trpB </it>sequences are labeled with their NCBI accession numbers. Bootstrap values for the main groups are shown.</p>
               </text>
               <graphic file="gb-2008-9-1-r20-5"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The tryptophan operon of bacteria has been studied for more than 50 years and its structure and regulation are known for many terrestrial organisms that can be grown in laboratory culture. With the explosive expansion of genomics during the last decade and the data thus generated, many <it>trp </it>sequences from both known and unknown marine species have become available. This provides an excellent opportunity for expanding our knowledge about the ways in which different organisms, particularly marine bacteria, have organized these genes. In the present research, <it>trp </it>pathway genes within the Sargasso Sea database were retrieved by BLAST analysis using known <it>trp </it>protein sequences. It was found that <it>trp </it>genes account for about 5% of all genes that were previously identified as genes for amino acid synthesis in the Sargasso Sea metagenome. In almost all cases in which the <it>trp </it>genes form an operon, the order and direction of transcription of the <it>trp </it>genes are similar to familiar prototypes. The reason for this conservation remains unknown, but it might be explained in part by an advantage conferred when genes whose products form complexes are adjacent to one another and translational coupling occurs. Of the 85 contigs and scaffolds that contain three or four <it>trp </it>genes, only 29 could be unambiguously defined as containing split pathway operons. The following already known orders of split operons were found: <it>E&#8594;G&#8594;D&#8594;C, F&#8594;B&#8594;A</it>. In addition, we have found evidence for completely dispersed <it>trp </it>genes in the form of isolated genes and gene pairs.</p>
         <p>Since these marine organisms survive and grow in a very different environment from those organisms previously studied, they are likely to have been genetically separated from them and to have evolved to solve the particular regulatory problems that exist in their environment. It was expected that some marine bacteria would exhibit novel organizations of these genes and such organizations were in fact found. Among the <it>trp </it>genes organized into operon structures, most resemble examples already discovered. In addition, two previously unknown groupings were uncovered in the present search. However, a notable number of genes were found either detached or in mini-operons containing only two <it>trp </it>genes. Novel organizations of the <it>trp </it>genes probably arise from adaptations to the marine environment and it is likely that some marine bacteria will have unusual regulatory features. Such features can only be elucidated when these organisms become amenable to axenic culture. Cloning and expressing these genes in the laboratory from those organisms that cannot yet be cultured may, however, provide a partial regulatory picture. In this regard, a search for genes related to the <it>trpR </it>gene of <it>Escherichia coli </it>(the gene that encodes the tryptophan repressor) in the Sargasso Sea metagenome was performed. This search failed to reveal any significant <it>trpR </it>homologs. This is not surprising, because regulatory circuits undoubtedly arise later than the genes for biosynthesis and are adaptations to specific environments.</p>
         <p>Genes with unknown function have been previously found to be inserted within the <it>trp </it>operon <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Such genes were found between <it>trpB </it>and <it>trpA </it>in one contig from the Sargasso Sea metagenome, a location already observed for some species of <it>Flavobacterium </it>and <it>Burkholderia</it>. Another contig carried such a gene between <it>trpF </it>and <it>trpB</it>. While the reason for the presence of these non-<it>trp </it>genes is unclear and the possibility exists that they are simply morons <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, it is possible that they actually participate in tryptophan biosynthesis. That is, these genes may not be essential for tryptophan synthesis but rather aid it by increasing the catalysis of one of the enzymes or by being involved in complex formation. Even a very small advantage is expected to be of great importance for the survival of an organism in an oligotrophic environment such as that of the Sargasso Sea.</p>
         <p>One should keep in mind that the arrangement of genes in an operon confers both advantages and disadvantages. The most obvious advantage is that genes with similar function are transcribed together. The greatest disadvantage is that, unless some further level of regulation exists (differences in the amounts of mRNA or its stability, the strength of ribosomal binding sites, and so on), the amount of the polypeptides from these genes will be the same even though the resultant enzymes may have different catalytic rates <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The ones with slower rates will be the limiting factor. As a result, when the genes are transcribed together, an excess of some enzymes is likely to occur. However, the amount of mRNA and polypeptide synthesis is only one aspect of the control of the tryptophan pathway. Besides these, there are two other levels of control that affect the amount of tryptophan synthesis within the cell. The first of these is feed-back inhibition which influences the activity of the first two reactions <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, and thereby the amount of metabolites flowing through the pathway. The second is the formation of multi-enzyme complexes that greatly increases the catalytic efficiency of the various reactions. In complexes, the product of one reaction can be used directly by the next enzyme and the concentration of the substrate in the vicinity of the second enzyme is much higher than would occur were the two enzymes separate. Examples of such complexes are <it>trpE</it>-<it>trpD </it>(<it>trpG</it>) and <it>trpA</it>-<it>trpB </it>and the <it>trpC</it>-<it>trpF </it>gene fusion in <it>Escherichia coli</it>. 
In addition, one polypeptide can greatly enhance the activity of a second when a complex is formed (for example, in the <it>trpA</it>-<it>trpB </it>heterotetramer, &#945;&#946;&#946;&#945;, from <it>Escherichia coli</it>) <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
         <p>Different solutions to the problems of optimal synthesis of tryptophan and the regulation of <it>trp </it>gene expression would not be surprising since this amino acid is one of the most expensive in chemical terms. One solution might be to organize the <it>trp </it>genes in a different manner; another would be the creation of <it>trp </it>gene fusions. Both of these have been observed. Our analysis uncovered some known gene fusions, <it>E-G </it>[GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01100727">AACY01100727</ext-link>] and <it>C-F </it>[GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01022048">AACY01022048</ext-link>], and two novel fusions of a <it>trp </it>gene with a gene unrelated to the <it>trp </it>genes: <it>E-PLPDE_IV </it>[GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01077237">AACY01077237</ext-link>] and <it>F-TruA </it>[GenBank: <ext-link ext-link-type="gen" ext-link-id="AACY01600616">AACY01600616</ext-link>]. All of the above indicate that there is considerable genetic diversity among marine bacteria.</p>
         <p>It was found that several specific genes are often neighbors of the <it>trp </it>genes of marine microorganisms. When present in contigs, <it>lexA</it>, <it>pyrF </it>and <it>moaC </it>were always located after <it>trpC</it>. This may be a general phenomenon but our information is still too scanty to allow a definite conclusion to be drawn. Similarity in gene order is usually taken to indicate an evolutionary relationship between such segments. Of particular interest was the observation that in three cases <it>aroH </it>or <it>aroG </it>occur adjacent to <it>trpA</it>. For these examples, the distance between the end of <it>trpA </it>and the ensuing <it>aro </it>gene is 3, 18, or 20 base pairs, which makes it very likely that the two genes are expressed together. The enzyme they encode, DAHP synthase, is involved in the synthesis of a precursor of chorismic acid, and its synthesis and activity are often regulated by the level of tryptophan. Therefore such an arrangement might make sense.</p>
         <p>Since there is more than one kind of <it>trpB </it>gene, a comparison was made of amino acid sequences of <it>trpB </it>genes from the Sargasso Sea metagenome with those from known organisms. The majority of the metagenomic <it>trpB </it>sequences detected fall into the <it>trpB</it>_1 group while some others were related to the <it>trpB</it>_2 group. One cluster containing a number of <it>trpB</it>_1 sequences is quite distant from the usual type and has a high similarity to that of <it>Pelagibacter ubique </it>(Candidatus) HTCC1062 (SAR11). This cluster probably diverged rather early from the major <it>trpB</it>_1 line.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The present analysis has revealed that tryptophan genes are rather frequent within the Sargasso Sea metagenome. All <it>trp </it>genes that were found have enough similarity to COGs to be recognized. This seems to indicate, but does not prove, that all have come from a common ancestor. However, additional genes for tryptophan biosynthesis may exist which we were unable to detect with the probes employed. In this regard, it has been reported <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> that some organisms indeed lack a recognizable <it>trpF </it>in their genomes but are capable of growing without external tryptophan. A gene whose sequence is not homologous to known <it>trpF</it>s but whose product catalyzes this reaction has in fact been found in <it>Streptomyces coelicolor </it>A3(2) and <it>Mycobacterium tuberculosis </it>H37Rv <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. This <it>trpF </it>gene is an example of reticulate evolution because its product can catalyze reactions in both the histidine and tryptophan pathways <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. A BLAST search with the amino acid sequence of the <it>trpF </it>gene from <it>Streptomyces coelicolor </it>A3(2) (SCO2050) against the Sargasso Sea metagenome data showed more than 500 hits that can be identified as <it>hisA </it>proteins. Thus, only a functional analysis of these environmental sequences can prove whether they can take part in both pathways or not. The fact that a group of marine <it>trpB</it>_1 sequences are similar to one another but quite distant from the major <it>trpB</it>_1 group supports the idea that there may be <it>trp </it>genes that are not recognized as such by those sequences presently known.</p>
         <p>While <it>trp </it>operons, both complete and split, exist in marine bacteria, many <it>trp </it>genes are no longer found in that framework. In contrast to most terrestrial bacteria, some bacteria of marine origin do not use the operon structure for their <it>trp </it>genes. There are mini-operons of two genes in many cases (Table <tblr tid="T5">5</tblr>) and also an even more frequent occurrence of single <it>trp </it>genes. It is of course an open question whether what we observe is the result of the breakup of an original operon structure or whether present-day <it>trp </it>operons have arisen from these unlinked genes. Since the marine environment is very exacting and selective, it is certain that organisms lacking an operon structure for the <it>trp </it>genes have found an evolutionary advantage in the organization of the <it>trp </it>genes that they possess. It should be mentioned that in <it>Escherichia coli </it>and <it>Salmonella</it>, about 50% of the genes encoding polypeptides involved in amino acid synthesis are separate although their <it>trp </it>genes are not. On the basis of our results in which novel <it>trp </it>gene orders were found, it appears likely that further studies of the <it>trp </it>genes and their regulation and organization will provide many future surprises.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Analysis of Sargasso Sea metagenome database</p>
            </st>
            <p>Amino acid sequences with homology to each <it>trp </it>catalytic domain were obtained from an NCBI BLAST search of the Sargasso Sea metagenome database <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The amino acid sequences from <it>Bacillus subtilis </it>of each pathway catalytic domain were used as query entries for protein BLAST. <it>Bacillus </it>proteins were chosen as a starting point for the search because the catalytic domains are encoded by separate genes. In <it>Bacillus</it>, six of the genes (all except <it>trpG</it>) are organized into one operon and have been intensively studied at the DNA, RNA and protein levels <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. For the <it>trpB</it>_2 search, the sequence of <it>Chlorobium tepidum </it>CT0192 (Q8KF11) was used. The list of <it>trp </it>genes was generated in several steps. First, BLAST searches of <it>trp </it>genes against the Sargasso Sea metagenome were performed, using an e-value threshold of 1e-5. For cross validation, both peptide and DNA sequence databases were searched and the results were compared. While 95% of the ORFs were identified in both searches, some were discovered only once. In such cases a manual check of the results was performed. In addition, genes that are homologous to <it>trp </it>genes (<it>PabA</it>, <it>PabB</it>, <it>PhzA </it>and <it>PhzB</it>) were used to remove misclassified <it>trpE </it>and <it>trpG </it>genes. As a result, a list of contigs containing <it>trp </it>genes was created. Redundant contigs were removed based on BLAST searches with a 95% identity threshold. In the last step, contigs that belong to the same scaffolds were identified and treated. The results of the above semi-automatic process were validated by large-scale manual examinations.</p>
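            <p>The e-value filtering and cross-validation steps described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used: it assumes that BLAST hits are available as 12-column tab-separated text (the common tabular layout with the e-value in the eleventh column) and that ORF identifiers are comparable between the peptide and DNA searches. The function names and identifiers are hypothetical.</p>

```python
E_VALUE_THRESHOLD = 1e-5  # threshold stated in the text


def significant_subjects(tabular_text, max_evalue=E_VALUE_THRESHOLD):
    """Return the subject IDs having at least one hit at or below max_evalue.

    Assumed input: tab-separated BLAST hits with 12 columns in the order
    query, subject, identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, e-value, bit score.
    """
    subjects = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if max_evalue >= evalue:
            subjects.add(fields[1])
    return subjects


def cross_validate(peptide_hits, dna_hits):
    """Separate ORFs found by both searches from those found by only one.

    ORFs found by only one search are returned for manual checking.
    """
    confirmed = peptide_hits.intersection(dna_hits)
    manual_check = peptide_hits.symmetric_difference(dna_hits)
    return confirmed, manual_check
```

            <p>In this sketch, ORFs returned in the second set correspond to those "discovered only once" and would be passed on for manual inspection.</p>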
            <p>In order to assemble a contig, Venter and colleagues used the Celera Assembler <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. To validate the Sargasso Sea scaffolds the following procedure was performed. First, all singleton reads composing each scaffold were retrieved by conducting a BLASTn search for the scaffolds against the Sargasso Sea reads. Next, the SEQUENCHER program (Gene Codes Corporation) was used for re-assembling the reads and the results were compared to each original scaffold for validation. No significant differences between the assemblies of the Celera Assembler and SEQUENCHER were found.</p>
            <p>Coverage was calculated by recruiting reads from Sargasso Sea using BLAST, considering only reads with 90% and higher identity to the scaffold and at least 80% of the read taking part in the alignment. These parameters are rather stringent, but give a good indication with respect to the distribution of each scaffold.</p>
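            <p>The read recruitment criteria can be expressed as a short sketch. The two thresholds (90% identity and 80% of the read aligned) come from the text; the definition of coverage as recruited aligned bases divided by scaffold length is an assumption made here for illustration, as are all function and parameter names.</p>

```python
def read_is_recruited(percent_identity, aligned_length, read_length,
                      min_identity=90.0, min_read_fraction=0.80):
    """Apply the two recruitment criteria stated in the text to one read."""
    if not read_length > 0:
        raise ValueError("read_length must be positive")
    aligned_fraction = aligned_length / read_length
    return percent_identity >= min_identity and aligned_fraction >= min_read_fraction


def mean_coverage(hits, scaffold_length):
    """Mean depth of a scaffold from recruited reads (assumed definition).

    hits: iterable of (percent_identity, aligned_length, read_length) tuples,
    one per read aligned to the scaffold.
    """
    recruited_bases = sum(
        aligned
        for identity, aligned, read_len in hits
        if read_is_recruited(identity, aligned, read_len)
    )
    return recruited_bases / scaffold_length
```

            <p>A read aligned at 95% identity over 800 of its 900 bases would be recruited, whereas one at 89.9% identity, or one with only 500 of 900 bases aligned, would not.</p>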
         </sec>
         <sec>
            <st>
               <p>Phylogenetic analysis</p>
            </st>
            <p>Amino acid sequences of many <it>trpB </it>genes were used to analyze the phylogenetic relationships between different environmental samples. Only genes encoding more than 251 amino acids were analyzed. The alignment was done using the ClustalW program <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Neighbor joining (NJ) and maximum parsimony (MP) analyses were conducted on protein data sets using version 4.0b10 of PAUP <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Default parameters were used in all analyses. Bootstrap resampling of NJ (1000 replicates) and MP (1000 replicates) trees was performed in all analyses to evaluate the reliability of the inferred topologies. The resultant trees were viewed through the TreeView (Win32) program <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. To understand the relationship between the sub-families, each was compared both against the other groups and against representative <it>trpB </it>gene sequences that exist in the NCBI database.</p>
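            <p>The length criterion applied before alignment can be illustrated with a minimal sketch. It assumes that the <it>trpB </it>protein sequences are held as FASTA text; the parser and function names are hypothetical and are not part of ClustalW or PAUP.</p>

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    parts_by_name = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            parts_by_name[name] = []
        elif name is not None:
            parts_by_name[name].append(line)
    return {n: "".join(parts) for n, parts in parts_by_name.items()}


def keep_long_sequences(records, min_exclusive=251):
    """Keep only sequences with more than min_exclusive residues."""
    return {n: seq for n, seq in records.items() if len(seq) > min_exclusive}
```

            <p>With the default value, a sequence of 252 residues is retained and one of exactly 251 residues is discarded, matching the "more than 251 amino acids" wording.</p>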
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>MP, Maximum Parsimony analysis; NJ, Neighbor Joining analysis; ORF, Open Reading Frame; SSM, Sargasso Sea Metagenome; trp, Tryptophan. Additionally, Table 8 lists the species names used and their abbreviations.</p>
         <tbl id="T8">
            <title>
               <p>Table 8</p>
            </title>
            <caption>
               <p>List of species names and their abbreviations</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>Species name</p>
                  </c>
                  <c ca="left">
                     <p>Abbreviation used</p>
                  </c>
                  <c ca="left">
                     <p>NCBI number</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Aeropyrum pernix</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Ape-1</p>
                  </c>
                  <c ca="left">
                     <p>Q9Y8T5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Aeropyrum pernix</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Ape-2</p>
                  </c>
                  <c ca="left">
                     <p>Q9Y9H2</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Aquifex aeolicus</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Aae</p>
                  </c>
                  <c ca="left">
                     <p>O67409</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Arabidopsis thaliana</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Ath-1</p>
                  </c>
                  <c ca="left">
                     <p>P14671</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Arabidopsis thaliana</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Ath-2</p>
                  </c>
                  <c ca="left">
                     <p>BAB10143</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Archaeoglobus fulgidus</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Afu-1</p>
                  </c>
                  <c ca="left">
                     <p>O28672</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Archaeoglobus fulgidus</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Afu-2</p>
                  </c>
                  <c ca="left">
                     <p>O29028</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Bordetella pertussis</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Bpe</p>
                  </c>
                  <c ca="left">
                     <p>NP_882102</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Campylobacter jejuni</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Cje</p>
                  </c>
                  <c ca="left">
                     <p>CAL34499</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><it>Pelagibacter ubique </it>(Candidatus) HTCC1062</p>
                  </c>
                  <c ca="left">
                     <p>Ple</p>
                  </c>
                  <c ca="left">
                     <p>YP_265913</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Chlamydia psittaci</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Cps</p>
                  </c>
                  <c ca="left">
                     <p>Q822W9</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Chlorobium tepidum</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Cte</p>
                  </c>
                  <c ca="left">
                     <p>Q8KF11</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Corynebacterium diphtheriae</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Cdip-1</p>
                  </c>
                  <c ca="left">
                     <p>NP_940652</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Corynebacterium diphtheriae</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Cdip-2</p>
                  </c>
                  <c ca="left">
                     <p>NP_940660</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Escherichia coli</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Eco</p>
                  </c>
                  <c ca="left">
                     <p>P0A879</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Geobacter sulfurreducens</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Gsu</p>
                  </c>
                  <c ca="left">
                     <p>AAT73768</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Haemophilus influenzae</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Hin</p>
                  </c>
                  <c ca="left">
                     <p>P43760</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Lactococcus lactis</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Lla</p>
                  </c>
                  <c ca="left">
                     <p>Q01998</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Legionella pneumophila</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Lpn</p>
                  </c>
                  <c ca="left">
                     <p>CAH15507</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Mesorhizobium loti</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mlo</p>
                  </c>
                  <c ca="left">
                     <p>NP_105798</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Methanobacterium thermoautotrophicum</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mth-1</p>
                  </c>
                  <c ca="left">
                     <p>O27696</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Methanobacterium thermoautotrophicum</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mth-2</p>
                  </c>
                  <c ca="left">
                     <p>O27520</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Methanococcus jannaschii</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mja</p>
                  </c>
                  <c ca="left">
                     <p>Q60179</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Methanosarcina barkeri</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mba</p>
                  </c>
                  <c ca="left">
                     <p>AAZ72487</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Mycobacterium bovis</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mbo</p>
                  </c>
                  <c ca="left">
                     <p>NP_855291</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Mycobacterium tuberculosis</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Mtu</p>
                  </c>
                  <c ca="left">
                     <p>P66984</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Neisseria gonorrhoeae</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Ngo</p>
                  </c>
                  <c ca="left">
                     <p>Q84GJ9</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Neisseria meningitidis</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Nme</p>
                  </c>
                  <c ca="left">
                     <p>AAF41116</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Pyrobaculum aerophilum</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Paero</p>
                  </c>
                  <c ca="left">
                     <p>Q8ZV44</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Pyrococcus abyssi</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Pab-2</p>
                  </c>
                  <c ca="left">
                     <p>Q9V150</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Pyrococcus furiosus</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Pfu-2</p>
                  </c>
                  <c ca="left">
                     <p>Q8U0J5</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Pyrococcus horikoshii</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Pho</p>
                  </c>
                  <c ca="left">
                     <p>NP_143439</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Rhodopseudomonas palustris</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Rpa-1</p>
                  </c>
                  <c ca="left">
                     <p>YP_779393</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Salmonella typhimurium</it>
                     </p>
                  </c>
                  <c ca="left">
                     <p>Sty</p>
                  </c>
                  <c ca="left">
                     <p>NP_460685</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>
                        <it>Staphylococcus aureus</it>
 