<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2009-10-8-r85</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Community-wide analysis of microbial genome sequence signatures</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Dick</snm>
               <mi>J</mi>
               <fnm>Gregory</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>gdick@umich.edu</email>
            </au>
            <au id="A2">
               <snm>Andersson</snm>
               <mi>F</mi>
               <fnm>Anders</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <insr iid="I5"/>
               <email>doubleanders@gmail.com</email>
            </au>
            <au id="A3">
               <snm>Baker</snm>
               <mi>J</mi>
               <fnm>Brett</fnm>
               <insr iid="I1"/>
               <email>acidophile@gmail.com</email>
            </au>
            <au id="A4">
               <snm>Simmons</snm>
               <mi>L</mi>
               <fnm>Sheri</fnm>
               <insr iid="I1"/>
               <email>sherisim@gmail.com</email>
            </au>
            <au id="A5">
               <snm>Thomas</snm>
               <mi>C</mi>
               <fnm>Brian</fnm>
               <insr iid="I1"/>
               <email>bcthomas@berkeley.edu</email>
            </au>
            <au id="A6">
               <snm>Yelton</snm>
               <fnm>A Pepper</fnm>
               <insr iid="I1"/>
               <email>pepperyelton@hotmail.com</email>
            </au>
            <au ca="yes" id="A7">
               <snm>Banfield</snm>
               <mi>F</mi>
               <fnm>Jillian</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jbanfield@berkeley.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Earth and Planetary Science, University of California, 307 McCone Hall, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Environmental Science, Policy, and Management, University of California, Hilgard Hall, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I3">
               <p>Current address: Department of Geological Sciences, University of Michigan, 1100 N. University Ave, Ann Arbor, MI 48109-1005, USA</p>
            </ins>
            <ins id="I4">
               <p>Current address: Evolutionary Biology Centre, Department of Limnology, Uppsala University, Norbyv. 18 D, SE-75236, Uppsala, Sweden</p>
            </ins>
            <ins id="I5">
               <p>Current address: Department of Bacteriology, Swedish Institute for Infectious Disease Control, Nobels v&#228;g 18 SE-17182 Solna, Sweden</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>8</issue>
         <fpage>R85</fpage>
         <url>http://genomebiology.com/2009/10/8/R85</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2009-10-8-r85</pubid>
               <pubid idtype="pmpid">19698104</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>4</month>
               <year>2009</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>10</day>
               <month>7</month>
               <year>2009</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>21</day>
               <month>8</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>21</day>
               <month>8</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Dick et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Genome signatures in metagenomic datasets</p>
      </shorttitle>
      <shortabs>
         <p>Genome signatures are used to identify and cluster sequences de novo from an acid biofilm microbial community metagenomic dataset, revealing information about the low-abundance community members.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing &lt; 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010007">Ecology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The age of genomics has opened up new perspectives on the natural microbial world, offering insights into organisms that drive geochemical cycles and are critical to human and environmental health. The prevalence of horizontal gene transfer, recombination, and population-level genomic diversity underscores the dynamic nature of bacterial and archaeal genomes and demands reconsideration of fundamental issues such as microbial taxonomy <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp> and the concept of microbial species <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Application of genomics to uncultivated assemblages of microorganisms in natural environments ('metagenomics' or 'community genomics') has provided a new window into <it>in situ </it>microbial diversity and function <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. To date, community genomics has revealed the form and extent of recombination and heterogeneity in gene content <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, elucidated virus-host interactions <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, redefined the extent of genetic and biochemical diversity in the oceans <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, uncovered new metabolic capabilities <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> and taxonomic groups <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, and shown how functions are distributed across environmental gradients <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
         <p>An important approach to studying evolutionary and ecological processes, pioneered by Karlin and others <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, is the analysis of nucleotide compositional characteristics of genomes. The simplest and most widely used measure of nucleotide composition, the abundance of guanine plus cytosine (%GC), is shaped by multiple factors encompassing both neutral and selective processes. Neutral factors include intrinsic properties of the replication, repair, and recombination machinery that result in mutational biases <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Selective processes encompass both internal influences (for example, translation machinery) and external influences, such as physical (temperature, pressure), chemical (salinity, pH), and ecological factors (competition for metabolic resources <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and niche complexity <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>). Although the relative importance of these factors remains uncertain <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, it is clear that %GC varies widely between species but is relatively constant within species. Thus, %GC has been used to trace origins of DNA fragments within genomes <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and to assign fragmentary metagenomic sequences to candidate organisms <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Such inferences must be made with caution: %GC reduces nucleotide composition to a single parameter with known limitations for investigating genome dynamics <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
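         <p>For illustration only (this is not code from the study), %GC can be computed as the share of G and C among the unambiguous bases of a sequence. The function name gc_percent and the choice to ignore ambiguity codes such as N are assumptions made for this sketch.</p>
         <p>
```python
def gc_percent(seq: str) -> float:
    """Return %GC over the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguity codes such as N are left out (assumption)
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / total
```
         </p>
         <p>Because the result is a single number per sequence, two fragments with very different oligonucleotide composition can still share the same %GC, which is the limitation noted above.</p>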
         <p>Oligonucleotide frequencies capture species-specific characteristics of nucleotide composition more effectively than %GC <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Analyses of genome sequences from cultivated organisms have shown that the frequencies at which oligonucleotides occur differ between species but are conserved genome-wide within species <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Taken together, the frequencies of all oligonucleotides of a given length define the 'genome signature' (for example, the frequencies of all 256 possible tetranucleotides). Sequence signatures are evident in oligonucleotides ranging from di- (two-mers) to octanucleotides (eight-mers). While the specificity of genome signatures increases with oligonucleotide length <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, the number of possible oligomers increases exponentially with oligomer length, so signatures based on longer oligomers require calculations over larger genomic regions to achieve sufficient sampling. Genome signatures have been used to detect horizontally transferred DNA <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, reconstruct phylogenetic relationships <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B32">32</abbr><abbr bid="B40">40</abbr></abbrgrp>, and infer lifestyles of bacteriophage <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>.</p>
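         <p>A minimal sketch of how a signature of this kind can be tabulated, assuming overlapping windows counted on a single strand and skipping any window that contains an ambiguous base. The function name kmer_signature is hypothetical, and the counting and normalization used in the analyses reported here may differ (for example, in how the two strands are treated). For oligomers of length k there are 4<sup>k</sup> possible words, which gives 256 for k = 4.</p>
         <p>
```python
from itertools import product


def kmer_signature(seq: str, k: int = 4) -> dict:
    """Relative frequency of every k-mer, counted over overlapping windows."""
    # Enumerate all 4**k possible words so absent k-mers get an explicit zero.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers in sequence")
    return {kmer: n / total for kmer, n in counts.items()}
```
         </p>
         <p>A fragment of length L supplies only L - k + 1 windows, so a 2-kb fragment yields roughly 2,000 counts spread over 256 tetranucleotides, whereas octanucleotides would spread the same counts over 65,536 words. This is the sampling constraint that favors shorter oligomers for short fragments.</p>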
         <p>Genome signatures also offer a compelling means of assigning metagenomic sequence fragments to microbial taxa, a procedure termed 'binning' <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. This is a prerequisite for realizing some of the most valuable opportunities random shotgun metagenomics offers, including assignment of ecological and biogeochemical functions to particular community members and assessment of population-level genomic diversity and community structure. However, binning is a formidable challenge because: the inherent diversity of microbial communities typically limits genomic assembly, resulting in highly fragmentary data <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; there are few universally conserved phylogenetically informative markers, leaving the vast majority of metagenomic sequence fragments 'anonymous' with regard to their organism of origin; and current sequence databases grossly under-represent the microbial diversity in the natural world, limiting the utility of fragment recruitment or BLAST-based methods <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Consequently, it is important to develop methods that classify all genome sequence fragments independently of reference databases.</p>
         <p>Genome signatures are a promising approach for sequence classification. However, it is important to understand the source of the signal and how environmental effects and evolutionary distance will compromise it. To date, sequence signatures have been explored using genomes from cultivated microbes <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, and prospects for binning have been evaluated based largely on simulated datasets consisting of mixtures of isolate genomes <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. Although these studies are indispensable in that they allow theoretical evaluation of binning capability, they do not represent the diversity (community-wide and within population) and dynamics (for example, horizontal gene transfer, recombination, viruses) of real microbial communities. Further, they employ genomes derived from disparate environments and so do not address the extent to which environmental factors shape genome signatures. It has been reported that environment shapes nucleotide composition <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>. If so, then genome signatures may not discriminate coexisting, coevolving organisms, especially where environmental pressures are extreme. On the other hand, binning results of real microbial communities <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B48">48</abbr><abbr bid="B52">52</abbr></abbrgrp> are inherently difficult to evaluate because the true identity of most sequence fragments is unknown. Thus, there remain fundamental questions regarding the forces and processes that give rise to and maintain genome signatures, and the extent to which these signatures are obscured by shared environmental pressures and community interactions such as horizontal gene transfer and broad host range viruses.</p>
         <p>Here we present a comprehensive analysis of genome signatures in sequences derived from natural biofilms inhabiting a subsurface chemolithoautotrophic acid mine drainage (AMD) ecosystem in the Richmond Mine at Iron Mountain, CA <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. The biofilms are dominated by just a handful of organisms that are sustained primarily by the oxidation of Fe(II) derived from pyrite (FeS<sub>2</sub>) dissolution <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Due to this relatively low diversity, modest levels of shotgun sequencing (approximately 100 Mb per sample) have yielded deep genomic sampling (10 to 20&#215; sequence coverage) of the dominant populations, enabling reconstruction of 12 near-complete genomes from three samples <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp> (BJ Baker <it>et al</it>., submitted). These assembled composite genomes provide the organism affiliation of sequences with which binning accuracy can be evaluated. Therefore, the dataset allows assessment of binning performance while capturing sequence heterogeneity that is an intrinsic feature of natural microbial populations. We find that AMD biofilm microorganisms are indeed distinguished by population-specific genome signatures and show that sequence signatures can be used to identify and cluster sequences from low-abundance community members <it>de novo</it>, without reference genomes or reliance on databases. Our results have implications for metagenomic binning and provide new insights into the sources of genome signatures that distinguish coexisting populations.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Description of samples, community genomic sequencing and assembly</p>
            </st>
            <p>An overview of our methodology is shown in Figure <figr fid="F1">1</figr>. Community genomic sequence was obtained from two previously described biofilm samples from the UBA location of the Richmond Mine at Iron Mountain: a pink subaerial biofilm collected in June 2005 ('UBA') <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> and a thicker floating biofilm collected in November 2005 ('UBA BS') <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. These two biofilms contained overlapping subsets of organisms in different proportions. The UBA biofilm was dominated by bacterial <it>Leptospirillum </it>spp. group II and group III (<it>Nitrospirae</it>) populations, for which near-complete genomes have been reconstructed <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. The most abundant microorganisms represented in the UBA BS genomic data were from archaeal populations, including an uncultivated representative of a novel euryarchaeal lineage, ARMAN-2 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, and A-plasma, E-plasma, and I-plasma, members of the order Thermoplasmatales. To facilitate reconstruction of genomes from these and other lower-abundance organisms, a combined assembly included unassigned sequences from UBA and all sequences from UBA BS. Random shotgun sequences were derived from both ends of approximately 3-kb DNA fragments, and each fragment was likely sampled from a different individual cell with a potentially distinct genome sequence. Therefore, genome reconstructions represent composite sequences. However, single nucleotide polymorphism density was typically very low (&lt; 0.3%). For a small subset of the many cases where there were subpopulations with different gene content, alternative genome paths were also reconstructed <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B55">55</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Overview of samples, data, and methods</p>
               </caption>
               <text>
                  <p>Overview of samples, data, and methods. MDA, Multiple Displacement Amplification. Lo <it>et al</it>. 2007 <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>; Tyson <it>et al</it>. 2004 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>; Allen <it>et al</it>. 2007 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; Edwards <it>et al</it>. 2000 <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-1"/>
            </fig>
            <p>From the combined dataset, near-complete genomes were reconstructed for ARMAN-2, I-plasma, E-plasma, G-plasma, and A-plasma (Table <tblr tid="T1">1</tblr>). In addition to sequences that were assigned to these deeply sampled genomes, 14,700 sequences remained unassigned to any organism, including 7,030 contigs longer than 1.4 kb and 3,631 contigs longer than 2.0 kb. A number of shallowly sampled 16S rRNA gene-containing sequence fragments were recovered, indicating substantial sampling of diverse lower-abundance community members (Figure <figr fid="F2">2</figr>).</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Deeply sampled composite genomes from Iron Mountain community genomic datasets used in binning analysis</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Composite genome</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sample(s)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sequence (Mb)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Coverage*</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>G+C content</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Reference</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>I-plasma<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p>UBA, UBA BS</p>
                     </c>
                     <c ca="center">
                        <p>1.69</p>
                     </c>
                     <c ca="center">
                        <p>20&#215;</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>This study</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>E-plasma</p>
                     </c>
                     <c ca="left">
                        <p>UBA, UBA BS</p>
                     </c>
                     <c ca="center">
                        <p>1.58</p>
                     </c>
                     <c ca="center">
                        <p>9&#215;</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>This study</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A-plasma</p>
                     </c>
                     <c ca="left">
                        <p>UBA, UBA BS, UBA filtrate</p>
                     </c>
                     <c ca="center">
                        <p>1.94</p>
                     </c>
                     <c ca="center">
                        <p>8&#215;</p>
                     </c>
                     <c ca="center">
                        <p>46</p>
                     </c>
                     <c ca="center">
                        <p>This study</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>G-plasma</p>
                     </c>
                     <c ca="left">
                        <p>5-way, UBA</p>
                     </c>
                     <c ca="center">
                        <p>1.78</p>
                     </c>
                     <c ca="center">
                        <p>8&#215;</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>This study</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Leptospirillum </it>group II<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p>UBA</p>
                     </c>
                     <c ca="center">
                        <p>2.64</p>
                     </c>
                     <c ca="center">
                        <p>25&#215;</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B55">55</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Leptospirillum </it>group II<sup>&#8225;</sup></p>
                     </c>
                     <c ca="left">
                        <p>5-way</p>
                     </c>
                     <c ca="center">
                        <p>2.72</p>
                     </c>
                     <c ca="center">
                        <p>20&#215;</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B9">9</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Leptospirillum </it>group III<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p>UBA</p>
                     </c>
                     <c ca="center">
                        <p>2.82</p>
                     </c>
                     <c ca="center">
                        <p>10&#215;</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B56">56</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ferroplasma acidarmanus </it>fer1<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p>5-way</p>
                     </c>
                     <c ca="center">
                        <p>1.94</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B8">8</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ferroplasma </it>fer1(env)</p>
                     </c>
                     <c ca="left">
                        <p>5-way</p>
                     </c>
                     <c ca="center">
                        <p>1.46</p>
                     </c>
                     <c ca="center">
                        <p>4.5&#215;</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B8">8</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ferroplasma </it>fer2(env)</p>
                     </c>
                     <c ca="left">
                        <p>5-way</p>
                     </c>
                     <c ca="center">
                        <p>1.82</p>
                     </c>
                     <c ca="center">
                        <p>10&#215;</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B10">10</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ARMAN-2<sup>&#8224;</sup></p>
                     </c>
                     <c ca="left">
                        <p>UBA, UBA BS</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                     <c ca="center">
                        <p>15&#215;</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="center">
                        <p>Baker <it>et al</it>., submitted</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ARMAN-4</p>
                     </c>
                     <c ca="left">
                        <p>UBA filtrate</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>8&#215;</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>Baker <it>et al</it>., submitted</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ARMAN-5</p>
                     </c>
                     <c ca="left">
                        <p>UBA filtrate</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>8&#215;</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>Baker <it>et al</it>., submitted</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Viral genomes</p>
                     </c>
                     <c ca="left">
                        <p>UBA, UBA BS</p>
                     </c>
                     <c ca="center">
                        <p>Variable</p>
                     </c>
                     <c ca="center">
                        <p>Variable</p>
                     </c>
                     <c ca="center">
                        <p>Variable</p>
                     </c>
                     <c ca="center">
                        <p>
                           <abbrgrp>
                              <abbr bid="B12">12</abbr>
                           </abbrgrp>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Estimated sequence coverage (read depth). <sup>&#8224;</sup>Genomes used for evaluation of binning performance on variable length fragments. <sup>&#8225;</sup>The <it>Leptospirillum </it>group II 5-way genome was included in some ESOM binning and was indistinguishable from the <it>Leptospirillum </it>group II UBA genome, but is not shown in Figure 2. NA, not applicable.</p>
               </tblfn>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Phylogenetic tree of 16S rRNA gene sequences from Iron Mountain community genome sequencing (red) and selected sequences from cultivated organisms</p>
               </caption>
               <text>
                  <p>Phylogenetic tree of 16S rRNA gene sequences from Iron Mountain community genome sequencing (red) and selected sequences from cultivated organisms. <it>Ferroplasma </it>types I/II are not shown due to their near-identical sequences to <it>F. acidarmanus</it>. Sequences for which only partial coverage of the 16S rRNA gene was obtained are not shown, including ARMAN-5, a gammaproteobacterium, additional Actinobacteria, and <it>Sulfobacillus</it>-like sequences.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Clustering sequences by tetranucleotide frequency and emergent self-organizing map</p>
            </st>
            <p>We constructed a dataset that contained all sequences from the combined assembly (assigned and unassigned), previously assembled composite genome sequences, and the genome sequence from <it>Ferroplasma acidarmanus </it>fer1, which was cultivated from AMD solutions in the Richmond Mine <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B57">57</abbr></abbrgrp> (Figure <figr fid="F1">1</figr>, Table <tblr tid="T1">1</tblr>). To analyze the distribution of genome signatures among and between populations, all contigs and assembled genomes were fragmented into 5-kb pieces, then pooled and clustered by self-organizing map (SOM) <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> based on tetranucleotide frequency distributions (Figure <figr fid="F1">1</figr>; see Materials and methods for details). The SOM is an unsupervised neural network algorithm that clusters multidimensional data and represents it on a two-dimensional map. SOMs of tetranucleotide frequencies have been used previously to successfully bin sequence fragments from isolate genomes <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B59">59</abbr></abbrgrp> and some environmental samples <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B48">48</abbr><abbr bid="B52">52</abbr></abbrgrp>. We utilized an implementation of the SOM, emergent SOM (ESOM), which is distinguished by its use of large borderless maps (for example, thousands of neurons) and visualization of underlying distance structure with background topography <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. This visualization, where map 'elevation' represents the distance in tetranucleotide frequency between data points, is referred to as the U-Matrix <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. 
Thus, genomic clusters were visualized not only by the cohesive clustering of fragments from each genome, but also by distance structure whereby barriers between clusters represent the large differences in genome signatures between genomes relative to those within genomes (Figure <figr fid="F3">3</figr>). This visualization of genomic clustering was used to evaluate the accuracy of the binning based on assembled genomes and to identify novel regions of sequence signature space.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>ESOM of genomic sequence fragments based on tetranucleotide frequency (5-kb window size; all contigs > 2 kb were considered)</p>
               </caption>
               <text>
                  <p>ESOM of genomic sequence fragments based on tetranucleotide frequency (5-kb window size; all contigs > 2 kb were considered). Note that the map is continuous from top to bottom and side to side. <b>(a) </b>Each point represents a sequence fragment; sequences whose origin is known (from assembly information) are colored as indicated below. Unassigned sequences are shown in green. Regions are numbered as follows: (1) ARMAN-2, brown; (2) <it>Ferroplasma </it>(<it>F. acidarmanus </it>fer1, dark orange; fer1(env), orange; fer2(env), light orange); (3) I-plasma, purple; (4) <it>Leptospirillum </it>group II, light blue; (5) <it>Leptospirillum </it>group III, pink; (6) A-plasma, navy blue; (7) E-plasma, light purple; (8) G-plasma, turquoise; (9) ARMAN-4, black; (10) ARMAN-5, red. Regions 11 to 17 are novel genomic regions identified in this study: (11) putative <it>Leptospirillum </it>plasmid; (12) A-plasma variant and C-plasma; (13) D-plasma; (14) <it>Leptospirillum </it>group III variant; (15) an actinobacterium; (16) mixed Actinobacteria; (17) mixed low-abundance bacteria, including <it>Sulfobacillus </it>spp., other <it>Firmicutes</it>, and a gammaproteobacterium. <b>(b) </b>Topography (U-Matrix) representing the structure of the underlying tetranucleotide frequency data from (a). 'Elevation' represents the difference in tetranucleotide frequency profile between nodes of the ESOM matrix (see legend); high 'elevations' (brown, white) indicate large differences in tetranucleotide frequency and thus represent natural divisions between taxonomic groups.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-3"/>
            </fig>
            <p>Inspection of the clustering results in light of assembly information provided a broad measure of the ability of tetranucleotide frequency-based ESOM (tetra-ESOM) to resolve sequences from coexisting populations of the community. To quantify the degree of segregation of fragments from genomes at various evolutionary distances, we adapted a method using fixed point kernel densities (Figure <figr fid="F4">4</figr>; Additional data file 1). We found that sequence fragments from closely related strains or species could not be distinguished. For example, two strains of <it>F. acidarmanus </it>sharing 97% average nucleotide identity (fer1 and fer1(env) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>) mapped directly on top of each other, as did two types of <it>Leptospirillum </it>group II, which share 95% average nucleotide identity <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> (only one type of <it>Leptospirillum </it>group II is shown in Figure <figr fid="F3">3</figr> for this reason; Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr>). Sequences from <it>Ferroplasma </it>types I and II, which share 83% average nucleotide identity and are known to participate in homologous recombination <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, were segregated to some extent by tetra-ESOM, but type II was split and there was no well-defined boundary between the two types. Good separation of <it>Leptospirillum </it>groups II and III was achieved, except for certain genomic regions containing mobile elements, as described further below. Among members of the Thermoplasmatales, populations were distinguished by genome signatures but borders were variably well-defined (Figure <figr fid="F3">3</figr>). In particular, G- and E-plasma were not well resolved. I-plasma, which is quite divergent from the other Thermoplasmatales (Figure <figr fid="F2">2</figr>), was the only member of the Thermoplasmatales for which a distance-based border was clearly delineated. 
Although genomes with similar %GC were generally more difficult to separate, several genomes with near-identical %GC were easily separated (for example, G-plasma versus <it>Ferroplasma</it>) (Figures <figr fid="F3">3</figr> and <figr fid="F4">4</figr>).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Ability of tetra-ESOM to resolve AMD populations as a function of evolutionary distance (average amino acid identity) and %GC</p>
               </caption>
               <text>
                  <p>Ability of tetra-ESOM to resolve AMD populations as a function of evolutionary distance (average amino acid identity) and %GC. Black points represent comparisons between genomes whose %GC differs by > 2%; red points represent genome pairs whose %GC differs by &lt; 2%. These data were collected using a 5-kb window size and 2-kb cutoff length.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-4"/>
            </fig>
            <p>To quantitatively evaluate binning performance on sequence fragments of different lengths, tetra-ESOMs were run on the same dataset (including unassigned sequences and reconstructed composite genomes) but with sequences broken into various fragment sizes. Binning accuracy was calculated for a subset of genomes for which deeply sampled and manually curated assemblies are available (Additional data file 2). For sequence fragments 5 kb or larger, sensitivity (percentage of fragments from each genome correctly identified) and precision (percentage of fragments in each bin belonging to the correct genome) rates of > 90% were achieved (Additional data file 2). Sensitivity was somewhat lower for <it>Leptospirillum </it>groups II and III due to poor resolution of certain genomic regions between these two populations. When <it>Leptospirillum </it>was considered as a single group, binning sensitivity was comparable to the other reference genomes. Sensitivity decreased notably only when shorter (&lt; 5 kb) sequence fragments were analyzed, but precision remained remarkably high even for 1,400-bp fragments (Additional data file 2). Lower sensitivity is due to sequence fragments that fall between clusters, beyond the borders of any bin. Notably, the tetra-ESOM correctly assigned sequence fragments as short as 500 bp, provided that some larger fragments were included in the analysis (Additional data file 2b). To address the question of how genome completeness influences performance, genomes randomly subsampled at different levels were analyzed by tetra-ESOM. Binning accuracy was maintained even when only 20% of the genome sequence was sampled; only at 10% subsampling was a notable decline observed, and even then only for certain genomes (Additional data file 3).</p>
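            <p>The two accuracy measures defined above can be illustrated with a minimal sketch. It is not the procedure used to produce Additional data file 2; the function name binning_accuracy, the toy labels, and the convention that each bin carries the name of the genome it represents (with None marking a fragment that falls outside every bin) are assumptions made for illustration only.</p>
            <p>
```python
from collections import Counter


def binning_accuracy(true_genomes, assigned_bins):
    """Return {genome: (sensitivity, precision)} as percentages.

    true_genomes  -- genome of origin of each fragment
    assigned_bins -- bin label of each fragment, or None when the fragment
                     fell outside the borders of every bin (assumed convention)
    """
    n_true = Counter(true_genomes)
    n_in_bin = Counter(b for b in assigned_bins if b is not None)
    n_correct = Counter(
        t for t, b in zip(true_genomes, assigned_bins) if t == b
    )
    result = {}
    for genome, total in n_true.items():
        # Sensitivity: share of the genome's fragments placed in its own bin.
        sensitivity = 100.0 * n_correct[genome] / total
        # Precision: share of the bin's fragments that come from that genome.
        # It is undefined for an empty bin, so None is reported in that case.
        in_bin = n_in_bin[genome]
        precision = 100.0 * n_correct[genome] / in_bin if in_bin else None
        result[genome] = (sensitivity, precision)
    return result


# Toy data: one fragment of A is unbinned, one fragment of B lands in bin A.
truth = ["A", "A", "A", "A", "B", "B"]
bins = ["A", "A", "A", None, "B", "A"]
print(binning_accuracy(truth, bins))
```
            </p>
            <p>In this toy case the unbinned fragment lowers the sensitivity for genome A without affecting the precision of any bin, which mirrors the observation that fragments falling between clusters reduce sensitivity while precision stays high.</p>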
            <p>Incorrectly assigned fragments often contained mobile elements or other features expected to have atypical nucleotide composition. The majority (54 of 94) of incorrectly binned fragments from all five reference genomes show evidence of transposons, prophage, or integrated plasmids. Other frequently unresolved genomic regions contain CRISPR elements <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> and rRNA genes, both of which have constrained sequences and thus atypical tetranucleotide patterns <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The region of the ESOM map containing a mixture of <it>Leptospirillum </it>groups II and III (Figure <figr fid="F3">3</figr>) was dominated by fragments (80 of 92) encoding mobile elements that may be exchangeable between the two <it>Leptospirillum </it>groups (for example, integrated plasmid-like sequence <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>) and strain/group-unique regions believed to have been recently acquired (for example, prophage).</p>
            <p>Interestingly, many strain-unique regions were correctly binned with their host genomes. There are 197 strain-unique genes between the fer1 and fer1(env) genomes, the majority of which occur in distinct genomic blocks of up to 24 genes with atypical %GC content inferred to be the result of prophage insertion <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Ninety-six percent (22 of 23) of sequence fragments containing these genomic islands were accurately assigned as <it>Ferroplasma </it>in our binning analysis.</p>
         </sec>
         <sec>
            <st>
               <p>Genome signatures of low-abundance community members and viruses</p>
            </st>
            <p>The tetra-ESOM revealed large regions of the map that were devoid of sequence fragments of known organism affiliation (Figure <figr fid="F3">3</figr>, regions 11 to 17). We used mate pair linkage with rRNA gene-containing contigs, phylogenetic analysis, and/or close relatedness (synteny and identity) to other community members to identify these bins as follows: a new type of <it>Leptospirillum </it>most closely related to <it>Leptospirillum ferrodiazotrophum </it>(group III); several members of the Thermoplasmatales for which genomic sequence had not been previously obtained (C-plasma, D-plasma, and a divergent type of A-plasma); several Actinobacteria; and multiple more shallowly sampled populations, including a gammaproteobacterium and several <it>Sulfobacillus</it>-like organisms (Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). A small, prominent region of the map adjacent to the <it>Leptospirillum </it>groups contained approximately 250 kb of composite sequence (Figure <figr fid="F3">3</figr>, region 11) inferred to be a <it>Leptospirillum </it>plasmid <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Tetranucleotide usage patterns of this putative plasmid are quite distinct from those of either <it>Leptospirillum </it>group (Additional data file 4).</p>
            <p>We calculated tetranucleotide frequencies for viral genomes that were recently reconstructed from the same genomic datasets and linked to their hosts via CRISPR viral resistance system sequences (Additional data file 4) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Three of the viruses closely resemble their hosts' tetranucleotide usage (AMDV1, <it>Leptospirillum </it>groups II and III; AMDV4, E-plasma; AMDV3, A-/E-/G-plasma), a trend that has been observed previously for cultivated viruses and hosts <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B63">63</abbr></abbrgrp>. Interestingly, two viruses have very different tetranucleotide frequency patterns (AMDV2, E-plasma; AMDV5, I-plasma; Additional data file 4).</p>
         </sec>
         <sec>
            <st>
               <p>Characteristics of genome signatures</p>
            </st>
            <p>As expected, the frequency at which each tetranucleotide occurs is related to overall %GC: GC-rich tetranucleotides are abundant in high-GC genomes and uncommon in low-GC genomes. However, patterns of tetranucleotide usage extend beyond trends in %GC (Additional data file 4) and genomes with near-identical %GC were effectively segregated by tetra-ESOM. Because tetranucleotide frequencies are calculated with a 1-bp sliding window and reverse complementary pairs of tetranucleotides are summed together, all possible reading frames on both strands are sampled. In addition to spanning complete single codons, adjacent pairs of partial codons are also sampled (Figure <figr fid="F5">5</figr>). Therefore, tetranucleotide frequency captures amino acid composition and synonymous codon usage, as well as information regarding avoidance of certain adjacent codons ('codon pair bias' <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>).</p>
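            <p>The counting scheme described above (a 1-bp sliding window with reverse complementary tetranucleotides summed together) can be sketched as follows. This is an illustrative sketch rather than the implementation used in this study: the names revcomp, canonical and tetra_frequencies are hypothetical, and both the choice of the alphabetically smaller member of each reverse complementary pair as its label and the skipping of windows containing ambiguous bases are assumptions.</p>
            <p>
```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Label shared by a tetranucleotide and its reverse complement
    (assumed convention: the alphabetically smaller of the two)."""
    return min(kmer, revcomp(kmer))


# The 256 tetranucleotides collapse to 136 labels: 16 are their own reverse
# complement and the remaining 240 pair up into 120.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)


def tetra_frequencies(seq):
    """Relative tetranucleotide frequencies from a 1-bp sliding window."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Assumption: windows containing N or other ambiguity codes are skipped.
        if set(kmer).issubset("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in CANONICAL_TETRAMERS}


fragment = "ATGGCGTACGATCGGCCTTAAGC"
print(len(CANONICAL_TETRAMERS))
print(tetra_frequencies(fragment) == tetra_frequencies(revcomp(fragment)))
```
            </p>
            <p>Because each window is reduced to a strand-independent label, a fragment and its reverse complement yield the same frequency vector, so the orientation in which a contig happens to be assembled does not affect the result.</p>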
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Schematic of how tetranucleotide frequency relates to reading frame and potential codons</p>
               </caption>
               <text>
                  <p>Schematic of how tetranucleotide frequency relates to reading frame and potential codons. <b>(a) </b>Tetranucleotide frequencies are calculated independently of reading frame with a 1-bp sliding window; thus, they may sample a complete codon or span two partial codons. <b>(b) </b>Because reverse complementary pairs are summed together, both strands are sampled. Therefore, depending on the coding strand and reading frame, there are 12 potential codons sampled by each tetranucleotide.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-5"/>
            </fig>
            <p>To assess the contributions of these potential sources of genome signature signal, we compared SOMs based on amino acid composition, codon composition, and tetranucleotide frequency. Amino acid composition alone distinguished certain genomes (Additional data file 5). This was especially true for phylogenetically distant organisms (for example, archaea versus bacteria), but some separation was also apparent among groups within some lineages such as <it>Ferroplasma </it>versus other Thermoplasmatales. SOMs based on codon composition were notably more accurate than amino acid composition and comparable to those based on tetranucleotide frequency (Additional data file 5).</p>
            <p>Additional features of the relationship between codon composition and tetranucleotide frequency were revealed by comparing the observed frequency of tetranucleotides to the frequency predicted from genome-wide codon usage (see Materials and methods). Observed and predicted tetranucleotide frequency correlated strongly (Figure <figr fid="F6">6</figr>), and differences in the frequencies of individual tetranucleotides between genomes are correlated with differences in corresponding codon usage between genomes (Additional data file 6). Exceptions to this trend are primarily palindromic tetranucleotides that occur less frequently than predicted (Figure <figr fid="F6">6b</figr>). Five of the 16 possible palindromic tetranucleotides are most strongly and consistently underrepresented: AATT, ATAT, TATA, GATC, and GGCC. The extent to which palindromic tetranucleotides are avoided in both viral and microbial genomes varies significantly and thus could be a factor in defining genome signatures (Additional data file 4). To test this possibility, we visualized the SOM distance structure for only one tetranucleotide at a time and found that certain palindromic tetranucleotides (GATC, TATA, ATAT) are particularly informative in distinguishing members of the Thermoplasmatales that share near-identical %GC (<it>Ferroplasma </it>types I and II, G-plasma, E-plasma). However, SOMs run excluding all 16 palindromic tetranucleotides distinguished populations with accuracy comparable to that achieved using all tetranucleotides, indicating that palindrome avoidance is not a primary component of the genome signature.</p>
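            <p>The set of 16 palindromic tetranucleotides referred to above (sequences identical to their own reverse complement) can be enumerated directly, as in the sketch below. The helper drop_palindromes is hypothetical; in particular, renormalizing the remaining frequencies so that they sum to 1 is an assumption for illustration, not a description of how the palindrome-excluded SOMs in this study were prepared.</p>
            <p>
```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


ALL_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

# A tetranucleotide is palindromic when it equals its own reverse complement.
PALINDROMES = [t for t in ALL_TETRAMERS if t == revcomp(t)]


def drop_palindromes(freqs):
    """Remove palindromic tetranucleotides from a frequency table and
    renormalize the remainder to sum to 1 (assumed convention)."""
    kept = {k: v for k, v in freqs.items() if k not in PALINDROMES}
    total = sum(kept.values())
    return {k: (v / total if total else 0.0) for k, v in kept.items()}


print(len(PALINDROMES))
print(PALINDROMES)
```
            </p>
            <p>The first two bases of a palindromic tetranucleotide determine the last two, which is why exactly 4 x 4 = 16 of the 256 tetranucleotides qualify; AATT, ATAT, TATA, GATC and GGCC are all members of this set.</p>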
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Tetranucleotide frequency predicted by codon abundance (a weighted average of the frequencies of the 12 potential codons associated with each tetranucleotide) versus observed tetranucleotide frequency</p>
               </caption>
               <text>
                  <p>Tetranucleotide frequency predicted by codon abundance (a weighted average of the frequencies of the 12 potential codons associated with each tetranucleotide) versus observed tetranucleotide frequency. <b>(a) </b>Color indicates the genome of origin (using the same color scheme as Figure 3). <b>(b) </b>Palindromic tetranucleotides are indicated in red. R<sup>2</sup> indicates the square of the Pearson correlation coefficient.</p>
               </text>
               <graphic file="gb-2009-10-8-r85-6"/>
            </fig>
            <p>The correlation of genome signatures with codon usage raises the question of whether they persist in intergenic regions. Thus, we extracted intergenic regions from assembled and annotated genomes and analyzed them with coding regions by tetra-ESOM (intergenic regions were concatenated to tally tetranucleotide frequencies but care was taken to avoid artifacts; see Materials and methods). Intergenic regions from each genome formed discrete, cohesive clusters that mapped adjacent to coding regions from the same genome but were separated by U-Matrix boundaries (Additional data file 7). Intergenic sequences from each genome were grouped based on length, concatenated, and analyzed by ESOM; all size classes of intergenic regions from the same genome clustered together regardless of length, from the shortest (4 to 20 bp) to longest (> 1,000 bp) (data not shown). The noncoding complement of each Thermoplasmatales genome formed a distinct cluster adjacent to noncoding regions of the other Thermoplasmatales. The only outlier to this trend was A-plasma, which has the highest %GC among these organisms. Based on U-Matrix background, the distance between noncoding sequences of different genomes is comparable to the distance between noncoding and coding sequences of the same genome. To determine if the presence of noncoding sequence influences binning accuracy in the initial experiments, we calculated the percentage of coding sequence on incorrectly binned fragments from the five reference genomes (5 kb and 1 kb window sizes). For many genomes, the incorrectly binned fragments do indeed have a smaller average percentage of coding sequence. However, this percentage varied widely on incorrectly binned fragments. Only a small fraction of such fragments had a percentage of coding sequence smaller than one standard deviation below the genome-wide average (Additional data file 8).</p>
            <p>For sequence signatures to differentiate populations in a genome-wide manner, it is necessary that within-genome differences resulting from atypical regions of amino acid and/or synonymous codon usage are smaller than between-genome differences. This issue is especially relevant in AMD, where proteins are under diverse constraints depending on whether they function in the extracellular (around pH 1) or intracellular (around pH 5) environment <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Indeed, proteins from the AMD populations in these two fractions have disparate isoelectric points owing to the unique amino acid composition of acid-stable proteins <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. We identified 106 <it>Leptospirillum </it>group II-UBA proteins that are consistently enriched in the extracellular fraction according to environmental shotgun proteomics data <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B66">66</abbr></abbrgrp> and compared sequence signatures of their genes with the other 2,522 <it>Leptospirillum </it>group II genes. No systematic differences were detected via tetra-ESOM, suggesting that genome signatures persist even when gene sequences are influenced by considerable protein-coding constraints (Additional data file 9).</p>
            <p>Selection for codons that optimize translation rate may also influence codon usage. We analyzed genome signatures for the 50 <it>Leptospirillum </it>group II proteins most abundantly detected via environmental shotgun proteomics <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B66">66</abbr></abbrgrp>. With the exception of one subset of genes encoding mainly ribosomal proteins (which mapped into the mixed region between <it>Leptospirillum </it>groups II and III), highly expressed genes clustered with the rest of the genome (Additional data file 9).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Through analysis of a deeply sampled and extensively curated community genomic dataset, we have demonstrated that genome signatures can be used to differentiate coexisting microbial populations despite functional and environmental constraints, processes such as lateral gene transfer, and pressures imposed by viral predation that might have diminished them to the point that they are no longer diagnostic. The genome-wide nature of the signatures makes them potentially useful for classification of sequence fragments. Results from our AMD dataset show that the signal can be detected on fragments as small as 500 bp, and that genome clusters can be defined using fragments as short as 1,400 bp (Additional data file 2) and using only a small fraction of the genome (Additional data file 3). These findings suggest broad applicability of the tetra-ESOM approach for metagenomic studies. However, in order to understand and predict its utility for binning, it is important to identify sources of genome signatures as well as processes that are likely to diminish the signal.</p>
         <sec>
            <st>
               <p>Insights into the sources of distinctive genome signatures</p>
            </st>
            <p>It has been suggested that environmental constraints strongly shape nucleotide composition <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>. If this were the case, two effects should be apparent in genome signatures of AMD populations. First, shared pressures deriving from the extreme AMD environment would drive genome signatures together, potentially obscuring differences between populations. Second, since each genome encodes proteins destined for diverse environments (that is, intracellular and extracellular), there should be prominent intra-genome variation of genome signature and scattering of fragments from the same genome into disparate regions of the SOM. Neither of these expectations is met in the AMD dataset. There are vast differences in nucleotide composition between populations, with genomic %GC ranging from 35% (ARMAN-4 and ARMAN-5) to 69% (low-abundance Actinobacteria) and genome signatures forming discrete clusters. Amino acid compositional constraints required for stability of proteins exposed to acidic solutions do not result in sequence signatures that are markedly distinct from the rest of the genome. In other words, within-population differences in genome signature are small relative to differences between populations. Although we do not rule out some environmental influence on genome signatures, we conclude that, in AMD, this influence is not strong enough to obscure differences between populations. Similar community-wide analyses need to be conducted in other systems to determine whether our findings extend to other extremophilic microbial communities.</p>
            <p>Our results show that genome signatures are related to several traits, including %GC, amino acid composition, synonymous codon usage, and palindrome avoidance. These characteristics are interrelated and further connected to a host of biochemical, ecological, and evolutionary processes (Additional data file 10). Large differences in %GC and/or amino acid composition guarantee distinctive genome signatures but are not required to differentiate genomes. At finer evolutionary scales, where %GC and amino acid composition are not informative, populations can be readily distinguished through subtle differences in tetranucleotide frequency, which correlate with genome-specific synonymous codon usage. Tetra-ESOM analyses based on codon usage and tetranucleotide frequency displayed similar clustering resolution, indicating that little signal derives from longer-range characteristics such as codon pair bias. It should be noted, however, that using tetranucleotide frequency rather than codon composition has practical advantages for binning because it is independent of coding strand and reading frame and thus insensitive to errors in gene-calling or frame shifts due to poor quality sequence. These issues are particularly important for short, low-coverage sequence fragments.</p>
            <p>Although genome signatures are largely manifested through codon composition, the observation that population-specific signatures also occur in non-coding regions (Additional data file 7) suggests a mechanism of generation that is independent of protein coding. We hypothesize this underlying process is mutational bias associated with DNA replication and repair, which exerts directional pressure on nucleotide composition <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In fact, between-genome codon biases can be predicted solely by %GC and context-dependent nucleotide biases (that is, mutation rates at each site are dependent on the identity of neighboring nucleotides) calculated from non-coding regions <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp>. It is interesting to note that non-coding regions mapped into discrete clusters, distinct from coding regions of the same genome or non-coding regions of different genomes, including those with identical %GC. Differences in genome signature of coding and non-coding sequences from the same genome are to be expected based on differing functional constraints on these regions (for example, coding amino acids versus small RNAs or regulatory elements such as promoters). The distinction of non-coding regions from different genomes is consistent with genome-specific mutational biases.</p>
            <p>An alternative to the mutation bias hypothesis, at least for coding sequences, is that genome signatures are shaped by factors related to translation. Changes in codon usage can be driven by changes in the tRNA gene complement <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp> that may occur, for example, through interaction with plasmids and viruses <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. However, we found AMD genomes with distinct genome signatures, such as G-plasma, E-plasma, and <it>Ferroplasma</it>, that have only minor differences in tRNA gene content, and these differences do not correspond to observed differences in codon usage. In addition to tRNA gene complement, there may be changes in tRNA gene regulation, which can significantly impact cellular tRNA concentrations and have been correlated with changes in codon usage <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. Thus, although we cannot rule out a tRNA regulatory influence on genome signatures, our findings suggest that coevolution of tRNA gene content and codon usage is not a primary mechanism underlying the divergence of genome signatures in related AMD populations.</p>
            <p>Codon bias can also arise as the result of selection for certain codons that are optimal for fast and/or accurate translation <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. This form of codon bias primarily influences the subset of genes encoding highly expressed proteins, is prevalent for fast-growing organisms <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B74">74</abbr></abbrgrp>, and correlates with ecological strategy <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>. In fact, a <it>Leptospirillum </it>group II genome fragment encoding nine ribosomal proteins and two translation elongation factors had distinctive tetranucleotide composition, indicating that this mode of codon bias occurs in AMD organisms. However, as commonly construed, translational selection would influence within-genome codon bias, not the genome-wide codon biases that differentiate populations as observed in our study. It is tempting to speculate that differences in ecological strategy (for example, response rate to resource availability <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>) could have genome-wide influence on codon usage, but there is currently no evidence in our dataset to suggest that this is the case.</p>
            <p>Finally, restriction avoidance places another selective genome-wide constraint on DNA composition that may contribute to genome signatures. Under-representation of palindromic tetranucleotides (Figure <figr fid="F6">6</figr>) has been attributed to avoidance of enzymes that recognize and degrade foreign DNA <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B32">32</abbr><abbr bid="B46">46</abbr></abbrgrp>. Our data show that palindrome avoidance contributes to the genome signature but is not the sole or even primary determinant. Most archaeal viruses and bacteriophages have sequence signatures that resemble those of their hosts, including avoidance of specific subsets of palindromes. However, mismatches between the tetranucleotide signatures of AMDV2 and AMDV5 and their respective hosts point to the lesser importance of palindrome avoidance in these organisms. In the case of AMDV5, other evidence suggests a recent alteration in host range <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. It is interesting to note that the genomes of archaeal AMD viruses encode several restriction modification (RM) system genes. These may have significance for virus-host interactions <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> and for influencing genome signatures. Broad host range viruses or viruses that jump to new hosts can potentially drive changes in the host sequence signatures if they replace or supplement the restriction systems of the host. Alternatively, the degree of similarity in tetranucleotide signatures of viruses and their hosts may be a function of the extent to which the virus relies upon its host's replication and translation machinery (for example, associated with a lysogenic versus lytic lifestyle) <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B63">63</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Implications for metagenomic, ecological, and evolutionary studies</p>
            </st>
            <p>Due to the high levels of diversity in most natural systems, random sequencing approaches yield fragmentary data, often comprising genomic sequences no more than a few kilobases in length. While more comprehensive coverage of individual organisms can be achieved by single cell genomics <abbrgrp><abbr bid="B78">78</abbr><abbr bid="B79">79</abbr><abbr bid="B80">80</abbr></abbrgrp> or targeted, large-insert approaches <abbrgrp><abbr bid="B81">81</abbr><abbr bid="B82">82</abbr></abbrgrp>, random shotgun approaches retain two important advantages: the random nature provides insights that are unbiased by preconceived notions of community composition; and population-level variation is captured because each sequencing read derives from a different individual cell.</p>
            <p>A key challenge for virtually all shotgun metagenomics investigations is the assignment of genome fragments to the organisms from which they derive. This step links organism to metabolism and function and is essential if we are to understand microbial community dynamics and predict ecosystem-level impacts of changes in community membership and structure. Binning is particularly challenging for lower-abundance organisms, which may play keystone roles that are critical to ecosystem function. Thus, our finding that tetra-ESOM can resolve the phylogenetic affiliation of genome fragments on the scale of two mate-paired reads is of great significance. This approach has clear applicability to low-complexity datasets such as those derived from our AMD biofilms, bioreactors <abbrgrp><abbr bid="B83">83</abbr></abbrgrp>, and enrichment cultures <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>. In fact, even for the relatively extensively analyzed AMD dataset, it revealed multiple new genomic clusters, including a near-complete genome of a novel actinobacterium (GJ Dick <it>et al</it>., in preparation), a putative plasmid, and many discrete but less well-sampled populations.</p>
            <p>Tetra-ESOM may also provide a powerful method for analysis of unassembled data from complex samples such as soil, seawater, and the human microbiome if representative isolate genomes are available. The feasibility of binning metagenomic sequences from complex samples using reference genomes will increase with current initiatives to fill in the phylogenetic tree with genome sequences from cultivated microorganisms.</p>
            <p>An important advantage of unsupervised, composition-based approaches such as tetra-ESOM is that gene sequences need not be represented in databases to be identified; only representation of the genome signature is required. This is in contrast to fragment recruitment <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and BLAST-based binning approaches that only work for homologous sequences. We found that clusters of a few hundred kilobases of sequence (as little as 20% of the genome) were resolved, suggesting that a few fosmids or bacterial artificial chromosomes linked to 16S rRNA genes can be sufficient to serve as a reference to define a bin. Thus, recent progress in using large-insert metagenomic libraries to link 16S rRNA genes to genomic sequence from diverse uncultivated microorganisms is very valuable in this regard <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>.</p>
            <p>Because the reach of composition-based approaches to binning extends beyond gene content of reference genomes, they hold great promise for identifying and classifying genes from the variable fraction of the pan-genome (present in only a subset of strains or species), an important determinant of pathogenicity and niche differentiation <abbrgrp><abbr bid="B86">86</abbr><abbr bid="B87">87</abbr><abbr bid="B88">88</abbr></abbrgrp>. In AMD populations, genome reconstruction has shown that this strain-variable fraction often involves inserted plasmid and virus sequences <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. In the current study, these integrated elements clustered either with the host genome or in regions shared between different species or genera. Since horizontally transferred DNA is rapidly converted to the genome signature of its new host <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B28">28</abbr><abbr bid="B89">89</abbr></abbrgrp>, the extent to which such genomic regions reflect the genome-wide signature of nucleotide composition is likely a function of the donor of the genetic material and how recently they were acquired. Recently acquired sequences with distinctive tetranucleotide patterns may bin incorrectly, and unexpected binning outcomes can be used to identify laterally transferred regions <abbrgrp><abbr bid="B62">62</abbr><abbr bid="B90">90</abbr></abbrgrp>.</p>
            <p>Although the tetra-ESOM method works well to separate sequence fragments from organisms distinct at the genus or higher level, it has some limitations. Tetra-ESOM is generally unable to distinguish closely related species or strains. An important question, especially for more diverse samples, is whether limitations in genome sequence signature space will impose an inherent constraint on the number of populations that can be resolved. There are a staggering 6 &#215; 10<sup>222</sup> ways to code for a typical protein in our samples (based on an average protein size of 467 amino acids and assuming an average of 3 possible ways to code for any amino acid). This richness of protein coding space suggests ample capacity for numerous genome signatures. To date, SOMs have shown promising results in resolving up to 81 complete genomes, in successfully classifying fragments of 1,502 genomes into phylogenetic groups, and in visualizing phylogenetic clustering of sequences in complex environmental samples <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. However, it remains difficult to assess the accuracy and phylogenetic resolution of oligonucleotide-based SOMs on metagenomic datasets from diverse natural microbial communities. Another concern is computational demand. Continued increases in processor speeds will likely need to be supplemented with more efficient and/or accurate algorithms such as the recently introduced hyperbolic SOM <abbrgrp><abbr bid="B91">91</abbr></abbrgrp> and growing SOM <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
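            <p>The size of this coding space can be checked with a short calculation. The sketch below simply raises the assumed average of 3 synonymous codons to the power of the 467-residue average protein length; the function and variable names are illustrative and are not part of the original analysis.</p>

```python
import math

AVG_PROTEIN_LENGTH = 467    # amino acids, as stated in the text
AVG_SYNONYMOUS_CODONS = 3   # assumed average number of codons per amino acid


def coding_space_log10(length: int, codons_per_residue: float) -> float:
    """Return log10 of the number of distinct ways to encode one protein."""
    return length * math.log10(codons_per_residue)


log10_ways = coding_space_log10(AVG_PROTEIN_LENGTH, AVG_SYNONYMOUS_CODONS)
exponent = math.floor(log10_ways)
mantissa = 10 ** (log10_ways - exponent)
print(f"about {mantissa:.1f} x 10^{exponent} possible encodings")
```

            <p>Working in log space keeps the numbers manageable; the result has an exponent of 222, in line with the order of magnitude quoted above.</p>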
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Bacterial, archaeal, and viral populations in the AMD biofilm community have genome-wide signatures of nucleotide composition that are effectively captured and visualized through self-organizing maps of tetranucleotide frequency. We conclude that even under extremely acidic conditions, shared environmental pressure does not obscure genome signatures of nucleotide composition. Our data point to pervasive mechanisms of generating and maintaining genome signatures; although a variety of factors and processes contribute, we propose that mutational bias is the primary underlying mechanism driving the divergence of genome signature between closely related organisms. The resulting signal, evident through synonymous codon usage, is genome-wide and sufficiently diagnostic to classify fragmentary metagenomic data from coexisting populations of a natural microbial community at approximately the genus level. However, distinguishing features of genome signatures may be subtle, being masked by within-genome heterogeneity and the multidimensional nature of tetranucleotide frequency patterns. Tetra-ESOM is a key method for visualizing and exposing these potentially weak signals. Being unsupervised, it requires no database representation of the organisms present. Visualization of the data structure highlights differences between populations and reveals atypical regions corresponding to biologically meaningful genomic features such as mobile elements or previously unrecognized genotypes present at low abundance in the community. When employed in conjunction with complementary methods such as genomic assembly and analysis of phylogenetic marker genes, genome signatures offer powerful perspectives on metagenomic data.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Sample collection, construction of genomic libraries, sequencing, and community genomic assembly</p>
            </st>
            <p>An overview of the samples and methodology used in this study is provided in Figure <figr fid="F1">1</figr>. Sample collection, DNA extraction, random fragmentation and cloning of approximately 3-kb fragments, Sanger sequencing, and assembly and curation of community genomics data (using the phred/phrap/consed package) were performed as detailed previously <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B55">55</abbr></abbrgrp>. The combined UBAs nonLeptos dataset was constructed by assembling sequencing reads derived from both the UBA BS and UBA biofilm samples (with UBA reads previously assigned to <it>Leptospirillum </it>spp. removed). This included 229,082 reads and approximately 210 Mb of total sequence, which assembled into 15,929 contigs and 36.6 Mb of composite sequence.</p>
         </sec>
         <sec>
            <st>
               <p>Phylogenetic analysis</p>
            </st>
            <p>The phylogenetic tree of 16S rRNA genes was constructed by neighbor joining (default parameters) with the ARB software package <abbrgrp><abbr bid="B92">92</abbr></abbrgrp> and 'SILVA SSU ref' database <abbrgrp><abbr bid="B93">93</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Calculation of tetranucleotide frequencies and clustering by ESOM</p>
            </st>
            <p>Tetranucleotide frequencies were determined for each assembled contig using a custom Perl script. Frequencies were calculated with a 1-bp sliding window and pairs of reverse complementary tetranucleotides were summed in order to avoid strand bias. Longer contigs and assembled genomes were split into 5-kb windows and only contigs longer than 2 kb were considered unless noted otherwise. To assess binning accuracy, data points (representing contigs/windows) are colored according to their genome of origin (when known), but this information is not available to the clustering process.</p>
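            <p>The counting step can be illustrated with a minimal Python sketch (the original analysis used a custom Perl script that is not reproduced here). It slides a 1-bp window along a sequence and merges each tetranucleotide with its reverse complement. Dividing by the number of counted windows is one plausible way to normalize for length, and skipping windows with ambiguous bases is likewise an assumption; all function names are hypothetical.</p>

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Use one shared key for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


# 120 reverse-complement pairs plus 16 palindromes give 136 features.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Strand-independent tetranucleotide frequencies (1-bp sliding window)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Assumption: windows containing anything other than A/C/G/T are skipped.
        if set(kmer).issubset("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in CANONICAL_TETRAMERS}
    return {k: counts[k] / total for k in CANONICAL_TETRAMERS}
```

            <p>Because both orientations map to the same key, a fragment and its reverse complement yield identical vectors, which is the purpose of summing reverse complementary pairs.</p>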
            <p>Contigs were clustered by tetranucleotide frequency utilizing Databionics ESOM Tools <abbrgrp><abbr bid="B94">94</abbr></abbrgrp>. The input for tetra-ESOM was a 136-dimensional vector (representing the frequencies of the 136 unique reverse complement tetranucleotide pairs, normalized for contig length) for each contig/window. These raw frequencies were transformed with the 'Robust ZT' option built into Databionics ESOM Tools, which normalizes the data using robust estimates of mean and variance. Data were permuted before each run to avoid errors due to sampling order. Maps were toroidal (borderless) with Euclidean grid distance and dimensions scaled from the default map size (50 &#215; 82) as a function of the number of data points, to a ratio of approximately 5.5 map nodes per data point. For example, a typical clustering with approximately 7,500 data points was run on a map with dimensions 155 &#215; 255. Training was conducted with the K-Batch algorithm (k = 0.15%) for 20 training epochs. The standard best match search method was used with a local best match search radius of 8. Other training parameters were as follows: Gaussian weight initialization method; Euclidean data space function; starting value for training radius of 50 with linear cooling to 1; starting value for learning rate of 0.5 with linear cooling to 0.1; Gaussian kernel function.</p>
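            <p>The map-sizing rule can be checked with simple arithmetic. The sketch below computes the number of map nodes per data point and the column-to-row ratio for the example given in the text; the helper names are illustrative and nothing here calls Databionics ESOM Tools.</p>

```python
DEFAULT_ROWS, DEFAULT_COLS = 50, 82   # default ESOM map size quoted in the text


def nodes_per_point(rows: int, cols: int, n_points: int) -> float:
    """Number of map nodes available per input data point."""
    return rows * cols / n_points


def aspect_ratio(rows: int, cols: int) -> float:
    """Column-to-row ratio of the map."""
    return cols / rows


# Example from the text: roughly 7,500 data points on a 155 x 255 map.
example_density = nodes_per_point(155, 255, 7500)
print(f"{example_density:.2f} nodes per data point")
print(f"default aspect {aspect_ratio(DEFAULT_ROWS, DEFAULT_COLS):.2f}, "
      f"example aspect {aspect_ratio(155, 255):.2f}")
```

            <p>For the quoted example this gives a little over 5 nodes per data point with the default aspect ratio preserved, consistent with the approximate target of 5.5.</p>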
         </sec>
         <sec>
            <st>
               <p>Clustering resolution versus evolutionary distance</p>
            </st>
            <p>To quantify the degree of clustering between closely related genomes, we analyzed SOM maps using fixed point kernel densities <abbrgrp><abbr bid="B95">95</abbr></abbrgrp>. Spatial data from the SOM was imported into ArcGIS (ESRI Software) and clusters were defined using Hawth's Analysis Tools for ArcGIS <abbrgrp><abbr bid="B96">96</abbr></abbrgrp>. Cluster boundaries were determined using density estimators that captured 90% of data points from each genome (Additional data file 1). We then calculated separation between genomes as a percentage (Non-overlapping points/Total number of points) for two bins being compared. Average amino acid identity was calculated as described previously <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
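            <p>The separation metric can be sketched as follows. This is only one reading of the definition: it assumes that each data point has already been tested against the two 90% density contours (done in ArcGIS in the original work), that "non-overlapping" means lying inside exactly one contour, and that the total counts points inside at least one contour. The function name and input layout are hypothetical.</p>

```python
def separation_percent(in_bin_a: list[bool], in_bin_b: list[bool]) -> float:
    """Percentage of points that fall inside exactly one of two bin contours.

    in_bin_a[i] and in_bin_b[i] say whether point i lies within the
    contour of bin A and of bin B, respectively.
    """
    if len(in_bin_a) != len(in_bin_b):
        raise ValueError("membership lists must have the same length")
    # Assumption: points outside both contours are not part of the comparison.
    in_either = [(a, b) for a, b in zip(in_bin_a, in_bin_b) if a or b]
    if not in_either:
        raise ValueError("no points fall inside either contour")
    non_overlapping = sum(1 for a, b in in_either if a != b)
    return 100.0 * non_overlapping / len(in_either)
```

            <p>Under this reading, two bins whose contours share no points score 100%, and two bins with identical membership score 0%.</p>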
         </sec>
         <sec>
            <st>
               <p>Predicted tetranucleotide frequency</p>
            </st>
            <p>The predicted frequency of each unique pair of reverse complementary tetranucleotides was calculated based on genome-wide frequencies of potentially contributing codons. As shown in Figure <figr fid="F5">5</figr>, for any given tetranucleotide there are 12 potentially associated codons depending on coding strand and reading frame. Four codons (numbers 3, 4, 9, and 10 in Figure <figr fid="F5">5</figr>) are fully captured by the tetranucleotide, four are partially captured at two of three positions (numbers 2, 5, 8, and 11), and four are partially captured at one of three positions (numbers 1, 6, 7, and 12). Each of these three classes is weighted according to its contribution: 1, 2/3, and 1/3, respectively. For partially captured codons, contributions of all possibilities were taken into account; for example, for codon number 5 (TGX) in Figure <figr fid="F5">5</figr>, there are four possible codons: TGA, TGT, TGC, and TGG.</p>
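            <p>The enumeration of contributing codons can be sketched in Python. The code below lists the 12 codon patterns for a tetranucleotide (six per strand) with the weights 1, 2/3, and 1/3, and expands each unspecified position 'X' into all four bases. The final scoring function, a plain weighted sum of codon frequencies, is an assumption made for illustration; the exact combination and normalization used in the study are not specified here, and the order of patterns does not follow the numbering in Figure 5.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def strand_patterns(tetra: str) -> list[tuple[str, float]]:
    """Codon patterns overlapping one strand of a tetranucleotide.

    'X' marks a codon position that lies outside the tetranucleotide.
    """
    n1, n2, n3, n4 = tetra
    return [
        (n1 + n2 + n3, 1.0),       # fully captured
        (n2 + n3 + n4, 1.0),       # fully captured
        ("X" + n1 + n2, 2 / 3),    # two of three positions captured
        (n3 + n4 + "X", 2 / 3),
        ("XX" + n1, 1 / 3),        # one of three positions captured
        (n4 + "XX", 1 / 3),
    ]


def codon_patterns(tetra: str) -> list[tuple[str, float]]:
    """All 12 weighted patterns: both strands, three reading frames each."""
    return strand_patterns(tetra) + strand_patterns(reverse_complement(tetra))


def expand(pattern: str) -> list[str]:
    """All concrete codons matching a pattern such as 'TGX'."""
    codons = [""]
    for ch in pattern:
        options = "ACGT" if ch == "X" else ch
        codons = [c + o for c in codons for o in options]
    return codons


def predicted_score(tetra: str, codon_freq: dict[str, float]) -> float:
    """Assumed scoring: weighted sum of frequencies of all matching codons."""
    return sum(
        weight * sum(codon_freq.get(c, 0.0) for c in expand(pattern))
        for pattern, weight in codon_patterns(tetra)
    )
```

            <p>For instance, <it>expand("TGX")</it> returns the four codons TGA, TGC, TGG, and TGT, matching the example given in the text.</p>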
         </sec>
         <sec>
            <st>
               <p>Binning performance on variable length sequence fragments and subsampled genomes</p>
            </st>
            <p>Sensitivity (percentage of fragments from each genome correctly identified) and precision (percentage of fragments in each bin belonging to the correct genome) of binning were calculated for a subset of assembled genomes that are deeply sampled and manually curated (Table <tblr tid="T1">1</tblr>; Additional data file 2). Fragment size was varied in two ways: all contigs were broken into a given size (2, 4, 6, or 10 kb); or 10% of each genome was randomly selected and fragmented (0.5, 1.0, 1.5, or 2.0 kb) while the remaining fraction of the genome was fragmented into 5-kb windows (Additional data file 2). Bin territories were defined manually, using boundaries apparent via distance-based background topology (U-Matrix) as guidelines. It is important to note that this method allows data points between bins or near borders to remain unclassified. Analysis of subsampled genomes was conducted with assembled genomes only; unassigned fragments were excluded to prevent them from contributing to the definition of bins. Genomes were fragmented into 5-kb sequences, which were then randomly selected to obtain the indicated percentage of the genome.</p>
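            <p>The two accuracy measures can be written compactly. The sketch below assumes that every fragment has a known genome of origin, that each bin is named after the genome it represents, and that fragments left unclassified are recorded as <it>None</it>; this data layout and the function name are illustrative only.</p>

```python
from collections import Counter
from typing import Optional


def binning_accuracy(
    true_genome: list[str], assigned_bin: list[Optional[str]]
) -> dict[str, tuple[float, float]]:
    """Return {genome: (sensitivity %, precision %)} for each genome."""
    genome_total = Counter(true_genome)
    # Unclassified fragments (None) belong to no bin.
    bin_total = Counter(b for b in assigned_bin if b is not None)
    correct = Counter(t for t, b in zip(true_genome, assigned_bin) if t == b)
    result = {}
    for genome, n_fragments in genome_total.items():
        sensitivity = 100.0 * correct[genome] / n_fragments
        if bin_total[genome]:
            precision = 100.0 * correct[genome] / bin_total[genome]
        else:
            precision = float("nan")   # nothing was placed in this bin
        result[genome] = (sensitivity, precision)
    return result
```

            <p>Unclassified fragments lower sensitivity but do not affect precision, which is why allowing points near borders to remain unassigned trades one measure against the other.</p>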
         </sec>
         <sec>
            <st>
               <p>Sequence signatures in coding versus non-coding regions</p>
            </st>
            <p>Intergenic regions were extracted and concatenated, with 'N's inserted between regions to avoid generation of erroneous tetranucleotides. Intergenic regions were grouped by size (in 20-bp bins) to monitor variance in sequence signatures from intergenic regions of differing lengths. All coding sequences were similarly concatenated with interleaving 'N's. Concatenated coding and non-coding regions were then broken into 5-kb windows and run against the same background dataset of assembled genomes and unassigned sequences as usual.</p>
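            <p>The concatenation procedure can be illustrated with a short sketch. It joins regions with an 'N' spacer, splits the result into fixed-size windows, and shows that tetranucleotides spanning a spacer are discarded. The use of a single 'N' as spacer and the helper names are assumptions; the text does not state how many 'N's were inserted.</p>

```python
def length_bin(region: str, width: int = 20) -> int:
    """Lower bound of the size bin (in bp) that a region falls into."""
    return (len(region) // width) * width


def concatenate_with_spacers(regions: list[str], spacer: str = "N") -> str:
    """Join regions, separating neighbours with a spacer of ambiguous bases."""
    return spacer.join(regions)


def split_windows(sequence: str, size: int = 5000) -> list[str]:
    """Break a sequence into consecutive windows of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def valid_tetramers(sequence: str) -> list[str]:
    """Tetranucleotides from a 1-bp sliding window that contain no 'N'."""
    return [
        sequence[i:i + 4]
        for i in range(len(sequence) - 3)
        if "N" not in sequence[i:i + 4]
    ]


joined = concatenate_with_spacers(["ACGT", "TTTT"])
print(joined)                   # ACGTNTTTT
print(valid_tetramers(joined))  # ['ACGT', 'TTTT']
```

            <p>Without the spacer, the junction would contribute artificial tetranucleotides such as CGTT and GTTT that occur in neither region.</p>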
         </sec>
         <sec>
            <st>
               <p>Sequence signatures in extracellular and highly expressed protein-coding genes</p>
            </st>
            <p>Shotgun proteomics data were obtained for <it>Leptospirillum </it>group II extracellular and whole cell fractions from the ABend, ABfront, and UBA locations of the Richmond mine <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B66">66</abbr></abbrgrp>. Proteins were defined as enriched in the extracellular fraction if, in at least two of the three samples, they were only detected in the extracellular fraction, or the ratio of spectral counts from extracellular to intracellular fraction was > 2. The 50 most abundantly expressed proteins were identified on the basis of tandem mass spectrometry (MS/MS) spectral counts. ESOM analyses of genes encoding extracellular and highly expressed proteins were both conducted as described above; open reading frames were concatenated, interleaved with 'N's, then split into 5-kb windows and analyzed along with the full dataset.</p>
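            <p>The enrichment criterion can be expressed as a small decision rule. The sketch below assumes that each protein has one pair of spectral counts (extracellular, intracellular) per sample; the function names and data layout are hypothetical and are not taken from the original pipeline.</p>

```python
def enriched_in_sample(
    extracellular: int, intracellular: int, min_ratio: float = 2.0
) -> bool:
    """Apply the per-sample criterion to one pair of spectral counts."""
    if extracellular == 0:
        return False              # not detected in the extracellular fraction
    if intracellular == 0:
        return True               # detected only in the extracellular fraction
    return extracellular / intracellular > min_ratio


def is_extracellular_enriched(
    samples: list[tuple[int, int]], min_samples: int = 2
) -> bool:
    """True if the criterion holds in at least `min_samples` samples."""
    hits = sum(enriched_in_sample(ext, intra) for ext, intra in samples)
    return hits >= min_samples
```

            <p>For example, counts of (10, 0), (9, 3), and (1, 5) across the three samples would classify a protein as enriched, because the first two samples meet the criterion.</p>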
         </sec>
         <sec>
            <st>
               <p>Nucleotide sequence accession numbers</p>
            </st>
            <p>This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accessions ACXJ00000000 (unassigned contigs), ACXK00000000 (A-plasma), ACXL00000000 (E-plasma), ACXM00000000 (I-plasma), and ACVJ00000000 (ARMAN-2, described in detail in BJ Baker <it>et al</it>., in preparation). The versions described in this paper are the first versions, ACXJ01000000, ACXK01000000, ACXL01000000, ACXM01000000, and ACVJ01000000.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>AMD: acid mine drainage; ESOM: emergent self-organizing map; %GC: percentage content of guanine plus cytosine; SOM: self-organizing map.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>GJD, AFA, SLS, and JFB conceived and designed the experiments. GJD, BJB, SLS, APY, and BCT performed the experiments. GJD, AFA, SLS, BCT, BJB, APY, and JFB analyzed the data. GJD and JFB wrote the paper.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper: a figure showing automated clustering of tetra-ESOM data using fixed point kernel densities (Additional data file <supplr sid="S1">1</supplr>); an evaluation of binning accuracy based on deeply sampled metagenomes for which contigs are assigned to genomes with a high degree of confidence (Additional data file <supplr sid="S2">2</supplr>); binning accuracy calculated for genomes that were sampled to varying extents of completeness (10 to 100%) (Additional data file <supplr sid="S3">3</supplr>); a heat map of average genome-wide frequency of each tetranucleotide for each genome, including bacteria, archaea, viruses, and a putative plasmid (Additional data file <supplr sid="S4">4</supplr>); comparison of tetra-ESOMs of assembled genomes based on amino acid composition, codon composition, and tetranucleotide frequency (Additional data file <supplr sid="S5">5</supplr>); a figure showing that the observed difference in frequency of each tetranucleotide between pairs of genomes correlates with the difference predicted based on codon composition (Additional data file <supplr sid="S6">6</supplr>); a figure showing tetra-ESOM of deeply sampled genomes for which coding and noncoding regions were separated (Additional data file <supplr sid="S7">7</supplr>); a figure showing for incorrectly binned fragments the percentage of sequence coding for genes in comparison with the genome-wide coding percentage (Additional data file <supplr sid="S8">8</supplr>); a figure showing tetra-ESOM of <it>Leptospirillum </it>group II genes coding for highly expressed proteins or proteins enriched in the extracellular fraction analyzed as separate fractions from the rest of the genome (Additional data file <supplr sid="S9">9</supplr>); a schematic of processes and factors influencing genome signature (Additional data file <supplr sid="S10">10</supplr>).</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <caption>
               <p>Automated clustering of tetra-ESOM data using fixed point kernel densities. Shown is the ESOM map presented in Fig. <figr fid="F3">3</figr>, without the U-matrix background displayed. Data points of known identity (from assembly information) are enlarged and colored; others are unassigned. Contour lines are delineated so that 90% of the data points for each genome are included within the bin boundaries.</p>
            </caption>
            <text>
               <p>Automated clustering of tetra-ESOM data using fixed point kernel densities.</p>
            </text>
            <file name="gb-2009-10-8-r85-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <caption>
               <p>Evaluation of binning accuracy based on deeply sampled metagenomes for which contigs are assigned to genomes with a high degree of confidence. Bins were defined manually using background topography (the U-matrix, which is based on distance structure) as a guide. Sensitivity is the percentage of sequence fragments from each reference genome that were correctly identified; precision is the percentage of sequence fragments in each bin from the correct reference genome (ignoring unassigned fragments). (A) Accuracy of binning when all sequences are fragmented into the window sizes indicated, with the minimum fragment length considered being equal to the window size. (B) Accuracy of binning for variable fragment lengths when larger fragments are present to define the signature; 10% of each genome was randomly sampled and fragmented into the indicated size while the remaining 90% was broken into 5 kb fragments. Note that the accuracy reported is only for the smaller fragments of length indicated in the figure (i.e. 10% of the genome), and does not include the 5 kb fragments.</p>
            </caption>
            <text>
               <p>Evaluation of binning accuracy based on deeply sampled metagenomes for which contigs are assigned to genomes with a high degree of confidence.</p>
            </text>
            <file name="gb-2009-10-8-r85-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional File 3</p>
            </title>
            <caption>
                <p>Binning accuracy calculated for genomes that were sampled to varying extents of completeness (10 to 100%). Binning and calculation of accuracy were done as described in Additional file 2, with the exception that unassigned sequences were not included in the ESOM. Genomes were broken into 5-kb fragments, then this pool was randomly sampled to obtain the indicated percentage of the genome.</p>
            </caption>
            <text>
               <p>Binning accuracy calculated for genomes that were sampled to varying extents of completeness (10 to 100%).</p>
            </text>
            <file name="gb-2009-10-8-r85-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional File 4</p>
            </title>
            <caption>
               <p>Heat map of average genome-wide frequency of each tetranucleotide for each genome, including bacteria, archaea, viruses, and a putative plasmid. Tetranucleotides (columns) are sorted from left to right based on the number of G+C per tetranucleotide as indicated at top. Palindromic tetranucleotides are marked with black circles, and those that effectively distinguish closely related members of the <it>Thermoplasmatales </it>with the same %GC (E-plasma, G-plasma, <it>Ferroplasma </it>types I and II) are indicated with stars: TATA, ATAT, GATC. (A) All Iron Mountain AMD bacterial and archaeal genomes, listed from high G+C content (top) to low (bottom). (B) Archaeal and bacterial genomes for which viral genomes have been reconstructed <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The tetranucleotide frequencies of the hosts are shown adjacent to their viruses.</p>
            </caption>
            <text>
               <p>Heat map of average genome-wide frequency of each tetranucleotide for each genome, including bacteria, archaea, viruses, and a putative plasmid.</p>
            </text>
            <file name="gb-2009-10-8-r85-S4.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional File 5</p>
            </title>
            <caption>
               <p>Comparison of tetra-ESOMs of assembled genomes based on amino acid composition, codon composition, and tetranucleotide frequency</p>
            </caption>
            <text>
               <p>Comparison of tetra-ESOMs of assembled genomes based on (a) amino acid composition, (b) codon composition, and (c) tetranucleotide frequency</p>
            </text>
            <file name="gb-2009-10-8-r85-S5.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional File 6</p>
            </title>
            <caption>
                <p>The observed difference in frequency of each tetranucleotide between pairs of genomes correlates with the difference predicted based on codon composition. (A) A-plasma versus E-plasma; (B) E-plasma versus <it>Ferroplasma acidarmanus</it>; (C) G-plasma versus E-plasma; (D) <it>Leptospirillum </it>sp. group II versus group III.</p>
            </caption>
            <text>
               <p>The observed difference in frequency of each tetranucleotide between pairs of genomes correlates with the difference predicted based on codon composition.</p>
            </text>
            <file name="gb-2009-10-8-r85-S6.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional File 7</p>
            </title>
            <caption>
               <p>Tetra-ESOM of deeply sampled genomes for which coding and noncoding regions were separated. Noncoding regions are shown in bold with color corresponding to coding regions.</p>
            </caption>
            <text>
               <p>Tetra-ESOM of deeply sampled genomes for which coding and noncoding regions were separated.</p>
            </text>
            <file name="gb-2009-10-8-r85-S7.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional File 8</p>
            </title>
            <caption>
               <p>Percentage of sequence coding for genes in comparison with the genome-wide coding percentage for incorrectly binned fragments of 5 kb (blue) and 1 kb (red) length. Error bars represent one standard deviation. Dotted line indicates genome-wide average % of sequence coding for genes. * Frac. seqs. &lt; genome avg. is the fraction of incorrectly binned sequences with coding % less than one standard deviation below the genome average.</p>
            </caption>
            <text>
               <p>Percentage of sequence coding for genes in comparison with the genome-wide coding percentage for incorrectly binned fragments.</p>
            </text>
            <file name="gb-2009-10-8-r85-S8.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional File 9</p>
            </title>
            <caption>
               <p>Tetra-ESOM of <it>Leptospirillum </it>group II genes coding for highly expressed proteins (black) or proteins enriched in the extracellular fraction (white) analyzed as separate fractions from the rest of the genome (light green). The one black data point that clusters in the <it>Leptospirillum </it>group II/III unresolved region contains genes shown in the table.</p>
            </caption>
            <text>
               <p>Tetra-ESOM of <it>Leptospirillum </it>group II genes coding for highly expressed proteins or proteins enriched in the extracellular fraction analyzed as separate fractions from the rest of the genome.</p>
            </text>
            <file name="gb-2009-10-8-r85-S9.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional File 10</p>
            </title>
            <caption>
               <p>Processes and factors influencing genome signature. Those inferred to be critical for distinguishing closely related organisms in the AMD biofilm community are highlighted in red.</p>
            </caption>
            <text>
               <p>Processes and factors influencing genome signature.</p>
            </text>
            <file name="gb-2009-10-8-r85-S10.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Ms D Aliaga Goltsman, Dr V Denef, Ms C Sun, Dr R Hettich, Dr N VerBerkmoes, and Mr M Shah for data and bioinformatic assistance. We are grateful to Mrs M Kelly for guiding the kernel density analysis, to Mr Rudy Carver for sampling assistance, and to Mr TW Arman (President, Iron Mountain Mines) and Dr R Sugarek for site access. The manuscript was significantly improved by critical revisions from Mr D Soergel, Dr S Brenner, and three anonymous reviewers. This work was supported by DOE Genomics:GTL project Grant No. DE-FG02-05ER64134 (Office of Science), and sequencing was performed at the DOE Joint Genome Institute. AFA was supported by grants from the Swedish Research Council and Carl Tryggers Foundation.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Towards a genome-based taxonomy for prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Konstantinidis</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Tiedje</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2005</pubdate>
            <volume>187</volume>
            <fpage>6258</fpage>
            <lpage>6264</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.187.18.6258-6264.2005</pubid>
                  <pubid idtype="pmcid">1236649</pubid>
                  <pubid idtype="pmpid" link="fulltext">16159757</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Genomic insights that advance the species definition for prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Konstantinidis</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Tiedje</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>2567</fpage>
            <lpage>2572</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0409727102</pubid>
                  <pubid idtype="pmcid">549018</pubid>
                  <pubid idtype="pmpid" link="fulltext">15701695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Microbial diversity and the genetic nature of microbial species.</p>
            </title>
            <aug>
               <au>
                  <snm>Achtman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2008</pubdate>
            <volume>6</volume>
            <fpage>431</fpage>
            <lpage>440</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1872</pubid>
                  <pubid idtype="pmpid" link="fulltext">18461076</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Phylogenetic classification and the universal tree.</p>
            </title>
            <aug>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>284</volume>
            <fpage>2124</fpage>
            <lpage>2128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.284.5423.2124</pubid>
                  <pubid idtype="pmpid" link="fulltext">10381871</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Community genomics in microbial ecology and evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>489</fpage>
            <lpage>498</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1157</pubid>
                  <pubid idtype="pmpid" link="fulltext">15931167</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Microbial community genomics in the ocean.</p>
            </title>
            <aug>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>459</fpage>
            <lpage>469</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1158</pubid>
                  <pubid idtype="pmpid" link="fulltext">15886695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Metagenomics: application of genomics to uncultured microorganisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Handelsman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>2004</pubdate>
            <volume>68</volume>
            <fpage>669</fpage>
            <lpage>685</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/MMBR.68.4.669-685.2004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15590779</pubid>
                  <pubid idtype="pmcid">539003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genome dynamics in a natural archaeal population.</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Whitaker</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Detter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>1883</fpage>
            <lpage>1888</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0604851104</pubid>
                  <pubid idtype="pmcid">1794283</pubid>
                  <pubid idtype="pmpid" link="fulltext">17267615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Population genomic analysis of strain variation in <it>Leptospirillum </it>group II bacteria involved in acid mine drainage formation.</p>
            </title>
            <aug>
               <au>
                  <snm>Simmons</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>DiBartolo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Denef</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>DSA</fnm>
               </au>
               <au>
                  <snm>Thelen</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>PLoS Biology</source>
            <pubdate>2008</pubdate>
            <volume>6</volume>
            <fpage>e177</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pbio.0060177</pubid>
                  <pubid idtype="pmcid">2475542</pubid>
                  <pubid idtype="pmpid" link="fulltext">18651792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Genetic exchange across a species boundary in the archaeal genus <it>Ferroplasma</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Eppley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Getz</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2007</pubdate>
            <volume>177</volume>
            <fpage>407</fpage>
            <lpage>416</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1534/genetics.107.072892</pubid>
                  <pubid idtype="pmcid">2013692</pubid>
                  <pubid idtype="pmpid" link="fulltext">17603112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Genomic patterns of recombination, clonal divergence, and environment in marine microbial populations.</p>
            </title>
            <aug>
               <au>
                  <snm>Konstantinidis</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>ISME J</source>
            <pubdate>2008</pubdate>
            <volume>2</volume>
            <fpage>1052</fpage>
            <lpage>1065</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ismej.2008.62</pubid>
                  <pubid idtype="pmpid" link="fulltext">18580971</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Virus population dynamics and acquired virus resistance in natural microbial communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Andersson</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2008</pubdate>
            <volume>320</volume>
            <fpage>1047</fpage>
            <lpage>1050</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1157358</pubid>
                  <pubid idtype="pmpid" link="fulltext">18497291</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The <it>Sorcerer II </it>Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific.</p>
            </title>
            <aug>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Beeson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Baden-Tillson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thorpe</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Andrews-Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kravitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>Y-H</fnm>
               </au>
               <au>
                  <snm>Falc&#243;n</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Souza</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bonilla-Rosso</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Eguiarte</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Karl</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Sathyendranath</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biology</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e77</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pbio.0050077</pubid>
                  <pubid idtype="pmcid">1821060</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355176</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Environmental genome shotgun sequencing of the Sargasso Sea.</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Knap</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Lomas</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Nealson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Parsons</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baden-Tillson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>66</fpage>
            <lpage>74</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1093857</pubid>
                  <pubid idtype="pmpid" link="fulltext">15001713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The <it>Sorcerer II </it>Global Ocean Sampling expedition: Expanding the universe of protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Yooseph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>SJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Biology</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pbio.0050016</pubid>
                  <pubid idtype="pmcid">1821046</pubid>
                  <pubid idtype="pmpid" link="fulltext">17355171</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Community structure and metabolism through reconstruction of microbial genomes from the environment.</p>
            </title>
            <aug>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Ram</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Rokhsar</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>428</volume>
            <fpage>37</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02340</pubid>
                  <pubid idtype="pmpid" link="fulltext">14961025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genome-directed isolation of the key nitrogen fixer <it>Leptospirillum ferrodiazotrophum </it>sp. nov. from an acidophilic microbial community.</p>
            </title>
            <aug>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>71</volume>
            <fpage>6319</fpage>
            <lpage>6324</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/AEM.71.10.6319-6324.2005</pubid>
                  <pubid idtype="pmcid">1266007</pubid>
                  <pubid idtype="pmpid" link="fulltext">16204553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Genomic studies of uncultivated archaea.</p>
            </title>
            <aug>
               <au>
                  <snm>Schleper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jurgens</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jonuscheit</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>479</fpage>
            <lpage>488</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1159</pubid>
                  <pubid idtype="pmpid" link="fulltext">15931166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Bacterial rhodopsin: evidence for a new type of phototrophy in the sea.</p>
            </title>
            <aug>
               <au>
                  <snm>B&#233;j&#224;</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Hadd</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Jovanovich</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Gates</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>289</volume>
            <fpage>1902</fpage>
            <lpage>1906</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.289.5486.1902</pubid>
                  <pubid idtype="pmpid" link="fulltext">10988064</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Lineages of acidophilic archaea revealed by community genomic analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Webb</snm>
                  <fnm>RI</fnm>
               </au>
               <au>
                  <snm>Flanagan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>314</volume>
            <fpage>1933</fpage>
            <lpage>1935</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1132690</pubid>
                  <pubid idtype="pmpid" link="fulltext">17185602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Community genomics among stratified microbial assemblages in the ocean's interior.</p>
            </title>
            <aug>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Mincer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Frigaard</snm>
                  <fnm>N-U</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brito</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Karl</snm>
                  <fnm>DM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>311</volume>
            <fpage>496</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1120250</pubid>
                  <pubid idtype="pmpid" link="fulltext">16439655</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Compositional biases of bacterial genomes and evolutionary implications.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mr&#225;zek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1997</pubdate>
            <volume>179</volume>
            <fpage>3899</fpage>
            <lpage>3913</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">179198</pubid>
                  <pubid idtype="pmpid" link="fulltext">9190805</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Directional mutation pressure and neutral molecular evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Sueoka</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2653</fpage>
            <lpage>2657</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.85.8.2653</pubid>
                  <pubid idtype="pmcid">280056</pubid>
                  <pubid idtype="pmpid">3357886</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The origins of genome architecture.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <publisher>Sunderland, MA: Sinauer Associates, Inc.</publisher>
            <pubdate>2007</pubdate>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Base composition might result from competition for metabolic resources.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EPC</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>291</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02690-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12044357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Environments shape the nucleotide composition of genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Foerstner</snm>
                  <fnm>KU</fnm>
               </au>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hooper</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>1208</fpage>
            <lpage>1213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.embor.7400538</pubid>
                  <pubid idtype="pmcid">1369203</pubid>
                  <pubid idtype="pmpid" link="fulltext">16200051</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Comparative genomic structure of prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>38</volume>
            <fpage>771</fpage>
            <lpage>792</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.38.072902.094318</pubid>
                  <pubid idtype="pmpid" link="fulltext">15568993</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Molecular archaeology of the Escherichia coli genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>9413</fpage>
            <lpage>9417</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.95.16.9413</pubid>
                  <pubid idtype="pmcid">21352</pubid>
                  <pubid idtype="pmpid" link="fulltext">9689094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Codon bias and base composition are poor indicators of horizontally transferred genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Koski</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Morton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>404</fpage>
            <lpage>412</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11230541</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G + C content.</p>
            </title>
            <aug>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Br&#228;nden</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ernberg</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>C&#246;ster</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>311</volume>
            <fpage>35</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00581-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12853136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Genes from nine genomes are separated into their organisms in the dinucleotide composition space.</p>
            </title>
            <aug>
               <au>
                  <snm>Nakashima</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ota</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>1998</pubdate>
            <volume>5</volume>
            <fpage>251</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/5.5.251</pubid>
                  <pubid idtype="pmpid" link="fulltext">9872449</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Evolutionary implications of microbial genome tetranucleotide frequency biases.</p>
            </title>
            <aug>
               <au>
                  <snm>Pride</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Meinersmann</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Wassenaar</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Blaser</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>145</fpage>
            <lpage>158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.335003</pubid>
                  <pubid idtype="pmcid">420360</pubid>
                  <pubid idtype="pmpid" link="fulltext">12566393</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Informatics for unveiling hidden genome signatures.</p>
            </title>
            <aug>
               <au>
                  <snm>Abe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kanaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kinouchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ichiba</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kozuki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>693</fpage>
            <lpage>702</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.634603</pubid>
                  <pubid idtype="pmcid">430167</pubid>
                  <pubid idtype="pmpid" link="fulltext">12671005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Genomic signature: characterization and classification of species assessed by chaos game representation of sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Deschavanne</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Giron</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vilain</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fagot</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fertil</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1999</pubdate>
            <volume>16</volume>
            <fpage>1391</fpage>
            <lpage>1399</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10563018</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Investigations of oligonucleotide usage variance within and between prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bohlin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Skjerve</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ussery</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000057</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.1000057</pubid>
                  <pubid idtype="pmcid">2289840</pubid>
                  <pubid idtype="pmpid" link="fulltext">18421372</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Bayesian classifiers for detecting HGT using fixed and variable order Markov models of genomic signatures.</p>
            </title>
            <aug>
               <au>
                  <snm>Dalevi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dubhashi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hermansson</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>517</fpage>
            <lpage>522</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btk029</pubid>
                  <pubid idtype="pmpid" link="fulltext">16403797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier.</p>
            </title>
            <aug>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Winberg</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Br&#228;nden</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kaske</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ernberg</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>C&#246;ster</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1404</fpage>
            <lpage>1409</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.186401</pubid>
                  <pubid idtype="pmcid">311094</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483581</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Atypical regions in large genomic DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Scherer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McPeek</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <fpage>7134</fpage>
            <lpage>7138</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.91.15.7134</pubid>
                  <pubid idtype="pmcid">44353</pubid>
                  <pubid idtype="pmpid">8041759</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Detection and characterization of horizontal transfers in prokaryotes using genomic signature.</p>
            </title>
            <aug>
               <au>
                  <snm>Dufraigne</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fertil</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lespinats</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Giron</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Deschavanne</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>e6</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gni004</pubid>
                  <pubid idtype="pmcid">546175</pubid>
                  <pubid idtype="pmpid" link="fulltext">15653627</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The reach of the genome signature in prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>van Passel</snm>
                  <fnm>MWJ</fnm>
               </au>
               <au>
                  <snm>Kuramae</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Luyf</snm>
                  <fnm>ACM</fnm>
               </au>
               <au>
                  <snm>Bart</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boekhout</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>84</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-6-84</pubid>
                  <pubid idtype="pmcid">1621082</pubid>
                  <pubid idtype="pmpid" link="fulltext">17040564</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Similarities and dissimilarities of phage genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Blaisdell</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>5854</fpage>
            <lpage>5859</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.93.12.5854</pubid>
                  <pubid idtype="pmcid">39151</pubid>
                  <pubid idtype="pmpid">8650182</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Distinctive features of large complex virus genomes and proteomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Mr&#225;zek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>5127</fpage>
            <lpage>5132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0700429104</pubid>
                  <pubid idtype="pmcid">1829274</pubid>
                  <pubid idtype="pmpid" link="fulltext">17360339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>What's in the mix: phylogenetic classification of metagenome sequence samples.</p>
            </title>
            <aug>
               <au>
                  <snm>McHardy</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Rigoutsos</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>10</volume>
            <fpage>499</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.mib.2007.08.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">17933580</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Mavromatis</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>McHardy</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Rigoutsos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Korzeniewski</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Land</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lapidus</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Grigoriev</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <fpage>495</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth1043</pubid>
                  <pubid idtype="pmpid" link="fulltext">17468765</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Metagenomics reveals our incomplete knowledge of global diversity.</p>
            </title>
            <aug>
               <au>
                  <snm>Pignatelli</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aparicio</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Blanquer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hern&#225;ndez</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moya</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tamames</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>2124</fpage>
            <lpage>2125</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn355</pubid>
                  <pubid idtype="pmcid">2530889</pubid>
                  <pubid idtype="pmpid" link="fulltext">18625611</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Novel phylogenetic studies of genomic sequence fragments derived from uncultured microbe mixtures in environmental and clinical samples.</p>
            </title>
            <aug>
               <au>
                  <snm>Abe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sugawara</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kinouchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kanaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <fpage>281</fpage>
            <lpage>290</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/dsi015</pubid>
                  <pubid idtype="pmpid" link="fulltext">16769690</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Application of tetranucleotide frequencies for the assignment of genomic fragments.</p>
            </title>
            <aug>
               <au>
                  <snm>Teeling</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Meyerdierks</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Amann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gl&#246;ckner</snm>
                  <fnm>FO</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <fpage>938</fpage>
            <lpage>947</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1462-2920.2004.00624.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15305919</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Accurate phylogenetic classification of variable-length DNA fragments.</p>
            </title>
            <aug>
               <au>
                  <snm>McHardy</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Mart&#237;n</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Tsirigos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rigoutsos</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <fpage>63</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth976</pubid>
                  <pubid idtype="pmpid" link="fulltext">17179938</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>An environmental signature for 323 microbial genomes based on codon adaptation indices.</p>
            </title>
            <aug>
               <au>
                  <snm>Willenbrock</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Friis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Juncker</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Ussery</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>R114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2006-7-12-r114</pubid>
                  <pubid idtype="pmcid">1794427</pubid>
                  <pubid idtype="pmpid" link="fulltext">17156429</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Get the most out of your metagenome: computational analysis of environmental sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Raes</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Foerstner</snm>
                  <fnm>KU</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>10</volume>
            <fpage>490</fpage>
            <lpage>498</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.mib.2007.09.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">17936679</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Paul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bag</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harvill</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Dutta</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>R70</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2008-9-4-r70</pubid>
                  <pubid idtype="pmcid">2643941</pubid>
                  <pubid idtype="pmpid" link="fulltext">18397532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal.</p>
            </title>
            <aug>
               <au>
                  <snm>Wilmes</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Lefsrud</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Wexler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hettich</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Bond</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>VerBerkmoes</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>ISME J</source>
            <pubdate>2008</pubdate>
            <volume>2</volume>
            <fpage>853</fpage>
            <lpage>864</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ismej.2008.38</pubid>
                  <pubid idtype="pmpid" link="fulltext">18449217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Acid mine drainage biogeochemistry at Iron Mountain, California.</p>
            </title>
            <aug>
               <au>
                  <snm>Druschel</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Gihring</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Geochem Trans</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>13</fpage>
            <lpage>32</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1467-4866-5-13</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Microbial communities in acid mine drainage.</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Ecol</source>
            <pubdate>2003</pubdate>
            <volume>44</volume>
            <fpage>139</fpage>
            <lpage>152</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-6496(03)00028-X</pubid>
                  <pubid idtype="pmpid">19719632</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Lo</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Denef</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>VerBerkmoes</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>DiBartolo</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Ram</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Detter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thelen</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Hettich</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>446</volume>
            <fpage>537</fpage>
            <lpage>541</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05624</pubid>
                  <pubid idtype="pmpid" link="fulltext">17344860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Community genomic and proteomic analyses of chemoautotrophic iron-oxidizing "Leptospirillum rubarum" (Group II) and "Leptospirillum ferrodiazotrophum" (Group III) bacteria in acid mine drainage biofilms.</p>
            </title>
            <aug>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Denef</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>VerBerkmoes</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Lefsrud</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mueller</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Dick</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Zemla</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Hauser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Land</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Thelen</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Hettich</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2009</pubdate>
            <volume>75</volume>
            <fpage>4599</fpage>
            <lpage>4615</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/AEM.02943-08</pubid>
                  <pubid idtype="pmpid" link="fulltext">19429552</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>An archaeal iron-oxidizing extreme acidophile important in acid mine drainage.</p>
            </title>
            <aug>
               <au>
                  <snm>Edwards</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Bond</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Gihring</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <fpage>1796</fpage>
            <lpage>1799</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.287.5459.1796</pubid>
                  <pubid idtype="pmpid" link="fulltext">10710303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Self-organizing maps</p>
            </title>
            <aug>
               <au>
                  <snm>Kohonen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>New York: Springer-Verlag</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Binning sequences using very sparse labels within a metagenome.</p>
            </title>
            <aug>
               <au>
                  <snm>Chan</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Halgamuge</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>215</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-215</pubid>
                  <pubid idtype="pmcid">2383919</pubid>
                  <pubid idtype="pmpid" link="fulltext">18442374</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM.</p>
            </title>
            <aug>
               <au>
                  <snm>Ultsch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moerchen</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Technical Report Department of Mathematics and Computer Science, University of Marburg, Germany</source>
            <pubdate>2005</pubdate>
            <volume>46</volume>
         </bibl>
         <bibl id="B61">
            <title>
               <p>A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action.</p>
            </title>
            <aug>
               <au>
                  <snm>Makarova</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Shabalina</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Biol Direct</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <fpage>7</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1745-6150-1-7</pubid>
                  <pubid idtype="pmcid">1462988</pubid>
                  <pubid idtype="pmpid" link="fulltext">16545108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Differentiation of regions with atypical oligonucleotide composition in bacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Reva</snm>
                  <fnm>ON</fnm>
               </au>
               <au>
                  <snm>Tummler</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>251</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-6-251</pubid>
                  <pubid idtype="pmcid">1274298</pubid>
                  <pubid idtype="pmpid" link="fulltext">16225667</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses.</p>
            </title>
            <aug>
               <au>
                  <snm>Pride</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Wassenaar</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Ghose</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Blaser</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-7-8</pubid>
                  <pubid idtype="pmcid">1360066</pubid>
                  <pubid idtype="pmpid" link="fulltext">16417644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Virus attenuation by genome-scale changes in codon pair bias.</p>
            </title>
            <aug>
               <au>
                  <snm>Coleman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Papamichail</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Skiena</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Futcher</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mueller</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2008</pubdate>
            <volume>320</volume>
            <fpage>1784</fpage>
            <lpage>1787</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1155761</pubid>
                  <pubid idtype="pmpid" link="fulltext">18583614</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Tetraether-linked membrane monolayers in <it>Ferroplasma </it>spp.: a key to survival in acid.</p>
            </title>
            <aug>
               <au>
                  <snm>Macalady</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Vestling</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Baumler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Boekelheide</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kaspar</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Extremophiles</source>
            <pubdate>2004</pubdate>
            <volume>8</volume>
            <fpage>411</fpage>
            <lpage>419</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00792-004-0404-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">15258835</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Community proteomics of a natural microbial biofilm.</p>
            </title>
            <aug>
               <au>
                  <snm>Ram</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>VerBerkmoes</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Thelen</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Tyson</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>RC</fnm>
                  <suf>II</suf>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hettich</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Banfield</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>1915</fpage>
            <lpage>1920</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1109070</pubid>
                  <pubid idtype="pmpid" link="fulltext">15879173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Codon usage between genomes is constrained by genome-wide mutational processes.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hottes</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>McAdams</snm>
                  <fnm>HH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>3480</fpage>
            <lpage>3485</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0307827100</pubid>
                  <pubid idtype="pmcid">373487</pubid>
                  <pubid idtype="pmpid" link="fulltext">14990797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Knight</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Freeland</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Landweber</snm>
                  <fnm>LF</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">11305938</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization.</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>2279</fpage>
            <lpage>2286</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.2896904</pubid>
                  <pubid idtype="pmcid">525687</pubid>
                  <pubid idtype="pmpid" link="fulltext">15479947</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Correlation between the abundance of <it>Escherichia coli </it>transfer RNAs and the occurrence of the respective codons in its protein genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>146</volume>
            <fpage>1</fpage>
            <lpage>21</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(81)90363-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">6167728</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Causes for the intriguing presence of tRNAs in phages.</p>
            </title>
            <aug>
               <au>
                  <snm>Bailly-Bechet</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vergassola</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rocha</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1486</fpage>
            <lpage>1495</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.6649807</pubid>
                  <pubid idtype="pmcid">1987346</pubid>
                  <pubid idtype="pmpid" link="fulltext">17785533</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Co-variation of tRNA abundance and codon usage in <it>Escherichia coli </it>at different growth rates.</p>
            </title>
            <aug>
               <au>
                  <snm>Dong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kurland</snm>
                  <fnm>CG</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>260</volume>
            <fpage>649</fpage>
            <lpage>663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0428</pubid>
                  <pubid idtype="pmpid" link="fulltext">8709146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>The selection-mutation-drift theory of synonymous codon usage.</p>
            </title>
            <aug>
               <au>
                  <snm>Bulmer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1991</pubdate>
            <volume>129</volume>
            <fpage>897</fpage>
            <lpage>907</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1204756</pubid>
                  <pubid idtype="pmpid" link="fulltext">1752426</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Variation in strength of selected codon usage bias among bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Bailes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Grocock</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Peden</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Sockett</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>1141</fpage>
            <lpage>1153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gki242</pubid>
                  <pubid idtype="pmcid">549432</pubid>
                  <pubid idtype="pmpid" link="fulltext">15728743</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Performance of the translational apparatus varies with the ecological strategies of bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Dethlefsen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2007</pubdate>
            <volume>189</volume>
            <fpage>3237</fpage>
            <lpage>3245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.01686-06</pubid>
                  <pubid idtype="pmcid">1855866</pubid>
                  <pubid idtype="pmpid" link="fulltext">17277058</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>rRNA operon copy number reflects ecological strategies of bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Klappenbach</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Dunbar</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>66</volume>
            <fpage>1328</fpage>
            <lpage>1333</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/AEM.66.4.1328-1333.2000</pubid>
                  <pubid idtype="pmcid">91988</pubid>
                  <pubid idtype="pmpid" link="fulltext">10742207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>3742</fpage>
            <lpage>3756</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/29.18.3742</pubid>
                  <pubid idtype="pmcid">55917</pubid>
                  <pubid idtype="pmpid" link="fulltext">11557807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Diverse syntrophic partnerships from deep-sea methane vents revealed by direct cell capture and metagenomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Pernthaler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dekas</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Goffredi</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Embaye</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Orphan</snm>
                  <fnm>VJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2008</pubdate>
            <volume>105</volume>
            <fpage>7052</fpage>
            <lpage>7057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0711303105</pubid>
                  <pubid idtype="pmcid">2383945</pubid>
                  <pubid idtype="pmpid" link="fulltext">18467493</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Targeted access to the genomes of low-abundance organisms in complex microbial communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Podar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Abulencia</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Walcher</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zengler</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Garcia</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Holland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hauser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>73</volume>
            <fpage>3205</fpage>
            <lpage>3214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/AEM.02985-06</pubid>
                  <pubid idtype="pmcid">1907129</pubid>
                  <pubid idtype="pmpid" link="fulltext">17369337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time.</p>
            </title>
            <aug>
               <au>
                  <snm>Stepanauskas</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sieracki</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>9052</fpage>
            <lpage>9057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0700496104</pubid>
                  <pubid idtype="pmcid">1885626</pubid>
                  <pubid idtype="pmpid" link="fulltext">17502618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>Genomic analysis of the uncultivated marine crenarchaeote <it>Cenarchaeum symbiosum</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hallam</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Konstantinidis</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Schleper</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sugahara</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Preston</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Torre</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>18296</fpage>
            <lpage>18301</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0608549103</pubid>
                  <pubid idtype="pmcid">1643844</pubid>
                  <pubid idtype="pmpid" link="fulltext">17114289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>"<it>Candidatus </it>Cloacamonas acidaminovorans": genome sequence reconstruction provides a first glimpse of a new bacterial division.</p>
            </title>
            <aug>
               <au>
                  <snm>Pelletier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kreimeyer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bocs</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rouy</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gyapay</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chouari</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rivi&#232;re</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ganesan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Daegelen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sghir</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>M&#233;digue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Paslier</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2008</pubdate>
            <volume>190</volume>
            <fpage>2572</fpage>
            <lpage>2579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.01248-07</pubid>
                  <pubid idtype="pmcid">2293186</pubid>
                  <pubid idtype="pmpid" link="fulltext">18245282</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities.</p>
            </title>
            <aug>
               <au>
                  <snm>Garcia Martin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ivanova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kunin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Warnecke</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>McHardy</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Salamov</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Szeto</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dalin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Putnam</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Pangilinan</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Rigoutsos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Blackall</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>McMahon</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>1263</fpage>
            <lpage>1269</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1247</pubid>
                  <pubid idtype="pmpid" link="fulltext">16998472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>Deciphering the evolution and metabolism of an anammox bacterium from a community genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Strous</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pelletier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mangenot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rattei</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Horn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Daims</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bartol-Mavel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Barbe</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Fonknechten</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Vallenet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Segurens</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Schenowitz-Truong</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>M&#233;digue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collingro</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dutilh</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Op den Camp</snm>
                  <fnm>HJM</fnm>
               </au>
               <au>
                  <snm>Drift</snm>
                  <mnm>van der</mnm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cirpus</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pas-Schoonen</snm>
                  <mnm>van de</mnm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Harhangi</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>van Niftrik</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schmid</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Keltjens</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vossenberg</snm>
                  <mnm>van de</mnm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kartal</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Meier</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <fpage>790</fpage>
            <lpage>794</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04647</pubid>
                  <pubid idtype="pmpid" link="fulltext">16598256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>Phylogenetic analyses of ribosomal DNA-containing bacterioplankton genome fragments from a 4000 m vertical profile in the North Pacific Subtropical Gyre.</p>
            </title>
            <aug>
               <au>
                  <snm>Pham</snm>
                  <fnm>VD</fnm>
               </au>
               <au>
                  <snm>Konstantinidis</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Palden</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Environ Microbiol</source>
            <pubdate>2008</pubdate>
            <volume>10</volume>
            <fpage>2313</fpage>
            <lpage>2330</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1462-2920.2008.01657.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">18494796</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p>Genomic islands and the ecology and evolution of <it>Prochlorococcus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Coleman</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Martiny</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Steglich</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>DeLong</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Chisholm</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>311</volume>
            <fpage>1768</fpage>
            <lpage>1770</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1122050</pubid>
                  <pubid idtype="pmpid" link="fulltext">16556843</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B87">
            <title>
               <p>Genomic plasticity in prokaryotes: the case of the square haloarchaeon.</p>
            </title>
            <aug>
               <au>
                  <snm>Cuadros-Orellana</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Martin-Cuadrado</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Legault</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>D'Auria</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Papke</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Valera</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>ISME J</source>
            <pubdate>2007</pubdate>
            <volume>1</volume>
            <fpage>235</fpage>
            <lpage>245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ismej.2007.35</pubid>
                  <pubid idtype="pmpid" link="fulltext">18043634</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B88">
            <title>
               <p>The microbial pan-genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Medini</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Donati</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Masignani</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Rappuoli</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>589</fpage>
            <lpage>594</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gde.2005.09.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">16185861</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B89">
            <title>
               <p>Amelioration of bacterial genomes: rates of change and exchange.</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44</volume>
            <fpage>383</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006158</pubid>
                  <pubid idtype="pmpid" link="fulltext">9089078</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B90">
            <title>
               <p>Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>9</volume>
            <fpage>335</fpage>
            <lpage>343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(01)02079-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">11435108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B91">
            <title>
               <p>Hyperbolic SOM-based clustering of DNA fragment features for taxonomic visualization and classification.</p>
            </title>
            <aug>
               <au>
                  <snm>Martin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Diaz</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Ontrup</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nattkemper</snm>
                  <fnm>TW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>1568</fpage>
            <lpage>1574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn257</pubid>
                  <pubid idtype="pmpid" link="fulltext">18535082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B92">
            <title>
               <p>ARB: a software environment for sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Strunk</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Westram</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Meier</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yadhukumar</snm>
                  <fnm/>
               </au>
               <au>
                  <snm>Buchner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Steppi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jobb</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Forster</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Brettske</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gerber</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ginhart</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Grumann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hermann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jost</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Konig</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Liss</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lussmann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>May</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nonhoff</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Reichel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Strehlow</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stamatakis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stuckmann</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Vilbig</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lenke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1363</fpage>
            <lpage>1371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkh293</pubid>
                  <pubid idtype="pmcid">390282</pubid>
                  <pubid idtype="pmpid" link="fulltext">14985472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B93">
            <title>
               <p>SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB.</p>
            </title>
            <aug>
               <au>
                  <snm>Pruesse</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Quast</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Knittel</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Fuchs</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Ludwig</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Peplies</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glockner</snm>
                  <fnm>FO</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>7188</fpage>
            <lpage>7196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm864</pubid>
                  <pubid idtype="pmcid">2175337</pubid>
                  <pubid idtype="pmpid" link="fulltext">17947321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B94">
            <title>
               <p>Databionic ESOM Tools</p>
            </title>
            <url>http://databionic-esom.sourceforge.net</url>
         </bibl>
         <bibl id="B95">
            <title>
               <p>Density estimation for statistics and data analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Silverman</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <publisher>London: CRC Press</publisher>
            <pubdate>1986</pubdate>
         </bibl>
         <bibl id="B96">
            <title>
               <p>Hawth's Analysis Tools</p>
            </title>
            <url>http://www.spatialecology.com/htools/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
