<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-5-r71</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Evolution of the core and pan-genome of <it>Streptococcus</it>: positive selection, recombination, and genome composition</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Lef&#233;bure</snm>
               <fnm>Tristan</fnm>
               <insr iid="I1"/>
               <email>tnl7@cornell.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Stanhope</snm>
               <mi>J</mi>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>mjs297@cornell.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>5</issue>
         <fpage>R71</fpage>
         <url>http://genomebiology.com/2007/8/5/R71</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17475002</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-5-r71</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>2</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>02</day>
               <month>05</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Lef&#233;bure and Stanhope; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p><it>Streptococcus </it>genome evolution</p>
      </shorttitle>
      <shortabs>
         <p>Comparative evolutionary analyses of 26 <it>Streptococcus </it>genomes show that recombination and positive selection have both had important roles in the adaptation of different species to different hosts.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>The genus <it>Streptococcus </it>is one of the most diverse and important groups of human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 <it>Streptococcus </it>genomes to yield an improved understanding of the relative roles of recombination and positive selection in the adaptation of pathogens to their hosts.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p><it>Streptococcus </it>genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. <it>S. agalactiae </it>has a large pan-genome, with little recombination in its core-genome, while <it>S. pyogenes </it>has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater diversity of habitats and gene pools available to <it>S. agalactiae </it>compared to <it>S. pyogenes</it>. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. <it>S. suis </it>was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
                <p>Recombination is an important evolutionary force in shaping <it>Streptococcus </it>genomes, not only in the acquisition of significant portions of the genome as lineage-specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different <it>Streptococcus </it>species to different hosts.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Microbial pathogens show a surprising capacity for adaptation to new hosts, antibiotics, or immune systems. Three principal mechanisms are regarded as important in this adaptive potential: Darwinian (positive) selection, which favors the fixation of advantageous mutations; acquisition of new genetic material by lateral DNA exchange (that is, recombination); and gene regulation. Several studies have suggested that recombination might be the key factor in the adaptation of pathogens and that the recombination rates of bacteria might be higher than their mutation rates <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. At the same time, a portion of the genome, the core-genome, is thought to be representative of bacterial taxa at various taxonomic levels <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Recent molecular evolution analyses of <it>Escherichia coli </it>and <it>Salmonella enterica </it><abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp> have identified genes under positive selection pressure in the core-genome of these enteric bacteria. Genome sequence data are now available for numerous species of several bacterial genera, making it possible to use comparative evolutionary genomic approaches to assess positive selection pressure and the role of horizontal gene transfer in the evolution of the core-genome of a bacterial genus.</p>
          <p>One such important bacterial genus is <it>Streptococcus</it>, which includes some of the most important human and agricultural pathogens; these cause a wide range of diseases, inflict significant morbidity and mortality throughout the world, and impose a significant economic burden. Twenty-six <it>Streptococcus </it>genomes, belonging to six species, are available in public databases: <it>S. pneumoniae</it>, <it>S. agalactiae</it>, <it>S. pyogenes</it>, <it>S. thermophilus</it>, <it>S. mutans </it>and <it>S. suis</it>. <it>S. pyogenes </it>(Group A <it>Streptococcus</it>; GAS) is responsible for a wide range of human diseases, including pharyngitis, impetigo, puerperal sepsis, necrotizing fasciitis ('flesh-eating disease'), scarlet fever, and the postinfection sequelae glomerulonephritis and rheumatic fever. In addition, <it>S. pyogenes </it>has recently been associated with Tourette's syndrome and with movement and attention deficit disorders <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A resurgence of <it>S. pyogenes </it>infections has been observed since the mid-1980s. <it>S. agalactiae </it>is another important human pathogen and is the leading cause of bacterial sepsis, pneumonia, and meningitis in US and European neonates <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Although <it>S. agalactiae </it>normally behaves as a commensal organism that colonizes the genital or gastrointestinal tract of healthy adults, it can cause life-threatening invasive infection in susceptible hosts, such as newborns, pregnant women, and nonpregnant adults with chronic illnesses <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. <it>S. agalactiae </it>was first recognized as a pathogen in bovine mastitis <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. <it>S. pneumoniae </it>is the leading cause of human bacterial infection worldwide <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, although, paradoxically, it is primarily carried asymptomatically. It has been an object of medical study and scrutiny for over a century. <it>S. mutans </it>is implicated as the principal causative agent of human dental caries (tooth decay) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. <it>S. thermophilus </it>is a non-pathogenic food microorganism widely used in the dairy industry. <it>S. suis </it>is responsible for a variety of diseases in pigs, including meningitis, septicemia, arthritis, and pneumonia <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. It is also a zoonotic pathogen that causes occasional cases of meningitis and sepsis in humans, and it has recently been implicated in outbreaks of streptococcal toxic shock syndrome <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
          <p>A recent comparative genomic analysis of five of the above-mentioned streptococcal species (<it>S. suis </it>not included) focused on understanding the role of lateral gene transfer in shaping the genomes of each of these lineages, and analyzed some of the species-specific genes for potential adaptive evolution <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Species- or strain-specific loci are often the focus of attempts to understand adaptive differences in bacteria. However, with the exception of the study by Chen <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> on <it>E. coli</it>, adaptive evolution in the core-genome components of other bacterial species has not been thoroughly explored. In addition to individual genome sequences for several species of <it>Streptococcus</it>, complete genome sequences are also available for multiple strains of <it>S. agalactiae</it>, <it>S. pyogenes</it>, and <it>S. thermophilus</it>. Genome-wide molecular selection analyses designed to assess selection pressure across the entire core-genome of different species and strains of <it>Streptococcus </it>have not been reported, nor has any published report attempted to address the relative roles of selection versus recombination in the diversification of the <it>Streptococcus </it>core-genome.</p>
          <p>Along with the burgeoning increase in microbial genome sequence data, there has been a concomitant development of sophisticated methods for detecting positive selection in protein-coding genes. These methods can be used to compare orthologous DNA sequences across the entire genomes of the available species within the genus <it>Streptococcus</it>. Ziheng Yang, Rasmus Nielsen and colleagues <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> have developed powerful statistical methods for detecting adaptive molecular evolution. Their methods compare synonymous and nonsynonymous substitution rates in protein-coding genes and regard a nonsynonymous rate elevated above the synonymous rate as evidence for positive, or Darwinian, selection. Positive selection drives the fixation of advantageous mutations and is the fundamental process behind adaptive changes in genes and genomes, leading to evolutionary innovations and species differences. Unlike many earlier methods, which averaged over sites and time, these methods are designed to detect positive selection at individual sites and along individual lineages <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Our study employs these methods to assess positive selection pressure across the core-genome components of the genus <it>Streptococcus</it>, as well as of several species of <it>Streptococcus</it>, while concomitantly assessing levels of recombination within the core-genome.</p>
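          <p>As a rough illustration of the quantity these methods test, the ratio ω = dN/dS compares the number of nonsynonymous substitutions per nonsynonymous site (dN) with the number of synonymous substitutions per synonymous site (dS); ω greater than 1 suggests positive selection, ω near 1 neutral evolution, and ω below 1 purifying selection. The likelihood-based site and lineage models of Yang, Nielsen and colleagues (implemented, for example, in the PAML package) are not reproduced here. The minimal Python sketch below instead uses the simpler pairwise counting method of Nei and Gojobori with a Jukes-Cantor correction. The function names are hypothetical; mutations to stop codons are counted as nonsynonymous, and mutational pathways through intermediate stop codons are skipped, which is one of several conventions.</p>

```python
import itertools
import math

# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_sites(codon: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous sites of one codon (Nei-Gojobori counting).

    Mutations to stop codons count as nonsynonymous (one convention among several).
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences, averaged over all single-step
    mutational pathways from c1 to c2 that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    total_syn = total_non = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":  # pathway passes through a stop codon
                valid = False
                break
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        if valid:
            total_syn += syn
            total_non += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_syn / n_paths, total_non / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a p-distance for multiple hits."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_omega(seq1: str, seq2: str) -> float:
    """dN/dS for two aligned, gap-free coding sequences without internal stops."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites_s = sites_n = diffs_s = diffs_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("internal stop codon at position " + str(i))
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2  # average the sites of the two sequences
        sites_n += (n1 + n2) / 2
        d_s, d_n = codon_differences(c1, c2)
        diffs_s += d_s
        diffs_n += d_n
    d_syn = jukes_cantor(diffs_s / sites_s)
    d_non = jukes_cantor(diffs_n / sites_n)
    if d_syn == 0:
        return math.inf if d_non > 0 else math.nan
    return d_non / d_syn
```

          <p>A single pairwise ω of this kind averages over all codons and over the whole time since divergence, which is precisely the limitation that the site and lineage models cited above were designed to overcome.</p>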
          <p>Concomitant with the identification of bacterial core-genomes, it has become evident that bacterial genomes also have an apparently dispensable portion, consisting of partially shared and strain-specific genes, that can represent a surprisingly large proportion of the genome even within a particular species (for example, <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>). The concept of a dispensable genome implies that genes have been lost and gained since separation from common ancestors, which in turn implies that this loss and gain can be estimated from reconstructed genome composition. This approach has been taken previously, including for a few species of <it>Streptococcus </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, with one of the resulting conclusions being that gene gain tends to be much greater than gene loss. An additional purpose of this paper is to compare gene gain and loss within and between <it>Streptococcus </it>species, making use of the larger comparative data set of species and strains now available, and to compare that history with the histories of positive selection and recombination in the core-genome.</p>
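          <p>One simple way to turn a gene content table and a species tree into counts of gene gain and loss is weighted (Sankoff) parsimony applied to the presence or absence of each gene family. The Python sketch below is an illustration only, not necessarily the method used in this study or in <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The nested-tuple tree encoding, the genome names, and the gain and loss costs are hypothetical; ties are resolved toward no change, and a gene present at the root is assumed to have been inherited rather than gained.</p>

```python
import math

# Hypothetical rooted binary tree as nested tuples; leaves are genome names.
TOY_TREE = (("A", "B"), ("C", "D"))


def sankoff_costs(node, presence, gain_cost, loss_cost):
    """Minimum cost of the subtree below `node` if the node is absent (0) or present (1)."""
    if isinstance(node, str):
        return (0.0, math.inf) if presence[node] == 0 else (math.inf, 0.0)
    absent = present = 0.0
    for child in node:
        c0, c1 = sankoff_costs(child, presence, gain_cost, loss_cost)
        absent += min(c0, c1 + gain_cost)   # parent absent: child stays absent or gains
        present += min(c0 + loss_cost, c1)  # parent present: child loses or keeps
    return absent, present


def count_events(node, presence, state, gain_cost, loss_cost):
    """Trace back one optimal reconstruction below `node`; return (gains, losses)."""
    if isinstance(node, str):
        return 0, 0
    gains = losses = 0
    for child in node:
        c0, c1 = sankoff_costs(child, presence, gain_cost, loss_cost)
        if state == 0:
            child_state = 0 if c1 + gain_cost >= c0 else 1  # ties: no change
        else:
            child_state = 1 if c0 + loss_cost >= c1 else 0  # ties: no change
        if child_state > state:
            gains += 1
        elif state > child_state:
            losses += 1
        sub_gains, sub_losses = count_events(child, presence, child_state,
                                             gain_cost, loss_cost)
        gains += sub_gains
        losses += sub_losses
    return gains, losses


def reconstruct(tree, presence, gain_cost=1.0, loss_cost=1.0):
    """Return (minimum cost, gains, losses) for one gene family on `tree`."""
    absent, present = sankoff_costs(tree, presence, gain_cost, loss_cost)
    root_state = 1 if absent > present else 0  # ties resolved toward absence at the root
    gains, losses = count_events(tree, presence, root_state, gain_cost, loss_cost)
    return min(absent, present), gains, losses
```

          <p>With a family present only in genomes A and B, equal costs reconstruct a single gain on the branch leading to A and B, whereas gain_cost=2.0 and loss_cost=0.5 reconstruct a single loss on the branch leading to C and D; estimated ratios of gain to loss therefore depend on the chosen costs.</p>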
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Pan-genome, core-genome, and evolution of genome composition</p>
            </st>
             <p>The number of protein-coding genes per genome is relatively similar across the strains and species of <it>Streptococcus </it>(ranging from 1,697 to 2,376; Table <tblr tid="T1">1</tblr>), but the gene composition of these genomes is much more variable. Based on the gene content table obtained with OrthoMCL (Additional data file 1), three strains of the same species (<it>S. agalactiae</it>, <it>S. pyogenes </it>or <it>S. thermophilus</it>) share about 75% of their genes, whereas three different species of <it>Streptococcus </it>share only around half of their genes (Figure <figr fid="F1">1</figr>). This latter result appears to be independent of the particular strains or species involved in the comparison and of their phylogenetic affinities. Even with 26 genomes included, the gene accumulation curve (Figure <figr fid="F2">2</figr>) does not appear to have reached a plateau, so the total number of genes in the <it>Streptococcus </it>pan-genome remains undetermined; we estimate that it probably exceeds 6,000 genes. A surprising 21% of the genes in the pan-genome of the genus <it>Streptococcus </it>(based on these 26 genome sequences) were represented in only one lineage, suggesting a remarkable degree of lateral gene transfer in shaping the genomes of each of these taxa (Figure <figr fid="F3">3</figr>). Within species, pan-genome size also remains uncertain, although our estimates suggest that the pan-genome of <it>S. pyogenes </it>is smaller, and better estimated with the currently available data, than that of <it>S. agalactiae </it>(Figure <figr fid="F2">2</figr>).</p>
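             <p>The core-genome, the pan-genome, the genes confined to a single genome, and a gene accumulation curve can all be read from a gene content (presence/absence) table such as the one produced with OrthoMCL. The Python sketch below assumes a hypothetical in-memory table mapping each genome to the set of ortholog-group identifiers it carries; the table, function names, and toy data are illustrative, and averaging the cumulative pan-genome size over random genome orderings is one common way to draw an accumulation curve, not necessarily the procedure behind Figure <figr fid="F2">2</figr>.</p>

```python
import random
from statistics import mean

# Hypothetical gene content table: genome name -> set of ortholog-group IDs.
TOY_TABLE = {
    "genome_A": {"g1", "g2", "g3", "g4"},
    "genome_B": {"g1", "g2", "g3", "g5"},
    "genome_C": {"g1", "g2", "g6"},
}


def core_genome(table: dict[str, set[str]]) -> set[str]:
    """Ortholog groups present in every genome."""
    return set.intersection(*table.values())


def pan_genome(table: dict[str, set[str]]) -> set[str]:
    """Ortholog groups present in at least one genome."""
    return set.union(*table.values())


def singleton_groups(table: dict[str, set[str]]) -> set[str]:
    """Ortholog groups found in exactly one genome of the table."""
    counts: dict[str, int] = {}
    for groups in table.values():
        for group in groups:
            counts[group] = counts.get(group, 0) + 1
    return {group for group, n in counts.items() if n == 1}


def accumulation_curve(table: dict[str, set[str]], n_perm: int = 100,
                       seed: int = 0) -> list[float]:
    """Mean pan-genome size after adding 1, 2, ..., n genomes,
    averaged over `n_perm` random genome orderings."""
    rng = random.Random(seed)
    genomes = list(table)
    sizes: list[list[int]] = [[] for _ in genomes]
    for _ in range(n_perm):
        order = genomes[:]
        rng.shuffle(order)
        seen: set[str] = set()
        for k, genome in enumerate(order):
            seen |= table[genome]  # groups accumulated after k + 1 genomes
            sizes[k].append(len(seen))
    return [mean(s) for s in sizes]
```

             <p>A curve that is still rising when the last genome is added indicates that sampling further genomes would be expected to reveal additional genes.</p>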
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Genomes analyzed</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Species</p>
                     </c>
                     <c ca="left">
                        <p>Strain</p>
                     </c>
                     <c ca="left">
                        <p>RefSeq accession number</p>
                     </c>
                     <c ca="left">
                        <p>Status</p>
                     </c>
                     <c ca="center">
                        <p>CDS</p>
                     </c>
                     <c ca="center">
                        <p>Serotype</p>
                     </c>
                     <c ca="center">
                        <p>References</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS10270</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_008022">NC_008022</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,987</p>
                     </c>
                     <c ca="center">
                        <p>M2</p>
                     </c>
                     <c ca="center">
                        <p>[46]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS10750</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_008024">NC_008024</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,979</p>
                     </c>
                     <c ca="center">
                        <p>M4</p>
                     </c>
                     <c ca="center">
                        <p>[46]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS2096</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_008023">NC_008023</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,898</p>
                     </c>
                     <c ca="center">
                        <p>M12</p>
                     </c>
                     <c ca="center">
                        <p>[46]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS9429</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_008021">NC_008021</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,877</p>
                     </c>
                     <c ca="center">
                        <p>M12</p>
                     </c>
                     <c ca="center">
                        <p>[46]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>M1 GAS</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_002737">NC_002737</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,697</p>
                     </c>
                     <c ca="center">
                        <p>M1</p>
                     </c>
                     <c ca="center">
                        <p>[76]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS5005</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_007297">NC_007297</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,865</p>
                     </c>
                     <c ca="center">
                        <p>M1</p>
                     </c>
                     <c ca="center">
                        <p>[77]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS8232</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_003485">NC_003485</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,845</p>
                     </c>
                     <c ca="center">
                        <p>M18</p>
                     </c>
                     <c ca="center">
                        <p>[78,79]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS6180</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_007296">NC_007296</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,894</p>
                     </c>
                     <c ca="center">
                        <p>M28</p>
                     </c>
                     <c ca="center">
                        <p>[80]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS315</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_004070">NC_004070</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,865</p>
                     </c>
                     <c ca="center">
                        <p>M3</p>
                     </c>
                     <c ca="center">
                        <p>[79]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>SSI-1</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_004606">NC_004606</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,861</p>
                     </c>
                     <c ca="center">
                        <p>M3</p>
                     </c>
                     <c ca="center">
                        <p>[81]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS10394</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_006086">NC_006086</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,886</p>
                     </c>
                     <c ca="center">
                        <p>M6</p>
                     </c>
                     <c ca="center">
                        <p>[82]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pneumoniae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>R6</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_003098">NC_003098</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>2,043</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[83]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pneumoniae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>TIGR4</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_003028">NC_003028</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>2,094</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[84]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. mutans</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>UA159</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_004350">NC_004350</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,960</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[85]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>2603V/R</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_004116">NC_004116</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>2,124</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[86]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>A909</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_007432">NC_007432</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,996</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>NEM316</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_004368">NC_004368</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>2,094</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[9]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>515</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAJP00000000">NZ_AAJP00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>2,275</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CJB111</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAJQ00000000">NZ_AAJQ00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>2,197</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>COH1</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAJR00000000">NZ_AAJR00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>2,376</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>H36B</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAJS00000000">NZ_AAJS00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>2,376</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>18RS21</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAJO00000000">NZ_AAJO00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>2,146</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[22]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. suis</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>89/1591</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAFA00000000">NZ_AAFA00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>1,896</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. thermophilus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CNRZ1066</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_006449">NC_006449</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,915</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[87]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. thermophilus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>LMG 18311</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NC_006448">NC_006448</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>Complete</p>
                     </c>
                     <c ca="center">
                        <p>1,889</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>[87]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. thermophilus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>LMD-9</p>
                     </c>
                     <c ca="left">
                        <p>GenBank:<ext-link ext-link-type="gen" ext-link-id="NZ_AAGS00000000">NZ_AAGS00000000</ext-link></p>
                     </c>
                     <c ca="left">
                        <p>WGS</p>
                     </c>
                     <c ca="center">
                        <p>1,835</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>CDS, number of protein coding sequences; WGS, whole genome shotgun.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Venn diagram for six sets of three taxa</p>
               </caption>
               <text>
                  <p>Venn diagram for six sets of three taxa. Top: taxa of the same species; bottom: taxa of different species. Areas are approximately proportional to the number of genes.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Accumulation curves for the total number of genes (left) or the number of genes in common (right) given a number of genomes analyzed for the different species of <it>Streptococcus </it>(in blue), the different strains of <it>S. agalactiae </it>(in red) and <it>S. pyogenes </it>(in green)</p>
               </caption>
               <text>
                  <p>Accumulation curves for the total number of genes (left) or the number of genes in common (right) given a number of genomes analyzed for the different species of <it>Streptococcus </it>(in blue), the different strains of <it>S. agalactiae </it>(in red) and <it>S. pyogenes </it>(in green). The vertical bars show standard deviations across one hundred random input orders of the genomes.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Frequency of genes within the 26 genomes included in this analysis</p>
               </caption>
               <text>
                  <p>Frequency of genes within the 26 genomes included in this analysis. Genes present in a single genome represent lineage-specific genes, whereas genes found in all 26 genomes, at the opposite end of the scale, represent the <it>Streptococcus </it>core-genome.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-3"/>
            </fig>
            <p>In contrast to the pan-genome estimates, the number of genes shared by the different species within the genus <it>Streptococcus</it> (the core-genome) appears to reach a plateau at around 600 genes (Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). After genome-specific genes and genes shared by only two genomes, core-genome genes formed the third most frequent class (11%; Figure <figr fid="F3">3</figr>), suggesting that they form a coherent group. Similarly, the estimated core-genome of <it>S. pyogenes</it>, based on the 11 available strains, plateaus at around 1,400 genes. The pattern was less clear for <it>S. agalactiae</it>: the core-genome estimate does not level out and may still be affected by the inclusion of new genome sequences. Overall, these analyses suggest that a core-genome can be delineated at both the genus and the species level. We analyzed four such core-genome data sets: the <it>Streptococcus </it>core-genome (611 genes), and the core-genomes of <it>S. agalactiae </it>(1,472 genes), <it>S. pyogenes </it>(1,376 genes) and <it>S. thermophilus </it>(1,487 genes). To save computation time, the <it>Streptococcus </it>core-genome data set was reduced to ten taxa by keeping only two strains each of <it>S. agalactiae</it>, <it>S. pyogenes </it>and <it>S. thermophilus </it>(strains A909 and NEM316, MGAS9429 and M1 GAS, and CNRZ1066 and LMG 18311, respectively). After discarding clusters containing paralogs (that is, clusters with more than one gene per taxon) and alignments with uncertain site homology, we obtained four data sets of 260, 1,297, 1,212 and 1,365 genes, representing the alignable core-genomes of <it>Streptococcus</it>, <it>S. pyogenes</it>, <it>S. agalactiae</it>, and <it>S. thermophilus</it>, respectively.</p>
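            <p>The accumulation curves of Figure <figr fid="F2">2</figr> (the running union and running intersection of gene sets, averaged over random input orders) and the paralog filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used here: gene clusters are assumed to be precomputed, and the function names, the dictionary inputs and the (taxon, gene) member format are hypothetical.</p>

```python
import random
import statistics
from collections import Counter


def accumulation_curves(gene_sets, n_orders=100, seed=0):
    """Pan- and core-genome sizes as genomes are added in random orders.

    gene_sets maps each genome name to its set of gene-cluster IDs
    (hypothetical input format). For each random input order, the
    pan-genome is the running union and the core-genome the running
    intersection. Returns two lists indexed by (number of genomes - 1),
    each entry holding (mean, standard deviation) across orders.
    """
    rng = random.Random(seed)
    names = list(gene_sets)
    pan_sizes = [[] for _ in names]
    core_sizes = [[] for _ in names]
    for _ in range(n_orders):
        order = names[:]
        rng.shuffle(order)
        pan, core = set(), None
        for k, name in enumerate(order):
            genes = gene_sets[name]
            pan = pan.union(genes)
            core = set(genes) if core is None else core.intersection(genes)
            pan_sizes[k].append(len(pan))
            core_sizes[k].append(len(core))

    def summarize(sizes):
        return [(statistics.mean(s), statistics.pstdev(s)) for s in sizes]

    return summarize(pan_sizes), summarize(core_sizes)


def single_copy_clusters(clusters, taxa):
    """Keep clusters with exactly one gene from every taxon (no paralogs).

    clusters maps a cluster ID to a list of (taxon, gene) pairs.
    """
    kept = {}
    for cid, members in clusters.items():
        counts = Counter(taxon for taxon, _gene in members)
        if set(counts) == set(taxa) and all(c == 1 for c in counts.values()):
            kept[cid] = members
    return kept
```

            <p>For three toy genomes {g1, g2, g3}, {g1, g2, g4} and {g1, g5}, the last points of the two curves are (5, 0.0) and (1, 0.0): every input order ends with the same union and intersection, so only intermediate points vary between orders.</p>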
            <p>Determinations of the number of genes gained and lost on each lineage show considerable variation (Figure <figr fid="F4">4</figr>) and, in agreement with earlier studies, gene gain was generally considerably greater than gene loss and was particularly evident on external branches <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In the interspecific analysis, the lineage showing the greatest gene gain was <it>S. suis</it>, followed closely by <it>S. pneumoniae </it>and <it>S. mutans</it>. Even between strains of the same species, the numbers of genes gained and lost were very high, for example exceeding 150 gained genes in <it>S. agalactiae </it>strain H36B. High levels of gene gain and loss were evident even for closely related isolates of the same serotype in <it>S. pyogenes </it>(for example, M1 GAS/MGAS5005; SSI-1/MGAS315; MGAS9429/MGAS2096). Branch lengths of the <it>S. pyogenes </it>concatenated tree were much longer than those for <it>S. agalactiae</it>, suggesting that the <it>S. pyogenes </it>lineages might be much older; despite this, there was generally more gene gain on the <it>S. agalactiae </it>branches than on the <it>S. pyogenes </it>branches. Large numbers of duplications were also a feature of lineage-specific evolution (Figure <figr fid="F4">4</figr>). Phylogenetic analysis of several of these cases suggests that they result from a combination of lineage-specific duplications and LGT events involving homologous sequences from other species of <it>Streptococcus</it>. Not surprisingly, penalizing gene gain relative to gene loss (for example, <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>) globally decreased the number of gene gains and increased the number of gene losses (Additional data file 3) and, as a consequence, increased the number of genes in the pan-genomes of ancestral nodes (data not shown). 
Nevertheless, even with a penalty, gene gain remained in excess of gene loss on some lineages (Additional data file 3).</p>
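            <p>The gain and loss counts, and the effect of penalizing gain, can be illustrated with a small Sankoff-style parsimony sketch for the presence or absence of a single gene on a rooted tree. This is an assumption-laden illustration rather than the software used for the analysis: it ignores duplications, represents the tree as hypothetical nested tuples, and returns one optimal reconstruction (ties broken toward absence), whereas Figure <figr fid="F4">4</figr> reports only unambiguous changes.</p>

```python
def sankoff_gain_loss(tree, presence, gain_cost=1.0, loss_cost=1.0):
    """Minimum-cost gain/loss reconstruction for one gene's presence/absence.

    tree: nested tuples of leaf names, e.g. (("A", "B"), "C").
    presence: dict mapping each leaf name to 1 (present) or 0 (absent).
    Returns (total_cost, gains, losses) for one optimal reconstruction;
    ties are broken toward absence (state 0).
    """
    inf = float("inf")

    def step(parent_state, child_state):
        # Cost of the change along one branch, from parent to child.
        if parent_state == child_state:
            return 0.0
        return gain_cost if child_state == 1 else loss_cost

    def down(node):
        # Post-order pass: cost[s] = best subtree cost if this node has state s.
        if isinstance(node, str):
            cost = {s: (0.0 if s == presence[node] else inf) for s in (0, 1)}
            return {"cost": cost, "children": []}
        children = [down(child) for child in node]
        cost = {
            s: sum(min(c["cost"][t] + step(s, t) for t in (0, 1)) for c in children)
            for s in (0, 1)
        }
        return {"cost": cost, "children": children}

    def up(info, state, counts):
        # Pre-order pass: give each child its best state given the parent's.
        for child in info["children"]:
            best = min((0, 1), key=lambda t: child["cost"][t] + step(state, t))
            if best != state:
                counts["gains" if best == 1 else "losses"] += 1
            up(child, best, counts)

    root = down(tree)
    root_state = min((0, 1), key=lambda s: root["cost"][s])
    counts = {"gains": 0, "losses": 0}
    up(root, root_state, counts)
    return root["cost"][root_state], counts["gains"], counts["losses"]
```

            <p>With equal costs, a gene present in A and B but absent from C is reconstructed (given the tie-break toward absence) as one gain on the branch leading to A+B; with gain_cost=2.0 the same pattern is instead explained by ancestral presence and one loss in C, mirroring the shift toward losses and larger ancestral pan-genomes described above.</p>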
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Gene gain, loss and duplication, and positive selection</p>
               </caption>
               <text>
                  <p>Gene gain, loss and duplication, and positive selection. Core-genome phylogenies of <it>Streptococcus </it>(left), <it>S. agalactiae </it>(middle), and <it>S. pyogenes </it>(right) based on concatenated genes. Dashed lines correspond to unresolved branches. Numbers next to angle brackets pointing toward a branch give the number of genes gained, numbers next to brackets pointing the opposite way give the number of genes lost, and '&#215;' marks duplicated loci. Values correspond to the most parsimonious unambiguous changes under an equally penalized model (that is, gain, loss and duplication events each cost the same number of changes). Numbers adjacent to the red dot give the number of core-genome genes under positive selection on that lineage.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Recombination</p>
            </st>
            <sec>
               <st>
                  <p>Between species of <it>Streptococcus</it></p>
               </st>
               <p>The results of the approximately unbiased (AU) test indicated that 39 of the 260 genes rejected the concatenated tree. The <it>p </it>value heatmap (Figure <figr fid="F5">5a</figr>) indicates that some gene trees share the same or very similar histories, as shown by groups of topologies with similar <it>p </it>value patterns (for example, topologies 1 to 47, and 48 to 65). On the other hand, a small group of genes rejected most topologies (that is, genes 230 to 260, read horizontally in Figure <figr fid="F5">5a</figr>), and at the same time their trees were rejected by most of the genes (that is, topologies 230 to 260, read vertically in Figure <figr fid="F5">5a</figr>). Although different topologies were supported by various groups of genes, the majority of genes did not reject the concatenated tree, and only a small subset of genes proposed significantly different trees. The analysis of bipartitions (Figure <figr fid="F5">5b</figr>) demonstrated that the vast majority of genes supported three distinct bipartitions, corresponding to the monophyly of <it>S. pyogenes</it>, <it>S. pneumoniae </it>and <it>S. thermophilus </it>(bipartitions 28, 29, and 30, respectively). Also generally supported were the monophyly of <it>S. agalactiae</it>, of the group <it>S. pneumoniae </it>+ <it>S. suis</it>, and of the group <it>S. agalactiae </it>+ <it>S. pyogenes </it>(bipartitions 27, 26 and 25, respectively). Several other bipartitions were supported only by some genes (for example, bipartition 19, grouping <it>S. pneumoniae </it>with <it>S. thermophilus</it>), while others were supported by only one or a few genes (for example, bipartitions 10 and 11). The heatmap of well-supported conflicting bipartitions (Figure <figr fid="F5">5c</figr>) summarizes the <it>p </it>value heatmap (Figure <figr fid="F5">5a</figr>) and the bipartition analysis (Figure <figr fid="F5">5b</figr>). 
A majority of the genes (around 150 of 260) show no conflict with each other. Most of them support the monophyly of the different species and of the lineage <it>S. pneumoniae </it>+ <it>S. suis</it>, and most of them do not reject the concatenated gene tree. A second set of genes showed some conflict with this set of 150, while most of these genes were in conflict with each other. They tend to support the same principal groups as the set of 150, plus a few additional, conflicting bipartitions. A final group of genes conflicts with the first and second groups as well as with each other; these are the genes that rejected most of the other gene trees in the AU test (Figure <figr fid="F5">5a</figr>) and that provide support for rare bipartitions, and they have strongly incongruent histories relative to the other genes (for a detailed list, see Additional data file 4). The topologies used to test for positive selection were the concatenated gene tree for genes that do not reject it, and individual gene trees for loci that do reject the concatenated tree.</p>
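               <p>The conflict counts summarized in Figure <figr fid="F5">5c</figr> rest on split compatibility: two bipartitions of the same taxon set conflict when all four pairwise intersections of their sides are non-empty. The sketch below counts, for each pair of genes, the conflicting pairs of bipartitions that both have bootstrap support of at least 90%. Counting pairs (rather than, for example, bipartitions of one gene that conflict with any bipartition of the other) is an assumption, as are the input format and the function names.</p>

```python
from itertools import combinations


def compatible(split_a, split_b, taxa):
    """True if two bipartitions can coexist on one tree.

    Each split is given as the set of taxa on one side; the other side is
    its complement in taxa. The splits are compatible if at least one of
    the four intersections between their sides is empty.
    """
    taxa = frozenset(taxa)
    a, b = frozenset(split_a), frozenset(split_b)
    sides_a, sides_b = (a, taxa - a), (b, taxa - b)
    return any(not x.intersection(y) for x in sides_a for y in sides_b)


def conflict_matrix(gene_splits, taxa, min_support=90):
    """Count well-supported conflicting bipartition pairs between genes.

    gene_splits maps each gene to a list of (split, bootstrap) pairs.
    Only splits with bootstrap >= min_support are considered.
    """
    strong = {
        gene: [frozenset(split) for split, boot in splits if boot >= min_support]
        for gene, splits in gene_splits.items()
    }
    matrix = {(gene, gene): 0 for gene in strong}
    for g1, g2 in combinations(strong, 2):
        n = sum(1 for s1 in strong[g1] for s2 in strong[g2]
                if not compatible(s1, s2, taxa))
        matrix[(g1, g2)] = matrix[(g2, g1)] = n
    return matrix
```

               <p>For five taxa A to E, {A, B} and {A, C} conflict because every intersection of their sides is non-empty, whereas {A, B} and {A, B, C} are compatible because one is nested within the other.</p>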
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p><it>Streptococcus </it>recombination heatmaps</p>
                  </caption>
                  <text>
                     <p><it>Streptococcus </it>recombination heatmaps. Heatmaps of (a) the AU test, (b) bipartition bootstrap scores and (c) well-supported conflicting bipartitions for the core-genome of <it>Streptococcus</it>. Topologies are ordered from the least rejected (left) to the most rejected (right). Bipartitions are ordered from the least supported (left) to the most supported (right), and only bipartitions with a bootstrap score of at least 70% are shown. Genes are ordered from the least conflicting (left and top) to the most conflicting (right and bottom). The well-supported conflicting bipartitions heatmap represents a symmetric distance matrix, in which each cell gives the number of well-supported (that is, bootstrap &#8805; 90%) conflicting bipartitions between two genes. Color keys are given on the right; from left to right, their gradations correspond to <it>p </it>values, bootstrap percentages, and number of conflicting bipartitions. The arrow marks the concatenated tree.</p>
                  </text>
                  <graphic file="gb-2007-8-5-r71-5"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Within <it>S. agalactiae</it></p>
               </st>
               <p>The concatenated gene tree was rejected by 750 genes of the core-genome of <it>S. agalactiae</it>. On the whole, most genes rejected most of the other gene trees (Figure <figr fid="F6">6a</figr>), although some genes did not reject the majority of gene trees. No bipartition was commonly well supported across genes (Figure <figr fid="F6">6b</figr>). Around half of the genes provided either no or only weak bootstrap support for any bipartition (genes 1 to 560; Figure <figr fid="F6">6b</figr>), while the remaining genes supported different sets of bipartitions. The most commonly supported groups of strains were 515+NEM316, A909+H36B, 515+NEM316+COH1, A909+CJB111+H36B, A909+CJB111+H36B, and 515+COH1 (bipartitions 75 to 70, respectively; Figure <figr fid="F6">6b</figr>). Numerous additional bipartitions were supported by only one or a few genes. Because their phylogenetic signal was too limited, around half of the genes (genes 1 to 560) showed no conflict with any of the other genes (Figure <figr fid="F6">6c</figr>). Although the AU test suggested that some of these genes have different histories, it is difficult to reach any definitive conclusion about the congruence of their histories, since phylogenetic signal was limited or absent (for genes with no sequence divergence between strains).</p>
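               <p>One simple way to quantify the limited phylogenetic signal mentioned above is to count parsimony-informative sites, that is, alignment columns in which at least two character states each occur in at least two sequences; a gene with none offers essentially no information for resolving non-trivial bipartitions. This measure is not part of the analysis described here and is offered only as a hypothetical illustration; the function name and the set of ignored characters are assumptions.</p>

```python
from collections import Counter


def informative_sites(alignment, ignore="-?N"):
    """Count parsimony-informative columns in an alignment.

    alignment: list of equal-length aligned sequences, one per strain.
    Gap and ambiguity characters listed in ignore are skipped.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must have equal length")
    n_informative = 0
    for column in zip(*alignment):
        states = Counter(c.upper() for c in column if c.upper() not in ignore)
        # Informative: at least two states, each present in two or more sequences.
        if sum(1 for count in states.values() if count >= 2) >= 2:
            n_informative += 1
    return n_informative
```
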
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p><it>S. agalactiae </it>recombination heatmaps</p>
                  </caption>
                  <text>
                     <p><it>S. agalactiae </it>recombination heatmaps. The layout is the same as in Figure 5, but for the core-genome of <it>S. agalactiae</it>.</p>
                  </text>
                  <graphic file="gb-2007-8-5-r71-6"/>
               </fig>
               <p>The second half of the core-genome can be split into two groups. The first group contains genes that show some conflict with each other and that tend to support the six bipartitions described earlier, plus three additional ones. The second group contains genes that are largely in conflict with each other and with the first group; it provides support for a number of rarely supported bipartitions. Whereas genes of the first group have only partly incongruent histories (few bipartitions in conflict), genes of the second group have more incongruent histories (a greater number of bipartitions in conflict). Given these results, and the ambiguity in defining which genes share the same history, we analyzed each gene with its own gene tree in the subsequent positive selection analyses.</p>
            </sec>
            <sec>
               <st>
                  <p>Within <it>S. pyogenes</it></p>
               </st>
               <p>As for <it>S. agalactiae</it>, a few genes rejected no other gene tree, whereas the majority of genes rejected the other gene trees (Figure <figr fid="F7">7a</figr>). Three bipartitions, corresponding to serotype groupings, were generally (although not always) supported, with variable bootstrap scores: MGAS5005+M1 GAS, MGAS315+SSI-1, and MGAS2096+MGAS9429 (bipartitions 131 to 129, respectively; Figure <figr fid="F7">7b</figr>). A total of 434 genes also tended to support various unique bipartitions. Around half of the genes had weak or no phylogenetic signal and, as a consequence, showed no conflict with any other tree (Figure <figr fid="F7">7b, c</figr>). A set of around 200 genes, most of which supported the three bipartitions detailed above, tended not to conflict with each other, although they occasionally conflicted with the final group of genes. This final group was composed of the 434 genes mentioned above, which supported various different bipartitions and thus tended to be in conflict with each other. Overall, the <it>S. pyogenes </it>core-genome is composed of genes that are largely congruent for a portion of relatively recent history (that is, serotype monophyly), while one-third of the core-genome appears to have strongly incongruent histories for older events. Because it was difficult to define which genes were likely to share the same history, we analyzed each gene with its own gene tree in the subsequent positive selection analyses.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p><it>S. pyogenes </it>recombination heatmaps</p>
                  </caption>
                  <text>
                     <p><it>S. pyogenes </it>recombination heatmaps. The layout is the same as in Figure 5, but for the core-genome of <it>S. pyogenes</it>.</p>
                  </text>
                  <graphic file="gb-2007-8-5-r71-7"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Substitution analysis of recombination</p>
               </st>
               <p>The pairwise homoplasy index (PHI) approach suggested that around 20% of the genes were recombinant within the core-genomes of <it>Streptococcus </it>and <it>S. pyogenes</it>, whereas only about 3% of the genes were recombinant within <it>S. agalactiae </it>(Table <tblr tid="T2">2</tblr>). A more conservative approach, which considers as recombinant only those genes detected by all three substitution-based methods (PHI, MaxChi and neighbor similarity score (NSS)), reduced these proportions, but the relative differences between the data sets remained (Table <tblr tid="T2">2</tblr>). With the phylogenetic approach detailed above, numerous genes had weak phylogenetic signal and several groups of genes were only partially incongruent, so it can be difficult to define clearly which genes have different histories. It is, however, possible to adopt a conservative approach that considers as putative recombinants only those genes showing strong phylogenetic incongruence (SPI) with most of the other genes. Only a small proportion of genes was identified as putative recombinants by both the PHI and SPI approaches (Table <tblr tid="T2">2</tblr>), suggesting that each approach tends to identify different types of recombination event. We therefore propose that the complete set of putative recombinants is best estimated as the union of the genes identified by SPI and the genes identified by all three substitution-based methods (Table <tblr tid="T2">2</tblr>). This yields estimates of 18% of the core-genome as putative recombinants for <it>S. agalactiae</it>, 19% for the genus <it>Streptococcus</it>, and 37% for <it>S. pyogenes</it>.</p>
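               <p>The combination rule proposed above (SPI together with the genes flagged by all three substitution methods) is plain set algebra. A minimal sketch, assuming each method's output is available as a set of gene identifiers; the function names are hypothetical.</p>

```python
def putative_recombinants(spi, phi, maxchi, nss):
    """Union of SPI genes and genes flagged by PHI, MaxChi and NSS together."""
    substitution = set(phi).intersection(maxchi, nss)
    return set(spi).union(substitution)


def percent(count, total):
    """Percentage of a data set, rounded to one decimal as in Table 2."""
    return round(100 * count / total, 1)
```

               <p>Applied to the <it>S. pyogenes </it>counts in Table <tblr tid="T2">2</tblr>, percent(477, 1297) gives 36.8, the value rounded to 37% in the text.</p>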
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Number of genes showing evidence of recombination</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>1. SPI</p>
                        </c>
                        <c ca="center">
                           <p>2. PHI</p>
                        </c>
                        <c ca="center">
                           <p>3. PHI &#8745; MaxChi &#8745; NSS</p>
                        </c>
                        <c ca="center">
                           <p>1 &#8745; 2</p>
                        </c>
                        <c ca="center">
                           <p>1 &#8746; 3</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Between species</p>
                        </c>
                        <c ca="center">
                           <p>26 (10.0%)</p>
                        </c>
                        <c ca="center">
                           <p>54 (20.8%)</p>
                        </c>
                        <c ca="center">
                           <p>35 (13.5%)</p>
                        </c>
                        <c ca="center">
                           <p>11 (4.2%)</p>
                        </c>
                        <c ca="center">
                           <p>53 (19.2%)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>S. pyogenes</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>434 (33.5%)</p>
                        </c>
                        <c ca="center">
                           <p>284 (21.9%)</p>
                        </c>
                        <c ca="center">
                           <p>168 (12.9%)</p>
                        </c>
                        <c ca="center">
                           <p>186 (14.3%)</p>
                        </c>
                        <c ca="center">
                           <p>477 (36.8%)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>S. agalactiae</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>222 (18.3%)</p>
                        </c>
                        <c ca="center">
                           <p>34 (2.8%)</p>
                        </c>
                        <c ca="center">
                           <p>7 (0.6%)</p>
                        </c>
                        <c ca="center">
                           <p>18 (1.5%)</p>
                        </c>
                        <c ca="center">
                           <p>223 (18.4%)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Positive selection analysis</p>
            </st>
            <p>The number of genes showing evidence of positive selection was particularly high within the <it>Streptococcus </it>core-genome (between 10.8% and 34.2% of the genes analyzed, depending on the lineage; Table <tblr tid="T3">3</tblr>). The <it>S. pneumoniae </it>and <it>S. suis </it>lineages, and the ancestral lineage leading to these two species, had the greatest proportions of the core-genome evolving under positive selection (28%, 34% and 32%, respectively; Table <tblr tid="T3">3</tblr>). Approximately one-third of the genes showed positive selection on only one lineage, and no gene was selected on all lineages (Figure <figr fid="F8">8</figr>). Many genes, however, were selected on multiple lineages, including 12 genes selected on five different lineages and 4 genes selected on six (Figures <figr fid="F8">8</figr> and <figr fid="F9">9</figr>; see Additional data file 5 for a complete list of all genes and lineages under positive selection). For <it>S. suis</it>, <it>S. pneumoniae</it>, and <it>S. thermophilus</it>, a substantial proportion of the positively selected genes was selected on that lineage alone (21%, 19%, and 24%, respectively). In contrast, <it>S. agalactiae </it>had no uniquely selected loci, and <it>S. pyogenes </it>and <it>S. mutans </it>had only a very small proportion (Figure <figr fid="F9">9</figr>). An analysis of variance of the genes under positive selection supported significant effects of both lineage and biochemical main role category (Table <tblr tid="T4">4</tblr>). <it>Post hoc </it>multiple comparisons showed that the main role effect was driven by two categories, 'DNA metabolism' and 'Transcription'. The interaction between lineage and main role category was less strongly supported, but still significant (Table <tblr tid="T4">4</tblr>). This interaction appeared to be mainly due to an increase in the number of positively selected genes involved in transcription, protein fate, protein synthesis and DNA metabolism on the <it>S. pneumoniae-S. suis </it>ancestral lineage and the <it>S. suis </it>lineage.</p>
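            <p>The per-gene and per-lineage summaries above (Figures 8 and 9) reduce to simple counts over a gene-by-lineage table of branch-site test results. The minimal Python sketch below assumes that such calls are available as a mapping from each gene to the set of lineages on which positive selection was detected; the gene names and calls are hypothetical, and the function names are illustrative rather than part of the authors' pipeline.</p>

```python
from collections import Counter

# Hypothetical branch-site results: gene -> set of lineages on which positive
# selection was detected. Illustrative values only, not data from this study.
calls = {
    "geneA": {"S. suis"},
    "geneB": {"S. suis", "S. pneumoniae"},
    "geneC": {"S. thermophilus"},
    "geneD": {"S. suis", "S. pneumoniae", "(S. pneumoniae, S. suis)"},
    "geneE": {"S. mutans", "S. pyogenes"},
}


def lineage_count_histogram(calls):
    """Number of genes selected on exactly k lineages (the quantity in Figure 8)."""
    return Counter(len(lineages) for lineages in calls.values())


def unique_fraction(calls, lineage):
    """Fraction of the genes selected on `lineage` that are selected on no other lineage."""
    selected = [gene for gene, lineages in calls.items() if lineage in lineages]
    if not selected:
        return 0.0
    unique = [gene for gene in selected if calls[gene] == {lineage}]
    return len(unique) / len(selected)


if __name__ == "__main__":
    print(sorted(lineage_count_histogram(calls).items()))
    for lineage in ("S. suis", "S. pneumoniae", "S. agalactiae"):
        print(lineage, round(unique_fraction(calls, lineage), 2))
```

            <p>With these toy calls, two genes are selected on one lineage, two on two lineages and one on three; one of the three genes selected on the S. suis lineage is unique to it, and a lineage with no selected genes (here S. agalactiae) gets a fraction of zero.</p>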
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Genes under positive selection</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Data set</p>
                     </c>
                     <c ca="left">
                        <p>Lineage</p>
                     </c>
                     <c ca="center">
                        <p>n</p>
                     </c>
                     <c ca="center">
                        <p>PS</p>
                     </c>
                     <c ca="center">
                        <p>%</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Streptococcus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. mutans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>12.69</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. pneumoniae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>28.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. suis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>34.23</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. thermophilus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>23.46</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>10.77</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>260</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>16.92</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>(<it>S. pneumoniae</it>, <it>S. suis</it>)</p>
                     </c>
                     <c ca="center">
                        <p>221</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>32.13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. agalactiae</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>COH1</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0.58</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>18RS21</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>NEM316</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>H36B</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>A909</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2603V/R</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>CJB111</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>515</p>
                     </c>
                     <c ca="center">
                        <p>1,212</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. pyogenes</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>MGAS10270</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS10394</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS10750</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS2096</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS315</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS5005</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.08</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS6180</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS8232</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0.31</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MGAS9429</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>M1 GAS</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SSI-1</p>
                     </c>
                     <c ca="center">
                        <p>1,297</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0.00</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>(MGAS9429, MGAS2096)</p>
                     </c>
                     <c ca="center">
                        <p>925</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>(MGAS5005, M1 GAS)</p>
                     </c>
                     <c ca="center">
                        <p>978</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>(SSI-1, MGAS315)</p>
                     </c>
                     <c ca="center">
                        <p>983</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. thermophilus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>CNRZ1066</p>
                     </c>
                     <c ca="center">
                        <p>1,365</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LMG 18311</p>
                     </c>
                     <c ca="center">
                        <p>1,365</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LMD-9</p>
                     </c>
                     <c ca="center">
                        <p>1,365</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>1.03</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
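                  <p>n, number of genes analyzed; PS, number of genes under positive selection; %, percentage of the analyzed genes under positive selection.</p>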
               </tblfn>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Analysis of variance for the effect of the lineages and role categories</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Df</p>
                     </c>
                     <c ca="center">
                        <p>Sum Sq</p>
                     </c>
                     <c ca="center">
                        <p>Mean Sq</p>
                     </c>
                     <c ca="center">
                        <p>F value</p>
                     </c>
                     <c ca="center">
                        <p><it>p </it>value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Lineage</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>2,954</p>
                     </c>
                     <c ca="center">
                        <p>492</p>
                     </c>
                     <c ca="center">
                        <p>23.9</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Main role</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1,086</p>
                     </c>
                     <c ca="center">
                        <p>109</p>
                     </c>
                     <c ca="center">
                        <p>5.27</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Interaction</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>1,974</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>1.6</p>
                     </c>
                     <c ca="center">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Residuals</p>
                     </c>
                     <c ca="center">
                        <p>1,699</p>
                     </c>
                     <c ca="center">
                        <p>35,005</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Df, degrees of freedom; Sum Sq, sum of squares; Mean Sq, mean square.</p>
               </tblfn>
            </tbl>
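            <p>In Table 4, each mean square is the sum of squares divided by its degrees of freedom, and each F value is the effect mean square divided by the residual mean square. The short Python sketch below recomputes these two columns from the tabulated Df and Sum Sq values. Because the tabulated sums of squares are rounded, the recomputed values can only be expected to match to the displayed precision. The sketch does not compute p values, which would require an F-distribution tail function (for example, scipy.stats.f.sf).</p>

```python
# Effect rows of Table 4: (source of variation, degrees of freedom, sum of squares).
EFFECTS = [
    ("Lineage", 6, 2954.0),
    ("Main role", 10, 1086.0),
    ("Interaction", 60, 1974.0),
]
RESIDUAL_DF = 1699
RESIDUAL_SS = 35005.0


def anova_f_values(effects, residual_df, residual_ss):
    """Return {source: (mean square, F)}, with F = MS_effect / MS_residual."""
    ms_residual = residual_ss / residual_df
    table = {}
    for source, df, ss in effects:
        ms = ss / df
        table[source] = (ms, ms / ms_residual)
    return table


if __name__ == "__main__":
    print("Residual mean square:", round(RESIDUAL_SS / RESIDUAL_DF, 1))
    for source, (ms, f) in anova_f_values(EFFECTS, RESIDUAL_DF, RESIDUAL_SS).items():
        print(f"{source}: Mean Sq = {ms:.0f}, F = {f:.2f}")
```

            <p>Running it reproduces the tabulated mean squares (492, 109 and 33) and F values (23.9, 5.27 and 1.6) to their displayed precision.</p>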
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Frequency of positive selection</p>
               </caption>
               <text>
                  <p>Frequency of positive selection. Numbers of genes showing evidence of positive selection in 1-7 lineages.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Positive selection occurrence per gene and lineage</p>
               </caption>
               <text>
                  <p>Positive selection occurrence per gene and lineage. A black dot indicates positive selection. Genes and lineages are ordered according to a correspondence analysis.</p>
               </text>
               <graphic file="gb-2007-8-5-r71-9"/>
            </fig>
            <p>In addition to identifying genes and lineages under positive selection, the branch-site test also identifies the selected sites, using a Bayes empirical Bayes approach <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. For 91% of the genes under positive selection, specific sites were identified (posterior probability >0.95). Interestingly, when a gene was independently selected on different lineages, the sites under positive selection generally differed among lineages, suggesting that different selective pressures acted on different sites. In contrast to the interspecific comparisons, the within-species analyses of the core-genomes, across strains of each <it>Streptococcus </it>species, revealed positive selection in only a few genes (Table <tblr tid="T3">3</tblr>, Additional data file 5), although a few lineages showed slightly higher levels of positive selection than the rest: COH1 for <it>S. agalactiae</it>, MGAS10270 and the lineage leading to SSI-1/MGAS315 for <it>S. pyogenes</it>, and LMD-9 for <it>S. thermophilus</it>. A substantial number of genes evolving under positive selection were also judged to be putative recombinants (Table <tblr tid="T5">5</tblr>). This was particularly true for <it>S. pyogenes</it>, where 78% of the genes under positive selection were putative recombinants. Approximately half of these genes were identified as recombinants by the substitution-based recombination methods, and the other half by the phylogenetic approach.</p>
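            <p>The comparison of selected sites across lineages can be sketched as follows. This minimal Python example assumes hypothetical Bayes empirical Bayes output in the form of per-site posterior probabilities for each lineage, keeps the sites above the 0.95 cut-off used here, and summarizes the similarity of two lineages' site sets with the Jaccard index. The Jaccard index is an illustrative choice, not necessarily the measure used in this study, and the codon positions and probabilities are invented.</p>

```python
# Hypothetical Bayes empirical Bayes (BEB) output for one gene: for each lineage,
# codon position -> posterior probability that the site is in the positively
# selected class. Values are illustrative only.
beb_posteriors = {
    "S. suis": {12: 0.99, 45: 0.97, 88: 0.60},
    "S. pneumoniae": {12: 0.40, 45: 0.96, 130: 0.98},
}

POSTERIOR_CUTOFF = 0.95


def selected_sites(posteriors, cutoff=POSTERIOR_CUTOFF):
    """Codon positions whose posterior probability exceeds the cut-off."""
    return {site for site, prob in posteriors.items() if prob > cutoff}


def jaccard(sites_a, sites_b):
    """Shared sites divided by all sites (0 = disjoint, 1 = identical)."""
    union = sites_a.union(sites_b)
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(sites_a.intersection(sites_b)) / len(union)


if __name__ == "__main__":
    suis = selected_sites(beb_posteriors["S. suis"])
    pneumo = selected_sites(beb_posteriors["S. pneumoniae"])
    print(sorted(suis), sorted(pneumo), round(jaccard(suis, pneumo), 2))
```

            <p>Here the two lineages share only codon 45, giving a Jaccard index of 0.33. Values near zero across many genes would correspond to the pattern described above, in which the same gene is selected at different sites on different lineages.</p>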
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Total number of genes showing evidence of positive selection and recombination</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Data set</p>
                     </c>
                     <c ca="center">
                        <p>PS</p>
                     </c>
                     <c ca="center">
                        <p>PS and R</p>
                     </c>
                     <c ca="center">
                        <p>PS and SPI</p>
                     </c>
                     <c ca="center">
                        <p>PS and IR</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Between species</p>
                     </c>
                     <c ca="center">
                        <p>175</p>
                     </c>
                     <c ca="center">
                        <p>43 (25%)</p>
                     </c>
                     <c ca="center">
                        <p>20 (8%)</p>
                     </c>
                     <c ca="center">
                        <p>29 (11%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. agalactiae</it></p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>4 (40%)</p>
                     </c>
                     <c ca="center">
                        <p>4 (40%)</p>
                     </c>
                     <c ca="center">
                        <p>0 (0%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. pyogenes</it></p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>25 (78%)</p>
                     </c>
                     <c ca="center">
                        <p>21 (65%)</p>
                     </c>
                     <c ca="center">
                        <p>17 (53%)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>PS, positive selection; R, recombination.</p>
               </tblfn>
            </tbl>
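            <p>The 'PS and R' column of Table 5 can be read as the overlap between the positively selected genes and the genes flagged as recombinant. The Python sketch below assumes that a gene counts as recombinant when it is flagged by any method, that is, R is the union of the substitution-based and phylogenetic calls. The gene identifiers are hypothetical and the function name is illustrative.</p>

```python
# Hypothetical gene identifiers; illustrative only.
positively_selected = {"g1", "g2", "g3", "g4"}
flagged_substitution = {"g1", "g5"}        # flagged by substitution-based methods
flagged_phylogenetic = {"g2", "g3", "g6"}  # flagged by the phylogenetic approach


def overlap_with_recombination(ps_genes, *recombination_calls):
    """Count and percentage of PS genes flagged as recombinant by any method."""
    recombinant = set().union(*recombination_calls)
    both = ps_genes.intersection(recombinant)
    return len(both), 100.0 * len(both) / len(ps_genes)


if __name__ == "__main__":
    n, pct = overlap_with_recombination(
        positively_selected, flagged_substitution, flagged_phylogenetic)
    print(f"PS and R: {n} ({pct:.0f}%)")
```

            <p>The same arithmetic gives the S. pyogenes entry of Table 5: 25 of 32 positively selected genes is 78%.</p>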
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Core-genome, pan-genome, and recombination</p>
            </st>
            <p>We estimate that the pan-genome of the genus <it>Streptococcus </it>is probably at least three times the average genome size of a typical <it>Streptococcus </it>species. This large variability in gene content between species is also evident in comparisons across strains of the same species. Our prediction for the <it>S. agalactiae </it>pan-genome is in general agreement with that of Tettelin <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The marked difference in estimated pan-genome size between <it>S. agalactiae </it>and <it>S. pyogenes </it>may reflect differences in their habitats. The human oral-nasal mucosa is the primary habitat of <it>S. pyogenes</it>, whereas <it>S. agalactiae </it>was first identified as a bacterium linked to bovine mastitis, and was later found in humans, where it colonizes the lower gastrointestinal tract and vaginal epithelium of healthy adults. This apparently broader habitat range of <it>S. agalactiae</it>, and presumably, therefore, a larger gene pool available for lateral gene transfer, could explain the difference in pan-genome size between these two species.</p>
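            <p>The core-genome and the pan-genome are, respectively, the intersection and the union of the gene families present in a set of genomes. The Python sketch below uses hypothetical gene-family sets to track both quantities as genomes are added one at a time, and reports the observed pan-genome size relative to the mean genome size. Predicting the size of the full pan-genome additionally requires fitting a model to such accumulation curves, which this sketch does not attempt.</p>

```python
# Hypothetical gene families present in each genome; illustrative only.
genomes = {
    "strain_1": {"a", "b", "c", "d"},
    "strain_2": {"a", "b", "c", "e"},
    "strain_3": {"a", "b", "f", "g"},
}


def accumulation(gene_sets):
    """Core- and pan-genome sizes after adding each genome, in the given order."""
    core, pan = None, set()
    core_sizes, pan_sizes = [], []
    for genes in gene_sets:
        # Core shrinks to families shared by all genomes seen so far;
        # pan grows with every family seen in at least one genome.
        core = set(genes) if core is None else core.intersection(genes)
        pan = pan.union(genes)
        core_sizes.append(len(core))
        pan_sizes.append(len(pan))
    return core_sizes, pan_sizes


if __name__ == "__main__":
    core_sizes, pan_sizes = accumulation(genomes.values())
    mean_size = sum(len(g) for g in genomes.values()) / len(genomes)
    print("core:", core_sizes, "pan:", pan_sizes)
    print("pan / mean genome size:", round(pan_sizes[-1] / mean_size, 2))
```

            <p>For these three toy genomes the core-genome shrinks from 4 to 2 families while the pan-genome grows from 4 to 7, that is, 1.75 times the mean genome size.</p>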
            <p>The pronounced evolutionary flexibility of these bacterial genomes is further evident in the estimates of gene gain, loss and duplication on each of the respective lineages. Gene gain figures were generally higher for <it>S. agalactiae </it>than for <it>S. pyogenes</it>, even though branch lengths suggest that the <it>S. pyogenes </it>lineages may be older; this is likely a consequence of the overall smaller pan-genome size of <it>S. pyogenes</it>. For some species, gene gain exceeded 20% of the total gene content of the organism. Our results in this regard are in general agreement with those of Hao and Golding <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, while extending the estimates to additional taxa of <it>Streptococcus </it>and to lineages of <it>S. agalactiae </it>and <it>S. pyogenes</it>, and we concur with these authors that much of this gene gain likely reflects species-specific adaptation. In our opinion, a plausible explanation for the discrepancy between gene gain and loss is that much of the pan-genome remains unsampled; we therefore cannot detect many gene loss events, and that category is underestimated. In several genome reconstruction analyses, gene gain is penalized with respect to gene loss (for example, <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>). This procedure necessarily increases the number of inferred gene losses relative to gene gains. It has the direct consequence of increasing ancestral genome sizes, and of reducing the importance of LGT events in favor of genome reduction processes. Nevertheless, this approach is questionable when one attempts to reconstruct genome composition, and in particular for <it>Streptococcus</it>. First, because genome size is relatively stable in <it>Streptococcus</it>, ancestral genome sizes were arguably of the same order as present-day ones. 
Second, genome-specific genes are so common that there is little reason to postulate that gene gain is less probable than gene loss. Therefore, at least in the case of <it>Streptococcus</it>, we question the validity of an approach that favors genome reduction processes over LGT. In that regard, the development of probabilistic models will be highly valuable for estimating rates of gene gain and loss (for example, <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B29">29</abbr></abbrgrp>).</p>
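            <p>The effect of penalizing gene gain can be illustrated with a generic Sankoff small-parsimony calculation for a single gene on a toy four-taxon tree. This is a minimal sketch under stated assumptions, not the PAUP DelTran reconstruction used in this study: it scores only presence/absence (no duplication), leaves the root state free, and the tree, cost matrices, and function name are hypothetical. With equal costs, two gains and two losses tie; doubling the cost of gain makes an ancestral copy followed by losses the cheaper explanation, which enlarges the inferred ancestral genome.</p>

```python
import math

# Hypothetical cost matrices: keys are (parent_state, child_state); 0 = absent, 1 = present.
EQUAL_COSTS = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}
GAIN_PENALIZED = {(0, 0): 0, (1, 1): 0, (0, 1): 2, (1, 0): 1}


def sankoff(tree, leaf_states, cost):
    """Sankoff small parsimony for one gene's presence/absence.

    tree is either a leaf name (str) or a tuple of child subtrees.
    Returns {state: minimal cost of the subtree given that its root has that state}.
    """
    if isinstance(tree, str):
        observed = leaf_states[tree]
        return {s: (0.0 if s == observed else math.inf) for s in (0, 1)}
    child_tables = [sankoff(child, leaf_states, cost) for child in tree]
    table = {}
    for parent in (0, 1):
        # Each child independently takes its cheapest state given the parent's state.
        table[parent] = sum(
            min(ct[child] + cost[(parent, child)] for child in (0, 1))
            for ct in child_tables
        )
    return table


# Toy rooted tree ((A,B),(C,D)); the gene is present in A and C only.
tree = (("A", "B"), ("C", "D"))
states = {"A": 1, "B": 0, "C": 1, "D": 0}

root_equal = sankoff(tree, states, EQUAL_COSTS)
root_penalized = sankoff(tree, states, GAIN_PENALIZED)
print(root_equal)      # {0: 2.0, 1: 2.0}: two gains or two losses cost the same
print(root_penalized)  # {0: 4.0, 1: 2.0}: an ancestral copy plus two losses wins
```
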
            <p>In contrast to the pan-genome, the core-genome sizes of the taxonomic groups included in our analysis are much better estimated. Within <it>Streptococcus</it>, we characterized several core-genomes, the composition of which depended on the taxonomic level considered. Our prediction of core-genome composition for <it>S. agalactiae </it>was in general agreement with Tettelin <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and Brochet <it>et al</it>. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. The slight differences in absolute numbers between the three studies are due to differences in the methodology used to define orthology <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, or to the use of DNA microarray hybridization data <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. At the genus level, the core-genome corresponded to 25% of a typical <it>Streptococcus </it>genome, while at the species level it represented around 60% of the genome. Earlier studies involving other groups of bacteria have suggested that such core-genomes may be relatively free of recombination <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Considering the union of the substitution-based and phylogeny-based methods, we estimate that around 18% of the core-genome of the genus <it>Streptococcus </it>is recombinant, and as much as 35% of the core-genome of <it>S. pyogenes</it>. Beyond the fact that we are analyzing a different group of taxa, for which levels of recombination might well be expected to differ, our results also differ from these earlier estimates because of differences in approach. We concur with the comments of Susko <it>et al</it>. 
<abbrgrp><abbr bid="B34">34</abbr></abbrgrp> that attempts to evaluate the phylogenetic congruence of core-genes need to involve comparisons with as many relevant topologies as possible, and not just the concatenated topology, some variants of it, or particular canonical markers such as the small subunit rRNA. In doing this, however, it is important to keep in mind that a gene may reject a topology even though it provides little phylogenetic signal; thus, only the proportion of gene trees rejected on the basis of strongly supported conflicting nodes should be regarded as incongruent. This was particularly evident in our analysis of <it>S. pyogenes</it>, where the vast majority of topologies reject one another even though at least half of these genes have little or no phylogenetic signal. Our assessment of core-genome recombination also differs from some of these earlier studies in its inclusion of substitution-based approaches to recombination detection. There was little overlap between the loci identified as putative recombinants by the phylogenetic and the substitution-based approaches, suggesting that they identify very different types of recombination. The substitution-based approaches are likely to detect homologous recombination of small pieces of DNA that could be missed by a phylogenetic approach. The more restricted habitat distribution of <it>S. pyogenes </it>may also explain the elevated amount of recombination in the core-genome of that species. A smaller gene pool for possible recombination would not only result in a smaller pan-genome (as suggested above for <it>S. pyogenes</it>), but would also favor more homologous recombination of core-genome components, at least partly because the relative proportion of conspecific donor DNA is likely to be greater.</p>
         </sec>
         <sec>
            <st>
               <p>Positive selection in the core-genome</p>
            </st>
            <p>A logical surmise often made in studying pathogen evolution is that much of the host-specific adaptation that a bacterial species exhibits will be associated with its species-specific genes. Perhaps as a consequence of this common-sense viewpoint, adaptation within the core-genome has received much less attention. Our analysis reveals that during the diversification of the genus <it>Streptococcus </it>there have been significant amounts of positive selection on core-genome components, and that this selection has occurred disproportionately in certain lineages and biochemical categories. Such a strong positive selection signature within the core-genome of <it>Streptococcus </it>is perhaps somewhat surprising, as these predominantly housekeeping genes might be expected to evolve under strong purifying selection. At the same time, we would argue that many of these genes are related to the colonization, persistence, survival, and propensity to cause disease of these organisms, and are thus associated with the adaptive specifics of the bacteria.</p>
            <p>Based on currently available phylogenies for the genus <it>Streptococcus </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, there appears to be a rough correlation between the relative divergence of taxa and the level of positive selection detected in different lineages. For example, <it>S. pyogenes </it>and <it>S. agalactiae </it>are both members of the Pyogenes taxonomic group, and fewer genes were selected on these lineages than on those of taxa from different taxonomic groups. Several genomes also tend to resemble one another with respect to the genes that were positively selected, while others, such as <it>S. suis </it>and <it>S. thermophilus</it>, exhibit higher levels of specific adaptation. <it>S. suis </it>was also the species with the largest number of positively selected genes in its core-genome, relative to the other lineages, and the genome with the greatest amount of gene gain and loss since the separation of the <it>S. pneumoniae-S. suis </it>common ancestor. This suggests a lineage that has been under strong selection pressure, both for acquiring new genes and on the sequences of its core-genome components. This selection pressure is undoubtedly correlated with particular characteristics of the host species, swine. Current phylogenies support <it>S. acidominimus</it>, a species associated with the bovine vagina, the skin of calves, and raw milk, as the sister group to <it>S. suis </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, suggesting the possibility of host divergence in one or the other (or possibly both) lineage after their split from a common ancestor. <it>S. suis </it>is also known as an occasional zoonotic pathogen of humans, causing septicemia, meningitis, endocarditis <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, and, most recently, streptococcal toxic shock syndrome <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. 
Our analysis points to an evolutionary flexibility of the <it>S. suis </it>genome that could be related to this propensity for host jumping.</p>
            <p>In addition to the lineage effect in the interspecific selection analysis, there was also an effect of biochemical role, with DNA metabolism and transcription differing significantly from most other categories and showing a higher incidence of positive selection. An excess of positive selection in genes related to transcription is perhaps surprising, as these genes are generally well conserved and are known to be particularly recalcitrant to recombination. Furthermore, the <it>S. suis </it>and <it>S. pneumoniae-S. suis </it>ancestral lineages showed a disproportionate number of positively selected genes in several biochemical categories. Thus, the extent of adaptive molecular evolution varies across lineages and gene roles, undoubtedly reflecting the habitat heterogeneity of the genus <it>Streptococcus</it>. Our results also reveal that positive selection was partly linked to recombination. Recombination can create spurious positive selection results <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, particularly for genes showing evidence of intragenic recombination. This is less likely for genes identified as recombinant by the phylogenetic recombination detection procedure, because these loci were analyzed for positive selection with their own phylogenies, thereby taking possible LGT events into account. It is also possible that many LGT loci are genuinely positively selected, because of the strong selective forces that might be expected to act on newly acquired genes. Marri <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> found that many species-specific genes within the <it>Streptococcus </it>genus showed evidence of adaptive evolution, and concluded that LGT played an important adaptive role.</p>
            <p>Although considerable positive selection was evident in the analysis of different lineages within the genus <it>Streptococcus</it>, there was much less evidence for positive selection among the lineages of <it>S. agalactiae</it>, <it>S. pyogenes</it>, and <it>S. thermophilus</it>. Note, however, that these absolute numbers of genes under positive selection are not corrected for depth of ancestry. In other words, they do not reflect rates of adaptive mutation for specific lineages; instead, they represent the fraction of the core-genome that has participated in the adaptation of a specific taxon. Nevertheless, several lineages in each of these species had slightly elevated levels of selection relative to the rest. In <it>S. pyogenes</it>, for example, the lineage leading to the M3 serotype had nine genes under positive selection, while most other lineages had two or fewer (see Additional data file 5 for a complete description of these genes). Compared to other M types, serotype M3 strains cause more cases of invasive disease, such as necrotizing fasciitis, bacteremia, and streptococcal toxic shock syndrome <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>, have a higher rate of lethal infections <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>, and show an occasional epidemic tendency <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. 
The nine genes we identified as positively selected include loci implicated as virulence determinants in other species of <it>Streptococcus</it>, such as <it>adenylosuccinate lyase</it>, a homotetramer that catalyzes two discrete reactions in the <it>de novo </it>synthesis of purines and that has recently been implicated as a virulence factor for infective endocarditis, a serious endovascular infection caused by <it>Streptococcus sanguinis </it><abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, as well as genes involved in the cell envelope, such as <it>UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanyl ligase</it>, which has been shown to be integral to peptidoglycan biosynthesis and cell growth <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. In addition to genes unique to M3 strains (for example, <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>), we suggest that these nine core-genome loci, uniquely under positive selection in the M3 lineage, should be considered as putatively important in the distinctive pathogenic features of M3 strains.</p>
            <p>In the case of <it>S. agalactiae</it>, the lineage that stood out from the rest with regard to levels of positive selection was COH1, which is serotype III, ST17, significantly associated with neonatal invasive disease <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, and hypothesized to have recently arisen from a bovine ancestor <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. The seven genes selected on this lineage include a number of loci either already implicated in virulence in other bacteria, or for which there is some reason to suspect them as candidate virulence loci (see Additional data file 5 for a full description of these genes). For example, <it>adenylosuccinate lyase</it>, discussed above with regard to <it>S. pyogenes</it>, was also positively selected in this lineage. <it>Phosphate acetyltransferase </it>was uniquely selected along this COH1 lineage, and has recently been implicated in virulence in <it>Salmonella enterica </it><abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Also selected were a protein involved in the cell envelope and two different ABC transporters. The cell envelope is a key component of virulence in <it>S. agalactiae </it><abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. ABC transporters have long been known as efflux mechanisms of drug resistance, although such efflux pumps are now also known to have physiological roles, conferring resistance to natural substances produced by the host, as well as possible roles in pathogenicity <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Perhaps the proposed recent shift in host preference from bovine to human for this lineage <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> was facilitated by molecular adaptation of ABC transporters that confer resistance to natural substances of the new host.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The research presented here employs a comparative genomics approach to define the core-genome of the genus <it>Streptococcus</it>, as well as those of <it>S. agalactiae</it>, <it>S. pyogenes</it>, and <it>S. thermophilus</it>. We then assess levels of recombination and positive selection in this core-genome for each of these taxonomic groups. Alongside these core-genome assessments, we estimated the pan-genome size of each of these groups, and levels of gene gain, loss and duplication on each of the lineages.</p>
         <p>The pan-genome size of <it>S. pyogenes </it>appears to be quite well estimated with the 11 sequences currently available, and is approximately 2,500 genes. The pan-genome size of <it>S. agalactiae </it>is less well estimated with available sequence data and is in excess of 2,800 genes. Similarly, the pan-genome size of the genus <it>Streptococcus </it>is not accurately estimated with the 26 genomes analyzed here, and is in excess of 5,300 genes. We suggest that the broader habitat range for <it>S. agalactiae </it>may provide a greater available gene pool for lateral gene transfer, and could explain the difference in pan-genome size of <it>S. agalactiae </it>and <it>S. pyogenes</it>.</p>
         <p>The core-genome components of each of these taxonomic groups are much better represented. Contrary to some earlier studies of other groups of bacteria, which suggested that such core-genomes may be relatively free of recombination, we estimate that around 18% of the core-genome of the genus <it>Streptococcus </it>is recombinant, and as much as 35% of the core-genome of <it>S. pyogenes</it>. The greater amount of recombination in <it>S. pyogenes </it>may be related to its more restricted habitat distribution, which would favor homologous recombination of core-genome components because the relative proportion of conspecific donor DNA is likely to be greater. Positive selection across the core-genome was particularly evident in the analysis of the different species within the genus <it>Streptococcus</it>, and it occurred disproportionately in certain lineages and biochemical categories. <it>S. suis </it>was the lineage that showed the greatest positive selection pressure, had the largest number of loci uniquely selected, and had the greatest amount of gene gain and loss. In addition to the lineage effect in the interspecific selection analysis, there was also an effect of biochemical role, with genes related to DNA metabolism and transcription showing a significantly higher number of genes under positive selection. In contrast to the interspecific analysis, the selection analysis within individual species provided much less evidence for positive selection, but suggested that particular lineages in each species had experienced more core-genome selection pressure than the others. In the case of <it>S. pyogenes </it>this was the lineage leading to the M3 serotype, and we suggest that the nine genes identified as positively selected should be considered as putatively important in the unique pathogenic features of M3 strains. 
In the case of <it>S. agalactiae </it>the lineage with disproportionate selection pressure was COH1, which is known to be significantly associated with neonatal invasive disease, and is hypothesized to have recently arisen from a bovine ancestor. We suggest that this proposed recent host jump from bovine to human could explain the greater amount of selection pressure observed in this genome.</p>
         <p>Overall, this study indicates that there has been considerable recombination and positive selection in the diversification of the <it>Streptococcus </it>core-genome, particularly at the interspecific level. Positive selection seems to be of principal importance in species differentiation and adaptation to new hosts, but plays a less important role during strain evolution, where the process may be too slow to facilitate rapid strain adaptation. Recombination, on the other hand, through either LGT or homologous intragenic recombination and involving both the core-genome and the pan-genome, appeared to be important at a variety of evolutionary time scales. It seems likely that recombination is a more efficient means of change, which would make it the more universal process of <it>Streptococcus </it>adaptation. Although cause-effect explanations are not clear for many of the genes that we identify as positively selected, positive selection on these genes nonetheless indicates that such loci have important functions, which in many instances may be integral to the unique adaptive features of each lineage. Several recent studies have harnessed modern molecular selection analyses to direct functional experimentation based on the resulting molecular evolutionary hypotheses (see, for example, <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>). We hope that the identification and cataloging of these loci (Additional data file 5), for this and other groups of bacteria, will serve as an evolutionary shortcut for others designing laboratory mutation experiments to assess the specific functional significance of these genes.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Ortholog retrieval</p>
            </st>
            <p>Twenty-six genomes of <it>Streptococcus</it>, representing six different species, were downloaded from GenBank (Table <tblr tid="T1">1</tblr>). Coding sequences were extracted from the GenBank files, and orthologs were determined using OrthoMCL <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. This program first performs an all-against-all BLASTp search, and then defines putative pairs of orthologs or recent paralogs based on reciprocal best BLAST hits. Recent paralogs are identified as genes within the same genome that are reciprocally more similar to each other than to any sequence from another genome. OrthoMCL then converts the reciprocal BLAST <it>p </it>values into a normalized similarity matrix that is analyzed by a Markov Cluster algorithm (MCL) <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The MCL then yields a set of clusters, each containing a set of orthologs and/or recent paralogs. OrthoMCL was run with a BLAST E-value cut-off of 1e-5 and an inflation parameter of 1.5. We used the OrthoMCL output to construct a table describing genome gene content (Additional data file 1). Genes that were not included in any cluster were considered taxon-specific only if they were at least 50 amino acids long and had no BLAST hit with any other protein (E-value &#8804; 1e-10). This filter was needed because preliminary analysis indicated that many truncated proteins found at the ends of contigs in the incomplete genomes, although showing clear evidence of homology, were not included in any cluster because they had weak or no reciprocal BLAST hits. The gene content table was used to plot Venn diagrams with R 2.2.1 <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> and to construct four core-genome data sets corresponding to the following taxa: genus <it>Streptococcus</it>, <it>S. agalactiae</it>, <it>S. pyogenes</it>, and <it>S. thermophilus</it>.</p>
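            <p>Once the clusters are available, deriving a core-genome and applying the taxon-specific filter are simple set operations. The sketch below assumes a hypothetical in-memory gene content table (cluster identifier mapped to per-genome copy counts, loosely mirroring Additional data file 1) and hypothetical genome labels; it does not reproduce OrthoMCL itself. Treating a cluster as core whenever it is present in every genome, regardless of copy number, is an assumption of the sketch.</p>

```python
# Hypothetical gene content table: cluster id -> {genome label: number of gene copies}.
GENE_CONTENT = {
    "C1": {"SAG_A": 1, "SAG_B": 1, "SPY_A": 1},
    "C2": {"SAG_A": 1, "SAG_B": 1},
    "C3": {"SPY_A": 2},
}


def core_clusters(content, genomes):
    """Clusters with at least one copy in every genome of the chosen taxon set."""
    return sorted(
        cluster
        for cluster, counts in content.items()
        if all(counts.get(g, 0) >= 1 for g in genomes)
    )


def is_taxon_specific(length_aa, best_other_evalue, min_length=50, evalue_cutoff=1e-10):
    """Filter for genes left outside every cluster (hypothetical helper).

    best_other_evalue is the smallest BLAST E-value against any other protein,
    or None when there was no hit at all.
    """
    if min_length > length_aa:
        return False  # likely a truncated fragment, e.g. at a contig end
    # Specific only if no hit reaches the E-value cut-off.
    return best_other_evalue is None or best_other_evalue > evalue_cutoff


print(core_clusters(GENE_CONTENT, ["SAG_A", "SAG_B", "SPY_A"]))  # ['C1']
print(core_clusters(GENE_CONTENT, ["SAG_A", "SAG_B"]))           # ['C1', 'C2']
print(is_taxon_specific(120, None), is_taxon_specific(30, None))  # True False
```
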
            <p>Gene loss, acquisition, and duplication were inferred on all branches of trees involving these taxa using the parsimony criterion with the DelTran option, as implemented in Paup 4.0b10 <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. Ancestral gene state reconstruction was run with two different step-matrices: the first treats gene gain, loss and duplication as equally likely (for example, <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>), while the second penalizes gene gain by assigning it twice the cost of loss and duplication (for example, <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>). The gene content table was also used to compute gene accumulation curves in R, which describe the number of new genes and of genes in common as additional genomes are included in the comparison. The procedure was repeated 100 times, each time with a randomly permuted genome insertion order, to obtain means and standard errors.</p>
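            <p>The accumulation-curve procedure can be sketched as follows, assuming each genome is represented as a set of cluster identifiers taken from the gene content table. The original analysis was done in R; this Python version, its function name, and the toy data are illustrative only. At each step the pan-genome is the union and the core-genome the intersection of the genomes added so far, and the 100 random insertion orders give a mean and a standard error of the mean for every step.</p>

```python
import math
import random
import statistics


def accumulation_curves(gene_sets, n_perm=100, seed=1):
    """Pan (union) and core (intersection) sizes as genomes are added in random order.

    gene_sets maps a genome label to the set of gene clusters it contains.
    Returns two lists of (mean, standard error), one entry per number of genomes.
    """
    genomes = list(gene_sets)
    k = len(genomes)
    pan_sizes = [[] for _ in range(k)]
    core_sizes = [[] for _ in range(k)]
    rng = random.Random(seed)
    for _ in range(n_perm):
        order = genomes[:]
        rng.shuffle(order)  # random genome insertion order
        pan, core = set(), None
        for i, genome in enumerate(order):
            pan.update(gene_sets[genome])
            if core is None:
                core = set(gene_sets[genome])
            else:
                core = core.intersection(gene_sets[genome])
            pan_sizes[i].append(len(pan))
            core_sizes[i].append(len(core))

    def summarize(rows):
        # Standard error of the mean across permutations (needs n_perm >= 2).
        return [(statistics.mean(r), statistics.stdev(r) / math.sqrt(len(r))) for r in rows]

    return summarize(pan_sizes), summarize(core_sizes)


# Hypothetical toy data.
sets = {"g1": {"a", "b", "c"}, "g2": {"a", "b", "d"}, "g3": {"a", "e"}}
pan_curve, core_curve = accumulation_curves(sets)
print(pan_curve[-1], core_curve[-1])  # with all 3 genomes: pan 5, core 1, SE 0
```
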
         </sec>
         <sec>
            <st>
               <p>Alignments</p>
            </st>
            <p>Orthologs were first aligned at the DNA level with ClustalW 1.82 <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. To ensure homology, alignments containing fewer than 35% conserved sites for the <it>Streptococcus </it>genus data set, or fewer than 50% for the <it>Streptococcus </it>species data sets, were discarded. A preliminary analysis of the data revealed that many sequences identified as under positive selection contained frameshifts (a single insertion or deletion that disrupts the reading frame), resulting in high non-synonymous substitution rates. Unfortunately, sequencing errors cannot be accurately distinguished from actual insertions or deletions; however, most of these frameshifts were found within the unclosed genomes and appeared at the beginning or end of contigs, where sequencing errors are more probable. As described by Perrodou <it>et al</it>. <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>, most of these frameshifts are probably sequencing errors, although some may not be <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. We chose the conservative approach of removing all codons before or after the frameshift when it was located at the beginning or end of the coding sequence, respectively. For that purpose, Perl scripts were developed to find frameshifts in the DNA alignments, and the sequences were then edited manually. A second alignment step refined all alignments by translating sequences to amino acids, aligning them with ClustalW, and back-translating them to DNA using the script transAlign <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. Finally, amino acid alignments with a low percentage of conserved sites were manually inspected and removed if the alignment was ambiguous.</p>
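            <p>The two alignment checks described above can be sketched in a few lines. The definition of a conserved site is not given here, so the sketch assumes an ungapped column with the same nucleotide in every sequence; likewise, flagging gap runs whose length is not a multiple of three is only one plausible heuristic for what the original Perl scripts detected. Function names and the toy alignment are hypothetical.</p>

```python
import re


def conserved_fraction(alignment):
    """Fraction of alignment columns that are ungapped and identical in all sequences.

    alignment maps sequence names to aligned DNA strings of equal length.
    """
    seqs = list(alignment.values())
    n_columns = len(seqs[0])
    conserved = 0
    for i in range(n_columns):
        column = {seq[i] for seq in seqs}
        if len(column) == 1 and "-" not in column:
            conserved += 1
    return conserved / n_columns


def frameshift_gaps(aligned_seq):
    """(start, end) of gap runs whose length is not a multiple of three."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"-+", aligned_seq)
        if (m.end() - m.start()) % 3 != 0
    ]


aln = {"s1": "ATGAAACCC", "s2": "ATGAA-CCC", "s3": "ATGAAACCG"}
frac = conserved_fraction(aln)
print(round(frac, 3), frac >= 0.35)   # 0.778 True (kept at the genus threshold)
print(frameshift_gaps(aln["s2"]))     # [(5, 6)]
print(frameshift_gaps("ATG---CCC"))   # [] (an in-frame codon deletion)
```
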
         </sec>
         <sec>
            <st>
               <p>Recombination detection</p>
            </st>
            <p>Both phylogenetic and substitution pattern methods were used to detect recombination events. The phylogenetic methods rely on the examination of phylogenetic congruence among genes (for example, <abbrgrp><abbr bid="B62">62</abbr><abbr bid="B63">63</abbr></abbrgrp>). If a gene has been laterally transferred, the phylogenetic relationships it depicts are expected to differ from the species tree. We first applied the method suggested by Susko <it>et al</it>. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, which tests for the rejection of a set of topologies by a set of orthologous genes using the AU test <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>. When possible (that is, when the number of taxa is &#8804; 7), the trees tested are all possible unrooted trees (for example, <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>). When a gene rejects a tree that is supported by the majority of the other genes, this gene is considered to have been laterally transferred. We applied this approach to our data sets (except for <it>S. thermophilus</it>, which contains only three taxa), using as tested topologies the individual gene trees obtained by phyML (general time reversible (GTR) +&#915;4+I model of evolution with a BIONJ starting tree) <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>, together with the tree reconstructed with Paup from the concatenation of all genes (GTR+&#915;4+I model of evolution, neighbor-joining (NJ) starting tree, and a tree-bisection-reconnection (TBR) branch-swapping algorithm). The site likelihoods of each tree were then computed by the program baseml (PAML package) <abbrgrp><abbr bid="B67">67</abbr></abbrgrp> using a GTR+&#915;4 model of evolution. The AU test was then applied using Consel <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>.</p>
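            <p>The cut-off of seven taxa reflects how quickly the number of candidate topologies grows: for n taxa there are (2n-5)!! distinct unrooted, fully resolved trees. The short calculation below is a standard combinatorial result rather than part of the original pipeline, and the function name is hypothetical; it shows that exhaustive testing involves 945 trees at seven taxa but more than ten thousand at eight.</p>

```python
def n_unrooted_trees(n_taxa):
    """Number of unrooted, fully resolved trees on n labelled taxa: (2n - 5)!!."""
    if 3 > n_taxa:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


for n in range(4, 10):
    print(n, n_unrooted_trees(n))
# prints: 4 3, 5 15, 6 105, 7 945, 8 10395, 9 135135 (one pair per line)
```
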
            <p>As suggested by Susko <it>et al</it>. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, results (<it>p </it>values for the rejection of each tree) were plotted as heatmaps with R. This approach has the disadvantage of relying on a test (the AU test) that assesses the rejection of a tree by a gene, not its acceptance <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. As a result, it is not possible to tell whether a tree is not rejected because it is not significantly different or simply because the gene is unresolved. This situation is particularly expected with weakly divergent alignments and, at the opposite end of the spectrum, with saturated alignments. We therefore developed and performed a second set of analyses to complement the Susko approach, intended to quantify the amount of supported and incongruent phylogenetic signal between two gene trees. This approach relies on the discovery of well supported conflicting bipartitions (that is, branches that cannot occur in the same tree), as measured by non-parametric bootstrap analysis <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>, thus revealing incongruence between gene histories. Support for each bipartition was obtained by bootstrapping a maximum likelihood (ML) tree search using Paup (GTR+&#915;4+I model of evolution). Custom-made scripts were then used to find and count well supported (&#8805;90% bootstrap support) conflicting bipartitions between gene trees. In addition to phylogenetic recombination detection, we employed methods specifically developed to detect homologous intragenic recombination. We used the compatibility approach between site histories, based on the pairwise homoplasy index (PHI), developed by Bruen <it>et al</it>. <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> and implemented in the program PhiPack <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. Bruen <it>et al</it>. have suggested that PHI is more robust and sensitive than many earlier approaches. 
In addition to the PHI statistic, for comparative purposes, we computed MaxChi <abbrgrp><abbr bid="B72">72</abbr></abbrgrp> and NSS <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> <it>p </it>values for recombination using PhiPack (with 1,000 permutations).</p>
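            <p>The bipartition comparison can be sketched as a pairwise compatibility check: two splits of the same taxon set can occur in one tree only if one of the four intersections between their sides is empty. In this hypothetical sketch, each gene tree is reduced to a mapping from bipartitions to bootstrap support, and a conflict is counted only when both splits reach the 90% threshold. How the original custom scripts tallied conflicts is not described, so the pair-counting rule and the function names are assumptions.</p>

```python
from itertools import product


def compatible(split_a, split_b, taxa):
    """True if two bipartitions can coexist in one tree (some pair of sides is disjoint)."""
    a, b = frozenset(split_a), frozenset(split_b)
    sides_a = (a, taxa - a)
    sides_b = (b, taxa - b)
    return any(not x.intersection(y) for x, y in product(sides_a, sides_b))


def count_supported_conflicts(splits_1, splits_2, taxa, min_support=90.0):
    """Count pairs of well-supported bipartitions (one per gene tree) that conflict.

    splits_1 and splits_2 map a bipartition (one side, as a frozenset) to its
    bootstrap support in percent.
    """
    strong_1 = [s for s, support in splits_1.items() if support >= min_support]
    strong_2 = [s for s, support in splits_2.items() if support >= min_support]
    return sum(1 for s1 in strong_1 for s2 in strong_2 if not compatible(s1, s2, taxa))


taxa = frozenset("ABCDE")
gene_1 = {frozenset("AB"): 95.0, frozenset("ABC"): 92.0}
gene_2 = {frozenset("AC"): 98.0, frozenset("DE"): 99.0, frozenset("BC"): 60.0}
print(count_supported_conflicts(gene_1, gene_2, taxa))                    # 1
print(count_supported_conflicts(gene_1, gene_2, taxa, min_support=50.0))  # 2
```
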
         </sec>
         <sec>
            <st>
               <p>Positive selection analysis</p>
            </st>
            <p>We employed the branch-site test of Yang and Nielsen <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B74">74</abbr></abbrgrp>, implemented in the program Codeml of the PAML package <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, to assess positive selection at particular sites and lineages. Briefly, the likelihood of a model that does not allow positive selection is compared with that of a model allowing positive selection on some specified lineages. The model allowing positive selection is tested using a likelihood ratio test (LRT) whose statistic is compared to a &#967;<sup>2 </sup>distribution with two degrees of freedom. Likelihoods were estimated on the gene or species trees. For the <it>Streptococcus </it>data set, each lineage leading to the six different species was tested. For the species analyses, each of the lineages corresponding to the different strains was tested, as well as several internal branches supported by the majority of genes. Finally, <it>p </it>values were corrected for multiple hypothesis testing using the Benjamini and Yekutieli method <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>. The effects of lineage, of the genes' TIGR main role categories, and of their interaction were tested using an analysis of variance on the LRT, using R. Role categories containing fewer than ten genes were merged. If the F-statistic was significant, Tukey's 'honest significant difference' multiple-comparison method was used to discriminate lineages and role categories associated with different LRTs.</p>
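            <p>The p value calculation and multiple-testing correction described above are straightforward once the two log-likelihoods of each test have been read from the Codeml output. The sketch below assumes those values are already parsed, uses hypothetical function names, and follows the two-degree-of-freedom comparison stated here: for 2 degrees of freedom the &#967;<sup>2 </sup>survival function reduces to exp(-x/2), so no statistics library is needed. The Benjamini-Yekutieli step is the standard Benjamini-Hochberg adjustment scaled by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.</p>

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """p value of LRT = 2 (lnL_alt - lnL_null) against a chi-square with 2 df.

    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.exp(-stat / 2.0)


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p values (FDR control under arbitrary dependence)."""
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Walk from the largest p value down, keeping adjusted values monotone and capped at 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


print(lrt_pvalue_df2(-1000.0, -997.0))          # about 0.0498 for LRT = 6
print(benjamini_yekutieli([0.01, 0.04, 0.03]))  # about [0.055, 0.0733, 0.0733]
```
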
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a gene content table that describes the presence and absence of gene clusters per genome. Additional data file <supplr sid="S2">2</supplr> is a CSV text file listing the composition of the clusters, with links to GenBank protein accession numbers. Additional data file <supplr sid="S3">3</supplr> contains several tables detailing the gene gain, loss and duplication parsimony reconstruction for the <it>Streptococcus</it>, <it>S. pyogenes </it>and <it>S. agalactiae </it>pan-genomes. Results are presented for equal weights as well as following a model that penalized gene gain with respect to gene loss. Additional data file <supplr sid="S4">4</supplr> is a table listing the genes showing evidence of recombination and positive selection. Additional data file <supplr sid="S5">5</supplr> is a table listing the genes and lineages under positive selection for the four analyzed core-genomes, with gene annotation from NCBI and TIGR.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Gene content table describing the presence and absence of gene clusters per genome</p>
            </caption>
            <text>
               <p>Gene content table describing the presence and absence of gene clusters per genome</p>
            </text>
            <file name="gb-2007-8-5-r71-S1.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Composition of the clusters, with links to GenBank protein accession numbers</p>
            </caption>
            <text>
               <p>Composition of the clusters, with links to GenBank protein accession numbers</p>
            </text>
            <file name="gb-2007-8-5-r71-S2.csv">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Gene gain, loss and duplication parsimony reconstruction for the <it>Streptococcus</it>, <it>S. pyogenes </it>and <it>S. agalactiae </it>pan-genomes</p>
            </caption>
            <text>
                <p>Results are presented both for equal weights and for a model that penalizes gene gain relative to gene loss</p>
            </text>
            <file name="gb-2007-8-5-r71-S3.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>Genes showing evidence of recombination and positive selection</p>
            </caption>
            <text>
               <p>Genes showing evidence of recombination and positive selection</p>
            </text>
            <file name="gb-2007-8-5-r71-S4.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Genes and lineages under positive selection for the four analyzed core-genomes, with gene annotation from NCBI and TIGR</p>
            </caption>
            <text>
               <p>Genes and lineages under positive selection for the four analyzed core-genomes, with gene annotation from NCBI and TIGR</p>
            </text>
            <file name="gb-2007-8-5-r71-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Qi Sun and Robert Bukowsky for their help with the parallelization of the analyses on a Windows cluster at the Computational Biology Service Unit of Cornell University, Adam Siepel for helpful comments regarding the frameshift problem, and Paulina Pavinski Bitar for help regarding the manual frameshift check. This work was supported by Cornell University start-up funds and USDA grant 2006-35204-17422, awarded to MJS.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The evolutionary genomics of pathogen recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>Awadalla</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>50</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12509753</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Bessen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hood</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Kalia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>CE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>182</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">14565</pubid>
                  <pubid idtype="pmpid" link="fulltext">11136255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The relative contributions of recombination and point mutation to the diversification of bacterial clones.</p>
            </title>
            <aug>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Hanage</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>4</volume>
            <fpage>602</fpage>
            <lpage>606</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11587939</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The fate of laterally transferred genes: life in the fast lane to adaptation or death.</p>
            </title>
            <aug>
               <au>
                  <snm>Hao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>636</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457040</pubid>
                  <pubid idtype="pmpid" link="fulltext">16651664</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Examining bacterial species under the specter of gene transfer and exchange.</p>
            </title>
            <aug>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <issue>Suppl 1</issue>
            <fpage>6595</fpage>
            <lpage>6599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1131874</pubid>
                  <pubid idtype="pmpid" link="fulltext">15851673</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The rate of adaptive evolution in enteric bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Charlesworth</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Eyre-Walker</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>1348</fpage>
            <lpage>1356</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16621913</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Identification of genes subject to positive selection in uropathogenic strains of <it>Escherichia coli</it>: a comparative genomics approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Hung</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reigstad</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Magrini</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sabo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Blasiar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bieri</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Ozersky</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>5977</fpage>
            <lpage>5982</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1424661</pubid>
                  <pubid idtype="pmpid" link="fulltext">16585510</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>PANDAS: current status and directions for research.</p>
            </title>
            <aug>
               <au>
                  <snm>Snider</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Swedo</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Mol Psychiatry</source>
            <pubdate>2004</pubdate>
            <volume>9</volume>
            <fpage>900</fpage>
            <lpage>907</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15241433</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
                <p>Genome sequence of <it>Streptococcus agalactiae</it>, a pathogen causing invasive neonatal disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Glaser</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rusniok</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Buchrieser</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chevalier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Frangeul</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Msadek</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zouine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Couve</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lalioui</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Poyart</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>45</volume>
            <fpage>1499</fpage>
            <lpage>1513</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12354221</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Group B streptococcal disease in nonpregnant adults.</p>
            </title>
            <aug>
               <au>
                  <snm>Farley</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Clin Infect Dis</source>
            <pubdate>2001</pubdate>
            <volume>33</volume>
            <fpage>556</fpage>
            <lpage>561</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11462195</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
                <p>A serological differentiation of specific types of bovine hemolytic streptococci (group B).</p>
            </title>
            <aug>
               <au>
                  <snm>Lancefield</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>J Exp Med</source>
            <pubdate>1934</pubdate>
            <volume>59</volume>
            <fpage>441</fpage>
            <lpage>458</lpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Report from a WHO Working Group: standard method for detecting upper respiratory carriage of <it>Streptococcus pneumoniae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Nohynek</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Pediatr Infect Dis J</source>
            <pubdate>2003</pubdate>
            <volume>22</volume>
            <fpage>1</fpage>
            <lpage>11</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12544401</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Role of <it>Streptococcus mutans </it>in human dental decay.</p>
            </title>
            <aug>
               <au>
                  <snm>Loesche</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Microbiol Rev</source>
            <pubdate>1986</pubdate>
            <volume>50</volume>
            <fpage>353</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373078</pubid>
                  <pubid idtype="pmpid" link="fulltext">3540569</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
                <p>Selection of virulence-associated determinants of <it>Streptococcus suis </it>serotype 2 by in vivo complementation.</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>HE</fnm>
               </au>
               <au>
                  <snm>Buijs</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wisselink</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Stockhofe-Zurwieden</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Smits</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2001</pubdate>
            <volume>69</volume>
            <fpage>1961</fpage>
            <lpage>1966</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98113</pubid>
                  <pubid idtype="pmpid" link="fulltext">11179384</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Streptococcal toxic shock syndrome caused by <it>Streptococcus suis </it>serotype 2.</p>
            </title>
            <aug>
               <au>
                  <snm>Tang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Med</source>
            <pubdate>2006</pubdate>
            <volume>3</volume>
            <fpage>e151</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1434494</pubid>
                  <pubid idtype="pmpid" link="fulltext">16584289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Gene gain and gene loss in <it>Streptococcus</it>: is it driven by habitat?</p>
            </title>
            <aug>
               <au>
                  <snm>Marri</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                   <fnm>GB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>2379</fpage>
            <lpage>2391</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16966682</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1998</pubdate>
            <volume>148</volume>
            <fpage>929</fpage>
            <lpage>936</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460041</pubid>
                  <pubid idtype="pmpid" link="fulltext">9539414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>32</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666704</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Codon-substitution models for heterogeneous selection pressure at amino acid sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2000</pubdate>
            <volume>155</volume>
            <fpage>431</fpage>
            <lpage>449</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1461088</pubid>
                  <pubid idtype="pmpid" link="fulltext">10790415</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>908</fpage>
            <lpage>917</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12032247</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
                <p>Bayes empirical Bayes inference of amino acid sites under positive selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WSW</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1107</fpage>
            <lpage>1118</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15689528</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Genome analysis of multiple pathogenic isolates of <it>Streptococcus agalactiae</it>: implications for the microbial "pan-genome".</p>
            </title>
            <aug>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Masignani</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cieslewicz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donati</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Medini</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Angiuoli</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Crabtree</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>13950</fpage>
            <lpage>13955</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1216834</pubid>
                  <pubid idtype="pmpid" link="fulltext">16172379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Patterns of bacterial gene movement.</p>
            </title>
            <aug>
               <au>
                  <snm>Hao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>1294</fpage>
            <lpage>1307</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15115802</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Comparative genomics of the lactic acid bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Makarova</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Slesarev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sorokin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mirkin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pavlov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pavlova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Karamychev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Polouchine</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>15611</fpage>
            <lpage>15616</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1622870</pubid>
                  <pubid idtype="pmpid" link="fulltext">17030793</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Accuracy and power of Bayes prediction of amino acid sites under positive selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Anisimova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bielawski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>950</fpage>
            <lpage>958</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12032251</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Computational inference of scenarios for alpha-proteobacterial genome evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Boussau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Karlberg</snm>
                  <fnm>EO</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Legault</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Andersson</snm>
                  <fnm>SGE</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>9722</fpage>
            <lpage>9727</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">470742</pubid>
                  <pubid idtype="pmpid" link="fulltext">15210995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Genomes in flux: the evolution of archaeal and proteobacterial gene content.</p>
            </title>
            <aug>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>17</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11779827</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The balance of driving forces during genome evolution in prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kunin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1589</fpage>
            <lpage>1594</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403731</pubid>
                  <pubid idtype="pmpid" link="fulltext">12840037</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A probabilistic model for gene content evolution with duplication, loss and horizontal transfer.</p>
            </title>
            <aug>
               <au>
                  <snm>Csur&#246;s</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mikl&#243;s</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Research in Computational Molecular Biology</source>
            <publisher>Berlin: Springer</publisher>
            <editor>Apostolico A, Guerra C, Istrail S, Pevzner P, Waterman M</editor>
            <pubdate>2006</pubdate>
            <fpage>206</fpage>
            <lpage>220</lpage>
             <note>[Kanade T, Kittler J, Kleinberg JM, Mattern F, Mitchell JC, Nierstrasz O, Naor M, Rangan CP, Steffen B, Sudan M, <it>et al</it>. (Series Editors): <it>Lecture Notes in Computer Science</it>, vol. 3909]</note>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Genomic diversity and evolution within the species <it>Streptococcus agalactiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Brochet</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Couv&#233;</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Zouine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vallaeys</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rusniok</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lamy</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Buchrieser</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Trieu-Cuot</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kunst</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Poyart</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Microbes Infect</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <fpage>1227</fpage>
            <lpage>1243</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16529966</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Phylogenetics and the cohesion of bacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>301</volume>
            <fpage>829</fpage>
            <lpage>832</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12907801</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <fpage>E19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193605</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975657</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Evolutionary origins of genomic repertoires in bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e130</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1073693</pubid>
                  <pubid idtype="pmpid" link="fulltext">15799709</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Visualizing and assessing phylogenetic congruence of core gene sets: a case study of the gamma-proteobacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Susko</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Leigh</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Bapteste</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>1019</fpage>
            <lpage>1030</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16495350</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Phylogenetic relationships and genotyping of the genus <it>Streptococcus </it>by sequence determination of the RNase P RNA gene, rnpB.</p>
            </title>
            <aug>
               <au>
                  <snm>T&#228;pp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Thollesson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Herrmann</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Int J Syst Evol Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>53</volume>
            <fpage>1861</fpage>
            <lpage>1871</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14657115</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Meningitis caused by <it>Streptococcus suis </it>in humans.</p>
            </title>
            <aug>
               <au>
                  <snm>Arends</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Zanen</snm>
                  <fnm>HC</fnm>
               </au>
            </aug>
            <source>Rev Infect Dis</source>
            <pubdate>1988</pubdate>
            <volume>10</volume>
            <fpage>131</fpage>
            <lpage>137</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3353625</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Anisimova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2003</pubdate>
            <volume>164</volume>
            <fpage>1229</fpage>
            <lpage>1236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1462615</pubid>
                  <pubid idtype="pmpid" link="fulltext">12871927</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Invasive group A streptococcal infections in Ontario, Canada. Ontario Group A Streptococcal Study Group.</p>
            </title>
            <aug>
               <au>
                  <snm>Davies</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>McGeer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Cann</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Simor</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Low</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>N Engl J Med</source>
            <pubdate>1996</pubdate>
            <volume>335</volume>
            <fpage>547</fpage>
            <lpage>554</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8684408</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Population-based surveillance for group A streptococcal necrotizing fasciitis: Clinical features, prognostic indicators, and microbiologic analysis of seventy-seven cases. Ontario Group A Streptococcal Study.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaul</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>McGeer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Low</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Am J Med</source>
            <pubdate>1997</pubdate>
            <volume>103</volume>
            <fpage>18</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9236481</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Array of M protein gene subtypes in 1064 recent invasive group A <it>streptococcus </it>isolates recovered from the active bacterial core surveillance.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Sakota</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Franklin</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Beall</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Infect Dis</source>
            <pubdate>2003</pubdate>
            <volume>188</volume>
            <fpage>1587</fpage>
            <lpage>1592</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14624386</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Epidemiology of invasive group A <it>Streptococcus </it>disease in the United States, 1995-1999.</p>
            </title>
            <aug>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Beall</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Cieslak</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Reingold</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Farley</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Danila</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zell</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Facklam</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>Clin Infect Dis</source>
            <pubdate>2002</pubdate>
            <volume>35</volume>
            <fpage>268</fpage>
            <lpage>276</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12115092</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
                <p>Severe group A streptococcal soft-tissue infections in Ontario: 1992-1996.</p>
            </title>
            <aug>
               <au>
                  <snm>Sharkawy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Low</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Saginur</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gregson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Jessamine</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>McGeer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Clin Infect Dis</source>
            <pubdate>2002</pubdate>
            <volume>34</volume>
            <fpage>454</fpage>
            <lpage>460</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11797171</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>OrthoMCL: identification of ortholog groups for eukaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stoeckert</snm>
                  <fnm>CJJ</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>DS</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2178</fpage>
            <lpage>2189</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403725</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952885</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Identification of virulence determinants for endocarditis in <it>Streptococcus sanguinis </it>by signature-tagged mutagenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Paik</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Senty</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Noe</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Munro</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Kitten</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2005</pubdate>
            <volume>73</volume>
            <fpage>6064</fpage>
            <lpage>6074</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1231064</pubid>
                  <pubid idtype="pmpid" link="fulltext">16113327</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Expression of the <it>Staphylococcus aureus </it>UDP-N-acetylmuramoyl-L-alanyl-D-glutamate:L-lysine ligase in <it>Escherichia coli </it>and effects on peptidoglycan biosynthesis and cell growth.</p>
            </title>
            <aug>
               <au>
                  <snm>Mengin-Lecreulx</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Falla</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Blanot</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>van Heijenoort</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Chopra</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1999</pubdate>
            <volume>181</volume>
            <fpage>5909</fpage>
            <lpage>5914</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">103616</pubid>
                  <pubid idtype="pmpid" link="fulltext">10498701</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Molecular genetic anatomy of inter- and intraserotype variation in the human bacterial pathogen group A <it>Streptococcus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Beres</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Richter</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Nagiec</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Sumby</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>DeLeo</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Musser</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>7059</fpage>
            <lpage>7064</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1459018</pubid>
                  <pubid idtype="pmpid" link="fulltext">16636287</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Multilocus sequence typing of Swedish invasive group B <it>streptococcus </it>isolates indicates a neonatally associated genetic lineage and capsule switching.</p>
            </title>
            <aug>
               <au>
                  <snm>Luan</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Granlund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sellin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lagergard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Norgren</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>43</volume>
            <fpage>3727</fpage>
            <lpage>3733</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1233917</pubid>
                  <pubid idtype="pmpid" link="fulltext">16081902</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Hyperinvasive neonatal group B <it>streptococcus </it>has arisen from a bovine ancestor.</p>
            </title>
            <aug>
               <au>
                  <snm>Bisharat</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Crook</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Leigh</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Harding</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Coffey</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Maiden</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Peto</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>42</volume>
            <fpage>2161</fpage>
            <lpage>2167</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">404684</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131184</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Mutation of phosphotransacetylase but not isocitrate lyase reduces the virulence of <it>Salmonella enterica </it>serovar Typhimurium in mice.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>YR</fnm>
               </au>
               <au>
                  <snm>Brinsmade</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Escalante-Semerena</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fierer</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2006</pubdate>
            <volume>74</volume>
            <fpage>2498</fpage>
            <lpage>2502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1418904</pubid>
                  <pubid idtype="pmpid" link="fulltext">16552088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Surface proteins of <it>Streptococcus agalactiae </it>and related proteins in other bacterial pathogens.</p>
            </title>
            <aug>
               <au>
                  <snm>Lindahl</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Stalhammar-Carlemalm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Areschoug</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Clin Microbiol Rev</source>
            <pubdate>2005</pubdate>
            <volume>18</volume>
            <fpage>102</fpage>
            <lpage>127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">544178</pubid>
                  <pubid idtype="pmpid" link="fulltext">15653821</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Multidrug-resistance efflux pumps - not just for resistance.</p>
            </title>
            <aug>
               <au>
                  <snm>Piddock</snm>
                  <fnm>LJV</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>629</fpage>
            <lpage>636</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16845433</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Darwinian adaptation of proteorhodopsin to different light intensities in the marine environment.</p>
            </title>
            <aug>
               <au>
                  <snm>Bielawski</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Sabehi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Beja</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>14824</fpage>
            <lpage>14829</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522022</pubid>
                  <pubid idtype="pmpid" link="fulltext">15466697</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Identification of residues in glutathione transferase capable of driving functional diversification in evolution. A novel approach to protein redesign.</p>
            </title>
            <aug>
               <au>
                  <snm>Ivarsson</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mackey</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Edalat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Mannervik</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <fpage>8733</fpage>
            <lpage>8738</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12486119</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Positive selection of primate TRIM5alpha identifies a critical species-specific retroviral restriction domain.</p>
            </title>
            <aug>
               <au>
                  <snm>Sawyer</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Emerman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>2832</fpage>
            <lpage>2837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549489</pubid>
                  <pubid idtype="pmpid" link="fulltext">15689398</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Graph clustering by flow simulation.</p>
            </title>
            <aug>
               <au>
                   <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>University of Utrecht, The Netherlands</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B56">
            <title>
               <p>The R Project for Statistical Computing</p>
            </title>
            <url>http://www.r-project.org/</url>
         </bibl>
         <bibl id="B57">
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4</source>
            <publisher>Sunderland, Massachusetts: Sinauer Associates</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B58">
            <title>
                <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
         </bibl>
         <bibl id="B59">
            <title>
               <p>ICDS database: interrupted CoDing sequences in prokaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Perrodou</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Deshayes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schaeffer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>VanDorsselaer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ripp</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Reyrat</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Lecompte</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Database issue</issue>
            <fpage>D338</fpage>
            <lpage>D343</lpage>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Detecting and analyzing DNA sequencing errors: toward a higher quality of the <it>Bacillus subtilis </it>genome sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Medigue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Viari</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>1116</fpage>
            <lpage>1127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310837</pubid>
                  <pubid idtype="pmpid" link="fulltext">10568751</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Bininda-Emonds</snm>
                  <fnm>ORP</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>156</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1175081</pubid>
                  <pubid idtype="pmpid" link="fulltext">15969769</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>The influence of recombination on the population structure and evolution of the human pathogen <it>Neisseria meningitidis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Urwin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Maiden</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1999</pubdate>
            <volume>16</volume>
            <fpage>741</fpage>
            <lpage>749</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10368953</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Bessen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hood</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Kalia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>CE</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>182</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">14565</pubid>
                  <pubid idtype="pmpid" link="fulltext">11136255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>An approximately unbiased test of phylogenetic tree selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Shimodaira</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2002</pubdate>
            <volume>51</volume>
            <fpage>492</fpage>
            <lpage>508</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12079646</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>A selective barrier to horizontal gene transfer in the T4-type bacteriophages that has preserved a core genome with the viral replication and structural genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Fil&#233;e</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bapteste</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Susko</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Krisch</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>1688</fpage>
            <lpage>1696</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16782763</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Guindon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gascuel</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>696</fpage>
            <lpage>704</lpage>
            <xrefbib>
               <pubid idtype="pmpid">14530136</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>CONSEL: for assessing the confidence of phylogenetic tree selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Shimodaira</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>1246</fpage>
            <lpage>1247</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11751242</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Confidence limits on phylogenies: an approach using the bootstrap.</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Evolution Int J Org Evolution</source>
            <pubdate>1985</pubdate>
            <volume>39</volume>
            <fpage>783</fpage>
            <lpage>791</lpage>
         </bibl>
         <bibl id="B70">
            <title>
               <p>A simple and robust statistical test for detecting the presence of recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>Bruen</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Philippe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2006</pubdate>
            <volume>172</volume>
            <fpage>2665</fpage>
            <lpage>2681</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1456386</pubid>
                  <pubid idtype="pmpid" link="fulltext">16489234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Trevor Bruen's homepage</p>
            </title>
            <url>http://www.mcb.mcgill.ca/~trevor/</url>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Analyzing the mosaic structure of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>MaynardSmith</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1992</pubdate>
            <volume>34</volume>
            <fpage>126</fpage>
            <lpage>129</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1556748</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Jakobsen</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Easteal</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>291</fpage>
            <lpage>295</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8902355</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>2472</fpage>
            <lpage>2479</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16107592</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>The control of the false discovery rate in multiple testing under dependency.</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Ann Stat</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>1165</fpage>
            <lpage>1188</lpage>
         </bibl>
         <bibl id="B76">
            <title>
               <p>Complete genome sequence of an M1 strain of <it>Streptococcus pyogenes</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ferretti</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>McShan</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Ajdic</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Savic</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Savic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lyon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Primeaux</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sezate</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suvorov</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Kenton</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>4658</fpage>
            <lpage>4663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">31890</pubid>
                  <pubid idtype="pmpid" link="fulltext">11296296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Evolutionary origin and emergence of a highly successful clone of serotype M1 group A <it>Streptococcus </it>involved multiple horizontal gene transfer events.</p>
            </title>
            <aug>
               <au>
                  <snm>Sumby</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madrigal</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Barbian</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Virtaneva</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ricklefs</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Sturdevant</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Vuopio-Varkila</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hoe</snm>
                  <fnm>NP</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Infect Dis</source>
            <pubdate>2005</pubdate>
            <volume>192</volume>
            <fpage>771</fpage>
            <lpage>782</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16088826</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Genome sequence and comparative microarray analysis of serotype M18 group A <it>Streptococcus </it>strains associated with acute rheumatic fever outbreaks.</p>
            </title>
            <aug>
               <au>
                  <snm>Smoot</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Barbian</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>VanGompel</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Smoot</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Chaussee</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Sylva</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Sturdevant</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Ricklefs</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Parkins</snm>
                  <fnm>LD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>4668</fpage>
            <lpage>4673</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">123705</pubid>
                  <pubid idtype="pmpid" link="fulltext">11917108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Genome sequence of a serotype M3 strain of group A <it>Streptococcus</it>: phage-encoded toxins, the high-virulence phenotype, and clone emergence.</p>
            </title>
            <aug>
               <au>
                  <snm>Beres</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Sylva</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Barbian</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Lei</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hoff</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Mammarella</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Smoot</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Parkins</snm>
                  <fnm>LD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>10078</fpage>
            <lpage>10083</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">126627</pubid>
                  <pubid idtype="pmpid" link="fulltext">12122206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>Genome sequence of a serotype M28 strain of group A <it>Streptococcus</it>: potential new insights into puerperal sepsis and bacterial disease specificity.</p>
            </title>
            <aug>
               <au>
                  <snm>Green</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Nagiec</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Barbian</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Beres</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>LeFebvre</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Musser</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Infect Dis</source>
            <pubdate>2005</pubdate>
            <volume>192</volume>
            <fpage>760</fpage>
            <lpage>770</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16088825</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>Genome sequence of an M3 strain of <it>Streptococcus pyogenes </it>reveals a large-scale genomic rearrangement in invasive strains and new insights into phage evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Nakagawa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kurokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nakata</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tomiyasu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Okahashi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kawabata</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yamazaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yasunaga</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1042</fpage>
            <lpage>1055</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403657</pubid>
                  <pubid idtype="pmpid" link="fulltext">12799345</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>Progress toward characterization of the group A <it>Streptococcus </it>metagenome: complete genome sequence of a macrolide-resistant serotype M6 strain.</p>
            </title>
            <aug>
               <au>
                  <snm>Banks</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Porcella</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Barbian</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Beres</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Philips</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Voyich</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>DeLeo</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Somerville</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Musser</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Infect Dis</source>
            <pubdate>2004</pubdate>
            <volume>190</volume>
            <fpage>727</fpage>
            <lpage>738</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15272401</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Genome of the bacterium <it>Streptococcus pneumoniae </it>strain R6.</p>
            </title>
            <aug>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alborn</snm>
                  <fnm>WEJ</fnm>
               </au>
               <au>
                  <snm>Arnold</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blaszczak</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Burgett</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>DeHoff</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Estrem</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Fritz</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Fuller</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2001</pubdate>
            <volume>183</volume>
            <fpage>5709</fpage>
            <lpage>5717</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">95463</pubid>
                  <pubid idtype="pmpid" link="fulltext">11544234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>Complete genome sequence of a virulent isolate of <it>Streptococcus pneumoniae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DeBoy</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <fpage>498</fpage>
            <lpage>506</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11463916</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>Genome sequence of <it>Streptococcus mutans </it>UA159, a cariogenic dental pathogen.</p>
            </title>
            <aug>
               <au>
                  <snm>Ajdic</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McShan</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>McLaughlin</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Savic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carson</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Primeaux</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kenton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jia</snm>
                  <fnm>H</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>14434</fpage>
            <lpage>14439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137901</pubid>
                  <pubid idtype="pmpid" link="fulltext">12397186</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p>Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V <it>Streptococcus agalactiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Masignani</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cieslewicz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wessels</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Margarit</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>12391</fpage>
            <lpage>12396</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">129455</pubid>
                  <pubid idtype="pmpid" link="fulltext">12200547</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B87">
            <title>
               <p>Complete sequence and comparative genome analysis of the dairy bacterium <it>Streptococcus thermophilus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bolotin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Quinquis</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Renault</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sorokin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Kulakauskas</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapidus</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goltsman</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mazur</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pusch</snm>
                  <fnm>GD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2004</pubdate>
            <volume>22</volume>
            <fpage>1554</fpage>
            <lpage>1558</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15543133</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
