<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2010-11-4-r45</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Small variable segments constitute a major type of diversity of bacterial genomes at the species level</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Touzain</snm>
               <fnm>Fabrice</fnm>
               <insr iid="I1"/>
               <email>fabrice.touzain@jouy.inra.fr</email>
            </au>
            <au id="A2">
               <snm>Denamur</snm>
               <fnm>Erick</fnm>
               <insr iid="I2"/>
               <email>erick.denamur@inserm.fr</email>
            </au>
            <au id="A3">
               <snm>M&#233;digue</snm>
               <fnm>Claudine</fnm>
               <insr iid="I3"/>
               <email>cmedigue@genoscope.cns.fr</email>
            </au>
            <au id="A4">
               <snm>Barbe</snm>
               <fnm>Val&#233;rie</fnm>
               <insr iid="I4"/>
               <email>valerie.barbe@genoscope.cns.fr</email>
            </au>
            <au id="A5">
               <snm>El Karoui</snm>
               <fnm>Meriem</fnm>
               <insr iid="I1"/>
               <email>Meriem_ElKaroui@hms.harvard.edu</email>
            </au>
            <au ca="yes" id="A6">
               <snm>Petit</snm>
               <fnm>Marie-Agn&#232;s</fnm>
               <insr iid="I1"/>
               <email>marie-agnes.petit@jouy.inra.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>INRA, UMR1319, Micalis, Bat 222, Jouy en Josas, 78350, France</p>
            </ins>
            <ins id="I2">
               <p>INSERM U722 and Universit&#233; Paris 7, Facult&#233; de M&#233;decine, Site Xavier Bichat, Paris, 75018, France</p>
            </ins>
            <ins id="I3">
               <p>CNRS-UMR 8030 &amp; CEA/IG/Genoscope, Laboratoire d'Analyses Bioinformatiques en G&#233;nomique et M&#233;tabolisme (LABGeM), rue Gaston Cr&#233;mieux, Evry, 91057, France</p>
            </ins>
            <ins id="I4">
               <p>CEA, Institut de G&#233;nomique, Genoscope, rue Gaston Cr&#233;mieux, Evry, 91057, France</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>4</issue>
         <fpage>R45</fpage>
         <url>http://genomebiology.com/2010/11/4/R45</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2010-11-4-r45</pubid>
               <pubid idtype="pmpid">20433696</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>29</day>
               <month>10</month>
               <year>2009</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>15</day>
               <month>3</month>
               <year>2010</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>30</day>
               <month>4</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>4</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Touzain et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Bacterial genome diversity</p>
      </shorttitle>
      <shortabs>
         <p>Comparison of all available genomes of three bacterial species suggests that a large part of genome diversity is contributed by short regions.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Analysis of large scale diversity in bacterial genomes has mainly focused on elements such as pathogenicity islands, or more generally, genomic islands. These comprise numerous genes and confer important phenotypes, which are present or absent depending on strains. We report that despite this widely accepted notion, most diversity at the species level is composed of much smaller DNA segments, 20 to 500 bp in size, which we call microdiversity.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We performed a systematic analysis of the variable segments detected by multiple whole-genome alignments at the DNA level in the three species for which the greatest number of genomes have been sequenced: <it>Escherichia coli</it>, <it>Staphylococcus aureus</it>, and <it>Streptococcus pyogenes</it>. Among the numerous sites of variability, 62 to 73% were loci of microdiversity, many of which were located within genes. They contribute to phenotypic variations, as 3 to 6% of all genes harbor microdiversity, and 1 to 9% of total genes are located downstream from a microdiversity locus. Microdiversity loci are particularly abundant in genes encoding membrane proteins. In-depth analysis of the <it>E. coli </it>alignments shows that most of the diversity does not correspond to known mobile or repeated elements, and was likely generated by illegitimate recombination. An intriguing class of microdiversity includes small blocks of highly diverged sequences, whose origin is discussed.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>This analysis uncovers the importance of this small-sized genome diversity, which we expect to be present in a wide range of bacteria, and possibly also in many eukaryotic genomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="300100010" subtype="man_spc_id" type="BMC">Evolution</classification>
         <classification id="300100014" subtype="man_spc_id" type="BMC">Genome studies</classification>
         <classification id="300100013" subtype="man_spc_id" type="BMC">Microbiology and parasitology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The availability of bacterial genome sequences for closely related strains within a species and software dedicated to multiple genome alignments allow for a novel perspective of bacterial genetic diversity <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Use of these aligners has led to the notion that bacterial species share a DNA backbone common to all strains interrupted by variable segments (VSs) that are specific to a subset of the aligned strains <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. The most studied category of VSs is genomic islands, defined by Vernikos and Parkhill as horizontally acquired mobile elements of limited phylogenetic distribution <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. These islands are large (30 to 100 kb), and often encode genes critical for pathogenesis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Their integration into genomes presumably occurs by site-specific recombination. Genomic islands may then diffuse from strain to strain by homologous recombination <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Where known, horizontal transfer of islands occurs either by mobilization through bacteriophages, such as in <it>Staphylococcus aureus </it><abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp> or by conjugation, using transfer origins located either outside or inside the island <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Informatic tools have been developed to detect such islands in genomes <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. A second category of VSs of large size involves temperate bacteriophages, or phage remnants. Like genomic islands, they enter the bacterial chromosome by site-specific recombination. 
Informatic tools to predict these elements have flourished in the past few years <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Recently, a new class of large variable elements has been characterized: the clustered regularly interspaced short palindromic repeats (CRISPRs), in which repeats alternate with short DNA segments of plasmid or bacteriophage origin. These regions confer phage or plasmid immunity <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> by mechanisms that remain to be understood. Databases for these elements are available <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Transposons and insertion sequences (ISs) also contribute to VSs when closely related genomes are compared, and their size is small compared to the first two types of elements (a few hundred base pairs to a few kilobases). These elements move within a given genome by transposition. A reference website allowing their classification exists <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, and two strategies for automated IS detection have been described <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. Finally, the smallest VSs (with a &#8805;20 bp threshold) expected to be present when genomes are aligned are the minisatellites, composed of small tandem repeats that are commonly used for strain typing. Websites allowing their recognition are available <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. A special category of such repeats is the 'small dispersed repeats', some 20 bp long and tandemly repeated in various copy numbers in genomes, which might be mobile <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. 
The <it>Escherichia coli </it>genomes contain a family of such elements, called palindromic units (PUs; 30 to 37 bp), which are palindromic and intergenic, and often combined in clusters <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         <p>DNA recombination and mutagenesis are the sources of large- and small-scale genetic diversity in genomes, respectively. In a broad sense, recombination designates all events that reshuffle DNA sequences. This reshuffling can have two opposite effects: either it homogenizes DNA sequences (a process called DNA conversion), or it provokes the abrupt loss, acquisition or translocation of genetic information, and therefore brings in diversity. A wide range of artificial genetic systems have been set up in the past decades to study recombination at the molecular level in bacteria and to determine the frequencies of its occurrence. Among the three main categories of recombination events, site-specific recombination is highly efficient; for example, recombination can occur in 100% of cells in an engineered site-specific recombination assay <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. However, this class of events is limited by its specialization, as it requires a dedicated enzyme (whose expression is usually regulated) and its cognate site. The next most efficient bacterial system is homologous recombination; for example, an estimated 10<sup>-4 </sup>of a non-stressed cell population recombined 1-kb-long tandem repeats present in the chromosomes of <it>Salmonella typhimurium </it><abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, <it>E. coli </it><abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, <it>Bacillus subtilis </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and <it>Helicobacter pylori </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. These events usually rely on RecA, a ubiquitous enzyme that catalyzes homologous DNA pairing. Homologous recombination is not sequence-specific, and its efficiency is proportional to the length of homology shared by the recombining molecules. 
High proportions of recombinants are scored during DNA conjugation (up to 10%), where several hundred-kilobase-long DNA segments enter the cell <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, and during natural DNA transformation <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Finally, illegitimate recombination is the least efficient mode of recombination, with events occurring in approximately 10<sup>-8 </sup>of a given cell population <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. It includes events that join DNA segments not sufficiently homologous for RecA pairing, nor involved in site-specific recombination. Illegitimate recombination events are attributed to errors of enzymes that deal with DNA, such as DNA polymerases <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>, RNA polymerases <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, repair enzymes, or topological enzymes (for reviews, see <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>). Interestingly, the non-homologous end joining type of illegitimate recombination, which involves dedicated enzymes and has a pre-eminent role in eukaryotes, is almost absent in prokaryotes, except in a few species such as <it>Mycobacterium tuberculosis </it><abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp> and <it>B. subtilis</it>, where it contributes to spore germination and resistance to desiccation <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>.</p>
         <p>To date, no link has been established between experimental DNA recombination studies and comparative genomic analyses. Indeed, molecular analyses usually focus on a single type of event (for examples, see <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B38">38</abbr><abbr bid="B42">42</abbr></abbrgrp>) without considering its frequency compared to those of other events that occur in the natural history of bacterial genomes. It is conceivable that the least efficient mechanism, illegitimate recombination, is the major contributor in shaping bacterial genomes. Comparative genomic analyses offer the possibility to examine genome diversity globally, but most studies concentrate on just a single class of VSs. One exception involves a systematic analysis of all VSs of more than 10 bp present in two very closely related <it>S. aureus </it>genomes <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Among 27 VS sites, this study revealed a pre-eminence of illegitimate events over other classes of recombination, and raised the question of whether this observation can be generalized to more diverse genomes, and to other species.</p>
         <p>In this report, we performed multi-strain alignments in three very different species to make a global assessment of bacterial diversity. Our aim was to understand the kind of molecular events that shaped present-day genomes, and to determine the features of recombination. Our main finding is that short VSs (20 to 500 bp long) are highly frequent in genomes and often reside within genes. Such VSs are sometimes referred to as indels, but our multigenome analysis shows that only a minority of them actually originate from an insertion or a deletion; we therefore designate them collectively by the broader term 'microdiversity'. This study uncovers the numerical importance of microdiversity, predicts the pre-eminence of illegitimate recombination as the mechanism generating it, and highlights the existence, among microdiversity, of highly diverged blocks.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Strain choice</p>
            </st>
            <p><it>E. coli</it>, <it>S. aureus </it>and <it>Streptococcus pyogenes </it>were selected to examine intra-species diversity at the genome level, as they are the three species with the greatest number of available genome sequences. Members of each species are known pathogens, but otherwise they have very diverse characteristics: <it>E. coli </it>is a Gram-negative bacterium that lives both in the digestive tract of warm-blooded animals and in water, while <it>S. aureus </it>and <it>S. pyogenes </it>are Gram-positive species that colonize the nose, and the skin and throat, of mammals, respectively. Unlike the two other species, <it>S. pyogenes </it>is an obligate fermenting bacterium. Five genomes representative of each of these species were selected such that each member of the set was as distant as possible from all others (see Materials and methods). The <it>E. coli </it>species is particularly diverse, and phylogenetic studies led to the conclusion that a branch of this species, the B2 phylogenetic group, behaves as a subspecies <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>. Moreover, the comparative study of 20 <it>E. coli </it>genomes identified a substantial set of genes that are unique to the B2 group <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. We therefore analyzed a set of five <it>E. coli </it>B2 genomes as a group, in addition to the genome set representative of the <it>E. coli </it>species. Neighbor joining trees derived from a new genomic distance called MUMi (see Materials and methods) <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> were calculated for the four strain sets (Figure <figr fid="F1">1</figr>). The <it>E. coli </it>MUMi tree was congruent with the phylogenetic tree reconstructed from the <it>Escherichia </it>core genome genes <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. As for the <it>S. aureus </it>and <it>S. 
pyogenes </it>sets, reliable phylogenetic trees derived from the concatenated core genome of the species are not yet available to our knowledge, but our previous results suggest that the MUMi trees should be good approximations of phylogenetic trees <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Neighbor joining trees based on genomic MUMi distances of the strains selected for the five-genome alignments</p>
               </caption>
               <text>
                  <p><b>Neighbor joining trees based on genomic MUMi distances of the strains selected for the five-genome alignments</b>.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-1" hint_layout="double"/>
            </fig>
            <p>To complement the five-genome analyses, alignments involving a maximum number of genomes were also analyzed using 25, 11 and 12 genomes for <it>E. coli</it>, <it>S. aureus </it>and <it>S. pyogenes</it>, respectively. Trees of the strains used are shown in Additional file <supplr sid="S1">1</supplr>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Neighbor joining trees based on genomic MUMi distances of the strains selected for the maximal-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S1.PDF">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Alignments and definition of the variable segments</p>
            </st>
            <p>Complete multiple genome aligners provide general outlines of colinear regions among the genomes, as well as the set of identical anchors (short DNA fragments) shared by all genomes. From these data, complete alignments can be defined precisely using a post-treatment step that determines which parts of the genomes belong to the common backbone DNA, and which parts are VSs (see Materials and methods). MOSAIC <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> is a database offering such completely refined alignments for bacterial genomes at the intra-species level, using either MGA or MAUVE as entry points for the post-treatment step. We have shown previously <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp> that it is possible to use robust criteria to delineate VSs: if in a part of the alignment at least two DNA segments differ by more than 24% at the nucleotide level, or if the alignment includes a gap of at least 20 nucleotides, all segments of this part of the alignment are labeled as VSs. Further details on these parameter choices are given in the Materials and methods and in Additional file <supplr sid="S2">2</supplr>.</p>
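            <p>The delineation rule stated above can be sketched in a few lines of Python. This is an illustration only, not the MOSAIC implementation: the function names, the input format (equal-length aligned strings in which '-' marks a gap) and the way divergence is computed (mismatches over the columns that are ungapped in both sequences) are assumptions made for the example.</p>
            <p>
```python
from itertools import combinations

MAX_DIVERGENCE = 0.24  # divergence threshold quoted in the text
MIN_GAP = 20           # minimum gap length (nucleotides) quoted in the text


def longest_gap(seq):
    """Length of the longest run of '-' in one aligned sequence."""
    best = run = 0
    for ch in seq:
        run = run + 1 if ch == "-" else 0
        best = max(best, run)
    return best


def divergence(a, b):
    """Fraction of mismatches over columns where neither sequence is gapped.

    Assumption: gapped columns are ignored when computing divergence.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def is_variable(block):
    """True if this part of the alignment would be labeled as VSs."""
    # Criterion 1: a gap of at least MIN_GAP nucleotides in any sequence.
    if any(longest_gap(seq) >= MIN_GAP for seq in block):
        return True
    # Criterion 2: at least two segments differ by more than MAX_DIVERGENCE.
    return any(divergence(a, b) > MAX_DIVERGENCE
               for a, b in combinations(block, 2))
```
            </p>
            <p>Under these assumptions, a block of identical ungapped sequences stays in the backbone, whereas a single 20-nucleotide gap in one strain is enough to label every segment of the block as a VS.</p>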
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Choice of the maximum divergence level for inclusion of ClustalW aligned sequences into the backbone.</p>
               </text>
               <file name="gb-2010-11-4-r45-S2.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>VSs are defined here as DNA segments that have a minimum length of 20 bp and differ from one another at a given position of the alignment. The cutoff chosen to decide that two VSs differ from one another is well above the average pairwise nucleotide diversity between orthologous genes, which usually does not exceed 5% at the intra-species level in bacteria. As a consequence, in this analysis, all sequences having point mutations corresponding to the intra-species vertical divergence, as well as small indels, are classified as backbone and are not considered further.</p>
            <p>The main characteristics of the alignments are presented in Table <tblr tid="T1">1</tblr>. While the <it>E. coli </it>strains were, as expected, more distantly related to one another than strains of the other sets <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> (see the longer branches in Figure <figr fid="F1">1</figr>, and maximal MUMi values in Table <tblr tid="T1">1</tblr>), the B2 <it>E. coli</it>, <it>S. pyogenes </it>and <it>S. aureus </it>sets had similar 'tree depth', suggesting that these three sets diverged over similar evolutionary time scales.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Characteristics of the four whole-genome alignments, involving five strains each</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>E. coli</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b><it>E. coli </it>B2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>S. aureus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>S. pyogenes</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Median genome size (Mb)</p>
                     </c>
                     <c ca="center">
                        <p>5.2</p>
                     </c>
                     <c ca="center">
                        <p>5.2</p>
                     </c>
                     <c ca="center">
                        <p>2.8</p>
                     </c>
                     <c ca="center">
                        <p>1.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Maximal MUMi distance</p>
                     </c>
                     <c ca="center">
                        <p>0.3</p>
                     </c>
                     <c ca="center">
                        <p>0.156</p>
                     </c>
                     <c ca="center">
                        <p>0.197</p>
                     </c>
                     <c ca="center">
                        <p>0.175</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Coverage<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>72.7%</p>
                     </c>
                     <c ca="center">
                        <p>83.5%</p>
                     </c>
                     <c ca="center">
                        <p>84.5%</p>
                     </c>
                     <c ca="center">
                        <p>83.5%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Percent identity of backbone</p>
                     </c>
                     <c ca="center">
                        <p>98.05%</p>
                     </c>
                     <c ca="center">
                        <p>99.43%</p>
                     </c>
                     <c ca="center">
                        <p>98.73%</p>
                     </c>
                     <c ca="center">
                        <p>99.18%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total number of loci<sup>b</sup></p>
                     </c>
                     <c ca="center">
                        <p>1,037</p>
                     </c>
                     <c ca="center">
                        <p>539</p>
                     </c>
                     <c ca="center">
                        <p>768</p>
                     </c>
                     <c ca="center">
                        <p>344</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of microdiversity loci</p>
                     </c>
                     <c ca="center">
                        <p>640</p>
                     </c>
                     <c ca="center">
                        <p>370</p>
                     </c>
                     <c ca="center">
                        <p>556</p>
                     </c>
                     <c ca="center">
                        <p>250</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Median size of VS (bp)<sup>c</sup></p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>Proportion of the genome included in the backbone (average). <sup>b</sup>Positions in the alignment where the backbone is interrupted by at least one variable segment (VS).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>VSs are abundant, short in size, and, for the most part, different from previously reported variable elements</p>
            </st>
            <p>We hereafter use 'locus' to refer to a position of an alignment where the backbone is interrupted by a VS in at least one strain (Figure <figr fid="F2">2</figr>). The number of loci in a given alignment varied from 344 to 1,037 depending on the species studied (Table <tblr tid="T1">1</tblr>). The VS size distribution in all four alignments is represented as a box-plot in Figure <figr fid="F3">3</figr>, and whole distributions are shown in Additional file <supplr sid="S3">3</supplr>. A remarkable feature of all the alignments was that most of the segments were small: the VSs had a median size of 60 to 90 bp (Table <tblr tid="T1">1</tblr>), and at least 75% of all VSs were smaller than 500 bp (Figure <figr fid="F3">3</figr>). Loci where all VSs were less than 500 bp long were also abundant (62 to 73% of all loci; Table <tblr tid="T1">1</tblr>), and will be designated hereafter as microdiversity loci. To test whether microdiversity was still present when more genomes were aligned, alignments of <it>E. coli</it>, <it>S. aureus </it>and <it>S. pyogenes </it>using 25, 11 and 12 genomes, respectively, were performed (Table <tblr tid="T2">2</tblr>). Overall, the number of loci increased by 50% for <it>E. coli</it>, 26% for <it>S. aureus</it>, and 65% for <it>S. pyogenes</it>. Again, microdiversity loci represented 55 to 78% of all loci. We conclude that the most abundant type of genomic diversity is microdiversity, irrespective of the number of genomes included in the alignment.</p>
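            <p>As a check on the proportions quoted above, the share of microdiversity loci can be recomputed from the counts in Table 1. The short Python sketch below also encodes the locus-level rule (a locus qualifies when every VS it contains is shorter than 500 bp). The function names and the representation of a locus as a list of VS lengths are assumptions made for illustration, not part of the original analysis pipeline.</p>
            <p>
```python
MICRO_MAX_BP = 500  # upper size limit for microdiversity, as quoted in the text

# (microdiversity loci, total loci) per five-genome alignment, copied from Table 1
TABLE_1 = {
    "E. coli": (640, 1037),
    "E. coli B2": (370, 539),
    "S. aureus": (556, 768),
    "S. pyogenes": (250, 344),
}


def is_microdiversity(vs_lengths):
    """A locus qualifies when every VS it contains is shorter than the limit.

    vs_lengths is a hypothetical representation: the VS lengths (bp) at one locus.
    """
    return all(MICRO_MAX_BP > length for length in vs_lengths)


def micro_share(micro, total):
    """Percentage of loci that are microdiversity loci."""
    return 100 * micro / total


shares = {name: micro_share(*counts) for name, counts in TABLE_1.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```
            </p>
            <p>Rounded to the nearest percent, the recomputed shares span 62 to 73%, in line with the range given in the text.</p>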
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Distribution of the VS sizes in the five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S3.PPT">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Rationale for the alignment analyses</p>
               </caption>
               <text>
                  <p><b>Rationale for the alignment analyses</b>. The five horizontal blue lines represent the backbone DNA, and the triangles represent the VSs interrupting the backbone. All the VSs present at a given position of the alignment constitute a locus. <b>(a)</b> The five categories of VS positions relative to genes. Red arrows below the backbone blue lines represent genes. IntraG, intragenic; interG, intergenic; G, gene; L, length. <b>(b)</b> Loci history. VSs are colored according to DNA content. Identical color indicates identical content. Detection of insertions, deletions, ancient insertion or deletion events (ins or del), and dimorph, homeologous and polymorph loci is as detailed in the text.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-2" hint_layout="double"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Size distribution of the variable segments produced in the four alignments (box plots)</p>
               </caption>
               <text>
                  <p><b>Size distribution of the variable segments produced in the four alignments (box plots)</b>. Each box shows the median value (middle line) and the first and third quartiles (lower and upper lines) of the size distribution. Values lying more than 1.5 times the interquartile range away from the bulk of all values are shown individually as dots. The width of each box is proportional to the number of VSs analyzed per alignment. On the right side, VSs shorter than 500 bp are designated as microdiversity. Abscissa: E_co, <it>E. coli</it>; E_B2, <it>E. coli </it>B2 phylogenetic group; S_au, <it>S. aureus</it>; S_pyo, <it>S. pyogenes</it>.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-3" hint_layout="double"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Microdiversity loci, including homeologous and dimorphic loci, are dominant categories irrespective of the number of genomes aligned</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>
                              <it>E. coli</it>
                           </b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>
                              <it>S. aureus</it>
                           </b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>
                              <it>S. pyogenes</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of genomes aligned</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total number of loci</p>
                     </c>
                     <c ca="center">
                        <p>1,037</p>
                     </c>
                     <c ca="center">
                        <p>1,553</p>
                     </c>
                     <c ca="center">
                        <p>768</p>
                     </c>
                     <c ca="center">
                        <p>970</p>
                     </c>
                     <c ca="center">
                        <p>344</p>
                     </c>
                     <c ca="center">
                        <p>570</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of microdiversity loci (M)</p>
                     </c>
                     <c ca="center">
                        <p>640 (62%)<sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>852 (55%)</p>
                     </c>
                     <c ca="center">
                        <p>556 (72%)</p>
                     </c>
                     <c ca="center">
                        <p>715 (74%)</p>
                     </c>
                     <c ca="center">
                        <p>250 (73%)</p>
                     </c>
                     <c ca="center">
                        <p>385 (67%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Insertions/M</p>
                     </c>
                     <c ca="center">
                        <p>7.03%</p>
                     </c>
                     <c ca="center">
                        <p>3.99%</p>
                     </c>
                     <c ca="center">
                        <p>3.6%</p>
                     </c>
                     <c ca="center">
                        <p>1.12%</p>
                     </c>
                     <c ca="center">
                        <p>4.8%</p>
                     </c>
                     <c ca="center">
                        <p>5.71%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Deletions/M</p>
                     </c>
                     <c ca="center">
                        <p>4.22%</p>
                     </c>
                     <c ca="center">
                        <p>4.69%</p>
                     </c>
                     <c ca="center">
                        <p>4.68%</p>
                     </c>
                     <c ca="center">
                        <p>4.48%</p>
                     </c>
                     <c ca="center">
                        <p>12.4%</p>
                     </c>
                     <c ca="center">
                        <p>10.91%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Insertions or deletions/M</p>
                     </c>
                     <c ca="center">
                        <p>3.59%</p>
                     </c>
                     <c ca="center">
                        <p>0.47%</p>
                     </c>
                     <c ca="center">
                        <p>3.24%</p>
                     </c>
                     <c ca="center">
                        <p>2.66%</p>
                     </c>
                     <c ca="center">
                        <p>0.8%</p>
                     </c>
                     <c ca="center">
                        <p>0%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dimorphs/M</p>
                     </c>
                     <c ca="center">
                        <p>37.97%</p>
                     </c>
                     <c ca="center">
                        <p>23.71%</p>
                     </c>
                     <c ca="center">
                        <p>42.63%</p>
                     </c>
                     <c ca="center">
                        <p>52.03%</p>
                     </c>
                     <c ca="center">
                        <p>31.6%</p>
                     </c>
                     <c ca="center">
                        <p>22.34%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Homeologous/M</p>
                     </c>
                     <c ca="center">
                        <p>30.31%</p>
                     </c>
                     <c ca="center">
                        <p>45.89%</p>
                     </c>
                     <c ca="center">
                        <p>22.84%</p>
                     </c>
                     <c ca="center">
                        <p>23.5%</p>
                     </c>
                     <c ca="center">
                        <p>19.6%</p>
                     </c>
                     <c ca="center">
                        <p>27.53%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Polymorphs/M</p>
                     </c>
                     <c ca="center">
                        <p>16.88%</p>
                     </c>
                     <c ca="center">
                        <p>21.24%</p>
                     </c>
                     <c ca="center">
                        <p>23.02%</p>
                     </c>
                     <c ca="center">
                        <p>16.22%</p>
                     </c>
                     <c ca="center">
                        <p>30.8%</p>
                     </c>
                     <c ca="center">
                        <p>33.51%</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>Percentage of total loci.</p>
               </tblfn>
            </tbl>
            <p>Given the abundance of annotated data available for <it>E. coli </it>in databases, we selected this species to map the VSs to available annotations such as bacteriophages, genomic islands, clustered, regularly interspaced short palindromic repeats (CRISPRs), ISs, and repeated elements such as minisatellites and PUs (see Materials and methods for data collection). If more than 50% of the length of a VS corresponded to an annotated region, the VS was labeled as such. All VS labels were then stored collectively at the locus level. The number of loci containing each type of annotation is reported in Table <tblr tid="T3">3</tblr>. Only 35% of the 1,037 loci of the <it>E. coli </it>alignment, and 47% of the B2 subgroup loci, corresponded to one of the elements described above. Therefore, the major proportion of the loci does not originate from readily identifiable events. In particular, 63 to 72% of the microdiversity loci fell into the category 'Other'. The DNA content of the <it>E. coli </it>loci not belonging to known categories was compared by Blast to the Non-Redundant database (see Materials and methods). The largest category comprised segments that matched other <it>E. coli </it>strains (65 to 86% of the cumulative DNA length of all VSs tested in a given genome). This suggests that most of the VSs belong to a shared pool of <it>E. coli </it>sequences, the so-called <it>E. coli </it>pan-genome. The next largest category included segments that did not have any match in the database (13 to 34%). DNA segments matching other species or environmental samples were essentially absent. In conclusion, most of the variable loci are microdiversity loci and, to the best of our knowledge for <it>E. coli</it>, they do not correspond to known elements, although most contain pan-genomic DNA.</p>
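            <p>The labeling rule (a VS takes the label of an annotation covering more than 50% of its length, and labels are pooled at the locus level) can be sketched as follows. This is only one possible reading of the rule, under stated assumptions: coordinates are half-open intervals on each genome, overlaps from several annotated regions of the same type are summed, and annotated regions of the same type do not overlap each other. All function names and the example coordinates are hypothetical.</p>
            <p>
```python
# Illustrative sketch; names, coordinates and conventions are assumptions.

def overlap_bp(start_a, end_a, start_b, end_b):
    """Overlap length of two half-open intervals [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def label_vs(vs_start, vs_end, annotations):
    """Labels of annotation types covering more than half of the VS.

    annotations: iterable of (label, start, end) on the same genome as the VS.
    Assumes regions sharing a label do not overlap one another.
    """
    vs_len = vs_end - vs_start
    covered = {}
    for label, start, end in annotations:
        bp = overlap_bp(vs_start, vs_end, start, end)
        covered[label] = covered.get(label, 0) + bp
    return sorted(label for label, bp in covered.items() if 2 * bp > vs_len)


def label_locus(vs_list):
    """Pool the labels of all VSs at a locus; 'Other' when none applies.

    vs_list: list of (vs_start, vs_end, annotations), one entry per strain
    carrying a VS at this locus.
    """
    labels = set()
    for vs_start, vs_end, annotations in vs_list:
        labels.update(label_vs(vs_start, vs_end, annotations))
    return sorted(labels) or ["Other"]


# Hypothetical 400 bp VS: 250 bp fall in an IS, 50 bp in a PU.
example = label_vs(1000, 1400, [("IS", 900, 1250), ("PU", 1350, 1400)])
```
            </p>
            <p>In the example, only the IS covers more than half of the 400 bp segment, so the VS would be labeled as an IS; a locus whose VSs receive no label would fall into the category 'Other'.</p>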
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Number of loci in <it>E. coli </it>alignments corresponding to known elements</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <b>
                              <it>E. coli</it>
                           </b>
                        </p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <b><it>E. coli </it>B2</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>All loci</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Microdiversity loci</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>All loci</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b>Microdiversity loci</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Percent</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Percent</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Percent</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Percent</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>1,037</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>640</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>539</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>370</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bacteriophages</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CRISPR</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.1</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.6</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Genomic islands</p>
                     </c>
                     <c ca="center">
                        <p>127</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>103</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>64</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Insertion sequences</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.3</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Palindromic units</p>
                     </c>
                     <c ca="center">
                        <p>129</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Minisatellites</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Other</p>
                     </c>
                     <c ca="center">
                        <p>678</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>459</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="center">
                        <p>289</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                     <c ca="center">
                        <p>233</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>CRISPR, clustered, regularly interspaced short palindromic repeat.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Identification of the microdiversity regions possibly affecting genes</p>
            </st>
            <p>The remaining part of this analysis focuses on the microdiversity loci, which correspond to largely unknown aspects of genome diversity. We chose to focus on the five-genome alignments because more information was available for these. We asked how microdiversity regions were located relative to genes. A microdiversity locus was designated as an 'intragenic locus' if all VSs of the locus were located inside a gene, without perturbing its reading frame, and as an 'intergenic locus' if all VS boundaries were located outside genes (Figure <figr fid="F2">2a</figr>, first two examples). We also considered the cases where insertion of a VS interrupts a gene in at least one strain of the alignment (such as with IS insertions), and called this category 'flanking gene missing' (Figure <figr fid="F2">2a</figr>, third case). Addition of DNA can also sometimes provoke an in-frame fusion, resulting in a locus where VSs have 'flanking genes of variable length'. Finally, we placed the remaining loci in the 'mixed locus' category (for instance, loci where some VSs are intragenic and others intergenic).</p>
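            <p>The five categories can be read as a simple decision procedure, sketched below. The sketch assumes that the position of each VS relative to genes ('intragenic' or 'intergenic') and the two flanking-gene observations have already been determined upstream; the function name, its arguments and the order in which the tests are applied are hypothetical and are not stated in the text.</p>
            <p>
```python
# Illustrative sketch; argument names and test order are assumptions.

def classify_locus(vs_positions, gene_missing=False, gene_length_varies=False):
    """Assign a microdiversity locus to one of the five categories.

    vs_positions: per-VS position labels, 'intragenic' or 'intergenic'.
    gene_missing: a flanking gene is interrupted or absent in some strain.
    gene_length_varies: an in-frame fusion changes a flanking gene length.
    """
    if gene_missing:
        return "flanking gene missing"
    if gene_length_varies:
        return "flanking genes of variable length"
    positions = set(vs_positions)
    if positions == {"intragenic"}:
        return "intragenic"
    if positions == {"intergenic"}:
        return "intergenic"
    return "mixed"


examples = [
    classify_locus(["intragenic", "intragenic"]),
    classify_locus(["intergenic"]),
    classify_locus(["intragenic", "intergenic"]),
    classify_locus(["intergenic"], gene_missing=True),
]
```
            </p>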
            <p>Thirty-five to 55% of the microdiversity loci were intragenic (Figure <figr fid="F4">4</figr>), and did not perturb the reading frame of the gene (for example, see the nucleotide sequence of a 61-bp microdiversity locus present in the <it>manZ </it>gene; Figure <figr fid="F5">5</figr>). The number of genes affected by microdiversity, that is, harboring a VS in at least one genome, was then calculated. Depending on the genome and the alignment, their proportion ranged from 3 to 6% of all genes. Some genes contained more than one VS. Remarkably, some <it>S. aureus </it>genes harbored up to seven in-frame VSs. These <it>S. aureus </it>VS-rich genes encode surface proteins such as the fibrinogen binding protein SdrE, or clumping factor ClfB. The most VS-rich gene of the <it>E. coli </it>and B2 subgroup alignments is <it>ftsK </it>(four and three VSs, respectively), which encodes a membrane protein important for chromosome segregation. In most cases (75 to 92% of intragenic loci), the amino acid sequence of the protein was modified by the presence of the VS. Complete lists of these genes are given in Additional files <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr> and <supplr sid="S7">7</supplr>, with a breakdown by functional category for <it>E. coli </it>genes in Additional file <supplr sid="S8">8</supplr>. Genes encoding membrane proteins were significantly enriched among the genes with microdiversity loci in the <it>E. coli </it>and B2 lists (Additional file <supplr sid="S8">8</supplr>). These results suggest that besides point mutations, genes also evolve by more abrupt 'block modifications' of gene fragments (see Discussion).</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>List of genes containing microdiversity loci in the <it>E. coli </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S4.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p>List of genes containing microdiversity loci in the <it>E. coli </it>B2 five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S5.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p>List of genes containing microdiversity loci in the <it>S. aureus </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S6.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S7">
               <title>
                  <p>Additional file 7</p>
               </title>
               <text>
                  <p>List of genes containing microdiversity loci in the <it>S. pyogenes </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S7.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S8">
               <title>
                  <p>Additional file 8</p>
               </title>
               <text>
                  <p>Distribution of the <it>E. coli </it>genes containing microdiversity loci in functional categories.</p>
               </text>
               <file name="gb-2010-11-4-r45-S8.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Location of the variable segments relative to genes in the four alignments</p>
               </caption>
               <text>
                  <p><b>Location of the variable segments relative to genes in the four alignments</b>. The proportion of each category is given as percentages of total loci present in each alignment.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-4" hint_layout="double"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The 61 bp-long variable segment of the <it>manZ </it>gene</p>
               </caption>
               <text>
                  <p><b>The 61 bp-long variable segment of the <it>manZ </it>gene</b>. <b>(a) </b>DNA sequence. Bold capitals delineate the VS. Non-synonymous mutations are shown in red, synonymous in green. <b>(b) </b>Protein sequence. Amino acid changes are shown in red. This locus is intragenic and dimorphic.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-5" hint_layout="double"/>
            </fig>
            <p>Intergenic loci represented 23 to 48% of all loci (Figure <figr fid="F4">4</figr>). In <it>E. coli</it>, some of them corresponded to PU/repetitive elements (93 of 276 for the global <it>E. coli </it>alignment, and 32 of 127 for the <it>E. coli </it>B2 subgroup alignment). In the <it>S. aureus </it>alignment, the intergenic loci were the most abundant, representing 48% of all variable loci. Some of them likely correspond to <it>Staphylococcus </it>repetitive elements <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>, which are intergenic, or to staphylococcal interspersed repeat units <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. An analysis was performed on loci where VSs were located less than 500 bp upstream of an ORF (Additional files <supplr sid="S9">9</supplr>, <supplr sid="S10">10</supplr>, <supplr sid="S11">11</supplr>, and <supplr sid="S12">12</supplr>), and a breakdown by functional category was performed for the <it>E. coli </it>genes (Additional file <supplr sid="S13">13</supplr>). The proportion of genes preceded by a VS ranged from 1 to 9% of all genes. Non-coding RNA genes (corresponding to tRNA, rRNA and small non-coding RNA) were significantly enriched among the genes preceded by a VS (Additional file <supplr sid="S13">13</supplr>). Note that these RNA genes were not target sites for the integration of genomic islands, which preferentially integrate downstream of tRNA genes. The VSs often corresponded to variations in runs of tRNA genes, or in tRNA genes interspersed between rRNA genes. Apart from this special category, we suspect that the presence of VSs upstream of genes may affect regulation, and hence contribute to strain diversity.</p>
            <suppl id="S9">
               <title>
                  <p>Additional file 9</p>
               </title>
               <text>
                  <p>List of genes placed downstream of microdiversity loci in the <it>E. coli </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S9.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S10">
               <title>
                  <p>Additional file 10</p>
               </title>
               <text>
                  <p>List of genes placed downstream of microdiversity loci in the <it>E. coli </it>B2 five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S10.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S11">
               <title>
                  <p>Additional file 11</p>
               </title>
               <text>
                  <p>List of genes placed downstream of microdiversity loci in the <it>S. aureus </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S11.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S12">
               <title>
                  <p>Additional file 12</p>
               </title>
               <text>
                  <p>List of genes placed downstream of microdiversity loci in the <it>S. pyogenes </it>five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S12.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S13">
               <title>
                  <p>Additional file 13</p>
               </title>
               <text>
                  <p>Distribution of the <it>E. coli </it>genes placed downstream of microdiversity loci in functional categories.</p>
               </text>
               <file name="gb-2010-11-4-r45-S13.XLS">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The mixed loci (5 to 10% of all loci) generally correspond to cases where the VSs are intragenic in some strains and intergenic in others. This suggests mutagenic insertion of a DNA sequence inside a gene, leading to its pseudogenization in the strains where the locus is intergenic. Some additional cases of pseudogenization may be detected in loci with a flanking gene missing (5 to 7% of all loci; Figure <figr fid="F4">4</figr>), if the gene loss is due to the introduction of the VS.</p>
         </sec>
         <sec>
            <st>
               <p>Some 10% of the VSs are flanked by direct repeats in the microdiversity loci</p>
            </st>
            <p>Recombination between directly oriented repeats placed at the base of the VS may explain one mechanism of variability: in some strains, a deletion may have occurred between repeats, thereby generating a new locus in the alignment. The percentage of VSs flanked by repeats varied between 10 and 18%, with the highest frequency in <it>S. aureus </it>(Table <tblr tid="T4">4</tblr>, first part). The vast majority (66 to 94%) of repeat sequences were less than 30 bp in size.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Characteristics of microdiversity loci flanked by repeats</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>E. coli</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b><it>E. coli </it>B2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>S. aureus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>S. pyogenes</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VS analysis</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>VS flanked by repeats/all VS</p>
                     </c>
                     <c ca="center">
                        <p>10%</p>
                     </c>
                     <c ca="center">
                        <p>14%</p>
                     </c>
                     <c ca="center">
                        <p>18%</p>
                     </c>
                     <c ca="center">
                        <p>12%</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Repeats less than 30 bp/all VS with repeats</p>
                     </c>
                     <c ca="center">
                        <p>74%</p>
                     </c>
                     <c ca="center">
                        <p>66%</p>
                     </c>
                     <c ca="center">
                        <p>82%</p>
                     </c>
                     <c ca="center">
                        <p>94%</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Loci analysis</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Total number of loci</p>
                     </c>
                     <c ca="center">
                        <p>640</p>
                     </c>
                     <c ca="center">
                        <p>370</p>
                     </c>
                     <c ca="center">
                        <p>556</p>
                     </c>
                     <c ca="center">
                        <p>250</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>% of loci with VSs flanked by repeats (r-loci)/all loci</p>
                     </c>
                     <c ca="center">
                        <p>21%</p>
                     </c>
                     <c ca="center">
                        <p>22%</p>
                     </c>
                     <c ca="center">
                        <p>32%</p>
                     </c>
                     <c ca="center">
                        <p>23%</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>% loci with possible deletion/r-loci</p>
                     </c>
                     <c ca="center">
                        <p>51%</p>
                     </c>
                     <c ca="center">
                        <p>66%</p>
                     </c>
                     <c ca="center">
                        <p>16%</p>
                     </c>
                     <c ca="center">
                        <p>42%</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>% loci with possible deletion/all loci</p>
                     </c>
                     <c ca="center">
                        <p>21%</p>
                     </c>
                     <c ca="center">
                        <p>25%</p>
                     </c>
                     <c ca="center">
                        <p>22%</p>
                     </c>
                     <c ca="center">
                        <p>20%</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>If repeats are responsible for instability, one would expect to find genomes in which the VS is deleted. Loci at which at least one of the VSs was flanked by repeats were designated 'r-loci' (Table <tblr tid="T4">4</tblr>, second part). Among these r-loci, the proportion of those where at least one genome had an empty VS at the locus (empty VS means the VS is absent or less than 20 bp long) could be calculated (Table <tblr tid="T4">4</tblr>, last lines). For the <it>E. coli </it>and <it>S. pyogenes </it>alignments, this proportion was 42 to 66%, which is significantly higher than expected (<it>P </it>&lt;&lt; 0.01). For <it>S. aureus</it>, the proportion of r-loci with apparent deletions was only 16%, which is even less than the overall proportion of loci with apparent deletions (22%). We conclude that for the r-loci, variability may be explained in part by recombination between these repeats; these events appear to be more frequent in <it>E. coli </it>and <it>S. pyogenes </it>than in <it>S. aureus</it>. Overall, up to one-fifth of the microdiversity between genomes may be due to recombination between short repeats flanking some of the VSs.</p>
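            <p>The comparison above between the deletion rate at r-loci and the overall deletion rate can be illustrated with a small calculation. The text does not name the statistical test used, so the sketch below assumes a one-sided binomial test in which the null hypothesis is that r-loci show apparent deletions at the overall rate. The counts are approximations rebuilt from the rounded percentages in the <it>E. coli </it>column of Table 4, and the function name binomial_upper_tail is illustrative only.</p>

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Approximate counts rebuilt from the rounded percentages in Table 4
# (E. coli column); the exact counts are not given in the text.
total_loci = 640
r_loci = round(0.21 * total_loci)            # loci with a repeat-flanked VS
r_loci_with_deletion = round(0.51 * r_loci)  # r-loci with a possible deletion
overall_deletion_rate = 0.21                 # possible deletion / all loci

# Assumed null hypothesis: r-loci carry deletions at the overall rate.
p_value = binomial_upper_tail(r_loci, r_loci_with_deletion, overall_deletion_rate)
print(f"{r_loci_with_deletion} of {r_loci} r-loci, one-sided P = {p_value:.2e}")
```

            <p>Under these assumptions the tail probability is far below 0.01, in line with the statement in the text; a different choice of test or of null proportion would change the exact value.</p>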
         </sec>
         <sec>
            <st>
               <p>Global prediction of loci history reveals two important categories of events: dimorphic loci, and highly divergent loci</p>
            </st>
            <p>A global analysis was carried out to investigate the possible history of loci and assess the contribution of deletions, insertions, and more complex situations. This required analyzing VS content within a phylogenetic context. Our approach consisted first of assigning an 'occupancy' value to each locus: the number of genomes that 'occupy' the locus, that is, in which the VS is not empty. We observed that 75 to 80% of loci had maximal occupancy, that is, occupancy 5 (Additional file <supplr sid="S14">14</supplr>).</p>
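            <p>As a minimal sketch of the occupancy value, assuming each locus is summarized by one VS length per aligned genome (a hypothetical representation, not the authors' data format), a VS counts as empty when it is absent or shorter than 20 bp, as defined earlier for the r-loci analysis:</p>

```python
MIN_VS_LENGTH_BP = 20  # a VS that is absent or shorter than this is 'empty'


def occupancy(vs_lengths_bp):
    """Count the genomes whose VS at this locus is not empty.

    vs_lengths_bp: one VS length (in bp) per aligned genome; 0 means absent.
    """
    return sum(1 for length in vs_lengths_bp if length >= MIN_VS_LENGTH_BP)


# Hypothetical locus in a five-genome alignment (lengths are made up).
example_locus = [120, 118, 0, 12, 131]
print(occupancy(example_locus))  # prints 3
```

            <p>In a five-genome alignment, a locus with occupancy 5 is one in which every genome carries a non-empty VS.</p>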
            <suppl id="S14">
               <title>
                  <p>Additional file 14</p>
               </title>
               <text>
                  <p>Loci occupancy in the five-genome alignments.</p>
               </text>
               <file name="gb-2010-11-4-r45-S14.DOC">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>We then made use of locus occupancy, strain phylogeny and VS content to predict some simple situations, using the parsimony principle (Figure <figr fid="F2">2b</figr>): loci of occupancy 1 with VSs on a short branch were predicted to be 'recent insertions', while loci of occupancy 4 with identical VS content and the longer branch occupied were predicted to be 'recent deletions'. Using a similar method, loci of occupancy 2 or 3 with VSs of identical content present on the same subtree were predicted to be 'ancestral insertions or ancestral deletions'. Among the loci of maximal occupancy, two situations were singled out: loci with only two kinds of VS segregating on subtrees, which were named 'dimorphs'; and loci where all VSs turned out to be of nearly identical content, which were named 'homeologs'. These loci may indicate places where DNA diverges more rapidly than elsewhere on the genome, and they were therefore kept in the 'VS pool'. The last category of 'polymorphs' included all other loci.</p>
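            <p>The rules for loci of maximal occupancy can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each strain is assumed to carry a content label shared by strains whose VSs are nearly identical, and the phylogeny is assumed to be supplied as a list of clades (sets of strain names). The function name and both inputs are hypothetical, and the rules for occupancy 1 to 4, which also depend on branch lengths, are not covered.</p>

```python
def classify_full_occupancy_locus(content_by_strain, clades):
    """Classify a locus in which every genome carries a non-empty VS.

    content_by_strain: dict mapping strain name to a VS content label
        (strains with nearly identical VSs share a label; assumed input).
    clades: iterable of sets of strain names, one per subtree (assumed input).
    """
    # Group strains by the kind of VS they carry.
    groups = {}
    for strain, label in content_by_strain.items():
        groups.setdefault(label, set()).add(strain)

    if len(groups) == 1:
        return "homeolog"  # all VSs of nearly identical content

    if len(groups) == 2:
        clade_sets = {frozenset(clade) for clade in clades}
        # Two kinds of VS that segregate on subtrees of the phylogeny.
        if any(frozenset(members) in clade_sets for members in groups.values()):
            return "dimorph"

    return "polymorph"  # every other situation


# Hypothetical five-strain phylogeny and locus.
clades = [{"A", "B"}, {"C", "D"}, {"C", "D", "E"}]
locus = {"A": "x", "B": "x", "C": "y", "D": "y", "E": "y"}
print(classify_full_occupancy_locus(locus, clades))  # prints dimorph
```

            <p>In this sketch, a locus with two kinds of VS that do not follow any subtree falls into the 'polymorph' category, which is one possible reading of 'all other loci'.</p>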
            <p>Results showing the proportions of loci encountered in each category are reported in Figure <figr fid="F6">6</figr>. Surprisingly, the 'dimorphs', in which a given locus contains exactly two different kinds of segment, formed the most abundant category. Dimorphic loci can be explained by the presence of a DNA insertion hot spot or by the replacement of an 'ancestral' sequence by a new segment. If such is the case, it should be possible to match one of the two VSs of the locus with a genome segment of a closely related species. A Blast analysis was conducted for the <it>E. coli </it>and B2 phylogenetic group alignments on all dimorphic loci, using <it>Escherichia fergusonii </it>as an out-group <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. In 55% of <it>E. coli </it>loci, and 36% of the B2 group loci, a matching segment with <it>E. fergusonii </it>was found (76% identity over 90% of its length). This argues for the existence of a segment replacement in a fraction of the dimorphs. A comparable matching could not be performed for the two other species due to the absence of a sufficiently proximal genome out-group.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Prediction of locus histories in the four alignments</p>
               </caption>
               <text>
                  <p><b>Prediction of locus histories in the four alignments</b>. The proportion of each category is given as percentages of total loci present in each alignment.</p>
               </text>
               <graphic file="gb-2010-11-4-r45-6" hint_layout="double"/>
            </fig>
            <p>Homeologous loci represented 9 to 30% of the total loci (see Figure <figr fid="F5">5</figr> for an example of such a homeologous locus). Interestingly, the longer the maximal MUMi genomic distance among the strains being compared, the higher the proportion of divergent loci among the total VSs. This may suggest that the yield of divergent loci reflects the evolutionary time elapsed since the species diverged. The homeologous loci were significantly enriched among the intragenic loci for two alignments: <it>E. coli </it>(53% of intragenic loci are homeologous, compared to 30% homeologous loci overall, <it>P </it>&lt;&lt; 0.01), and <it>S. aureus </it>(33% compared to 23%, <it>P </it>= 0.017). This was not the case, however, for the B2 <it>E. coli </it>alignment (14% compared to 9%, <it>P </it>= 0.08), or the <it>S. pyogenes </it>alignment, where 23% of intragenic loci are homeologous, compared to 20% overall.</p>
            <p>The polymorphic loci included 4 to 31% of all microdiversity loci, and may correspond to recombination hotspots, which remain to be studied in detail.</p>
            <p>We then proceeded to test whether the two most important categories identified with the five-genome alignments, namely dimorphic and homeologous loci, were conserved when more genomes were included in the alignment. This proved to be the case (Table <tblr tid="T2">2</tblr>). For the <it>E. coli </it>and the <it>S. pyogenes </it>alignments, the homeologous loci even became preponderant relative to the dimorphic loci.</p>
            <p>In conclusion, microdiversity loci correspond mostly to cases of segment replacement, recombination hot spots, or homeologous DNA that diverged faster than the backbone DNA. Cases of simple deletion or insertion were proportionally scarce.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Microdiversity constitutes a major type of variability between bacterial genomes within a species</p>
            </st>
            <p>The main outcome of this study is the discovery of a major type of bacterial genome diversity at the species level, made of variable short segments between 20 and 500 bp long. In the five-genome alignments, these VSs represent some 63 to 72% of all possible variable regions detected by whole genome alignments. They remain very abundant (50 to 72% of all loci) when a maximal number of genomes is included in the alignments (Table <tblr tid="T2">2</tblr>). The presence of such small diversity had been reported earlier for <it>E. coli </it><abbrgrp><abbr bid="B4">4</abbr><abbr bid="B58">58</abbr></abbrgrp>, and its general importance is presently emerging in various comparative genomic studies, both in eukaryotes <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> and prokaryotes <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>, where it is often reported as indels. However, the term indel is imprecise with respect to the size of segments involved (it can be used for anything from 1- to 10-bp insertions or deletions to the insertion or deletion of genomic islands). It is also misleading in terms of the underlying mechanism because it suggests that an insertion or a deletion occurred. Our work shows that more than 80% of the microdiversity loci are due to neither insertion nor deletion. The term indel was therefore replaced in this study by the more neutral term microdiversity. If such microdiversity were found essentially outside genes, it might be regarded as recombination scars of little evolutionary importance. However, among the five-genome alignments, 35 to 55% of microdiversity regions lie within ORFs and 16 to 33% of VSs are immediately upstream of ORFs. They should therefore contribute greatly to strain diversity within a species, either by affecting protein domains or by changing gene expression.</p>
            <p>Among the <it>E. coli </it>genes harboring microdiversity, those encoding membrane and surface proteins are significantly enriched in VSs. This is in keeping with the notion that bacteria adapt to their varying and challenging environments by modifying their surface proteins, as already documented <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. A comparative genome analysis detected 23 genes that are under positive selection in <it>E. coli </it><abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The present study identifies six of them (<it>fhuA</it>, <it>ompA</it>, <it>ompC</it>, <it>ompF</it>, <it>lamB </it>and <it>ubiF</it>) as harboring microdiversity. Moreover, for five of the six proteins where the structure is known, the Peterson analysis revealed that all mutations were concentrated on one or a few loops of the protein <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>; this feature allowed us to detect them in our screen, as scattered mutations would have gone undetected. Recently, using a more sensitive approach, 290 core genes of <it>E. coli </it>were detected as under short-term positive selection <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>. However, only four of them (<it>narH, fes</it>, <it>cstA </it>and <it>yphH</it>) corresponded to the 192 genes we report here as harboring microdiversity. Therefore, at least 10 of the 192 genes harboring microdiversity may be under positive selection. Interestingly, microdiversity regions have been found in orthologous proteins compared broadly across bacterial and yeast species and found to be more numerous in essential proteins, which suggests a functional role for these flexible regions <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Illegitimate recombination may explain a large fraction of the VSs</p>
            </st>
            <p>One aim of this study was to elucidate the mechanisms underlying DNA recombination in microbial genomes. To this end, we focused on <it>E. coli</it>, the best studied bacterial species at the molecular level for recombination. More than half of the VS loci could not be explained by site-specific recombination, nor by transposition, nor by the hypothetical mechanism invoked for very short dispersed elements similar to PUs <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> (Table <tblr tid="T2">2</tblr>). We speculate that homologous or illegitimate recombination may explain these loci: in the three species, analysis of the five-genome alignments has shown that 10 to 18% of the VSs are flanked by repeats at least 5 bp long, which might account for part of the variability, especially as a deletion was often found associated with such loci (Table <tblr tid="T4">4</tblr>). However, as most repeats were of a size below 30 bp, the reported threshold for RecA-dependent homologous recombination in <it>E. coli </it><abbrgrp><abbr bid="B64">64</abbr></abbrgrp>, it is likely that VSs are generated by replication slippage between the repeats, a mechanism also called short-homology-dependent illegitimate recombination <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Although not as proportionally abundant as events detected in a previous, more limited study <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, the present analysis implicates short-homology-mediated deletion events as one significant cause of genome variability.</p>
            <p>This conclusion on the importance of illegitimate recombination with regard to the VSs should not lead to the notion that homologous recombination is unimportant in bacterial genomes. Rather, homologous recombination relies on the detection of subtle tracts of 3 to 4% diverged sequences, which are not taken into account in our VS analysis. These sequences are part of the backbone, and studies on backbone DNA detecting blocks of mutations moving together across strains have shown, to the contrary, that homologous recombination plays a great role in bacteria. In <it>E. coli</it>, the average size of these blocks was estimated to be 500 bp in a first study on four genomes <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>, and more recently re-estimated to 50 bp based on a 20-genome comparison <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. It has also been demonstrated that genomic islands, once integrated into a genome (by site-specific recombination most likely), diffuse in a population by homologous recombination between the sequences flanking the island <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
            <p>Dimorphic loci, which contain exactly two different segments at a given site, represent 38 to 68% of all loci in the five-genome alignments (Figure <figr fid="F6">6</figr>), and 22 to 52% of all microdiversity loci in the maximal alignments (Table <tblr tid="T2">2</tblr>). In the case of the <it>E. coli </it>five-genome alignment, we found that in about half the cases, one of the two segments was present in <it>E. fergusonii</it>. This suggests that the ancestral segment was replaced at some point by another segment. A process called 'illegitimate recombination assisted by homology' can produce such a situation <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr></abbrgrp>. If the new incoming DNA segment is flanked by a segment homologous to the recipient chromosome, RecA may initiate homologous recombination on part of the molecule, followed by 'illegitimate' actors that complete the DNA integration at the other extremity (Figure <figr fid="F7">7a</figr>). Such a process has been described in <it>Streptococcus pneumoniae</it>, <it>Acinetobacter baylyi </it>and <it>Pseudomonas stutzeri</it>, three naturally competent species, and was found to be 10<sup>2</sup>- to 10<sup>5</sup>-fold more efficient than strict illegitimate recombination <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr></abbrgrp>. Whether such a process could occur in <it>E. coli</it>, for instance during DNA conjugation, is presently under study. Alternatively, dimorphic (as well as polymorphic) loci may also correspond to fragile sites of the chromosome, which are hot spots of illegitimate recombination.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Possible mechanisms explaining dimorphic and homeologous loci</p>
               </caption>
               <text>
                  <p><b>Possible mechanisms explaining dimorphic and homeologous loci</b>. <b>(a)</b> Dimorphic loci. Incoming DNA (the shorter, black and grey molecule above) may recombine by illegitimate recombination assisted by homology with the resident bacterial chromosome G1. HR, homologous recombination; IR, illegitimate recombination; G1 and G2, genomes 1 and 2; VS, variable segment. <b>(b)</b> Three possible scenarios to explain the origin of microdiversity at homeologous loci in bacterial genomes (see text for details).</p>
               </text>
               <graphic file="gb-2010-11-4-r45-7" hint_layout="double"/>
            </fig>
            <p>Although illegitimate recombination occurs at low frequency, our analysis of VSs suggests that it nevertheless is responsible for a large proportion of the genomic diversity: taking all loci differing from known events for <it>E. coli</it>, and labeled "Other" in Table <tblr tid="T3">3</tblr>, and removing the category of homeologous loci (Figure <figr fid="F6">6</figr>) we estimate that it is responsible for 41% (<it>E. coli </it>five-genome alignment) to 56% (<it>E. coli </it>B2 alignment) of microdiversity loci.</p>
         </sec>
         <sec>
            <st>
               <p>What mechanism generates homeologous DNA microdiversity?</p>
            </st>
            <p>A particular class of loci comprises those containing homeologous sequences. For <it>E. coli</it>, <it>S. aureus </it>and <it>S. pyogenes</it>, they represent 20 to 30% of loci in the five-genome alignments, and even more (20 to 46%) in the maximal genome alignments (Table <tblr tid="T2">2</tblr>). They are less abundant, however, in the alignment of B2 genomes (9%). Interestingly, we found that among the five-genome alignments, homeologous loci were significantly enriched among intragenic loci (50 to 78% of the divergent loci are intragenic). The question arises as to how such blocks of microdiversity could be generated. Three scenarios are considered: positive selection, homeologous recombination and mutation showers (Figure <figr fid="F7">7b</figr>).</p>
            <sec>
               <st>
                  <p>Positive selection</p>
               </st>
               <p>A given protein domain may be under positive selection, so that non-synonymous mutations accumulate in a limited region of the corresponding gene, while conservation of the rest of the protein is selected by physical constraints (for example, membrane-spanning domains), such that non-synonymous mutations are counter-selected. In contrast, synonymous mutations are expected in equal density inside and outside the microdiversity block. However, we did not observe this pattern (synonymous mutations were also enriched in the homeologous loci), and therefore tend to exclude this hypothesis.</p>
            </sec>
            <sec>
               <st>
                  <p>Homeologous recombination between diverged DNA segments</p>
               </st>
               <p>Given our similarity threshold, recombination should have taken place between at least 24% diverged sequences. In <it>E. coli</it>, RecA seems inefficient on 22% diverged sequences <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, and <it>B. subtilis </it>RecA is apparently inhibited by 7% divergence <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. However, phage recombinases may be more efficient on highly diverged DNA <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>. Moreover, it is suspected that, in nature, bacteria alternate between a mutator and non-mutator state, via the inactivation/activation of the <it>mutS </it>or <it>mutL </it>genes, and during the mutator period, homeologous recombination should increase <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Mutation showers</p>
               </st>
               <p>High mutation densities are sometimes observed both in eukaryotes <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> and prokaryotes <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>, and it is suggested that local exposure to a mutagenic agent, or a prolonged period as single-stranded DNA, may result in such mutation showers <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We report here an attempt to systematically examine genome variability at the DNA level in several bacterial species. We have shown that at the species level, the main kind of genomic variability is 'microdiversity'. It consists of small blocks (20 to 500 bp in length) of DNA, often present within or upstream of genes and contributing to genome diversity. This notion raises the question of the mechanisms that may generate such diversity, and opens challenging new questions at the levels of both molecular mechanisms and bacterial evolution.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Genomes</p>
            </st>
            <p>All publicly available complete sequences and annotations were downloaded from the Genome Reviews database <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>. <it>S. aureus </it>genomes: Mu50 [GenBank: <ext-link ext-link-id="BA000017" ext-link-type="gen">BA000017</ext-link>], MW2 [GenBank:<ext-link ext-link-id="BA000033" ext-link-type="gen">BA000033</ext-link>], COL [GenBank:<ext-link ext-link-id="CP000046" ext-link-type="gen">CP000046</ext-link>], RF122 [GenBank:<ext-link ext-link-id="AJ938182" ext-link-type="gen">AJ938182</ext-link>], MRSA252 [GenBank:<ext-link ext-link-id="BX571856" ext-link-type="gen">BX571856</ext-link>], N315 [GenBank:<ext-link ext-link-id="BA000018" ext-link-type="gen">BA000018</ext-link>], JH1 [GenBank:<ext-link ext-link-id="CP000736" ext-link-type="gen">CP000736</ext-link>], MSSA476 [GenBank:<ext-link ext-link-id="BX571857" ext-link-type="gen">BX571857</ext-link>], NCTC8325 [GenBank:<ext-link ext-link-id="CP000253" ext-link-type="gen">CP000253</ext-link>], Newman [GenBank:<ext-link ext-link-id="AP009351" ext-link-type="gen">AP009351</ext-link>], USA300 [GenBank:<ext-link ext-link-id="CP000255" ext-link-type="gen">CP000255</ext-link>]. <it>S. 
pyogenes </it>genomes: M1 GAS, also known as SF370 [GenBank:<ext-link ext-link-id="AE004092" ext-link-type="gen">AE004092</ext-link>], GAS315 [GenBank:<ext-link ext-link-id="NC004070" ext-link-type="gen">NC004070</ext-link>], GAS8232 [GenBank:<ext-link ext-link-id="NC003485" ext-link-type="gen">NC003485</ext-link>], GAS2096 [GenBank:<ext-link ext-link-id="NC008023" ext-link-type="gen">NC008023</ext-link>], GAS10270 [GenBank:<ext-link ext-link-id="NC008022" ext-link-type="gen">NC008022</ext-link>], GAS9429 [GenBank:<ext-link ext-link-id="CP000259" ext-link-type="gen">CP000259</ext-link>], GAS10750 [GenBank:<ext-link ext-link-id="CP000262" ext-link-type="gen">CP000262</ext-link>], NZ131 [GenBank:<ext-link ext-link-id="CP000829" ext-link-type="gen">CP000829</ext-link>], GAS5005 [GenBank:<ext-link ext-link-id="CP000017" ext-link-type="gen">CP000017</ext-link>], GAS6180 [GenBank:<ext-link ext-link-id="CP000056" ext-link-type="gen">CP000056</ext-link>], GAS10394 [GenBank:<ext-link ext-link-id="CP000003" ext-link-type="gen">CP000003</ext-link>], Manfredo [GenBank:<ext-link ext-link-id="AM295007" ext-link-type="gen">AM295007</ext-link>]. <it>E. 
coli </it>genomes: K-12 MG1655 [GenBank:<ext-link ext-link-id="U00096" ext-link-type="gen">U00096</ext-link>], O157:H7 Sakai [GenBank:<ext-link ext-link-id="BA000007" ext-link-type="gen">BA000007</ext-link>], B2 phylogenetic group, strain CFT073 [GenBank:<ext-link ext-link-id="AE014075" ext-link-type="gen">AE014075</ext-link>], B2 group, strain UTI89 [GenBank:<ext-link ext-link-id="CP000243" ext-link-type="gen">CP000243</ext-link>], B2 group, strain APECO1 [GenBank:<ext-link ext-link-id="CP000468" ext-link-type="gen">CP000468</ext-link>], B2 phylogenetic group, strain 536 [GenBank:<ext-link ext-link-id="CP000247" ext-link-type="gen">CP000247</ext-link>], B2 phylogenetic group, strain S88 [GenBank:<ext-link ext-link-id="CU928161" ext-link-type="gen">CU928161</ext-link>], W3110 [GenBank:<ext-link ext-link-id="AP009048" ext-link-type="gen">AP009048</ext-link>], DH10B [GenBank:<ext-link ext-link-id="CP000948" ext-link-type="gen">CP000948</ext-link>], BW2952 [GenBank:<ext-link ext-link-id="CP001396" ext-link-type="gen">CP001396</ext-link>], REL606 [GenBank:<ext-link ext-link-id="CP000819" ext-link-type="gen">CP000819</ext-link>], BL21 [GenBank:<ext-link ext-link-id="AM946981" ext-link-type="gen">AM946981</ext-link>], HS [GenBank:<ext-link ext-link-id="CP000802" ext-link-type="gen">CP000802</ext-link>], Crooks [GenBank:<ext-link ext-link-id="CP000946" ext-link-type="gen">CP000946</ext-link>], 55989 [GenBank:<ext-link ext-link-id="CU928145" ext-link-type="gen">CU928145</ext-link>], E24377A [GenBank:<ext-link ext-link-id="CP000800" ext-link-type="gen">CP000800</ext-link>], SE11 [GenBank:<ext-link ext-link-id="AP009240" ext-link-type="gen">AP009240</ext-link>], EDL933 [GenBank:<ext-link ext-link-id="AE005174" ext-link-type="gen">AE005174</ext-link>], TW14359 [GenBank:<ext-link ext-link-id="CP001368" ext-link-type="gen">CP001368</ext-link>], 4115 [GenBank:<ext-link ext-link-id="CP001164" ext-link-type="gen">CP001164</ext-link>], SMS3-5, named SECEC here [GenBank:<ext-link 
ext-link-id="CP000970" ext-link-type="gen">CP000970</ext-link>], IAI39 [GenBank:<ext-link ext-link-id="CU928164" ext-link-type="gen">CU928164</ext-link>], B2 phylogenetic group, E2348-69 [GenBank:<ext-link ext-link-id="FM180568" ext-link-type="gen">FM180568</ext-link>]. All <it>E. coli </it>genome annotations were downloaded from the Genoscope ColiScope project <abbrgrp><abbr bid="B77">77</abbr></abbrgrp> and homogenized using the MaGe annotation platform <abbrgrp><abbr bid="B78">78</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Alignment strategies</p>
            </st>
            <p>A first set of alignments, involving a few collinear genomes, was computed using the MGA software <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Genomes were selected so as to be representative of the species under study. To this end, a genomic distance based on maximal unique matches (MUM) was calculated for all possible genome pairs <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, and neighbor-joining trees were built to guide the choice of genomes. When several closely related genomes were available, the second criterion used was genome collinearity, as determined by Mummer plots <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. MGA alignment parameters were fine-tuned as described previously <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Briefly, in a first step, anchors composed of maximal exact matches at least 50 bp long and common to all genomes were detected. A subset of collinear anchors was then selected by a chaining algorithm. Next, these two steps were repeated in each interval framed by the chosen anchors, using a lower minimal length of 20 bp for the maximal exact matches. The remaining gaps of the alignment, if shorter than 3,000 bp, were aligned with ClustalW.</p>
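            <p>As an illustration of the chaining step only, the following Python sketch selects a largest collinear subset of anchors shared by two genomes, using a longest-increasing-subsequence search. It is not the MGA implementation, which chains anchors common to all genomes at once; the Anchor structure and the chain_collinear function are hypothetical names introduced here, and anchors are assumed not to overlap.</p>

```python
from bisect import bisect_left
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """Exact match shared by two genomes (hypothetical structure)."""
    pos_a: int   # start position on genome A
    pos_b: int   # start position on genome B
    length: int  # match length in bp


def chain_collinear(anchors: List[Anchor], min_len: int = 50) -> List[Anchor]:
    """Return a largest subset of anchors that occur in the same order on both genomes."""
    # Keep anchors of sufficient length, ordered along genome A.
    # Ties on pos_a are sorted by decreasing pos_b so that only one of them can be chained.
    kept = sorted((a for a in anchors if a.length >= min_len),
                  key=lambda a: (a.pos_a, -a.pos_b))
    tails: List[int] = []     # smallest pos_b ending a chain of each size
    tail_idx: List[int] = []  # index in `kept` of the anchor ending that chain
    prev = [-1] * len(kept)   # back-pointers used to rebuild the chain
    for i, anchor in enumerate(kept):
        k = bisect_left(tails, anchor.pos_b)
        if k == len(tails):
            tails.append(anchor.pos_b)
            tail_idx.append(i)
        else:
            tails[k] = anchor.pos_b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    chain: List[Anchor] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(kept[i])
        i = prev[i]
    return chain[::-1]
```

            <p>In this sketch, the recursive refinement described above would correspond to calling the same function again, with min_len set to 20, on the anchors found inside each interval between two consecutive chained anchors.</p>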
            <p>MGA alignment outputs are stored in the MOSAIC database after a post-treatment step on the raw ClustalW results. This step is needed to distinguish, among the ClustalW output files, those in which the alignment reflects common ancestry from those in which different pieces of DNA are forced into an alignment. As described earlier <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, post-treatment parameters were chosen so as to classify as VSs all segments of a given locus if at least two of them share less than 76% identity over 100% of the aligned length, or if a gap larger than 20 bp is found in the alignment. This allowed a high sensitivity with respect to VS size, but also some flexibility with respect to overall DNA divergence. The choice of the 76% threshold for DNA identity is described in Additional file <supplr sid="S2">2</supplr>. The 20-bp gap size was chosen as corresponding, at the protein level, to a small secondary structure of at least six amino acids. The minimal VS size was set to 20 bp. We compared the results obtained when the minimal VS size was increased from 20 to 42 bp for a three-strain <it>E. coli </it>and a six-strain <it>S. aureus </it>alignment (alignments computed in the preparatory phase of this analysis). This resulted in a 26% decrease in the global number of loci, indicating that a substantial proportion of VSs belong to microdiversity loci. We therefore maintained the minimal VS size at 20 bp, so as to remain sensitive to the microdiversity loci that may contribute to strain diversity.</p>
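            <p>The classification rule can be sketched as follows in Python. This is a minimal illustration rather than the MOSAIC post-treatment code: the function names are hypothetical, the aligned sequences of a locus are assumed to be given as equal-length strings with '-' for gaps, and identity is assumed to be the fraction of alignment columns in which both sequences carry the same base.</p>

```python
import re
from itertools import combinations
from typing import Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences carry the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def longest_gap(seq: str) -> int:
    """Length of the longest run of '-' in one aligned sequence."""
    return max((len(run) for run in re.findall(r"-+", seq)), default=0)


def locus_is_variable(aligned: Sequence[str],
                      min_identity: float = 0.76,
                      max_gap: int = 20) -> bool:
    """True if the segments of this locus should be classified as VSs."""
    # Criterion 1: at least two segments share less than 76% identity.
    low_identity = any(min_identity > pairwise_identity(a, b)
                       for a, b in combinations(aligned, 2))
    # Criterion 2: a gap larger than 20 bp (strictly larger, following the text).
    long_gap = any(longest_gap(s) > max_gap for s in aligned)
    return low_identity or long_gap
```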
            <p>A second set of alignments was computed so as to include a maximal number of genomes for the <it>E. coli</it>, <it>S. aureus</it>, and <it>S. pyogenes </it>species, using MAUVE version 1.2.3 for <it>S. aureus </it>and <it>S. pyogenes </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, and progressive MAUVE version 2.1.3 for <it>E. coli</it>, instead of MGA for the first step. The same MOSAIC post-treatment step as described above was then applied <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Compared to MGA, the MAUVE software offers the advantages of handling large rearrangements and of treating large numbers of genomes. This comes, however, at the price of slightly less precise backbone/VS boundaries, as we observed when comparing output from MGA versus MAUVE version 1.2.3 for an <it>E. coli </it>MG1655-Sakai alignment. Analyses requiring precise VS boundaries, such as repeat detection and positions of VSs relative to genes, were thus restricted to the MGA alignments. The phylogenetic trees corresponding to the strains used for the alignments are shown in Additional file <supplr sid="S1">1</supplr>.</p>
         </sec>
         <sec>
            <st>
               <p>Collection of additional annotations for the <it>E. coli </it>genomes</p>
            </st>
            <sec>
               <st>
                  <p>Bacteriophages</p>
               </st>
               <p>Phage coordinates of strains MG1655 and Sakai were downloaded from the Sakai genome project web page <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>. For the CFT073, UTI89 and 536 genomes, the Prophinder tool <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> was used through its web interface <abbrgrp><abbr bid="B81">81</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>CRISPR sequences</p>
               </st>
               <p>Positions of the CRISPR sequences were retrieved from the CRISPR database of G Vergnaud's laboratory <abbrgrp><abbr bid="B82">82</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Genomic islands</p>
               </st>
               <p>Ou <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> described a systematic means to detect genomic islands. Coordinates for the MG1655, Sakai and CFT073 genomes were downloaded from their supplementary data. For the other genomes, an approach similar to that of Ou <it>et al</it>., based on synteny break points, was used. Briefly, blocks of genes at least 5 kb long and not following the local synteny were analyzed for exceptional GC content or interpolated variable order motif (IVOM) value <abbrgrp><abbr bid="B83">83</abbr></abbrgrp>, presence of flanking tRNA genes, and presence of integrase-like genes. All blocks meeting at least one of the criteria were considered as regions of genomic plasticity, a denomination that does not make any assumption about the evolutionary origin or genetic basis of these variable chromosomal segments. The regions corresponding to bacteriophages and CRISPRs as defined above were then removed, and counted separately.</p>
            </sec>
            <sec>
               <st>
                  <p>Insertion sequences</p>
               </st>
               <p>For all genomes but S88, UMN026 and IAI1, IS coordinates were taken from the ASAP site <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>. For the three remaining genomes, ISs were detected by the presence of transposase genes.</p>
            </sec>
            <sec>
               <st>
                  <p>Palindromic units/repetitive sequence elements</p>
               </st>
               <p>PUs, also called repetitive sequences, have been described for <it>E. coli </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Their coordinates on MG1655 were calculated starting from the Bachelier web page <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>, and converted to match the current version of the MG1655 genome. Detection of putative PUs on the other <it>E. coli </it>genomes was performed as follows. PUs being palindromic, half a PU was searched for using fuzznuc (EMBOSS package), with the pattern '[ACG][AT][TC]GCC[GT]GATGCGN(3,9)CG[CT](0,1)CTTATC[CA]GGCCTAC[AG]', allowing for a maximum of four mismatches. PUs are often associated in pairs, which form bacterial interspersed mosaic elements. PUs separated by less than 100 bp were therefore merged into a single mosaic element. Applying this pattern to the complete MG1655 genome allowed detection of 80% of the 266 PUs or mosaics described in <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>.</p>
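               <p>The merging of neighbouring PU hits into mosaic elements can be sketched as below. This is an illustrative Python function, not the script used in this study: the name merge_into_mosaics is hypothetical, and hit coordinates are assumed to be 0-based with an exclusive end, so that the separation between two hits is the start of the second minus the end of the first.</p>

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive (assumed convention)


def merge_into_mosaics(hits: List[Interval], max_sep: int = 100) -> List[Interval]:
    """Merge hits separated by less than max_sep bp into single mosaic elements."""
    mosaics: List[Interval] = []
    for start, end in sorted(hits):
        # Separation between this hit and the end of the current mosaic.
        if mosaics and max_sep > start - mosaics[-1][1]:
            prev_start, prev_end = mosaics[-1]
            mosaics[-1] = (prev_start, max(prev_end, end))
        else:
            mosaics.append((start, end))
    return mosaics
```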
            </sec>
            <sec>
               <st>
                  <p>Minisatellites</p>
               </st>
               <p>Genomes were searched for tandemly repeated sequences using the minisatellite database of G Vergnaud's laboratory <abbrgrp><abbr bid="B86">86</abbr></abbrgrp>. The search parameters were: repeat motifs at least 20 bp long, repeated at least twice, with at least 90% identity between repeats. Among the minisatellites, a majority corresponded to PU elements that were scored separately (see above), so that only the remaining, non-PU minisatellites were reported in this category.</p>
            </sec>
            <sec>
               <st>
                  <p>Source of other <it>E. coli </it>variable segments</p>
               </st>
               <p>For all <it>E. coli </it>VSs that did not correspond to the above-mentioned annotations, their content was assessed using Blast against the EMBL Non-Redundant database, and a result was considered positive if at least 90% identity over at least 90% of the length was obtained. Results were sorted into the following categories: DNA segment present in at least one other <it>E. coli </it>strain (except very close kin such as EDL933, which is clonally related to the Sakai strain, or W3110, related to MG1655); DNA segment present in another bacterial species or a non-cultivable sample; no match in the Non-Redundant database.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Variable segment analysis</p>
            </st>
            <sec>
               <st>
                  <p>Data preparation</p>
               </st>
               <p>Coordinates of the VSs for all four alignments were downloaded from the MOSAIC web site <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The VSs were analyzed with a script written in Python, in which the central object was the 'locus' class, composed of all VSs belonging to the same locus. The boundaries of some VSs generated by the aligner were inexact, in the sense that the DNA content of the boundary (usually not more than 20 to 100 bp) was more than 90% identical in all VSs. A pre-treatment of the VS arrays was therefore performed to trim such boundaries (and sometimes remove a VS when its size shrank below 20 bp). As a result, some of the VSs described in the MOSAIC interface are slightly larger than those considered in this study.</p>
            </sec>
            <sec>
               <st>
                  <p>Inspection of variable segment boundaries relative to backbone genes</p>
               </st>
               <p>For all VSs, a right and a left neighboring gene on the backbone were assigned (the neighbor gene either overlapped the VS or was the first gene next to it). The position of all VSs of a given locus, relative to these genes, was then analyzed. If all VSs were inside genes, meaning that the ORF of the genes in all genomes was not interrupted by any of the VSs of the locus, the locus was labeled intragenic. If all VSs were between two genes that did not overlap with the VS boundaries, the locus was labeled intergenic. A flanking gene on the backbone was considered as missing if, among all VSs of the locus, the distance between one VS boundary and the distal extremity of its neighboring gene varied by more than 500 bp (that is, the approximate size of a small gene). When the flanking gene overlapped with a VS boundary, the gene portion lying inside the VS was compared among all VSs: if this portion varied by more than 50 bp (approximately 16 amino acids), the locus was considered to modify the length of the flanking gene. If the neighbor genes overlapped the VS by less than these 50 bp, the overlap was considered negligible and the locus was considered as intergenic.</p>
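               <p>A simplified Python sketch of the intragenic/intergenic labeling is given below. It covers only the containment test and the 50-bp overlap tolerance; the checks on ORF interruption, missing flanking genes and gene-length variation are not reproduced. The function names and the 'gene-overlapping' and 'mixed' labels are hypothetical, and coordinates are assumed to be 0-based with an exclusive end.</p>

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive (assumed convention)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def label_vs(vs: Interval, left: Interval, right: Interval, tolerance: int = 50) -> str:
    """Label one VS relative to its left and right neighboring genes."""
    for gene in (left, right):
        if vs[0] >= gene[0] and gene[1] >= vs[1]:
            return "intragenic"    # VS lies entirely within one gene
    if tolerance > overlap_bp(vs, left) and tolerance > overlap_bp(vs, right):
        return "intergenic"        # overlaps below 50 bp are treated as negligible
    return "gene-overlapping"      # hypothetical label for the remaining cases


def label_locus(entries: List[Tuple[Interval, Interval, Interval]]) -> str:
    """A locus takes a label only if all its VSs agree; otherwise it is 'mixed'."""
    labels = {label_vs(vs, left, right) for vs, left, right in entries}
    return labels.pop() if len(labels) == 1 else "mixed"
```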
            </sec>
            <sec>
               <st>
                  <p>Detection of repeats flanking variable segments</p>
               </st>
               <p>For all VSs, a DNA fragment encompassing the VS and 500 bp flanking each side was extracted. Repeat detection was done with the Vmatch software <abbrgrp><abbr bid="B87">87</abbr></abbrgrp>, using a three-step procedure. First, VS boundaries were scanned for the presence of repeats of length = 11 bp, allowing 10% divergence between the repeats, and a misplacement of the repeat of 10 bp around the position of the VS boundary. If no repeat was found, a second search for repeats of length >10 bp with a Hamming distance of 1 was carried out. If both searches failed, a final scan was done for exact repeats &#8805; 5 bp (this value was chosen based on an example of a known, accurate deletion of genes <it>yafN </it>and <it>yafO </it>that occurred between a 5-bp repeat in the CFT073 strain of <it>E. coli</it>), allowing no misplacement relative to the VS boundary (otherwise, the probability of finding such repeats at random is too high). This last step was found to double the number of VSs flanked by repeats.</p>
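               <p>The final scan, for exact repeats positioned precisely at the VS boundaries, can be illustrated in plain Python as below. This is not the Vmatch procedure: the function names are hypothetical, and "no misplacement" is interpreted here as a direct repeat whose two copies either both start at a VS boundary or both end at one.</p>

```python
def boundary_repeat_length(seq: str, vs_start: int, vs_end: int) -> int:
    """Longest exact direct repeat anchored at both boundaries of a VS.

    seq[vs_start:vs_end] is the VS (0-based, end exclusive); the rest of
    seq is its flanking DNA.
    """
    span = vs_end - vs_start
    # Case 1: one copy starts the VS, the other starts the right flank.
    fwd = 0
    while (span > fwd and len(seq) > vs_end + fwd
           and seq[vs_start + fwd] == seq[vs_end + fwd]):
        fwd += 1
    # Case 2: one copy ends the left flank, the other ends the VS.
    back = 0
    while (span > back and vs_start - back > 0
           and seq[vs_start - 1 - back] == seq[vs_end - 1 - back]):
        back += 1
    return max(fwd, back)


def flanked_by_exact_repeat(seq: str, vs_start: int, vs_end: int, min_len: int = 5) -> bool:
    """True if an exact repeat of at least min_len bp frames the VS."""
    return boundary_repeat_length(seq, vs_start, vs_end) >= min_len
```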
            </sec>
            <sec>
               <st>
                  <p>Detection of variable segments with similar DNA content</p>
               </st>
               <p>To determine which VSs of a locus had similar content, pairwise alignments of VSs having similar lengths (&#177; 10%) were performed using 'stretcher' (EMBOSS suite). Two VSs were considered to have similar content if more than 76% identity was found over at least 90% of the length of the smaller VS. A final step checked that all relationships within the locus were transitive.</p>
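               <p>The transitivity check can be sketched as follows, assuming the pairwise stretcher results have already been reduced to a set of unordered pairs of VS names judged similar. The function name and the data layout are hypothetical and serve only to illustrate the check.</p>

```python
from itertools import permutations
from typing import FrozenSet, Iterable, List, Set, Tuple


def non_transitive_triples(names: Iterable[str],
                           similar: Set[FrozenSet[str]]) -> List[Tuple[str, str, str]]:
    """Return triples (a, b, c) where a~b and b~c hold but a~c does not."""
    def sim(x: str, y: str) -> bool:
        return frozenset((x, y)) in similar

    bad: List[Tuple[str, str, str]] = []
    for a, b, c in permutations(list(names), 3):
        if a > c:
            continue  # report each violation once, whatever the order of a and c
        if sim(a, b) and sim(b, c) and not sim(a, c):
            bad.append((a, b, c))
    return bad
```

               <p>An empty result means that the similarity relation within the locus is transitive, so that the VSs can be partitioned unambiguously into groups of similar content.</p>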
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>bp: base pair; CRISPR: clustered regularly interspaced short palindromic repeat; IS: insertion sequence; ORF: open reading frame; PU: palindromic unit; VS: variable segment.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>FT performed the analysis of maximal genome alignments. MAP conceived the work, performed it, and wrote the manuscript. ED, CM and VB were responsible for the complete sequencing and annotation of seven strains of <it>Escherichia </it>(ColiScope project), and made their data available prior to publication. MEK contributed to the work, and ED and MEK contributed to the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are thankful to Alexandra Gruss and Ivan Matic for helpful comments on the manuscript. We thank H&#233;l&#232;ne Chiapello, Annie Gendrault and Philippe Palcy for running MGA and MAUVE alignments and integrating them into the MOSAIC database <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, as well as the Migale bioinformatics platform for providing computational resources and technical assistance. The research was funded by the French 'Agence Nationale de la Recherche' project CoCoGen BLAN07-1_185484.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Mauve: multiple alignment of conserved genomic sequence with rearrangements.</p>
            </title>
            <aug>
               <au>
                  <snm>Darling</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1394</fpage>
            <lpage>1403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.2289704</pubid>
                  <pubid idtype="pmcid">442156</pubid>
                  <pubid idtype="pmpid" link="fulltext">15231754</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Efficient multiple genome alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Hohl</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ohlebusch</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>Suppl 1</issue>
            <fpage>S312</fpage>
            <lpage>S320</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169561</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species.</p>
            </title>
            <aug>
               <au>
                  <snm>Treangen</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Messeguer</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>433</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-433</pubid>
                  <pubid idtype="pmcid">1629028</pubid>
                  <pubid idtype="pmpid" link="fulltext">17022809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops.</p>
            </title>
            <aug>
               <au>
                  <snm>Chiapello</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bourgait</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Sourivong</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Heuclin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gendrault-Jacquemard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Petit</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>El Karoui</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>171</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-6-171</pubid>
                  <pubid idtype="pmcid">1187871</pubid>
                  <pubid idtype="pmpid" link="fulltext">16011797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.</p>
            </title>
            <aug>
               <au>
                  <snm>Chiapello</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gendrault</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Caron</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Blum</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Petit</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>El Karoui</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>498</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-9-498</pubid>
                  <pubid idtype="pmcid">2607288</pubid>
                  <pubid idtype="pmpid" link="fulltext">19038022</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Complete genome sequence of enterohemorrhagic <it>Escherichia coli </it>O157:H7 and genomic comparison with a laboratory strain K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Hayashi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Makino</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ohnishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kurokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yokoyama</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Ohtsubo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nakayama</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Murata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tobe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Iida</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Takami</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Honda</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sasakawa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ogasawara</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yasunaga</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kuhara</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shinagawa</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>11</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/8.1.11</pubid>
                  <pubid idtype="pmpid" link="fulltext">11258796</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Resolving the structural features of genomic islands: a machine learning approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Vernikos</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>331</fpage>
            <lpage>342</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.7004508</pubid>
                  <pubid idtype="pmcid">2203631</pubid>
                  <pubid idtype="pmpid" link="fulltext">18071028</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Insights on evolution of virulence and resistance from the complete genome analysis of an early methicillin-resistant <it>Staphylococcus aureus </it>strain and a biofilm-producing methicillin-resistant <it>Staphylococcus epidermidis </it>strain.</p>
            </title>
            <aug>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Archer</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Mongodin</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Deboy</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Kolonay</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Brinkac</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Beanan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Daugherty</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Madupu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Angiuoli</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Vamathevan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Khouri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dimitrov</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Qin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Weidman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hance</snm>
                  <fnm>IR</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2005</pubdate>
            <volume>187</volume>
            <fpage>2426</fpage>
            <lpage>2438</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.187.7.2426-2438.2005</pubid>
                  <pubid idtype="pmcid">1065214</pubid>
                  <pubid idtype="pmpid" link="fulltext">15774886</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Role of intraspecies recombination in the spread of pathogenicity islands within the <it>Escherichia coli </it>species.</p>
            </title>
            <aug>
               <au>
                  <snm>Schubert</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Darlu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Clermont</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wieser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Magistro</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hoffmann</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Weinert</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tenaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Matic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Denamur</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>PLoS Pathog</source>
            <pubdate>2009</pubdate>
            <volume>5</volume>
            <fpage>e1000257</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.ppat.1000257</pubid>
                  <pubid idtype="pmcid">2606025</pubid>
                  <pubid idtype="pmpid" link="fulltext">19132082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Phage-mediated intergeneric transfer of toxin genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Novick</snm>
                  <fnm>RP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2009</pubdate>
            <volume>323</volume>
            <fpage>139</fpage>
            <lpage>141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1164783</pubid>
                  <pubid idtype="pmpid" link="fulltext">19119236</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p><it>Staphylococcus aureus </it>pathogenicity island DNA is packaged in particles composed of phage proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Tormo</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Ferrer</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Maiques</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ubeda</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Selva</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lasa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Calvete</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Novick</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Penades</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2008</pubdate>
            <volume>190</volume>
            <fpage>2434</fpage>
            <lpage>2440</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.01349-07</pubid>
                  <pubid idtype="pmcid">2293202</pubid>
                  <pubid idtype="pmpid" link="fulltext">18223072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Shaping a bacterial genome by large chromosomal replacements, the evolutionary history of <it>Streptococcus agalactiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Brochet</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rusniok</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Couve</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dramsi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Poyart</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Trieu-Cuot</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kunst</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Glaser</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2008</pubdate>
            <volume>105</volume>
            <fpage>15961</fpage>
            <lpage>15966</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0803654105</pubid>
                  <pubid idtype="pmcid">2572952</pubid>
                  <pubid idtype="pmpid" link="fulltext">18832470</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Conjugative transposons: the tip of the iceberg.</p>
            </title>
            <aug>
               <au>
                  <snm>Burrus</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pavlovic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Decaris</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Guedon</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>46</volume>
            <fpage>601</fpage>
            <lpage>610</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2002.03191.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12410819</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>IslandPath: aiding detection of genomic islands in prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>418</fpage>
            <lpage>420</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg004</pubid>
                  <pubid idtype="pmpid" link="fulltext">12584130</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities.</p>
            </title>
            <aug>
               <au>
                  <snm>Mantri</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>KP</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D55</fpage>
            <lpage>58</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkh059</pubid>
                  <pubid idtype="pmcid">308793</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A novel strategy for the identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites in closely related bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Ou</snm>
                  <fnm>HY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Lonnen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Thani</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Garton</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Hinton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pallen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barer</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Rajakumar</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>e3</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gnj005</pubid>
                  <pubid idtype="pmcid">1326021</pubid>
                  <pubid idtype="pmpid" link="fulltext">16414954</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Prophage Finder: a prophage loci prediction tool for prokaryotic genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Bose</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barber</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>223</fpage>
            <lpage>227</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16922685</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>5839</fpage>
            <lpage>5851</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl732</pubid>
                  <pubid idtype="pmcid">1635311</pubid>
                  <pubid idtype="pmpid" link="fulltext">17062630</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Prophinder: a computational tool for prophage prediction in prokaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lima-Mendez</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Toussaint</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Leplae</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>863</fpage>
            <lpage>865</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn043</pubid>
                  <pubid idtype="pmpid" link="fulltext">18238785</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>CRISPR provides acquired resistance against viruses in prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Barrangou</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fremaux</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Deveau</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Boyaval</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Moineau</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Horvath</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2007</pubdate>
            <volume>315</volume>
            <fpage>1709</fpage>
            <lpage>1712</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1138140</pubid>
                  <pubid idtype="pmpid" link="fulltext">17379808</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Marraffini</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Sontheimer</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2008</pubdate>
            <volume>322</volume>
            <fpage>1843</fpage>
            <lpage>1845</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1165771</pubid>
                  <pubid idtype="pmcid">2695655</pubid>
                  <pubid idtype="pmpid" link="fulltext">19095942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats.</p>
            </title>
            <aug>
               <au>
                  <snm>Bland</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ramsey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Sabree</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Hugenholtz</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>209</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-209</pubid>
                  <pubid idtype="pmcid">1924867</pubid>
                  <pubid idtype="pmpid" link="fulltext">17577412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats.</p>
            </title>
            <aug>
               <au>
                  <snm>Grissa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vergnaud</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pourcel</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>172</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-172</pubid>
                  <pubid idtype="pmcid">1892036</pubid>
                  <pubid idtype="pmpid" link="fulltext">17521438</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>ISfinder: the reference centre for bacterial insertion sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Siguier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Perochon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lestrade</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mahillon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chandler</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D32</fpage>
            <lpage>36</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkj014</pubid>
                  <pubid idtype="pmcid">1347377</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381877</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A survey of bacterial insertion sequences using IScan.</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bichsel</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>5284</fpage>
            <lpage>5293</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm597</pubid>
                  <pubid idtype="pmcid">2018620</pubid>
                  <pubid idtype="pmpid" link="fulltext">17686783</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Insertion Sequences show diverse recent activities in Cyanobacteria and Archaea.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Olman</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>36</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-9-36</pubid>
                  <pubid idtype="pmcid">2246112</pubid>
                  <pubid idtype="pmpid" link="fulltext">18218090</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>VNTRDB: a bacterial variable number tandem repeat locus database.</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>YC</fnm>
               </au>
               <au>
                  <snm>Underwood</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chiou</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>CY</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D416</fpage>
            <lpage>421</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl872</pubid>
                  <pubid idtype="pmcid">1781188</pubid>
                  <pubid idtype="pmpid" link="fulltext">17175529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource.</p>
            </title>
            <aug>
               <au>
                  <snm>Denoeud</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Vergnaud</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>4</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-5-4</pubid>
                  <pubid idtype="pmcid">331396</pubid>
                  <pubid idtype="pmpid" link="fulltext">14715089</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Very small mobile repeated elements in cyanobacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Elhai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cousins</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lindblad</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Costa</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <fpage>1484</fpage>
            <lpage>1499</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.074336.107</pubid>
                  <pubid idtype="pmcid">2527708</pubid>
                  <pubid idtype="pmpid" link="fulltext">18599681</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The BIME family of bacterial highly repetitive sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Gilson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Saurin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Perrin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bachellier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofnung</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Res Microbiol</source>
            <pubdate>1991</pubdate>
            <volume>142</volume>
            <fpage>217</fpage>
            <lpage>222</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0923-2508(91)90033-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">1656494</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Macrodomain organization of the <it>Escherichia coli </it>chromosome.</p>
            </title>
            <aug>
               <au>
                  <snm>Valens</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Penaud</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rossignol</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cornet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Boccard</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <fpage>4330</fpage>
            <lpage>4341</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.emboj.7600434</pubid>
                  <pubid idtype="pmcid">524398</pubid>
                  <pubid idtype="pmpid" link="fulltext">15470498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Tandem genetic duplications in phage and bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Anderson</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>1977</pubdate>
            <volume>31</volume>
            <fpage>473</fpage>
            <lpage>505</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.mi.31.100177.002353</pubid>
                  <pubid idtype="pmpid" link="fulltext">334045</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A sister-strand exchange mechanism for recA-independent deletion of repeated DNA sequences in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Lovett</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Drapkin</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Sutera</snm>
                  <fnm>VA</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Gluckman-Peskind</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1993</pubdate>
            <volume>135</volume>
            <fpage>631</fpage>
            <lpage>642</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1205708</pubid>
                  <pubid idtype="pmpid" link="fulltext">8293969</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Replication mutations differentially enhance RecA-dependent and RecA-independent recombination between tandem repeats in <it>Bacillus subtilis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bruand</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bidnenko</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>39</volume>
            <fpage>1248</fpage>
            <lpage>1258</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2001.02312.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11251841</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Unveiling novel RecO distant orthologues involved in homologous recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>Marsin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mathieu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kortulewski</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Guerois</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Radicella</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000146</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pgen.1000146</pubid>
                  <pubid idtype="pmcid">2475510</pubid>
                  <pubid idtype="pmpid" link="fulltext">18670631</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Conjugational recombination in <it>E. coli</it>: myths and mechanisms.</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>GR</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1991</pubdate>
            <volume>64</volume>
            <fpage>19</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(91)90205-D</pubid>
                  <pubid idtype="pmpid" link="fulltext">1986865</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Adaptation to the environment: <it>Streptococcus pneumoniae</it>, a paradigm for recombination-mediated genetic plasticity?</p>
            </title>
            <aug>
               <au>
                  <snm>Claverys</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Prudhomme</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mortier-Barriere</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>35</volume>
            <fpage>251</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2000.01718.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10652087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Frequency of deletion formation decreases exponentially with distance between short direct repeats.</p>
            </title>
            <aug>
               <au>
                  <snm>Chedin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Dervyn</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dervyn</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Noirot</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1994</pubdate>
            <volume>12</volume>
            <fpage>561</fpage>
            <lpage>569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.1994.tb01042.x</pubid>
                  <pubid idtype="pmpid">7934879</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A novel assay for illegitimate recombination in <it>Escherichia coli</it>: stimulation of lambda bio transducing phage formation by ultra-violet light and its independence from RecA function.</p>
            </title>
            <aug>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shimizu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ukita</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kumagai</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Adv Biophys</source>
            <pubdate>1995</pubdate>
            <volume>31</volume>
            <fpage>197</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0065-227X(95)99392-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">7625274</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Copy-choice recombination mediated by DNA polymerase III holoenzyme from <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Canceill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1996</pubdate>
            <volume>93</volume>
            <fpage>6647</fpage>
            <lpage>6652</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.93.13.6647</pubid>
                  <pubid idtype="pmcid">39080</pubid>
                  <pubid idtype="pmpid">8692872</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Replication slippage of different DNA polymerases is inversely related to their strand displacement efficiency.</p>
            </title>
            <aug>
               <au>
                  <snm>Canceill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Viguera</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1999</pubdate>
            <volume>274</volume>
            <fpage>27481</fpage>
            <lpage>27490</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.274.39.27481</pubid>
                  <pubid idtype="pmpid" link="fulltext">10488082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Replication slippage involves DNA polymerase pausing and dissociation.</p>
            </title>
            <aug>
               <au>
                  <snm>Viguera</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Canceill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2001</pubdate>
            <volume>20</volume>
            <fpage>2587</fpage>
            <lpage>2595</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/emboj/20.10.2587</pubid>
                  <pubid idtype="pmcid">125466</pubid>
                  <pubid idtype="pmpid" link="fulltext">11350948</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>DNA transcription and repressor binding affect deletion formation in <it>Escherichia coli </it>plasmids.</p>
            </title>
            <aug>
               <au>
                  <snm>Vilette</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Uzest</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Michel</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1992</pubdate>
            <volume>11</volume>
            <fpage>3629</fpage>
            <lpage>3634</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">556822</pubid>
                  <pubid idtype="pmpid">1396563</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Illegitimate recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
            </aug>
            <source>Mobile DNA</source>
            <publisher>Washington, DC: American Society for Microbiology</publisher>
            <editor>Berg D, Howe M</editor>
            <pubdate>1989</pubdate>
            <fpage>799</fpage>
            <lpage>832</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Illegitimate recombination in bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Michel</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Organization of the Prokaryotic Genome</source>
            <publisher>Washington, DC: ASM Press</publisher>
            <editor>Charlebois RL</editor>
            <pubdate>1999</pubdate>
            <fpage>129</fpage>
            <lpage>150</lpage>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Mycobacterial Ku and ligase proteins constitute a two-component NHEJ repair machine.</p>
            </title>
            <aug>
               <au>
                  <snm>Della</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Palmbos</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Tonkin</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Daley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Topper</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Pitcher</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Tomkinson</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>TE</fnm>
               </au>
               <au>
                  <snm>Doherty</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>683</fpage>
            <lpage>685</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1099824</pubid>
                  <pubid idtype="pmpid" link="fulltext">15499016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Identification of a DNA nonhomologous end-joining complex in bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Weller</snm>
                  <fnm>GR</fnm>
               </au>
               <au>
                  <snm>Kysela</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tonkin</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Scanlan</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Della</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Devine</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>d'Adda di Fagagna</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Devine</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Bowater</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Jeggo</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Doherty</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>1686</fpage>
            <lpage>1689</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1074584</pubid>
                  <pubid idtype="pmpid" link="fulltext">12215643</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Role of DNA repair by nonhomologous-end joining in <it>Bacillus subtilis </it>spore resistance to extreme dryness, mono- and polychromatic UV, and ionizing radiation.</p>
            </title>
            <aug>
               <au>
                  <snm>Moeller</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stackebrandt</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Reitz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Berger</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rettberg</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Doherty</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Horneck</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Nicholson</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2007</pubdate>
            <volume>189</volume>
            <fpage>3306</fpage>
            <lpage>3311</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.00018-07</pubid>
                  <pubid idtype="pmcid">1855867</pubid>
                  <pubid idtype="pmpid" link="fulltext">17293412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Nonhomologous end-joining in bacteria: a microbial perspective.</p>
            </title>
            <aug>
               <au>
                  <snm>Pitcher</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Brissett</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Doherty</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>61</volume>
            <fpage>259</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.micro.61.080706.093354</pubid>
                  <pubid idtype="pmpid" link="fulltext">17506672</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Evolution of paralogous genes: Reconstruction of genome rearrangements through comparison of multiple genomes within <it>Staphylococcus aureus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Tsuru</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mizutani-Ui</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Uchiyama</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kobayashi</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2006</pubdate>
            <volume>23</volume>
            <fpage>1269</fpage>
            <lpage>1285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msk013</pubid>
                  <pubid idtype="pmpid" link="fulltext">16601000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Extraintestinal virulence is a coincidental by-product of commensalism in B2 phylogenetic group <it>Escherichia coli </it>strains.</p>
            </title>
            <aug>
               <au>
                  <snm>Le Gall</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Clermont</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gouriou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Picard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Nassif</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Denamur</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tenaillon</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <fpage>2373</fpage>
            <lpage>2384</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm172</pubid>
                  <pubid idtype="pmpid" link="fulltext">17709333</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>aes, the gene encoding the esterase B in <it>Escherichia coli</it>, is a powerful phylogenetic marker of the species.</p>
            </title>
            <aug>
               <au>
                  <snm>Lescat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hoede</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Clermont</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Garry</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Darlu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tuffery</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Denamur</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Picard</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Microbiol</source>
            <pubdate>2009</pubdate>
            <volume>9</volume>
            <fpage>273</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2180-9-273</pubid>
                  <pubid idtype="pmcid">2805673</pubid>
                  <pubid idtype="pmpid" link="fulltext">20040078</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Organised genome dynamics in the <it>Escherichia coli </it>species results in highly diverse adaptive paths.</p>
            </title>
            <aug>
               <au>
                  <snm>Touchon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hoede</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tenaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Barbe</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Baeriswyl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bidet</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bingen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bonacorsi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bouchier</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bouvet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Calteau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chiapello</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Clermont</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cruveiller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dossat</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>El Karoui</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frapy</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Garry</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ghigo</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gilles</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Le Bougu&#233;nec</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lescat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mangenot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Martinez-J&#233;hanne</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Matic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Nassif</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Oztas</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2009</pubdate>
            <volume>5</volume>
            <fpage>e1000344</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pgen.1000344</pubid>
                  <pubid idtype="pmcid">2617782</pubid>
                  <pubid idtype="pmpid" link="fulltext">19165319</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>A genomic distance based on MUM indicates discontinuity between most bacterial species and genera.</p>
            </title>
            <aug>
               <au>
                  <snm>Deloger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>El Karoui</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Petit</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2009</pubdate>
            <volume>191</volume>
            <fpage>91</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.01202-08</pubid>
                  <pubid idtype="pmcid">2612450</pubid>
                  <pubid idtype="pmpid" link="fulltext">18978054</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>MOSAIC</p>
            </title>
            <url>http://genome.jouy.inra.fr/mosaic</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Identification of a new repetitive element in <it>Staphylococcus aureus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Cramton</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Schnell</snm>
                  <fnm>NF</fnm>
               </au>
               <au>
                  <snm>Gotz</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bruckner</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2000</pubdate>
            <volume>68</volume>
            <fpage>2344</fpage>
            <lpage>2348</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/IAI.68.4.2344-2348.2000</pubid>
                  <pubid idtype="pmcid">97424</pubid>
                  <pubid idtype="pmpid" link="fulltext">10722640</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Distribution and characterization of staphylococcal interspersed repeat units (SIRUs) and potential use for strain differentiation.</p>
            </title>
            <aug>
               <au>
                  <snm>Hardy</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Ussery</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Oppenheim</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Hawkey</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>2004</pubdate>
            <volume>150</volume>
            <fpage>4045</fpage>
            <lpage>4052</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1099/mic.0.27413-0</pubid>
                  <pubid idtype="pmpid">15583157</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Genome sequence of enterohaemorrhagic <it>Escherichia coli </it>O157:H7.</p>
            </title>
            <aug>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Gregor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>P&#243;sfai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hackett</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Klink</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Boutin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Shao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grotbeck</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>NW</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dimalanta</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Potamousis</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Apodaca</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Anantharaman</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>DC</fnm>
               </au>
               <au>
                  <snm>Welch</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>529</fpage>
            <lpage>533</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054089</pubid>
                  <pubid idtype="pmpid" link="fulltext">11206551</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22.</p>
            </title>
            <aug>
               <au>
                  <snm>Volfovsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Oleksyk</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Cruz</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Truelove</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>51</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-10-51</pubid>
                  <pubid idtype="pmcid">2654908</pubid>
                  <pubid idtype="pmpid" link="fulltext">19171065</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Relationship between insertion/deletion (indel) frequency of proteins and essentiality.</p>
            </title>
            <aug>
               <au>
                  <snm>Chan</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Hsing</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hormozdiari</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cherkasov</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>227</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-227</pubid>
                  <pubid idtype="pmcid">1925122</pubid>
                  <pubid idtype="pmpid" link="fulltext">17598914</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Evolutionary dynamics of ompA, the gene encoding the <it>Chlamydia trachomatis </it>key antigen.</p>
            </title>
            <aug>
               <au>
                  <snm>Nunes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Borrego</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Nunes</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Florindo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gomes</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2009</pubdate>
            <volume>191</volume>
            <fpage>7182</fpage>
            <lpage>7192</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.00895-09</pubid>
                  <pubid idtype="pmpid" link="fulltext">19783629</pubid>
                  <pubid idtype="pmcid">2786549</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Genes under positive selection in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Petersen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bollback</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Dimmic</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hubisz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1336</fpage>
            <lpage>1343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.6254707</pubid>
                  <pubid idtype="pmcid">1950902</pubid>
                  <pubid idtype="pmpid" link="fulltext">17675366</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>High frequency of hotspot mutations in core genes of <it>Escherichia coli </it>due to short-term positive selection.</p>
            </title>
            <aug>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Minin</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Russo</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2009</pubdate>
            <volume>106</volume>
            <fpage>12412</fpage>
            <lpage>12417</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0906217106</pubid>
                  <pubid idtype="pmcid">2718352</pubid>
                  <pubid idtype="pmpid" link="fulltext">19617543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Homologous recombination in <it>Escherichia coli</it>: dependence on substrate length and homology.</p>
            </title>
            <aug>
               <au>
                  <snm>Shen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>HV</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1986</pubdate>
            <volume>112</volume>
            <fpage>441</fpage>
            <lpage>457</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1202756</pubid>
                  <pubid idtype="pmpid" link="fulltext">3007275</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Short-homology-independent illegitimate recombination in <it>Escherichia coli</it>: distinct mechanism from short-homology-dependent illegitimate recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>Shimizu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yamaguchi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ashizawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kohno</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Asami</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>266</volume>
            <fpage>297</fpage>
            <lpage>305</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0794</pubid>
                  <pubid idtype="pmpid" link="fulltext">9047364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Genome-wide detection and analysis of homologous recombination among sequenced strains of <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Darling</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>R44</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2006-7-5-r44</pubid>
                  <pubid idtype="pmcid">1779527</pubid>
                  <pubid idtype="pmpid" link="fulltext">16737554</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Integration of foreign DNA during natural transformation of <it>Acinetobacter </it>sp. by homology-facilitated illegitimate recombination.</p>
            </title>
            <aug>
               <au>
                  <snm>de Vries</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wackernagel</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>2094</fpage>
            <lpage>2099</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.042263399</pubid>
                  <pubid idtype="pmcid">122324</pubid>
                  <pubid idtype="pmpid" link="fulltext">11854504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Mechanisms of homology-facilitated illegitimate recombination for foreign DNA acquisition in transformable <it>Pseudomonas stutzeri</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Meier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wackernagel</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>48</volume>
            <fpage>1107</fpage>
            <lpage>1118</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2003.03498.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12753199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Homologous recombination at the border: insertion-deletions and the trapping of foreign DNA in <it>Streptococcus pneumoniae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Prudhomme</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Libante</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Claverys</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>2100</fpage>
            <lpage>2105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.032262999</pubid>
                  <pubid idtype="pmcid">122325</pubid>
                  <pubid idtype="pmpid" link="fulltext">11854505</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>The lambda red proteins promote efficient recombination between diverged sequences: implications for bacteriophage genome mosaicism.</p>
            </title>
            <aug>
               <au>
                  <snm>Martinsohn</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Radman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Petit</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000065</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pgen.1000065</pubid>
                  <pubid idtype="pmcid">2327257</pubid>
                  <pubid idtype="pmpid" link="fulltext">18451987</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>DNA sequence similarity requirements for interspecific recombination in <it>Bacillus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Majewski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cohan</snm>
                  <fnm>FM</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1999</pubdate>
            <volume>153</volume>
            <fpage>1525</fpage>
            <lpage>1533</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460850</pubid>
                  <pubid idtype="pmpid">10581263</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Evolutionary implications of the frequent horizontal transfer of mismatch repair genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Denamur</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lecointre</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Darlu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tenaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Acquaviva</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sayada</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sunjevaric</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rothstein</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Elion</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Taddei</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Radman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matic</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>103</volume>
            <fpage>711</fpage>
            <lpage>721</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)00175-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11114328</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Evidence for mutation showers.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gonzalez</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Scaringe</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Tsai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Sommer</snm>
                  <fnm>SS</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>8403</fpage>
            <lpage>8408</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0610902104</pubid>
                  <pubid idtype="pmcid">1895962</pubid>
                  <pubid idtype="pmpid" link="fulltext">17485671</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Mutations in clusters and showers.</p>
            </title>
            <aug>
               <au>
                  <snm>Drake</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2007</pubdate>
            <volume>104</volume>
            <fpage>8203</fpage>
            <lpage>8204</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0703089104</pubid>
                  <pubid idtype="pmcid">1895928</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495029</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Hypermutability of damaged single-strand DNA formed at double-strand breaks and uncapped telomeres in yeast <it>Saccharomyces cerevisiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sterling</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Storici</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Resnick</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Gordenin</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000264</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pgen.1000264</pubid>
                  <pubid idtype="pmcid">2577886</pubid>
                  <pubid idtype="pmpid" link="fulltext">19023402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>Integr8 and Genome Reviews: integrated views of complete genomes and proteomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kersey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bower</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Horne</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Petryszak</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kanz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Michoud</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kulikova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Faruque</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Duggan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mclaren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reimholz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Penel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Reuter</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D297</fpage>
            <lpage>D302</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gki039</pubid>
                  <pubid idtype="pmcid">539993</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>MaGe (Magnifying genomes) Microbial Genome Annotation System</p>
            </title>
            <url>https://www.genoscope.cns.fr/agc/mage/wwwpkgdb/MageHome</url>
         </bibl>
         <bibl id="B78">
            <title>
               <p>MaGe: a microbial genome annotation system supported by synteny results.</p>
            </title>
            <aug>
               <au>
                  <snm>Vallenet</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Labarre</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rouy</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Barbe</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bocs</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cruveiller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lajus</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pascal</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Scarpelli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>M&#233;digue</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>53</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkj406</pubid>
                  <pubid idtype="pmcid">1326237</pubid>
                  <pubid idtype="pmpid" link="fulltext">16407324</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Versatile and open software for comparing large genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Phillippy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Smoot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shumway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Antonescu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid>
                  <pubid idtype="pmcid">395750</pubid>
                  <pubid idtype="pmpid" link="fulltext">14759262</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p><it>E. coli </it>O157:H7 Sakai Genome Project</p>
            </title>
            <url>http://genome.naist.jp/bacteria/o157/overview.html</url>
         </bibl>
         <bibl id="B81">
            <title>
               <p>ACLAME: Prophinder</p>
            </title>
            <url>http://aclame.ulb.ac.be/Tools/Prophinder/</url>
         </bibl>
         <bibl id="B82">
            <title>
               <p>CRISPRdb</p>
            </title>
            <url>http://crispr.u-psud.fr/crispr/</url>
         </bibl>
         <bibl id="B83">
            <title>
               <p>Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the <it>Salmonella </it>pathogenicity islands.</p>
            </title>
            <aug>
               <au>
                  <snm>Vernikos</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>2196</fpage>
            <lpage>2203</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl369</pubid>
                  <pubid idtype="pmpid" link="fulltext">16837528</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>ASAP</p>
            </title>
            <url>http://asap.ahabs.wisc.edu/asap/home.php</url>
         </bibl>
         <bibl id="B85">
            <title>
               <p>BIMEs Table</p>
            </title>
            <url>http://www.pasteur.fr/recherche/unites/pmtg/repet/tableauBIMEcoli.html</url>
         </bibl>
         <bibl id="B86">
            <title>
               <p>The Microorganisms Tandem Repeat Database</p>
            </title>
            <url>http://minisatellites.u-psud.fr/GPMS/</url>
         </bibl>
         <bibl id="B87">
            <title>
               <p>The <it>Vmatch </it>large scale sequence analysis software</p>
            </title>
            <url>http://www.vmatch.de/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
