<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2006-7-5-r44</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Genome-wide detection and analysis of homologous recombination among sequenced strains of <it>Escherichia coli</it></p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Mau</snm>
               <fnm>Bob</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>bobmau@biochem.wisc.edu</email>
            </au>
            <au id="A2">
               <snm>Glasner</snm>
               <mi>D</mi>
               <fnm>Jeremy</fnm>
               <insr iid="I3"/>
               <email>jeremy@genome.wisc.edu</email>
            </au>
            <au id="A3">
               <snm>Darling</snm>
               <mi>E</mi>
               <fnm>Aaron</fnm>
               <insr iid="I4"/>
               <email>darling@cs.wisc.edu</email>
            </au>
            <au id="A4">
               <snm>Perna</snm>
               <mi>T</mi>
               <fnm>Nicole</fnm>
               <insr iid="I3"/>
               <insr iid="I5"/>
               <email>nicole@genome.wisc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Mathematics, Lincoln Drive, University of Wisconsin, Madison WI 53706, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Oncology, University Ave, University of Wisconsin, Madison WI 53706, USA</p>
            </ins>
            <ins id="I3">
               <p>Genome Center of Wisconsin, Henry Mall, University of Wisconsin, Madison WI 53706, USA</p>
            </ins>
            <ins id="I4">
               <p>Department of Computer Science, W. Dayton St, University of Wisconsin, Madison WI 53706, USA</p>
            </ins>
            <ins id="I5">
               <p>Department of Animal Health and Biomedical Sciences, Linden Drive, University of Wisconsin, Madison WI 53706, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>5</issue>
         <fpage>R44</fpage>
         <url>http://genomebiology.com/2006/7/5/R44</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16737554</pubid>
               <pubid idtype="doi">10.1186/gb-2006-7-5-r44</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>1</day>
               <month>11</month>
               <year>2005</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>8</day>
               <month>2</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>8</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>31</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Mau et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Recombination among bacterial strains</p>
      </shorttitle>
      <shortabs>
         <p>Multiple alignment of <it>E. coli </it>and <it>Shigella </it>genomes reveals that intraspecific recombination is more common than previously thought.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we report a genome-wide assessment of homologous recombination among a collection of six complete <it>Escherichia coli </it>and <it>Shigella flexneri </it>genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in all six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We demonstrate that intraspecific recombination in <it>E. coli </it>is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The role of lateral gene transfer (LGT) in shaping prokaryotic genomes has been the subject of intense investigation and debate in recent years <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. In the pre-genomic era, the handful of examples of LGT were detected primarily as discordance between phylogenetic reconstructions with different housekeeping genes <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. The explosion of publicly available bacterial genome sequences, coupled with the development of whole-genome comparison tools <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, initially focused LGT discovery on genome-wide scans for islands of sequences specific to particular lineages of bacteria (for example, <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>). Most recently, phylogenetic approaches have been applied to detect LGT among genome-wide sets of putative orthologs <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Together, these studies point to low, but detectable, levels of LGT among distantly related species, with occasionally higher rates found among organisms that occupy similar environments. Closely related organisms show higher levels of LGT, with intraspecific comparisons showing the highest levels. Two limitations of these analyses are the lack of phylogenetic resolution, particularly among intraspecific comparisons, and the reliance on annotated boundaries of genes in delineating candidate regions.</p>
         <p>Statistical and phylogenetic methods have been developed for detecting recombination in aligned sequences of single genes or relatively short genomic segments. One general approach, referred to as nucleotide substitution distribution methods in <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, assesses atypical clusters of nucleotide differences. Clusters come in two flavors: groups of polymorphisms exhibiting the same topologically discordant pattern <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, or an elevated rate of mutation in a single lineage across a segment of the alignment <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The former indicates recombination between compared strains, while the latter implies recombination with some unknown, more divergent, strain. Phylogenetic methods are most often applied in the context of detecting recombination breakpoints in sequence alignments <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. These methods require longer alignments, are computationally intensive, and have reportedly been outperformed by substitution distribution methods on simulated test data <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         <p>Genome-scale analyses of lateral transfer events have typically relied on identification of incongruent tree topologies from phylogenetic analyses of sets of putative orthologous genes identified by reciprocal BLAST analyses <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B34">34</abbr></abbrgrp>. This approach can be confounded by errors associated with BLAST, such as false-positive orthologs, is limited to identifying recombination events that occur within gene boundaries, and is unlikely to identify short recombined regions within genes.</p>
         <p>Recently, a Markov clustering algorithm was used to partition orthologous pairs of genes, determined by an all-versus-all BLAST comparison of 144 fully sequenced prokaryotic genomes, into maximally representative clusters <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B35">35</abbr></abbrgrp>. Bayesian phylogenetic analysis (for example, <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>) was applied to each cluster of four or more taxa to infer lateral gene transfer against the background of a consensus 'supertree' of sequenced bacteria. This approach is most successful in determining global pathways of gene transfer between phyla and divisions of prokaryotes, where homologous recombination is unlikely to have played a significant role. Rather, such transfers likely arise through illegitimate recombination events.</p>
         <p>Here, we develop a method to detect segments of closely related genomes that have been replaced with a homologous copy from another conspecific lineage, that is, an allelic substitution. The method is not designed to detect non-homologous sequences that may have accompanied a homologous recombination event or homologous recombination events involving identical alleles.</p>
         <p>The method compiles a list of polymorphic sites from a whole-genome multiple alignment, then applies score functions to locate clusters discordant with the predominant phylogenetic signal. Identified clusters can cross gene boundaries and non-coding sequence. Our use of extreme value theory furnishes us with a statistically defensible criterion to assess the significance of these clusters, in much the same manner as the Karlin-Altschul statistics help interpret BLAST results <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
         <p>We apply the recombination detection method to the published genome sequences of several <it>E. coli </it><abbrgrp><abbr bid="B18">18</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. Construction of a multiple whole-genome alignment facilitates a global survey of recombination among these <it>E. coli </it>isolates. Genome sequences must first be partitioned into locally collinear blocks (LCBs), that is, regions without rearrangement. Most LCBs contain lineage-specific sequence acquired through lateral gene transfer or differential gene loss. To further complicate matters, non-homologous sequences from different organisms can integrate into different lineages at a common locus <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In previous work, we developed a software package called Mauve <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> that can construct global multiple genome alignments in the presence of rearrangement and lineage-specific content. The Mauve alignments provide a convenient starting point for locating polymorphic patterns indicative of intraspecific recombination, which we call allelic substitution.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>As seen in Figure <figr fid="F1">1</figr>, the Mauve genome aligner takes the four <it>E. coli </it>and two <it>Shigella flexneri </it>genome sequences and returns 34 local alignments spanning 3.4 Mb of homologous sequence common to all strains. The majority of rearrangements occur in <it>Shigella </it>genomes where inversions between copies of repetitive elements are relatively frequent <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>A multiple whole-genome alignment of six strains consists of 34 rearranged pieces larger than 1 kb</p>
            </caption>
            <text>
               <p>A multiple whole-genome alignment of six strains consists of 34 rearranged pieces larger than 1 kb. Each genome is laid out horizontally with homologous segments (LCBs) outlined as colored rectangles. Regions inverted relative to <it>E. coli </it>K-12 are set below those that match in the forward orientation. Lines collate aligned segments between genomes. Average sequence similarities within an LCB, measured in sliding windows, are proportional to the heights of interior colored bars. Large sections of white within blocks and gaps between blocks indicate lineage-specific sequence.</p>
            </text>
            <graphic file="gb-2006-7-5-r44-1"/>
         </fig>
         <p>Computer-assisted screening of the Mauve output finds 733 problematic intervals inside LCBs in which base pairs do not properly align because of gaps created by lineage-specific sequence and/or attempts to align non-homologous sequence. Deleting these intervals from the alignment yields 130,008 high-quality base pair differences. Common bipartitions, constituting 96.4% of all such differences, are listed in Table <tblr tid="T1">1</tblr>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Frequency of common patterns of single nucleotide differences</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="left">
                     <p>Bipartition (Split)</p>
                  </c>
                  <c ca="center">
                     <p>Pattern KOOCSS</p>
                  </c>
                  <c ca="center">
                     <p>Number of SNDs</p>
                  </c>
                  <c ca="center">
                     <p>Relative frequency</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KSSOO) C)</p>
                  </c>
                  <c ca="center">
                     <p>111211</p>
                  </c>
                  <c ca="center">
                     <p>50,354</p>
                  </c>
                  <c ca="center">
                     <p>38.73</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KSSC)(OO))</p>
                  </c>
                  <c ca="center">
                     <p>122111</p>
                  </c>
                  <c ca="center">
                     <p>19,678</p>
                  </c>
                  <c ca="center">
                     <p>15.14</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KOOC)(SS))</p>
                  </c>
                  <c ca="center">
                     <p>111122</p>
                  </c>
                  <c ca="center">
                     <p>18,490</p>
                  </c>
                  <c ca="center">
                     <p>14.22</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>(K(OOSSC))</p>
                  </c>
                  <c ca="center">
                     <p>122222</p>
                  </c>
                  <c ca="center">
                     <p>14,115</p>
                  </c>
                  <c ca="center">
                     <p>10.86</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KSS)(OOC)) = KS</p>
                  </c>
                  <c ca="center">
                     <p>122211</p>
                  </c>
                  <c ca="center">
                     <p>9,882</p>
                  </c>
                  <c ca="center">
                     <p>7.60</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KOO)(SSC)) = KO</p>
                  </c>
                  <c ca="center">
                     <p>111222</p>
                  </c>
                  <c ca="center">
                     <p>6,890</p>
                  </c>
                  <c ca="center">
                     <p>5.30</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>((KC)(OOSS)) = KC</p>
                  </c>
                  <c ca="center">
                     <p>122122</p>
                  </c>
                  <c ca="center">
                     <p>5,874</p>
                  </c>
                  <c ca="center">
                     <p>4.52</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Common single nucleotide differences have two alleles. Each such nucleotide difference separates the six genomes into two classes. Pattern codes are represented as 6-tuples of ones and twos (for allele 1 and allele 2) in the following order: (K) <it>E. coli </it>K-12 MG1655, (O) <it>E. coli </it>O157:H7 EDL933, (O) <it>E. coli </it>O157:H7 Sakai strain RIMD0509952, (C) <it>E. coli </it>CFT073, (S) <it>Shigella flexneri </it>2A 301, and (S) <it>Shigella flexneri </it>2A 2457T. By convention, K-12 is always allele one. For brevity, key groupings are denoted as KS, KO, or KC. The remaining 3.6% of SNDs come in over 50 different patterns, including one quadripartition. See appendix 1 in Additional data file 1 for additional frequencies.</p>
            </tblfn>
         </tbl>
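         <p>As an arithmetic check on Table <tblr tid="T1">1</tblr>, the following minimal sketch recomputes the relative frequencies from the listed counts and the total of 130,008 differences reported in the text. The function and variable names are illustrative assumptions and are not part of the published analysis.</p>
         <p>
```python
# Counts copied from Table 1; the total is the 130,008 differences from the text.
TOTAL_SNDS = 130_008
TABLE1_COUNTS = {
    "111211": 50_354,  # ((KSSOO) C)
    "122111": 19_678,  # ((KSSC)(OO))
    "111122": 18_490,  # ((KOOC)(SS))
    "122222": 14_115,  # (K(OOSSC))
    "122211": 9_882,   # KS
    "111222": 6_890,   # KO
    "122122": 5_874,   # KC
}


def relative_frequencies(counts, total):
    """Return each pattern's share of all SNDs as a percentage (2 decimals)."""
    return {pattern: round(100.0 * n / total, 2) for pattern, n in counts.items()}


freqs = relative_frequencies(TABLE1_COUNTS, TOTAL_SNDS)

# Share of all SNDs covered by the seven common bipartitions.
common_share = 100.0 * sum(TABLE1_COUNTS.values()) / TOTAL_SNDS
```
         </p>
         <p>Under these inputs the seven listed patterns account for 125,283 of the 130,008 differences, consistent with the 96.4% quoted above.</p>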
         <p>We use the term 'single nucleotide difference' (SND) to describe the partition structure at a variable site in the alignment. A representative 100 base-pair (bp) segment of the 3.4 Mb alignment is presented in Figure <figr fid="F2">2</figr> for illustrative purposes.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Small sample segment of the alignment spanning the start of the <it>mutS </it>gene (denoted in blue)</p>
            </caption>
            <text>
               <p>Small sample segment of the alignment spanning the start of the <it>mutS </it>gene (denoted in blue). Location of a mismatch is indicated by the integer '1' along the bottom row. Five columns contain SNDs: TTTCTT, AAAGAA, AAATAA, GGGAGG, and GAAAAA. The first four share the same bipartition pattern (111211) and are deemed equivalent, even though one of them results from a transversion. The other SND is considered distinct despite having the same mutation (A to G) found in the second SND.</p>
            </text>
            <graphic file="gb-2006-7-5-r44-2"/>
         </fig>
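         <p>The pattern coding used in Table <tblr tid="T1">1</tblr> and Figure <figr fid="F2">2</figr> can be sketched as follows. This is an illustrative encoding only: it assumes each alignment column is given as a six-character string in the order K, O, O, C, S, S, and the rule of numbering further alleles by first appearance is our assumption for sites with more than two alleles.</p>
         <p>
```python
STRAIN_ORDER = ("K", "O", "O", "C", "S", "S")  # genome order used in Table 1


def bipartition_pattern(column):
    """Encode a six-base alignment column as a string of allele numbers.

    The K-12 base (first position) is always allele 1. Each further distinct
    base receives the next number in order of first appearance (an assumption
    made here for sites with more than two alleles).
    """
    if len(column) != len(STRAIN_ORDER):
        raise ValueError("expected one base per genome")
    allele_of = {}
    for base in column:
        if base not in allele_of:
            allele_of[base] = str(len(allele_of) + 1)
    return "".join(allele_of[base] for base in column)


def is_snd(column):
    """A column is an SND when it holds more than one distinct base."""
    return len(set(column)) > 1
```
         </p>
         <p>With this encoding, the columns TTTCTT, AAAGAA, AAATAA, and GGGAGG from Figure <figr fid="F2">2</figr> all map to 111211, whereas GAAAAA maps to 122222.</p>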
         <p>All but 2% of variable sites are bi-allelic, meaning each site splits six strains into two groups, called a bipartition. Nearly 80% of the bi-allelic SNDs have a minor allele unique to the CFT, K-12, O157:H7, or <it>S. flexneri </it>lineage. The remaining bi-allelic SNDs divide the lineages into three alternative pairings of sister taxa, giving rise to three alternative unrooted tree topologies denoted as: &#968;<sub><it>KS </it></sub>(K-12 with <it>S. flexneri</it>, CFT with O157:H7); &#968;<sub><it>KO </it></sub>(K-12 with O157:H7, CFT with <it>S. flexneri</it>); and &#968;<sub><it>KC </it></sub>(K-12 with CFT, O157:H7 with <it>S. flexneri</it>).</p>
         <p>The four lineages serve as operational taxonomic units (OTUs) in our study of allelic substitution in <it>E. coli</it>. When nucleotides at a polymorphic site exhibit a partition structure explainable by a single point mutation, the induced bipartition is said to be compatible with the enabling topology. Bipartitions labeled KS, KO, and KC in Table <tblr tid="T1">1</tblr> are compatible with the topologies &#968;<sub><it>KS</it></sub>, &#968;<sub><it>KO</it></sub>, and &#968;<sub><it>KC</it></sub>, respectively. Note that the frequency of the KS pattern exceeds that of each of its competitors by roughly 3,000 SNDs or more (9,882 versus 6,890 for KO and 5,874 for KC), thus certifying &#968;<sub><it>KS </it></sub>as the 'species' topology. The elevated frequency of SNDs unique to CFT roots topology &#968;<sub><it>KS </it></sub>as (((KS)O)C). The roughly 102,000 topologically uninformative lineage-specific SNDs nevertheless provide information that our method uses to assess recombination.</p>
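         <p>The distinction between topologically informative and lineage-specific bipartitions can be made concrete with a small lookup, sketched below. The pattern strings are those of Table <tblr tid="T1">1</tblr>; the function name and the returned labels are hypothetical conveniences.</p>
         <p>
```python
# Bipartitions that pair two lineages and so support one unrooted topology.
INFORMATIVE = {"122211": "KS", "111222": "KO", "122122": "KC"}

# Bipartitions whose minor allele is confined to a single lineage.
LINEAGE_SPECIFIC = {
    "111211": "C",  # CFT073
    "122111": "O",  # O157:H7 (both strains)
    "111122": "S",  # S. flexneri (both strains)
    "122222": "K",  # K-12
}


def classify(pattern):
    """Return (category, label) for a six-character KOOCSS pattern code."""
    if pattern in INFORMATIVE:
        return ("informative", INFORMATIVE[pattern])
    if pattern in LINEAGE_SPECIFIC:
        return ("lineage-specific", LINEAGE_SPECIFIC[pattern])
    return ("other", None)  # rare patterns outside the seven in Table 1
```
         </p>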
         <p>We define three complementary score functions that discriminate between KS, KO, and KC patterns. Each of these score functions assigns an integer value to each SND pattern. Moving across the chromosome of reference strain MG1655, we keep a cumulative sum of the scores assigned by each function to consecutive SNDs in the alignment. Graphical representations of cumulative scores, called random walk plots or excursions, can reveal large-scale variations in feature composition. Excursions for each of the three topologies are plotted concurrently in Figure <figr fid="F3">3</figr>.</p>
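         <p>A cumulative score of this kind can be sketched in a few lines. The integer scores used here (a reward for the pattern of interest and a penalty of -1 for every other SND) are hypothetical placeholders chosen for illustration and are not the values derived in Materials and methods.</p>
         <p>
```python
from itertools import accumulate


def make_score_function(target, reward=5, penalty=-1):
    """Build a score function for one pattern label (hypothetical scores)."""
    def score(label):
        return reward if label == target else penalty
    return score


def excursion(labels, score):
    """Cumulative sum of scores over consecutive SNDs (a random walk plot)."""
    return list(accumulate(score(label) for label in labels))


# Toy run of SND labels in chromosomal order, not real data.
labels = ["C", "KS", "KS", "O", "KO", "C"]
ks_walk = excursion(labels, make_score_function("KS"))
ko_walk = excursion(labels, make_score_function("KO"))
```
         </p>
         <p>Because the penalty applies to most SNDs, each walk drifts downward overall and rises only where the pattern of interest is locally enriched.</p>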
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Three excursions (KS, KO, and KC) spanning the alignment with K-12 MG1655 as reference genome</p>
            </caption>
            <text>
               <p>Three excursions (KS, KO, and KC) spanning the alignment with K-12 MG1655 as reference genome. The KS random walk plot, representing the dominant clonal topology, decreases more gradually than do the two other plots. Excursions for the discordant topologies (patterns KO and KC) run parallel to one another, except in a 100 kb region at 2 Mb where KO abruptly increases. Parallel flat gaps common to all three plots reflect K-12 lineage-specific sequence.</p>
            </text>
            <graphic file="gb-2006-7-5-r44-3"/>
         </fig>
         <p>A large phylogenetic anomaly appears midway through the alignment. Magnification of a 100 kb segment between 1.95 and 2.1 Mb reveals a core 40 kb region in which KO SNDs are the dominant pattern of substitution, flanked by transitional regions for which &#968;<sub><it>KO </it></sub>serves as the 'gene tree' as well.</p>
         <p>Global random walk plots highlight grossly deviant regions. In this alignment, a solitary segment stands out. All other regions appear indistinguishable from one another in Figure <figr fid="F3">3</figr>. Unless stated to the contrary, DNA sequence and genes from the large atypical region (from <it>sdiA </it>to <it>gnd</it>) are excluded from further computations (a separate analysis of this region is included in Appendix 2 of Additional data file 1).</p>
         <sec>
            <st>
               <p>Local variation in phylogenetic signal</p>
            </st>
            <p>In Figure <figr fid="F3">3</figr>, clusters of like patterns labeled KS, KC, or KO generate tiny, imperceptible bumps in the corresponding random walk plots. Examined at higher resolution (data not shown), they can be seen to punctuate each excursion. However, manual scanning of high-resolution random walk plots is tedious, time-consuming, and error-prone. In Materials and methods, we describe an alternative strategy that automatically scans for clusters at the local level.</p>
            <p>The score functions generating Figure <figr fid="F3">3</figr> are designed to elicit large positive local scores (differences in cumulative scores evaluated at nearby positions) whenever clusters of like, topologically informative, patterns are encountered. When a local score exceeds a predetermined threshold, the interval between the delimiting SNDs is declared a high scoring segment (HSS). The strategy behind this scheme is analogous to that of BLAST <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, in which high scoring segments denote probable homology between the query and one or more reference sequences.</p>
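            <p>One common way to locate such segments is a single pass that restarts whenever the running local score falls back to zero, as in BLAST-style maximal segment scans. The sketch below follows that generic scheme under our own assumptions; it is not necessarily the exact procedure described in Materials and methods, and the toy scores and threshold are hypothetical.</p>
            <p>
```python
def high_scoring_segments(scores, threshold):
    """Return (start, end, peak) index triples for local excursions whose
    peak score exceeds the threshold.

    An excursion opens at a positive score and closes when the running sum
    returns to zero or below; the segment ends at the SND where the peak
    was reached.
    """
    segments = []
    start = None
    running = peak = 0
    peak_end = None
    for i, s in enumerate(scores):
        if start is None:
            if s > 0:  # open a new excursion at a positive score
                start, running, peak, peak_end = i, s, s, i
            continue
        running += s
        if running > peak:
            peak, peak_end = running, i
        if 0 >= running:  # excursion has returned to its baseline
            if peak > threshold:
                segments.append((start, peak_end, peak))
            start = None
    if start is not None and peak > threshold:
        segments.append((start, peak_end, peak))
    return segments


# Toy per-SND scores: one dense cluster, then a lone isolated hit.
toy_scores = [-1, 5, 5, -1, 5] + [-1] * 14 + [5, -1, -1, -1, -1, -1]
toy_hss = high_scoring_segments(toy_scores, threshold=10)
```
            </p>
            <p>In the toy input only the first cluster is reported, since the isolated hit never accumulates a local score above the threshold.</p>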
            <p>When two lineages share a nucleotide that is not the result of a single mutation in a common ancestor, a homoplasy is said to have occurred. Homoplasies arise either through multiple mutations at a common site (convergent evolution) or recombination. The former tend to be distributed randomly about an alignment, whereas a recombination event typically produces a cluster of nucleotide differences at nearby sites exhibiting the same SND pattern. Our approach identifies such clusters of nucleotide differences with a common phylogenetic partitioning pattern. Variability in mutation rates and patterns in different chromosomal regions and bacterial lineages might also lead to physical clustering of similar substitutions. Although the clustering of sites with similar patterns strongly suggests homologous recombination between lineages, we cannot rule out the possibility that some clusters arise by independent mutation-driven processes. Simple score functions alone cannot distinguish between these two possibilities, though the latter is believed to be relatively rare.</p>
            <p>Our method relies on the relative intensity of particular SND patterns (the one of interest versus all others) to measure cluster formation, rather than the absolute number of SNDs in any given fixed length segment of the alignment. As a result, local mutational intensity is factored out of the analysis. We assert this is legitimate provided the overall rate of mutation is not too great, and local deviations from that average are not severe. We demonstrate in appendix 5 of Additional data file 1 that this is indeed the case for these six genomes. Random SNDs can and do form clusters of identical patterns simply by chance. Given the number of SNDs and their relative frequencies within the alignment, we wish to distinguish 'bumps' that are too large to have occurred by chance.</p>
            <p>Here again, BLAST statistics <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> serve as the model for assessing significance. Random walk theory provides the tools for assessing high scoring segments, and the corresponding extreme value distributions (EVDs) guide selection of appropriate thresholds. Random walks (as opposed to random walk plots) are stochastic processes operating under a fixed set of probabilities at each stage.</p>
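            <p>In Karlin-Altschul theory, the scale parameter of the extreme value distribution is the positive root lambda of the equation sum over i of p_i exp(lambda s_i) = 1, where p_i is the probability of score s_i, and the tail of the maximal local score over n steps is approximately 1 - exp(-K n exp(-lambda x)). The sketch below solves for lambda under a hypothetical two-valued score scheme; the reward, the penalty, and the constant K are placeholders that we assume for illustration and are not the values used in this study.</p>
            <p>
```python
import math


def solve_lambda(p_target, reward, penalty=-1, tol=1e-12):
    """Positive root of p*exp(lam*reward) + (1-p)*exp(lam*penalty) = 1.

    Requires a negative expected score per step, as the theory assumes.
    """
    mean = p_target * reward + (1 - p_target) * penalty
    if mean >= 0:
        raise ValueError("expected score per SND must be negative")

    def f(lam):
        return (p_target * math.exp(lam * reward)
                + (1 - p_target) * math.exp(lam * penalty) - 1.0)

    lo, hi = 0.0, 1.0
    while 0 >= f(hi):  # expand the bracket until f turns positive
        lo, hi = hi, 2 * hi
    while hi - lo > tol:  # bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def evd_tail(x, n, lam, k_const):
    """Approximate P(max local score >= x) = 1 - exp(-K * n * exp(-lam * x))."""
    return -math.expm1(-k_const * n * math.exp(-lam * x))


# Hypothetical inputs: target frequency near 5%, reward 9, and K = 0.1.
lam = solve_lambda(p_target=0.05, reward=9)
tail_100 = evd_tail(100, n=130_008, lam=lam, k_const=0.1)
tail_170 = evd_tail(170, n=130_008, lam=lam, k_const=0.1)
```
            </p>
            <p>Raising the threshold lowers the tail probability exponentially, which is why a more frequent pattern such as KS calls for a higher threshold than KO or KC.</p>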
            <p>In the Materials and methods section, we apply the relevant theory to derive thresholds. Using the appropriate extreme value distribution as an arbiter, we chose a significance threshold of 170 for clusters of KS SNDs and a common threshold of 100 for both KO and KC, as their frequencies are nearly identical outside the large atypical region (4.85% versus 4.57%). These thresholds define 186 high scoring segments that span 7.5% of the sequence alignment. A breakdown by pattern and range of scores is arrayed in Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Distribution of scores of significant segments for discordant bipartitions</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Bipartition pattern</p>
                     </c>
                     <c cspan="5" ca="center">
                        <p>Number of segments exceeding a given HSS threshold of 100</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>101-110</p>
                     </c>
                     <c ca="center">
                        <p>111-125</p>
                     </c>
                     <c ca="center">
                        <p>126-200</p>
                     </c>
                     <c ca="center">
                        <p>&gt;200</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KO (CS)</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>64</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KC (OS)</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Distribution of scores for KS (OC) high scoring segments</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Pattern</p>
                     </c>
                     <c cspan="6" ca="center">
                        <p>Number of segments exceeding threshold of 170</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>170-200</p>
                     </c>
                     <c ca="center">
                        <p>201-220</p>
                     </c>
                     <c ca="center">
                        <p>221-250</p>
                     </c>
                     <c ca="center">
                        <p>251-400</p>
                     </c>
                     <c ca="center">
                        <p>>400</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>KS (OC)</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We deviate from BLAST protocols in one important respect: a high scoring segment maximizes the local score, which is the primary goal of sequence alignment. Here, we want to isolate sub-regions within an HSS that individually exceed the significance threshold. Our rationale is that sequence between sub-regions may not have participated in the recombination, and we want to identify only those genomic intervals that possess <it>prima facie </it>evidence of recombination.</p>
            <p>A minimal significant cluster (MSC) is a smallest subset of contiguous SNDs generating a local score above the threshold. To avoid ambiguity, overlapping MSCs supporting the same topology are merged into a single representative MSC. Most high scoring segments consist of a single such cluster, but HSSs with more than 150 SNDs often contain two or more disjoint MSCs.</p>
            <p>HSSs and MSCs are represented graphically by modifying global random walk plots. By subtracting off the underlying negative trend, only positive local scores are displayed. Figure <figr fid="F4">4</figr> shows a local random walk plot for the HSS covering the seven genes of the tryptophan operon. The <it>trp </it>operon was the first reported example of homologous recombination in <it>E. coli </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp>.</p>
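            <p>A minimal sketch of how such a local plot can be derived, assuming the per-SND scores for one pattern are already available as a list of integers: subtracting the running minimum of the cumulative score removes the negative trend and leaves only non-negative excursions. The function name and the toy scores are hypothetical and serve only as an illustration.</p>

```python
from itertools import accumulate


def local_scores(scores):
    """Return the excursion of the cumulative score above its running minimum.

    `scores` is a hypothetical list of per-SND integer scores for one pattern.
    """
    local = []
    running_min = 0  # the walk starts at zero before the first SND
    for value in accumulate(scores):
        running_min = min(running_min, value)
        local.append(value - running_min)
    return local


# Toy scores (illustrative values only, not taken from the data).
toy = [-1, -1, 13, 13, -1, 13, -1, -1, -13, -1]
print(local_scores(toy))
```

            <p>In this toy run the local score stays at zero until the first supporting SND and peaks at the end of the cluster, which is the shape plotted in a local random walk.</p>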
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The KS local random walk plot showing homologous recombination in the tryptophan (<it>trp</it>) operon</p>
               </caption>
               <text>
                  <p>The KS local random walk plot showing homologous recombination in the tryptophan (<it>trp</it>) operon. Genes are drawn as rectangular boxes positioned above or below the axis according to the transcribed strand. KS SNDs form two non-overlapping MSCs with significant local scores exceeding 170. Both MSCs, with a combined length under 2 kb, are contained in a single 6.5 kb HSS covering most of the <it>trp </it>operon. The positions of each KO, KC, and KS SND in <it>E. coli </it>K-12 are shown above the KS excursion. Random walk values below 50 are not plotted, resulting in the absence of visible KC or KO excursions.</p>
               </text>
               <graphic file="gb-2006-7-5-r44-4"/>
            </fig>
            <p>Although the entire <it>trp </it>operon may have been exchanged in a single event, only <it>trpA </it>and <it>trpE </it>contain clusters of KS SNDs that individually give rise to statistically significant local scores. Moreover, the first MSC clearly includes in excess of 200 bp downstream of the <it>trp </it>operon - evidence that downstream transcription termination signals have also been subject to homologous recombination. In this manner, MSCs facilitate more precise targeting of chromosomal regions implicated in recombination. This criterion modestly increases the number of recombined segments to 216 (75, 62, 79 for KO, KC, KS, respectively) while reducing the amount of participating sequence from 251 kb to 129 kb. We outline a procedure for finding non-overlapping minimal significant clusters inside high scoring segments in Materials and methods.</p>
         </sec>
         <sec>
            <st>
               <p>Gene content of regions that underwent recent allelic substitution</p>
            </st>
            <p>Although our method identifies recombination events independently of gene boundaries, it is interesting to look at the types of genes and gene products involved in these events. To this end, we extracted a list of genes encoded in regions deemed atypical by our random walks. Among the 4,353 genes in K-12, 3,107 align across all six genomes. Of these, 271 genes intersect a minimal cluster segment. When augmented with 40 genes from the atypical region, 10% of shared genes exhibit evidence of recombination. A table of the 186 high scoring segments, subdivided into MSCs and identifying affected genes, is provided as Additional data file 2.</p>
            <p>We examined this list of 311 genes in light of gene function assignments made using a controlled vocabulary called MultiFun <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> that supports multiple functional classifications for a given gene. The 3,107 genes aligned by Mauve in all six genomes have been classified with 5,550 gene functions. Nearly 2,000 genes have a single classification (many are 'Unknown function'). By contrast, six genes have seven 'Level 2' functions. This analysis revealed an over-representation of four categories and under-representation in seven others (Table <tblr tid="T4">4</tblr>).</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>MultiFun categories exhibiting unusual levels of allelic substitution among the four major lineages</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>HR detected</p>
                     </c>
                     <c ca="center">
                        <p>Genes</p>
                     </c>
                     <c ca="center">
                        <p>Percent recombined</p>
                     </c>
                     <c ca="center">
                        <p>&#967;<sup>2 </sup>score</p>
                     </c>
                     <c ca="left">
                        <p>MultiFun Level 2 categories</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>144</p>
                     </c>
                     <c ca="center">
                        <p>3.5</p>
                     </c>
                     <c ca="center">
                        <p>4.52</p>
                     </c>
                     <c ca="left">
                        <p>Ribosome and peptidoglycan structure</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>4.2</p>
                     </c>
                     <c ca="center">
                        <p>5.47</p>
                     </c>
                     <c ca="left">
                        <p>Cell division, cell protection, and adaptation to stress</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>279</p>
                     </c>
                     <c ca="center">
                        <p>5.0</p>
                     </c>
                     <c ca="center">
                        <p>4.35</p>
                     </c>
                     <c ca="left">
                        <p>Protein-related information</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>329</p>
                     </c>
                     <c ca="center">
                        <p>6.1</p>
                     </c>
                     <c ca="center">
                        <p>2.94</p>
                     </c>
                     <c ca="left">
                        <p>RNA-related information</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>386</p>
                     </c>
                     <c ca="center">
                        <p>4,035</p>
                     </c>
                     <c ca="center">
                        <p>9.6</p>
                     </c>
                     <c ca="center">
                        <p>Not reported</p>
                     </c>
                     <c ca="left">
                        <p>All other functions (including unknown)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>357</p>
                     </c>
                     <c ca="center">
                        <p>13.5</p>
                     </c>
                     <c ca="center">
                        <p>9.24</p>
                     </c>
                     <c ca="left">
                        <p>Building block biosynthesis</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>109</p>
                     </c>
                     <c ca="center">
                        <p>13.8</p>
                     </c>
                     <c ca="center">
                        <p>3.21</p>
                     </c>
                     <c ca="left">
                        <p>DNA-related information</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>17.5</p>
                     </c>
                     <c ca="center">
                        <p>3.56</p>
                     </c>
                     <c ca="left">
                        <p>Group translocators (PTS)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>46</p>
                     </c>
                     <c ca="center">
                        <p>19.6</p>
                     </c>
                     <c ca="center">
                        <p>6.24</p>
                     </c>
                     <c ca="left">
                        <p>Motility</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Categories with few members such as ribosome and peptidoglycan structure are combined together, as are three types of cell processes. We computed a &#967;<sup>2 </sup>goodness-of-fit statistic for each category, but do not report <it>p </it>values because dependencies exist between categories.</p>
               </tblfn>
            </tbl>
            <p>Highly conserved genes that encode components of the ribosome and genes involved in peptidoglycan biosynthesis show little evidence of detectable recombination. Conversely, many genes involved in motility and chemotaxis undergo allelic substitution. Chemotaxis may also be related to elevated recombination detected among genes encoding components of phosphotransferase transport systems (PTSs) since these genes can double as sensors for substrates such as glucose and mannose <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
            <p>Genes involved in basic processing of cellular information, such as replication, transcription and translation, reveal an unexpected dichotomy: genes dedicated to RNA and protein metabolism are refractory to recombination, but genes involved with DNA replication, repair and recombination appear prone to allelic substitution. Equally surprising is a bias favoring evident recombination among genes involved in small molecule biosynthesis. Examples of biosynthetic genes that support the pairings in topology &#968;<sub><it>KC </it></sub>include members of the aromatic amino acid pathway (<it>aroP</it>, <it>aroD</it>, and <it>aroG</it>) as well as the pyrimidine producing <it>carB </it>(also known as <it>pyrA</it>). SND clusters supporting topology &#968;<sub><it>KO </it></sub>are present in <it>pyrI</it>, <it>pyrB</it>, and several genes in the histidine operon. Finally, <it>purD</it>, <it>purF</it>, <it>leuDC</it>, <it>modABC</it>, and two genes in the <it>trp </it>operon (Figure <figr fid="F4">4</figr>) contain clusters compatible with the clonal topology, but at much higher intensity than elsewhere in the genome.</p>
         </sec>
         <sec>
            <st>
               <p>Mosaic operons and genes</p>
            </st>
            <p>With 216 recombined segments intersecting 271 genes, this group of <it>E. coli </it>genomes is truly a patchwork of its constituent members. Although genes within the <it>trp </it>and <it>his </it>operons contain multiple clusters of the same pattern (KS for <it>trp</it>, KO for <it>his</it>), such uniformity across operons is atypical <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. Figure <figr fid="F5">5</figr> shows a short stretch of aligned sequence containing two mosaic operons.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Mosaic operons and genes</p>
               </caption>
               <text>
                  <p>Mosaic operons and genes. Three of six <it>rha </it>genes (<it>rhaB</it>, <it>rhaA</it>, and <it>rhaD</it>) belong to an operon on the reverse strand. This operon is unusual because well-defined recombination events clearly fall within gene boundaries; <it>rhaD </it>contains two dense KC clusters, whereas <it>rhaA </it>and <it>rhaB </it>contain predominantly KS and KO SNDs, respectively. In a nearby operon consisting of <it>fdoG</it>, <it>fdoH</it>, <it>fdoI</it>, and <it>fdhE</it>, there has been a KC intragenic recombination event with <it>fdoG </it>a mosaic, resulting from two recombination events, one of which is shared with <it>fdoH</it>.</p>
               </text>
               <graphic file="gb-2006-7-5-r44-5"/>
            </fig>
            <p>Besides <it>fdoG </it>(shown in Figure <figr fid="F5">5</figr>), six other genes - <it>polB</it>, <it>mutS</it>, <it>speF</it>, <it>recG</it>, <it>actP</it>, and <it>yfaL </it>- show evidence of mosaicism. Three of these genes - <it>polB</it>, <it>mutS</it>, and <it>recG </it>- are informational genes involved in DNA replication and repair. Each mosaic gene contains two minimal significant clusters generated by different partition patterns. A closer inspection of one of these genes, <it>speF</it>, suggests that all three phylogenetic signals may be present, as shown in Figure <figr fid="F6">6</figr>.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Random walk plots for positive local scores in the vicinity of the <it>speF </it>gene</p>
               </caption>
               <text>
                  <p>Random walk plots for positive local scores in the vicinity of the <it>speF </it>gene. The <it>speF </it>gene is a mosaic by virtue of its KS and KO clusters. Note that a small cluster of KC SNDs appears to divide a large KS segment near coordinate 718,600. This short KC spike, though not statistically significant on a whole-genome scale, would undoubtedly pass a test based on the substitution distribution of a single gene.</p>
               </text>
               <graphic file="gb-2006-7-5-r44-6"/>
            </fig>
            <p>Other mosaic genes undoubtedly exist within these strains, but their phylogenetic signal is too short or too weak to register in a genome-wide scan. Full genome scans come at a cost; one must sacrifice sensitivity to maintain specificity. At present, we are content to underestimate the true amount of recombination in order to eliminate false positives.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Natural transformation, transduction, and conjugation are three mechanisms for transporting foreign DNA into the cell. The relative contribution of each mechanism varies from species to species. For example, transformation is the dominant mode of transfer in bacteria such as <it>Neisseria meningitidis </it>and <it>Helicobacter pylori </it>that are naturally competent, that is, able to absorb small pieces of naked DNA. As <it>E. coli </it>is competent only under extreme conditions, typically in the laboratory, transformation is expected to play only a minor role in nature. Exogenous DNA can also enter via phage transduction or conjugation, which are expected to be the primary sources of exogenous DNA for <it>E. coli</it>. Transducing phages can deliver large fragments of genomic DNA from their previous bacterial host into a recipient strain. DNA transferred via conjugative mechanisms can be even larger.</p>
         <p>The lengths of recombined segments reported in the previous section are typically short. Half the intervals are shorter than 1 kb, and 80% are less than 2 kb. DNA fragments delivered by transducing phages might be expected to be considerably larger (30 to 60 kb). The size differential between entrance and incorporation molecules has been partially reconciled by experiments in which site-specific DNA was packaged into phages and transduced into K-12 cells <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Screening for recombinants in the proximity of the <it>trp </it>operon, the authors found average replacement sizes to be in the 8 to 14 kb range. Moreover, multiple replacements were detected in some instances. In a follow-up paper <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, the level of sequence dissimilarity (from 1% to 3%) between recipient and donor strains was shown to correlate with the degree of abridgement by restriction endonucleases. The length of a typical recombinant in our study is still an order of magnitude less than that reported by McKane and Milkman <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, but they based their conclusions on restriction site analysis, which has a limited ability to detect short fragments. Actual incorporations in their experiments could conceivably have been more frequent and shorter. Overlapping recombination events at particular sites are also likely to contribute to the net reductions in observed incorporation sizes.</p>
         <p>Our approach detects significant clusters of phylogenetically informative SNDs, but does not tell us which lineages participated in the recombination. When presented with four OTUs, recombination is possible between six undirected donor-recipient pairs: KO, CS, KS, OC, KC, and OS. These alternative histories can be jointly represented as a phylogenetic network (Figure <figr fid="F7">7</figr>).</p>
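         <p>The counting behind these six pairs can be sketched as follows. The one-letter labels follow the abbreviations used for the four OTUs in the text (K-12, O157:H7, CFT, and <it>S. flexneri</it>); the variable and function names are hypothetical.</p>

```python
from itertools import combinations

# One-letter OTU labels as used in the text: K-12, O157:H7, CFT, S. flexneri.
otus = ["K", "O", "C", "S"]

# All undirected pairs of OTUs: the candidate donor-recipient pairs.
pairs = ["".join(p) for p in combinations(otus, 2)]


def complement(pair):
    """Return the pair formed by the two OTUs that are not in `pair`."""
    return "".join(o for o in otus if o not in pair)


# Each discordant pattern groups a pair containing K with its complement.
patterns = {p: complement(p) for p in pairs if "K" in p}

print(pairs)
print(patterns)
```

         <p>Each of the three patterns is therefore compatible with two of the six pairs, which is why a significant cluster alone does not identify the lineages involved.</p>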
         <fig id="F7">
            <title>
               <p>Figure 7</p>
            </title>
            <caption>
               <p>Percentage of SNDs supporting each of three topologies in a phylogenetic network for six <it>E. coli </it>genomes (four OTUs)</p>
            </caption>
            <text>
               <p>Percentage of SNDs supporting each of three topologies in a phylogenetic network for six <it>E. coli </it>genomes (four OTUs). Black lines describe the 'species' topology. Green, blue, and orange lines indicate the alternative pairings of sister taxa that result from KS, KO, and KC recombinations, respectively. Also shown is the percentage of SNDs supporting each bipartition in Table 1.</p>
            </text>
            <graphic file="gb-2006-7-5-r44-7"/>
         </fig>
         <fig id="F8">
            <title>
               <p>Figure 8</p>
            </title>
            <caption>
               <p>The location of all SNDs in a 5 kb region</p>
            </caption>
            <text>
               <p>The location of all SNDs in a 5 kb region. In clusters demarcated by colored lines, note the corresponding absence of two more common types of SNDs. Three diamonds in lighter shades of blue, green, and red are compatible tri-partitions (see Additional data file 1). Colored lines demarcate regions where the absence of lineage-specific SNDs is offset by an increase in the corresponding recombinant pattern (for example, in <it>yiaA</it>, no K-12 or <it>S. flexneri </it>only SNDs).</p>
            </text>
            <graphic file="gb-2006-7-5-r44-8"/>
         </fig>
         <p>For example, a high scoring KC segment indicates that the donor and recipient lineages are either K-12 and CFT, or O157:H7 and <it>S. flexneri</it>. Exactly which pair of lineages is involved in the transfer can sometimes be determined by examining the joint distribution of all seven SND patterns. Recombinant activity in <it>glyS </it>and the four genes to its right is illustrated in Figure <figr fid="F8">8</figr>.</p>
         <p>The colored intervals in Figure <figr fid="F8">8</figr> share a common feature: the presence of topologically informative SNDs is accompanied by the absence of SNDs from two paired sister taxa. For example, no 'O157 only' or '<it>Shigella </it>only' SNDs are present in the KC/OS interval inside <it>glyS</it>, strongly suggesting that the O157:H7 and <it>S. flexneri </it>lineages were involved in the transfer. The other two intervals coincide with gene boundaries. When viewed in isolation, the genes <it>yiaA </it>and <it>yiaH </it>appear to be reasonable candidates for recombination. Yet only the KC recombinant inside the <it>glyS </it>gene is detectable by our whole genome significance thresholds.</p>
         <p>Sequence divergence can reduce the likelihood that homologous recombination occurs between orthologous genes, but does not address the underlying mechanisms that lead to divergence in the presence of rampant recombination. The restriction of different lineages of bacteria to distinct niches could act to prevent gene flow, but in the case of <it>E. coli </it>and <it>Salmonella</it>, the niches overlap. The barriers to exchange might also reflect more active exclusion of foreign DNA by mechanisms such as restriction enzyme expression. Perhaps the most appealing explanation for the phenomenon would invoke the activity of bacteriophages, transposons and conjugation-promoting elements as the key determinants of recombinational potential between taxa. Given the propensity of these mobile elements to participate in genetic exchange within species and their often narrow host ranges, we might expect that they promote recombination within a species but cannot transfer to more diverse organisms. The lack of extensive recombination of orthologous sequences between species may result from a competition between bacteria and phage that can activate rapid evolution of barriers to phage infection. Our estimate for a higher rate of homologous recombination among <it>E. coli </it>underscores the discrepancy between rates of intraspecies recombination, which appear to be quite common, and rates of recombination of orthologous genes between species such as <it>E. coli </it>and <it>Salmonella</it>, which appear to be much less frequent <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Earlier comparisons of different <it>E. coli </it>strains <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B11">11</abbr><abbr bid="B14">14</abbr><abbr bid="B50">50</abbr></abbrgrp> found recombination among several distinct sets of genes. The affected genes in these studies were not randomly selected and may not have been representative of the shared gene complement. Although our method surveys all genes, the genomes we compared are heavily skewed towards human pathogens. As additional <it>E. coli </it>strains are sequenced, the role of homologous recombination in bacterial genome evolution will become clearer, and may force reassessment of traditional methods for describing relationships among bacterial taxa <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B51">51</abbr></abbrgrp>.</p>
         <p>Our analytical methods are straightforward here because, for four OTUs, the number of unrooted topologies is the same as the number of topologically informative bipartitions. This correspondence breaks down rapidly as more operational taxonomic units are added. Sometimes going from four OTUs to five requires a new analytic procedure (for example, see <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>). We leave the challenging problem of extension to more taxa for future work.</p>
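         <p>The mismatch can be illustrated with standard counting formulas: the number of unrooted binary topologies on <it>n </it>labelled taxa is (2<it>n </it>- 5)!!, whereas the number of bipartitions with at least two taxa on each side is 2<sup><it>n</it>-1 </sup>- <it>n </it>- 1. The short sketch below tabulates both; the function names are hypothetical.</p>

```python
def unrooted_topologies(n):
    """Number of unrooted binary trees on n labelled taxa: (2n - 5)!!, n >= 3."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


def informative_bipartitions(n):
    """Number of splits of n taxa with at least two taxa on each side."""
    return 2 ** (n - 1) - n - 1


for n in range(4, 8):
    print(n, unrooted_topologies(n), informative_bipartitions(n))
```

         <p>The two counts agree only at four taxa (three each); at five taxa there are already 15 topologies but only 10 informative bipartitions.</p>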
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We demonstrate that the rate of intraspecies recombination in <it>E. coli </it>is much higher than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p>The Mauve alignment tool produces an output file containing separate alignments for each locally collinear block. Concatenation of LCBs results in a G &#215; M matrix of nucleotides and gap symbols, where G is the number of genomes and M is the length of gapped alignments across all blocks. Each matrix column represents one site in the consolidated alignment. Restricting attention to columns containing at least one nucleotide difference but no gaps results in a G &#215; M' sub-matrix &#916; composed solely of single nucleotide differences. Automated screening of the Mauve alignment (Figure <figr fid="F1">1</figr>) filtered out SNDs in regions of poor alignment quality, resulting in a &#916; with dimension 6 by 130,008 (see Appendix 4 in Additional data file 1 for protocol employed).</p>
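         <p>The column filter described above can be sketched as follows, assuming the concatenated alignment is held as one string per genome and that gaps are written as '-'. This is an illustration only: it omits the quality screening step, does not parse Mauve output, and all names are hypothetical.</p>

```python
GAP = "-"  # assumed gap symbol


def snd_columns(alignment):
    """Return (position, column) pairs for gap-free columns with a difference.

    `alignment` is a list of equal-length strings, one row per genome.
    """
    kept = []
    for position, column in enumerate(zip(*alignment)):
        if GAP in column:
            continue  # skip columns containing a gap symbol
        if len(set(column)) > 1:
            kept.append((position, "".join(column)))
    return kept


# Toy alignment of four sequences (not real data).
toy_alignment = [
    "ACGT-A",
    "ACGTTA",
    "ATGTTA",
    "ACGTTC",
]
print(snd_columns(toy_alignment))
```

         <p>Here the column at index 4 is dropped because it contains a gap, and only the columns at indices 1 and 5 are retained as SNDs.</p>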
         <p>Numerous scoring schemes have been devised to identify and assess the statistical significance of molecular sequence features on a genomic scale <abbrgrp><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>. One general approach calculates average scores within a sliding window (for example, <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>). We use an equally versatile method that computes cumulative scores based on a score function, evaluated at each column of &#916; (see <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> for other applications).</p>
         <p>Let &#926; = {<it>KS</it>, <it>KC</it>, <it>KO</it>} represent the three discordant SND patterns in Table <tblr tid="T1">1</tblr>, and let &#968;<sub>&#958; </sub>be the unrooted topology compatible with pattern &#958; &#8712; &#926;. We define three complementary score functions on SNDs to filter conflicting phylogenetic signals:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i1.gif"/>
         </p>
         <p>where <it>s </it>is a SND and &#966;(<it>s</it>) is the corresponding partition pattern in Table <tblr tid="T1">1</tblr>, and <it>D </it>= 13. For a given &#958; &#8712; &#926;, the cumulative score at the <it>nth </it>column in &#916; is the partial sum:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i2.gif"/>
         </p>
         <p>These score functions share a key characteristic of alignment scoring schemes; both generate high scoring segments that identify regions of interest. In the case of alignments, a high score segment represents a likely sequence homology. A significant difference between our analysis and sequence alignment is that substitution matrices are empirically derived from a test set (for example, PAM or BLOSUM). Here, <it>D </it>is not a parameter in an underlying stochastic model of evolution, but rather a tuning parameter in a diagnostic specifically designed to detect recombination. The value <it>D </it>= 13 was inspired by the observation that the most frequent topologically informative pattern, KS, has an observed frequency of 7.6%, approximately the reciprocal of 13. Alternative integer values were tried and rejected.</p>
         <p>Score functions generate high scoring segments whenever they encounter a cluster of SND patterns supporting one topology but are discordant with other choices. For a given topology &#968;<sub>&#958;</sub>, we define <it>Score</it><sub>&#958;</sub>(&#951;) to take on positive values when pattern &#951; is &#958; and negative values otherwise (&#951; &#8800; &#958;). As discordant patterns are antithetical to one another, their weights should be equal to but opposite from the one being scanned. Neutral SND patterns are not individually disruptive to the underlying signal, but in aggregate they degrade the signal. These non-informative patterns are down-weighted and made integer-valued as in substitution matrices.</p>
         <p>Hence, a large local score - the equivalent of a high scoring segment - is evidence for recombination between two of the lineages paired by &#958; (for example, &#958; = <it>KS </it>associates K-12 with <it>S. flexneri </it>and O157:H7 with CFT).</p>
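         <p>As an illustration of the scoring scheme described above, the following minimal Python sketch assigns increments to a short, made-up sequence of SND patterns and accumulates them. The increments (+<it>D</it> for the scanned pattern, -<it>D</it> for the two discordant informative patterns, -1 for neutral patterns) are read off the exponents of the moment generating function given later in this section. The pattern labels, the function names and the toy input are assumptions introduced for illustration; the original analysis was carried out in R and this is not the authors' code.</p>

```python
from itertools import accumulate

D = 13  # tuning parameter from the text

# Assumed labels for the three topologically informative patterns.
INFORMATIVE = ("KO", "KS", "OS")


def score(xi, pattern):
    """Increment contributed by one SND pattern when scanning for topology xi."""
    if pattern == xi:
        return D          # pattern supports the scanned topology
    if pattern in INFORMATIVE:
        return -D         # discordant informative pattern
    return -1             # neutral pattern, down-weighted


def cumulative_scores(xi, patterns):
    """Partial sums of the score over SNDs in their alignment order."""
    return list(accumulate(score(xi, p) for p in patterns))


# Hypothetical toy input, not data from the paper.
snds = ["other", "KS", "KS", "other", "KO", "KS"]
print(cumulative_scores("KS", snds))
```

         <p>For the toy input, the cumulative score rises over the two adjacent KS patterns and drops at the discordant KO pattern, which is the behavior that produces high scoring segments.</p>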
         <p>Random walk plots 'connect the dots' between partial sums that are computed from SNDs as they occur in &#916;. By contrast, random walks are translation invariant stochastic processes governed by the relative frequencies in &#916;, irrespective of order. We augment the random walk transition probabilities with an additional 'terminator' state. Terminators break a global alignment into several smaller sub-alignments, and are used to represent alignment fragmentation caused by 'large' gaps (>15 bp in one lineage), spurious alignments, or LCB boundaries (Figure <figr fid="F1">1</figr>). Accordingly, for each &#958; &#8712; &#926;, random walk increments are distributed according to the following probabilities:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i3.gif"/>
         </p>
         <p>where <it>D </it>= 13, &#960;<sub><it>KO </it></sub>= 0.048, &#960;<sub><it>KS </it></sub>= 0.076, &#960;<sub><it>OS </it></sub>= 0.045, &#960;<sub><it>other </it></sub>= 0.826, &#960;<sub><it>break </it></sub>= 0.005 and <graphic file="gb-2006-7-5-r44-i4.gif"/></p>
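         <p>The probabilities listed above can be used to check the drift of each walk and to draw increments. The sketch below is a hedged Python illustration: the terminator weight of -100,000 is taken from the exponent that appears in the moment generating function later in this section, and the dictionary layout, function names and random seed are assumptions made for the example rather than details of the original R analysis.</p>

```python
import random

D = 13
BREAK = -100_000  # terminator weight, assumed from the mgf exponent in the text

# State probabilities as listed in the text.
PI = {"KO": 0.048, "KS": 0.076, "OS": 0.045, "other": 0.826, "break": 0.005}


def increment(xi, state):
    """Increment of the walk for topology xi when the given state occurs."""
    if state == "break":
        return BREAK
    if state == "other":
        return -1
    return D if state == xi else -D


def expected_increment(xi):
    """Mean step of the walk; a negative value means the walk drifts downward."""
    return sum(p * increment(xi, s) for s, p in PI.items())


def sample_increments(xi, n, seed=0):
    """Draw n independent increments according to the probabilities in PI."""
    rng = random.Random(seed)
    states = rng.choices(list(PI), weights=list(PI.values()), k=n)
    return [increment(xi, s) for s in states]


for xi in ("KO", "KS", "OS"):
    print(xi, expected_increment(xi))
```

         <p>Under these assumptions the mean increment is negative for every choice of &#958;, even if the terminator state is ignored, because the neutral patterns dominate.</p>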
         <p>Since the expected value <it>E</it>(<it>X</it><sup>&#958;</sup>) &lt; 0, &#8704;&#958;, sums of these identically distributed variables generate transient random walks. Random stopping times, defined recursively by:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i5.gif"/>
         </p>
         <p>form a strictly decreasing set of ladder points. Though <it>S</it><sub><it>k </it></sub>depends on &#958;, we suppress this dependence for ease of exposition. The horizontal distances between consecutive ladder points, &#964;<sub><it>k</it>+1 </sub>- &#964;<sub><it>k</it></sub>, are called ladder epochs. The local record height (LRH) of the <it>k</it>th epoch is defined by:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i6.gif"/>
         </p>
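         <p>A minimal Python sketch of these quantities follows. Because the defining equations appear only as images here, the sketch assumes the standard definitions: the walk starts at zero, a ladder point is an index at which the partial sum falls strictly below all earlier values, and the local record height of an epoch is the largest rise of the partial sum above the value at the ladder point that opens the epoch. The function names and the toy increments are illustrative assumptions.</p>

```python
def ladder_points(increments):
    """Indices (starting with 0) at which the partial sum sets a new strict minimum."""
    taus = [0]
    total, record_low = 0, 0
    for n, x in enumerate(increments, start=1):
        total += x
        if record_low > total:
            record_low = total
            taus.append(n)
    return taus


def local_record_heights(increments):
    """Largest rise above the opening ladder point, one value per ladder epoch."""
    heights = []
    total, record_low, peak = 0, 0, 0
    for x in increments:
        total += x
        if record_low > total:
            heights.append(peak)          # a new ladder point closes the epoch
            record_low, peak = total, 0
        else:
            peak = max(peak, total - record_low)
    heights.append(peak)                  # last, possibly unfinished, epoch
    return heights


# Hypothetical increments; partial sums are -1, 12, 25, 24, 11, -2, -3, -4.
steps = [-1, 13, 13, -1, -13, -13, -1, -1]
print(ladder_points(steps))
print(max(local_record_heights(steps)))
```

         <p>Ladder epochs are the differences between consecutive entries of the list returned by <it>ladder_points</it>, and the maximum of the local record heights is the statistic whose distribution is considered next.</p>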
         <p>Ladder epochs measure the size of a high scoring segment in SND units rather than base pairs (chain length M' versus M). The number of ladder epochs in a random walk of size <it>N </it>is denoted by &#923;(<it>N</it>). The distribution of the maximum value in a sequence of local record heights is an extreme value distribution (EVD) with parameterization:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i7.gif"/>
         </p>
         <p>Here &#956; is the positive solution of an equation involving the moment generating function:</p>
         <p>
            <graphic file="gb-2006-7-5-r44-i8.gif"/>
         </p>
         <p>The value of &#956; is solved for numerically. For &#968;<sub><it>KC</it></sub>, the equation:</p>
         <p><it>mgf</it><sub><it>KC</it></sub>(&#956;) = 0.045<it>e</it><sup>13&#956; </sup>+ 0.124<it>e</it><sup>-13&#956; </sup>+ 0.826<it>e</it><sup>-&#956; </sup>+ 0.005<it>e</it><sup>-100,000&#956; </sup>= 1</p>
         <p>has a positive solution at &#956; = 0.1354 (&#956; = 0 is a trivial solution). The value of <it>K </it>can be computed as a rapidly converging infinite sum (see appendix of <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>). We chose instead to simulate 2,000 random walks of size <it>N </it>= 10,000 using the statistical package R <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. The largest local record height attained over the course of each simulation is saved. The functional form of the EVD (equation 1) is then fitted to a probability histogram of the 2,000 stored maxima. The estimated values of <it>K </it>and &#923; are combined with <it>N </it>= M' to adjust for the actual alignment size (M' = 129,000 after excluding the atypical region) in each EVD. The densities of the three EVDs are plotted in Figure <figr fid="F9">9</figr>.</p>
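         <p>The numerical solution for &#956; can be sketched with a simple bisection on the equation for &#968;<sub><it>KC</it></sub> given above. This is an illustrative Python sketch, not the R code used in the analysis; the bracketing interval and tolerance are assumptions. Because the coefficients printed in the equation are rounded to three decimals, the root obtained this way need not agree with the reported value to every digit.</p>

```python
import math


def mgf_kc(mu):
    """Moment generating function for topology KC, coefficients as printed in the text."""
    return (0.045 * math.exp(13 * mu)
            + 0.124 * math.exp(-13 * mu)
            + 0.826 * math.exp(-mu)
            + 0.005 * math.exp(-100_000 * mu))


def positive_root(f, lo=0.01, hi=1.0, tol=1e-10):
    """Bisection for f(mu) = 1, assuming f(lo) is below 1 and f(hi) is above 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


mu = positive_root(mgf_kc)
print(mu, mgf_kc(mu))
```

         <p>The lower bracket is placed slightly above zero so that the search does not return the trivial solution &#956; = 0; the negative mean increment guarantees that the function dips below 1 just to the right of zero before rising again.</p>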
         <fig id="F9">
            <title>
               <p>Figure 9</p>
            </title>
            <caption>
               <p>Statistical justification of threshold values - 100, 100, and 170 for topologies KO, KC, and KS, respectively - used to identify recombination events</p>
            </caption>
            <text>
               <p>Statistical justification of threshold values - 100, 100, and 170 for topologies KO, KC, and KS, respectively - used to identify recombination events. Values on the x-axis are maximal local scores. EVD probability densities for the maximum maximal local score attained by random walks of length M' appear as bell-shaped curves with a pronounced skew to the right. Threshold values, demarcated by vertical lines, correspond to conservative significance levels (&#945; = 0.05) for these distributions.</p>
            </text>
            <graphic file="gb-2006-7-5-r44-9"/>
         </fig>
         <p>Ladder points, ladder epochs, and local record heights are easily computed with a few simple R commands. Finding minimal significant clusters - a smallest possible cluster of SNDs with a significant score - is more challenging. A na&#239;ve approach takes each SND within a high scoring segment as the start of some local score, then iteratively adds successive terms to local scores in parallel until one of the sums exceeds the threshold. The SNDs producing that sum constitute the first MSC. The process continues on the remaining sums to seek out additional, non-overlapping MSCs. The algorithm is <it>O</it>(<it>n</it><sup>2</sup>) in the number of SNDs. Such a brute force approach works here because alignment gaps split the problem into 186 small pieces, the largest of which contains fewer than 700 SNDs.</p>
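         <p>The brute force search for minimal significant clusters can be sketched as follows. This Python sketch is one possible reading of the description above, not the authors' R implementation: it keeps one running sum per candidate start position, stops at the first SND where any sum exceeds the threshold, reports the shortest such run, and restarts after it. The function name, the tie-breaking rule (latest start wins) and the toy scores are assumptions.</p>

```python
def minimal_significant_clusters(scores, threshold):
    """Return non-overlapping (start, end) index pairs, end inclusive.

    scores holds the per-SND increments inside one high scoring segment.
    The nested work per SND makes the search quadratic in the number of SNDs.
    """
    clusters = []
    first = 0      # earliest start position still available
    sums = []      # sums[i] is the score of SNDs first + i up to the current SND
    for j, x in enumerate(scores):
        sums = [s + x for s in sums] + [x]
        hits = [i for i, s in enumerate(sums) if s > threshold]
        if hits:
            start = first + hits[-1]   # latest start gives the shortest cluster
            clusters.append((start, j))
            first, sums = j + 1, []    # continue after the reported cluster
    return clusters


# Hypothetical increments and threshold, chosen only to keep the example small.
toy = [13, -1, 13, 13, -13, 13, 13, 13]
print(minimal_significant_clusters(toy, threshold=30))
```

         <p>In practice the threshold would be the value for the topology being scanned, for example those shown in Figure <figr fid="F9">9</figr>, and the function would be applied separately to each of the sub-alignments delimited by gaps.</p>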
         <sec>
            <st>
               <p>Accession numbers</p>
            </st>
            <p>Deposited accession numbers are: <it>Escherichia coli </it>CFT073 [GenBank:AE014075]; <it>Escherichia coli </it>K-12 MG1655 [GenBank:U00096]; <it>Escherichia coli </it>O157:H7 RIMD0509952 (Sakai) [GenBank:BA000007]; <it>Escherichia coli </it>O157:H7 EDL933 [GenBank:AE005174]; <it>Shigella flexneri </it>2a str. 2457T [GenBank:AE014073]; <it>Shigella flexneri </it>2a str. 301 [GenBank:AE005674].</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a PDF document containing five appendices. Appendix 1 shows the distribution of rare SNDs, supplementing Table <tblr tid="T1">1</tblr>. Appendix 2 shows the comparative analysis of the large atypical region. Appendix 3 shows genes uniquely present in 13 &#947;-proteobacteria that have undergone homologous recombination between the four lineages of <it>E. coli</it>. Appendix 4 contains the screening protocols used to delete erroneous alignment of non-homologous sequence. Appendix 5 shows the local deviation in the rate of mutation among the six genomes. Additional data file <supplr sid="S2">2</supplr> is a spreadsheet enumerating all HSS, MSC, and affected genes in this analysis. Additional data file <supplr sid="S3">3</supplr> is a text file of all 130,008 SNDs by pattern and location in K-12 MG1655 coordinates.</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <caption>
               <p>Five appendices</p>
            </caption>
            <text>
               <p>Five appendices.</p>
            </text>
            <file name="gb-2006-7-5-r44-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <caption>
               <p>Enumeration of all HSS, MSC, and affected genes in this analysis</p>
            </caption>
            <text>
               <p>Enumeration of all HSS, MSC, and affected genes in this analysis.</p>
            </text>
            <file name="gb-2006-7-5-r44-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional File 3</p>
            </title>
            <caption>
               <p>All 130,008 SNDs by pattern and location in K-12 MG1655 coordinates</p>
            </caption>
            <text>
               <p>All 130,008 SNDs by pattern and location in K-12 MG1655 coordinates.</p>
            </text>
            <file name="gb-2006-7-5-r44-S3.txt">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors wish to thank Professor Frederick R Blattner for his advice, and two anonymous referees for keeping us honest. Funding for this research was provided by NIH Grant GM62994-02. AED was supported in part by NLM Training Grant 5T15M007359-04.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Recombination and population structure in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Milkman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1997</pubdate>
            <volume>146</volume>
            <fpage>745</fpage>
            <lpage>750</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9215884</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Phylogenetics and the cohesion of bacterial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>301</volume>
            <fpage>829</fpage>
            <lpage>832</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1086568</pubid>
                  <pubid idtype="pmpid" link="fulltext">12907801</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The relative contributions of recombination and mutation to the divergence of clones of <it>Neisseria meningitidis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Maiden</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Achtman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1999</pubdate>
            <volume>16</volume>
            <fpage>1496</fpage>
            <lpage>1502</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10555280</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The relative contributions of recombination and point mutation to the diversification of bacterial clones.</p>
            </title>
            <aug>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Hanage</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>4</volume>
            <fpage>602</fpage>
            <lpage>606</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1369-5274(00)00257-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">11587939</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Prokaryotic evolution in light of gene transfer.</p>
            </title>
            <aug>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>2226</fpage>
            <lpage>2238</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12446813</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Lateral gene transfer: when will adolescence end?</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Hendrickson</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>50</volume>
            <fpage>739</fpage>
            <lpage>749</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2003.03778.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">14617137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-proteobacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2003</pubdate>
            <volume>1</volume>
            <fpage>E19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193605</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975657</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0000019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Examining bacterial species under the specter of gene transfer and exchange.</p>
            </title>
            <aug>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <issue>Suppl 1</issue>
            <fpage>6595</fpage>
            <lpage>6599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1131874</pubid>
                  <pubid idtype="pmpid" link="fulltext">15851673</pubid>
                  <pubid idtype="doi">10.1073/pnas.0502035102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The cobweb of life revealed by genome-scale estimates of horizontal gene transfer.</p>
            </title>
            <aug>
               <au>
                  <snm>Ge</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>L-S</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e316</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16122348</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030316</pubid>
                  <pubid idtype="pmcid">1233574</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Highways of gene sharing in prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Beiko</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Harlow</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Ragan</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>14332</fpage>
            <lpage>14337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1242295</pubid>
                  <pubid idtype="pmpid" link="fulltext">16176988</pubid>
                  <pubid idtype="doi">10.1073/pnas.0504068102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Recombination in <it>Escherichia coli</it> and the definition of biological species.</p>
            </title>
            <aug>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1991</pubdate>
            <volume>173</volume>
            <fpage>7257</fpage>
            <lpage>7268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">209233</pubid>
                  <pubid idtype="pmpid">1938920</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Interspecies recombination between the penA genes of <it>Neisseria meningitidis</it> and commensal <it>Neisseria</it> species during the emergence of penicillin resistance in <it>N. meningitidis</it>: natural events and laboratory simulation.</p>
            </title>
            <aug>
               <au>
                  <snm>Bowler</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>QY</fnm>
               </au>
               <au>
                  <snm>Riou</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1994</pubdate>
            <volume>176</volume>
            <fpage>333</fpage>
            <lpage>337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">205054</pubid>
                  <pubid idtype="pmpid">8288526</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Free recombination within <it>Helicobacter pylori</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Suerbaum</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Bapumia</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Morelli</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Kunstmann</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dyrek</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Achtman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>12619</fpage>
            <lpage>12624</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">22880</pubid>
                  <pubid idtype="pmpid" link="fulltext">9770535</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.21.12619</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Parallel evolution of virulence in pathogenic <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Reid</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Herbelin</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Bumbaugh</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Selander</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Whittam</snm>
                  <fnm>TS</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>64</fpage>
            <lpage>67</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35017546</pubid>
                  <pubid idtype="pmpid" link="fulltext">10894541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>ACT: the Artemis Comparison Tool.</p>
            </title>
            <aug>
               <au>
                  <snm>Carver</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Berriman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3422</fpage>
            <lpage>3423</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti553</pubid>
                  <pubid idtype="pmpid" link="fulltext">15976072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Versatile and open software for comparing large genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Phillippy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Smoot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shumway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Antonescu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395750</pubid>
                  <pubid idtype="pmpid" link="fulltext">14759262</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Mauve: multiple alignment of conserved genomic sequence with rearrangements.</p>
            </title>
            <aug>
               <au>
                  <snm>Darling</snm>
                  <fnm>ACE</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1394</fpage>
            <lpage>1403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">442156</pubid>
                  <pubid idtype="pmpid" link="fulltext">15231754</pubid>
                  <pubid idtype="doi">10.1101/gr.2289704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Genome sequence of enterohaemorrhagic <it>Escherichia coli</it> O157:H7.</p>
            </title>
            <aug>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Gregor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>HA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>529</fpage>
            <lpage>533</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054089</pubid>
                  <pubid idtype="pmpid" link="fulltext">11206551</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Complete genome sequence of a multiple drug resistant <it>Salmonella enterica</it> serovar Typhi CT18.</p>
            </title>
            <aug>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dougan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Pickard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wain</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MT</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>413</volume>
            <fpage>848</fpage>
            <lpage>852</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35101607</pubid>
                  <pubid idtype="pmpid" link="fulltext">11677608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genome analysis of multiple pathogenic isolates of <it>Streptococcus agalactiae</it>: Implications for the microbial "pan-genome".</p>
            </title>
            <aug>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Masignani</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cieslewicz</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Donati</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Medini</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Angiuoli</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Crabtree</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>13950</fpage>
            <lpage>13955</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1216834</pubid>
                  <pubid idtype="pmpid" link="fulltext">16172379</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506758102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Evidence of a large novel gene pool associated with prokaryotic genomic islands.</p>
            </title>
            <aug>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Ung</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Aeschliman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bryan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Finlay</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e62</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1285063</pubid>
                  <pubid idtype="pmpid" link="fulltext">16299586</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0010062</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Recombination in evolutionary genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Crandall</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
            </aug>
            <source>Annu Rev Genet</source>
            <pubdate>2002</pubdate>
            <volume>36</volume>
            <fpage>75</fpage>
            <lpage>97</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.36.040202.111115</pubid>
                  <pubid idtype="pmpid" link="fulltext">12429687</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Stepwise detection of recombination breakpoints in sequence alignments.</p>
            </title>
            <aug>
               <au>
                  <snm>Graham</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McNeney</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Seillier-Moiseiwitsch</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>589</fpage>
            <lpage>595</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti040</pubid>
                  <pubid idtype="pmpid" link="fulltext">15388518</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Statistical methods of DNA sequence analysis: detection of intragenic recombination or gene conversion.</p>
            </title>
            <aug>
               <au>
                  <snm>Stephens</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1985</pubdate>
            <volume>2</volume>
            <fpage>539</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3870876</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Detecting recombination from gene trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Maynard Smith</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <fpage>590</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9580989</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Genetic exchange and plasmid transfers in <it>Borrelia burgdorferi sensu stricto</it> revealed by three-way genome comparisons and multilocus sequence typing.</p>
            </title>
            <aug>
               <au>
                  <snm>Qiu</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Schutzer</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Bruno</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Attie</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Casjens</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Luft</snm>
                  <fnm>BJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>14150</fpage>
            <lpage>14155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">521097</pubid>
                  <pubid idtype="pmpid" link="fulltext">15375210</pubid>
                  <pubid idtype="doi">10.1073/pnas.0402745101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Statistical tests for detecting gene conversion.</p>
            </title>
            <aug>
               <au>
                  <snm>Sawyer</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1989</pubdate>
            <volume>6</volume>
            <fpage>526</fpage>
            <lpage>538</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2677599</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria.</p>
            </title>
            <aug>
               <au>
                  <snm>Worobey</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1425</fpage>
            <lpage>1434</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11470833</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A likelihood method for the detection of selection and recombination using nucleotide sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Grassly</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>239</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9066792</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Detecting recombination with MCMC.</p>
            </title>
            <aug>
               <au>
                  <snm>Husmeier</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McGuire</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>Suppl 1</issue>
            <fpage>S345</fpage>
            <lpage>S353</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169565</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>TOPAL 2.0: improved detection of mosaic sequences within multiple alignments.</p>
            </title>
            <aug>
               <au>
                  <snm>McGuire</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>130</fpage>
            <lpage>134</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.2.130</pubid>
                  <pubid idtype="pmpid" link="fulltext">10842734</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Dual multiple change-point model leads to more accurate recombination detection.</p>
            </title>
            <aug>
               <au>
                  <snm>Minin</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Dorman</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Fang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Suchard</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>3034</fpage>
            <lpage>3042</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti459</pubid>
                  <pubid idtype="pmpid" link="fulltext">15914546</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Evaluation of methods for detecting recombination from DNA sequences: computer simulations.</p>
            </title>
            <aug>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Crandall</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>13757</fpage>
            <lpage>13762</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">61114</pubid>
                  <pubid idtype="pmpid" link="fulltext">11717435</pubid>
                  <pubid idtype="doi">10.1073/pnas.241370698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Whole-genome analysis of photosynthetic prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Raymond</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Gerdes</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Blankenship</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>1616</fpage>
            <lpage>1620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1075558</pubid>
                  <pubid idtype="pmpid" link="fulltext">12446909</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>A hybrid clustering approach to recognition of protein families in 114 microbial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Harlow</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Ragan</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>45</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420232</pubid>
                  <pubid idtype="pmpid" link="fulltext">15115543</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-45</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Bayesian phylogenetic inference via Markov chain Monte Carlo methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Newton</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Larget</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1999</pubdate>
            <volume>55</volume>
            <fpage>1</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.0006-341X.1999.00001.x</pubid>
                  <pubid idtype="pmpid">11318142</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>MrBayes 3: Bayesian phylogenetic inference under mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1572</fpage>
            <lpage>1574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg180</pubid>
                  <pubid idtype="pmpid" link="fulltext">12912839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Basic local alignment search tool.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(05)80360-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <fpage>2264</fpage>
            <lpage>2268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">53667</pubid>
                  <pubid idtype="pmpid" link="fulltext">2315319</pubid>
                  <pubid idtype="doi">10.1073/pnas.87.6.2264</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The complete genome sequence of <it>Escherichia coli</it> K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Bloch</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rode</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>277</volume>
            <fpage>1453</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.277.5331.1453</pubid>
                  <pubid idtype="pmpid" link="fulltext">9278503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Complete genome sequence and comparative genomics of <it>Shigella flexneri</it> serotype 2a strain 2457T.</p>
            </title>
            <aug>
               <au>
                  <snm>Wei</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Venkatesan</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fournier</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Darling</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2003</pubdate>
            <volume>71</volume>
            <fpage>2775</fpage>
            <lpage>2786</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153260</pubid>
                  <pubid idtype="pmpid" link="fulltext">12704152</pubid>
                  <pubid idtype="doi">10.1128/IAI.71.5.2775-2786.2003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Genome sequence of <it>Shigella flexneri</it> 2a: insights into pathogenicity through comparison with genomes of <it>Escherichia coli</it> K12 and O157.</p>
            </title>
            <aug>
               <au>
                  <snm>Jin</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>4432</fpage>
            <lpage>4441</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137130</pubid>
                  <pubid idtype="pmpid" link="fulltext">12384590</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf566</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Complete genome sequence of enterohemorrhagic <it>Escherichia coli</it> O157:H7 and genomic comparison with a laboratory strain K-12.</p>
            </title>
            <aug>
               <au>
                  <snm>Hayashi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Makino</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ohnishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kurokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yokoyama</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Ohtsubo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nakayama</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Murata</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>DNA Res</source>
            <pubdate>2001</pubdate>
            <volume>8</volume>
            <fpage>11</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/dnares/8.1.11</pubid>
                  <pubid idtype="pmpid" link="fulltext">11258796</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Extensive mosaic structure revealed by the complete genome sequence of uropathogenic <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Welch</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Redford</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Roesch</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rasko</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Buckles</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Liou</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Boutin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hackett</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>17020</fpage>
            <lpage>17024</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">139262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12471157</pubid>
                  <pubid idtype="doi">10.1073/pnas.252529799</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Molecular evolution of the <it>Escherichia coli</it> chromosome. I. Analysis of structure and natural variation in a previously uncharacterized region between trp and tonB.</p>
            </title>
            <aug>
               <au>
                  <snm>Stoltzfus</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Leslie</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Milkman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1988</pubdate>
            <volume>120</volume>
            <fpage>345</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3058546</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>MultiFun, a multifunctional classification scheme for <it>Escherichia coli</it> K-12 gene products.</p>
            </title>
            <aug>
               <au>
                  <snm>Serres</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Microb Comp Genomics</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>205</fpage>
            <lpage>222</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11471834</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Glucose transporter mutants of <it>Escherichia coli</it> K-12 with changes in substrate recognition of IICB(Glc) and induction behavior of the ptsG gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Zeppenfeld</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Larisch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lengeler</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Jahreis</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2000</pubdate>
            <volume>182</volume>
            <fpage>4443</fpage>
            <lpage>4452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">94615</pubid>
                  <pubid idtype="pmpid" link="fulltext">10913077</pubid>
                  <pubid idtype="doi">10.1128/JB.182.16.4443-4452.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Evolution of mosaic operons by horizontal gene transfer and gene displacement <it>in situ</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Omelchenko</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Makarova</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Rogozin</snm>
                  <fnm>IB</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R55</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193655</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952534</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-9-r55</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Transduction, restriction and recombination patterns in <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>McKane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Milkman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1995</pubdate>
            <volume>139</volume>
            <fpage>35</fpage>
            <lpage>43</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7705636</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Clonal divergence in <it>Escherichia coli</it> as a result of recombination, not mutation.</p>
            </title>
            <aug>
               <au>
                  <snm>Guttman</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1994</pubdate>
            <volume>266</volume>
            <fpage>1380</fpage>
            <lpage>1383</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7973728</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Recombination and the population structures of bacterial pathogens.</p>
            </title>
            <aug>
               <au>
                  <snm>Feil</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Spratt</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>55</volume>
            <fpage>561</fpage>
            <lpage>590</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.micro.55.1.561</pubid>
                  <pubid idtype="pmpid" link="fulltext">11544367</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Visualization of the phylogenetic content of five genomes using dekapentagonal maps.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Hamel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Raymond</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R20</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395770</pubid>
                  <pubid idtype="pmpid" link="fulltext">15003123</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-3-r20</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Chance and statistical significance in protein and DNA sequence analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1992</pubdate>
            <volume>257</volume>
            <fpage>39</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1621093</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Statistical methods and insights for protein and DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brendel</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Biophys Chem</source>
            <pubdate>1991</pubdate>
            <volume>20</volume>
            <fpage>175</fpage>
            <lpage>203</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bb.20.060191.001135</pubid>
                  <pubid idtype="pmpid">1867715</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Asymmetric substitution patterns in the two DNA strands of bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Lobry</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>660</fpage>
            <lpage>665</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8676740</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Atypical regions in large genomic DNA sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Scherer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McPeek</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <fpage>7134</fpage>
            <lpage>7138</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">44353</pubid>
                  <pubid idtype="pmpid" link="fulltext">8041759</pubid>
                  <pubid idtype="doi">10.1073/pnas.91.15.7134</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>The R Project for Statistical Computing</p>
            </title>
            <url>http://www.r-project.org/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
