<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2007-8-2-402</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Correspondence</dochead>
		<bibl>
			<title>
				<p>Systematic overestimation of gene gain through false diagnosis of gene absence</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Zhaxybayeva</snm>
					<fnm>Olga</fnm>
					<insr iid="I1"/>
					<email>olgazh@dal.ca</email>
				</au>
				<au id="A2">
					<snm>Nesb&#248;</snm>
					<mi>L</mi>
					<fnm>Camilla</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A3">
					<snm>Doolittle</snm>
					<fnm>W Ford</fnm>
					<insr iid="I1"/>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Biochemistry and Molecular Biology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 1X5 Canada</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2007</pubdate>
			<volume>8</volume>
			<issue>2</issue>
			<fpage>402</fpage>
			<url>http://genomebiology.com/2007/8/2/402</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17328791</pubid><pubid idtype="doi">10.1186/gb-2007-8-2-402</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>26</day>
					<month>02</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>BioMed Central Ltd</collab>
		</cpyrt>
		<shorttitle>
			<p>Systematic overestimation of gene gain</p>
		</shorttitle>
		<shortabs>
			<p>Usual BLAST-based methods for assessing gene presence and absence lead to systematic overestimation of within-species gene gain by lateral transfer.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>The usual BLAST-based methods for assessing gene presence and absence lead to systematic overestimation of within-species gene gain by lateral transfer.</p>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p/>
			</st>
			<p>Genomes from different strains of the same bacterial species often differ substantially (up to 30%) in gene content <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. There are two general ways to account for such gene content variability ('patchy distribution') among closely related genomes: strain-specific loss of genes after divergence from a common species ancestor that contained the genes, and strain-specific gain of genes after divergence from an ancestor that lacked them. Gain might be effected through lateral gene transfer (LGT), duplication (paralog creation) or, much less likely, <it>de novo </it>creation. Several recent publications have attempted to assess rates of within-species gain and loss using parsimony-based approaches applied to gene presence/absence data, in the context of a reference strain phylogeny <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Similar parsimony-based approaches have also been taken for inferences of gene gain/loss at larger phylogenetic distances <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
			<p>In such analyses, a pattern like that shown in Figure <figr fid="F1">1a</figr> would be interpreted to indicate a single event of gain of a gene X not present in the species ancestor, after the separation of taxa 4 and 5. Explaining this distribution as the result of loss of a gene X initially present in the ancestor would, in contrast, require a minimum of four separate events, a seemingly less parsimonious scenario. However, reasoning by parsimony in such a situation requires difficult-to-test assumptions about the relative frequency of gain and loss events (that, for instance, losses are not four times more frequent than gains). Moreover, such reasoning is simply beside the point if we have some other sort of knowledge about the relevant processes that suggests we might be misled by appearances. Here we do know that gain (at least when it occurs by gene duplication or LGT) could be effectively instantaneous, but that loss will more commonly proceed gradually, through intermediates we might call pseudogenes and gene remnants (regions recognizable as gene-derived only by synteny and statistically significant sequence similarity to the parent gene). There is thus an inherent asymmetry between gain and loss both in terms of defining and of detecting them, and failure to recognize gene remnants will inevitably lead to mistaking a situation like that shown in Figure <figr fid="F1">1b</figr> (in which a gene present in the species' ancestor has deteriorated in all lineages but one) for the situation in Figure <figr fid="F1">1a</figr> (in which a gene absent from the ancestor has been gained in a single lineage). Our goal in the present analysis was to assess how often such mistakes might be made.</p>
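			<p>The parsimony argument above can be illustrated with a minimal sketch. It assumes a ladder-like five-taxon topology, (1,(2,(3,(4,5)))), with gene X scored as present only in taxon 5. The topology, the taxon labels and the function names are illustrative assumptions, not the reference tree or the method used in the analysis. Under a loss-only scenario (gene present in the ancestor), one loss is charged for every maximal clade in which the gene is scored as absent.</p>

```python
# Hypothetical five-taxon reference tree as nested tuples: (1,(2,(3,(4,5)))).
TREE = (1, (2, (3, (4, 5))))


def has_gene(tree, present):
    """Return True if any leaf below this node is scored as present."""
    if isinstance(tree, tuple):
        return any(has_gene(child, present) for child in tree)
    return tree in present


def min_losses(tree, present):
    """Minimum number of losses if the ancestor had the gene and no gains occur."""
    if not has_gene(tree, present):
        return 1  # the whole clade is explained by one loss on its stem branch
    if not isinstance(tree, tuple):
        return 0  # a leaf that still carries the gene
    return sum(min_losses(child, present) for child in tree)


if __name__ == "__main__":
    losses = min_losses(TREE, {5})
    print("loss-only scenario:", losses, "events; gain scenario: 1 event")
```

			<p>With equal weights, the single gain is preferred over the four losses in this sketch. If remnants of gene X were recognized in taxa 1 to 4 and scored as present, the same function would report no losses at all, and no within-species gain would need to be inferred.</p>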
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Illustration of parsimony inference from a gene presence/absence pattern and a reference tree topology</p>
				</caption>
				<text>
					<p>Illustration of parsimony inference from a gene presence/absence pattern and a reference tree topology. <b>(a,b) </b>Results of parsimonious inferences for the same gene family, with different criteria used to define presence/absence patterns. In (a) genes are divided into only two categories, present and absent, while in (b) the absent genes are further classified into gene remnants and genuinely absent.</p>
				</text>
				<graphic file="gb-2007-8-2-402-1"/>
			</fig>
			<p>Although prokaryotic genomes have traditionally been viewed as efficiently packed with functioning genes, and mutationally biased towards rapid deletion of dysfunctional regions <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, there are new indications that significant numbers of pseudogenes persist in some genomes <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. In addition, detailed analyses show that in reduced genomes such as those of <it>Rickettsia</it>, intergenic regions often represent decaying remnants of genes <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Some categorization more nuanced than 'presence' versus 'absence' might thus better capture genome history. But for gain-and-loss surveys of the sort cited, there may seem to be no alternative to the binary approach. A gene is considered 'present' if it is represented by an open reading frame (ORF) that shows significant sequence similarity to a query gene (with an arbitrarily chosen significance cutoff) and has a similar length; otherwise it is scored as 'absent'. We systematically screened groups of closely related genomes (see Additional data files 1-3) for gene-family presence/absence patterns using several common criteria. When potential gene remnants detectable by less stringent methods are included, the number of gene families for which events of gain or loss within a species might be inferred (because they are scored as present only in some strains) can drop by as much as 90% or as little as 7%, and by about 60% on average. The extent to which recognition of such remnants will decrease estimates of the rates of gain of genes by LGT and increase estimates of the gene content of species' ancestors will depend on how recognition affects inferred patterns of presence and absence as displayed on a phylogeny of the species' strains. Each gene family must be individually examined, and where there is frequent between-strain recombination, not only is strain phylogeny a problematic concept <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, but it will sometimes be the case that gene remnants are themselves acquired by LGT.</p>
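			<p>The binary scoring rule described above can be sketched as follows. This is an illustration only, not the procedure detailed in Additional data file 1: the record fields, the function name and the default E-value cutoff are hypothetical, and the only value taken from this paper is the 85% match-length requirement used for the stringent BLASTN criterion in Figure 2.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best BLAST hit of a query gene in a target genome (illustrative fields)."""
    query_length: int      # length of the query gene
    aligned_length: int    # length of the query covered by the alignment
    evalue: float          # BLAST expectation value of the hit


def is_present(hit: Optional[Hit], min_match_fraction: float, max_evalue: float = 1e-4) -> bool:
    """Score a gene as present (True) or absent (False) under one criterion.

    max_evalue is a hypothetical, arbitrarily chosen significance cutoff.
    """
    if hit is None:
        return False  # no hit at all: absent as far as BLAST can tell
    if hit.evalue > max_evalue:
        return False  # similarity not significant at the chosen cutoff
    return hit.aligned_length >= min_match_fraction * hit.query_length


if __name__ == "__main__":
    # A made-up remnant: significant similarity over one third of the query.
    remnant = Hit(query_length=900, aligned_length=300, evalue=1e-20)
    print("stringent:", is_present(remnant, min_match_fraction=0.85))
    print("relaxed:  ", is_present(remnant, min_match_fraction=0.0))
```

			<p>In this sketch the same remnant is scored as absent under the stringent criterion and as present under the relaxed one, which is the kind of state change that alters the inferred presence/absence pattern.</p>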
			<p>We have assessed the impact of more complete recognition of gene remnants in the simplest cases, those species for which only three genomes are available. We calculated the number of presence/absence patterns that change under different match-length requirements for the eight such groups in our dataset (Figure <figr fid="F2">2</figr>). Any change in any of the possible presence/absence patterns as a consequence of altered BLAST (Basic Local Alignment Search Tool) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> criteria leads to a change in gain/loss inference on a three-taxon tree, and in all cases to a change in inferred ancestral state. Such numbers are not negligible in comparison with the total number of inferred presence/absence patterns (see the last row in Figure <figr fid="F2">2</figr>).</p>
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>The analysis of patchily distributed gene families that change their state (present or absent) in different genomes under two different selection criteria for gene families</p>
				</caption>
				<text>
					<p>The analysis of patchily distributed gene families that change their state (present or absent) in different genomes under two different selection criteria for gene families. Eight groups of three genomes each were analyzed. In one selection scheme, a match-length requirement of 85% in BLASTN was imposed (stringent selection), while in the other there was no match-length requirement in BLASTN (relaxed selection). Corresponding gene families constructed under the two criteria were compared and classified into all possible types of gene families (total 3<sup>3 </sup>= 27). Of these, only those types of gene families (12) where at least one gene is present under both criteria, and where at least one gene changes its state between the two criteria, are shown. They are coded as filled circles (present under both criteria), empty circles (absent under both criteria) and half-filled circles (absent under the stringent criterion and present under the relaxed criterion). Numbers in the figure indicate the number of patchily distributed gene families that change their state between the two selection criteria. The last row is the total number of gene families for which differences in history might be incorrectly inferred, expressed as a percentage of the total gene families detected as present in one or two, but not three, genomes in a genome group. The total number of gene families used in the calculation is listed in the second table in Additional data file 2. Branches on the three-taxon tree are denoted as <it>a</it>, <it>b</it>, <it>c </it>and <it>d</it>. G, gain; L, loss; A, ambiguous (both gain and loss are equally parsimonious); C, core (that is, present in all three genomes). The subscript refers to the branch on which the event is inferred. For the list of genomes in each group see Additional data file 3.</p>
				</text>
				<graphic file="gb-2007-8-2-402-2"/>
			</fig>
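			<p>The counts of gene-family types given in the legend of Figure 2 (27 possible types, of which 12 are shown) can be checked by direct enumeration. The sketch below is only a bookkeeping aid; the state labels and function names are illustrative and do not come from the analysis itself.</p>

```python
from itertools import product

# Three possible states of a gene family in one genome (labels are illustrative):
#   "both"    present under stringent and relaxed criteria (filled circle)
#   "relaxed" absent under stringent, present under relaxed (half-filled circle)
#   "neither" absent under both criteria (empty circle)
STATES = ("both", "relaxed", "neither")


def all_types(n_genomes=3):
    """Every assignment of one state to each genome in a group."""
    return list(product(STATES, repeat=n_genomes))


def shown_types(n_genomes=3):
    """Types with a gene present under both criteria and a gene that changes state."""
    return [t for t in all_types(n_genomes) if "both" in t and "relaxed" in t]


def becomes_core(pattern):
    """True if the family is present in every genome once remnants are counted."""
    return "neither" not in pattern


if __name__ == "__main__":
    shown = shown_types()
    print(len(all_types()))                     # 27
    print(len(shown))                           # 12
    print(sum(becomes_core(t) for t in shown))  # 6
```

			<p>Of the 12 types, 6 contain no genome in which the gene is absent under both criteria, so under the relaxed criterion they would be scored as core rather than as patchily distributed.</p>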
			<p>Therefore, without agreed-upon definitions of presence/absence and reliable methods of detection, quantitation of rates of within-species gene gain has questionable meaning. It is both a practical concern and of theoretical interest that we really do not have a definition for gene loss. It is not clear where - along the line from the appearance of the first subtly deleterious regulatory or missense mutation to the deletion of the last nucleotide - we would agree to declare a gene to be lost. Parsimony-based inferences depend on how we make that declaration, but most quantitative treatments of gene loss in evolution avoid this question altogether. Moreover, in recombinogenic species, the possibility of exchange of remnants of inactivated genes between lineages means that there will be additional difficulties in reconstructing the decay process for individual genes. Indeed, in highly recombinogenic groups such as <it>Neisseria</it>, where homologous recombination, not mutation, is the principal source of between-strain sequence variation <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, it should seldom be possible to reconstruct the loss of an individual gene as a linear process of decay. These problems are of practical concern, as inferences about gain and loss dominate discussion of the evolution of pathogenicity and environmental adaptation within species. They are also of theoretical interest, bearing on the use of parsimony in evolutionary reconstruction.</p>
			<p>As a matter of good practice, no claim that strains of the same species differ in gene content should be based on BLAST results alone, as differences in annotation abound and even BLASTing a single genome against itself does not recover all its annotated ORFs. No BLASTP+BLASTN-based estimate of the number of genes that a genome must have received by LGT (because they are absent from sister lineages in the same species) should be accepted without recognition that it is probably too high, possibly by several-fold. Species seem to differ in the extent to which such estimates are sensitive to BLAST parameters, and it is unlikely that optimal parameters - could these somehow be established - would be the same for all species groups. Ideally, all gene families would be examined for even highly decayed remnants.</p>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available online with this paper. Additional data file <supplr sid="S1">1</supplr> contains Materials and methods for the analyses performed. Additional data file <supplr sid="S2">2</supplr> describes in detail the comparison of different BLAST-based criteria for presence/absence detection. Additional data file <supplr sid="S3">3</supplr> is a table listing the composition of the analyzed genome groups.</p>
			<suppl id="S1">
				<title>
					<p>Additional data file 1</p>
				</title>
				<caption>
					<p>Materials and methods</p>
				</caption>
				<text>
					<p>Materials and methods.</p>
				</text>
				<file name="gb-2007-8-2-402-S1.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional data file 2</p>
				</title>
				<caption>
					<p>Comparison of different BLAST criteria to detect gene presence/absence</p>
				</caption>
				<text>
					<p>Comparison of different BLAST criteria to detect gene presence/absence.</p>
				</text>
				<file name="gb-2007-8-2-402-S2.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S3">
				<title>
					<p>Additional data file 3</p>
				</title>
				<caption>
					<p>Thirty-two groups of genomes with average nucleotide identity (ANI) &#8805; 94% within each group</p>
				</caption>
				<text>
					<p>Thirty-two groups of genomes with average nucleotide identity (ANI) &#8805; 94% within each group.</p>
				</text>
				<file name="gb-2007-8-2-402-S3.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>This work was supported through CIHR (MOP-4467) and Genome Atlantic (Genome Canada) grants to W.F.D. O.Z. is supported through a CIHR Postdoctoral Fellowship and is an honorary Killam Postdoctoral Fellow at Dalhousie University. O.Z., C.L.N. and W.F.D. designed the study. O.Z. carried out all analyses. O.Z. and W.F.D. wrote the manuscript.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Extensive mosaic structure revealed by the complete genome sequence of uropathogenic <it>Escherichia coli</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Welch</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Burland</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Plunkett</snm>
						<fnm>G</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Redford</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Roesch</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Rasko</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Buckles</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Liou</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Boutin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Hackett</snm>
						<fnm>J</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>17020</fpage>
				<lpage>17024</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">139262</pubid>
						<pubid idtype="pmpid" link="fulltext">12471157</pubid>
						<pubid idtype="doi">10.1073/pnas.252529799</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>The genome sequence of <it>Bacillus cereus </it>ATCC 10987 reveals metabolic adaptations and a large plasmid related to <it>Bacillus anthracis </it>pXO1.</p>
				</title>
				<aug>
					<au>
						<snm>Rasko</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Ravel</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Okstad</snm>
						<fnm>OA</fnm>
					</au>
					<au>
						<snm>Helgason</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Cer</snm>
						<fnm>RZ</fnm>
					</au>
					<au>
						<snm>Jiang</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Shores</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Fouts</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Tourasse</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>Angiuoli</snm>
						<fnm>SV</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>977</fpage>
				<lpage>988</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">373394</pubid>
						<pubid idtype="pmpid" link="fulltext">14960714</pubid>
						<pubid idtype="doi">10.1093/nar/gkh258</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Complete genome sequence of the plant commensal <it>Pseudomonas fluorescens </it>Pf-5.</p>
				</title>
				<aug>
					<au>
						<snm>Paulsen</snm>
						<fnm>IT</fnm>
					</au>
					<au>
						<snm>Press</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Ravel</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kobayashi</snm>
						<fnm>DY</fnm>
					</au>
					<au>
						<snm>Myers</snm>
						<fnm>GSA</fnm>
					</au>
					<au>
						<snm>Mavrodi</snm>
						<fnm>DV</fnm>
					</au>
					<au>
						<snm>DeBoy</snm>
						<fnm>RT</fnm>
					</au>
					<au>
						<snm>Seshadri</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Ren</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Madupu</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>873</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1110</pubid>
						<pubid idtype="pmpid" link="fulltext">15980861</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Gene transfer and genome plasticity in <it>Thermotoga maritima</it>, a model hyperthermophilic species.</p>
				</title>
				<aug>
					<au>
						<snm>Mongodin</snm>
						<fnm>EF</fnm>
					</au>
					<au>
						<snm>Hance</snm>
						<fnm>IR</fnm>
					</au>
					<au>
						<snm>Deboy</snm>
						<fnm>RT</fnm>
					</au>
					<au>
						<snm>Gill</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Daugherty</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Huber</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fraser</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Stetter</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2005</pubdate>
				<volume>187</volume>
				<fpage>4935</fpage>
				<lpage>4944</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1169497</pubid>
						<pubid idtype="pmpid" link="fulltext">15995209</pubid>
						<pubid idtype="doi">10.1128/JB.187.14.4935-4944.2005</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Suppressive subtractive hybridization detects extensive genomic diversity in <it>Thermotoga maritima</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Nesb&#248;</snm>
						<fnm>CL</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2002</pubdate>
				<volume>184</volume>
				<fpage>4475</fpage>
				<lpage>4488</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">135253</pubid>
						<pubid idtype="pmpid" link="fulltext">12142418</pubid>
						<pubid idtype="doi">10.1128/JB.184.16.4475-4488.2002</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Genome analysis of multiple pathogenic isolates of <it>Streptococcus agalactiae</it>: implications for the microbial "pan-genome".</p>
				</title>
				<aug>
					<au>
						<snm>Tettelin</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Masignani</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Cieslewicz</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Donati</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Medini</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Ward</snm>
						<fnm>NL</fnm>
					</au>
					<au>
						<snm>Angiuoli</snm>
						<fnm>SV</fnm>
					</au>
					<au>
						<snm>Crabtree</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Durkin</snm>
						<fnm>AS</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2005</pubdate>
				<volume>102</volume>
				<fpage>13950</fpage>
				<lpage>13955</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1216834</pubid>
						<pubid idtype="pmpid" link="fulltext">16172379</pubid>
						<pubid idtype="doi">10.1073/pnas.0506758102</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Patterns of bacterial gene movement.</p>
				</title>
				<aug>
					<au>
						<snm>Hao</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Golding</snm>
						<fnm>GB</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<fpage>1294</fpage>
				<lpage>1307</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msh129</pubid>
						<pubid idtype="pmpid" link="fulltext">15115802</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Speciation in <it>Chlamydia</it>: genomewide phylogenetic analyses identified a reliable set of acquired genes.</p>
				</title>
				<aug>
					<au>
						<snm>Ortutay</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Gaspari</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Toth</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Jager</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Vida</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Orosz</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Vellai</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>2003</pubdate>
				<volume>57</volume>
				<fpage>672</fpage>
				<lpage>680</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00239-003-2517-3</pubid>
						<pubid idtype="pmpid" link="fulltext">14745536</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>The source of laterally transferred genes in bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Daubin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Lerat</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Perriere</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>R57</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">193657</pubid>
						<pubid idtype="pmpid" link="fulltext">12952536</pubid>
						<pubid idtype="doi">10.1186/gb-2003-4-9-r57</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>The fate of laterally transferred genes: life in the fast lane to adaptation or death.</p>
				</title>
				<aug>
					<au>
						<snm>Hao</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Golding</snm>
						<fnm>GB</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2006</pubdate>
				<volume>16</volume>
				<fpage>636</fpage>
				<lpage>643</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1457040</pubid>
						<pubid idtype="pmpid" link="fulltext">16651664</pubid>
						<pubid idtype="doi">10.1101/gr.4746406</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Lateral gene transfer in <it>Mycobacterium avium </it>subspecies <it>paratuberculosis</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Marri</snm>
						<fnm>PR</fnm>
					</au>
					<au>
						<snm>Bannantine</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Paustian</snm>
						<fnm>ML</fnm>
					</au>
					<au>
						<snm>Golding</snm>
						<fnm>GB</fnm>
					</au>
				</aug>
				<source>Can J Microbiol</source>
				<pubdate>2006</pubdate>
				<volume>52</volume>
				<fpage>560</fpage>
				<lpage>569</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1139/W06-001</pubid>
						<pubid idtype="pmpid" link="fulltext">16788724</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>The balance of driving forces during genome evolution in prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Kunin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1589</fpage>
				<lpage>1594</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403731</pubid>
						<pubid idtype="pmpid" link="fulltext">12840037</pubid>
						<pubid idtype="doi">10.1101/gr.1092603</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Mirkin</snm>
						<fnm>BG</fnm>
					</au>
					<au>
						<snm>Fenner</snm>
						<fnm>TI</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>BMC Evol Biol</source>
				<pubdate>2003</pubdate>
				<volume>3</volume>
				<fpage>2</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">149225</pubid>
						<pubid idtype="pmpid" link="fulltext">12515582</pubid>
						<pubid idtype="doi">10.1186/1471-2148-3-2</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Extensive gene gain associated with adaptive evolution of poxviruses.</p>
				</title>
				<aug>
					<au>
						<snm>McLysaght</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Baldi</snm>
						<fnm>PF</fnm>
					</au>
					<au>
						<snm>Gaut</snm>
						<fnm>BS</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<fpage>15655</fpage>
				<lpage>15660</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">307623</pubid>
						<pubid idtype="pmpid" link="fulltext">14660798</pubid>
						<pubid idtype="doi">10.1073/pnas.2136653100</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Deletional bias and the evolution of bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Mira</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Moran</snm>
						<fnm>NA</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>589</fpage>
				<lpage>596</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02447-7</pubid>
						<pubid idtype="pmpid" link="fulltext">11585665</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>The nature and dynamics of bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Davalos</snm>
						<fnm>LM</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2006</pubdate>
				<volume>311</volume>
				<fpage>1730</fpage>
				<lpage>1733</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1119966</pubid>
						<pubid idtype="pmpid" link="fulltext">16556833</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Recognizing the pseudogenes in bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Lerat</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>3125</fpage>
				<lpage>3132</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1142405</pubid>
						<pubid idtype="pmpid" link="fulltext">15933207</pubid>
						<pubid idtype="doi">10.1093/nar/gki631</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes.</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Kunin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R64</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">522871</pubid>
						<pubid idtype="pmpid" link="fulltext">15345048</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-9-r64</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Pseudogenes, junk DNA, and the dynamics of <it>Rickettsia </it>genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>JO</fnm>
					</au>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>829</fpage>
				<lpage>839</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11319266</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Small change: keeping pace with microevolution.</p>
				</title>
				<aug>
					<au>
						<snm>Feil</snm>
						<fnm>EJ</fnm>
					</au>
				</aug>
				<source>Nat Rev Microbiol</source>
				<pubdate>2004</pubdate>
				<volume>2</volume>
				<fpage>483</fpage>
				<lpage>495</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrmicro904</pubid>
						<pubid idtype="pmpid" link="fulltext">15152204</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Sch&#228;ffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Fuzzy species among recombinogenic bacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Hanage</snm>
						<fnm>WP</fnm>
					</au>
					<au>
						<snm>Fraser</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Spratt</snm>
						<fnm>BG</fnm>
					</au>
				</aug>
				<source>BMC Biol</source>
				<pubdate>2005</pubdate>
				<volume>3</volume>
				<fpage>6</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">554772</pubid>
						<pubid idtype="pmpid" link="fulltext">15752428</pubid>
						<pubid idtype="doi">10.1186/1741-7007-3-6</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
				</title>
				<aug>
					<au>
						<snm>Thompson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1994</pubdate>
				<volume>22</volume>
				<fpage>4673</fpage>
				<lpage>4680</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308517</pubid>
						<pubid idtype="pmpid" link="fulltext">7984417</pubid>
						<pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<aug>
					<au>
						<snm>Swofford</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>PAUP* 4.0 Beta Version, Phylogenetic Analysis Using Parsimony (and Other Methods)</source>
				<publisher>Sunderland, MA: Sinauer Associates</publisher>
				<pubdate>1998</pubdate>
			</bibl>
			<bibl id="B25">
				<title>
					<p>A cluster algorithm for graphs.</p>
				</title>
				<aug>
					<au>
						<snm>van Dongen</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Technical Report INS-R0010</source>
				<publisher>Amsterdam: National Research Institute for Mathematics and Computer Science in the Netherlands</publisher>
				<pubdate>2000</pubdate>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Genomic insights that advance the species definition for prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Konstantinidis</snm>
						<fnm>KT</fnm>
					</au>
					<au>
						<snm>Tiedje</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2005</pubdate>
				<volume>102</volume>
				<fpage>2567</fpage>
				<lpage>2572</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">549018</pubid>
						<pubid idtype="pmpid" link="fulltext">15701695</pubid>
						<pubid idtype="doi">10.1073/pnas.0409727102</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Effective protein sequence comparison.</p>
				</title>
				<aug>
					<au>
						<snm>Pearson</snm>
						<fnm>WR</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1996</pubdate>
				<volume>266</volume>
				<fpage>227</fpage>
				<lpage>258</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8743688</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Insights on biology and evolution from microbial genome sequencing.</p>
				</title>
				<aug>
					<au>
						<snm>Fraser-Liggett</snm>
						<fnm>CM</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2005</pubdate>
				<volume>15</volume>
				<fpage>1603</fpage>
				<lpage>1610</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.3724205</pubid>
						<pubid idtype="pmpid" link="fulltext">16339357</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Structural flexibility in the <it>Burkholderia mallei</it> genome.</p>
				</title>
				<aug>
					<au>
						<snm>Nierman</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>DeShazer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>HS</fnm>
					</au>
					<au>
						<snm>Tettelin</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Feldblyum</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ulrich</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Ronning</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Brinkac</snm>
						<fnm>LM</fnm>
					</au>
					<au>
						<snm>Daugherty</snm>
						<fnm>SC</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2004</pubdate>
				<volume>101</volume>
				<fpage>14246</fpage>
				<lpage>14251</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">521142</pubid>
						<pubid idtype="pmpid" link="fulltext">15377793</pubid>
						<pubid idtype="doi">10.1073/pnas.0403306101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
