<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2004-5-9-r64</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Liu</snm>
					<fnm>Yang</fnm>
					<insr iid="I1"/>
					<insr iid="I3"/>
					<email>lyang@csb.yale.edu</email>
				</au>
				<au id="A2">
					<snm>Harrison</snm>
					<mi>M</mi>
					<fnm>Paul</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A3">
					<snm>Kunin</snm>
					<fnm>Victor</fnm>
					<insr iid="I2"/>
				</au>
				<au id="A4" ca="yes">
					<snm>Gerstein</snm>
					<fnm>Mark</fnm>
					<insr iid="I1"/>
					<email>Mark.Gerstein@yale.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Molecular Biophysics and Biochemistry, Yale University, PO Box 208114, New Haven, CT 06520-8114, USA</p>
				</ins>
				<ins id="I2">
					<p>Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK</p>
				</ins>
				<ins id="I3">
					<p>Current address: Department of Biomedical Informatics, Columbia University, 622 W 168th Street, New York, NY 10032, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2004</pubdate>
			<volume>5</volume>
			<issue>9</issue>
			<fpage>R64</fpage>
			<url>http://genomebiology.com/2004/5/9/R64</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">15345048</pubid><pubid idtype="doi">10.1186/gb-2004-5-9-r64</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>1</day>
					<month>3</month>
					<year>2004</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>4</day>
					<month>6</month>
					<year>2004</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>2</day>
					<month>8</month>
					<year>2004</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>26</day>
					<month>8</month>
					<year>2004</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2004</year>
			<collab>Liu et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. </note>
		</cpyrt>
		<shorttitle>
			<p>Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes</p>
		</shorttitle>
		<shortabs>
			<p>A comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes identified around 7,000 candidate pseudogenes. A large fraction of prokaryote pseudogenes seems to have arisen from failed horizontal-transfer events.  </p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Pseudogenes often manifest themselves as disabled copies of known genes. In prokaryotes, it was generally believed (with a few well-known exceptions) that they were rare.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We have carried out a comprehensive analysis of the occurrence of pseudogenes in a diverse selection of 64 prokaryote genomes. Overall, we find a total of around 7,000 candidate pseudogenes. Moreover, in all the genomes surveyed, pseudogenes account for at least 1 to 5% of all gene-like sequences, with some genomes having considerably higher fractions. Although many large populations of pseudogenes arise from large, diverse protein families (for example, the ABC transporters), notable numbers of pseudogenes are associated with specific families that are not as widespread. These include the cytochrome P450 and PPE families (PF00067 and PF00823) and others that have a direct role in DNA transposition.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>We find suggestive evidence that a large fraction of prokaryote pseudogenes arose from failed horizontal transfer events. In particular, we find that pseudogenes are more than twice as likely as genes to have anomalous codon usage associated with horizontal transfer. Moreover, we found a significant difference in the number of horizontally transferred pseudogenes in pathogenic and non-pathogenic strains of <it>Escherichia coli</it>.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Genes that have recently fallen out of use for an organism are often detectable in the genome as pseudogenes - disabled copies of genes characterized by disruptions of their reading frames due to frameshifts and premature stop codons <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Surveys of the pseudogene populations of eukaryotes (budding yeast, nematode worm, fruit fly and human) have recently been completed <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. These pseudogene analyses have yielded insights into eukaryotic proteome evolution, showing that duplicated pseudogene formation tends to occur in younger, more lineage-specific, protein families, and is in many cases linked to the generation of functional diversity <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, pseudogene formation in most prokaryotes has not been analyzed as a matter of course, and has, historically, been assumed to be minimal <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Recently, substantial populations of pseudogenes have been discovered in some pathogenic bacteria, most notably in the leprosy bacillus <it>Mycobacterium leprae</it>, where around 1,100 pseudogenes (compared to around 1,600 genes) were found, with pseudogene formation providing a 'fossil record' of recent wholesale loss of pathways involved in lipid metabolism and anaerobic respiration <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
			<p>Here we want to address the question of whether these large populations are exceptional, or whether there are substantial populations of pseudogenes in other prokaryotic genomes. If so, from a holistic 'polygenomic' perspective, what sorts of proteins tend to form prokaryotic pseudogenes? And are there any themes in common with the occurrence of pseudogenes in eukaryotes?</p>
			<p>To address these broad questions, we have adapted a pipeline developed for eukaryotic pseudogene identification to 64 prokaryotic genomes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The species analyzed include archaea, pathogenic bacteria and non-pathogenic bacteria, and many of the pathogenic bacteria are also important organisms in current biodefense research. We have found nearly 7,000 pseudogenes, with notable numbers of pseudogenes in specific families that are linked to DNA transposition or that have some role in environmental responses. Our results, which we have derived consistently across all the genomes, are available from our prokaryote pseudogene information website <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Pseudogenes are pervasive in prokaryotes</p>
				</st>
				<p>To identify pseudogenes in prokaryotic genomes, we performed a conservative and comprehensive search, as outlined in Figure <figr fid="F1">1</figr> and Materials and methods. We used a proteome set consisting of sequences from the 64 genomes and Swiss-Prot <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> with relatively high confidence in annotation (that is, excluding those annotated as hypothetical proteins). Intergenic regions in prokaryotic genomes were searched against the proteome set using FastX <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and homology matches containing disablements were taken as pseudogene candidates. We then applied several checks to reduce false positives (see Materials and methods). Overall, we found 6,895 candidate pseudogenes.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Pseudogenes in prokaryotes</p>
					</caption>
					<text>
						<p>Pseudogenes in prokaryotes. <b>(a) </b>Procedure for assigning pseudogenes. The flow chart shows the steps in identifying pseudogenes in 64 prokaryote genomes. The steps include: separation of intergenic regions from coding sequence (hypothetical ORFs were excluded); a six-frame FastX search on intergenic regions for pseudogene candidates; and quality control to reduce false-positive results introduced by artificial disablement or by different codon usage. <b>(b) </b>The distribution of relative disablement positions in pseudogenes. Each position was normalized to a 100-residue scale as the ratio of the distance from the starting residue to the disablement to the length of the pseudogene. The yellow bars indicate the distribution of disablement positions before the last quality-control step and the green bars show the distribution after minimizing false-positive pseudogenes.</p>
					</text>
					<graphic file="gb-2004-5-9-r64-1"/>
				</fig>
				<p>Previously, the pseudogene fraction was defined as the ratio of the number of pseudogenes to the number of all gene-like sequences (genes plus pseudogenes) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. By this measure, we find that pseudogenes are pervasive in prokaryotes (Figure <figr fid="F2">2</figr>). Pseudogenes are detectable at a low 'background' level in most prokaryotes, ranging from 1 to 5% of all gene-like sequences (Figure <figr fid="F2">2</figr>). Application of a more restrictive cutoff (E-value less than 0.001, instead of E-value less than 0.01) in the FastX alignment results in a slightly smaller percentage of pseudogenes (0.1% less on average) in all the genomes, and generates essentially the same results (data not shown). Our census is in general agreement with previous assessments of pseudogene content in the genomes of <it>M. leprae</it>, <it>Escherichia coli </it>and <it>Rickettsia prowazekii </it><abbrgrp><abbr bid="B12">12</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. In these previous studies, however, different criteria were used for pseudogene identification in different genomes, making the results difficult to compare. This is avoided in our study by using a method applied uniformly across all genomes. All these assessments suggest that most prokaryotes have similar net genomic DNA deletion rates, resulting in similar low-level 'background' pseudogene fractions in their genomes.</p>
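				<p>The pseudogene fraction defined above can be illustrated with a minimal Python sketch. The function name and the counts used here are hypothetical and are not taken from the study; the sketch only restates the definition as the number of pseudogenes divided by the number of genes plus pseudogenes.</p>

```python
def pseudogene_fraction(n_pseudogenes: int, n_genes: int) -> float:
    """Return pseudogenes / (genes + pseudogenes), as defined in the text."""
    total = n_genes + n_pseudogenes
    if total == 0:
        raise ValueError("genome has no gene-like sequences")
    return n_pseudogenes / total


# Hypothetical counts chosen for illustration only (not from the study).
fraction = pseudogene_fraction(n_pseudogenes=50, n_genes=950)
print(f"{fraction:.1%}")  # prints 5.0%
```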
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Fractions of pseudogenes in the 64 prokaryote genomes</p>
					</caption>
					<text>
						<p>Fractions of pseudogenes in the 64 prokaryote genomes. The genomes are divided into three categories: archaea (green), non-pathogenic bacteria (blue) and pathogenic bacteria (purple). The yellow bars represent the fractions of pseudogenes that overlap with hypothetical ORFs, and the green bars represent those that do not overlap. Genomes in each category are sorted by the green bars.</p>
					</text>
					<graphic file="gb-2004-5-9-r64-2"/>
				</fig>
				<p>To check for a correlation with microbial 'lifestyle', we classified the 64 species into three categories: archaea, pathogenic bacteria and non-pathogenic bacteria. The pseudogene fractions for these groupings were assessed. <it>M. leprae </it>has a very large pseudogene fraction (36.5%) and is clearly a unique outlier. When this genome is set aside, the three groups have similar pseudogene fractions (3.6%, 3.9% and 3.3%). Note that three other pathogenic species/strains have relatively large pseudogene fractions, including <it>Neisseria meningitidis </it>MC58 (12.4%), <it>N. meningitidis </it>Z2491 (11.6%) and <it>Rickettsia conorii </it>(9.7%). The higher pseudogene fractions of some pathogenic species have previously been suggested to be a result of a rapidly changing environmental niche, with loss of metabolic and respiratory pathways <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
				<p>We found that about 2,300 of our 6,895 candidate pseudogenes overlap with more than 2,600 annotated hypothetical open reading frames (ORFs); the corresponding fractions are indicated in Figure <figr fid="F2">2</figr>. The overlap could arise from erroneous gene annotations or sequencing errors <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In either case, pseudogene annotation in prokaryotic genomes is evidently an important part of decontaminating gene annotation.</p>
			</sec>
			<sec>
				<st>
					<p>Pseudogene families</p>
				</st>
				<p>We used the Pfam classification <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> to analyze the families and functions of candidate pseudogenes. The 20 top-ranking domain families in terms of pseudogenes are shown in Figure <figr fid="F3">3a</figr>. Many large divergent gene families are among the top pseudogene families, including 9 of the top 10 gene families such as: the ABC transporter (PF00005), short-chain dehydrogenases/reductases (PF00106), sugar transporter (major facilitator superfamily) (PF00083), and histidine kinase-like ATPase (PF02518). As the largest family of proteins in prokaryotes, the ABC transporter functions to translocate a variety of compounds across biological membranes <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. It consists of two ATP-binding domains (PF00005) <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp> and two transmembrane domains (PF00664). These domains are present in large copy numbers across genomes (2,172 and 245 gene copies as well as 67 and 13 pseudogene copies respectively).</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Gene-to-pseudogene ratios</p>
					</caption>
					<text>
						<p>Gene-to-pseudogene ratios. <b>(a) </b>The top 20 pseudogene families and top 10 gene families based on Pfam classification. Ranking is based on the size of pseudogene families. The top 10 gene families are highlighted with the green background. <b>(b) </b>The number of genes plotted against the number of pseudogenes in a Pfam family. The line represents the overall ratio of the number of pseudogenes to the number of genes in the 64 genomes.</p>
					</text>
					<graphic file="gb-2004-5-9-r64-3"/>
				</fig>
				<p>There are notable protein families that rank high in pseudogene number, but low in terms of gene number. They include the PPE family (PF00823) which is thought to be linked to antigenic variation in mycobacteria and is highly polymorphic <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>; the cytochromes P450 (PF00067), which are involved in processing diverse substrates; the GGDEF domain (PF00990), which is of unknown function and is associated with a wide diversity of other protein domains <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>; alpha/beta-hydrolase enzymes (PF00561), which have diverse catalytic functions; and pseudo-U-synthase-2 enzymes (PF00849), which help synthesize pseudouridine from uracil. Note that the first two families in this list have sequence diversity that has some link to environmental response.</p>
				<p>Figure <figr fid="F3">3b</figr> shows the relationship between the number of pseudogenes and genes for Pfam families. One might expect this relationship to be linear, with bigger families having more pseudogenes, but Figure <figr fid="F3">3b</figr> shows this is not the case. Two large families that have a relatively high ratio of pseudogenes to genes are the transposase DDE domain (PF01609) and integrase core domain (PF00665). Transposase facilitates DNA transposition and horizontal gene transfer and its DDE domain may be responsible for DNA cleavage at a specific site followed by a strand-transfer reaction <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Many transposons contain transposases for their transposition <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. We found that two strains of <it>N. meningitidis </it>(MC58 and Z2491) carry 26 and 22 copies of transposase pseudogenes, respectively, and have only 11 and 5 copies of transposase genes. In the MC58 strain, transposase pseudogenes have been found in most of the 29 remnant insertion sequences <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. This suggests that <it>N. meningitidis </it>strains probably undergo high selection pressure for transposases. The integrase core domain family (PF00665) is the catalytic domain of integrase, which mediates integration of a DNA copy of a viral/bacteriophage genome into the host genome <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. It catalyzes the DNA strand-transfer reaction by ligating the 3' ends of the viral DNA to the 5' ends of the integration site <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The large number of transposase and integrase pseudogenes might result from harmful foreign genes being disabled in transposable elements. Several species contain many integrase pseudogenes, including <it>Streptococcus pneumoniae, M. leprae, M. tuberculosis</it>, and <it>E. coli </it>strain O157:H7. 
The large number of pseudogenes relative to genes for these two gene families may reflect an overall high selective pressure for them - that is, a gene family that is rapidly duplicating and evolving may generate many pseudogenes.</p>
			</sec>
			<sec>
				<st>
					<p>Origins of pseudogenes</p>
				</st>
				<p>Retrotransposition and genomic DNA duplication generate pseudogenes in mammals and other eukaryotes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. In contrast, in prokaryotes, based on experience in annotating <it>E. coli </it>and <it>M. leprae </it><abbrgrp><abbr bid="B12">12</abbr><abbr bid="B16">16</abbr></abbrgrp>, pseudogenes are suggested to arise from three processes: the disablement of detectable native duplications; the decay of native single-copy host genes; and failed horizontal transfers.</p>
				<p>However, the complete extent of the processes forming prokaryotic pseudogenes is not yet well understood. We realize that there are many methods of defining horizontal transfer <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp> and an active debate on the best way of doing this <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>, so we applied two independent methods to predict horizontal gene transfer events. The first method (GC-content) is based on the GC content bias at particular codon positions of recently acquired genes <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B39">39</abbr></abbrgrp>. The second method (GeneTrace) is based on the analysis of the phylogenetic distribution of protein families on a species tree <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In the GC-content method, the number of pseudogenes resulting from horizontal transfer in each genome was estimated by applying the same criteria to them as had been previously used to identify horizontally transferred genes. Overall, we found that the ratio (19.9%) of pseudogenes from potential horizontal transfer to those derived from the host is significantly higher than the corresponding ratio for genes (8.6%). We dubbed the ratio of these two quantities the 'failed horizontal transfer index', and observed that it implies that pseudogenes are 2.3 times more likely to arise from horizontal transfer than host genes are (Table <tblr tid="T1">1</tblr>).</p>
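				<p>As a worked illustration, the failed horizontal transfer index can be sketched in Python. The function name is hypothetical, and the formula is an assumption: the horizontally transferred (HT) share of pseudogenes divided by the HT share of genes. This reading reproduces the per-species values listed in Table <tblr tid="T1">1</tblr> for the two archaea used below, but it is not stated explicitly in the text.</p>

```python
def failed_transfer_index(genes_all, genes_ht, pseudo_all, pseudo_ht):
    """Assumed formula: (HT pseudogenes / all pseudogenes) / (HT genes / all genes)."""
    if genes_all == 0 or genes_ht == 0 or pseudo_all == 0:
        raise ValueError("index is undefined for zero counts in a denominator")
    return (pseudo_ht / pseudo_all) / (genes_ht / genes_all)


# Counts as listed in Table 1: (all genes, HT genes, all pseudogenes, HT pseudogenes).
table_rows = {
    "A. pernix": (615, 45, 4, 2),
    "S. solfataricus": (2235, 231, 48, 6),
}

for species, counts in table_rows.items():
    # Prints 6.8 for A. pernix and 1.2 for S. solfataricus.
    print(species, round(failed_transfer_index(*counts), 1))

# Overall index from the percentages quoted in the text: 19.9% versus 8.6%.
print(round(19.9 / 8.6, 1))  # prints 2.3
```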
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Putative horizontally transferred genes and pseudogenes</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="left">
								<p>Species</p>
							</c>
							<c cspan="2" ca="center">
								<p>Gene</p>
							</c>
							<c cspan="2" ca="center">
								<p>Pseudogene</p>
							</c>
							<c ca="center">
								<p>Failed transfer index</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="2">
								<hr/>
							</c>
							<c cspan="2">
								<hr/>
							</c>
							<c cspan="1">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>All</p>
							</c>
							<c ca="center">
								<p>HT</p>
							</c>
							<c ca="center">
								<p>All</p>
							</c>
							<c ca="center">
								<p>HT</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Archaea</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>A. pernix</it>
								</p>
							</c>
							<c ca="center">
								<p>615</p>
							</c>
							<c ca="center">
								<p>45</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>6.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. solfataricus</it>
								</p>
							</c>
							<c ca="center">
								<p>2,235</p>
							</c>
							<c ca="center">
								<p>231</p>
							</c>
							<c ca="center">
								<p>48</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>1.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. tokodaii</it>
								</p>
							</c>
							<c ca="center">
								<p>1,797</p>
							</c>
							<c ca="center">
								<p>185</p>
							</c>
							<c ca="center">
								<p>35</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c ca="center">
								<p>5.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>P. aerophilum</it>
								</p>
							</c>
							<c ca="center">
								<p>1,855</p>
							</c>
							<c ca="center">
								<p>171</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Halobacterium sp. NRC-1</it>
								</p>
							</c>
							<c ca="center">
								<p>1,383</p>
							</c>
							<c ca="center">
								<p>100</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>13.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. thermautotrophicus</it>
								</p>
							</c>
							<c ca="center">
								<p>1,350</p>
							</c>
							<c ca="center">
								<p>122</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>11.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. jannaschii</it>
								</p>
							</c>
							<c ca="center">
								<p>1,280</p>
							</c>
							<c ca="center">
								<p>106</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>6.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>P. abyssi</it>
								</p>
							</c>
							<c ca="center">
								<p>891</p>
							</c>
							<c ca="center">
								<p>75</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>4.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>P. horikoshii</it>
								</p>
							</c>
							<c ca="center">
								<p>553</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>T. acidophilum</it>
								</p>
							</c>
							<c ca="center">
								<p>1,169</p>
							</c>
							<c ca="center">
								<p>106</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>8.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>T. volcanium</it>
								</p>
							</c>
							<c ca="center">
								<p>1,061</p>
							</c>
							<c ca="center">
								<p>100</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>4.0</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Non-pathogenic bacteria</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>A. aeolicus</it>
								</p>
							</c>
							<c ca="center">
								<p>1,244</p>
							</c>
							<c ca="center">
								<p>107</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Synechocystis sp. PCC 6803</it>
								</p>
							</c>
							<c ca="center">
								<p>2,696</p>
							</c>
							<c ca="center">
								<p>237</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Nostoc sp. PCC 7120</it>
								</p>
							</c>
							<c ca="center">
								<p>3,672</p>
							</c>
							<c ca="center">
								<p>332</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>2.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. coelicolor</it>
								</p>
							</c>
							<c ca="center">
								<p>6,012</p>
							</c>
							<c ca="center">
								<p>536</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>B. halodurans</it>
								</p>
							</c>
							<c ca="center">
								<p>3,279</p>
							</c>
							<c ca="center">
								<p>299</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>B. subtilis</it>
								</p>
							</c>
							<c ca="center">
								<p>1,223</p>
							</c>
							<c ca="center">
								<p>102</p>
							</c>
							<c ca="center">
								<p>44</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>L. innocua</it>
								</p>
							</c>
							<c ca="center">
								<p>2,924</p>
							</c>
							<c ca="center">
								<p>263</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>11.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. acetobutylicum</it>
								</p>
							</c>
							<c ca="center">
								<p>3,129</p>
							</c>
							<c ca="center">
								<p>295</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>L. lactis subsp. lactis</it>
								</p>
							</c>
							<c ca="center">
								<p>1,870</p>
							</c>
							<c ca="center">
								<p>156</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. vibrioides</it>
								</p>
							</c>
							<c ca="center">
								<p>2,699</p>
							</c>
							<c ca="center">
								<p>231</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1.9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. loti</it>
								</p>
							</c>
							<c ca="center">
								<p>5,235</p>
							</c>
							<c ca="center">
								<p>476</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>2.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. meliloti</it>
								</p>
							</c>
							<c ca="center">
								<p>2,985</p>
							</c>
							<c ca="center">
								<p>240</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>8.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>E. coli K12</it>
								</p>
							</c>
							<c ca="center">
								<p>2,897</p>
							</c>
							<c ca="center">
								<p>230</p>
							</c>
							<c ca="center">
								<p>63</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
							<c ca="center">
								<p>4.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>T. maritima</it>
								</p>
							</c>
							<c ca="center">
								<p>1,445</p>
							</c>
							<c ca="center">
								<p>137</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>D. radiodurans</it>
								</p>
							</c>
							<c ca="center">
								<p>1,964</p>
							</c>
							<c ca="center">
								<p>134</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1.6</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<b>Pathogenic bacteria</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Buchnera sp. APS</it>
								</p>
							</c>
							<c ca="center">
								<p>477</p>
							</c>
							<c ca="center">
								<p>42</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>4.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>U. urealyticum</it>
								</p>
							</c>
							<c ca="center">
								<p>467</p>
							</c>
							<c ca="center">
								<p>40</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. pneumoniae</it>
								</p>
							</c>
							<c ca="center">
								<p>610</p>
							</c>
							<c ca="center">
								<p>55</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c ca="center">
								<p>7.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>B. burgdorferi</it>
								</p>
							</c>
							<c ca="center">
								<p>590</p>
							</c>
							<c ca="center">
								<p>63</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. pulmonis</it>
								</p>
							</c>
							<c ca="center">
								<p>595</p>
							</c>
							<c ca="center">
								<p>53</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>5.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. trachomatis</it>
								</p>
							</c>
							<c ca="center">
								<p>597</p>
							</c>
							<c ca="center">
								<p>67</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. muridarum</it>
								</p>
							</c>
							<c ca="center">
								<p>815</p>
							</c>
							<c ca="center">
								<p>81</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>R. prowazekii</it>
								</p>
							</c>
							<c ca="center">
								<p>504</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>T. pallidum</it>
								</p>
							</c>
							<c ca="center">
								<p>727</p>
							</c>
							<c ca="center">
								<p>64</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>4.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. pneumoniae J138</it>
								</p>
							</c>
							<c ca="center">
								<p>839</p>
							</c>
							<c ca="center">
								<p>74</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. pneumoniae AR39</it>
								</p>
							</c>
							<c ca="center">
								<p>831</p>
							</c>
							<c ca="center">
								<p>70</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. pneumoniae CWL029</it>
								</p>
							</c>
							<c ca="center">
								<p>845</p>
							</c>
							<c ca="center">
								<p>71</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>R. conorii</it>
								</p>
							</c>
							<c ca="center">
								<p>695</p>
							</c>
							<c ca="center">
								<p>67</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. leprae</it>
								</p>
							</c>
							<c ca="center">
								<p>1,440</p>
							</c>
							<c ca="center">
								<p>119</p>
							</c>
							<c ca="center">
								<p>271</p>
							</c>
							<c ca="center">
								<p>53</p>
							</c>
							<c ca="center">
								<p>2.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. jejuni</it>
								</p>
							</c>
							<c ca="center">
								<p>1,291</p>
							</c>
							<c ca="center">
								<p>108</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>H. pylori J99</it>
								</p>
							</c>
							<c ca="center">
								<p>856</p>
							</c>
							<c ca="center">
								<p>70</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>2.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>H. pylori 26695</it>
								</p>
							</c>
							<c ca="center">
								<p>1,055</p>
							</c>
							<c ca="center">
								<p>90</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>2.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. pyogenes M1 GAS</it>
								</p>
							</c>
							<c ca="center">
								<p>1,348</p>
							</c>
							<c ca="center">
								<p>108</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0.9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. pneumoniae</it>
								</p>
							</c>
							<c ca="center">
								<p>1,632</p>
							</c>
							<c ca="center">
								<p>114</p>
							</c>
							<c ca="center">
								<p>54</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>N. meningitidis Z2491</it>
								</p>
							</c>
							<c ca="center">
								<p>1,432</p>
							</c>
							<c ca="center">
								<p>112</p>
							</c>
							<c ca="center">
								<p>26</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>2.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>P. multocida</it>
								</p>
							</c>
							<c ca="center">
								<p>1,035</p>
							</c>
							<c ca="center">
								<p>96</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>N. meningitidis MC58</it>
								</p>
							</c>
							<c ca="center">
								<p>1,466</p>
							</c>
							<c ca="center">
								<p>121</p>
							</c>
							<c ca="center">
								<p>44</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>3.9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>X. fastidiosa</it>
								</p>
							</c>
							<c ca="center">
								<p>1,550</p>
							</c>
							<c ca="center">
								<p>152</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. aureus subsp. aureus N315</it>
								</p>
							</c>
							<c ca="center">
								<p>1,557</p>
							</c>
							<c ca="center">
								<p>140</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>5.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. aureus subsp. aureus Mu50</it>
								</p>
							</c>
							<c ca="center">
								<p>1,563</p>
							</c>
							<c ca="center">
								<p>138</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>5.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>L. monocytogenes</it>
								</p>
							</c>
							<c ca="center">
								<p>2,799</p>
							</c>
							<c ca="center">
								<p>231</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>C. perfringens</it>
								</p>
							</c>
							<c ca="center">
								<p>1,943</p>
							</c>
							<c ca="center">
								<p>165</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>B. melitensis</it>
								</p>
							</c>
							<c ca="center">
								<p>2,948</p>
							</c>
							<c ca="center">
								<p>216</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>R. solanacearum</it>
								</p>
							</c>
							<c ca="center">
								<p>3,032</p>
							</c>
							<c ca="center">
								<p>252</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>V. cholerae</it>
								</p>
							</c>
							<c ca="center">
								<p>2,846</p>
							</c>
							<c ca="center">
								<p>216</p>
							</c>
							<c ca="center">
								<p>24</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>2.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. tuberculosis CDC1551</it>
								</p>
							</c>
							<c ca="center">
								<p>2,837</p>
							</c>
							<c ca="center">
								<p>262</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>1.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>M. tuberculosis H37Rv</it>
								</p>
							</c>
							<c ca="center">
								<p>1,446</p>
							</c>
							<c ca="center">
								<p>130</p>
							</c>
							<c ca="center">
								<p>38</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>1.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Y. pestis</it>
								</p>
							</c>
							<c ca="center">
								<p>3,533</p>
							</c>
							<c ca="center">
								<p>282</p>
							</c>
							<c ca="center">
								<p>51</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>1.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. typhi CT18</it>
								</p>
							</c>
							<c ca="center">
								<p>3,986</p>
							</c>
							<c ca="center">
								<p>338</p>
							</c>
							<c ca="center">
								<p>147</p>
							</c>
							<c ca="center">
								<p>18</p>
							</c>
							<c ca="center">
								<p>1.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>S. typhimurium LT2</it>
								</p>
							</c>
							<c ca="center">
								<p>4,308</p>
							</c>
							<c ca="center">
								<p>349</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>2.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>E. coli O157:H7</it>
								</p>
							</c>
							<c ca="center">
								<p>3,424</p>
							</c>
							<c ca="center">
								<p>266</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>1.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>E. coli O157:H7 EDL933</it>
								</p>
							</c>
							<c ca="center">
								<p>4,322</p>
							</c>
							<c ca="center">
								<p>353</p>
							</c>
							<c ca="center">
								<p>73</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>P. aeruginosa</it>
								</p>
							</c>
							<c ca="center">
								<p>3,716</p>
							</c>
							<c ca="center">
								<p>281</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>123,420</p>
							</c>
							<c ca="center">
								<p>10,571</p>
							</c>
							<c ca="center">
								<p>1,458</p>
							</c>
							<c ca="center">
								<p>290</p>
							</c>
							<c ca="center">
								<p>2.3</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>All genes and pseudogenes and the fraction having atypical codon-position-specific GC contents in the 64 genomes studied. The failed horizontal transfer index was computed as described in Materials and methods.</p>
					</tblfn>
				</tbl>
				<p>To confirm the findings from the GC-content method, we applied the GeneTrace method (see Materials and methods). We analyzed a subset of pseudogenes and found that 18% result from failed horizontal transfer events, consistent with the GC-content method. Note that GeneTrace and the GC-content method use very different criteria to assess horizontal transfer and thus provide good independent verification of each other.</p>
				<p>In summary, we report here for the first time an estimate of how often horizontal transfer in prokaryotes introduces genes that are redundant, useless or even detrimental. Firstly, ORFs from dangerous genetic elements are under strong selection pressure to be deleted from the host's genome <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Secondly, horizontally transferred genes have a higher chance than non-transferred genes of becoming pseudogenes in most prokaryotes, which may be a result of deactivation/disablement of non-beneficial transferred genes.</p>
				<p>By examining closely related strains of the same species, we found that most closely related strains have similar values of the failed horizontal transfer index. In particular, <it>M. tuberculosis </it>(strains H37Rv and CDC1551), <it>N. meningitidis </it>(strains Z2491 and MC58), and <it>Helicobacter pylori </it>(strains 26695 and J99) share similar index values within species. However, <it>E. coli </it>has different index values in the three strains studied. The free-living <it>E. coli </it>K12 strain has an index value of 4.6, comparable to values calculated from previous results <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, while the two pathogenic <it>E. coli </it>strains O157:H7 and O157:H7 EDL933 have much lower values (1.7 and 0.8). This could be explained in two ways: the pathogenic <it>E. coli </it>strains could have moved into a different environment that results in lower exposure to incoming DNA and thus to a lower rate of horizontal gene transfer <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>; or these strains could have an increased rate of gene loss or pseudogene formation of their host genes.</p>
			</sec>
			<sec>
				<st>
					<p>A polygenomic power-law-like trend in pseudogene disablement</p>
				</st>
				<p>To characterize the overall rate of decay of pseudogene populations, we plotted the fraction of disablements versus the average number of matching residues (to their closest homologs) per pseudogene for each species. This measure shows how the overall level of decay of a pseudogene population relates to age (which corresponds to the degree of overall match to the closest homologs). There is a general power-law-like behavior governing this measure, with recent pseudogenes having few disablements and divergent pseudogenes having many (Figure <figr fid="F4">4</figr>). Archaea and most non-pathogenic bacteria cluster together at higher rates of disablement (between 10 and 28 per 1,000 residues) and less significant matches, indicating comparatively greater retention of ancient gene remnants in those species and fewer young pseudogenes. On the other hand, obligate pathogenic bacteria tend to have younger pools of pseudogenes, even though they exhibit high disablement rates. Interestingly, four species of obligate bacterial pathogens clearly stand out from the general tendency: these are <it>M. leprae </it>and three closely related mycoplasma species: <it>Mycoplasma pneumoniae</it>, <it>Mycoplasma pulmonis </it>and <it>Ureaplasma urealyticum</it>. Pseudogenes in these four pathogenic bacteria carry several times more disablements, suggesting that these bacteria have an accelerated disabling mutation rate. It is known that <it>M. leprae </it>has lost the <it>dnaQ</it>-mediated proofreading activities of DNA polymerase III <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B42">42</abbr></abbrgrp>, which could contribute to a higher mutation rate. The higher mutation rates in these species might suggest that these pathogens are under adaptation to their new environment, or have specific genome regions that are hypermutable.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>The fraction of disabled residues (per 1,000 residues) versus the number of average matching residues to the closest homologs per pseudogene in the 64 species categorized into four groups</p>
					</caption>
					<text>
						<p>The fraction of disabled residues (per 1,000 residues) versus the number of average matching residues to the closest homologs per pseudogene in the 64 species categorized into four groups: archaea (blue diamonds), non-pathogenic bacteria (green squares), obligate pathogenic bacteria (purple circles) and non-obligate pathogenic bacteria (red triangles).</p>
					</text>
					<graphic file="gb-2004-5-9-r64-4"/>
				</fig>
				<p>It is important to note here that the current sequence databases are derived from an uneven sampling of genomes. Therefore, genomes of organisms with more sequenced relatives may appear to have, on average, a seemingly younger population of pseudogenes, while others may appear to have older and fewer identifiable pseudogenes. Using data from 64 genomes, our results indicate an overall trend for pseudogenes observed in most of the genomes studied. However, these results have to be viewed as preliminary until more genome data is available.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>We have shown that pseudogenes in prokaryotes are not uncommon, occupying 1-5% of all gene-like sequences. We find that specific gene families with clear links to DNA transposition and environmental responses have higher pseudogene/gene ratios.</p>
			<p>The pseudogene data has many implications for the study of genome reduction and expansion <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>. A significant proportion of the pseudogenes arose from putative failed horizontal transfer - at more than two times the rate for genes. Obligate pathogenic bacteria have high rates of disablement in younger pseudogene populations, consistent with recent accelerated genome reduction <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, while, in contrast, archaea and non-pathogenic bacteria have relatively older pseudogene populations, but similar rates of disablement.</p>
			<p>In terms of methodological implications, it is evidently necessary to include prokaryote pseudogenes as part of systematic annotation pipelines in the future. In addition, it was also shown to be helpful to identify potential short ORFs <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Furthermore, our survey shows that trends can be observed 'polygenomically' for prokaryotes, where they are not obvious or significant in individual genomes.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Database releases used</p>
				</st>
				<p>We used the following datasets in our prokaryotic pseudogene analysis: Swiss-Prot (release 40.19, updated to 27 May 2002) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, containing 43,094 prokaryotic protein sequences; nucleotide sequences from 64 prokaryotic genomes from EMBL database release 70 (March 2002) <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, including 11 genomes from archaea and 53 from bacteria as listed in Figure <figr fid="F1">1</figr>; and Pfam release 7.3 (May 2002), containing 3,849 families and 498,152 protein domains in the alignments <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Pseudogene identification pipeline</p>
				</st>
				<p>Figure <figr fid="F1">1a</figr> shows the basic procedure for identifying prokaryotic pseudogenes. The general schema was adapted from pipelines for pseudogene analysis in eukaryotes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. We generated a prokaryotic proteome set by collecting all the prokaryotic protein sequences in the Swiss-Prot database and those annotated in the 64 prokaryotic genomes. To be conservative, we did not include hypothetical or putative proteins, a large proportion of which might be overannotated <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. All the protein sequences were masked by SEG using the default low-complexity filter parameters (12 2.2 2.5) <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. To maximize the efficiency of the pseudogene search, we only considered the intergenic DNA regions in the 64 prokaryote genomes (including the regions encoding hypothetical proteins) as query sequences, and searched their forward and reverse complement sequences against the proteome set using FastX <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Significant homology matches (E-value less than 0.01) that contained more than one disablement (either a frameshift caused by insertion or deletion of nucleotides or a premature stop codon) were considered as potential pseudogenes. If an intergenic region had multiple matches, these matches were sorted by E-value (increasing) and then, for matches with the same E-value, by the number of matching residues (decreasing). The match with the most significant E-value and the maximum number of matching residues was selected, and redundant matches were removed.</p>
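				<p>As an illustration only, the ranking rule described above (E-value increasing, then number of matching residues decreasing) can be sketched in Python. The Match record and the function name select_best_match are hypothetical and are not taken from the original pipeline; the sketch assumes that the candidate matches for one intergenic region have already passed the significance and disablement filters.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one homology match of an intergenic region."""
    protein_id: str
    e_value: float
    matching_residues: int


def select_best_match(matches: List[Match]) -> Optional[Match]:
    """Return the top-ranked match, or None when there are no candidates.

    Ranking: smallest E-value first; ties are broken by the largest
    number of matching residues.
    """
    if not matches:
        return None
    ranked = sorted(matches, key=lambda m: (m.e_value, -m.matching_residues))
    return ranked[0]


if __name__ == "__main__":
    candidates = [
        Match("P1", 1e-5, 80),
        Match("P2", 1e-20, 120),
        Match("P3", 1e-20, 150),
    ]
    # P2 and P3 tie on E-value; P3 has more matching residues.
    print(select_best_match(candidates).protein_id)
```

				<p>Sorting on a tuple key keeps the two criteria in a single pass and makes the tie-breaking order explicit.</p>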
				<p>To ensure that spurious disablements were not introduced at ends of sequences as an alignment artifact, we excluded homology matches whose disablements occurred only within a 'cutoff region' at either end. We used 16 residues for the cutoff region for short sequences (160 amino acids or fewer) - a parameter that has been applied previously <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For longer sequences (more than 160 amino acids), 10% of the sequence length was applied as the cutoff region as FastX tends to include more residues at the ends of alignments.</p>
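				<p>The end-region filter described above could be expressed along the following lines. This is a minimal sketch under stated assumptions: positions are taken to be 1-based residue coordinates along the aligned sequence, and the function names cutoff_region and keep_match are hypothetical.</p>

```python
from typing import Sequence

SHORT_SEQUENCE_MAX = 160     # amino acids; boundary quoted in the text
SHORT_CUTOFF = 16            # residues; cutoff region for short sequences
LONG_CUTOFF_FRACTION = 0.10  # fraction of the length for longer sequences


def cutoff_region(length: int) -> float:
    """Width of the cutoff region (in residues) at each end of a sequence."""
    if length > SHORT_SEQUENCE_MAX:
        return LONG_CUTOFF_FRACTION * length
    return float(SHORT_CUTOFF)


def keep_match(length: int, disablement_positions: Sequence[int]) -> bool:
    """True if at least one disablement lies outside both cutoff regions.

    Assumption: positions are 1-based, so the first cutoff region covers
    positions 1..cutoff and the last covers (length - cutoff + 1)..length.
    """
    cutoff = cutoff_region(length)
    return any(
        pos > cutoff and length - cutoff >= pos
        for pos in disablement_positions
    )


if __name__ == "__main__":
    print(keep_match(100, [5]))      # only an end-region disablement
    print(keep_match(100, [5, 50]))  # one interior disablement
```

				<p>Note that the two rules agree at the boundary: 10% of 160 residues is 16 residues, so the cutoff width is continuous in the sequence length.</p>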
				<p>We also assessed the potential pseudogenes by examining the distribution of the disablements within pseudogene sequences. Given that mutations within pseudogenes are unconstrained, we would expect disablements in pseudogenes to be evenly distributed. Figure <figr fid="F1">1b</figr> shows the position of disablements within pseudogene fragments whose length is normalized to 100 residues. After removal of those potential pseudogenes that had disablements only in the flanking regions at both ends, the distribution is almost even. We used this as a 'control filter' to minimize false-positive pseudogenes. In the final pseudogene set, the length of pseudogenes ranges from 33 to 4,969 amino acids, with a median length of 130 amino acids, as compared with the proteome set, where the length ranges from 7 to 10,920 amino acids with a median length of 291 amino acids.</p>
				<p>We considered non-standard codon usage in some bacteria, such as when TGA encodes tryptophan rather than a stop codon in mycoplasma species, including <it>Mycoplasma pneumoniae</it>, <it>M. pulmonis </it>and <it>U. urealyticum</it>. By manual examination of <it>E. coli </it>genes with translational frameshifts in the RECODE database <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, we found that those genes were included in coding sequences (CDS) and therefore were excluded from our pseudogene search.</p>
				<p>Sequencing errors could also be a potential problem in the detection of pseudogenes. However, this effect is expected to be small, as comparison of independently sequenced isolates of the same <it>E. coli </it>strains indicated that only about 7% of candidate pseudogenes could be due to sequencing error <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. To further consider the possibility of sequencing error, we examined the stop codons in the pseudogenes detected in the <it>S. pneumoniae </it>genome (frameshift positions were not considered as they are difficult to locate). This genome and eight others found in the trace archive of the National Center for Biotechnology Information (NCBI) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> and Ensembl <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> were all sequenced by TIGR. We selected <it>S. pneumoniae </it>as a case study as it is a relatively large genome available in the archive. By adapting a previous method <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, we examined the overall quality values (Q) for each nucleotide of the stop codons in the pseudogenes. Pseudogene sequences were aligned to the archived sequences (&#8805; 95% identity), and the quality values for nucleotides in stop codons were summed. We chose 10<sup>-2 </sup>as a cutoff of the error rate (err = 10<sup>SUM(-0.1Q)</sup>) for all nucleotides. The stop codons with all three nucleotides above the cutoff were validated. Out of 116 pseudogenes in this genome, 73 were found to contain 150 stop codons in total. Using the available data in the trace archive, we identified 54 pseudogenes whose stop codons could be aligned with the original sequences, and validated 47 of these (87%). In addition, a similar fraction of stop codons (101 out of 116) was confirmed.</p>
			</sec>
			<sec>
				<st>
					<p>Family classification of genes and pseudogenes</p>
				</st>
				<p>All genes in the 64 genomes were assigned to Pfam families by cross-referencing of their Swiss-Prot ID. Pseudogenes were assigned to Pfam families through ID of their closest homologs. Only the homologs that cover more than 70% of the Pfam domain were selected. A pseudogene could be assigned to multiple Pfam families if it contains multiple domains.</p>
			</sec>
			<sec>
				<st>
					<p>Estimation of horizontally transferred genes and pseudogenes</p>
				</st>
				<p>Here we used a method (GC-content) to estimate horizontally transferred genes on the basis of their base compositions <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B39">39</abbr></abbrgrp>. We analyzed each of the 64 genomes individually, and atypical genes and pseudogenes were identified if the GC content at first and third codon positions was two or more standard deviations higher or lower than the mean values at those positions in genes.</p>
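				<p>A minimal sketch of this kind of test is given below for illustration. It is not the original implementation. It assumes in-frame nucleotide sequences, uses the population standard deviation, and flags a sequence when either the first or the third codon position deviates by two or more standard deviations; the text does not state whether one or both positions must deviate, so this is an assumption, as are all function names.</p>

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def gc_at_codon_position(cds: str, position: int) -> float:
    """GC fraction at codon position 1, 2 or 3 of an in-frame sequence."""
    usable = len(cds) - len(cds) % 3          # drop a trailing partial codon
    bases = cds.upper()[:usable][position - 1::3]
    if not bases:
        raise ValueError("sequence must contain at least one full codon")
    return sum(b in "GC" for b in bases) / len(bases)


def gene_baseline(gene_seqs: Sequence[str]) -> Dict[int, Tuple[float, float]]:
    """Mean and SD of GC1 and GC3 over the annotated genes of one genome."""
    baseline = {}
    for pos in (1, 3):
        values = [gc_at_codon_position(seq, pos) for seq in gene_seqs]
        baseline[pos] = (mean(values), pstdev(values))
    return baseline


def is_atypical(cds: str, baseline: Dict[int, Tuple[float, float]],
                n_sd: float = 2.0) -> bool:
    """Flag a sequence whose GC1 or GC3 lies n_sd or more SDs from the mean."""
    for pos, (mu, sd) in baseline.items():
        deviation = abs(gc_at_codon_position(cds, pos) - mu)
        if sd > 0 and deviation >= n_sd * sd:
            return True
    return False


if __name__ == "__main__":
    # Toy genes (not real data), used only to exercise the functions.
    genes = ["GCAACGACA", "GCAACGGCG", "GCAACGACA", "GCAACGGCG"]
    base = gene_baseline(genes)
    print(is_atypical("GCGGCGGCG", base))  # GC-rich at positions 1 and 3
    print(is_atypical("GCAACGACA", base))  # within one SD of the mean
```

				<p>The baseline is computed from genes only, so the same thresholds can then be applied to both genes and pseudogenes of that genome.</p>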
				<p>To ensure that we had the codon positions accurately assigned for the GC-content method, we only analyzed codons for pseudogenes that aligned well with annotated protein sequences, specifically excluding the regions of the alignment around frameshifts. While it is true that the local alignment in some regions of a pseudogene may be ambiguous, causing some difference in the GC-content calculation in that region, the impact on the overall GC-content estimation is minimal, given how many positions we average over to calculate the failed transfer index score.</p>
				<p>The results for the 64 genomes are shown in Table <tblr tid="T1">1</tblr>. The failed transfer index in the last column represents the ratio of the fraction of putative horizontally transferred pseudogenes to the fraction of horizontally transferred genes</p>
				<p><graphic file="gb-2004-5-9-r64-i1.gif"/>,</p>
				<p>similar to the measure previously used in <it>E. coli </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This essentially gives a likelihood ratio for horizontal transfer for pseudogenes relative to that of genes.</p>
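				<p>Read literally, the ratio defined above can be written as a short Python function. This is a sketch built from the verbal definition only; the function and argument names are hypothetical.</p>

```python
def failed_transfer_index(ht_pseudogenes, n_pseudogenes, ht_genes, n_genes):
    """Ratio of the transferred fraction in pseudogenes to that in genes.

    ht_pseudogenes / n_pseudogenes: putative horizontally transferred pseudogenes
    over all pseudogenes considered; ht_genes / n_genes: the same for genes.
    """
    if 0 in (n_pseudogenes, n_genes, ht_genes):
        raise ValueError("index undefined when a denominator is zero")
    return (ht_pseudogenes / n_pseudogenes) / (ht_genes / n_genes)
```

				<p>A value above 1 means that putative horizontally transferred sequences are over-represented among pseudogenes relative to genes.</p>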
				<p>Note that to minimize the effect of more divergent sequence alignments, for the horizontal-transfer calculations we only analyzed 1,748 'recent' pseudogenes, which have more than 50% sequence identity to their closest matches over an aligned subsequence of more than 100 residues.</p>
				<p>We have investigated the statistical robustness of the failed transfer index using resampling approaches <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. For each of the 64 genomes, we randomly picked 90% of its genes and calculated their GC content. Using the new GC content, we then identified the putative horizontally transferred genes and pseudogenes and calculated the failed transfer index. We applied the process 1,000 times, generating a distribution of 1,000 indexes, which has a mean value of 2.32 with standard deviation of 0.01.</p>
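				<p>The resampling procedure can be sketched as follows. This is a simplified illustration rather than the procedure used here: each sequence is reduced to a single precomputed GC value (the study uses first and third codon positions), the names are hypothetical, and horizontally transferred genes are counted within the 90% subsample, which is an assumption.</p>

```python
import random
from statistics import mean, pstdev


def flag(values, mu, sd, n_sd=2.0):
    """True where a value lies n_sd or more standard deviations from mu."""
    return [sd > 0 and abs(v - mu) >= n_sd * sd for v in values]


def resampled_indexes(gene_gc, pseudo_gc, n_rep=1000, frac=0.9, seed=0):
    """Failed transfer index recomputed on random subsamples of the genes."""
    rng = random.Random(seed)
    k = round(frac * len(gene_gc))
    indexes = []
    for _ in range(n_rep):
        sample = rng.sample(gene_gc, k)  # drawn without replacement
        mu, sd = mean(sample), pstdev(sample)
        ht_genes = sum(flag(sample, mu, sd))
        ht_pseudo = sum(flag(pseudo_gc, mu, sd))
        if ht_genes == 0 or not pseudo_gc:
            continue  # index undefined for this replicate
        indexes.append((ht_pseudo / len(pseudo_gc)) / (ht_genes / k))
    return indexes
```

				<p>The mean and standard deviation of the returned list correspond to the summary statistics quoted above for the distribution of 1,000 indexes.</p>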
				<p>We also applied an alternative method (GeneTrace) to estimate horizontally transferred pseudogenes <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In this method, a potential horizontal transfer event is inferred within a protein family when the family is present only in distantly related species and is absent from members of the same phylogenetic clade. We analyzed a subset of pseudogenes - 225 pseudogenes across 62 genomes - whose closest Swiss-Prot homologs share more than 70% sequence identity across at least 100 amino acids, and identified 41 of them (18%) as resulting from failed horizontal transfer events.</p>
			</sec>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>M.G. acknowledges financial support from the NIH/NIAID grant for the Northeast Biodefense Center (1U54AI057158-01) and from the Ruth B. Williams Fund. Y.L. was partially supported by an NLM postdoctoral fellowship (NIH Grant T15 LM07056). We thank Zhaolei Zhang and Nick Carriero for helpful discussions and Duncan Milburn for technical help.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Processed pseudogenes: characteristics and evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Vanin</snm>
						<fnm>EF</fnm>
					</au>
				</aug>
				<source>Annu Rev Genet</source>
				<pubdate>1985</pubdate>
				<volume>19</volume>
				<fpage>253</fpage>
				<lpage>272</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.ge.19.120185.001345</pubid>
						<pubid idtype="pmpid" link="fulltext">3909943</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Vertebrate pseudogenes.</p>
				</title>
				<aug>
					<au>
						<snm>Mighell</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>NR</fnm>
					</au>
					<au>
						<snm>Robinson</snm>
						<fnm>PA</fnm>
					</au>
					<au>
						<snm>Markham</snm>
						<fnm>AF</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2000</pubdate>
				<volume>468</volume>
				<fpage>109</fpage>
				<lpage>114</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(00)01199-6</pubid>
						<pubid idtype="pmpid" link="fulltext">10692568</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Studying genomes through the aeons: protein families, pseudogenes and proteome evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>318</volume>
				<fpage>1155</fpage>
				<lpage>1174</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0022-2836(02)00109-2</pubid>
						<pubid idtype="pmpid" link="fulltext">12083509</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Digging for dead genes: an analysis of the characteristics of the pseudogene population in the <it>Caenorhabditis elegans </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Echols</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>MB</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>818</fpage>
				<lpage>830</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">30377</pubid>
						<pubid idtype="pmpid" link="fulltext">11160906</pubid>
						<pubid idtype="doi">10.1093/nar/29.3.818</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Kumar</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Lan</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Echols</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Snyder</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>316</volume>
				<fpage>409</fpage>
				<lpage>419</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2001.5343</pubid>
						<pubid idtype="pmpid" link="fulltext">11866506</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Hegyi</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Balasubramanian</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Luscombe</snm>
						<fnm>NM</fnm>
					</au>
					<au>
						<snm>Bertone</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Echols</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Johnson</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>272</fpage>
				<lpage>280</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">155275</pubid>
						<pubid idtype="pmpid" link="fulltext">11827946</pubid>
						<pubid idtype="doi">10.1101/gr.207102</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Harrison</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>1466</fpage>
				<lpage>1482</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">187539</pubid>
						<pubid idtype="pmpid" link="fulltext">12368239</pubid>
						<pubid idtype="doi">10.1101/gr.331902</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Identification of pseudogenes in the <it>Drosophila melanogaster </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Milburn</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Bertone</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>1033</fpage>
				<lpage>1037</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">149191</pubid>
						<pubid idtype="pmpid" link="fulltext">12560500</pubid>
						<pubid idtype="doi">10.1093/nar/gkg169</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Whole-genome screening indicates a possible burst of formation of processed pseudogenes and <it>Alu </it>repeats by particular L1 subfamilies in ancestral primates.</p>
				</title>
				<aug>
					<au>
						<snm>Ohshima</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Hattori</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Yada</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Sakaki</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Okada</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>R74</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">329124</pubid>
						<pubid idtype="pmpid" link="fulltext">14611660</pubid>
						<pubid idtype="doi">10.1186/gb-2003-4-11-r74</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>A genome-wide survey of human pseudogenes.</p>
				</title>
				<aug>
					<au>
						<snm>Torrents</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Suyama</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Zdobnov</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2559</fpage>
				<lpage>2567</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403797</pubid>
						<pubid idtype="pmpid" link="fulltext">14656963</pubid>
						<pubid idtype="doi">10.1101/gr.1455503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Where are the pseudogenes in bacterial genomes?</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Hendrix</snm>
						<fnm>RW</fnm>
					</au>
					<au>
						<snm>Casjens</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Trends Microbiol</source>
				<pubdate>2001</pubdate>
				<volume>9</volume>
				<fpage>535</fpage>
				<lpage>540</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0966-842X(01)02198-9</pubid>
						<pubid idtype="pmpid" link="fulltext">11825713</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Massive gene decay in the leprosy bacillus.</p>
				</title>
				<aug>
					<au>
						<snm>Cole</snm>
						<fnm>ST</fnm>
					</au>
					<au>
						<snm>Eiglmeier</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Parkhill</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>James</snm>
						<fnm>KD</fnm>
					</au>
					<au>
						<snm>Thomson</snm>
						<fnm>NR</fnm>
					</au>
					<au>
						<snm>Wheeler</snm>
						<fnm>PR</fnm>
					</au>
					<au>
						<snm>Honore</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Garnier</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Churcher</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Harris</snm>
						<fnm>D</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>409</volume>
				<fpage>1007</fpage>
				<lpage>1011</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35059006</pubid>
						<pubid idtype="pmpid" link="fulltext">11234002</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Prokaryote Pseudogene Information Site</p>
				</title>
				<url>http://prokaryotes.pseudogene.org</url>
			</bibl>
			<bibl id="B14">
				<title>
					<p>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.</p>
				</title>
				<aug>
					<au>
						<snm>Bairoch</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Apweiler</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>45</fpage>
				<lpage>48</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102476</pubid>
						<pubid idtype="pmpid" link="fulltext">10592178</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.45</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Improved tools for biological sequence comparison.</p>
				</title>
				<aug>
					<au>
						<snm>Pearson</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1988</pubdate>
				<volume>85</volume>
				<fpage>2444</fpage>
				<lpage>2448</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">280013</pubid>
						<pubid idtype="pmpid">3162770</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>A systematic investigation identifies a significant number of probable pseudogenes in the <it>Escherichia coli </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Homma</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Fukuchi</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kawabata</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ota</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Nishikawa</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2002</pubdate>
				<volume>294</volume>
				<fpage>25</fpage>
				<lpage>33</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(02)00794-1</pubid>
						<pubid idtype="pmpid" link="fulltext">12234664</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>The genome sequence of <it>Rickettsia prowazekii </it>and the origin of mitochondria.</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
					<au>
						<snm>Zomorodipour</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Andersson</snm>
						<fnm>JO</fnm>
					</au>
					<au>
						<snm>Sicheritz-Ponten</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Alsmark</snm>
						<fnm>UC</fnm>
					</au>
					<au>
						<snm>Podowski</snm>
						<fnm>RM</fnm>
					</au>
					<au>
						<snm>Naslund</snm>
						<fnm>AK</fnm>
					</au>
					<au>
						<snm>Eriksson</snm>
						<fnm>AS</fnm>
					</au>
					<au>
						<snm>Winkler</snm>
						<fnm>HH</fnm>
					</au>
					<au>
						<snm>Kurland</snm>
						<fnm>CG</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1998</pubdate>
				<volume>396</volume>
				<fpage>133</fpage>
				<lpage>140</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/24094</pubid>
						<pubid idtype="pmpid" link="fulltext">9823893</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Pseudogenes, junk DNA, and the dynamics of <it>Rickettsia </it>genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>JO</fnm>
					</au>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>829</fpage>
				<lpage>839</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11319266</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochete <it>Borrelia burgdorferi</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Casjens</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Palmer</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>van Vugt</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>WM</fnm>
					</au>
					<au>
						<snm>Stevenson</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Rosa</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Lathigra</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Sutton</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dodson</snm>
						<fnm>RJ</fnm>
					</au>
					<etal/>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>2000</pubdate>
				<volume>35</volume>
				<fpage>490</fpage>
				<lpage>516</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2958.2000.01698.x</pubid>
						<pubid idtype="pmpid" link="fulltext">10672174</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The Pfam protein families database.</p>
				</title>
				<aug>
					<au>
						<snm>Bateman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Birney</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Durbin</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Howe</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>263</fpage>
				<lpage>266</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102420</pubid>
						<pubid idtype="pmpid" link="fulltext">10592242</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.263</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>ATP transport and ABC proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Guidotti</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Chem Biol</source>
				<pubdate>1996</pubdate>
				<volume>3</volume>
				<fpage>703</fpage>
				<lpage>706</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1074-5521(96)90244-6</pubid>
						<pubid idtype="pmpid">8939684</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Overview of bacterial ABC transporters.</p>
				</title>
				<aug>
					<au>
						<snm>Nikaido</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Hall</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1998</pubdate>
				<volume>292</volume>
				<fpage>3</fpage>
				<lpage>20</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0076-6879(98)92003-1</pubid>
						<pubid idtype="pmpid">9711542</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Structure and association of ATP-binding cassette transporter nucleotide-binding domains.</p>
				</title>
				<aug>
					<au>
						<snm>Kerr</snm>
						<fnm>ID</fnm>
					</au>
				</aug>
				<source>Biochim Biophys Acta</source>
				<pubdate>2002</pubdate>
				<volume>1561</volume>
				<fpage>47</fpage>
				<lpage>64</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0304-4157(01)00008-9</pubid>
						<pubid idtype="pmpid" link="fulltext">11988180</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>A family of related ATP-binding subunits coupled to many distinct biological processes in bacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Higgins</snm>
						<fnm>CF</fnm>
					</au>
					<au>
						<snm>Hiles</snm>
						<fnm>ID</fnm>
					</au>
					<au>
						<snm>Salmond</snm>
						<fnm>GP</fnm>
					</au>
					<au>
						<snm>Gill</snm>
						<fnm>DR</fnm>
					</au>
					<au>
						<snm>Downie</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Evans</snm>
						<fnm>IJ</fnm>
					</au>
					<au>
						<snm>Holland</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Gray</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Buckel</snm>
						<fnm>SD</fnm>
					</au>
					<au>
						<snm>Bell</snm>
						<fnm>AW</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>1986</pubdate>
				<volume>323</volume>
				<fpage>448</fpage>
				<lpage>450</lpage>
				<xrefbib>
					<pubid idtype="pmpid">3762694</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Binding protein-dependent transport systems.</p>
				</title>
				<aug>
					<au>
						<snm>Higgins</snm>
						<fnm>CF</fnm>
					</au>
					<au>
						<snm>Hyde</snm>
						<fnm>SC</fnm>
					</au>
					<au>
						<snm>Mimmack</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Gileadi</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Gill</snm>
						<fnm>DR</fnm>
					</au>
					<au>
						<snm>Gallagher</snm>
						<fnm>MP</fnm>
					</au>
				</aug>
				<source>J Bioenerg Biomembr</source>
				<pubdate>1990</pubdate>
				<volume>22</volume>
				<fpage>571</fpage>
				<lpage>592</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2229036</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Whole-genome comparison of <it>Mycobacterium tuberculosis </it>clinical and laboratory strains.</p>
				</title>
				<aug>
					<au>
						<snm>Fleischmann</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Alland</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Carpenter</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>DeBoy</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Dodson</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gwinn</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Haft</snm>
						<fnm>D</fnm>
					</au>
					<etal/>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2002</pubdate>
				<volume>184</volume>
				<fpage>5479</fpage>
				<lpage>5490</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">135346</pubid>
						<pubid idtype="pmpid" link="fulltext">12218036</pubid>
						<pubid idtype="doi">10.1128/JB.184.19.5479-5490.2002</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>GGDEF domain is homologous to adenylyl cyclase.</p>
				</title>
				<aug>
					<au>
						<snm>Pei</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Grishin</snm>
						<fnm>NV</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2001</pubdate>
				<volume>42</volume>
				<fpage>210</fpage>
				<lpage>216</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1097-0134(20010201)42:2&lt;210::AID-PROT80&gt;3.0.CO;2-8</pubid>
						<pubid idtype="pmpid" link="fulltext">11119645</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Identification and analysis of the gas vesicle gene cluster on an unstable plasmid of <it>Halobacterium halobium</it>.</p>
				</title>
				<aug>
					<au>
						<snm>DasSarma</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Experientia</source>
				<pubdate>1993</pubdate>
				<volume>49</volume>
				<fpage>482</fpage>
				<lpage>486</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8335077</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Transposition in prokaryotes: transposon Tn501.</p>
				</title>
				<aug>
					<au>
						<snm>Brown</snm>
						<fnm>NL</fnm>
					</au>
					<au>
						<snm>Evans</snm>
						<fnm>LR</fnm>
					</au>
				</aug>
				<source>Res Microbiol</source>
				<pubdate>1991</pubdate>
				<volume>142</volume>
				<fpage>689</fpage>
				<lpage>700</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0923-2508(91)90082-L</pubid>
						<pubid idtype="pmpid">1660177</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>The Tn5 transposon.</p>
				</title>
				<aug>
					<au>
						<snm>Reznikoff</snm>
						<fnm>WS</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>1993</pubdate>
				<volume>47</volume>
				<fpage>945</fpage>
				<lpage>963</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7504907</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Complete genome sequence of <it>Neisseria meningitidis </it>serogroup B strain MC58.</p>
				</title>
				<aug>
					<au>
						<snm>Tettelin</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Saunders</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>Heidelberg</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Jeffries</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Ketchum</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Hood</snm>
						<fnm>DW</fnm>
					</au>
					<au>
						<snm>Peden</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Dodson</snm>
						<fnm>RJ</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>1809</fpage>
				<lpage>1815</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5459.1809</pubid>
						<pubid idtype="pmpid" link="fulltext">10710307</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases.</p>
				</title>
				<aug>
					<au>
						<snm>Dyda</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Hickman</snm>
						<fnm>AB</fnm>
					</au>
					<au>
						<snm>Jenkins</snm>
						<fnm>TM</fnm>
					</au>
					<au>
						<snm>Engelman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Craigie</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Davies</snm>
						<fnm>DR</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1994</pubdate>
				<volume>266</volume>
				<fpage>1981</fpage>
				<lpage>1986</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7801124</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Amelioration of bacterial genomes: rates of change and exchange.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1997</pubdate>
				<volume>44</volume>
				<fpage>383</fpage>
				<lpage>397</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9089078</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Global dinucleotide signatures and analysis of genomic heterogeneity.</p>
				</title>
				<aug>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Curr Opin Microbiol</source>
				<pubdate>1998</pubdate>
				<volume>1</volume>
				<fpage>598</fpage>
				<lpage>610</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1369-5274(98)80095-7</pubid>
						<pubid idtype="pmpid" link="fulltext">10066522</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Detecting alien genes in bacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Mrazek</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Ann NY Acad Sci</source>
				<pubdate>1999</pubdate>
				<volume>870</volume>
				<fpage>314</fpage>
				<lpage>329</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10415493</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>How to interpret an anonymous bacterial genome: machine learning approach to gene identification.</p>
				</title>
				<aug>
					<au>
						<snm>Hayes</snm>
						<fnm>WS</fnm>
					</au>
					<au>
						<snm>Borodovsky</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1998</pubdate>
				<volume>8</volume>
				<fpage>1154</fpage>
				<lpage>1171</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9847079</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>On surrogate methods for detecting lateral gene transfer.</p>
				</title>
				<aug>
					<au>
						<snm>Ragan</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>FEMS Microbiol Lett</source>
				<pubdate>2001</pubdate>
				<volume>201</volume>
				<fpage>187</fpage>
				<lpage>191</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1097(01)00262-2</pubid>
						<pubid idtype="pmpid" link="fulltext">11470360</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Reconciling the many faces of lateral gene transfer.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Trends Microbiol</source>
				<pubdate>2002</pubdate>
				<volume>10</volume>
				<fpage>1</fpage>
				<lpage>4</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0966-842X(01)02282-X</pubid>
						<pubid idtype="pmpid" link="fulltext">11755071</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Molecular archaeology of the <it>Escherichia coli </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>9413</fpage>
				<lpage>9417</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">21352</pubid>
						<pubid idtype="pmpid" link="fulltext">9689094</pubid>
						<pubid idtype="doi">10.1073/pnas.95.16.9413</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>GeneTRACE-reconstruction of gene content of ancestral species.</p>
				</title>
				<aug>
					<au>
						<snm>Kunin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>1412</fpage>
				<lpage>1416</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg174</pubid>
						<pubid idtype="pmpid" link="fulltext">12874054</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Decoupling of genome size and sequence divergence in a symbiotic bacterium.</p>
				</title>
				<aug>
					<au>
						<snm>Wernegreen</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Moran</snm>
						<fnm>NA</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2000</pubdate>
				<volume>182</volume>
				<fpage>3867</fpage>
				<lpage>3869</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">94565</pubid>
						<pubid idtype="pmpid" link="fulltext">10851009</pubid>
						<pubid idtype="doi">10.1128/JB.182.13.3867-3869.2000</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<aug>
					<au>
						<snm>Mizrahi</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Dawes</snm>
						<fnm>SS</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>In Molecular Genetics of Mycobacteria</source>
				<publisher>Washington, DC: American Society for Microbiology</publisher>
				<editor>Hatfull GF, Jacobs WR Jr</editor>
				<pubdate>2000</pubdate>
				<fpage>159</fpage>
				<lpage>172</lpage>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Comparative genomics of microbial pathogens and symbionts.</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
					<au>
						<snm>Alsmark</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Canback</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Davids</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Frank</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Karlberg</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Klasson</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Antoine-Legault</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Mira</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Tamas</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<issue>Suppl 2</issue>
				<fpage>S17</fpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12385978</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Microbial minimalism: genome reduction in bacterial pathogens.</p>
				</title>
				<aug>
					<au>
						<snm>Moran</snm>
						<fnm>NA</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2002</pubdate>
				<volume>108</volume>
				<fpage>583</fpage>
				<lpage>586</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0092-8674(02)00665-7</pubid>
						<pubid idtype="pmpid" link="fulltext">11893328</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs.</p>
				</title>
				<aug>
					<au>
						<snm>Harrison</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Carriero</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2003</pubdate>
				<volume>333</volume>
				<fpage>885</fpage>
				<lpage>892</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.jmb.2003.09.016</pubid>
						<pubid idtype="pmpid" link="fulltext">14583187</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>The EMBL Nucleotide Sequence Database.</p>
				</title>
				<aug>
					<au>
						<snm>Stoesser</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Baker</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>van den Broek</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Camon</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Garcia-Pastor</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kanz</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Kulikova</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Leinonen</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Lin</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Lombard</snm>
						<fnm>V</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>21</fpage>
				<lpage>26</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">99098</pubid>
						<pubid idtype="pmpid" link="fulltext">11752244</pubid>
						<pubid idtype="doi">10.1093/nar/30.1.21</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>On the total number of genes and their length distribution in complete microbial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Skovgaard</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Jensen</snm>
						<fnm>LJ</fnm>
					</au>
					<au>
						<snm>Brunak</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ussery</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Krogh</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>425</fpage>
				<lpage>428</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02372-1</pubid>
						<pubid idtype="pmpid" link="fulltext">11485798</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>335</fpage>
				<lpage>337</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(02)02668-9</pubid>
						<pubid idtype="pmpid" link="fulltext">12127765</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Statistics of local complexity in amino acid sequences and sequence databases.</p>
				</title>
				<aug>
					<au>
						<snm>Wootton</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Federhen</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Comput Chem</source>
				<pubdate>1993</pubdate>
				<volume>17</volume>
				<fpage>149</fpage>
				<lpage>163</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/0097-8485(93)85006-X</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression.</p>
				</title>
				<aug>
					<au>
						<snm>Baranov</snm>
						<fnm>PV</fnm>
					</au>
					<au>
						<snm>Gurvich</snm>
						<fnm>OL</fnm>
					</au>
					<au>
						<snm>Fayet</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Prere</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>WA</fnm>
					</au>
					<au>
						<snm>Gesteland</snm>
						<fnm>RF</fnm>
					</au>
					<au>
						<snm>Atkins</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Giddings</snm>
						<fnm>MC</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>264</fpage>
				<lpage>267</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29850</pubid>
						<pubid idtype="pmpid" link="fulltext">11125107</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.264</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B51">
				<title>
					<p>NCBI trace archive</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/Traces</url>
			</bibl>
			<bibl id="B52">
				<title>
					<p>Ensembl trace archive</p>
				</title>
				<url>http://trace.ensembl.org</url>
			</bibl>
			<bibl id="B53">
				<title>
					<p>Comparative genome sequencing for discovery of novel polymorphisms in <it>Bacillus anthracis</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Read</snm>
						<fnm>TD</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Pop</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Shumway</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Umayam</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Jiang</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Holtzapple</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Busch</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Schupp</snm>
						<fnm>JM</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>296</volume>
				<fpage>2028</fpage>
				<lpage>2033</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1071837</pubid>
						<pubid idtype="pmpid" link="fulltext">12004073</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Statistical data analysis in the computer age.</p>
				</title>
				<aug>
					<au>
						<snm>Efron</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Tibshirani</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1991</pubdate>
				<volume>253</volume>
				<fpage>390</fpage>
				<lpage>395</lpage>
			</bibl>
		</refgrp>
	</bm>
</art>
