<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2004-5-2-r7</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Koonin</snm>
					<mi>V</mi>
					<fnm>Eugene</fnm>
					<insr iid="I1"/>
					<email>koonin@ncbi.nlm.nih.gov</email>
				</au>
				<au id="A2">
					<snm>Fedorova</snm>
					<mi>D</mi>
					<fnm>Natalie</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A3">
					<snm>Jackson</snm>
					<mi>D</mi>
					<fnm>John</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A4">
					<snm>Jacobs</snm>
					<mi>R</mi>
					<fnm>Aviva</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A5">
					<snm>Krylov</snm>
					<mi>M</mi>
					<fnm>Dmitri</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A6">
					<snm>Makarova</snm>
					<mi>S</mi>
					<fnm>Kira</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A7">
					<snm>Mazumder</snm>
					<fnm>Raja</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
				</au>
				<au id="A8">
					<snm>Mekhedov</snm>
					<mi>L</mi>
					<fnm>Sergei</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A9">
					<snm>Nikolskaya</snm>
					<mi>N</mi>
					<fnm>Anastasia</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A10">
					<snm>Rao</snm>
					<mnm>Sridhar</mnm>
					<fnm>B</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A11">
					<snm>Rogozin</snm>
					<mi>B</mi>
					<fnm>Igor</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A12">
					<snm>Smirnov</snm>
					<fnm>Sergei</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A13">
					<snm>Sorokin</snm>
					<mi>V</mi>
					<fnm>Alexander</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A14">
					<snm>Sverdlov</snm>
					<mi>V</mi>
					<fnm>Alexander</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A15">
					<snm>Vasudevan</snm>
					<fnm>Sona</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A16">
					<snm>Wolf</snm>
					<mi>I</mi>
					<fnm>Yuri</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A17">
					<snm>Yin</snm>
					<mi>J</mi>
					<fnm>Jodie</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A18">
					<snm>Natale</snm>
					<mi>A</mi>
					<fnm>Darren</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA</p>
				</ins>
				<ins id="I2">
					<p>Current address: Protein Identification Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2004</pubdate>
			<volume>5</volume>
			<issue>2</issue>
			<fpage>R7</fpage>
			<url>http://genomebiology.com/2004/5/2/R7</url>
			<xrefbib>
				<pubid idtype="pmpid">14759257</pubid>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>23</day>
					<month>10</month>
					<year>2003</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>1</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>4</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>15</day>
					<month>1</month>
					<year>2004</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2004</year>
			<collab>Koonin et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
		</cpyrt>
		<shorttitle>
			<p>A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes</p>
		</shorttitle>
		<shortabs>
			<p>We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis, which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution, and in making functional predictions for currently uncharacterized conserved genes.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: <it>Caenorhabditis elegans</it>, <it>Drosophila melanogaster</it>, <it>Homo sapiens</it>, <it>Arabidopsis thaliana</it>, <it>Saccharomyces cerevisiae</it>, <it>Schizosaccharomyces pombe </it>and <it>Encephalitozoon cuniculi</it>. Conservation of KOGs across the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. We examined in detail the 131 pan-eukaryotic KOGs that have exactly one member in each of the seven species. For the approximately 20 of these that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Comparative analysis of genomes from distant species provides new insights into gene functions, genome evolution and phylogeny. In particular, the comparative genomics of prokaryotes has revealed previously underappreciated major trends in genome evolution, namely, extensive lineage-specific gene loss and horizontal gene transfer (HGT) <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. To efficiently extract functional and evolutionary information from multiple genomes, a rational classification of genes based on homologous relationships is indispensable. The two principal classes of homologs are orthologs and paralogs <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Orthologs are defined as homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes that evolved by duplication of an ancestral gene at some stage of evolution. Orthology and paralogy are intimately linked: if a duplication (or a series of duplications) occurs after the speciation event that separated the compared species, orthology becomes a relationship between sets of paralogs rather than between individual genes, and such genes are called co-orthologs.</p>
			<p>Correct identification of orthologs and paralogs is of central importance for both the functional and evolutionary aspects of comparative genomics <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Orthologs typically occupy the same functional niche in different organisms; in contrast, paralogs tend to diversify functionally as they diverge after duplication <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Therefore, the robustness of genome annotation depends on accurate identification of orthologs. A clear demarcation of orthologs and paralogs is also required for constructing evolutionary scenarios, which include, along with vertical inheritance, lineage-specific gene loss and HGT <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p>
			<p>In principle, orthologs, including co-orthologs, should be identified by means of phylogenetic analysis of entire families of homologous proteins, which is expected to define orthologous protein sets as clades <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. However, for genome-wide protein sets, such analysis remains extremely labor-intensive and error-prone. Accordingly, procedures have been developed for identifying sets of likely orthologs without explicit recourse to phylogenetic analysis. These procedures are based on the notion of a genome-specific best hit (BeT), that is, the protein from a target genome that is most similar (typically in terms of similarity scores computed using BLAST or another sequence-comparison method) to a given protein from the query genome <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The central assumption of this approach is that orthologs are more similar to each other than to any other protein from the respective genomes. When multiple genomes are analyzed, pairs of probable orthologs detected on the basis of BeTs are combined into orthologous clusters represented in all or a subset of the analyzed genomes <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B22">22</abbr></abbrgrp>. This approach, amended with additional procedures for detecting co-orthologous protein sets and for treating multidomain proteins, was implemented in the database of Clusters of Orthologous Groups (COGs) of proteins <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The current COG set includes approximately 70% of the proteins encoded in 69 genomes of prokaryotes and unicellular eukaryotes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. 
The COGs have been used for functional annotation of new genomes <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, target selection in structural genomics <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, identification of potential drug targets <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> and genome-wide evolutionary studies <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B13">13</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Sonnhammer and co-workers independently developed a similar methodology for identification of co-orthologous protein sets from pairwise genome comparisons and applied it to the sequenced eukaryotic genomes <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
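			<p>The BeT-based clustering described above can be illustrated with a deliberately simplified Python sketch. It assumes a hypothetical input format, namely a dictionary of pairwise similarity scores (for example, BLAST bit scores) keyed by (query protein, target protein) together with a protein-to-genome mapping; all protein identifiers are invented. The sketch finds each protein's BeT in every other genome, keeps pairs of proteins that are each other's BeTs, and merges overlapping pairs into clusters by single linkage using a union-find structure. It is not the COG or KOG procedure itself, which amends this basic idea with additional steps for co-orthologous sets and multidomain proteins.</p>
			<p>
```python
from collections import defaultdict


def best_hits(scores, genome_of):
    """Genome-specific best hits (BeTs).

    scores: {(query_protein, target_protein): similarity score} (hypothetical format)
    genome_of: {protein: genome}
    Returns {(query_protein, target_genome): best-scoring target protein}.
    Ties are resolved in favor of the first pair encountered.
    """
    best = {}
    for (query, target), score in scores.items():
        if genome_of[query] == genome_of[target]:
            continue  # BeTs are defined only between different genomes
        key = (query, genome_of[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def symmetric_bets(bets, genome_of):
    """Pairs of proteins that are each other's BeT."""
    pairs = set()
    for (query, _), hit in bets.items():
        if bets.get((hit, genome_of[query])) == query:
            pairs.add(tuple(sorted((query, hit))))
    return pairs


def cluster(pairs):
    """Merge overlapping pairs into clusters (single linkage via union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Toy data (invented): hsaB is a human paralog of hsaA with a weaker hit to dmeA.
genome_of = {"hsaA": "Hsa", "hsaB": "Hsa", "dmeA": "Dme", "dmeB": "Dme", "sceA": "Sce"}
scores = {
    ("hsaA", "dmeA"): 300, ("dmeA", "hsaA"): 290,
    ("hsaA", "sceA"): 150, ("sceA", "hsaA"): 160,
    ("dmeA", "sceA"): 140, ("sceA", "dmeA"): 145,
    ("hsaB", "dmeA"): 120, ("dmeA", "hsaB"): 110,
    ("hsaB", "dmeB"): 200, ("dmeB", "hsaB"): 210,
}
print(cluster(symmetric_bets(best_hits(scores, genome_of), genome_of)))
# → [['dmeA', 'hsaA', 'sceA'], ['dmeB', 'hsaB']]
```
			</p>
			<p>Because each protein has at most one BeT per genome, only one member of a set of recent paralogs can form a symmetrical BeT with a given partner, which is one reason that co-orthologs require the additional procedures mentioned above.</p>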
			<p>A central notion introduced in the context of the COG analysis is that of a phyletic pattern, that is, the pattern of representation (presence or absence) of the analyzed species in each COG <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B20">20</abbr></abbrgrp>. Similar concepts have been independently developed and applied by others <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. The COGs show a remarkable scatter of phyletic patterns, with only a small minority represented in all sequenced genomes. A recent quantitative study showed that parsimonious evolutionary scenarios for most COGs involve multiple events of gene loss and HGT <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Both similarity and complementarity among the phyletic patterns of COGs, in conjunction with other information, such as conservation of gene order, have been successfully employed to predict gene functions <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. The comparison of phyletic patterns has been formalized in set-theoretical algorithms and systematically applied to the computational and experimental analysis of bacterial flagellar systems, demonstrating the considerable robustness of this approach <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
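			<p>A phyletic pattern can be represented simply as the set of genomes present in a cluster. The Python sketch below uses hypothetical cluster identifiers (X1 to X4) and the species abbreviations of Figure 1; it groups clusters with identical patterns, measures pattern similarity as the number of genomes in which two patterns differ (a Hamming distance), and flags complementary pairs. The complementarity criterion used here, disjoint patterns that together cover all analyzed genomes, is one simple definition assumed for illustration, not the set-theoretical algorithm cited above.</p>
			<p>
```python
from collections import defaultdict
from itertools import combinations

GENOMES = frozenset({"Ath", "Cel", "Dme", "Hsa", "Sce", "Spo", "Ecu"})


def hamming(p, q):
    """Number of genomes present in exactly one of the two patterns."""
    return len(p.symmetric_difference(q))


def complementary(p, q, genomes=GENOMES):
    """Assumed criterion: disjoint patterns that jointly cover every genome."""
    return p.isdisjoint(q) and p.union(q) == genomes


def identical_groups(patterns):
    """Lists of cluster IDs that share exactly the same phyletic pattern."""
    groups = defaultdict(list)
    for cluster_id, pattern in sorted(patterns.items()):
        groups[pattern].append(cluster_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def complementary_pairs(patterns):
    """All pairs of cluster IDs whose patterns are complementary."""
    return [(a, b) for a, b in combinations(sorted(patterns), 2)
            if complementary(patterns[a], patterns[b])]


# Hypothetical clusters and patterns, for illustration only.
patterns = {
    "X1": frozenset({"Ath", "Hsa", "Ecu"}),
    "X2": frozenset({"Cel", "Dme", "Sce", "Spo"}),
    "X3": frozenset({"Ath", "Hsa", "Ecu"}),
    "X4": frozenset({"Hsa", "Sce"}),
}
print(identical_groups(patterns))               # → [['X1', 'X3']]
print(complementary_pairs(patterns))            # → [('X1', 'X2'), ('X2', 'X3')]
print(hamming(patterns["X1"], patterns["X4"]))  # → 3
```
			</p>
			<p>Pairs flagged in this way are only candidates; as noted above, functional predictions from phyletic patterns are made in conjunction with other evidence, such as conservation of gene order.</p>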
			<p>We recently extended the system of orthologous protein clusters to complex, multicellular eukaryotes by constructing eukaryotic orthologous groups (KOGs) <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Here, we examine the phyletic patterns of KOGs in connection with known and predicted protein functions. In-depth analysis of some of these KOGs led to functional predictions for previously uncharacterized, but apparently essential, conserved eukaryotic proteins. We also reconstruct the parsimonious scenario of evolution of the crown-group eukaryotes by assigning the loss of genes (KOGs) and the emergence of new genes to the branches of the phylogenetic tree, and explicitly delineate the minimal gene sets for various ancestral forms. To our knowledge, this is the first systematic, genome-wide examination of the sets of orthologous genes in eukaryotes.</p>
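			<p>The kind of parsimonious reconstruction outlined above can be sketched, under strong simplifying assumptions, as a Dollo-type parsimony procedure in Python: each gene (KOG) is assumed to have emerged once, at the last common ancestor of the species in which it is present, and every absence below that point is explained by the smallest number of losses, each assigned to the branch leading to a clade that lacks the gene. HGT is ignored, prokaryotic homologs are not considered, and the tree topology (TREE, using the species abbreviations of Figure 1) is a hypothetical one chosen for illustration, not necessarily the topology used in this study.</p>
			<p>
```python
# Hypothetical rooted topology, for illustration only: plants branch first; animals
# (with fly and human grouped together) and fungi plus the microsporidian form the
# other clade. Leaves use the species abbreviations of Figure 1.
TREE = ("Ath", (("Cel", ("Dme", "Hsa")), (("Sce", "Spo"), "Ecu")))


def leaves(node):
    """All species below a node; a leaf is a species abbreviation string."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def origin(node, present):
    """Smallest subtree (last common ancestor) containing every present species."""
    if isinstance(node, str):
        return node
    for child in node:
        if present.issubset(leaves(child)):
            return origin(child, present)
    return node


def losses(node, present):
    """Clades whose stem branch must carry a loss, given emergence at or above node."""
    if present.isdisjoint(leaves(node)):
        return [sorted(leaves(node))]  # the whole clade lacks the gene: one loss
    if isinstance(node, str):
        return []
    found = []
    for child in node:
        found.extend(losses(child, present))
    return found


def dollo_scenario(species_present, tree=TREE):
    """Return (species under the inferred point of origin, list of lost clades)."""
    present = frozenset(species_present)
    if not present:
        raise ValueError("a phyletic pattern must include at least one species")
    root = origin(tree, present)
    return sorted(leaves(root)), losses(root, present)


# Pattern A--H--E, which Table 1 lists for BRCA2 (KOG4751).
clade_at_origin, lost_clades = dollo_scenario({"Ath", "Hsa", "Ecu"})
print(clade_at_origin)  # → ['Ath', 'Cel', 'Dme', 'Ecu', 'Hsa', 'Sce', 'Spo']
print(lost_clades)      # → [['Cel'], ['Dme'], ['Sce', 'Spo']]
```
			</p>
			<p>Under these assumptions, the BRCA2 pattern requires three independent losses. Adding information on prokaryotic homologs can move the inferred origin deeper: for the -CDH--E KOGs of Table 1, this sketch places the origin in the common ancestor of animals, fungi and the microsporidian, with a single loss in the yeast lineage, whereas conservation in archaea suggests an earlier origin and an additional loss in plants.</p>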
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>KOGs for seven sequenced eukaryotic genomes: functional and evolutionary implications of phyletic patterns</p>
				</st>
				<p>Eukaryotic KOGs were constructed by comparing the proteins encoded in the genomes of three animals (<it>Homo sapiens </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, the fruit fly <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and the nematode <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>), the green plant <it>Arabidopsis thaliana </it>(thale cress) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, two fungi (budding yeast <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and fission yeast <it>Schizosaccharomyces pombe </it><abbrgrp><abbr bid="B50">50</abbr></abbrgrp>) and the microsporidian <it>Encephalitozoon cuniculi </it><abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. The procedure for KOG construction was a modification of the one previously used for COGs <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B24">24</abbr></abbrgrp> and is described in greater detail elsewhere (<abbrgrp><abbr bid="B25">25</abbr></abbrgrp>; see also Materials and methods). An important difference stems from the fact that complex eukaryotes encode many more multidomain proteins than prokaryotes and, furthermore, orthologous eukaryotic proteins often differ in domain composition, with additional domains accrued in more complex forms <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B45">45</abbr></abbrgrp>. Accordingly, and unlike in the original COG construction procedure, probable orthologs with different domain architectures were assigned to the same KOG, rather than split, if they shared a common core of domains. In addition to the KOGs, which include proteins from at least three species, clusters of putative orthologs from two species (TWOGs) and lineage-specific expansions (LSEs) of paralogs from each of the analyzed genomes were identified (<abbrgrp><abbr bid="B25">25</abbr><abbr bid="B52">52</abbr></abbrgrp>; see also Materials and methods). 
In most of the analyses discussed below, KOGs and TWOGs are treated together, unless otherwise specified.</p>
				<p>Figure <figr fid="F1">1</figr> shows the assignment of the proteins from each of the analyzed eukaryotes to KOGs with different numbers of species, TWOGs and LSEs. The fraction of proteins assigned to KOGs tends to decrease with increasing genome size, from 81% for <it>S. pombe </it>to 51% for the largest genome, that of humans. (For reasons that remain unclear, but might be related to its intracellular parasitic lifestyle, only approximately 60% of <it>E. cuniculi </it>proteins belong to KOGs, a lower fraction than this trend would predict for such a small genome.) The contribution of LSEs shows the opposite trend, being the greatest in the largest genomes, that is, human and <it>Arabidopsis</it>, and minimal in the microsporidian (Figure <figr fid="F1">1</figr>). The analyzed eukaryotes also differ markedly in their representation in KOGs that include different numbers of species. Whereas the three unicellular organisms are represented mainly in the highly conserved seven- or six-species KOGs, a much larger fraction of the gene set in animals and <it>Arabidopsis </it>is accounted for by LSEs and by KOGs found in three or four genomes. These include animal-specific genes and genes that are shared by plants and animals but not by fungi and the microsporidian (Figure <figr fid="F1">1</figr>). The large number of KOGs in the latter group (700 KOGs represented in <it>Arabidopsis </it>and at least two animal species) is notable and probably results from massive, lineage-specific loss of genes during eukaryotic evolution (see below).</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Assignment of proteins from each of the seven analyzed eukaryotic genomes to KOGs with different numbers of species and to LSEs</p>
					</caption>
					<text>
						<p>Assignment of proteins from each of the seven analyzed eukaryotic genomes to KOGs with different numbers of species and to LSEs. 0, Proteins without detectable homologs (singletons); 1, LSEs. Species abbreviations: Ath, <it>Arabidopsis thaliana</it>; Cel, <it>Caenorhabditis elegans</it>; Dme, <it>Drosophila melanogaster</it>; Ecu, <it>Encephalitozoon cuniculi</it>; Hsa, <it>Homo sapiens</it>; Sce, <it>Saccharomyces cerevisiae</it>; Spo, <it>Schizosaccharomyces pombe</it>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-1"/>
				</fig>
				<p>The phyletic patterns of KOGs reveal both a conserved eukaryotic gene core and substantial diversity. The 'pan-eukaryotic' genes, which are represented in each of the seven analyzed genomes, account for around 20% of the KOGs, and approximately the same number of KOGs include all species except for the microsporidian, an intracellular parasite with a highly degraded genome <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Among the remaining KOGs, a large group includes representatives of the three analyzed animal species (worm, fly and human), but a substantial fraction (approximately 30%) are KOGs with unexpected patterns, for example, patterns that include one animal, one plant and one fungal species (see <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> and examples in Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T1" hint_layout="double">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>KOGs and TWOGs with unexpected phyletic patterns (examples)</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="left">
								<p>KOG/TWOG number</p>
							</c>
							<c ca="left">
								<p>Phyletic pattern*</p>
							</c>
							<c ca="left">
								<p>(Predicted) structure and function</p>
							</c>
							<c ca="left">
								<p>Prokaryotic homologs</p>
							</c>
							<c ca="left">
								<p>Comments</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0892</p>
							</c>
							<c ca="left">
								<p>---H--E</p>
							</c>
							<c ca="left">
								<p>Discoidin domain protein, potential regulator of proteasome activity</p>
							</c>
							<c ca="left">
								<p>Detected in a few phylogenetically scattered bacteria, no COG so far <abbrgrp><abbr bid="B69">69</abbr></abbrgrp></p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0263</p>
							</c>
							<c ca="left">
								<p>A-----E</p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocase</p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocases of chlamydia, rickettsia, <it>Xylella fastidiosa</it></p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocase is a hallmark of intracellular parasites and symbionts, allowing them to scavenge ATP from the host cell; in plants, it is a chloroplast protein. Might have been acquired by plants and microsporidia via independent HGT from bacteria <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0689</p>
							</c>
							<c ca="left">
								<p>---HY--</p>
							</c>
							<c ca="left">
								<p>Uncharacterized protein essential for propionate metabolism</p>
							</c>
							<c ca="left">
								<p>PrpD protein of several bacteria and archaea (COG2079)</p>
							</c>
							<c ca="left">
								<p>The yeast and human proteins (and the orthologs from other vertebrates) show the greatest similarity to different subsets of bacterial orthologs, which might suggest independent HGT events.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0871</p>
							</c>
							<c ca="left">
								<p>---H-P-</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein, probably enzyme</p>
							</c>
							<c ca="left">
								<p>COG4336, sporadic representation in several bacterial lineages</p>
							</c>
							<c ca="left">
								<p>The human (and mouse) protein has an additional domain conserved in the archaeon <it>Pyrococcus</it>. Human and <it>S. pombe </it>proteins are most similar to different subsets of bacterial homologs, which suggests the possibility of independent HGT events.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0788</p>
							</c>
							<c ca="left">
								<p>A----P-</p>
							</c>
							<c ca="left">
								<p>Urease</p>
							</c>
							<c ca="left">
								<p>Ureases of many bacterial species</p>
							</c>
							<c ca="left">
								<p>Highly conserved enzyme present in plants and many fungi but not <it>S. cerevisiae</it>. Plant and fungal ureases have a common domain architecture distinct from that of bacterial orthologs, which suggests monophyletic origin. Might have evolved via early HGT from bacteria (proto-mitochondria?) with subsequent loss in animals and some fungi.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4751</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>Recombination repair protein BRCA2, contains a varying number of BRC repeats</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Although sequence conservation is limited to the BRC repeats <abbrgrp><abbr bid="B101">101</abbr></abbrgrp>, the number of which varies substantially, the statistical significance of the observed sequence similarity and the absence of other homologs suggest that the proteins in this KOG are true orthologs. Apparent orthologs of BRCA2 are also detectable in other species from the taxa represented in the KOGs (mosquito <it>Anopheles gambiae</it>, fungus <it>Ustilago maydis</it>) <abbrgrp><abbr bid="B102">102</abbr></abbrgrp> and in early-branching eukaryotes (<it>Leishmania</it>, <it>Trypanosoma</it>; E.V.K., unpublished work), suggesting that the evolution of BRCA2 involved multiple gene losses.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4597</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>TATA-binding protein 1-interacting protein</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Probable multiple gene losses</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4486</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>3-methyl-adenine DNA glycosylase</p>
							</c>
							<c ca="left">
								<p>Orthologs in many bacteria (COG2094)</p>
							</c>
							<c ca="left">
								<p>The plant protein and those from mammals and microsporidia show the greatest similarity to different subsets of bacterial orthologs. Evolution might have included a combination of gene loss and independent HGT events.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1594</p>
							</c>
							<c ca="left">
								<p>A-D-Y--</p>
							</c>
							<c ca="left">
								<p>Predicted epimerase related to aldose 1-epimerase</p>
							</c>
							<c ca="left">
								<p>Bacterial orthologs, primarily proteobacteria (COG0676)</p>
							</c>
							<c ca="left">
								<p>Eukaryotic proteins are more closely related to each other than to bacterial orthologs, indicating monophyletic origin. Function remains unknown; might be involved in a distinct and still uncharacterized pathway of polysaccharide biosynthesis. LSE in <it>Arabidopsis </it>(seven paralogs).</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4141</p>
							</c>
							<c ca="left">
								<p>---HYPE</p>
							</c>
							<c ca="left">
								<p>Rad52/22, protein involved in double-strand break repair</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Probable gene loss in plants, insects and nematodes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4528</p>
							</c>
							<c ca="left">
								<p>-CDH--E</p>
							</c>
							<c ca="left">
								<p>Uncharacterized predicted enzyme, possibly a polynucleotide kinase (the structure of the ortholog from the bacterium <it>Thermotoga maritima </it>has been determined; PDB code 1j5u)</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea and several bacteria (COG1371)</p>
							</c>
							<c ca="left">
								<p>Context analysis of archaeal and bacterial genomes suggests functional interaction between proteins of KOG4528 and KOG3833, RNA 3'-terminal phosphate cyclase (KOG4398/COG0430), and tRNA/rRNA cytosine C5-methylase (KOG1299/COG0144) (<abbrgrp><abbr bid="B103">103</abbr></abbrgrp> and E.V.K., unpublished observations). Taken together, these observations appear to implicate KOG4528 and KOG3833 in a still uncharacterized pathway of rRNA and/or tRNA processing and modification. Conservation of these proteins in archaea and early-branching eukaryotes suggests lineage-specific gene loss in plants and fungi.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3833</p>
							</c>
							<c ca="left">
								<p>-CDH--E</p>
							</c>
							<c ca="left">
								<p>Uncharacterized predicted enzyme, possibly a polynucleotide phosphatase</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea and several bacteria (COG1690)</p>
							</c>
							<c ca="left">
								<p>See comment for KOG4528</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Each pattern lists the seven species in the order A, C, D, H, Y, P, E: A, thale cress <it>A. thaliana</it>; C, nematode <it>C. elegans</it>; D, fruit fly <it>D. melanogaster</it>; H, <it>Homo sapiens</it>; Y, budding yeast <it>S. cerevisiae</it>; P, fission yeast <it>S. pombe</it>; E, microsporidian <it>Encephalitozoon cuniculi</it>. A letter indicates the presence of the respective species in the given KOG or TWOG, and a dash indicates its absence.</p>
					</tblfn>
				</tbl>
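				<p>The phyletic pattern strings in Table 1 can be decoded mechanically. In the patterns shown, each of the seven positions corresponds to one species, in the order A, C, D, H, Y, P, E, with Y marking <it>S. cerevisiae </it>(for example, '---HY--' is the yeast and human pattern of TWOG0689, and '---H-P-' the human and <it>S. pombe </it>pattern of TWOG0871). The Python sketch below, whose function names are illustrative, decodes a pattern into the species abbreviations of Figure 1 and, following the definitions given above (KOGs include at least three species, TWOGs exactly two), reports which kind of cluster the pattern can describe.</p>
				<p>
```python
# Position order of the letters, as inferred from the patterns in Table 1.
ORDER = "ACDHYPE"
SPECIES = {"A": "Ath", "C": "Cel", "D": "Dme", "H": "Hsa",
           "Y": "Sce", "P": "Spo", "E": "Ecu"}


def decode(pattern):
    """Return the species present in a pattern such as 'A--H--E'."""
    if len(pattern) != len(ORDER):
        raise ValueError(f"expected {len(ORDER)} positions, got {len(pattern)}")
    present = []
    for expected, char in zip(ORDER, pattern):
        if char == "-":
            continue  # a dash marks absence of the species at this position
        if char != expected:
            raise ValueError(f"letter {char!r} found where {expected!r} or '-' belongs")
        present.append(SPECIES[char])
    return present


def cluster_kind(pattern):
    """KOGs span at least three species; TWOGs span exactly two."""
    n = len(decode(pattern))
    if n >= 3:
        return "KOG"
    if n == 2:
        return "TWOG"
    return "neither KOG nor TWOG"


for pattern in ["---HY--", "A--H--E", "-CDH--E"]:
    print(pattern, decode(pattern), cluster_kind(pattern))
# ---HY-- ['Hsa', 'Sce'] TWOG
# A--H--E ['Ath', 'Hsa', 'Ecu'] KOG
# -CDH--E ['Cel', 'Dme', 'Hsa', 'Ecu'] KOG
```
				</p>
				<p>Applied to the rows of Table 1, every TWOG pattern decodes to exactly two species and every KOG pattern to at least three, which provides a quick consistency check on the table.</p>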
				<p>During manual curation of the KOG set, the KOGs with unexpected patterns were scrutinized in an effort to detect potential highly diverged members from one or more of the analyzed genomes. Some of these unexpected patterns might indicate that a gene is still missing from the analyzed set of protein sequences of one or more of the included species; reports of newly discovered genes have appeared since the release of the initial reports on genome sequences of complex eukaryotes, for example, as a result of massive sequencing of human cDNAs <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, exhaustive annotation of the <it>Drosophila </it>genome <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> and comparative analysis of closely related yeast genomes <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The unexpected phyletic patterns seem, however, largely to reflect the extensive, lineage-specific gene loss that is characteristic of eukaryotic evolution <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>; on many occasions, this scenario is supported by the presence of orthologs in other eukaryotic lineages and/or in prokaryotes (Table <tblr tid="T1">1</tblr>). Interesting exceptions to the multiple-loss explanation might nevertheless exist, as exemplified by the ATP/ADP translocase, which is present in <it>Arabidopsis </it>and <it>Encephalitozoon </it>and could have evolved via independent HGT from intracellular bacterial parasites (<abbrgrp><abbr bid="B58">58</abbr></abbrgrp> and Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>KOGs represented by exactly one ortholog in each of the seven analyzed eukaryotic genomes (examples)</p>
					</caption>
					<tblbdy cols="8">
						<r>
							<c ca="left">
								<p>KOG number</p>
							</c>
							<c ca="left">
								<p>(Predicted) function</p>
							</c>
							<c ca="left">
								<p>Multiprotein complex</p>
							</c>
							<c ca="center">
								<p>Functional class*</p>
							</c>
							<c ca="left">
								<p>Prokaryotic homologs</p>
							</c>
							<c cspan="2" ca="center">
								<p>Fitness class<sup>&#8224;</sup></p>
							</c>
							<c ca="left">
								<p>Comments</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c cspan="2">
								<hr/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Yeast<sup>&#8225;</sup></p>
							</c>
							<c ca="center">
								<p>Worm<sup>&#167;</sup></p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c cspan="8" ca="left">
								<p>
									<b>Genes experimentally or computationally characterized previously</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0392</p>
							</c>
							<c ca="left">
								<p>SNF2 family DNA-dependent ATPase</p>
							</c>
							<c ca="left">
								<p>TBP-DNA complex</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Many bacteria and archaea (COG0553)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Involved in regulation of transcription from Pol II promoters <abbrgrp><abbr bid="B104">104</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0121</p>
							</c>
							<c ca="left">
								<p>Nuclear cap-binding protein complex, subunit CBP20 (RRM-domain-containing RNA-binding protein)</p>
							</c>
							<c ca="left">
								<p>Cap-binding complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Several bacteria (COG0724)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>RRM-domain proteins show scattered presence in bacteria and might have been horizontally transferred from eukaryotes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0213</p>
							</c>
							<c ca="left">
								<p>U2-snRNP associated splicing factor 3b, subunit 1</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0227</p>
							</c>
							<c ca="left">
								<p>snRNA-associated protein, splicing factor 3a, subunit b (Prp11p)</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2268</p>
							</c>
							<c ca="left">
								<p>Predicted nucleic-acid-binding protein kinase of the RIO1 family; 40S ribosomal subunit biogenesis/18S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs in most archaea but not in bacteria (COG0478)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>One of the very few protein kinases that show a clear-cut orthologous relationship between all eukaryotes and most archaea and, apparently, the only one that contains a helix-turn-helix nucleic-acid-binding domain <abbrgrp><abbr bid="B105">105</abbr></abbrgrp>. Associated with the yeast pre-40S subunit and required for its maturation <abbrgrp><abbr bid="B106">106</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3031</p>
							</c>
							<c ca="left">
								<p>Protein required for 60S ribosomal subunit biogenesis <abbrgrp><abbr bid="B107">107</abbr></abbrgrp>; contains the IMP4 domain, which is involved in rRNA processing <abbrgrp><abbr bid="B108">108</abbr></abbrgrp>; paralog of KOG3095 and KOG3292, which are also represented in all analyzed genomes.</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Distantly related to COG2136, represented by orthologs in most archaea, but not in bacteria (KSM, unpublished)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>The COG2136 proteins appear to be subunits of the predicted archaeal exosome <abbrgrp><abbr bid="B109">109</abbr></abbrgrp>. Apparently, this gene has undergone at least two ancient duplications in eukaryotes.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3045</p>
							</c>
							<c ca="left">
								<p>Predicted RNA methylase involved in rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Distantly related to numerous Rossmann-fold methylases but prokaryotic orthologs could not be confidently identified</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>This protein (Rrp8p in yeast) has been shown to participate in rRNA processing, and sequence analysis reveals the presence of a Rossmann-fold methylase domain <abbrgrp><abbr bid="B110">110</abbr></abbrgrp>. Therefore, Rrp8p probably methylates either a snoRNA or the rRNA itself.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3064</p>
							</c>
							<c ca="left">
								<p>RNA-binding nuclear protein containing a distinct C4 Zn-finger; implicated in the biogenesis of 60S ribosomal subunits <abbrgrp><abbr bid="B111">111</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Initially identified in yeast as the MAK16 protein required for dsRNA virus reproduction <abbrgrp><abbr bid="B112">112</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0291, 0302, 0306, 0310, 0319, 0650, 1272</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins, subunits of rRNA processing complexes <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appears to be an obvious ortholog of these proteins (COG2319)</p>
							</c>
							<c ca="center">
								<p>all 0</p>
							</c>
							<c ca="center">
								<p>X,X,1,X,1,1,1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0284</p>
							</c>
							<c ca="left">
								<p>Polyadenylation factor I complex, subunit PFS2, WD40-repeat protein</p>
							</c>
							<c ca="left">
								<p>Polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Same as above (COG2319)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0337</p>
							</c>
							<c ca="left">
								<p>RNA helicase involved in 28S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most of the archaea and bacteria (COG0513)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0343</p>
							</c>
							<c ca="left">
								<p>RNA helicase involved in 28S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most of the archaea and bacteria (COG0513)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1069</p>
							</c>
							<c ca="left">
								<p>3'-5' exoribonuclease (RNase PH), exosome subunit Rrp46</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria and archaea (COG0689)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1070</p>
							</c>
							<c ca="left">
								<p>Exosome subunit Rrp5 (RNA-binding S1 domain fused to TPR repeats)</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria (COG0539, COG0457)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1135</p>
							</c>
							<c ca="left">
								<p>mRNA cleavage and polyadenylation complex subunit CFT2 (CPSF)</p>
							</c>
							<c ca="left">
								<p>Cleavage and polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and some bacteria (COG1236)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1914</p>
							</c>
							<c ca="left">
								<p>mRNA cleavage and polyadenylation factor I complex, subunit RNA14</p>
							</c>
							<c ca="left">
								<p>Cleavage and polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1975</p>
							</c>
							<c ca="left">
								<p>RNA (guanine-7-) methyltransferase (capping enzyme subunit)</p>
							</c>
							<c ca="left">
								<p>Capping enzyme</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Numerous methyltransferases (COG0500) but no ortholog</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2051</p>
							</c>
							<c ca="left">
								<p>Nonsense-mediated mRNA decay complex, subunit 2</p>
							</c>
							<c ca="left">
								<p>NMD complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2554</p>
							</c>
							<c ca="left">
								<p>Pseudouridylate synthase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0101)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2613</p>
							</c>
							<c ca="left">
								<p>Upf1p-interacting protein, NMD complex subunit Nmd3p</p>
							</c>
							<c ca="left">
								<p>NMD complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1499)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2771</p>
							</c>
							<c ca="left">
								<p>tRNA-specific adenosine-34 deaminase subunit Tad3p</p>
							</c>
							<c ca="left">
								<p>Heterodimeric RNA-specific deaminase</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0590)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2780</p>
							</c>
							<c ca="left">
								<p>Protein involved in ribosomal large subunit assembly (RPF1), contains IMP4 domain</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2136)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2781</p>
							</c>
							<c ca="left">
								<p>Subunit of the small (ribosomal) subunit (SSU) processosome (snoRNP), IMP4</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2136)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2874</p>
							</c>
							<c ca="left">
								<p>Protein involved in rRNA processing and ribosomal assembly</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1094)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing KH domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3013</p>
							</c>
							<c ca="left">
								<p>Exosome subunit Rrp4</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1097)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3031</p>
							</c>
							<c ca="left">
								<p>Protein involved in large ribosomal subunit assembly and 28S rRNA processing (Rpf2)</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Contains the BRIX domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3322</p>
							</c>
							<c ca="left">
								<p>RNase P/MRP subunit, involved in processing of pre-tRNAs and the 5.8S rRNA</p>
							</c>
							<c ca="left">
								<p>RNase P/MRP holoenzyme</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3448</p>
							</c>
							<c ca="left">
								<p>Predicted snRNP core protein</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1958)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3482</p>
							</c>
							<c ca="left">
								<p>Small nuclear ribonucleoprotein (snRNP) SMF subunit</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1958)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2463</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein consisting of a PIN domain and a Zn-ribbon; involved in 26S proteasome assembly</p>
							</c>
							<c ca="left">
								<p>26S proteasome, pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A,O</p>
							</c>
							<c ca="left">
								<p>Represented by orthologs in all archaea but no bacteria (COG1349)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>The PIN domain has been detected in exosome subunits and is thought to have RNA-binding properties or even nuclease activity <abbrgrp><abbr bid="B113">113</abbr><abbr bid="B114">114</abbr></abbrgrp>. The demonstration of the role of this protein (Nob1p) in proteasome assembly <abbrgrp><abbr bid="B115">115</abbr></abbrgrp>, 40S ribosomal subunit assembly and processing of the 3' end of 18S rRNA <abbrgrp><abbr bid="B116">116</abbr></abbrgrp> supports the connection between RNA degradation and protein degradation that seems to have been established already in archaea <abbrgrp><abbr bid="B109">109</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3273</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing KH domain, interacts with Nob1p</p>
							</c>
							<c ca="left">
								<p>26S proteasome, pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A,O</p>
							</c>
							<c ca="left">
								<p>Orthologs in all archaea but no bacteria (COG1094)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>This is the second predicted RNA-binding protein involved in proteasome assembly <abbrgrp><abbr bid="B115">115</abbr></abbrgrp>, which emphasizes the aforementioned link between RNA and protein processing</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1831</p>
							</c>
							<c ca="left">
								<p>Deadenylating 3'-5' exonuclease, negative regulator of Pol II transcription</p>
							</c>
							<c ca="left">
								<p>CCR4-NOT core complex</p>
							</c>
							<c ca="center">
								<p>AK</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1159</p>
							</c>
							<c ca="left">
								<p>NADP-dependent flavoprotein reductase, probably sulfite reductase subunit</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>CL</p>
							</c>
							<c ca="left">
								<p>Many bacteria (COG0369)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Genetic evidence of a role in DNA replication <abbrgrp><abbr bid="B117">117</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1800</p>
							</c>
							<c ca="left">
								<p>Ferredoxin/adrenodoxin reductase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>C</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0493)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1173</p>
							</c>
							<c ca="left">
								<p>Anaphase-promoting complex (APC), Cdc16 subunit (TPR-repeat protein)</p>
							</c>
							<c ca="left">
								<p>APC</p>
							</c>
							<c ca="center">
								<p>D</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of Cdc16</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3437</p>
							</c>
							<c ca="left">
								<p>Anaphase-promoting complex (APC), subunit 10</p>
							</c>
							<c ca="left">
								<p>APC</p>
							</c>
							<c ca="center">
								<p>D</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1358</p>
							</c>
							<c ca="left">
								<p>Serine palmitoyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0156)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1511</p>
							</c>
							<c ca="left">
								<p>Mevalonate kinase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>Most archaea and some bacteria (COG1577)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3059</p>
							</c>
							<c ca="left">
								<p>N-acetylglucosaminyltransferase complex, subunit PIG-C/GPI2, involved in phosphatidylinositol biosynthesis</p>
							</c>
							<c ca="left">
								<p>N-acetylglucosaminyltransferase complex</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0467</p>
							</c>
							<c ca="left">
								<p>Translation elongation factor 2 paralog (GTPase)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0480)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Involved in 60S ribosomal subunit maturation <abbrgrp><abbr bid="B118">118</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1147</p>
							</c>
							<c ca="left">
								<p>Glutamyl-tRNA synthetase</p>
							</c>
							<c ca="left">
								<p>Multispecificity aminoacyl-tRNA synthetase complex</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0008)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2784</p>
							</c>
							<c ca="left">
								<p>Phenylalanyl-tRNA synthetase, beta subunit</p>
							</c>
							<c ca="left">
								<p>Heterodimeric phenylalanyl-tRNA synthetase</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0016)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3123</p>
							</c>
							<c ca="left">
								<p>Diphthamide synthase (methyltransferase)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1798)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0261</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III, largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPIII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0086)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0262</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I, largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPI holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0086)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0215</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III, second largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPIII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0085)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0216</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I, second largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPI holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0085)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1063</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex, subunit ELP2, WD repeat protein</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1131</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, 5'-3' helicase subunit RAD3</p>
							</c>
							<c ca="left">
								<p>RNAPII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG1199)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1920</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II Elongator subunit</p>
							</c>
							<c ca="left">
								<p>RNAP II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1932</p>
							</c>
							<c ca="left">
								<p>TBP-associated factor (Taf2p)</p>
							</c>
							<c ca="left">
								<p>TFIID complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2009</p>
							</c>
							<c ca="left">
								<p>Transcription initiation factor TFIIIB, Bdp1 subunit (Myb domain)</p>
							</c>
							<c ca="left">
								<p>TFIIIB</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2076</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III transcription factor TFIIIC, TPR-repeat-containing protein</p>
							</c>
							<c ca="left">
								<p>TFIIIC</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of TFIIIC</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2487</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB4</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2691</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II subunit 9</p>
							</c>
							<c ca="left">
								<p>RNAP II holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1594)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2807</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, SSL1 subunit</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Consists of a von Willebrand A domain most closely related to those in the proteasome subunit RPN10 <abbrgrp><abbr bid="B119">119</abbr></abbrgrp> and a Zn-finger domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2907</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I transcription factor TFIIS, subunit A12.2/RPA12</p>
							</c>
							<c ca="left">
								<p>TFIIS</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1594)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3169</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcriptional regulation mediator</p>
							</c>
							<c ca="left">
								<p>Mediator complex <abbrgrp><abbr bid="B120">120</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3233</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III subunit C34</p>
							</c>
							<c ca="left">
								<p>RNAP III holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3297</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III subunit C25</p>
							</c>
							<c ca="left">
								<p>RNAP III holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1095)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3438</p>
							</c>
							<c ca="left">
								<p>Subunit common to RNA polymerases I (A) and III (C); Rpc19p</p>
							</c>
							<c ca="left">
								<p>RNAP I and III holoenzymes</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3471</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB2</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3490</p>
							</c>
							<c ca="left">
								<p>Transcription elongation factor SPT4, Zn-ribbon protein</p>
							</c>
							<c ca="left">
								<p>Chromatin-associated transcription complexes</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3497</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II subunit; Rpb10p</p>
							</c>
							<c ca="left">
								<p>RNAP II holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1644)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3901</p>
							</c>
							<c ca="left">
								<p>Transcription initiation factor IID subunit (Taf13p)</p>
							</c>
							<c ca="left">
								<p>TFIID</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3949</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex, subunit ELP4</p>
							</c>
							<c ca="left">
								<p>RNAP II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4086</p>
							</c>
							<c ca="left">
								<p>SOH1 protein potentially involved in Pol II transcription regulation and repair</p>
							</c>
							<c ca="left">
								<p>SMCC complex <abbrgrp><abbr bid="B121">121</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1532</p>
							</c>
							<c ca="left">
								<p>Predicted GTPase of the XAB1 family <abbrgrp><abbr bid="B122">122</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>TBP-free TAF(II) complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea and several bacteria (COG1100)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>XP-A-binding protein in humans, thus implicated in repair (<abbrgrp><abbr bid="B122">122</abbr></abbrgrp> and references therein).</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1533</p>
							</c>
							<c ca="left">
								<p>Predicted GTPase of the XAB1 family (paralog of KOG1757) <abbrgrp><abbr bid="B122">122</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>TBP-free TAF(II) complex?</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea and several bacteria (COG1100)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Might have a function in repair given the paralogous relationship with KOG1757.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1625</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945; processivity subunit, inactivated phosphatase</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945; holoenzyme</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Small subunit of archaeal DNA polymerase II (COG1311)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>The small, regulatory subunit of DNA polymerase &#945; also forms a pan-eukaryotic KOG3044, which is a paralog of KOG0861 (the only recent duplication in KOG3044 is seen in vertebrates). In contrast, another paralog, the small subunit of DNA polymerase &#949;, is represented in animals, fungi and the early-branching protozoan <it>Plasmodium</it>, but not in plants or Microsporidia. Thus, the history of this polymerase subunit apparently involved inactivation of the phosphatase (or nuclease) inherited from archaea, with subsequent duplications at early stages of eukaryotic evolution <abbrgrp><abbr bid="B123">123</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0479</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM3</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0481</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM5</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0482</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM7</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0964</p>
							</c>
							<c ca="left">
								<p>Structural maintenance of chromosome protein 3 (cohesin subunit SMC3)</p>
							</c>
							<c ca="left">
								<p>Sister chromatid cohesion complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Many archaea and bacteria (COG1196)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0979</p>
							</c>
							<c ca="left">
								<p>Structural maintenance of chromosome protein 5 (cohesin subunit SMC5)</p>
							</c>
							<c ca="left">
								<p>Sister chromatid cohesion complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Many archaea and bacteria (COG1196)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1942</p>
							</c>
							<c ca="left">
								<p>TBP-interacting protein TIP49 (DNA helicase)</p>
							</c>
							<c ca="left">
								<p>Chromatin remodeling complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1224)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1979</p>
							</c>
							<c ca="left">
								<p>DNA mismatch repair ATPase, MLH1</p>
							</c>
							<c ca="left">
								<p>Mismatch repair complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0323)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2267</p>
							</c>
							<c ca="left">
								<p>DNA primase, large subunit</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945;:primase complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG2219)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2299</p>
							</c>
							<c ca="left">
								<p>Ribonuclease HI</p>
							</c>
							<c ca="left">
								<p>Replisome</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0164)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2310</p>
							</c>
							<c ca="left">
								<p>DNA repair exonuclease MRE11</p>
							</c>
							<c ca="left">
								<p>MRN complex involved in double-strand break repair</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0420)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2929</p>
							</c>
							<c ca="left">
								<p>Origin recognition complex, subunit 2 (ORC2)</p>
							</c>
							<c ca="left">
								<p>ORC</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0179</p>
							</c>
							<c ca="left">
								<p>20S proteasome, regulatory subunit beta type PSMB1/PRE7 (paralog of KOG0185)</p>
							</c>
							<c ca="left">
								<p>20S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea but only actinomycetes among bacteria (COG0638)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0185</p>
							</c>
							<c ca="left">
								<p>20S proteasome, regulatory subunit beta type PSMB4/PRE4 (paralog of KOG0179)</p>
							</c>
							<c ca="left">
								<p>20S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea but only actinomycetes among bacteria (COG0638)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2708</p>
							</c>
							<c ca="left">
								<p>Predicted metalloprotease with chaperone activity (RNase H/HSP70 fold) <abbrgrp><abbr bid="B124">124</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Putative complex involved in translation regulation <abbrgrp><abbr bid="B125">125</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Represented by orthologs in all archaea and bacteria (COG0533)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>One of the few remaining uncharacterized proteins that are universally conserved in all cellular life forms. The only experimentally demonstrated activity is that of a sialoglycoprotease, but fusion with a distinct protein kinase in several archaea and analysis of gene neighborhoods suggest a fundamental role in signal transduction, possibly translation regulation <abbrgrp><abbr bid="B125">125</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0301</p>
							</c>
							<c ca="left">
								<p>Protein required for normal rates of ubiquitin-dependent proteolysis, contains WD40 repeats</p>
							</c>
							<c ca="left">
								<p>Proteasome?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
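								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>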
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0358</p>
							</c>
							<c ca="left">
								<p>Chaperonin complex component, TCP-1 delta subunit (CCT4)</p>
							</c>
							<c ca="left">
								<p>TCP-1</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea and nearly all bacteria (COG0459)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0363</p>
							</c>
							<c ca="left">
								<p>Chaperonin complex component, TCP-1 beta subunit (CCT2)</p>
							</c>
							<c ca="left">
								<p>TCP-1</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea and nearly all bacteria (COG0459)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0687</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN7/PSMD6</p>
							</c>
							<c ca="left">
								<p>26S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1299</p>
							</c>
							<c ca="left">
								<p>Vacuolar sorting protein VPS45/Stt10 (Sec1 family)</p>
							</c>
							<c ca="left">
								<p>t-SNARE complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Involved in t-SNARE complex assembly <abbrgrp><abbr bid="B126">126</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1349</p>
							</c>
							<c ca="left">
								<p>GPI-anchor transamidase complex, GPI8 subunit</p>
							</c>
							<c ca="left">
								<p>GPI-anchor transamidase complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Distantly related proteases in some bacteria (no COG)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1943</p>
							</c>
							<c ca="left">
								<p>Beta-tubulin folding cofactor D, involved in chromosome segregation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2015</p>
							</c>
							<c ca="left">
								<p>NEDD8-activating complex, UBA3 subunit</p>
							</c>
							<c ca="left">
								<p>NEDD8-activating complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0476)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2126</p>
							</c>
							<c ca="left">
								<p>Phosphoethanolamine <it>N</it>-methyltransferase involved in GPI-anchor biosynthesis</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Several bacteria and archaea (COG1524)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2884</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN10/PSMD4</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Contains a von Willebrand A domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2908</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN9/PSMD13</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Contains a PINT domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0209</p>
							</c>
							<c ca="left">
								<p>Endoplasmic reticulum membrane P-type ATPase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="left">
								<p>Many bacteria and some archaea (COG0474)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3379</p>
							</c>
							<c ca="left">
								<p>Uncharacterized member of the histidine triad superfamily of nucleotide hydrolases</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0537)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Only biochemical function predicted.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2635</p>
							</c>
							<c ca="left">
								<p>Coatomer (COPI) complex delta subunit</p>
							</c>
							<c ca="left">
								<p>COPI complex</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2927</p>
							</c>
							<c ca="left">
								<p>Membrane component of ER protein translocation apparatus (Sec62)</p>
							</c>
							<c ca="left">
								<p>Sec complex</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2978</p>
							</c>
							<c ca="left">
								<p>Dolichol-phosphate mannosyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0463)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3198</p>
							</c>
							<c ca="left">
								<p>Signal recognition particle, subunit Srp19</p>
							</c>
							<c ca="left">
								<p>Signal recognition particle</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1400)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3315</p>
							</c>
							<c ca="left">
								<p>Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking</p>
							</c>
							<c ca="left">
								<p>TRAPP</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3369</p>
							</c>
							<c ca="left">
								<p>Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking</p>
							</c>
							<c ca="left">
								<p>TRAPP</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1992</p>
							</c>
							<c ca="left">
								<p>Nuclear export receptor CSE1/CAS (importin beta)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>YU</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="2" ca="left">
								<p>
									<b>New functional predictions</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2316</p>
							</c>
							<c ca="left">
								<p>PP-loop family ATP pyrophosphatase domain, which in fungi, plants and insects is fused to a duplicated translation inhibitor domain. The fusion, along with the phyletic pattern of the PP-ATPase domain, suggests an essential function in translation regulation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs of the PP-loop domain are present in all archaea (COG2102) but not in bacteria. Orthologs of the translation inhibitor domain are found in most bacteria and several archaea (COG0251)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>PP-loop ATPases have previously been implicated in base thiolation in various RNAs <abbrgrp><abbr bid="B127">127</abbr></abbrgrp>, and proteins in this K/COG might have a similar function, which is likely to be conserved in eukaryotes and archaea. However, the fusion with the translation inhibitor domain, which has been reported to have endoribonuclease activity <abbrgrp><abbr bid="B128">128</abbr></abbrgrp>, is a eukaryote-specific feature</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2523</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing a PUA domain, probable role in RNA modification <abbrgrp><abbr bid="B129">129</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Putative novel RNA modification complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs present in all archaea (COG2016) but not in bacteria</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Several of the archaeal orthologs of this protein form fusions with a PP-loop ATPase domain implicated in base thiolation <abbrgrp><abbr bid="B127">127</abbr></abbrgrp>. Thus, the proteins of this KOG might interact with those of KOG2840 (pan-eukaryotic, duplications in <it>Arabidopsis </it>and worm) or KOG2594 (missing in humans and microsporidia) to form a novel enzymatic complex involved in RNA modification</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0270, 0271, 1539</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>all 0</p>
							</c>
							<c ca="center">
								<p>X,1,X</p>
							</c>
							<c ca="left">
								<p>By analogy with other conserved WD40-repeat proteins, predicted to be subunits of rRNA processing/ribosome assembly complexes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2321</p>
							</c>
							<c ca="left">
								<p>Nucleolar protein, contains WD40 repeats</p>
							</c>
							<c ca="left">
								<p>rRNA processosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Probable subunit of an rRNA-processing complex</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1763</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein containing a CCCH Zn-finger; possible role in RNA processing or splicing</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>CCCH fingers have been shown to bind 3' untranslated regions in various mRNAs <abbrgrp><abbr bid="B130">130</abbr><abbr bid="B131">131</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2837</p>
							</c>
							<c ca="left">
								<p>Protein containing a U1-type, RNA-binding C2H2 Zn-finger. Probable role in RNA splicing/processing</p>
							</c>
							<c ca="left">
								<p>Spliceosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>U1-type fingers are essential for the assembly of U1 RNP <abbrgrp><abbr bid="B132">132</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3073</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing PIN domain and involved in 18S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, not in bacteria (COG1412)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Interacts with Nop14p and is required for 40S subunit biogenesis and 18S rRNA maturation (PMID 11694595). The presence of the PIN domain suggests RNA binding and, possibly, RNase activity</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3154</p>
							</c>
							<c ca="left">
								<p>Uncharacterized protein with potential function in translation or ribosomal biogenesis</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit?</p>
							</c>
							<c ca="center">
								<p>A?</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2042)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>The general functional prediction stems from the observation that the gene for this protein forms a predicted conserved operon with the gene for ribosomal protein L40E in several archaeal genomes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3214</p>
							</c>
							<c ca="left">
								<p>Small protein containing a Zn-ribbon, possibly RNA-binding; potential role in RNA processing or transcription regulation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A?</p>
							</c>
							<c ca="left">
								<p>Conserved in Crenarchaeota (COG4888)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3800</p>
							</c>
							<c ca="left">
								<p>Predicted E3 ubiquitin ligase containing RING finger, subunit of transcription/repair factor TFIIH and CDK-activating kinase assembly factor</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>KO</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3176</p>
							</c>
							<c ca="left">
								<p>Predicted &#945;-helical protein, possibly involved in replication/repair; paralog of KOG3636</p>
							</c>
							<c ca="left">
								<p>A novel complex with PCNA involved in replication?</p>
							</c>
							<c ca="center">
								<p>L?</p>
							</c>
							<c ca="left">
								<p>Conserved in most (possibly all) archaea but not in bacteria (COG1711)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3303</p>
							</c>
							<c ca="left">
								<p>Predicted &#945;-helical protein, possibly involved in replication/repair and/or transcription; paralog of KOG3508</p>
							</c>
							<c ca="left">
								<p>A novel complex with PCNA involved in replication?</p>
							</c>
							<c ca="center">
								<p>L?</p>
							</c>
							<c ca="left">
								<p>Conserved in most (possibly all) archaea but not in bacteria (COG1711)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0396</p>
							</c>
							<c ca="left">
								<p>Predicted E3 ubiquitin ligase</p>
							</c>
							<c ca="left">
								<p>Ub ligase</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>The proteins in this KOG contain a modified RING domain that, like the U-box domain <abbrgrp><abbr bid="B133">133</abbr></abbrgrp>, might not be capable of metal binding; the U-box has been shown to function as an E3 <abbrgrp><abbr bid="B134">134</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1443</p>
							</c>
							<c ca="left">
								<p>Multitransmembrane protein, predicted drug/metabolite transporter</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0697)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2647</p>
							</c>
							<c ca="left">
								<p>Multitransmembrane protein, potential transporter</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0628)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2488</p>
							</c>
							<c ca="left">
								<p>Predicted N-acetyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0454)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Putative role in ribosomal maturation?</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3347</p>
							</c>
							<c ca="left">
								<p>Predicted nucleotide kinase; nuclear protein (Fap7p)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea but not in bacteria (COG1936)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Involved in the oxidative stress response in yeast <abbrgrp><abbr bid="B135">135</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3974</p>
							</c>
							<c ca="left">
								<p>Predicted sugar kinase</p>
							</c>
							<c ca="left">
								<p>Putative novel complex with KOG2585 proteins</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>All archaea and most bacteria (COG0063)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Based on fusions seen in prokaryotes, predicted to interact functionally and, possibly, physically with uncharacterized proteins of KOG2585 (represented in all eukaryotes, with paralogs in some species)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="2" ca="left">
								<p>
									<b>No functional prediction</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2318</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3237</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein containing coiled-coil domain</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Coiled-coil domains are often involved in complex assembly; this could be an uncharacterized component of the chromatin or the spliceosome</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Abbreviations for the functional categories are as in Figure <figr fid="F3">3</figr>. <sup>&#8224;</sup>0, essential gene (lethal knockout); 1, non-essential gene (non-lethal knockout); X indicates that no data is available for the given gene. <sup>&#8225;</sup>Data from <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>. <sup>&#167;</sup>Data from <abbrgrp><abbr bid="B86">86</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
				<p>Common phyletic patterns of genes that were not otherwise suspected to be functionally linked might suggest the existence of such connections and prompt additional analysis leading to concrete functional predictions <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B59">59</abbr><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr></abbrgrp>. The pair of KOG3833 and KOG4528 is a case in point that has not been described previously. The initial observation that these KOGs share the same unusual pattern of presence-absence in eukaryotes, and have similar phyletic patterns in prokaryotes, with a ubiquitous presence in archaea, prompted a more detailed examination of the multiple alignments of the respective proteins and of the conservation of the (predicted) operon organization in archaea and bacteria (Table <tblr tid="T2">2</tblr> and data not shown). The combination of clues from these analyses suggests that the two proteins interact in a still uncharacterized pathway of RNA processing, which also includes RNA 3'-phosphate cyclase (KOG3980) <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> and cytosine-C5-methylase (NOL1/NOP2 in eukaryotes; KOG1122). The proteins in KOG3833 and KOG4528 are likely to represent novel enzyme families, possibly a kinase-phosphatase pair (E.V.K. and L. Aravind, unpublished data). Notably, these predicted new enzymes are present in animals and <it>E. cuniculi </it>but not in <it>Arabidopsis </it>or yeasts. In contrast, KOG3980 is present in all analyzed eukaryotic genomes except for <it>Arabidopsis</it>, whereas KOG1122 is pan-eukaryotic. These differences in the phyletic patterns of the components of the predicted pathway are concordant with the patterns in eukaryotes in that.</p>
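				<p>The comparison of phyletic patterns described above can be sketched in Python. This is an illustrative sketch, not the procedure used to build the KOGs: each KOG is reduced to the set of genomes in which it has at least one member, using the KOG species codes (ath, cel, dme, hsa, sce, spo, ecu), and the presence sets are only those stated in the preceding paragraph, taking "animals" to mean all three animal genomes. KOGs with identical patterns are grouped, and the number of genomes in which two patterns disagree (their Hamming distance) is used as an assumed measure of pattern similarity.</p>

```python
# Illustrative sketch: compare presence/absence (phyletic) patterns of KOGs.
SPECIES = ("ath", "cel", "dme", "hsa", "sce", "spo", "ecu")

# Presence sets restated from the text (not read from the KOG data files):
# KOG3833/KOG4528: animals and E. cuniculi, absent from Arabidopsis and yeasts;
# KOG3980: every genome except Arabidopsis; KOG1122: pan-eukaryotic.
PRESENT = {
    "KOG3833": {"cel", "dme", "hsa", "ecu"},
    "KOG4528": {"cel", "dme", "hsa", "ecu"},
    "KOG3980": {"cel", "dme", "hsa", "sce", "spo", "ecu"},
    "KOG1122": set(SPECIES),
}


def pattern(kog):
    """Return the pattern as a 1/0 string in SPECIES order."""
    return "".join("1" if sp in PRESENT[kog] else "0" for sp in SPECIES)


def hamming(a, b):
    """Number of genomes in which exactly one of the two KOGs is present."""
    return len(PRESENT[a] ^ PRESENT[b])


def group_by_pattern(kogs):
    """Group KOGs that share an identical phyletic pattern."""
    groups = {}
    for kog in kogs:
        groups.setdefault(pattern(kog), []).append(kog)
    return groups


if __name__ == "__main__":
    for kog in PRESENT:
        print(kog, pattern(kog))
    print(group_by_pattern(PRESENT))
    print("KOG3833 vs KOG3980:", hamming("KOG3833", "KOG3980"))
```

				<p>In this toy setting, KOG3833 and KOG4528 fall into one group, and KOG3980 differs from them only in the two yeasts.</p>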
				<p>Figure <figr fid="F3">3</figr> shows the distribution of known and predicted functions of eukaryotic proteins among 20 functional categories for the entire set of KOGs and, separately, for KOGs represented in six or seven species and for the animal-specific KOGs. Compared with the functional breakdown of prokaryotic COGs <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the prevalence of signal transduction among eukaryotic KOGs is notable. This feature is particularly prominent in animal-specific KOGs, whereas the highly conserved set is comparatively enriched in proteins involved in translation, transcription, chaperone-like functions, cell-cycle control and chromatin dynamics (Figure <figr fid="F3">3</figr>). The large number of KOGs for which only a general functional prediction was feasible, or whose functions remain unknown, even among the subset represented in six or seven eukaryotic species, emphasizes that our current understanding of eukaryotic biology is seriously incomplete, even with respect to the functions of highly conserved genes.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Distribution of the KOGs by the number of paralogs in each of the analyzed eukaryotic genomes</p>
					</caption>
					<text>
						<p>Distribution of the KOGs by the number of paralogs in each of the analyzed eukaryotic genomes. The species abbreviations are as in Figure <figr fid="F1">1</figr>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-2"/>
				</fig>
				<p>The distribution of KOGs by the number of paralogs in each genome is shown in Figure <figr fid="F2">2</figr>. The preponderance of lineage-specific duplication of conserved genes, that is, intra-KOG LSEs, in multicellular eukaryotes is obvious. Cases in which a single gene in yeast or, particularly, <it>Encephalitozoon </it>has two or more co-orthologs in animals and/or plants are common in KOGs, whereas the reverse situation is rare. These observations support the notion that LSE makes a major contribution to the evolution of eukaryotic complexity <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. However, 131 KOGs are represented by a single ortholog in all genomes compared (Table <tblr tid="T2">2</tblr>), and a substantial number of KOGs have one member from a majority of the genomes (data not shown). Recent theoretical modeling of the evolution of paralogous families has suggested that, in general, ancient protein families tend to have multiple paralogs <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B63">63</abbr></abbrgrp>. Therefore, whenever a KOG has a single member in all or most species, this should be attributed to selection against duplication of this particular gene. A prominent cause of such selection could be the involvement of the respective gene products in essential multisubunit complexes, such that imbalance between subunits leads to deleterious effects <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>.</p>
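				<p>A minimal sketch of how KOGs could be classified from per-genome member counts is shown below. It separates single-member pan-eukaryotic KOGs from KOGs with an intra-KOG lineage-specific expansion in animals or plants, and it tabulates the paralog-count distribution for one genome. The KOG names and counts are hypothetical toy values, not KOG data. The expansion rule (one member in at least one unicellular genome, two or more in at least one plant or animal genome) is an assumption made for illustration.</p>

```python
SPECIES = ("ath", "cel", "dme", "hsa", "sce", "spo", "ecu")
MULTICELLULAR = ("ath", "cel", "dme", "hsa")  # plant and animals
UNICELLULAR = ("sce", "spo", "ecu")           # yeasts and Encephalitozoon

# Hypothetical member counts per genome (toy values, not KOG data).
counts = {
    "KOG_A": dict.fromkeys(SPECIES, 1),
    "KOG_B": {"ath": 3, "cel": 2, "dme": 1, "hsa": 4, "sce": 1, "spo": 1, "ecu": 1},
    "KOG_C": {"ath": 1, "cel": 1, "dme": 1, "hsa": 1, "sce": 2, "spo": 1, "ecu": 0},
}


def is_single_member_pan_eukaryotic(c):
    """True if the KOG has exactly one member in every genome."""
    return all(c.get(sp, 0) == 1 for sp in SPECIES)


def has_multicellular_expansion(c):
    """True if some unicellular genome has one member while some plant or
    animal genome has two or more (an intra-KOG lineage-specific expansion)."""
    single = any(c.get(sp, 0) == 1 for sp in UNICELLULAR)
    expanded = any(c.get(sp, 0) >= 2 for sp in MULTICELLULAR)
    return single and expanded


def paralog_histogram(species):
    """Map each member count to the number of KOGs with that count in one genome."""
    hist = {}
    for c in counts.values():
        n = c.get(species, 0)
        hist[n] = hist.get(n, 0) + 1
    return dict(sorted(hist.items()))


for name, c in counts.items():
    print(name, is_single_member_pan_eukaryotic(c), has_multicellular_expansion(c))
print(paralog_histogram("hsa"))
```

				<p>Applied to real KOG membership tables, the first predicate selects the class listed in Table <tblr tid="T2">2</tblr>, and the second counts the one-to-many cases discussed above.</p>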
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Functional breakdown of the KOGs</p>
					</caption>
					<text>
						<p>Functional breakdown of the KOGs. Designations of functional categories: A, RNA processing and modification; B, chromatin structure and dynamics; C, energy production and conversion; D, cell-cycle control and mitosis; E, amino acid metabolism and transport; F, nucleotide metabolism and transport; G, carbohydrate metabolism and transport; H, coenzyme metabolism; I, lipid metabolism; J, translation; K, transcription; L, replication and repair; M, membrane and cell wall structure and biogenesis; O, post-translational modification, protein turnover, chaperone functions; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; T, signal transduction; U, intracellular trafficking and secretion; Y, nuclear structure; Z, cytoskeleton; R, general functional prediction only (typically, prediction of biochemical activity), S, function unknown. This breakdown is only for KOGs that included at least three species.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Known and new functions of single-member, pan-eukaryotic KOGs</p>
				</st>
				<p>We examined in greater detail the 131 KOGs that are represented by a single gene in each of the seven genomes (Table <tblr tid="T2">2</tblr>). As might be expected from their presence in diverse eukaryotic taxa, including the 'minimal' genome of <it>Encephalitozoon</it>, and as shown by comparison with the knockout phenotype data (Table <tblr tid="T2">2</tblr> and see below), these pan-eukaryotic KOGs are of particular biological importance. For the great majority of these KOGs (113 of the 131), the function has been experimentally determined or confidently predicted, to varying degrees of detail, using computational methods (Table <tblr tid="T2">2</tblr>). However, around 20 KOGs from this set remained uncharacterized at the time of this analysis and, for all but two of these, substantial functional inferences could be drawn through a combination of sequence-profile analysis, structure prediction and genomic-context analysis of prokaryotic homologs (Table <tblr tid="T2">2</tblr>). Some of these predicted new functions are variations on well-known themes, such as two predicted PP-loop ATPases, which are probably involved in novel, essential RNA modifications (KOGs 2522 and 2316), or two predicted E3 components of ubiquitin ligases (KOGs 0396 and 3800). Other predicted functions appear to be completely new, such as those of the proteins in KOGs 3176 and 3303, which are likely to be essential components of eukaryotic replication and/or repair systems. Each of these uncharacterized but ubiquitous and largely essential eukaryotic genes is an attractive target for experimental studies.</p>
				<p>Examination of the experimentally characterized and predicted functions of pan-eukaryotic, single-member KOGs leads to interesting conclusions. Nearly all the functionally characterized KOGs in this set consist of proteins that are subunits of known multiprotein complexes (Table <tblr tid="T2">2</tblr>). The most prominent of these are the complexes involved in rRNA processing and ribosome assembly, such as the recently discovered rRNA processosome and the pre-40S subunit, as well as the spliceosome, and various complexes involved in transcription (Table <tblr tid="T2">2</tblr>). Accordingly, this set of KOGs is markedly enriched for proteins involved in various forms of RNA processing, assembly of ribonucleoprotein (RNP) particles and transcription. In addition, KOGs in the single-member pan-eukaryotic set include subunits of molecular complexes that are not directly related to RNA processing, such as the proteasome, the TCP-1 chaperonin complex <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and the TRAPP complex involved in protein trafficking <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Altogether, more than 80% of the yeast proteins in the pan-eukaryotic, single-member KOGs belong to known macromolecular complexes included in the MIPS database <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, as compared to around 64% for all yeast proteins in the KOGs, which is a moderate but statistically highly significant excess (data not shown). This preponderance of multiprotein complex formation among the single-member pan-eukaryotic KOGs is fully compatible with the balance hypothesis <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>.</p>
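				<p>The significance of this excess of complex subunits can be checked, under stated assumptions, with a one-sided exact binomial test. The sketch assumes n = 131 (one <it>S. cerevisiae </it>protein per single-member KOG) and treats the 64% background fraction as a fixed null proportion. Because the exact count is not given, it uses 105, the smallest count that exceeds 80% of 131. A larger true count would only shrink the tail probability, so the result is an upper bound on the P-value. The text does not specify which test was actually used, so this is only an illustration.</p>

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over k..n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 131                        # one S. cerevisiae protein per single-member KOG
k = math.floor(0.80 * n) + 1   # smallest count that is "more than 80%" of n
p0 = 0.64                      # assumed background fraction (all yeast proteins in KOGs)

p_value = binomial_upper_tail(k, n, p0)
print(f"{k}/{n} = {k / n:.3f}; one-sided P = {p_value:.2e}")
```

				<p>This simple model ignores the uncertainty in the 64% estimate and the overlap between the two protein sets. A contingency-table test would account for both.</p>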
				<p>The most unexpected observation regarding the single-member, pan-eukaryotic KOGs is probably that, in 14 of these proteins, the only detectable domain was the WD40 repeat (Table <tblr tid="T2">2</tblr>). This is particularly notable because WD40-repeat proteins, which are extremely abundant in eukaryotes and are present in several prokaryotic lineages as well <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>, are not generally known to form well-defined, one-to-one orthologous relationships. The WD40 proteins in the pan-eukaryotic KOGs listed in Table <tblr tid="T2">2</tblr> are exceptions, which is probably due to their unique and essential roles in the assembly of RNA-processing complexes. It has recently been demonstrated that, in <it>S. cerevisiae</it>, seven of these proteins are subunits of the 18S rRNA processosome, or at least are involved in ribosomal assembly <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>. Taking these results together with the unusual phyletic pattern, it seems possible to predict with considerable confidence that those WD40 proteins in the 131-KOG set that remain uncharacterized belong to the same or similar RNA-processing complexes (Table <tblr tid="T2">2</tblr>).</p>
				<p>With some notable exceptions, such as the WD40 proteins, the KOGs in the single-member, pan-eukaryotic set show remarkable patterns of evolutionary conservation: they are either (nearly) ubiquitous in the three kingdoms of life, for example, RNA polymerase subunits, or are universally conserved in eukaryotes and archaea but missing in bacteria, such as most of the proteins implicated in RNA processing (Table <tblr tid="T2">2</tblr>). Thus, it appears that elaborate molecular machines central to the functioning of the eukaryotic cell have evolved, largely from ancestral archaeo-eukaryotic components, at the onset of eukaryotic evolution, and both loss and duplication of the respective genes have been strongly selected against throughout the rest of eukaryotic evolution.</p>
			</sec>
			<sec>
				<st>
					<p>Variation of evolutionary rates among KOGs</p>
				</st>
				<p>Genome-wide analysis of protein evolutionary rates shows a broad range of variation <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. Here, we investigate the variation of evolutionary rates among the ubiquitous KOGs represented in all seven analyzed genomes and the connection between evolutionary rate and protein function in the KOG set. The characteristic evolutionary rate of each KOG that included at least one member from <it>Arabidopsis </it>was determined as the mean evolutionary distance from <it>Arabidopsis </it>(the outgroup in the phylogenetic tree; see below) to the other species. Even among the KOGs that include all seven species and, accordingly, appear to represent the conserved core of eukaryotic genes, the evolutionary rates differ by a factor of 20 between the fastest- and the slowest-evolving KOGs. Excluding 5% of the KOGs from each tail of the distribution still leaves almost a fourfold difference in evolutionary rates (Figure <figr fid="F4">4a</figr>).</p>
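				<p>The per-KOG rate and the effect of trimming the tails of the rate distribution can be sketched as follows. The sketch assumes that, for every KOG, one distance (in substitutions per site) from the <it>Arabidopsis </it>member to each other species has already been estimated. How distances were combined for KOGs with several paralogs per species is not described here, and all numeric values in the example are hypothetical.</p>

```python
from statistics import mean


def kog_rate(distances_from_ath):
    """Characteristic rate of one KOG: the mean distance (substitutions per
    site) from its Arabidopsis member to the members of the other species."""
    return mean(distances_from_ath.values())


def rate_spread(rates, tail=0.05):
    """Return (fastest/slowest ratio, same ratio after dropping `tail` of the
    KOGs from each end of the sorted distribution)."""
    ordered = sorted(rates)
    k = int(tail * len(ordered))  # number of KOGs removed from each tail
    trimmed = ordered[k:len(ordered) - k]
    return ordered[-1] / ordered[0], trimmed[-1] / trimmed[0]


# Hypothetical distances for a single KOG (not real data).
example_kog = {"cel": 1.2, "dme": 1.0, "hsa": 0.9, "sce": 1.1, "spo": 1.1, "ecu": 1.7}
print(kog_rate(example_kog))

# Hypothetical rates for 20 KOGs, with one extreme value in each tail.
rates = [0.1] + [0.5 + 0.05 * i for i in range(18)] + [2.0]
full_ratio, trimmed_ratio = rate_spread(rates)
print(round(full_ratio, 2), round(trimmed_ratio, 2))
```

				<p>With these toy numbers, the full range is 20-fold, but trimming one KOG from each end reduces it to about 2.7-fold. The example only illustrates how a few extreme KOGs can dominate the maximum-to-minimum ratio.</p>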
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Variation of amino-acid substitution rates among KOGs</p>
					</caption>
					<text>
						<p>Variation of amino-acid substitution rates among KOGs. <b>(a) </b>Probability-density function for the distribution of evolutionary rates among the set of KOGs including all seven analyzed eukaryotic species. <b>(b) </b>Distribution functions for the evolutionary rates in different functional categories of KOGs. The designations of functional categories are as in Figure <figr fid="F3">3</figr>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-4"/>
				</fig>
				<p>We then compared the distributions of evolutionary rates for different functional categories of KOGs (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>). Although all the distributions substantially overlapped, there was a statistically highly significant difference between the evolutionary rates for proteins with different functions (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>). The slowest-evolving proteins are those involved in translation and RNA processing, the fastest-evolving ones are involved in cellular trafficking and transport, whereas components of replication and transcription systems have intermediate evolutionary rates (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>).</p>
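				<p>One way to see that the category means in Table <tblr tid="T3">3</tblr> differ by more than sampling noise is to compute a one-way ANOVA F statistic directly from the per-category sizes, means and standard deviations. The sketch below uses the 16 categories from J to R in Table <tblr tid="T3">3</tblr>, and further rows can be added in the same way. It ignores the facts that rates are not normally distributed and that some KOGs may be assigned to more than one category. It is not necessarily the test reported in Table <tblr tid="T4">4</tblr>.</p>

```python
# (number of KOGs, mean rate, standard deviation) per category, from Table 3.
TABLE3 = {
    "J": (227, 0.98, 0.37), "H": (62, 0.98, 0.30), "A": (167, 1.01, 0.36),
    "C": (140, 1.01, 0.43), "O": (307, 1.01, 0.40), "F": (50, 1.05, 0.34),
    "E": (130, 1.07, 0.38), "L": (139, 1.11, 0.38), "B": (56, 1.13, 0.33),
    "Z": (64, 1.13, 0.46), "K": (209, 1.15, 0.42), "G": (115, 1.16, 0.43),
    "I": (110, 1.16, 0.32), "T": (200, 1.18, 0.39), "D": (111, 1.19, 0.40),
    "R": (415, 1.23, 0.42),
}


def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group (n, mean, SD) summaries."""
    stats = list(groups.values())
    total_n = sum(n for n, _, _ in stats)
    n_groups = len(stats)
    grand_mean = sum(n * m for n, m, _ in stats) / total_n
    # Between-group sum of squares: spread of category means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats)
    # Within-group sum of squares, recovered from each sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in stats)
    df_between, df_within = n_groups - 1, total_n - n_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = anova_from_summary(TABLE3)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

				<p>Because the table values are rounded to two decimals, the F statistic is approximate. If SciPy is available, <monospace>scipy.stats.f.sf(f_stat, df1, df2)</monospace> gives the corresponding upper-tail P-value.</p>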
				<tbl id="T3" hint_layout="single">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Evolutionary rates in KOGs with different functions: evolutionary rates for different functional categories of KOGs*</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="left">
								<p>Functional category</p>
							</c>
							<c ca="center">
								<p>Number of KOGs</p>
							</c>
							<c ca="center">
								<p>Mean rate, substitutions per site</p>
							</c>
							<c ca="center">
								<p>Standard deviation</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>J</p>
							</c>
							<c ca="center">
								<p>227</p>
							</c>
							<c ca="center">
								<p>0.98</p>
							</c>
							<c ca="center">
								<p>0.37</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>H</p>
							</c>
							<c ca="center">
								<p>62</p>
							</c>
							<c ca="center">
								<p>0.98</p>
							</c>
							<c ca="center">
								<p>0.30</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>A</p>
							</c>
							<c ca="center">
								<p>167</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.36</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>C</p>
							</c>
							<c ca="center">
								<p>140</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>O</p>
							</c>
							<c ca="center">
								<p>307</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>F</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>1.05</p>
							</c>
							<c ca="center">
								<p>0.34</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>E</p>
							</c>
							<c ca="center">
								<p>130</p>
							</c>
							<c ca="center">
								<p>1.07</p>
							</c>
							<c ca="center">
								<p>0.38</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>139</p>
							</c>
							<c ca="center">
								<p>1.11</p>
							</c>
							<c ca="center">
								<p>0.38</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>B</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="center">
								<p>1.13</p>
							</c>
							<c ca="center">
								<p>0.33</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Z</p>
							</c>
							<c ca="center">
								<p>64</p>
							</c>
							<c ca="center">
								<p>1.13</p>
							</c>
							<c ca="center">
								<p>0.46</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>K</p>
							</c>
							<c ca="center">
								<p>209</p>
							</c>
							<c ca="center">
								<p>1.15</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>G</p>
							</c>
							<c ca="center">
								<p>115</p>
							</c>
							<c ca="center">
								<p>1.16</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>I</p>
							</c>
							<c ca="center">
								<p>110</p>
							</c>
							<c ca="center">
								<p>1.16</p>
							</c>
							<c ca="center">
								<p>0.32</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>T</p>
							</c>
							<c ca="center">
								<p>200</p>
							</c>
							<c ca="center">
								<p>1.18</p>
							</c>
							<c ca="center">
								<p>0.39</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>D</p>
							</c>
							<c ca="center">
								<p>111</p>
							</c>
							<c ca="center">
								<p>1.19</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>R</p>
							</c>
							<c ca="center">
								<p>415</p>
							</c>
							<c ca="center">
								<p>1.23</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>M</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="center">
								<p>1.26</p>
							</c>
							<c ca="center">
								<p>0.47</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>U</p>
							</c>
							<c ca="center">
								<p>196</p>
							</c>
							<c ca="center">
								<p>1.27</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="center">
								<p>1.27</p>
							</c>
							<c ca="center">
								<p>0.37</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>P</p>
							</c>
							<c ca="center">
								<p>69</p>
							</c>
			