<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-9-r185</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Cross-species cluster co-conservation: a new method for generating protein interaction networks</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Karimpour-Fard</snm>
               <fnm>Anis</fnm>
               <insr iid="I1"/>
               <email>anis.karimpour-fard@uchsc.edu</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Detweiler</snm>
               <mi>S</mi>
               <fnm>Corrella</fnm>
               <insr iid="I2"/>
               <email>corrella.detweiler@colorado.edu</email>
            </au>
            <au id="A3">
               <snm>Erickson</snm>
               <mi>D</mi>
               <fnm>Kimberly</fnm>
               <insr iid="I2"/>
               <email>kimberly.erickson@colorado.edu</email>
            </au>
            <au id="A4">
               <snm>Hunter</snm>
               <fnm>Lawrence</fnm>
               <insr iid="I1"/>
               <email>larry.hunter@uchsc.edu</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Gill</snm>
               <mi>T</mi>
               <fnm>Ryan</fnm>
               <insr iid="I3"/>
               <email>rtg@colorado.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA</p>
            </ins>
            <ins id="I2">
               <p>MCD-Biology, University of Colorado, Boulder, CO 80309, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO 80309, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>9</issue>
         <fpage>R185</fpage>
         <url>http://genomebiology.com/2007/8/9/R185</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17803817</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-9-r185</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>5</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>30</day>
               <month>8</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>5</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>09</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Karimpour-Fard et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Cross-species cluster co-conservation</p>
      </shorttitle>
      <shortabs>
         <p>Cluster Co-Conservation (CCC) has been extended to a method for developing protein interaction networks based on co-conservation between protein pairs across multiple species, Cross-Species Cluster Co-Conservation (CS-CCC).</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>Co-conservation (phylogenetic profiles) is a well-established method for predicting functional relationships between proteins. Several publicly available databases use this method and additional clustering strategies to develop networks of protein interactions (cluster co-conservation (CCC)). CCC has previously been limited to interactions within a single target species. We have extended CCC to develop protein interaction networks based on co-conservation between protein pairs across multiple species, cross-species cluster co-conservation.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The exponential increase in sequence information has widened the gap between the number of predicted and experimentally characterized proteins. At present, about 400 microbial genomes are fully sequenced. The prediction of protein function from sequence is a critical issue in genome annotation efforts. Currently, the best established method for function prediction is based on sequence similarity to proteins of known function. Unfortunately, homology-based prediction is of limited use due to the large number of homologous protein families with no known function for any member. An alternative method for predicting protein function is the phylogenetic profiles approach, also known as the co-conservation (CC) method, first introduced by Pellegrini <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Co-conservation predicts interactions between pairs of proteins by determining whether both proteins are consistently present or absent across diverse genomes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. CC methods have been shown to be more powerful than sequence similarity alone at predicting protein function.</p>
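         <p>The presence/absence comparison described above can be illustrated with a minimal sketch. The protein names, the six-genome profiles, and the simple agreement score are assumptions made for illustration only; they are not the scoring scheme of Pellegrini <it>et al</it>. or of any of the databases cited here.</p>

```python
from itertools import combinations

# Hypothetical binary phylogenetic profiles (illustrative data only):
# each position is one genome, 1 = homolog present, 0 = homolog absent.
profiles = {
    "protA": (1, 1, 0, 1, 0, 0),
    "protB": (1, 1, 0, 1, 0, 0),
    "protC": (0, 1, 1, 0, 1, 1),
}


def profile_agreement(a, b):
    """Fraction of genomes in which two proteins are both present or both absent."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def co_conserved_pairs(profiles, threshold=1.0):
    """Return protein pairs whose profile agreement meets the threshold."""
    pairs = []
    for p, q in combinations(sorted(profiles), 2):
        if profile_agreement(profiles[p], profiles[q]) >= threshold:
            pairs.append((p, q))
    return pairs


pairs = co_conserved_pairs(profiles)
print(pairs)
```

         <p>In this toy example only protA and protB share an identical profile, so they form the single predicted pair; lowering the hypothetical threshold would admit less consistent pairs.</p>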
         <p>Even though all CC methods rely on the premise that functionally related proteins are gained or lost together over the course of evolution, several different strategies for performing CC studies have been reported. For example, Date <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> used real BLASTP best hit E-values normalized across 11 bins instead of binary classification for conservation, while Zheng and coworkers <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> constructed phylogenetic profiles using presence/absence of neighboring gene pairs. Alternatively, Pagel <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> constructed phylogenetic profiles between domains, instead of genes, and then created domain interaction maps. Barker <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> applied maximum likelihood statistical modeling for predicting functional gene linkages based on phylogenetic profiling. Their method detected independent instances of protein pair correlated gain or loss on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
         <p>Currently, several web-based databases that compile predictions of protein-protein interactions are available, for example, PLEX <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, String <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, Prolinks <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and Predictome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. These databases use various methods, including CC, to organize groups of proteins within individual species into clusters (cluster co-conservation (CCC)) that represent predicted protein interaction networks. Here, we have investigated the degree to which these within-species clusters are conserved across different species, using an automated method for comparing phylogenetic profiling-based CCC across multiple species (CS-CCC; Figure <figr fid="F1">1</figr>). CS-CCC is essentially a meta-analysis of CCC that automates the identification of interactions that are uniquely present or absent across different species, which cannot be easily accomplished using existing methods. We have shown that this method increased groupings among proteins that function in distinct but coordinated processes and decreased groupings among proteins with unknown functions. This suggests that CS-CCC, in comparison to CCC, allows one to extend the network to better understand pathways involving proteins with multiple functions. Our intention for CS-CCC was that the identity of proteins present or absent in co-conserved clusters when evaluated across multiple species would facilitate the assignment of protein function, enable the development of novel and testable biological hypotheses, and provide experimentalists with the scientific justification required to test these hypotheses. We show these features through a number of different examples involving complex biological phenomena (that is, flagellum, chemotaxis, and biofilm proteins).</p>
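         <p>The cross-species comparison can be sketched as set operations on per-species edge lists. This is a simplified illustration, not the authors' implementation: it assumes that proteins from each species have already been mapped to shared identifiers (P1, P2, and so on), and the organism names and edges below are hypothetical.</p>

```python
def normalize(edge):
    """Store an undirected edge as a sorted tuple so (a, b) equals (b, a)."""
    return tuple(sorted(edge))


def compare_networks(networks):
    """Split edges into combined, common-to-all, and unique-per-species sets."""
    edge_sets = {sp: {normalize(e) for e in edges} for sp, edges in networks.items()}
    combined = set().union(*edge_sets.values())
    common = set.intersection(*edge_sets.values())
    unique = {}
    for sp, edges in edge_sets.items():
        # Union of every other species' edges; empty when only one species is given.
        others = set().union(*(es for other, es in edge_sets.items() if other != sp))
        unique[sp] = edges - others
    return combined, common, unique


# Hypothetical co-conserved pairs for an organism of choice and two others.
networks = {
    "org0": [("P1", "P2"), ("P2", "P3"), ("P3", "P4")],
    "org1": [("P2", "P1"), ("P2", "P3"), ("P5", "P6")],
    "org2": [("P1", "P2"), ("P2", "P3"), ("P4", "P7")],
}

combined, common, unique = compare_networks(networks)
print(sorted(common))
print({sp: sorted(edges) for sp, edges in unique.items()})
```

         <p>Here the P1-P2 and P2-P3 links are common to all three organisms, while each organism contributes one link found nowhere else; the combined set holds all five distinct links.</p>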
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>CS-CCC builds on information generated via previously described CCC methods by comparing conserved network interactions across multiple species</p>
            </caption>
            <text>
               <p>CS-CCC builds on information generated via previously described CCC methods by comparing conserved network interactions across multiple species. CCC methods start by mapping <b>(a) </b>co-conserved protein pairs to <b>(b) </b>large protein interaction networks. <b>(c) </b>CS-CCC extends this approach by comparing proteins and associated links within such interaction networks to identify the combined set of network interactions as well as those interactions that are unique to individual species or common across multiple species. Clusters from three organisms are shown, but the method could examine any genome versus any number of genomes (the unique differences between an organism of choice and each organism are shown in different colors while conserved proteins across species are shown in gray). Common network interactions are shown in blue while unique interactions are shown in either green or red. Org (organism); org0 (organism of choice); P (protein).</p>
            </text>
            <graphic file="gb-2007-8-9-r185-1"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Cross-species clustered co-conservation</p>
            </st>
            <p>CS-CCC is based on the use of CC methods simultaneously across several species. As such, the reliability of the CS-CCC method is directly linked to the reliability of existing CC methods, which has been extensively documented <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Specifically, since CC methods produce protein-protein interactions involving proteins with previously uncharacterized functions, CC methods perform better than sequence similarity methods alone at predicting protein function. Here, we performed the same comparison to assess the performance of CS-CCC (up to six species) when compared to CCC alone (one species) (Figure <figr fid="F2">2a</figr>). The reliability of predicted protein interaction pairs was evaluated by using a combination of Clusters of Orthologous Groups (COG) functional categories, and The Institute for Genomic Research (TIGR) role categories (Additional data file 1). As the number of species included in our CS-CCC analysis increased, the number of predicted interactions involving proteins with unclassified functions decreased (yellow bars). Interestingly, at the lowest confidence level, the number of predicted interactions involving proteins from different functional categories increased with the number of included species. At the highest confidence level, grouping between proteins from the same functional category increased. For example, 56% of <it>Escherichia coli </it>K12 protein pairs (confidence level of 0.6) consisted of proteins within the same COG functional group, 19% of protein pairs were in different functional categories, and 25% had at least one unclassified member due to limited experimental data. As the number of species is expanded, these percentages range from 54-62%, 30-45%, and 0-10%, respectively. 
At the highest confidence level (0.8), the inclusion of 6 species resulted in almost 80% of the predicted interactions involving proteins from the same functional category. These results suggest that expanding the number of species included in the analysis, as provided for by CS-CCC, not only predicts interactions that are not predicted at different confidence levels used in CCC analysis, but also that the nature of such predicted interactions is fundamentally different. One explanation for such observations is that CS-CCC has improved capabilities for extending the protein interaction network to include the various functions required in complex biological processes (that is, regulatory relationships, nutrient transport/catabolism links, and so on). As an example of this possibility, in the CS-CCC analysis using all 6 bacterial species at confidence level 0.8 (the green bar on the far right of Figure <figr fid="F2">2a</figr>), there were 6 co-conserved protein pairs involving 9 total proteins that were not in the same COG functional category. When the larger network that these pairs fall into was extracted (Figure <figr fid="F2">2b</figr>), it became apparent that each of the proteins in question functions within the context of two larger, coherent networks involving related processes. For example, <it>rpoA </it>and <it>rpsD </it>encode proteins of differing functions, yet their interaction is well conserved across multiple species within a 12-gene network of related functions. The remaining seven proteins of varying functions were also well conserved across multiple species in a larger network. These data suggest that the addition of multiple species to the analysis adds confidence to predicted interactions among proteins from different functional categories (that is, a meta-analysis). 
This point is exemplified via the color-coded, species-specific arcs in Figure <figr fid="F2">2b</figr>, where it is clear that addition of multiple species both adds new interactions (that is, unique sub-networks) and reinforces the interactions predicted for comparison species.</p>
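            <p>The same/different/unclassified breakdown used in this assessment can be sketched as follows. This is an illustrative sketch rather than the authors' code. The COG letters for FtsI, NusG, RpoA, RpsD, MurE and MurG are taken from the Figure 2 legend; the MurE-MurG pairing and the unannotated protein HypX are hypothetical and included only so that all three outcomes appear.</p>

```python
from collections import Counter

# COG letters as given in the Figure 2 legend; HypX is a hypothetical
# protein with no category, added purely for illustration.
categories = {
    "FtsI": "M", "NusG": "K", "RpoA": "K", "RpsD": "J",
    "MurE": "M", "MurG": "M",
}

pairs = [
    ("FtsI", "NusG"),   # different categories (pair listed in Figure 2)
    ("RpoA", "RpsD"),   # different categories (pair listed in Figure 2)
    ("MurE", "MurG"),   # same category (hypothetical pairing)
    ("MurE", "HypX"),   # one unclassified member (hypothetical)
]


def classify_pair(cat_a, cat_b):
    """Label a pair as same, different, or unclassified by functional category."""
    if cat_a is None or cat_b is None:
        return "unclassified"
    return "same" if cat_a == cat_b else "different"


def category_breakdown(pairs, categories):
    """Percentage of pairs falling into each of the three labels."""
    counts = Counter(
        classify_pair(categories.get(a), categories.get(b)) for a, b in pairs
    )
    total = sum(counts.values())
    labels = ("same", "different", "unclassified")
    return {label: 100.0 * counts[label] / total for label in labels}


breakdown = category_breakdown(pairs, categories)
print(breakdown)
```

            <p>Applying such a breakdown at each confidence level and for each set of included species would give the kind of stacked percentages summarized in Figure 2a.</p>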
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Assessment of CS-CCC Performance</p>
               </caption>
               <text>
                  <p>Assessment of CS-CCC Performance. <b>(a) </b>Comparison of COG functional categories of predicted pairs at three different confidence levels. The first method (1) used only <it>E. coli </it>K12. Each subsequent method added an additional (underlined) bacterial strain. 1, <it>E. coli </it>K12; 2, <it>E. coli </it>K12 and <ul><it>E. coli </it>O157</ul>; 3, <it>E. coli </it>K12, <it>E. coli </it>O157 and <it><ul>S. flexneri</ul></it>; 4, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, and <ul><it>S. typhimurium </it>LT2</ul>; 5, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, <it>S. typhimurium </it>LT2, and <it><ul>P. aeruginosa</ul></it>; 6, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, <it>S. typhimurium </it>LT2, <it>P. aeruginosa</it>, and <it><ul>B. subtilis</ul></it>. The percentages of predicted interactions involving proteins from the same functional category (blue), different functional categories (green), or involving at least one protein that is unclassified (yellow) are depicted. <b>(b) </b>The CS-CCC network generated from the complete set of proteins included in the green bar of (a) for a confidence of 0.8, 6 species. A total of nine proteins (yellow nodes) and six paired interactions were included in this group. The protein pairs and the classifications of each protein are as follows: (FtsI [M] and NusG [K]; MurE [M] and RecG [L]; MurG [M] and RecG [L]; MurC [M] and RecG [L]; MurA [M] and NusG [K]; RpoA [K] and RpsD [J]). M, cell envelope biogenesis, outer membrane; K, transcription; L, DNA replication, recombination and repair; J, translation, ribosomal structure and biogenesis. The edges are color coded for each species evaluated: <it>E. coli </it>K12, green; <it>E. coli </it>O157, blue; <it>Shigella flexneri</it>, black; <it>S. typhimurium </it>LT2, purple; <it>P. aeruginosa</it>, mustard; and <it>Bacillus subtilis</it>, red.</p>
               </text>
               <graphic file="gb-2007-8-9-r185-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>CS-CCC identifies interactions that could not be identified by CCC</p>
            </st>
            <p>Our analysis of CCC across six bacterial species indicated that CS-CCC revealed unique and useful information not provided by CCC alone. As one example, CS-CCC uniquely revealed that amino-acid biosynthesis and flagellar networks are connected via FliY (Figure <figr fid="F3">3c</figr>), a component of the flagellar motor-switch complex that is predicted to transport amino acids <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Both <it>E. coli </it>and <it>Pseudomonas aeruginosa </it>ArgT networks revealed connections with the FliY protein (Figure <figr fid="F3">3a,b</figr>), but such networks did not include the extensive set of additional flagellar protein interactions predicted in the <it>Bacillus subtilis </it>network. Such information can be used not only to develop more precise hypotheses about protein function but also to provide the justification required to test such hypotheses. A second example of information uniquely revealed by CS-CCC suggests how the process of chemotaxis has evolved across species. A CS-CCC comparison of chemotaxis in <it>E. coli </it>K12 and <it>Salmonella </it>revealed that <it>Salmonella </it>lacks Tap, which recognizes maltose, but has Tcp, which senses citrate. In contrast, <it>E. coli </it>has Tap but lacks Tcp. CCC analysis alone does not capture this difference in chemotaxis responsiveness. As a final example, extending this CS-CCC analysis of chemotaxis proteins to include <it>P. aeruginosa </it>indicated new links among type IV pili and biofilm formation proteins <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, suggesting that the process of chemotaxis has evolved different functional relationships in different species. These three examples provide a simple demonstration of the ability of CS-CCC to predict unique and biologically informative interactions when compared to CCC alone. 
The next several sections elaborate upon the specific types of interactions that CS-CCC is uniquely suited to identifying.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>CS-CCC identifies protein interactions that could not be identified by CCC</p>
               </caption>
               <text>
                  <p>CS-CCC identifies protein interactions that could not be identified by CCC. <b>(a) </b><it>E. coli </it>K12 cluster built around ArgT; <b>(b) </b><it>P. aeruginosa </it>PA01 cluster built around ArgT; <b>(c) </b>an example of information revealed by CS-CCC but not by CCC. <it>E. coli </it>K12 proteins (green) that are co-conserved with the <it>E. coli </it>ArgT (diamond) cluster were extracted. Then <it>P. aeruginosa </it>(mustard edge) and <it>B. subtilis </it>(red edge) proteins that are co-conserved with proteins in the <it>E. coli </it>ArgT cluster were extracted. Note that it is the <it>B. subtilis </it>network that shows a connection between amino acid biosynthesis proteins and flagellar proteins, via FliY (square). If only the <it>E. coli </it>cluster had been examined, as occurs using the CCC method, then this connection would have been missed.</p>
               </text>
               <graphic file="gb-2007-8-9-r185-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>CS-CCC reveals how proteins that function in distinct but coordinated processes may have evolved</p>
            </st>
            <sec>
               <st>
                  <p>Chemotaxis</p>
               </st>
               <p>Chemotaxis proteins are co-conserved across the examined bacteria (Figure <figr fid="F4">4</figr>). Three classes of proteins are essential for chemotaxis: transmembrane receptors, cytoplasmic signaling components, and enzymes for adaptive methylation. The transmembrane receptors are two-component signal transduction complexes called methyl-accepting chemotaxis proteins (MCPs). <it>E. coli </it>MCPs are Tsr, Tar, Trg, Tap, and Aer, and each recognizes specific sugars, amino acids or dipeptides (Figure <figr fid="F4">4a,c</figr>). Even though different bacteria have different MCPs, they are highly co-conserved among Gram-negative and Gram-positive bacteria. For example, <it>Salmonella </it>lacks Tap, which recognizes maltose, but has Tcp, a citrate sensor <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, which is co-conserved with the other <it>Salmonella </it>MCPs (Figure <figr fid="F4">4b,c</figr>). The cytoplasmic signaling components transmit signals between the MCP receptors and the flagellar apparatus. These proteins are CheA, CheW, CheY and CheZ, and they are not co-conserved among the bacteria. CheZ is not co-conserved because it has no homologs in many bacteria <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. CheY is likely not co-conserved because it functions with CheZ. CheA and CheW are sometimes co-conserved and sometimes not, which may suggest that they function independently in different bacteria. The enzymes for adaptive methylation, CheB and CheR, modulate signaling of the cytoplasmic proteins, and both of these proteins are highly co-conserved among all six bacteria. Thus, chemotaxis analysis illustrates two important points. First, the CS-CCC method reveals species differences in protein interaction, including co-conserved pairs that are unique to a given species or that are common across select species (Figure <figr fid="F4">4c</figr>). 
For instance, the sequences of CheA and CheW are conserved but the proteins are not co-conserved, suggesting that their interactions and functions may differ among bacterial species. Second, the CS-CCC method yields information that functional assays do not. For instance, different MCPs recognize different ligands and yet are co-conserved because they function in the same pathway.</p>
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Co-conservation of chemotaxis and flagellar proteins</p>
                  </caption>
                  <text>
                     <p>Co-conservation of chemotaxis and flagellar proteins. <b>(a) </b><it>E. coli </it>K12; <b>(b) </b><it>S. typhimurium </it>LT2; <b>(c) </b>across multiple species. Proteins are color coded based on function: chemotaxis, pink; biofilm, light blue; flagellar, light red; type III secretion, blue; and sigma factor and regulation, yellow. The gray proteins are <it>Bacillus </it>sigma factor and regulation proteins that are co-conserved but were not identified by single species CC analysis. Edge color code: <it>E. coli </it>K12, green; <it>E. coli </it>O157, blue; <it>Shigella flexneri</it>, black; <it>S. typhimurium </it>LT2, purple; <it>P. aeruginosa</it>, mustard; and <it>Bacillus subtilis</it>, red.</p>
                  </text>
                  <graphic file="gb-2007-8-9-r185-4"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Biofilm formation</p>
               </st>
               <p>Figure <figr fid="F4">4</figr> shows a cluster containing proteins that function in distinct but inter-dependent processes. For instance, in <it>P. aeruginosa</it>, flagella, chemotaxis machinery, and type IV pili are important for bacterial biofilm formation <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> and are co-conserved. Type IV pili mediate twitching motility, which is important for subsequent spreading of the bacteria over the surface and the formation of microcolonies within a developing biofilm <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Twitching motility proteins PilJ and PilK are co-conserved within this cluster and are highly interconnected with flagella and chemotaxis proteins. Flagellar motility appears to be required for approaching surfaces, and 17 flagellar proteins are co-conserved (Figure <figr fid="F4">4c</figr>). Chemotaxis is required for the bacteria to swim towards nutrients associated with a surface. <it>P. aeruginosa </it>has two chemotaxis signaling systems, and proteins representing both are in the biofilm cluster (CheR1, CheR2, CheA, CheW, PA0173, PA0178; PctA, PctB, PctC). These data suggest that chemotaxis, flagella, and pili proteins may be co-conserved because they all contribute to biofilm formation. Moreover, the inclusion of <it>P. aeruginosa </it>in the CS-CCC analysis brought pili proteins into the biofilm cluster, suggesting that in some bacteria, all of these processes co-evolved. Thus, CS-CCC can identify co-conserved networks of proteins that function in biochemically distinct pathways but that contribute to complex biological phenomena.</p>
            </sec>
            <sec>
               <st>
                  <p>RpoN connects RpoN-regulated proteins with flagella and with type III secretion system proteins</p>
               </st>
               <p>In some of the bacteria studied, RpoN (also known as &#963;<sup>54 </sup>or SigL) clustered with RpoN-regulated proteins, and flagellar proteins clustered with type III secretion system proteins (Figure <figr fid="F4">4c</figr>). Flagellar proteins are cluster co-conserved with specific components of type III secretion systems (T3SS), which are important for virulence in <it>Salmonella enterica </it>serotype Typhimurium LT2, <it>E. coli </it>O157, <it>Shigella flexneri </it>and <it>P. aeruginosa </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp> (Table <tblr tid="T1">1</tblr>). The T3SS of <it>Shigella </it>is not chromosomally encoded and so was not included in our analysis. The three subunits of the T3SS and flagella that are co-conserved are integral inner membrane proteins of the flagellar or T3SS export apparatus that forms the channel through which proteins are secreted <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. <it>S. typhimurium </it>LT2 and <it>E. coli </it>O157 both encode two T3SSes, and the corresponding proteins from each are within this cluster. In <it>E. coli </it>K12, <it>S. typhimurium </it>LT2, and <it>B. subtilis</it>, RpoN connects the RpoN-regulated and the flagellar/T3SS clusters. This is consistent with experimental data that flagellar genes (<it>flhA </it>and <it>flhB</it>) are activated by RpoN <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Thus, RpoN likely connects two distinct clusters because it regulates proteins in both clusters. This demonstrates that because CS-CCC examines multiple genomes simultaneously, it has the power to show that proteins unique to particular organisms may function with proteins common to multiple organisms, enabling the placement of unstudied proteins within a broader biological context.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Homology between co-conserved flagellar and T3SS genes</p>
                  </caption>
                  <tblbdy cols="2">
                     <r>
                        <c ca="left">
                           <p>Flagellar</p>
                        </c>
                        <c ca="left">
                           <p>T3SS</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c cspan="2" ca="left">
                           <p><it>S. typhimurium </it>LT2</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhA</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>invA; ssaV</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhB</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>spaS*; ssaU</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>fliP</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>spaP; ssaR</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="2" ca="left">
                           <p><it>E. coli </it>O157</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhA</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>Z4195, escV</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhB</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>Z4185, escU</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>fliP</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>Z4189, escR</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="2" ca="left">
                           <p><it>P. aeruginosa </it>(PAO1)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>fliP</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>pscR</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhA</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>pscD</it>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>
                              <it>flhB</it>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <it>pscU</it>
                           </p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*<it>spaS </it>is not co-conserved with high confidence (0.41); the confidence level for the remaining proteins is &#8805;0.6.</p>
                  </tblfn>
               </tbl>
            </sec>
         </sec>
         <sec>
            <st>
               <p>CS-CCC can be used to assign function to unstudied proteins</p>
            </st>
            <sec>
               <st>
                  <p>Genes that function in biofilm formation</p>
               </st>
               <p>Figure <figr fid="F5">5a</figr> shows two large clusters of proteins built around YegE or YfiN in <it>E. coli </it>K12 and <it>P. aeruginosa</it>. These clusters are co-conserved with variable numbers of proteins among all of our Gram-negative bacteria. Even though most of these proteins have unknown function, many have GGDEF (Gly-Gly-Asp-Glu-Phe) or EAL (Glu-Ala-Leu) domains, which have been implicated in expression of biofilm phenotypes <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Interestingly, each protein of known function within this cluster in PAO1 (WspR, MorA, and FimX) has also been implicated in biofilm phenotypes. WspR is a response regulator that activates pili adhesion genes required for biofilm formation <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. MorA is a membrane-localized negative regulator of the timing of flagellar formation and plays a role in the establishment of biofilms <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. FimX is required for a type of twitching motility critical to biofilm formation <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. FimX is a signal-sensing protein with phosphotransfer activity and a GGDEF domain. GGDEF encodes a dinucleotide cyclase that generates cyclic di-GMP and is present in all proteins known to be involved in the regulation of cellulose synthesis. Cyclic di-GMP is a novel bacterial second messenger that directs the transition from sessility to motility <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Cyclic di-GMP is degraded by proteins with EAL domains, which are cyclic dinucleotide phosphodiesterases <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Proteins containing GGDEF and EAL domains can regulate biofilm formation and/or cell aggregation by controlling the levels of cyclic di-GMP <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Interestingly, most of the proteins in these large clusters have GGDEF or EAL domains. Of the 44 known <it>P. aeruginosa </it>proteins with GGDEF or EAL domains <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, 34 are in this cluster; 19 have GGDEF and 15 have EAL domains. <it>E. coli </it>K12 has a similar cluster of GGDEF and EAL domains (Figure <figr fid="F5">5a</figr>). The 25 proteins within this cluster are highly interconnected. Of the 38 known <it>E. coli </it>K12 proteins containing GGDEF or EAL domains <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, 24 are co-conserved within this cluster. EvgS is a sensor protein for a two-component regulatory system <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> that is also within this cluster. EvgS is involved in quorum sensing and may be important in biofilm establishment or maintenance. Over-expression of <it>evgS </it>causes abnormal biofilm architecture <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and previous studies also noted that quorum sensing is involved in biofilm formation <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Our experimental data show that four of the GGDEF domain-containing proteins in the network of Figure <figr fid="F5">5a</figr> that previously had no known function do indeed mediate biofilm formation <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Similar biofilm clusters were identified by the CS-CCC method in all of the Gram-negative bacteria we examined. Thus, by clustering together unstudied proteins, whether or not they have sequence homology, CS-CCC suggests that these proteins may function in a common phenomenon.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Using CS-CCC to assign protein function</p>
                  </caption>
                  <text>
                     <p>Using CS-CCC to assign protein function. <b>(a) </b>Co-conservation of GGDEF and EAL domains across <it>E. coli </it>K12 (green edge) and <it>P. aeruginosa </it>(mustard edge). Proteins are color coded based on function: motility regulators, orange; sensors, red; RNase II modulators, yellow; two-component response regulators, light blue; <ul>diguanylate cyclases,</ul> blue; phosphodiesterases, purple; uncategorized, gray. <b>(b) </b>Co-conservation of triplet YcgB, YeaH, and YeaG across several species. Edge color code: <it>E. coli </it>K12, green; <it>E. coli </it>O157, blue; <it>Shigella flexneri</it>, black; <it>S. typhimurium </it>LT2, purple; <it>P. aeruginosa</it>, mustard.</p>
                  </text>
                  <graphic file="gb-2007-8-9-r185-5"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Small clusters can contain proteins that function in the same processes</p>
               </st>
               <p>Examination of small protein clusters revealed that most pairs or triplets contain proteins that function in the same processes. To further test this observation, we experimentally examined the triplet containing YcgB, YeaH, and YeaG, which cluster together across different bacteria (Figure <figr fid="F5">5b</figr>). Because independent data indicate that <it>yeaH</it>, but not <it>yeaG</it>, contributes to antimicrobial peptide resistance in <it>S. typhimurium </it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, we determined whether strains lacking <it>ycgB </it>have a similar phenotype. Strains lacking <it>ycgB </it>were indeed sensitive to antimicrobial peptides (unpublished data). Thus, CS-CCC analyses revealed previously unknown protein interactions that provided sufficient justification to test a specific biological hypothesis suggested by these interactions.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>When proteins are not identified as co-conserved using CS-CCC</p>
            </st>
            <p>In this study, we have shown that CS-CCC of proteins provides important information. Both the presence and the absence of clustered co-conservation for any given protein are informative. There are at least two reasons why proteins that function together are not co-conserved in a species: first, a protein is found only in certain organisms or a protein function is performed by different proteins in different organisms; and second, a result is a false negative.</p>
            <sec>
               <st>
                  <p>A protein is found only in certain organisms: T3SS effectors</p>
               </st>
               <p>Effector proteins are secreted by T3SS machinery and function to alter host cell physiology <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. A bacterial species can have many effectors but they generally do not share apparent sequence homology, either within or between bacteria <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. We examined 49 known SPI2 and SPI1 effectors in <it>S. typhimurium </it>LT2 and 40 known effectors in <it>P. aeruginosa </it>and found that none of these proteins are co-conserved. In contrast, some of the known translocon T3SS proteins, which form the secretion apparatus, are highly co-conserved (Figure <figr fid="F4">4c</figr>). Thus, while CS-CCC offers insights into the function of proteins that are co-conserved, our results show that some of the proteins that are not co-conserved, such as effectors, are organism specific.</p>
            </sec>
            <sec>
               <st>
                  <p>A result is a false negative: flagella and RpoN</p>
               </st>
               <p>Our analysis reveals that the CS-CCC method produces some false negatives. For instance, there is no co-conservation between RpoN and flagella in <it>E. coli </it>O157, <it>S. flexneri </it>and <it>P. aeruginosa </it>(Figure <figr fid="F4">4c</figr>). However, it has been experimentally shown in <it>P. aeruginosa </it>that many flagellar genes, such as <it>flhA </it>and <it>flhB</it>, are regulated by RpoN <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In addition, an RpoN consensus sequence is located in the intergenic region between <it>flhB </it>and <it>flhA </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. These data suggest that the absence of co-clustering of RpoN with flagellar proteins in <it>P. aeruginosa </it>is a false negative result. Thus, when proteins are not co-conserved, it cannot be concluded that they are functionally unrelated. This result further underlines the value of developing and comparing interaction networks from multiple genomes when attempting to infer function.</p>
               <p>There are also some situations in which both explanations apply: some results are false negatives and some of the proteins in question are found only in certain organisms. The bacterial flagellum is a complex molecular system with multiple components required for functional motility. It extends from the cytoplasm to the cell exterior. Not only are flagella organelles of locomotion, but they also play important roles in attachment and biofilm formation. There are common themes in flagellar protein control and assembly, but there also appears to be variation among organisms. Some of the flagellar proteins are not co-conserved in any of the bacteria of our study, such as three ring proteins (FlgH, FlgI, and FliF) and some of the axle-like proteins (FliE, FlgB, FlgF, FlgL, and FliD). FliE has been shown to physically interact with FlgB <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The stator motor proteins MotA and MotB are also not co-conserved. Thus, the absence of co-conservation within the flagellar cluster reflects both false negative results and species-specific proteins. This also illustrates that determining why proteins are not co-conserved can be difficult without additional information.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Large volumes of data make computational methods feasible, exciting, and preferable to gene-by-gene homology searches. We have shown that use of CS-CCC expands protein interaction networks to include proteins with distinct functions that are involved in coherent biological processes, offers insight into the function of uncharacterized proteins, reveals unique information about each genome examined, and gives insight into the process of evolution.</p>
         <p>Protein co-conservation can be a result of many factors, including vertical inheritance or functional selection. Thus, we have examined patterns of CCC within and across several bacteria using CS-CCC. Our analysis showed that this computational approach provides us with more information than the traditional homology approaches or CCC. Homology approaches to protein function are based on similarity to other proteins with known functions and are limited by the fact that many proteins have unknown functions. While homology-based methods can be effective for predicting the functions of remote homologs, these methods perform poorly as the evolutionary distance between homologous proteins increases. Even a sophisticated homology-based method fails to successfully assign functions to most of the proteins for a particular organism. CCC, on the other hand, is not strictly based on homology but is limited by its ability to analyze only a single species at a time. In contrast, CS-CCC examines each cluster across multiple species and reveals interactions that both homology-based methods and CCC fail to identify. Use of CS-CCC allows researchers to extend the protein interaction network to better understand pathways involving multiple proteins with multiple functions. Therefore, the CS-CCC method is a significant advance and will be useful for researchers in many different fields of biology.</p>
         <p>Prediction by CS-CCC provided us with global views of six complete bacterial genomes. Identification by CS-CCC of proteins that cluster together enabled more accurate predictions of the biological roles that proteins with previously unstudied functions may play. For instance, proteins that function in distinct but coordinated processes can be co-conserved across species even though not all processes occur in all bacteria (Figure <figr fid="F4">4c</figr>). In addition, in large, highly interconnected clusters in which most of the proteins have unknown functions, it is likely that they all function together in a common phenomenon. The GGDEF/EAL cluster is an example of this, as many of the previously unknown proteins in this cluster play roles in biofilm formation (Figure <figr fid="F5">5a</figr>). Even small protein clusters identified by CS-CCC are likely to consist of proteins that function in the same process, as shown by COG/TIGR analysis and experimentally (Figure <figr fid="F5">5b</figr>). These analyses provide evidence that the CS-CCC method is a reliable predictor of functional relationships.</p>
         <p>For any given method, there are advantages and disadvantages. The number of false positives and false negatives is a key measurement of accuracy. In our case, the number of false negatives is not possible to estimate without performing many additional laboratory experiments. However, our evaluation of CS-CCC showed that the number of false positives was low. Since this method was evaluated based on our selected bacteria, there may be some bias toward overestimation of accuracy when applied to other organisms, and this remains to be tested. In addition, we have shown that our results can be sensitive to the number of bacteria included in our analysis. Finally, there may be some aspects of the bacteria we chose that are not representative of other bacteria, further reducing the generality of these results. Thus, while the report here represents a compelling demonstration of the value of performing CCC across multiple species, future efforts should be focused on developing better understanding of which and how many organisms to include in CS-CCC studies.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Bacteria used to create CS-CCC graphs</p>
            </st>
            <p>We chose to focus on the Gamma subgroup of proteobacteria because members of this subgroup are among the best characterized, including whole genome sequences and curated datasets of protein functions and interactions. The genomes of five closely related Gamma Gram-negative bacteria and one low G+C bacterium (<it>B. subtilis</it>) were used to evaluate the CCC method. Substantial experimental data exist for all six bacteria. The gammaproteobacteria included <it>E. coli </it>(K12 and O157-O157:H7 EDL933), <it>S. flexneri </it>(2a str. 2457T), <it>S. typhimurium </it>(LT2), and <it>P. aeruginosa </it>(PAO1). <it>E. coli </it>(K12) is the most intensively studied Gram-negative bacterium and is the closest studied relative of <it>P. aeruginosa </it>and <it>S. typhimurium </it>LT2. <it>E. coli </it>(O157-O157:H7 EDL933) is a clinical isolate from raw hamburger meat implicated in a hemorrhagic colitis outbreak, and <it>S. typhimurium </it>LT2 causes enteritis in humans. <it>P. aeruginosa </it>is an opportunistic pathogen and is the major cause of morbidity and mortality in patients with cystic fibrosis; <it>P. aeruginosa </it>PAO1 was isolated from a wound <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. <it>P. aeruginosa </it>is a versatile Gram-negative bacterium that also thrives in soil, marshes and coastal marine habitats, and on plant tissues <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. <it>E. coli </it>K12 diverged 4.5 million years ago (MYA) from O157, an estimated 100 MYA from <it>Salmonella</it>, 200 MYA from <it>Pseudomonas</it>, and 1,200 MYA from <it>Bacillus</it>. Thus, we examined a combination of pathogenic and non-pathogenic organisms that range from closely to distantly related.</p>
         </sec>
         <sec>
            <st>
               <p>Construction of CS-CCC graphs</p>
            </st>
            <p>We began construction of CS-CCC graphs (Figure <figr fid="F1">1</figr>) using predictions of pairwise protein-protein interactions based on phylogenetic profiles (CC methods; Figure <figr fid="F1">1a</figr>). Currently, several databases that compile predictions are available, including Prolinks <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, STRING <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, and Predictome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. We used the Prolinks Database 2.0, which contains a total of 168 microbial genomes, including 10 eukaryotes, 16 Archaea, and 142 Bacteria <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Even though Prolinks provides predicted interactions based on a number of different methods (that is, Rosetta stone, gene neighbors, and so on), we have used only interactions predicted by the phylogenetic profiling method in this study. We chose not to use the STRING database as a source of predictions because it conflates co-conservation with orthology information from the COG database <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; we used COG functional category and TIGR functional role category data to evaluate purely co-conservation inferences. Predictome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> was not used because it does not provide statistical measures to evaluate the accuracy of each prediction. For each pair assignment (CC), we required a phylogenetic profiling confidence of at least 60% according to the Prolinks scoring scheme <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. An E-value of less than 10<sup>-10 </sup>was used as the threshold for BLASTP in Prolinks to define a homolog of a query protein to be present in a secondary genome. For each bacterial genome analyzed, the number of assigned pairs is shown in Table <tblr tid="T2">2</tblr>. For each bacterial species, we mapped accession IDs from Prolinks predicted protein pairs to NCBI <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and then to EcoCyc <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> for <it>E. coli </it>K12, <it>P. aeruginosa </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp> for <it>P. aeruginosa </it>and <it>B. subtilis </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp> for <it>B. subtilis</it>. We matched corresponding proteins between species by protein name or synonym. We then constructed CCC graphs using the pairwise links for each species (Figures <figr fid="F1">1b</figr> and <figr fid="F6">6</figr>) using a binary adjacency matrix in which 1 indicates that the corresponding pair was co-conserved, and 0 otherwise. Networks were represented by graphs in which each node represents a protein and each edge represents an interaction that links two proteins. Network graphs were visualized using Cytoscape <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, an open-source, platform-independent software environment. The lengths of the lines connecting proteins hold no meaning and vary to facilitate viewing of the network. Each network is color-coded based on protein function categories, as described in the corresponding figure legends. The assignment of putative functions was based on EcoCyc, Pseudomonas.com, NCBI and SubtiList, as given in the links above. For separation of connected components of the network and building clusters of proteins, we used breadth-first search (BFS) graph algorithms.</p>
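            <p>The graph construction described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names (build_adjacency, bfs_clusters), the protein labels P1 to P5 and the confidence values are hypothetical and chosen only for illustration. The sketch keeps pairs with a confidence of at least 0.6, fills a binary adjacency matrix, and uses breadth-first search to separate connected components into clusters.</p>

```python
from collections import deque


def build_adjacency(pairs, threshold=0.6):
    """Keep pairs at or above the threshold and return (names, binary matrix)."""
    kept = [(a, b) for a, b, conf in pairs if conf >= threshold]
    names = sorted({protein for pair in kept for protein in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in kept:
        i, j = index[a], index[b]
        # 1 marks a co-conserved pair; the matrix is symmetric.
        matrix[i][j] = matrix[j][i] = 1
    return names, matrix


def bfs_clusters(names, matrix):
    """Split the graph into connected components using breadth-first search."""
    seen = set()
    clusters = []
    for start in range(len(names)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(names[node])
            for neighbor, linked in enumerate(matrix[node]):
                if linked and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))
    return clusters


# Hypothetical pairs (protein, protein, confidence); not taken from Prolinks.
pairs = [
    ("P1", "P2", 0.80),
    ("P2", "P3", 0.65),
    ("P4", "P5", 0.90),
    ("P3", "P4", 0.41),  # below the 0.6 cutoff, so this link is dropped
]
names, matrix = build_adjacency(pairs)
clusters = bfs_clusters(names, matrix)
print(clusters)
```

            <p>In this toy input the low-confidence link between P3 and P4 is discarded, so the five proteins fall into two separate clusters rather than one.</p>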
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Comparison of genomes examined in this study</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Species name</p>
                     </c>
                     <c ca="center">
                        <p>Genome size</p>
                     </c>
                     <c ca="center">
                        <p>No. of annotated genes</p>
                     </c>
                     <c ca="center">
                        <p>No. (%) of co-conserved genes</p>
                     </c>
                     <c ca="center">
                        <p>No. of co-conserved protein pairs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(K12)</p>
                     </c>
                     <c ca="center">
                        <p>4,639,675</p>
                     </c>
                     <c ca="center">
                        <p>4,242</p>
                     </c>
                     <c ca="center">
                        <p>1,156 (27%)</p>
                     </c>
                     <c ca="center">
                        <p>2,926</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(O157-O157:H7 EDL933)</p>
                     </c>
                     <c ca="center">
                        <p>5,528,445</p>
                     </c>
                     <c ca="center">
                        <p>5,324</p>
                     </c>
                     <c ca="center">
                        <p>1,174 (22%)</p>
                     </c>
                     <c ca="center">
                        <p>3,216</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Shigella flexneri </it>2a str. 2457T</p>
                     </c>
                     <c ca="center">
                        <p>4,599,354</p>
                     </c>
                     <c ca="center">
                        <p>4,068</p>
                     </c>
                     <c ca="center">
                        <p>977 (24%)</p>
                     </c>
                     <c ca="center">
                        <p>4,490</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Salmonella typhimurium </it>LT2 + pSLT plasmid</p>
                     </c>
                     <c ca="center">
                        <p>4,857,432 + 93,939</p>
                     </c>
                     <c ca="center">
                        <p>4,425 + 102</p>
                     </c>
                     <c ca="center">
                        <p>1,103 (24%)</p>
                     </c>
                     <c ca="center">
                        <p>2,751</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>P. aeruginosa </it>(PAO1)</p>
                     </c>
                     <c ca="center">
                        <p>6,264,403</p>
                     </c>
                     <c ca="center">
                        <p>5,567</p>
                     </c>
                     <c ca="center">
                        <p>1,428 (26%)</p>
                     </c>
                     <c ca="center">
                        <p>5,794</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bacillus subtilis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4,214,630</p>
                     </c>
                     <c ca="center">
                        <p>4,105</p>
                     </c>
                     <c ca="center">
                        <p>869 (21%)</p>
                     </c>
                     <c ca="center">
                        <p>1,972</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Complete protein-protein interaction network for two organisms</p>
               </caption>
               <text>
                  <p>Complete protein-protein interaction network for two organisms. <b>(a) </b>Taxonomy of the organisms examined in this study. <b>(b,c) </b>Examples of complete protein interaction networks for two of the organisms evaluated here. These figures enable the examination of the size distribution of protein-protein interaction networks in different species. Moreover, proteins are color-coded based on function, thus allowing for the examination of relationships between function and cluster size. For example, this figure shows that small or medium-sized clusters usually contain proteins with similar function. CS-CCC compares all such networks across multiple species to identify conserved and unique sub-networks. The lengths of the lines in the network hold no meaning and vary simply to facilitate viewing. Cell envelope and cellular process, red; intermediary metabolism, green; information pathway or central dogma, yellow; uncategorized, gray; other, blue.</p>
               </text>
               <graphic file="gb-2007-8-9-r185-6"/>
            </fig>
            <p>Finally, for comparison of each cluster across different species (CS-CCC), we used BFS to build a network (source network) for a set of target proteins from the source genome. We then built networks for each additional organism that contained proteins with the same name as at least one of the proteins from the source networks. This process identifies proteins and protein interactions that are consistently identified across multiple species (colored gray in Figure <figr fid="F1">1c</figr>) or that are unique to individual species (colored red in Figure <figr fid="F1">1c</figr>). This same method can be used to further parse such networks to identify combined, common and unique networks present for specific proteins across a collection of organisms (Figure <figr fid="F1">1c</figr>). In this way, CS-CCC builds on information generated by CCC (Figure <figr fid="F1">1b</figr>) to provide more accurate and genome-specific protein function assignment. We used protein name to map links across conserved species (thus, links are not explicitly based on orthology) <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. Like all methods, the use of protein names has both advantages and disadvantages. Here, protein name was chosen in order to validate that CS-CCC provides new and biologically informative data not accessible by CCC alone. For this purpose, we chose to validate this method using named proteins where functional information was available. While this is appropriate for method validation, the disadvantage is that there are problems with annotation due in part to a lack of standardization, which would limit the number of proteins for which this analysis can be reliably performed. In light of this limitation, we considered using reciprocal homology as an alternative to protein name. We found that this introduces unacceptable levels of cross-talk, much of which is likely noise. 
Addressing this limitation is an important area for continued effort.</p>
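            <p>The cross-species comparison described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the function names (to_graph, bfs_network, cs_ccc), the species labels and the protein labels are hypothetical. Each species network is stored as a mapping from protein name to the set of its neighbors, the source network is grown by BFS from the target proteins, and the networks of the other species are seeded with every protein name shared with the source network.</p>

```python
from collections import deque


def to_graph(edge_list):
    """Build an undirected graph as a dict mapping protein name to neighbors."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_network(graph, seeds):
    """Return all edges reachable by BFS from the seeds present in the graph."""
    seen = {s for s in seeds if s in graph}
    queue = deque(sorted(seen))
    edges = set()
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            edges.add(frozenset((node, neighbor)))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return edges


def cs_ccc(graphs, source, targets):
    """Compare the source network with name-matched networks in other species."""
    source_edges = bfs_network(graphs[source], targets)
    source_names = {t for t in targets if t in graphs[source]}
    source_names |= {name for edge in source_edges for name in edge}
    networks = {source: source_edges}
    for species, graph in graphs.items():
        if species != source:
            # Seed with proteins sharing a name with the source network.
            networks[species] = bfs_network(graph, source_names)
    counts = {}
    for edges in networks.values():
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    common = {edge for edge, n in counts.items() if n == len(networks)}
    unique = {edge for edge, n in counts.items() if n == 1}
    return networks, common, unique


# Hypothetical species and links, for illustration only.
graphs = {
    "species_a": to_graph([("P1", "P2"), ("P2", "P3")]),
    "species_b": to_graph([("P1", "P2"), ("P2", "P6"), ("P7", "P8")]),
}
networks, common, unique = cs_ccc(graphs, "species_a", ["P1"])
print(sorted(sorted(edge) for edge in common))
print(sorted(sorted(edge) for edge in unique))
```

            <p>Here the link between P1 and P2 is found in both species, the links from P2 to P3 and from P2 to P6 are each found in a single species, and the link between P7 and P8 is never reached because neither protein shares a name with the source network.</p>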
         </sec>
         <sec>
            <st>
               <p>Data availability</p>
            </st>
            <p>Data are available upon request.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>BFS, breadth-first search; CC, co-conservation; CCC, cluster co-conservation; COG, Clusters of Orthologous Groups; CS-CCC, cross-species cluster co-conservation; MCP, methyl-accepting chemotaxis protein; MYA, million years ago; TIGR, The Institute for Genomic Research; T3SS, type III secretion systems.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AK implemented the methods and analyzed the data. CSD interpreted the results. The manuscript was written by AK and CSD and edited by RTG and LH. KDE performed experiments. RTG oversaw all biological aspects of the work and LH supervised the computational aspects.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a figure that shows the reliability of predicted protein interaction pairs using TIGR role categories at three different confidence levels.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>The reliability of predicted protein interaction pairs using TIGR role categories at three different confidence levels</p>
            </caption>
            <text>
               <p>Comparison of TIGR functional categories of predicted pairs at three different confidence levels. The first method (1) used only <it>E. coli </it>K12. Each subsequent method added an additional (underlined) bacterial strain. 1, <it>E. coli </it>K12; 2, <it>E. coli </it>K12 and <ul><it>E. coli </it>O157</ul>; 3, <it>E. coli </it>K12, <it>E. coli </it>O157 and <it><ul>S. flexneri</ul></it>; 4, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, and <ul><it>S. typhimurium </it>LT2</ul>; 5, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, <it>S. typhimurium </it>LT2, and <it><ul>P. aeruginosa</ul></it>; 6, <it>E. coli </it>K12, <it>E. coli </it>O157, <it>S. flexneri</it>, <it>S. typhimurium </it>LT2, <it>P. aeruginosa</it>, and <it><ul>B. subtilis</ul></it>. Same functional category (blue); different functional category (green); at least one protein is unclassified (yellow).</p>
            </text>
            <file name="gb-2007-8-9-r185-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Norman Pace for excellent discussions, and Daniel Barker and Sonia M Leach for reading the manuscript and providing helpful comments. We also thank Kevin B Cohen for helpful comments. This study was supported by NSF grant BES0228584, and NIH grants K25_AI064338, R01-AI-072492A, and R01-LM-008111.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4285</fpage>
            <lpage>4288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16324</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200254</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Detecting protein function and protein-protein interactions from genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <fpage>751</fpage>
            <lpage>753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.285.5428.751</pubid>
                  <pubid idtype="pmpid" link="fulltext">10427000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Inference of protein function and protein linkages in <it>Mycobacterium tuberculosis </it>based on prokaryotic genome organization: a combined computational approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Strong</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mallick</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193659</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952538</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-9-r59</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Predicting protein function by genomic context: quantitative evaluation and qualitative inferences.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1204</fpage>
            <lpage>1210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310926</pubid>
                  <pubid idtype="pmpid" link="fulltext">10958638</pubid>
                  <pubid idtype="doi">10.1101/gr.10.8.1204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Predictome: a database of putative functional links between proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Mellor</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Clodfelter</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Mintseris</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DeLisi</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>306</fpage>
            <lpage>309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99135</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752322</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Prolinks: a database of protein functional linkages derived from coevolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Bowers</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fierro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R35</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">416471</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128449</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-5-r35</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Protein function prediction using the Protein Link EXplorer (PLEX).</p>
            </title>
            <aug>
               <au>
                  <snm>Date</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>2558</fpage>
            <lpage>2559</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti313</pubid>
                  <pubid idtype="pmpid" link="fulltext">15701682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>STRING: a database of predicted functional associations between proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jaeggi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>258</fpage>
            <lpage>261</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165481</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519996</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg034</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Genomic functional annotation using co-evolution profiles of gene clusters.</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0060</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">133444</pubid>
                  <pubid idtype="pmpid" link="fulltext">12429059</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-11-research0060</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A domain interaction map based on phylogenetic profiling.</p>
            </title>
            <aug>
               <au>
                  <snm>Pagel</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Frishman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>344</volume>
            <fpage>1331</fpage>
            <lpage>1346</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.10.019</pubid>
                  <pubid idtype="pmpid" link="fulltext">15561146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Predicting functional gene links from phylogenetic-statistical analyses of whole genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Pagel</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>1</volume>
            <fpage>e3</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1183509</pubid>
                  <pubid idtype="pmpid" link="fulltext">16103904</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0010003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p><it>Escherichia coli </it>fliAZY operon.</p>
            </title>
            <aug>
               <au>
                  <snm>Mytelka</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Chamberlin</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1996</pubdate>
            <volume>178</volume>
            <fpage>24</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">177617</pubid>
                  <pubid idtype="pmpid" link="fulltext">8550423</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Effect of motility on surface colonization and reproductive success of <it>Pseudomonas fluorescens </it>in dual-dilution continuous culture and batch culture systems.</p>
            </title>
            <aug>
               <au>
                  <snm>Korber</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caldwell</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Appl Environ Microbiol</source>
            <pubdate>1994</pubdate>
            <volume>60</volume>
            <fpage>1421</fpage>
            <lpage>1429</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">201498</pubid>
                  <pubid idtype="pmpid" link="fulltext">16349247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Genetic analysis of <it>Escherichia coli </it>biofilm formation: roles of flagella, motility, chemotaxis and type I pili.</p>
            </title>
            <aug>
               <au>
                  <snm>Pratt</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Kolter</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1998</pubdate>
            <volume>30</volume>
            <fpage>285</fpage>
            <lpage>293</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.1998.01061.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">9791174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Bacterial locomotion and signal transduction.</p>
            </title>
            <aug>
               <au>
                  <snm>Manson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Armitage</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Hoch</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Macnab</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1998</pubdate>
            <volume>180</volume>
            <fpage>1009</fpage>
            <lpage>1022</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">106986</pubid>
                  <pubid idtype="pmpid" link="fulltext">9495737</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Type III secretion machines: bacterial devices for protein delivery into host cells.</p>
            </title>
            <aug>
               <au>
                  <snm>Galan</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Collmer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>284</volume>
            <fpage>1322</fpage>
            <lpage>1328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.284.5418.1322</pubid>
                  <pubid idtype="pmpid" link="fulltext">10334981</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Molecular mechanisms of bacterial virulence: type III secretion and pathogenicity islands.</p>
            </title>
            <aug>
               <au>
                  <snm>Mecsas</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Strauss</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Emerg Infect Dis</source>
            <pubdate>1996</pubdate>
            <volume>2</volume>
            <fpage>270</fpage>
            <lpage>288</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8969244</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>FlhA, a component of the flagellum assembly apparatus of <it>Pseudomonas aeruginosa</it>, plays a role in internalization by corneal epithelial cells.</p>
            </title>
            <aug>
               <au>
                  <snm>Fleiszig</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Arora</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Van</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ramphal</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2001</pubdate>
            <volume>69</volume>
            <fpage>4931</fpage>
            <lpage>4937</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98584</pubid>
                  <pubid idtype="pmpid" link="fulltext">11447170</pubid>
                  <pubid idtype="doi">10.1128/IAI.69.8.4931-4937.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>GGDEF and EAL domains inversely regulate cyclic di-GMP levels and transition from sessility to motility.</p>
            </title>
            <aug>
               <au>
                  <snm>Simm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Morr</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kader</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nimtz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Romling</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2004</pubdate>
            <volume>53</volume>
            <fpage>1123</fpage>
            <lpage>1134</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2004.04206.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15306016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Biofilm formation at the air-liquid interface by the <it>Pseudomonas fluorescens </it>SBW25 wrinkly spreader requires an acetylated form of cellulose.</p>
            </title>
            <aug>
               <au>
                  <snm>Spiers</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Bohannon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gehrig</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Rainey</snm>
                  <fnm>PB</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>50</volume>
            <fpage>15</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2003.03670.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">14507360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>MorA defines a new class of regulators affecting flagellar development and biofilm formation in diverse <it>Pseudomonas </it>species.</p>
            </title>
            <aug>
               <au>
                  <snm>Choy</snm>
                  <fnm>WK</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Syn</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Swarup</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2004</pubdate>
            <volume>186</volume>
            <fpage>7221</fpage>
            <lpage>7228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523210</pubid>
                  <pubid idtype="pmpid" link="fulltext">15489433</pubid>
                  <pubid idtype="doi">10.1128/JB.186.21.7221-7228.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>FimX, a multidomain protein connecting environmental signals to twitching motility in <it>Pseudomonas aeruginosa</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Whitchurch</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2003</pubdate>
            <volume>185</volume>
            <fpage>7068</fpage>
            <lpage>7076</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">296245</pubid>
                  <pubid idtype="pmpid" link="fulltext">14645265</pubid>
                  <pubid idtype="doi">10.1128/JB.185.24.7068-7076.2003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The EcoCyc Database.</p>
            </title>
            <aug>
               <au>
                  <snm>Karp</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Paley</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Pellegrini-Toole</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bonavides</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gama-Castro</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>56</fpage>
            <lpage>58</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99147</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752253</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.56</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Signal decay through a reverse phosphorelay in the Arc two-component signal transduction system.</p>
            </title>
            <aug>
               <au>
                  <snm>Georgellis</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kwon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>De Wulf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>EC</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1998</pubdate>
            <volume>273</volume>
            <fpage>32864</fpage>
            <lpage>32869</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.273.49.32864</pubid>
                  <pubid idtype="pmpid" link="fulltext">9830034</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Systematic characterization of <it>Escherichia coli </it>genes/ORFs affecting biofilm formation.</p>
            </title>
            <aug>
               <au>
                  <snm>Tenorio</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Saeki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fujita</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kitakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mori</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Isono</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2003</pubdate>
            <volume>225</volume>
            <fpage>107</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1097(03)00507-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12900028</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Differential gene expression shows natural brominated furanones interfere with the autoinducer-2 bacterial signaling system of <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ren</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bedzyk</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>TK</fnm>
               </au>
            </aug>
            <source>Biotechnol Bioeng</source>
            <pubdate>2004</pubdate>
            <volume>88</volume>
            <fpage>630</fpage>
            <lpage>642</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/bit.20259</pubid>
                  <pubid idtype="pmpid" link="fulltext">15470704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>SCALEs: multiscale analysis of library enrichment.</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Warnecke</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2007</pubdate>
            <volume>4</volume>
            <fpage>87</fpage>
            <lpage>93</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth946</pubid>
                  <pubid idtype="pmpid" link="fulltext">17099705</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Rcs phosphorelay system is specific to enteric pathogens/commensals and activates <it>ydeI</it>, a gene important for persistent <it>Salmonella </it>infection of mice.</p>
            </title>
            <aug>
               <au>
                  <snm>Erickson</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Detweiler</snm>
                  <fnm>CS</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <fpage>883</fpage>
            <lpage>894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2006.05420.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17010160</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events.</p>
            </title>
            <aug>
               <au>
                  <snm>Gophna</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Ron</snm>
                  <fnm>EZ</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>312</volume>
            <fpage>151</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00612-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12909351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Type III protein secretion systems in bacterial pathogens of animals and plants.</p>
            </title>
            <aug>
               <au>
                  <snm>Hueck</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1998</pubdate>
            <volume>62</volume>
            <fpage>379</fpage>
            <lpage>433</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98920</pubid>
                  <pubid idtype="pmpid" link="fulltext">9618447</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p><it>In vitro </it>characterization of FlgB, FlgC, FlgF, FlgG, and FliE, flagellar basal body proteins of <it>Salmonella</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Saijo-Hamano</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Uchida</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Namba</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Oosawa</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>339</volume>
            <fpage>423</fpage>
            <lpage>435</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.03.070</pubid>
                  <pubid idtype="pmpid" link="fulltext">15136044</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Complete genome sequence of <it>Pseudomonas aeruginosa </it>PAO1, an opportunistic pathogen.</p>
            </title>
            <aug>
               <au>
                  <snm>Stover</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Pham</snm>
                  <fnm>XQ</fnm>
               </au>
               <au>
                  <snm>Erwin</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Mizoguchi</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Warrener</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Hufnagle</snm>
                  <fnm>WO</fnm>
               </au>
               <au>
                  <snm>Kowalik</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Lagrou</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>959</fpage>
            <lpage>964</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35023079</pubid>
                  <pubid idtype="pmpid" link="fulltext">10984043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>NCBI GenBank Protein Annotation</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p><it>Pseudomonas aeruginosa </it>Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Winsor</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sui</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Ung</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>WK</fnm>
               </au>
               <au>
                  <snm>Hancock</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <issue>33 Database</issue>
            <fpage>D338</fpage>
            <lpage>D343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608211</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Institut Pasteur</p>
            </title>
            <url>http://genolist.pasteur.fr/SubtiList/</url>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Cytoscape: a software environment for integrated models of biomolecular interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Markiel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ozier</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Baliga</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Ramage</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Amin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Schwikowski</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ideker</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2498</fpage>
            <lpage>2504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403769</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597658</pubid>
                  <pubid idtype="doi">10.1101/gr.1239303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van Noort</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>4725</fpage>
            <lpage>4731</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">519111</pubid>
                  <pubid idtype="pmpid" link="fulltext">15353560</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh815</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms.</p>
            </title>
            <aug>
               <au>
                  <snm>Penkett</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bahler</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Web Server</issue>
            <fpage>W330</fpage>
            <lpage>W334</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1538793</pubid>
                  <pubid idtype="pmpid" link="fulltext">16845020</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl311</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Homology: a personal view on some of the problems.</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>227</fpage>
            <lpage>231</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02005-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">10782117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
