<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2002-3-11-research0064</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>The society of genes: networks of functional links between genes from comparative genomics</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Yanai</snm>
               <fnm>Itai</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>itai.yanai@weizmann.ac.il</email>
            </au>
            <au id="A2">
               <snm>DeLisi</snm>
               <fnm>Charles</fnm>
               <insr iid="I1"/>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Graduate Program and Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA</p>
            </ins>
            <ins id="I2">
               <p>Current address: Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2002</pubdate>
         <volume>3</volume>
         <issue>11</issue>
         <fpage>research0064.1</fpage>
         <lpage>research0064.12</lpage>
         <url>http://genomebiology.com/2002/3/11/research/0064</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2002-3-11-research0064</pubid>
               <pubid idtype="pmpid">12429063</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>12</day>
               <month>3</month>
               <year>2002</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>2</day>
               <month>8</month>
               <year>2002</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>11</day>
               <month>9</month>
               <year>2002</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>10</month>
               <year>2002</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2002</year>
         <collab>Yanai and DeLisi, licensee BioMed Central Ltd</collab>
      </cpyrt>
      <shorttitle>
         <p>networks of functional links between genes from comparative genomics</p>
      </shorttitle>
      <shortabs>
         <p>Comparative genomics provides at least three methods for identifying functional links between genes: examination of phylogenetic distributions, analysis of conserved proximity and observations of fusions of genes into a multidomain gene in another organism. We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods. They have a dominant cluster that contains approximately 80%-90% of the genes, independent of genome size, and the dominant clusters show the small world behavior expected of a biological system, with global connectivity that is nearly random, and local properties that are highly ordered.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>When the information on functional linkage provided by three emerging computational methods is combined, the integrated network uncovers large numbers of conserved pathways and identifies clusters of functionally related genes. It therefore shows considerable utility and promise as a tool for understanding genomic structure, and for guiding high throughput experimental investigations.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology </classification>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Complex systems, ubiquitous in science, owe their complexity to the interrelatedness of their elements [<abbr bid="B1">1</abbr>]. Investigations of the local and global structures of network representations of these systems have advanced our understanding of the systems themselves as well as some of their emergent, system-level properties [<abbr bid="B2">2</abbr>,<abbr bid="B3">3</abbr>]. A network of interactions also relates genes of a cell as each gene product carries out its function in the context of other gene products. Thanks to the availability of complete genomic sequences, the elements of cellular networks - the gene products - have been identified. Methods for identifying functional relationships between genes have also been introduced. We now have the opportunity to get a glimpse of the structure of the networks that lie at the core of life at the cellular level.</p>
         <p>Novel high-throughput experimental methods for identifying protein-protein interactions, such as yeast two-hybrid [<abbr bid="B4">4</abbr>,<abbr bid="B5">5</abbr>] and mass spectrometry [<abbr bid="B6">6</abbr>,<abbr bid="B7">7</abbr>], are now complemented by computational analyses of sequenced genomes to detect functional links between genes. The three comparative genomics methods applied here (Figure <figr fid="F1">1</figr>) utilize correlations in the properties and occurrence of genes across known genomes [<abbr bid="B8">8</abbr>,<abbr bid="B9">9</abbr>,<abbr bid="B10">10</abbr>,<abbr bid="B11">11</abbr>]. The first method, phylogenetic profiling, infers functional linkage between genes whose orthologs have identical phylogenetic patterns of occurrence across genomes [<abbr bid="B12">12</abbr>,<abbr bid="B13">13</abbr>,<abbr bid="B14">14</abbr>]. More generally, this method links genes when the similarity between their phyletic distributions is unlikely to have occurred by chance. Conserved chromosomal proximity of genes in multiple genomes, whether enforced by operons or co-horizontal transfer [<abbr bid="B15">15</abbr>], forms the basis for the second method for detecting functional correlations [<abbr bid="B16">16</abbr>,<abbr bid="B17">17</abbr>,<abbr bid="B18">18</abbr>,<abbr bid="B19">19</abbr>]. Finally, the domain fusion method [<abbr bid="B20">20</abbr>,<abbr bid="B21">21</abbr>,<abbr bid="B22">22</abbr>,<abbr bid="B23">23</abbr>] is based on the observation that distinct non-homologous genes are functionally related if their orthologs are fused in another organism.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Three comparative genomics methods for identifying functional links between genes and the networks they produce</p>
            </caption>
            <text>
               <p>Three comparative genomics methods for identifying functional links between genes and the networks they produce. The schematics on the left show links between gene A (pink) and gene B (blue) based on <b>(a)</b> the same phyletic distribution across the known genomes, here arbitrarily labeled W, X, Y, Z, etc.; <b>(b)</b> their proximity on chromosomes from different genomes; and <b>(c)</b> fusion of A and B into one multidomain gene in another organism. On the right are networks found in <it>H. pylori</it> using each of the three methods. All network figures were made using the Pajek program [<abbr bid="B42">42</abbr>].</p>
            </text>
            <graphic file="gb-2002-3-11-research0064-1"/>
         </fig>
         <p>Previously, we reported on a database of functional links generated by the comparative genomics methods [<abbr bid="B24">24</abbr>]. Here we combine the total sets of links generated by these three methods for each of 43 microbial genomes with the following findings.</p>
         <p>Sets of functional links provided by the three methods are largely additive (that is, the sets largely do not overlap). The reliability of the links is approximately 70%, well above the background noise, which is estimated to be in the 10-15% range. Networks obtained by individual methods coalesce into a combined network covering, on average, 57% of an organism's genes. The local structures of the combined networks correspond to genes participating in the same pathway. In the <it>Escherichia coli</it> network, 26 clusters are identified such that each is composed of seven or more genes and corresponds to a different functional pathway. Functional predictions can be made based upon a gene's location in the combined networks. For example, <it>Methanococcus jannaschii</it> genes MJ1313 and MJ1407 are of completely unknown function, but are unambiguously predicted to be related, and therefore should be studied together. The structure of the combined networks is characterized by a giant cluster that covers between 80 and 90% of the genes of the total network and shows random global properties and highly cliquish local properties.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Networks from individual comparative genomic methods</p>
            </st>
            <p>The relationships uncovered by each method form networks whose structures are method-dependent (Figure <figr fid="F1">1a</figr>). Phylogenetic profile links, which we will refer to as 'phylo links', are transitive; that is, two genes linked to a common gene are also linked to each other. An example is the fully connected clique formed by flagellar motor proteins (also shown in Pellegrini <it>et al.</it> [<abbr bid="B14">14</abbr>]); that is, the proteins are generally either all present or all absent in a given genome and therefore have identical phylogenetic patterns. Phylo networks are characterized by the number and size distribution of their cliques. The occurrence of fully connected cliques follows from the definition, which requires perfectly correlated patterns of occurrence to assign a link. A more permissive phylogenetic profiling method would result in reduced transitivity.</p>
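            <p>The transitivity described above can be illustrated with a minimal sketch. It assumes that each gene is represented by a binary presence/absence profile across genomes and that a link requires identical profiles; the function name <it>phylo_links</it>, the gene names and the profiles are hypothetical and are not taken from the database used in this study.</p>

```python
from collections import defaultdict
from itertools import combinations


def phylo_links(profiles):
    """Link every pair of genes whose presence/absence profiles are identical."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[tuple(profile)].append(gene)
    links = set()
    for genes in groups.values():
        # Every pair inside a group is linked, so each group is a clique.
        for a, b in combinations(sorted(genes), 2):
            links.add((a, b))
    return links


# Hypothetical profiles over four genomes W, X, Y, Z (1 = ortholog present).
profiles = {
    "geneA": (1, 0, 1, 1),
    "geneB": (1, 0, 1, 1),
    "geneC": (1, 0, 1, 1),
    "geneD": (0, 1, 1, 0),
}
links = phylo_links(profiles)
print(sorted(links))
```

            <p>Because genes are grouped by identical profile, any two genes linked to a common gene necessarily share its profile and are linked to each other, which is why this strict definition yields fully connected cliques.</p>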
            <p>Gene networks uncovered by conserved chromosomal proximity links, which we will refer to as 'chromo links', typically exhibit a chain structure. For example, one of the long chains in the network of <it>Helicobacter pylori</it> genes (Figure <figr fid="F1">1b</figr>) corresponds to the chromosomal region containing highly conserved ribosomal genes. This network formation results from the fact that the conservation of gene order tends to involve more than two genes. As a particular conserved group tends to be arranged in an order conserved among organisms, linear concatenation dominates the network.</p>
            <p>The domain fusion links, which we will refer to as 'fusion links', have complicated network relations, including the appearance of a major cluster. Even for small networks, as in Figure <figr fid="F1">1c</figr>, we find a few nodes that have a large number of links, and a large number that have few links [<abbr bid="B25">25</abbr>,<abbr bid="B26">26</abbr>,<abbr bid="B27">27</abbr>]. We discuss this power-law behavior below.</p>
            <p>In summary, the three methods for constructing networks all produce distinctive, non-random structures: branched but highly cliquish for phylo networks; generally branched structures for fusion networks; and linearly concatenated structures for chromo networks. By combining these three representations on the same grid of genes, we generate a largely non-overlapping map of functional linkages that provides more information than any of the three maps alone. The combined networks have a number of important functional and structural properties.</p>
         </sec>
         <sec>
            <st>
               <p>Functional properties of the combined comparative genomics networks</p>
            </st>
            <p>When the three networks are superimposed, four new types of links are formed combinatorially (Figure <figr fid="F2">2</figr>). The combined network of fusion, chromo and phylo links for a given genome (henceforth, combined network) captures between 30% and 80% of the genes in a genome (57% on average), whereas the chromo, fusion and phylo networks independently capture 48%, 29% and 19% of the genes on average, respectively. As these numbers suggest, the graphs overlap significantly in terms of nodes (Table <tblr tid="T1">1</tblr>). We found, as did Huynen <it>et al.</it> [<abbr bid="B28">28</abbr>] in <it>Mycoplasma genitalium</it>, that of the three methods the chromo networks have the greatest coverage. The fusion and phylo networks share 72% and 75% of their nodes with chromo networks; that is, 72% and 75% of the nodes found in fusion and phylo networks, respectively, are also found in chromo networks. Functional links, on the other hand, tend to be complementary. Only 20% and 14% of the links in fusion and phylo networks, respectively, are found in chromo networks. The least overlapping of all are the phylo and fusion graphs, with only 41% of the nodes in a phylo net found in a fusion net and 6% of the links in a phylo net found in a fusion net. These results indicate that although the three methods capture overlapping sets of genes, and conserved chromosomal proximity captures more than the other two methods combined, the links generated by the different methods individually show much less overlap than the nodes.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The combined network in <it>H. pylori</it></p>
               </caption>
               <text>
                  <p>The combined network in <it>H. pylori</it>. The networks of the three individual methods (shown in Figure <figr fid="F1">1</figr>) are superimposed. Links colored yellow were obtained by phylogenetic profiling; links colored blue by conserved chromosomal proximity; and links colored red by domain fusion. The remaining links are coded as composites of the three primary colors: purple, links found by both fusion and chromosomal proximity; green, links found by chromosomal proximity and phylogenetic profiling; orange, links found by phylogenetic profiling and fusion; and brown, links found by all three methods. The nodes highlighted in red and blue identify genes that participate in oxidative phosphorylation and aromatic amino-acid biosynthesis, respectively (see Figure <figr fid="F4">4</figr>).</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-2"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Overlap of nodes and edges between the four types of network</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>(a)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Nodes</p>
                     </c>
                     <c ca="center">
                        <p>Chromo</p>
                     </c>
                     <c ca="center">
                        <p>Fusion</p>
                     </c>
                     <c ca="center">
                        <p>Phylo</p>
                     </c>
                     <c ca="center">
                        <p>Combined</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Chromo</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>41 (6)</p>
                     </c>
                     <c ca="center">
                        <p>31 (3)</p>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Fusion</p>
                     </c>
                     <c ca="center">
                        <p>72 (6)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>30 (3)</p>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Phylo</p>
                     </c>
                     <c ca="center">
                        <p>75 (10)</p>
                     </c>
                     <c ca="center">
                        <p>41 (6)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Combined</p>
                     </c>
                     <c ca="center">
                        <p>80 (6)</p>
                     </c>
                     <c ca="center">
                        <p>46 (6)</p>
                     </c>
                     <c ca="center">
                        <p>34 (4)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>(b)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Edges</p>
                     </c>
                     <c ca="center">
                        <p>Chromo</p>
                     </c>
                     <c ca="center">
                        <p>Fusion</p>
                     </c>
                     <c ca="center">
                        <p>Phylo</p>
                     </c>
                     <c ca="center">
                        <p>Combined</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Chromo</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14 (2)</p>
                     </c>
                     <c ca="center">
                        <p>11 (1)</p>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Fusion</p>
                     </c>
                     <c ca="center">
                        <p>20 (4)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7 (2)</p>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Phylo</p>
                     </c>
                     <c ca="center">
                        <p>14 (7)</p>
                     </c>
                     <c ca="center">
                        <p>6 (2)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>100 (0)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Combined</p>
                     </c>
                     <c ca="center">
                        <p>43 (10)</p>
                     </c>
                     <c ca="center">
                        <p>30 (7)</p>
                     </c>
                     <c ca="center">
                        <p>38 (11)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The element in row <it>i</it>, column <it>j</it> is the average percentage of nodes <b>(a)</b> or edges <b>(b)</b> in the network of type <it>i</it> that are also found in the network of type <it>j</it>, with the standard deviation in parentheses. For example, 75% of the nodes found by phylogenetic profiling are contained in the chromo networks. This should, however, be contrasted with the finding that only 14% of the functional links in the phylo networks are in chromo networks.</p>
               </tblfn>
            </tbl>
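            <p>The directional overlap reported in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the table: each network is assumed to be a set of undirected edges, and the helper names <it>percent_shared</it> and <it>nodes_of</it>, as well as the gene names, are hypothetical.</p>

```python
def percent_shared(network_i, network_j):
    """Percentage of the items in network_i that also occur in network_j."""
    if not network_i:
        return 0.0
    return 100.0 * len(network_i.intersection(network_j)) / len(network_i)


def nodes_of(edges):
    """Set of genes that take part in at least one edge."""
    return {gene for edge in edges for gene in edge}


# Hypothetical edge sets; each edge is an unordered pair of gene names.
chromo = {frozenset(p) for p in [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]}
phylo = {frozenset(p) for p in [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "g6")]}

# Row "phylo", column "chromo": nodes overlap more than edges.
print(percent_shared(nodes_of(phylo), nodes_of(chromo)))  # nodes
print(percent_shared(phylo, chromo))                      # edges
# The measure is not symmetric: row "chromo", column "phylo".
print(percent_shared(nodes_of(chromo), nodes_of(phylo)))
```

            <p>In this toy case the phylo network shares most of its nodes with the chromo network but only half of its edges, the same qualitative pattern as in the table. Because the denominator is the size of the row network, the value in row <it>i</it>, column <it>j</it> generally differs from that in row <it>j</it>, column <it>i</it>.</p>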
            <p>A fundamental question is the quality of the links in terms of a direct functional correlation between the linked members. This quality can be estimated by reference to databases that classify genes into broad functional categories (clusters of orthologous groups (COGs) [<abbr bid="B12">12</abbr>]) or biological pathways (Kyoto Encyclopedia of Genes and Genomes (KEGG) [<abbr bid="B18">18</abbr>]). In particular, 72%, 68% and 64% of the fusion, chromo and phylo links, respectively, connect genes in the same COG functional category [<abbr bid="B19">19</abbr>,<abbr bid="B22">22</abbr>,<abbr bid="B24">24</abbr>].</p>
            <p>We find, as did Marcotte <it>et al.</it> [<abbr bid="B29">29</abbr>], that the links corroborated by multiple methods are of exceedingly good quality. However, as discussed above, this intersecting set corresponds to a small fraction of the total links (approximately 10%). Thus, in our study of combined networks, we construct the union, instead of the intersection, of the links generated by each method.</p>
            <p>To evaluate the significance of a network, we first produce 100 shuffled versions of this network (see Materials and methods). For each of these we calculate the percentage of linked pairs whose members are present in the same functional category (according to COGs) or pathway (according to KEGG) and then estimate the average and standard deviation for the population of random networks. We find a statistically significant difference between the observed networks of all types (fusion, chromo, phylo and combined) and the shuffled networks (<it>p</it> = 2.2e-16 in a <it>t</it>-test). On average, 33 standard deviations separate an observed network from its 100 shuffled networks. As can be seen in Figure <figr fid="F3">3</figr>, the observed networks have high functional correlation centered around 60% agreement with COGs and 70% with KEGG, whereas the shuffled networks form a tight, distant cluster with a low background functional correlation - a noise level in the range of 10 to 20%.</p>
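            <p>The comparison with shuffled networks can be sketched as below. The shuffling procedure itself is described in Materials and methods and is not reproduced here; this sketch assumes, purely for illustration, that shuffling permutes the category labels among the genes while keeping the links fixed. The function names, gene names and categories are hypothetical.</p>

```python
import random
import statistics


def same_category_fraction(links, category):
    """Fraction of linked pairs whose two genes share a functional category."""
    shared = sum(1 for a, b in links if category[a] == category[b])
    return shared / len(links)


def shuffle_z_score(links, category, n_shuffles=100, seed=0):
    """Compare the observed fraction with label-shuffled networks (assumed scheme)."""
    rng = random.Random(seed)
    observed = same_category_fraction(links, category)
    genes = sorted(category)
    labels = [category[g] for g in genes]
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # permute labels; the link topology is unchanged
        background.append(same_category_fraction(links, dict(zip(genes, labels))))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    z = float("inf") if sd == 0 else (observed - mean) / sd
    return observed, mean, z


# Hypothetical genome of 12 genes in three categories of four genes each.
category = {}
for index in range(12):
    category["g" + str(index + 1)] = ["energy", "translation", "transport"][index // 4]

# Nine links within categories and one link (g4-g5) across categories.
links = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g5", "g6"), ("g6", "g7"),
         ("g7", "g8"), ("g9", "g10"), ("g10", "g11"), ("g11", "g12"), ("g4", "g5")]

observed, mean, z = shuffle_z_score(links, category)
print(observed, mean, z)
```

            <p>The third returned value is the number of standard deviations separating the observed network from its shuffled versions, the quantity quoted in the text as 33 on average for the real networks.</p>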
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Functional correlation of networks in terms of COG functional correlations and KEGG pathways</p>
               </caption>
               <text>
                  <p>Functional correlation of networks in terms of COG functional correlations and KEGG pathways. Each circle corresponds to a network of one of the four types of the observed networks (differently colored) for one of the 43 genomes. Each triangle corresponds to the mean of 100 shuffled versions of each of the observed networks (see Materials and methods). Note the clear separation of the functional correlation between the observed and shuffled networks.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-3"/>
            </fig>
            <p>The methods described here uncover a large number of functional systems, such as the citrate cycle, purine metabolism, and fructose and mannose metabolism. To provide two specific examples, the highlighted nodes in the network shown in Figure <figr fid="F2">2</figr> correspond to genes that encode proteins involved in oxidative phosphorylation and in phenylalanine, tyrosine and tryptophan biosynthesis. We find that all but four of the 35 <it>H. pylori</it> genes annotated as participating in oxidative phosphorylation are present as six clusters in the combined network for this organism (Figure <figr fid="F4">4a</figr>). The clusters found by the three methods reflect the functions subserved by different groups of oxidative phosphorylation genes. In particular, the six clusters shown in Figure <figr fid="F4">4a</figr>, which range in size from 1 to 12, correspond to five of the six receptor complexes that act in oxidative phosphorylation: NADH dehydrogenase, ATP synthase, succinate dehydrogenase, cytochrome <it>bc</it>, and cytochrome <it>d</it>.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Local structure of the <it>H. pylori</it> combined network captures functionally related genes</p>
               </caption>
               <text>
                  <p>Local structure of the <it>H. pylori</it> combined network captures functionally related genes. <b>(a)</b> Oxidative phosphorylation genes; <b>(b)</b> genes involved in phenylalanine, tyrosine and tryptophan biosynthesis. The color of the links between genes is the same as in Figure <figr fid="F2">2</figr>. Gray lines indicate links with genes not ascribed to that pathway.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-4"/>
            </fig>
            <p>As can be gleaned from Figure <figr fid="F4">4a</figr>, seven of the genes involved in ATP synthase (HP1131 to HP1137) form a conserved cluster of genes (detected in almost all known bacterial genomes). The chromo links among HP0828, HP1212 and HP1137:HP1136 (the last two are paralogs) are based on conserved proximity in other genomes. Six of the genes also have identical phylogenetic profiles, and the profile is unique to these six genes. The link between genes HP1135 and HP1137:HP1136 is strengthened because it is established by both chromosomal proximity and fusion in two <it>Mycobacterium</it> genomes. The five (of six) <it>H. pylori</it> genes that form the cytochrome <it>bc</it> and <it>c</it> complexes are linked. Essentially, the string of links is based upon two separate chromosomal sections: HP0144, HP0145 and HP0147; and HP1539 and HP1540. Within these two sections, two links are further strengthened by phylo links. The two sections are joined by a fusion link between two of the genes (HP1539 and HP0147). Three genes (HP0191, HP0192 and HP0193) compose the fumarate reductase complex and are connected by two chromo links. Chromosomal links build a skeleton of links that unites all the genes involved in NADH dehydrogenase. This cluster is further supported, however, by numerous phylo and fusion links. Two links are supported by all three methods. Two genes involved in oxidative phosphorylation, HP1010 and HP1420, are not connected to other genes involved in this functional system.</p>
            <p>Figure <figr fid="F4">4b</figr> shows the functional links among the <it>H. pylori</it> genes involved in phenylalanine, tyrosine and tryptophan biosynthesis. Genes HP0402, HP0403 and HP0774 correspond to tRNA synthetase domains of phenylalanine and tyrosine, linked by fusion events elsewhere as well as by the proximity between HP0402 and HP0403 (HP0402 corresponds to two nodes, one for each of its domains; see Materials and methods). HP1249, HP1277, HP0663 and HP0401 share the same phylogenetic profile. HP0401, HP1249, HP0157 and HP0283 are found fused together as the yeast <it>ARO1</it> gene and are thus all fusion linked. The proximity of HP1277 and HP1278, tryptophan synthase &#945; and &#946; chain respectively, is conserved in other genomes. These two genes are also found fused together in some genomes. The relationship between HP1279 (represented twice, once for each of its domains), HP1281 and HP1282 is supported by phylo, chromo and fusion links for each pair of the three genes, with the exception of a missing fusion link between HP1279 and HP1282. HP0154 and HP0672 are present in the network but are not linked to other genes known to act in this pathway.</p>
            <p>Overall, in <it>E. coli</it>, 26 clusters of seven or more genes corresponding to distinct KEGG pathways are identified in the network, where a cluster is a minimally connected subgraph such as those shown in Figure <figr fid="F4">4</figr>. The ability to reconstruct pathways depends on an integration of the three methods; the clusters shown would become fragmented if the methods were used individually. To illustrate this, for each of the 26 <it>E. coli</it> clusters we ask what fraction is obtained by each method alone (Figure <figr fid="F5">5</figr>). Although some of the pathway clusters can be completely recovered by one of the methods - for example, the ribosomal system can be completely accounted for by chromo links - most can only be found by integrating the methods. As an example, of the 10 genes in the citrate cycle cluster of the combined network, 30%, 60% and 30% can be associated using only the fusion, chromo and phylo links, respectively.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Combined networks reconstruct portions of known pathways that cannot be obtained by applying the methods independently</p>
               </caption>
               <text>
                  <p>Combined networks reconstruct portions of known pathways that cannot be obtained by applying the methods independently. The blue spheres correspond to clusters of genes ascribed to a particular functional pathway (such as the ones described in Figure <figr fid="F4">4</figr>). The three-dimensional coordinates of the spheres correspond to the fraction of the clusters (in terms of nodes) that could have been recovered by each of the methods (the axes). The names of some of the pathways are shown. The <it>E. coli</it> genome was used and only clusters of seven or more genes are shown.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-5"/>
            </fig>
            <p>New functional information on particular genes can be predicted on the basis of the combined networks. Naturally, the most fundamental unit of prediction in the network corresponds to a link between two nodes. For example, <it>M. jannaschii</it> genes MJ1313 and MJ1407, whose functions are completely uncharacterized, are linked by fusion, chromo and phylo links. From this superlink we can confidently conclude that, although the actual functions of the genes in question remain elusive, a functional link probably exists between the two genes and they need to be experimentally studied together.</p>
            <p>By superimposing known pathway information onto the combined networks, functional predictions can be made on the basis of a gene's location in the network. For example, Figure <figr fid="F6">6</figr> shows a fraction of the <it>Thermotoga maritima</it> network. TM0885 and TM1367 have no characterized function, but their position in the network suggests that they have a role in energy production. These two genes form a four-gene clique of phylo links with TM0397 (a ferredoxin-like domain) and TM0034 (an uncharacterized protein with a putative Fe-S center from the COGs database). The context of the completely uncharacterized genes (TM0885 and TM1367) within the network further strengthens the inference of their functional association with their neighbors (Figure <figr fid="F6">6</figr>). TM0397 is fusion linked to TM0396, a dehydrogenase with an Fe-S cluster, which is in turn fusion linked to an oxidoreductase protein, TM1640. From this network locus of TM0885 and TM1367, we predict for them a role in energy production.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Ascribing function to uncharacterized genes on the basis of their locus in the network</p>
               </caption>
               <text>
                  <p>Ascribing function to uncharacterized genes on the basis of their locus in the network. A portion of the <it>T. maritima</it> network is shown. Genes of uncharacterized function are highlighted in green and genes with only a partial annotation in yellow. Genes involved in energy production are in blue. The color of the links is the same as in Figure <figr fid="F2">2</figr>. Gray lines indicate links with genes not shown. From the network locus of TM0885 and TM1367, we predict for them a role in energy production.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Structural properties of the combined comparative genomics networks</p>
            </st>
            <p>A striking property that emerges when the networks are combined is the formation of a giant cluster. This occurs for all but three genomes (<it>Mycoplasma pneumoniae, Mycoplasma genitalium,</it> and <it>Ureaplasma urealyticum</it>), perhaps because the number of links in these small genomes is very small. Although the number of nodes in the combined networks ranges from 400 to 1,600 (Figure <figr fid="F7">7a</figr>), depending on the genome, the percentage of nodes contained in a genome's largest cluster is relatively invariant, ranging from 80 to 90% (Figure <figr fid="F7">7b</figr>). These observations suggest that the properties of the combined networks have stabilized, and their general characteristics, as described below, should be relatively invariant against further increases in the number of functional links discovered. We note that random networks generated using the same number of nodes and edges as the chromo, fusion or combined networks also form giant clusters, as is theoretically expected. In contrast to the very large clusters we find in chromo and fusion links, the largest cluster of phylogenetic links contains 19 ortholog families. This is not surprising because, by definition, all genes in a phylo cluster must have the same phylogenetic profile. Thus, the probability of finding a cluster with a very large number of genes is low.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Network properties</p>
               </caption>
               <text>
                  <p>Network properties. <b>(a)</b> Basic characteristics. For each of the 43 genomes the number of nodes and edges of the four different networks based upon the three methods - chromosome proximity (red), domain fusion (blue), phylogenetic profiling (yellow) - and the combined networks (black) are shown. <b>(b)</b> Giant clusters. The networks are characterized by a giant cluster that relates a dominant fraction of the nodes in the network. In the combined networks of most of the genomes, the giant cluster accounts for 80-90% of the nodes, on average. <b>(c)</b> Universality. The characteristic path distance (global property) and clustering coefficient (local property) of the giant cluster of each network is mapped. Note that networks of the same method tend to cluster together. <b>(d)</b> Degree distributions. The histogram of the number of edges per node is shown on a log-log scale for each network type for <it>Pseudomonas aeruginosa</it>; similar distributions are observed for the other organisms. As in Figure <figr fid="F3">3</figr>, the circles denote values from the observed distributions while the triangles denote the degree distributions of the <it>de novo</it> random networks (see Materials and methods) with the same number of nodes and edges.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-7"/>
            </fig>
            <p>In general, many paths of different lengths connect each pair of nodes in the giant cluster. A useful measure of the global characteristics of the cluster is the minimum path length between each pair of nodes. It seems intuitively plausible that such minimum paths are the most biologically relevant of all paths connecting a pair. For each genome, the set of shortest path lengths has a Gaussian distribution. Its average is referred to as the characteristic path length. The characteristic path length averaged over the 40 genomes that have a giant cluster is 7, and the standard deviation about this average is 3. In other words, on average, seven comparative genomic links separate any two genes in a giant cluster. This is in contrast to a characteristic path distance of 5 &#177; 1 for random networks simulated with the same number of nodes and edges. Thus, the characteristic path distances of the combined networks are slightly larger than those of random networks but much smaller than the characteristic path distance of uniform lattice networks.</p>
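            <p>As an illustration only, the characteristic path length of a connected cluster can be computed with a breadth-first search from every node. The sketch below is not the code used in this study; the function names and the adjacency-dictionary representation are assumptions made for the example, and the graph is assumed to be connected (as a giant cluster is).</p>

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes.

    Assumes adj describes one connected, undirected cluster
    (hypothetical input format: node name mapped to a set of neighbours).
    """
    n = len(adj)
    total = 0
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        total += sum(dist[v] for v in adj if v != u)
    return total / (n * (n - 1))


# Toy chain of four nodes: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(characteristic_path_length(chain))
```

            <p>Averaging over ordered pairs gives the same value as averaging over unordered pairs, because each pair is simply counted twice in both the numerator and the denominator.</p>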
            <p>An example of a minimum path is shown in Figure <figr fid="F8">8</figr>. The <it>E. coli</it> gene <it>tyrA</it> of the phenylalanine, tyrosine and tryptophan biosynthesis pathway is separated from the <it>aspS</it> gene of the aminoacyl-tRNA biosynthesis and alanine and aspartate metabolism pathways by five links. This shortest path proceeds by way of four histidine metabolism genes. Each link along this path relates genes with a common functional pathway. However, as many genes are mapped to multiple functional pathways, paths in the network typically traverse many pathways. Snel <it>et al.</it> have proposed that such 'linker' genes be used to mark the boundaries between functional modules of genes in such networks [<abbr bid="B30">30</abbr>].</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Global path through the networks</p>
               </caption>
               <text>
                  <p>Global path through the networks. The nodes (genes) along this particular path of five links in the <it>E. coli</it> network are shown as circles. The symbols associated with each node represent the functional pathways in which the gene is annotated. Moon, phenylalanine, tyrosine and tryptophan biosynthesis; diamond, histidine metabolism; exclamation mark, phenylalanine metabolism; X, tyrosine metabolism; star, aminoacyl-tRNA biosynthesis; check mark, alanine and aspartate metabolism.</p>
               </text>
               <graphic file="gb-2002-3-11-research0064-8"/>
            </fig>
            <p>Although global properties appear to be nearly random, local properties are not. In particular we expect local properties to be structured as each biological pathway that subserves a particular function invariably has several members, and the relationships between these members are likely to emerge through the comparative genomics methods applied here. One descriptive characteristic of a local environment is the clustering coefficient: the average probability that two genes linked to a common gene are also linked to each other [<abbr bid="B2">2</abbr>]. This number is 1 for systems that are fully transitive (for example, networks constructed using phylogenetic profiling as applied here; see Figure <figr fid="F1">1</figr>) and becomes very small for random networks.</p>
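            <p>The clustering coefficient described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and input format are hypothetical, and nodes with fewer than two neighbours are assigned a coefficient of zero, which is one common convention and an assumption here.</p>

```python
from itertools import combinations


def clustering_coefficient(adj):
    """Average, over nodes, of the fraction of neighbour pairs that are linked.

    adj maps each node to the set of its neighbours (undirected graph).
    Nodes with fewer than two neighbours contribute 0.0 (assumed convention).
    """
    coefficients = []
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k >= 2:
            linked_pairs = sum(
                1 for a, b in combinations(neighbours, 2) if b in adj[a]
            )
            coefficients.append(linked_pairs / (k * (k - 1) / 2))
        else:
            coefficients.append(0.0)
    return sum(coefficients) / len(coefficients)


# A fully transitive toy graph (a triangle) gives a coefficient of 1.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(clustering_coefficient(triangle))
```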
            <p>Not surprisingly, the combined networks for all 43 genomes show a significant amount of local clustering - or 'cliquishness' - when compared to random graphs. The chromosomal proximity and fusion networks have similar clustering coefficients of 0.24 and 0.25, respectively, as calculated for the giant component of the networks. When the networks are shuffled (see Materials and methods), the clustering coefficients of the resulting random graphs are 0.02 and 0.06 for the chromo and fusion networks, respectively. The clustering coefficient increases to 0.36 for the combined network (0.02 for the shuffled networks), reflecting the high coefficient of the fully transitive phylogenetic graphs. The combination of high clustering coefficients and random-like characteristic path distance places these biological networks among the well studied class of so-called 'small-world' networks, to which social nets often conform [<abbr bid="B2">2</abbr>,<abbr bid="B31">31</abbr>].</p>
            <p>Quantitative measures of the local (clustering coefficient) and global (characteristic path distance) properties of networks allow us to analyze the similarities and differences among all of the networks. Figure <figr fid="F7">7c</figr> shows the properties of the network mapped onto a two-dimensional space defined by the characteristic path distance and clustering coefficient. We find that networks of the same type (phylo, chromo, fusion or combined) cluster together, demonstrating the universality of network structures of each method. The combined networks have a characteristic path that is roughly the average of the fusion and chromo networks and a clustering coefficient that is greater than both.</p>
            <p>Furthermore, the networks appear to be scale-free in terms of the number of links per node (degree): a few nodes have many connections while most have few connections. As shown in Figure <figr fid="F7">7d</figr>, the distribution of links per node for the chromo, fusion and combined networks differs markedly from those of random graphs with the same number of nodes and edges. While the power law distribution (linear regression on a log-log scale) holds well for the fusion and phylo graphs, the number of nodes of degree 1 in the chromo graph is smaller than would be expected from a power law and is, indeed, not significantly larger than the number of nodes of degree 2. The explanation lies in the operon organization of genes on bacterial and archaeal chromosomes: a gene in a conserved cluster typically has two chromo links, one to the neighbor on each side (see also chromo links in Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>).</p>
            <p>We stress that the comparative genomic links do not necessarily correspond to direct protein-protein interactions, and thus are not expected to be detected by yeast two-hybrid methods. Links made <it>in silico</it> have been shown to be indicative of broader functional relationships [<abbr bid="B19">19</abbr>,<abbr bid="B22">22</abbr>]. In a recent study, Rain <it>et al.</it> identified over 1,200 interactions in a high-throughput application of the yeast two-hybrid method, using approximately 15% of the <it>H. pylori</it> genes as 'baits' [<abbr bid="B32">32</abbr>]. Of the 1,200 interacting pairs detected, only 17 correspond to links also found in our comparative genomic network. We note that low overlap between sets of links is observed even between high-throughput datasets [<abbr bid="B33">33</abbr>,<abbr bid="B34">34</abbr>] and may be explained by a lack of saturation of the complete functional relationship space, a high false-positive rate, and/or a bias towards a certain type of link [<abbr bid="B35">35</abbr>].</p>
            <p>Uncovering functional links between genes is a major step towards deducing the function of individual genes. Here we describe the properties of networks generated by the combination of three comparative genomics methods for 43 microbial genomes representing the three domains of life. We find that a giant 'small-world' cluster consistently includes 80% to 90% of the nodes in these networks, so that the average minimum path between any two genes is small, but local cliquishness is frequent relative to random networks. This structure for a society of genes, along with the observation that local order corresponds to genes of the same functional pathways, supports the notion that the network of relationships among a cell's genes is a set of highly cliquish functional systems interlinked by genes that are common to multiple pathways.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Identification of comparative genomics links</p>
            </st>
            <p>We used three previously published methods for detecting functional links between proteins by comparative genomics. In this section we describe the implementation of these methods. As a pair of proteins can be coded as distinct genes in some genomes but fused as a multidomain protein in others (the basis for the fusion method), a comparative genomics study must adopt protein domains (instead of whole proteins) as the unit of analysis. Thus, multidomained proteins are split into domains (as they are at present in the COGs database, which clusters orthologous protein domains that are present in three or more lineages) and these, in turn, form the units of the comparative genomics links generated by the three methods. All the links are available in the Predictome database [<abbr bid="B36">36</abbr>] and as additional data files (see Additional data files). These methods all depend upon detecting orthologs across genomes and, consequently, each node in any network corresponds to a cluster of orthologous groups [<abbr bid="B12">12</abbr>,<abbr bid="B37">37</abbr>] - a COG - with a representative sequence in that genome. Thus, the term 'gene' which, for brevity, is used throughout as indicative of a single node in the network, may in some cases be represented by multiple nodes.</p>
            <sec>
               <st>
                  <p>Domain fusion links</p>
               </st>
               <p>Fusion links between protein domains were detected by a BLASTP [<abbr bid="B38">38</abbr>] search of the 43 genomes included in the COGs database against nrdb90 [<abbr bid="B39">39</abbr>], a non-redundant protein database. We deemed two protein domains - assigned to different COGs - to be fusion linked if each had an alignment of at least 80 residues to the same nrdb90 protein with a maximum expectation (E) value of 10<sup>-10</sup> and with a maximum overlap of the two alignments of 20 residues. We then extrapolated the link between the domains to a link between their respective ortholog families (COGs) by applying the link to the common representatives of the two sets; that is, each member of the first COG is linked to those members of its partner COG that are of the same genome. This final step assigns links to domains that were undetected as the result of the high-stringency cutoff, and it therefore greatly increases the number of links. The COGs database is threshold-free and is instead built upon clusters of bidirectional best intergenomic matches [<abbr bid="B12">12</abbr>]. Thus, the availability of this reliable database of ortholog families allows for the generalization of a fusion link from the level of domains to domain families while minimizing the possibility of false-positive alignments between the domains and their fusion.</p>
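               <p>The pairwise criteria above (alignment length, E value and overlap) can be expressed as a simple predicate. The sketch below is illustrative only: the <it>Hit</it> record, its field names and the identifiers in the example are hypothetical, and alignment length is taken here as the span of the alignment on the nrdb90 protein, which is an assumption.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a COG domain to a database protein (hypothetical record)."""
    cog: str        # ortholog family of the query domain
    target: str     # identifier of the matched database protein
    start: int      # first aligned residue on the target (1-based, inclusive)
    end: int        # last aligned residue on the target (inclusive)
    evalue: float   # expectation value of the alignment

MIN_LENGTH = 80      # residues, from the text
MAX_EVALUE = 1e-10   # from the text
MAX_OVERLAP = 20     # residues, from the text


def passes_cutoffs(hit):
    """True if the alignment is long enough and significant enough."""
    return hit.end - hit.start + 1 >= MIN_LENGTH and MAX_EVALUE >= hit.evalue


def overlap(a, b):
    """Number of target residues covered by both alignments."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def fusion_linked(a, b):
    """True if two domains of different COGs align to the same protein
    within the length, E-value and overlap limits."""
    return (
        a.cog != b.cog
        and a.target == b.target
        and passes_cutoffs(a)
        and passes_cutoffs(b)
        and MAX_OVERLAP >= overlap(a, b)
    )


first = Hit("COG_X", "protein_1", 1, 120, 1e-30)
second = Hit("COG_Y", "protein_1", 110, 250, 1e-20)
print(fusion_linked(first, second))
```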
            </sec>
            <sec>
               <st>
                  <p>Chromosomal proximity links</p>
               </st>
               <p>Chromo links were identified as recently described [<abbr bid="B19">19</abbr>]. Genes A and B in genome X are linked by a chromosomal proximity link if they satisfy either of the following two conditions. In a direct link, A and B are proximate (within 300 bp and transcribed in the same direction [<abbr bid="B16">16</abbr>]) in X and their orthologs are also proximate in at least two other genomes corresponding to different phyletic groups (as defined by COGs). In an inferred link, A and B are not proximate in X but their orthologs are proximate in at least three other genomes corresponding to different phyletic groups.</p>
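               <p>A minimal sketch of the two conditions follows. It assumes that the set of genomes in which the two ortholog families are proximate has already been determined, and it interprets 'different phyletic groups' as a count of distinct groups among the other genomes; both the function name and this interpretation are assumptions made for illustration.</p>

```python
def chromo_link_type(genome, proximate_in, phyletic_group):
    """Classify the chromo link between two genes of one genome.

    genome          -- the genome X being analysed
    proximate_in    -- set of genomes in which the two orthologs are proximate
    phyletic_group  -- mapping from genome name to its phyletic group
    Returns "direct", "inferred" or None (no link).
    """
    # Distinct phyletic groups among the OTHER genomes showing proximity.
    other_groups = {phyletic_group[g] for g in proximate_in if g != genome}
    if genome in proximate_in and len(other_groups) >= 2:
        return "direct"
    if genome not in proximate_in and len(other_groups) >= 3:
        return "inferred"
    return None


# Toy data with hypothetical genome and group names.
groups = {"X": "g0", "A": "g1", "B": "g2", "C": "g3", "D": "g1"}
print(chromo_link_type("X", {"X", "A", "B"}, groups))
print(chromo_link_type("X", {"A", "B", "C"}, groups))
```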
            </sec>
            <sec>
               <st>
                  <p>Phylogenetic profiling links</p>
               </st>
               <p>The 43 organisms are organized into 26 phyletic groups as defined by the COGs database. Two ortholog groups (COGs) are phylo linked if their phyletic distributions are identical (zero bit difference in [<abbr bid="B14">14</abbr>]); that is, their 26-bit profiles of presence and absence are the same. Finally, two domains are phylo linked if their respective COGs are linked. The reliability of a phylo link is inversely related to how frequently the linked pattern is observed (data not shown). Thus, lowering the maximum number of ortholog families allowed to share the same phyletic pattern can increase the quality of the links. However, a lower maximum, by definition, reduces the number of links. To strike a balance between the number of links and their functional correlation, a threshold of 29 was used. In other words, if a particular phyletic pattern corresponds to more than 29 orthologous groups, the pattern is considered uninformative and generates no phylo links.</p>
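               <p>The grouping of COGs by identical profile, with the cutoff of 29, might be sketched as below. The function name and the COG identifiers are hypothetical, and the toy profiles have three bits rather than 26 for brevity.</p>

```python
from collections import defaultdict
from itertools import combinations


def phylo_links(profiles, max_group_size=29):
    """Link every pair of COGs that share an identical presence/absence profile.

    profiles maps a COG identifier to a sequence of 0/1 values, one per
    phyletic group. Patterns shared by more than max_group_size COGs are
    treated as uninformative and produce no links.
    """
    groups = defaultdict(list)
    for cog, profile in profiles.items():
        groups[tuple(profile)].append(cog)

    links = set()
    for members in groups.values():
        if len(members) > max_group_size:
            continue  # pattern too common to be informative
        for a, b in combinations(sorted(members), 2):
            links.add((a, b))
    return links


toy_profiles = {
    "COG_A": (1, 0, 1),
    "COG_B": (1, 0, 1),
    "COG_C": (0, 1, 1),
    "COG_D": (1, 0, 1),
}
print(sorted(phylo_links(toy_profiles)))
```

               <p>Because every pair within a group is linked, each group forms a clique, which is why the phylo networks are fully transitive.</p>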
            </sec>
         </sec>
         <sec>
            <st>
               <p>Functional annotation of the networks</p>
            </st>
            <sec>
               <st>
                  <p>Functional correlation of the sets</p>
               </st>
               <p>To determine the functional correlation of a given network, observed (phylo, chromo, fusion or combined) or shuffled, with the classification of a given database (COGs or KEGG), we begin by collapsing the network to a list of links and selecting those links in which both members are annotated (broad functional category in COGs [<abbr bid="B12">12</abbr>] or pathway in KEGG [<abbr bid="B40">40</abbr>]). The correlation of the network with the functional annotation is taken as the percentage of links in this set whose two members belong to the same category or pathway.</p>
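               <p>The calculation can be sketched as follows. The function name and data layout are hypothetical; because a gene may be assigned to several pathways, the sketch treats each annotation as a set and counts a link as concordant when the two sets share at least one entry, which is an assumption about how multiple annotations are handled.</p>

```python
def functional_correlation(links, annotation):
    """Percentage of fully annotated links whose members share a category.

    links       -- iterable of (gene_a, gene_b) pairs
    annotation  -- mapping from gene to a set of categories or pathways
    Returns None when no link has both members annotated.
    """
    annotated = [(a, b) for a, b in links if a in annotation and b in annotation]
    if not annotated:
        return None
    concordant = sum(
        1 for a, b in annotated if not annotation[a].isdisjoint(annotation[b])
    )
    return 100.0 * concordant / len(annotated)


toy_links = [("g1", "g2"), ("g1", "g3"), ("g2", "g4")]
toy_annotation = {
    "g1": {"pathway_A"},
    "g2": {"pathway_A", "pathway_B"},
    "g3": {"pathway_C"},
}  # g4 is unannotated, so the link g2-g4 is ignored
print(functional_correlation(toy_links, toy_annotation))
```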
            </sec>
            <sec>
               <st>
                  <p>Functional pathways</p>
               </st>
               <p>All pathway information used to investigate local network structures was derived from the KEGG database [<abbr bid="B40">40</abbr>].</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Random networks</p>
            </st>
            <p>Two different methods for generating random networks were used. Both produce random networks with the same number of nodes and edges as the observed networks with which they are compared.</p>
            <sec>
               <st>
                  <p>Shuffling of existing network</p>
               </st>
               <p>To preserve the unique degree distribution of the observed network in its random counterpart, we shuffled the edges of the observed network according to the following algorithm [<abbr bid="B41">41</abbr>]. We begin with the observed network and repeatedly (10,000 times) randomly choose two links in the observed network, x<sub>1</sub>&lt;>y<sub>1</sub> and x<sub>2</sub>&lt;>y<sub>2</sub>, and rewire them to: x<sub>1</sub>&lt;>y<sub>2</sub> and x<sub>2</sub>&lt;>y<sub>1</sub>.</p>
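               <p>A minimal sketch of this rewiring procedure is given below. The text does not state how swaps that would create a self-link or duplicate an existing link were handled; the sketch simply skips them, which is an assumption, as are the function name and the edge-list representation. The input is assumed to contain no duplicate links.</p>

```python
import random


def shuffle_network(edges, n_swaps=10000, seed=None):
    """Degree-preserving shuffle: repeatedly pick two links (x1, y1) and
    (x2, y2) and rewire them to (x1, y2) and (x2, y1)."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        x1, y1 = edge_list[i]
        x2, y2 = edge_list[j]
        new1 = frozenset((x1, y2))
        new2 = frozenset((x2, y1))
        # Assumed rule: skip swaps that would create self-links or duplicates.
        if len(new1) != 2 or len(new2) != 2 or new1 == new2:
            continue
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set.discard(frozenset((x1, y1)))
        edge_set.discard(frozenset((x2, y2)))
        edge_set.add(new1)
        edge_set.add(new2)
        edge_list[i] = (x1, y2)
        edge_list[j] = (x2, y1)
    return edge_list


observed = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("a", "f")]
shuffled = shuffle_network(observed, n_swaps=1000, seed=1)
print(shuffled)
```

               <p>Each accepted swap removes one link from and adds one link to each of the four nodes involved, so every node keeps its original degree.</p>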
            </sec>
            <sec>
               <st>
                  <p>De novo synthesis of random network</p>
               </st>
               <p>When analyzing the degree distribution of the observed networks, random networks are required to have the same number of nodes and edges as the observed networks but have a randomly generated degree distribution. We begin with <it>N</it> unlinked nodes and proceed by randomly choosing two nodes and adding an edge between them unless it already exists. The simulation ends when there are <it>E</it> edges in the random graph.</p>
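               <p>This construction can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that the two randomly chosen nodes are always distinct (no self-links), which the text does not state explicitly.</p>

```python
import random


def random_network(n_nodes, n_edges, seed=None):
    """Start from n_nodes unlinked nodes and add random edges until
    n_edges distinct edges exist."""
    max_edges = n_nodes * (n_nodes - 1) // 2
    if n_edges > max_edges:
        raise ValueError("more edges requested than a simple graph can hold")
    rng = random.Random(seed)
    edges = set()
    while n_edges > len(edges):
        a, b = rng.sample(range(n_nodes), 2)  # two distinct nodes (assumed)
        # Store each undirected edge once; re-adding an existing edge is a no-op.
        edges.add((min(a, b), max(a, b)))
    return edges


network = random_network(n_nodes=10, n_edges=15, seed=0)
print(len(network))
```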
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>All the network files are available as <supplr sid="s1">additional data files</supplr>, along with a <supplr sid="s2">text file</supplr> description of the COG clusters (COGs), and <supplr sid="s3">instructions</supplr> for interpreting the network files. The networks are organized according to type and genome. The 172 network files correspond to four networks (chromo, phylo, fusion and composite) for each of 43 genomes. Each network file lists the edges of the network. In the composite files there is an additional column to specify the nature of the link: 1, fusion; 2, chromo; 3, phylo; 4, chromo + fusion; 5, phylo + fusion; 6, chromo + phylo; 7, fusion + chromo + phylo.</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>All network files</p>
            </caption>
            <text>
               <p>All network files</p>
            </text>
            <file name="gb-2002-3-11-research0064-s1.zip">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>A description of the COG clusters (COGs)</p>
            </caption>
            <text>
               <p>A description of the COG clusters (COGs)</p>
            </text>
            <file name="gb-2002-3-11-research0064-s2.txt">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Instructions for interpreting the network files</p>
            </caption>
            <text>
               <p>Instructions for interpreting the network files</p>
            </text>
            <file name="gb-2002-3-11-research0064-s3.txt">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Adnan Derti, Ron Ophir, Carlos J. Camacho, Daniel Segr&#233; and Todd Silverstein for critical readings and helpful discussions. We thank Ivy Lee for work on an early stage of this study. This work was funded by a Whitaker Graduate Fellowship and by a Koshland Postdoctoral Fellowship to I.Y.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <aug>
               <au>
                  <snm>Bar-Yam</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Dynamics of Complex Systems.</source>
            <publisher>London: Addison Wesley Longman</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Collective dynamics of 'small-world' networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Watts</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Strogatz</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>393</volume>
            <fpage>440</fpage>
            <lpage>442</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/30918</pubid>
                  <pubid idtype="pmpid" link="fulltext">9623998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Emergence of scaling in random networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>509</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5439.509</pubid>
                  <pubid idtype="pmpid" link="fulltext">10521342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A comprehensive analysis of protein-protein interactions in <it>Saccharomyces cerevisiae.</it></p>
            </title>
            <aug>
               <au>
                  <snm>Uetz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mansfield</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Judson</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lockshon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Narayan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>623</fpage>
            <lpage>627</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35001009</pubid>
                  <pubid idtype="pmpid" link="fulltext">10688190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A comprehensive two-hybrid analysis to explore the yeast protein interactome.</p>
            </title>
            <aug>
               <au>
                  <snm>Ito</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ozawa</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>4569</fpage>
            <lpage>4574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">31875</pubid>
                  <pubid idtype="pmpid" link="fulltext">11283351</pubid>
                  <pubid idtype="doi">10.1073/pnas.061034498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Functional organization of the yeast proteome by systematic analysis of protein complexes.</p>
            </title>
            <aug>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Bosche</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Grandi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marzioch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Michon</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Cruciat</snm>
                  <fnm>CM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>141</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415141a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11805826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Systematic identification of protein complexes in <it>Saccharomyces cerevisiae</it> by mass spectrometry.</p>
            </title>
            <aug>
               <au>
                  <snm>Ho</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gruhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Heilbut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Millar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Boutilier</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>180</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415180a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11805837</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Who's your neighbor? New computational approaches for functional genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <fpage>609</fpage>
            <lpage>613</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76443</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Exploitation of gene context.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>366</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(00)00098-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">10851194</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Protein function in the post-genomic era.</p>
            </title>
            <aug>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>405</volume>
            <fpage>823</fpage>
            <lpage>826</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35015694</pubid>
                  <pubid idtype="pmpid" link="fulltext">10866208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Computational genetics: finding protein function by nonhomology methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>359</fpage>
            <lpage>365</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(00)00097-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">10851184</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A genomic perspective on protein families.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>278</volume>
            <fpage>631</fpage>
            <lpage>637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.278.5338.631</pubid>
                  <pubid idtype="pmpid" link="fulltext">9381173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Constructing multigenome views of whole microbial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ragan</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Microb Comp Genomics</source>
            <pubdate>1998</pubdate>
            <volume>3</volume>
            <fpage>177</fpage>
            <lpage>192</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9775388</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4285</fpage>
            <lpage>4288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16324</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200254</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Selfish operons and speciation by gene transfer.</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>355</fpage>
            <lpage>359</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(97)01110-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">9294891</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The use of gene clusters to infer functional coupling.</p>
            </title>
            <aug>
               <au>
                  <snm>Overbeek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fonstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>D'Souza</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pusch</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Maltsev</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>2896</fpage>
            <lpage>2901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15866</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077608</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.6.2896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Conservation of gene order: a fingerprint of proteins that physically interact.</p>
            </title>
            <aug>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1998</pubdate>
            <volume>23</volume>
            <fpage>324</fpage>
            <lpage>328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01274-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9787636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping.</p>
            </title>
            <aug>
               <au>
                  <snm>Fujibuchi</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matsuda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>4029</fpage>
            <lpage>4036</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">110780</pubid>
                  <pubid idtype="pmpid" link="fulltext">11024184</pubid>
                  <pubid idtype="doi">10.1093/nar/28.20.4029</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Identifying functional links between genes using conserved chromosomal proximity.</p>
            </title>
            <aug>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Mellor</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>DeLisi</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>176</fpage>
            <lpage>179</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(01)02621-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Detecting protein function and protein-protein interactions from genome sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <fpage>751</fpage>
            <lpage>753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.285.5428.751</pubid>
                  <pubid idtype="pmpid" link="fulltext">10427000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Protein interaction maps for complete genomes based on gene fusion events.</p>
            </title>
            <aug>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Iliopoulos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>86</fpage>
            <lpage>90</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10573422</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Derti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>DeLisi</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>7940</fpage>
            <lpage>7945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">35447</pubid>
                  <pubid idtype="pmpid" link="fulltext">11438739</pubid>
                  <pubid idtype="doi">10.1073/pnas.141236298</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions.</p>
            </title>
            <aug>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>research0034.1</fpage>
            <lpage>research0034.7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2001-2-9-research0034</pubid>
                  <pubid idtype="pmpid">11820254</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Predictome: a database of putative functional links between proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Mellor</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Yanai</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Clodfelter</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Mintseris</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DeLisi</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>306</fpage>
            <lpage>309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99135</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752322</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Scale-free behavior in protein domain networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Wuchty</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1694</fpage>
            <lpage>1702</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11504849</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Domain combinations in archaeal, eubacterial and eukaryotic proteomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Apic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>310</volume>
            <fpage>311</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4776</pubid>
                  <pubid idtype="pmpid" link="fulltext">11428892</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>An insight into domain combinations.</p>
            </title>
            <aug>
               <au>
                  <snm>Apic</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>S83</fpage>
            <lpage>S89</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.1.83</pubid>
                  <pubid idtype="pmpid" link="fulltext">11472996</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Predicting protein function by genomic context: quantitative evaluation and qualitative inferences.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1204</fpage>
            <lpage>1210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">117020</pubid>
                  <pubid idtype="pmpid" link="fulltext">10958638</pubid>
                  <pubid idtype="doi">10.1101/gr.10.8.1204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A combined algorithm for genome-wide prediction of protein function.</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>83</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/47048</pubid>
                  <pubid idtype="pmpid" link="fulltext">10573421</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The identification of functional modules from the genomic association of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>5890</fpage>
            <lpage>5895</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">122872</pubid>
                  <pubid idtype="pmpid" link="fulltext">11983890</pubid>
                  <pubid idtype="doi">10.1073/pnas.092632599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Classes of small-world networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Amaral</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Scala</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barthelemy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stanley</snm>
                  <fnm>HE</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>11149</fpage>
            <lpage>11152</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">17168</pubid>
                  <pubid idtype="pmpid" link="fulltext">11005838</pubid>
                  <pubid idtype="doi">10.1073/pnas.200327197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The protein-protein interaction map of <it>Helicobacter pylori</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Rain</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Selig</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Reuse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Battaglia</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Reverdy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lenzen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Petel</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wojcik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schachter</snm>
                  <fnm>V</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>211</fpage>
            <lpage>215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35051615</pubid>
                  <pubid idtype="pmpid" link="fulltext">11196647</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Networking proteins in yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Hazbun</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>4277</fpage>
            <lpage>4278</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33318</pubid>
                  <pubid idtype="pmpid" link="fulltext">11296274</pubid>
                  <pubid idtype="doi">10.1073/pnas.091096398</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Towards an understanding of complex protein networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Tucker</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Gera</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Uetz</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Cell Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>102</fpage>
            <lpage>106</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0962-8924(00)01902-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11306254</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Comparative assessment of large-scale data sets of protein-protein interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cornell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>399</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature750</pubid>
                  <pubid idtype="pmpid" link="fulltext">12000970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Predictome</p>
            </title>
            <url>http://predictome.bu.edu</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The COG database: new developments in phylogenetic classification of proteins from complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Garkavtsev</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Shankavaram</snm>
                  <fnm>UT</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Kiryutin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>22</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29819</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125040</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.22</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Removing near-neighbour redundancy from large protein sequence collections.</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>423</fpage>
            <lpage>429</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.5.423</pubid>
                  <pubid idtype="pmpid" link="fulltext">9682055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>KEGG: Kyoto encyclopedia of genes and genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>27</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102409</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592173</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.27</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Random graphs with arbitrary degree distributions and their applications.</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Strogatz</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Watts</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics</source>
            <pubdate>2001</pubdate>
            <volume>64</volume>
            <fpage>026118</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1103/PhysRevE.64.026118</pubid>
                  <pubid idtype="pmpid" link="fulltext">11497662</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Pajek: package for large network analysis</p>
            </title>
            <url>http://vlado.fmf.uni-lj.si/pub/networks/pajek/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
