<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-12-r271</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Consistent dissection of the protein interaction network by combining global and local metrics</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Wang</snm>
               <fnm>Chunlin</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>wangcl@stanford.edu</email>
            </au>
            <au id="A2">
               <snm>Ding</snm>
               <fnm>Chris</fnm>
               <insr iid="I3"/>
               <email>chqding@lbl.gov</email>
            </au>
            <au id="A3">
               <snm>Yang</snm>
               <fnm>Qiaofeng</fnm>
               <insr iid="I1"/>
               <email>qyang@lbl.gov</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Holbrook</snm>
               <mi>R</mi>
               <fnm>Stephen</fnm>
               <insr iid="I1"/>
               <email>SRHolbrook@lbl.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I2">
               <p>Division of Infectious Diseases, School of Medicine, Stanford University, Stanford, CA 94305, USA</p>
            </ins>
            <ins id="I3">
               <p>Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>12</issue>
         <fpage>R271</fpage>
         <url>http://genomebiology.com/2007/8/12/R271</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18154653</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-12-r271</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>22</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>14</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>21</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>21</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Wang et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Identifying protein interaction modules</p>
      </shorttitle>
      <shortabs>
         <p>A new network decomposition method is proposed that uses both a global metric and a local metric to identify protein interaction modules in the protein interaction network. </p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>We propose a new network decomposition method to systematically identify protein interaction modules in the protein interaction network. Our method incorporates both a global metric and a local metric for balance and consistency. We have compared the performance of our method with several earlier approaches on both simulated and real datasets using different criteria, and show that our method is more robust to network alterations and more effective at discovering functional protein modules.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Protein complexes are building blocks of cellular components and pathways. A comprehensive understanding of a biological system requires knowledge about how protein complexes are assembled, regulated, and organized to form cellular components and perform cellular functions. The emergence of a variety of genomic and proteomic techniques to systematically obtain such information has generated an enormous amount of data <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. However, interpretation and analysis of such data in terms of biological function has not kept pace with data acquisition, mainly due to the complexity of the problem and the limitation of current techniques to handle the data.</p>
         <p>In this paper, we address the issue of constructing protein interaction modules from the protein interaction data. Highly connected protein modules are mostly found to be protein complexes performing a specific biological function. The concept of protein interaction modules as fundamental functional units was first outlined by Hartwell <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Protein interaction modules are composed of a variable number of proteins, with discrete functions arising from their individual constituents and their synergistic interactions. A multi-protein complex, such as the ribosome, is one common form of interaction module; other examples of protein functional modules include proteins working collectively in a pathway, such as signal transduction, that do not necessarily form a tightly associated, stable protein complex.</p>
         <p>To detect protein interaction modules from protein interaction data, we use a graph theory approach. Protein interaction networks are routinely represented as graphs, with proteins as nodes and interactions as edges. In a graph representation of a protein interaction network, a functional unit, or a group of functionally related proteins, is tightly connected as a community, while proteins from different functional units are more loosely connected. In the past few years, new algorithms have been developed to extract communities from a generic network. Girvan and Newman <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> proposed a decomposition algorithm (GN algorithm) to analyze community structure in networks. Their algorithm iteratively removes the edge with the highest betweenness, defined as the number of shortest paths between all pairs of nodes in the network that run through the edge. This contrasts with traditional hierarchical clustering, where closely connected nodes are iteratively joined together into larger and larger communities. In a different approach, Radicchi <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> replaced the edge betweenness metric with an edge clustering coefficient: the number of triangles to which a given edge belongs, divided by the number of triangles that might potentially include it, given the degrees of the adjacent nodes. The edge clustering coefficient is a local, topology-based metric; at each step, the algorithm of Radicchi <it>et al</it>. (the 'edge clustering coefficient' algorithm, ECC algorithm for short) removes the edge with the lowest clustering coefficient.</p>
         <p>When applied to a large network, these two algorithms give substantially different results. The reason is that an individual edge with larger betweenness does not necessarily have a lower clustering coefficient, although on average it will. Ultimately, the global metric in the GN algorithm behaves differently from the local metric in the ECC algorithm. In this paper, we propose to resolve this conflict by combining the global and local metrics to form a consistent and robust algorithm. We make three additional significant contributions: a new metric (commonality) that takes into account the effects of random edge distributions; a new definition of a protein interaction module; and a novel filtering procedure to remove false-positive interactions based on a random graph model analysis. We demonstrate that our new algorithm is more effective and robust in terms of discovering protein interaction modules in protein interaction networks than either the global or local algorithm by application to the large yeast protein interaction network.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The principal result of this paper is the development of a new algorithm for extracting protein interaction modules from a protein interaction network. We first present the new methodology and then compare the performance of different algorithms, including the MCL algorithm <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, on simulated networks where protein complexes are known. The MCL algorithm is a fast and scalable unsupervised clustering algorithm for graphs based on simulation of stochastic flow in graphs <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and was found to be the best performing algorithm overall in the study of Brohee and van Helden <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Note that our proposed new algorithm, the GN algorithm, and the ECC algorithm are divisive partitioning-type algorithms, while the MCL algorithm is a non-partitioning algorithm. Neither the modularity <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> measure nor the productive cuts used in the following sections is applicable to the MCL algorithm. Second, we compare the results of different algorithms on a small protein interaction network where protein complexes are largely known. Lastly, we apply our new algorithm, the GN algorithm, the ECC algorithm, and the MCL algorithm, whenever applicable, to two large yeast protein interaction networks and evaluate the performance of each algorithm based on the value of modularity <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, overlap with Munich Information Center for Protein Sequences (MIPS) complexes <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and Gene Ontology (GO) term enrichment of each cluster.</p>
         <sec>
            <st>
               <p>A new commonality metric</p>
            </st>
            <p>Consider two proteins A and B. Let <it>k </it>be the number of common interacting partners (or neighbors) of A and B. If A and B belong to the same protein complex, they likely share many common interaction partners, that is, have a large <it>k</it>. On the other hand, if A and B do not belong to the same protein complex, they likely have few common interaction partners, that is, have a small <it>k</it>. However, randomness also enters the equation. Let n and m be the total numbers of interacting partners of proteins A and B, respectively (n and m are also called the degrees of A and B). A standard model of a protein interaction type network is the fixed-degree-sequence random graph <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, where the interactions follow the hypergeometric distribution. From this model, the average number of common interacting partners between proteins A and B in a random graph is given by:</p>
            <p>
               <display-formula>
                  <graphic file="gb-2007-8-12-r271-i1.gif"/>
               </display-formula>
            </p>
            <p>where N is the total number of nodes. To offset the random effect by which a large k can result simply from large n and m, we propose a new commonality index:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i2">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8901;</m:mo>
                                       <m:mi>m</m:mi>
                                    </m:mrow>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaajuaGdaWcaaqaaiaadUgacqGHRaWkcaaIXaaabaWaaOaaaeaacaWGUbGaeyyXICTaamyBaaqabaaaaaaa@37AD@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Dividing by the square root of <it>n</it>&#183;<it>m </it>makes the index scale invariant. We note that in <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, the authors define a similar metric as:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i3">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>k</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>min</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>,</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaajuaGdaWcaaqaaiaadUgacqGHRaWkcaaIXaaabaGaciyBaiaacMgacaGGUbGaaiikaiaad6gacqGHsislcaaIXaGaaiilaiaad2gacqGHsislcaaIXaGaaiykaaaakiaac6caaaa@3E3A@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
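            <p>As an illustration only, the commonality index defined above can be computed directly from an adjacency structure. The following is a minimal Python sketch, not the authors' implementation; the function names <it>build_adjacency</it> and <it>commonality</it> and the four-protein toy network are hypothetical and introduced here purely for the example.</p>

```python
import math

def build_adjacency(edges):
    """Map each protein to the set of its interaction partners."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def commonality(adj, a, b):
    """Commonality index (k + 1) / sqrt(n * m) for the edge a-b."""
    k = len(adj[a].intersection(adj[b]))  # number of common partners
    n, m = len(adj[a]), len(adj[b])       # degrees of a and b
    return (k + 1) / math.sqrt(n * m)

# Hypothetical toy network: a triangle A-B-C plus a pendant protein D.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")])
inside = commonality(adj, "A", "B")   # edge inside the triangle
pendant = commonality(adj, "A", "D")  # edge to the pendant protein
```

            <p>In this toy network the triangle edge A-B scores 2/&#8730;6 (about 0.82), higher than the pendant edge A-D at 1/&#8730;3 (about 0.58), which matches the intuition that edges inside a tightly connected group have higher commonality.</p>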
         </sec>
         <sec>
            <st>
               <p>BCD algorithm</p>
            </st>
            <p>Our goal is to discover protein interaction modules. Intuitively, when two protein functional modules are sparsely connected, edges between them should have higher edge-betweenness values and lower commonality, whereas edges within a module should have high commonality and low edge-betweenness. Thus, for sparsely connected functional modules, edge-betweenness correlates strongly, and inversely, with edge commonality. When protein functional modules overlap, the correlation between the global metric and the local metric becomes less clear. For this reason, we combine these two metrics to build a more consistent and robust metric. The new BCD (Betweenness-Commonality Decomposition) algorithm is summarized as follows: step 1, calculate the edge commonality (<b>C</b>) for each edge in the network; step 2, calculate the edge-<ul>b</ul>etweenness (<b>B</b>) for each edge in the current subnetwork; step 3, remove the edge with the maximal ratio B/C; and step 4, repeat steps 2 and 3 until no edges remain.</p>
            <p>Like the edge clustering coefficient in the ECC algorithm, the edge commonality is a static property of an edge in the context of the entire network, indicating how strong the affinity is between the two nodes it connects. The edge commonality is calculated only once at the beginning of a decomposition process, while the edge-betweenness is updated each time an edge is removed to achieve the best results <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. This algorithm runs with <it>O</it>(<it>M</it><sup>2</sup><it>N</it>) computational complexity, where M is the number of edges and N is the number of nodes in a network. As a practical matter, we calculate the betweenness using the fast algorithm of Brandes <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, where the edge-betweenness value can be obtained by summing pair-dependencies over all traversals <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, so that we can easily parallelize the computationally costly betweenness calculation.</p>
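            <p>The four steps above can be sketched in a few lines of Python. This is a serial, unoptimized illustration under stated assumptions, not the authors' parallel implementation: the graph is unweighted and undirected with no duplicate edges, node labels are sortable strings, ties in B/C are broken by node label (an arbitrary choice made here), and the names <it>edge_betweenness</it> and <it>bcd</it> are hypothetical. Betweenness is accumulated from breadth-first traversals in the manner of Brandes.</p>

```python
import math
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness by summing pair-dependencies over BFS traversals."""
    bet = {}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                e = frozenset((v, w))
                bet[e] = bet.get(e, 0.0) + c
                delta[v] += c
    # every unordered node pair was visited from both of its endpoints
    return {e: b / 2.0 for e, b in bet.items()}

def bcd(edges):
    """Return edges in the order removed by the B/C decomposition."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    comm = {}  # step 1: commonality, computed once on the full network
    for a, b in edges:
        k = len(adj[a].intersection(adj[b]))
        comm[frozenset((a, b))] = (k + 1) / math.sqrt(len(adj[a]) * len(adj[b]))
    removed = []
    while any(adj.values()):  # step 4: repeat until no edges remain
        bet = edge_betweenness(adj)  # step 2: recompute on current network
        # step 3: remove the edge with maximal B/C (ties broken by label)
        worst = max(bet, key=lambda e: (bet[e] / comm[e], sorted(e)))
        a, b = sorted(worst)
        adj[a].discard(b)
        adj[b].discard(a)
        removed.append((a, b))
    return removed

# Hypothetical example: two triangles joined by the single bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
removed = bcd(edges)
```

            <p>On this example the bridge c-d has the highest betweenness and the lowest commonality, so it is the first edge removed; the removal order then defines the decomposition tree.</p>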
         </sec>
         <sec>
            <st>
               <p>A new definition of protein interaction module</p>
            </st>
            <p>Intuitively, a protein interaction module is a subnetwork in the protein interaction network with more internal interactions than external interactions. A precise definition of the interaction module is not trivial. A number of definitions of community (or protein interaction module in terms of the protein interaction network) have been proposed with different criteria <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B17">17</abbr><abbr bid="B22">22</abbr></abbrgrp>. No clear consensus of module definition exists.</p>
            <p>All three algorithms (BCD, GN, ECC) in this study transform a network into a decomposition tree (Figure <figr fid="F1">1</figr>). In this tree (called a dendrogram in the social sciences), the leaves are the nodes, whereas the branches join nodes or (at a higher level) groups of nodes, thus identifying a hierarchical structure of communities nested within each other. We inspected the tree produced by any one of the three algorithms on a small yeast transcription network with 225 proteins and 1,792 interactions, where known protein interaction modules can be inferred from the annotations of well-studied proteins. Most, if not all, protein complexes appear as tightly grouped subtrees of the decomposition tree, with a uniform structure similar to that of the shaded subtrees in Figure <figr fid="F1">1</figr>. Similar results were seen in much larger networks. Based on these observations, we propose a precise definition of a protein interaction module that uses the decomposition tree structure. We first note that on the decomposition tree, all leaf nodes are single proteins, while non-leaf nodes are collections of proteins. We define a 'special parent' as a non-leaf node with at least one child being a leaf (Figure <figr fid="F1">1</figr>). A protein interaction module is then defined as the nodes of a maximal subtree in which all non-leaf nodes are special parents. Further, when two modules share the same parent, we merge them (Figure <figr fid="F1">1</figr>, subtrees in solid boxes) when the maximal commonality of the edges connecting these two modules is larger than a pre-defined cutoff. Currently, the cutoff is set at 0.1 to avoid merging two modules with very limited connections between them. Results on actual protein interaction networks indicate that proteins within a module as defined above have very similar GO terms and perform similar functions (see Figure <figr fid="F2">2</figr> for examples). The dangling nodes outside modules (in dashed boxes in Figure <figr fid="F1">1</figr>) are simply categorized as singletons.</p>
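            <p>One possible reading of the module definition can be sketched as follows. This is an illustrative Python sketch under several assumptions that are not taken from the paper: the decomposition tree is encoded as nested tuples with strings as leaves, a 'subtree' means a node together with all of its descendants, and the commonality-based merging of sibling modules is omitted. The function names are hypothetical.</p>

```python
def is_leaf(t):
    """Leaves are single proteins, encoded here as strings."""
    return isinstance(t, str)

def leaves(t):
    """All proteins below node t."""
    if is_leaf(t):
        return [t]
    out = []
    for child in t:
        out.extend(leaves(child))
    return out

def all_special(t):
    """True if every non-leaf node in t has at least one leaf child."""
    if is_leaf(t):
        return True
    has_leaf_child = any(is_leaf(c) for c in t)
    return has_leaf_child and all(all_special(c) for c in t)

def extract_modules(tree):
    """Split a decomposition tree into modules and singletons."""
    modules, singletons = [], []

    def visit(t):
        if is_leaf(t):
            singletons.append(t)           # dangling node outside any module
        elif all_special(t):
            modules.append(sorted(leaves(t)))  # maximal qualifying subtree
        else:
            for child in t:
                visit(child)

    visit(tree)
    return modules, singletons

# Hypothetical tree: protein x dangles above two candidate modules.
tree = ("x", ((("a", "b"), "c"), ("d", "e")))
modules, singletons = extract_modules(tree)
```

            <p>Because the traversal is top-down and stops at the first node whose whole subtree qualifies, each reported module is maximal. In the example, the node joining the two candidate modules has no leaf child, so the tree splits into the modules {a, b, c} and {d, e}, and x is reported as a singleton.</p>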
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>A sample decomposition tree showing protein interaction modules</p>
               </caption>
               <text>
                  <p>A sample decomposition tree showing protein interaction modules. Special parents are marked with triangles. Modules as defined in the text are shown as shaded subtrees. Two modules with the same parent are merged if the edge commonality between the two modules is above a threshold (shown as boxes). Dashed lines outline singletons.</p>
               </text>
               <graphic file="gb-2007-8-12-r271-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>A yeast transcriptional sub-network (upper) and the decomposition tree constructed by the BCD algorithm (lower)</p>
               </caption>
               <text>
                  <p>A yeast transcriptional sub-network (upper) and the decomposition tree constructed by the BCD algorithm (lower). Predicted protein modules are highlighted with colored bars (lower panel) and protein nodes in the network (upper panel) are colored accordingly. The module names in the upper panel are inferred from their members' annotation information. Singletons are colored red.</p>
               </text>
               <graphic file="gb-2007-8-12-r271-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Filtering false-positive interactions</p>
            </st>
            <p>Most yeast protein interaction data were obtained from large-scale, high-throughput experiments, which generally contain false positives <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. To minimize the number of false-positive interactions, we apply a statistical test to measure the reliability of an interaction (edge). We calculate the statistical significance of each interaction between two proteins as the probability (<it>P </it>value), under a random model, that the number of common interacting partners is at or above the observed number. Previous work has shown that the statistical significance based on the number of common interacting partners correlates highly with the functional association of two proteins <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>.</p>
            <p>In a species with N proteins, the number of distinct ways in which two interacting proteins A and B with n and m interaction partners have k partners in common is given by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i4"><m:semantics><m:mrow><m:msubsup><m:mi>C</m:mi><m:mi>k</m:mi><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn></m:mrow></m:msubsup><m:mo>&#8901;</m:mo><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:mi>k</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn><m:mo>&#8722;</m:mo><m:mi>k</m:mi></m:mrow></m:msubsup><m:mo>&#8901;</m:mo><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>m</m:mi><m:mo>&#8722;</m:mo><m:mi>k</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4qamaaDaaaleaacaWGRbaabaGaamOtaiabgkHiTiaaikdaaaGccqGHflY1caWGdbWaa0baaSqaaiaad6gacqGHsislcaWGRbGaeyOeI0IaaGymaaqaaiaad6eacqGHsislcaaIYaGaeyOeI0Iaam4AaaaakiabgwSixlaadoeadaqhaaWcbaGaamyBaiabgkHiTiaadUgacqGHsislcaaIXaaabaGaamOtaiabgkHiTiaad6gacqGHsislcaaIXaaaaaaa@4C37@</m:annotation></m:semantics></m:math></inline-formula>. The first factor (<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i5"><m:semantics><m:mrow><m:msubsup><m:mi>C</m:mi><m:mi>k</m:mi><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4qamaaDaaaleaacaWGRbaabaGaamOtaiabgkHiTiaaikdaaaaaaa@3402@</m:annotation></m:semantics></m:math></inline-formula>) is the number of ways to choose the k common partners from all N proteins except proteins A and B. The second factor (<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i6"><m:semantics><m:mrow><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:mi>k</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn><m:mo>&#8722;</m:mo><m:mi>k</m:mi></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4qamaaDaaaleaacaWGUbGaeyOeI0Iaam4AaiabgkHiTiaaigdaaeaacaWGobGaeyOeI0IaaGOmaiabgkHiTiaadUgaaaaaaa@3967@</m:annotation></m:semantics></m:math></inline-formula>) counts the number of ways of choosing the dangling (non-shared) partners of protein A (note that the common partners and proteins A and B are excluded). Similarly, the third factor (<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i7"><m:semantics><m:mrow><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>m</m:mi><m:mo>&#8722;</m:mo><m:mi>k</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4qamaaDaaaleaacaWGTbGaeyOeI0Iaam4AaiabgkHiTiaaigdaaeaacaWGobGaeyOeI0IaamOBaiabgkHiTiaaigdaaaaaaa@3968@</m:annotation></m:semantics></m:math></inline-formula>) is for choosing dangling partners of protein B. The total number of ways for the two interacting proteins to have n and m interaction partners, regardless of how many are in common, is given by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i8"><m:semantics><m:mrow><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>n</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn></m:mrow></m:msubsup><m:mo>&#8901;</m:mo><m:msubsup><m:mi>C</m:mi><m:mrow><m:mi>m</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mi>N</m:mi><m:mo>&#8722;</m:mo><m:mn>2</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4qamaaDaaaleaacaWGUbGaeyOeI0IaaGymaaqaaiaad6eacqGHsislcaaIYaaaaOGaeyyXICTaam4qamaaDaaaleaacaWGTbGaeyOeI0IaaGymaaqaaiaad6eacqGHsislcaaIYaaaaaaa@3E0C@</m:annotation></m:semantics></m:math></inline-formula>. Therefore, the probability to randomly see two interacting proteins with n and m partners, sharing k common partners in a species with N proteins, is given by:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i9">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>k</m:mi>
                           <m:mo>|</m:mo>
                           <m:mo>&#8722;</m:mo>
                           <m:mo>,</m:mo>
                           <m:mi>n</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>m</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>N</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>C</m:mi>
                                    <m:mi>k</m:mi>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                                 <m:mo>&#8901;</m:mo>
                                 <m:msubsup>
                                    <m:mi>C</m:mi>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>k</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>2</m:mn>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>k</m:mi>
                                    </m:mrow>
                                 </m:msubsup>
                                 <m:mo>&#8901;</m:mo>
                                 <m:msubsup>
                                    <m:mi>C</m:mi>
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>k</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                              </m:mrow>
                              <m:mrow>
                                 <m:msubsup>
                                    <m:mi>C</m:mi>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                                 <m:mo>&#8901;</m:mo>
                                 <m:msubsup>
                                    <m:mi>C</m:mi>
                                    <m:mrow>
                                       <m:mi>m</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>N</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msubsup>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The statistical significance is then calculated by:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i10">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>k</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:msub>
                                       <m:mi>k</m:mi>
                                       <m:mn>0</m:mn>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>min</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>n</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>m</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:munderover>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>k</m:mi>
                                 <m:mo>|</m:mo>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mo>,</m:mo>
                                 <m:mi>n</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>m</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where k<sub>0 </sub>is the observed number of common partners shared by two interacting proteins. An interaction with a <it>P </it>value greater than 0.01 is considered to be a 'false positive' and is discarded. Filtering is iterative: we remove the edge with the highest <it>P </it>value, recalculate the <it>P </it>value for the affected edges, and repeat until no edge has a <it>P </it>value greater than 0.01. In our analysis of yeast data, we found that this filtering always improves the quality of the discovered protein interaction modules.</p>
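            <p>The two formulas above can be evaluated directly. The following is a minimal sketch, not the authors' implementation. It assumes that n and m are the numbers of interaction partners of the two proteins, that N is the total number of proteins in the network, and that C with superscript a and subscript b denotes the binomial coefficient 'a choose b'. The function names are hypothetical.</p>

```python
from math import comb


def p_common(k, n, m, N):
    """Probability that two interacting proteins share exactly k partners.

    Assumed meaning of the symbols (not restated in this section):
    n, m = degrees of the two proteins, N = number of proteins.
    Requires 0 .. k .. min(n - 1, m - 1) and n, m no larger than N - 1.
    """
    numerator = (
        comb(N - 2, k)                      # choose the k common partners
        * comb(N - 2 - k, n - k - 1)        # remaining partners of protein 1
        * comb(N - n - 1, m - k - 1)        # remaining partners of protein 2
    )
    denominator = comb(N - 2, n - 1) * comb(N - 2, m - 1)
    return numerator / denominator


def p_value(k0, n, m, N):
    """Tail probability of observing at least k0 common partners."""
    upper = min(n - 1, m - 1)
    return sum(p_common(k, n, m, N) for k in range(k0, upper + 1))


if __name__ == "__main__":
    # Hypothetical example: degrees 5 and 7 in a 50-protein network,
    # with 3 observed common partners.
    print(p_value(3, 5, 7, 50))
```

            <p>Summing p_common over all admissible k gives 1, which is a convenient sanity check on the reconstruction of the formula. Under the threshold stated above, an edge would be kept when p_value returns at most 0.01.</p>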
         </sec>
         <sec>
            <st>
               <p>Application to simulated yeast protein interaction networks</p>
            </st>
            <p>We compared five algorithms: our BCD algorithm, the GN algorithm, the ECC algorithm with the original edge clustering coefficient definition (ECC1), the ECC algorithm with our commonality metric (ECC2), and the MCL algorithm <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, with the inflation parameter set to 1.8, the optimal value reported in <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. For this comparison, we built a test graph on the basis of 198 complexes manually annotated in the MIPS database <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> in a way similar to that used in Brohee and van Helden's study <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Briefly, for each manually annotated MIPS complex, an edge was created between each pair of proteins within that complex. The resulting graph (referred to as the test graph) contains 1,078 proteins and 9,919 interactions. To evaluate robustness to false positives and false negatives, we derived 16 altered networks by randomly removing edges from or adding edges to the test graph in various proportions. We then assessed the clustering results produced by each algorithm on each derived network against the annotated complexes. As done in Brohee and van Helden's study <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, we computed a geometric accuracy value and a separation value to estimate the overall correspondence between a clustering result (a set of clusters) and the collection of annotated complexes; high values of both indicate good clustering (see <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> for more details).</p>
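            <p>The construction of the test graph and its altered variants can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that the removal and addition proportions are taken relative to the number of edges in the graph being altered, which this section does not state explicitly.</p>

```python
from itertools import combinations
import random


def build_test_graph(complexes):
    """Connect every pair of proteins that belong to the same complex."""
    edges = set()
    for members in complexes:
        for u, v in combinations(sorted(set(members)), 2):
            edges.add((u, v))  # edges are stored as sorted pairs
    return edges


def remove_edges(edges, fraction, rng):
    """Randomly drop the given fraction of edges (simulated false negatives)."""
    keep = len(edges) - round(fraction * len(edges))
    return set(rng.sample(sorted(edges), keep))


def add_edges(edges, fraction, rng):
    """Randomly add new edges between existing nodes (simulated false positives)."""
    nodes = sorted({x for e in edges for x in e})
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    # Assumption: the fraction is relative to the current edge count.
    target = min(len(edges) + round(fraction * len(edges)), max_edges)
    result = set(edges)
    while len(result) != target:
        u, v = sorted(rng.sample(nodes, 2))
        result.add((u, v))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    graph = build_test_graph([["A", "B", "C"], ["C", "D"]])
    print(len(graph), len(remove_edges(graph, 0.5, rng)))
```

            <p>Chaining the two alteration functions, for example adding edges and then removing a proportion of the result, would give combined perturbations of the kind described above.</p>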
            <p>Figure <figr fid="F3">3a</figr> displays the impact of edge addition on geometric accuracy and Figure <figr fid="F3">3b</figr> shows the impact on separation. Clearly, the ECC2 algorithm with our new commonality metric greatly outperforms the ECC1 algorithm with the older edge clustering coefficient measure when the graph is altered by adding edges. In Figure <figr fid="F3">3c,d</figr>, increasing proportions (0%, 20%, 40%, 60%, and 80%) of edges are randomly removed from a test graph to which 100% additional edges had first been added. Figure <figr fid="F3">3e,f</figr> shows the effect of edge addition on graphs from which 40% of the edges had previously been removed. All curves show similar trends, with BCD and MCL outperforming the other three algorithms. The BCD algorithm outperforms the MCL algorithm when the graph is more heavily altered by both edge removal and addition (Figure <figr fid="F3">3c-f</figr>).</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Robustness of the algorithms to random edge addition and removal</p>
               </caption>
               <text>
                  <p>Robustness of the algorithms to random edge addition and removal. Each curve represents the value of accuracy (left panels) or separation (right panels). <b>(a, b) </b>Edge addition to the test graph. <b>(c, d) </b>Edge removal from an altered graph with 100% of randomly added edges. <b>(e, f) </b>Edge addition to an altered graph with 40% of randomly removed edges. Color code: red, BCD; blue, GN; cyan, MCL; orange, ECC with the original edge clustering coefficient; green, ECC with our commonality index.</p>
               </text>
               <graphic file="gb-2007-8-12-r271-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Application to the yeast protein interaction network</p>
            </st>
            <p>We used the yeast protein interaction network from the BioGrid database (version 2.0.24) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, from which we extracted 36,238 unique interactions among 5,273 yeast proteins. We applied the filtering process to the data and the resulting dataset retained 3,030 yeast proteins and 17,242 high-confidence interactions, which we call the filtered dataset. On both the original and filtered datasets, we tested five algorithms: our BCD algorithm, the GN algorithm, the ECC1 algorithm with its original edge clustering coefficient, the ECC2 algorithm with our commonality metric and the MCL algorithm whenever applicable.</p>
            <sec>
               <st>
                  <p>Results on a small yeast protein interaction network</p>
               </st>
               <p>Before diving into the entire complex network, we first decomposed a small yeast transcription network with 225 proteins and 1,792 interactions, where known protein interaction modules can be inferred from the annotations of well-studied proteins (Figure <figr fid="F2">2a</figr>). Figure <figr fid="F2">2b</figr> displays a hierarchical decomposition tree by the BCD algorithm (decomposition trees constructed by the other three algorithms are provided in Additional data file 1). Note that there is no decomposition tree for the MCL algorithm.</p>
               <p>The proposed definition of a protein interaction module works well for both the GN and BCD algorithms because almost all proteins within the same computed protein module do indeed belong to the same known protein complex. Decomposition trees obtained using the ECC1 algorithm and the ECC2 algorithm with our commonality metric are shown in Additional data file 1. Both produce irregularly large modules and an excessive number of singletons, which suggests that the purely local metric used in the ECC algorithm is not effective. Additional data file 1 also shows good results for the GN and BCD algorithms, which combine global and local metrics and clearly produce more consistent and robust results.</p>
               <p>The BCD algorithm revealed 21 functional modules (Figure <figr fid="F2">2</figr>); all proteins within known protein complexes are also located within the same module, suggesting that the BCD algorithm is superior at unveiling fine structure buried in complex protein interaction networks. The MCL algorithm predicts only 11 clusters from this small yeast transcription network, because it groups several functional modules together: the three DNA-dependent RNA polymerases (A, B, C) and the RNA polymerase II mediator complex are merged into one cluster; the NuA4 histone acetyltransferase complex, the SWR1 complex, and the INO80 chromatin remodeling complex are grouped into one cluster; the TFIIA complex, the Elongator complex, the SAGA histone acetyltransferase complex, and the TFIID complex are grouped into one cluster; and the COMPASS complex and the mRNA cleavage and polyadenylation specificity complex (CPF) are grouped into one cluster. Apparently, the MCL algorithm is ineffective at discovering boundaries between functionally related protein complexes and tends to group them together. The quality of modules obtained using the GN algorithm is not as good as that of the BCD modules; members of four functional modules predicted by the BCD algorithm, namely transcription factor IIA (TFIIA) [TOA1, TOA2], TFIID [TAF2, TAF3, TAF4, TAF7, TAF8, TAF11, TAF13], nuclear pore-associated [SAC3, CDC31, THP1], and a new one [ABD1, SPT6], are misplaced. The ECC algorithm likewise tends to separate peripheral members of the same known protein complex into incorrect protein modules. For instance, in the transcription network, the ECC algorithm disjoins peripheral proteins such as FOB1, RPC10, RRP8 and RPL6B in a very early phase of the decomposition process, causing those derived singletons to be separated from most functional modules. Singletons do not provide useful information for inferring the function of any module. 
Therefore, the number of singletons generated by an algorithm is an additional indicator of that algorithm's performance: an excessive number of singletons indicates poor performance. On this small network, the ECC algorithm produces 13 singletons, while the BCD and GN algorithms produce 9 and 3 singletons, respectively. Although the ECC algorithm produces only four more singletons than the BCD algorithm, the ECC singletons lose their connections with other modules because they are isolated at a much earlier stage of the decomposition process. Although the GN algorithm produces the fewest singletons in the example network, it does so at the expense of generating mosaic modules. Similar trends are seen in the following experiments on large networks.</p>
               <p>We also note that the original ECC1 algorithm performs more poorly than the ECC2 algorithm with our commonality index (Additional data file 1). From now on, we will not discuss the original ECC1 algorithm. When we refer to the ECC algorithm, we mean the ECC algorithm using our commonality index.</p>
            </sec>
            <sec>
               <st>
                  <p>Results on the global yeast network</p>
               </st>
               <p>In this section, we discuss the results of BCD decomposition of a specific network (yeast), the quality of computed modules, and comparison to MIPS hand-curated protein complex data.</p>
               <p>We first compared the decomposition processes of the three algorithms, shown as curves in Figure <figr fid="F4">4</figr>. Each curve displays the size of the current network on which an algorithm acts versus the number of productive cuts thus far, where a productive cut is defined as the removal of an edge that results in two separate subnetworks. The number of productive cuts measures how strongly each algorithm tends to fragment the network. Note that most module (complex) finding algorithms are typically applied to the connected components of a network. On the original dataset, the BCD, GN and ECC algorithms require 674, 2,779, and 2,304 productive cuts, respectively, to split the largest connected component of 5,257 nodes into smaller pieces, which means, on average, the algorithms separate 7.8, 1.9 and 2.3 nodes, respectively, from the largest connected component in each productive cut. On the filtered dataset, the respective algorithms require 80, 107 and 710 productive cuts to split the largest connected component of 2,924 nodes into smaller pieces, which means, on average, the algorithms separate 36.5, 27.3 and 4.1 nodes, respectively, from the largest connected component in each productive cut. The more productive cuts made, the more fragmented the network and the more singletons generated, as shown in Table <tblr tid="T1">1</tblr>. As stated earlier, a large number of singletons is an indicator of poor performance by a particular algorithm. For both datasets, the BCD algorithm produces the fewest singletons of the three partitioning-type algorithms. The size distributions of predicted protein complexes for each algorithm, including the MCL algorithm, on both datasets are shown in Figure <figr fid="F5">5</figr>. The pattern of predicted complexes generated by all three methods is similar to that of hand-curated MIPS complexes <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, suggesting that the proposed protein module definition is effective.</p>
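               <p>The notion of a productive cut can be made concrete with a short connectivity check. The sketch below is an assumption-laden illustration rather than the authors' code: the graph is represented as a hypothetical dictionary mapping each node to the set of its neighbours, and an edge removal is called productive when its two endpoints can no longer reach each other.</p>

```python
from collections import deque


def is_productive_cut(adj, u, v):
    """Return True if removing edge (u, v) disconnects u from v.

    adj is a dict mapping each node to the set of its neighbours
    (an assumed representation, chosen for illustration).
    """
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if (x, y) in ((u, v), (v, u)):
                continue  # pretend the edge has already been removed
            if y == v:
                return False  # v is still reachable by another path
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return True


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print(is_productive_cut(path, "a", "b"))      # bridge edge
    print(is_productive_cut(triangle, "a", "b"))  # edge inside a cycle
```

               <p>The average number of nodes separated per productive cut quoted above is simply the component size divided by the number of productive cuts, for example 5,257 / 674, which is approximately 7.8 for the BCD algorithm on the original dataset.</p>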
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Decomposition curves for the largest sub-networks of two datasets on <b>(a) </b>unfiltered data and <b>(b) </b>filtered data by the three algorithms</p>
                  </caption>
                  <text>
                     <p>Decomposition curves for the largest sub-networks of two datasets on <b>(a) </b>unfiltered data and <b>(b) </b>filtered data by the three algorithms. During the decomposition process, the larger connected component and the larger one of its derived sub-networks are always decomposed earlier. The y-axis shows the size of the sub-network under decomposition and the x-axis shows the number of productive cuts so far. A productive cut means the removal of an edge splitting one network into two disconnected parts.</p>
                  </text>
                  <graphic file="gb-2007-8-12-r271-4"/>
               </fig>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Number of predicted complexes and singletons</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Unfiltered</p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Filtered</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Algorithm</p>
                        </c>
                        <c ca="center">
                           <p>Complex</p>
                        </c>
                        <c ca="center">
                           <p>Singleton</p>
                        </c>
                        <c ca="center">
                           <p>Complex</p>
                        </c>
                        <c ca="center">
                           <p>Singleton</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>BCD</p>
                        </c>
                        <c ca="center">
                           <p>850 (5.0)</p>
                        </c>
                        <c ca="center">
                           <p>991</p>
                        </c>
                        <c ca="center">
                           <p>391 (6.8)</p>
                        </c>
                        <c ca="center">
                           <p>361</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>GN</p>
                        </c>
                        <c ca="center">
                           <p>614 (4.6)</p>
                        </c>
                        <c ca="center">
                           <p>2,477</p>
                        </c>
                        <c ca="center">
                           <p>297 (8.9)</p>
                        </c>
                        <c ca="center">
                           <p>379</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ECC</p>
                        </c>
                        <c ca="center">
                           <p>875 (3.5)</p>
                        </c>
                        <c ca="center">
                           <p>2,214</p>
                        </c>
                        <c ca="center">
                           <p>491 (4.1)</p>
                        </c>
                        <c ca="center">
                           <p>1,021</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MCL</p>
                        </c>
                        <c ca="center">
                           <p>703 (7.3)</p>
                        </c>
                        <c ca="center">
                           <p>168</p>
                        </c>
                        <c ca="center">
                           <p>232 (13.0)</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The average size of complexes is shown in parentheses.</p>
                  </tblfn>
               </tbl>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Size distribution of predicted and MIPS protein complexes</p>
                  </caption>
                  <text>
                     <p>Size distribution of predicted and MIPS protein complexes.</p>
                  </text>
                  <graphic file="gb-2007-8-12-r271-5"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Modularity</p>
               </st>
               <p>As a measure of the quality of the computed protein modules, we use modularity (Q) <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, a measure of community structure in a network that is based on the difference between the number of edges falling within groups and the number expected in an equivalent network with edges placed at random. The higher the modularity, the better the separation, and the best clusters are obtained at the point where the modularity is maximal. Previous studies stopped the decomposition process when the modularity reached its peak value and treated all resulting clusters as communities <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B21">21</abbr></abbrgrp>. When we applied the modularity criterion to the protein interaction networks in this study, however, we found that the protein modules obtained in this way tend to be dominated by several very large modules. Nonetheless, the maximal modularity is an objective measure that is useful for comparing the performance of different algorithms. Table <tblr tid="T2">2</tblr> lists the maximal modularities obtained by the three algorithms on three networks of different sizes. The BCD algorithm has the highest Q values for both the transcription network and the unfiltered global network, and its Q value on the filtered data (0.701) is close to the highest value, obtained by the GN algorithm (0.717), suggesting that the BCD algorithm performs best overall in terms of maximal modularity. In particular, on the noisy original data, the maximal Q value obtained by the BCD algorithm (0.423) is substantially higher than those obtained by the GN and ECC algorithms (0.340 and 0.284), suggesting that the BCD algorithm tolerates noisy data much better than the other algorithms.</p>
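               <p>For a given partition, modularity can be computed as sketched below. The sketch assumes the standard Newman-Girvan form, in which Q is the sum over modules of the fraction of edges inside the module minus the squared fraction of edge ends attached to it; we assume, but this section does not confirm, that this is the definition used here. The function and variable names are hypothetical.</p>

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: iterable of (u, v) pairs, each edge listed once.
    community: dict mapping every node to its module label.
    """
    edges = list(edges)
    m = len(edges)
    inside = {}     # number of edges with both ends in the module
    edge_ends = {}  # total degree of the nodes in the module
    for u, v in edges:
        cu, cv = community[u], community[v]
        edge_ends[cu] = edge_ends.get(cu, 0) + 1
        edge_ends[cv] = edge_ends.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (edge_ends[c] / (2 * m)) ** 2
        for c in edge_ends
    )


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    edges = [
        ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
        ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
        ("a3", "b1"),
    ]
    two_modules = {n: n[0] for e in edges for n in e}
    print(modularity(edges, two_modules))
```

               <p>In a divisive procedure, such a function would be evaluated after each productive cut, and the partition with the largest Q retained for comparison between algorithms.</p>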
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Comparison of modularity coefficients for network decomposition on three networks of varying sizes</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="3" ca="center">
                           <p>Modularity <it>Q</it></p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Network</p>
                        </c>
                        <c ca="center">
                           <p>Size <it>n</it></p>
                        </c>
                        <c ca="center">
                           <p>BCD</p>
                        </c>
                        <c ca="center">
                           <p>GN</p>
                        </c>
                        <c ca="center">
                           <p>ECC</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Transcription network</p>
                        </c>
                        <c ca="center">
                           <p>225</p>
                        </c>
                        <c ca="center">
                           <p>0.692</p>
                        </c>
                        <c ca="center">
                           <p>0.690</p>
                        </c>
                        <c ca="center">
                           <p>0.637</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Filtered global data</p>
                        </c>
                        <c ca="center">
                           <p>3,030</p>
                        </c>
                        <c ca="center">
                           <p>0.701</p>
                        </c>
                        <c ca="center">
                           <p>0.717</p>
                        </c>
                        <c ca="center">
                           <p>0.550</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Unfiltered global data</p>
                        </c>
                        <c ca="center">
                           <p>5,273</p>
                        </c>
                        <c ca="center">
                           <p>0.423</p>
                        </c>
                        <c ca="center">
                           <p>0.340</p>
                        </c>
                        <c ca="center">
                           <p>0.284</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Overlap with MIPS complexes</p>
               </st>
               <p>We validated the biological significance of our predicted protein modules by comparing them with the hand-curated protein complexes in the MIPS <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> database. For each predicted module, we found the best-matching MIPS complex using the method of Spirin and Mirny <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, which selects the pair of complexes whose overlap has the lowest probability of arising at random, as given by the hypergeometric distribution:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i11">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>P</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>v</m:mi>
                                    <m:mi>e</m:mi>
                                    <m:mi>r</m:mi>
                                    <m:mi>l</m:mi>
                                    <m:mi>a</m:mi>
                                    <m:mi>p</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mi>n</m:mi>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mi>k</m:mi>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mi>N</m:mi>
                                                      <m:mo>&#8722;</m:mo>
                                                      <m:mi>n</m:mi>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mi>m</m:mi>
                                                      <m:mo>&#8722;</m:mo>
                                                      <m:mi>k</m:mi>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mi>N</m:mi>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mi>m</m:mi>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGqbWaaSbaaSqaaiaad+gacaWG2bGaamyzaiaadkhacaWGSbGaamyyaiaadchaaeqaaOGaeyypa0tcfa4aaSaaaeaadaqadaqaauaabeqaceaaaeaacaWGUbaabaGaam4AaaaaaiaawIcacaGLPaaadaqadaqaauaabeqaceaaaeaacaWGobGaeyOeI0IaamOBaaqaaiaad2gacqGHsislcaWGRbaaaaGaayjkaiaawMcaaaqaamaabmaabaqbaeqabiqaaaqaaiaad6eaaeaacaWGTbaaaaGaayjkaiaawMcaaaaaaaa@477A@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>where N is the total number of proteins in the protein interaction network, n and m are the sizes of the two complexes, and k is the number of proteins they share. Table <tblr tid="T3">3</tblr> presents the overlap (the number of common proteins divided by the number of proteins in the best-matching MIPS complex) between predicted and MIPS complexes. In terms of the absolute number of clusters that overlap 100% with MIPS complexes, the BCD algorithm performs best on the unfiltered dataset (59 clusters), while the MCL algorithm performs best on the filtered dataset (67 clusters). In terms of the percentage of clusters that overlap 100% with MIPS complexes, the MCL algorithm performs better than the other three on both datasets. However, cluster size can bias these counts: the larger a cluster is, the more likely it is to contain all members of an overlapping MIPS complex. As shown in Table <tblr tid="T1">1</tblr> and Figure <figr fid="F5">5</figr>, the MCL algorithm produces more large clusters than the other three algorithms, as was also seen previously in the small yeast transcription network.</p>
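               <p>As an illustration only, the overlap probability and the overlap fraction defined above can be evaluated with the following minimal Python sketch. The function names <it>overlap_probability</it> and <it>overlap_fraction</it> are hypothetical and are not part of the original analysis; the sketch assumes that both complexes are no larger than the network.</p>

```python
from math import comb


def overlap_probability(N: int, n: int, m: int, k: int) -> float:
    """Hypergeometric probability that two complexes of sizes n and m,
    drawn from a network of N proteins, share exactly k proteins."""
    if max(n, m) > N:
        raise ValueError("a complex cannot be larger than the network")
    if 0 > min(k, m - k):
        return 0.0  # impossible overlap sizes have zero probability
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so overlaps larger than either complex also evaluate to 0.
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)


def overlap_fraction(cluster: set, mips_complex: set) -> float:
    """Fraction of the proteins of a MIPS complex found in a predicted cluster."""
    return len(cluster.intersection(mips_complex)) / len(mips_complex)
```

               <p>For example, in a hypothetical network of N = 10 proteins with complexes of sizes n = 4 and m = 3, the probability of sharing exactly k = 2 proteins is 36/120 = 0.3.</p>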
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Comparison of predicted protein complexes with known MIPS complexes</p>
                  </caption>
                  <tblbdy cols="5">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>BCD</p>
                        </c>
                        <c ca="center">
                           <p>GN</p>
                        </c>
                        <c ca="center">
                           <p>ECC</p>
                        </c>
                        <c ca="center">
                           <p>MCL</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="5">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <b>Unfiltered</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>100%*</p>
                        </c>
                        <c ca="center">
                           <p>59 (6.9<sup>&#8224;</sup>)</p>
                        </c>
                        <c ca="center">
                           <p>27 (4.4)</p>
                        </c>
                        <c ca="center">
                           <p>56 (6.4)</p>
                        </c>
                        <c ca="center">
                           <p>53 (7.5)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>>50%</p>
                        </c>
                        <c ca="center">
                           <p>65 (7.6)</p>
                        </c>
                        <c ca="center">
                           <p>51 (8.3)</p>
                        </c>
                        <c ca="center">
                           <p>56 (6.4)</p>
                        </c>
                        <c ca="center">
                           <p>63 (9.0)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>>0%</p>
                        </c>
                        <c ca="center">
                           <p>125 (14.7)</p>
                        </c>
                        <c ca="center">
                           <p>92 (15.0)</p>
                        </c>
                        <c ca="center">
                           <p>122 (13.9)</p>
                        </c>
                        <c ca="center">
                           <p>153 (21.8)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>No overlap</p>
                        </c>
                        <c ca="center">
                           <p>601 (70.7)</p>
                        </c>
                        <c ca="center">
                           <p>444 (72.3)</p>
                        </c>
                        <c ca="center">
                           <p>641 (73.3)</p>
                        </c>
                        <c ca="center">
                           <p>434 (61.7)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>Accuracy<sup>&#8225;</sup></p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                        <c ca="center">
                           <p>0.64</p>
                        </c>
                        <c ca="center">
                           <p>0.62</p>
                        </c>
                        <c ca="center">
                           <p>0.65</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>Separation<sup>&#8225;</sup></p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>0.27</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <b>Filtered</b>
                           </p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>100%</p>
                        </c>
                        <c ca="center">
                           <p>53 (13.6)</p>
                        </c>
                        <c ca="center">
                           <p>45 (15.2)</p>
                        </c>
                        <c ca="center">
                           <p>50 (10.2)</p>
                        </c>
                        <c ca="center">
                           <p>67 (28.9)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>>50%</p>
                        </c>
                        <c ca="center">
                           <p>46 (11.8)</p>
                        </c>
                        <c ca="center">
                           <p>38 (12.8)</p>
                        </c>
                        <c ca="center">
                           <p>49 (10.0)</p>
                        </c>
                        <c ca="center">
                           <p>24 (10.3)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>>0%</p>
                        </c>
                        <c ca="center">
                           <p>83 (21.2)</p>
                        </c>
                        <c ca="center">
                           <p>66 (22.2)</p>
                        </c>
                        <c ca="center">
                           <p>120 (24.4)</p>
                        </c>
                        <c ca="center">
                           <p>50 (21.6)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>No overlap</p>
                        </c>
                        <c ca="center">
                           <p>209 (53.5)</p>
                        </c>
                        <c ca="center">
                           <p>148 (49.8)</p>
                        </c>
                        <c ca="center">
                           <p>272 (55.4)</p>
                        </c>
                        <c ca="center">
                           <p>91 (39.2)</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>Accuracy</p>
                        </c>
                        <c ca="center">
                           <p>0.73</p>
                        </c>
                        <c ca="center">
                           <p>0.71</p>
                        </c>
                        <c ca="center">
                           <p>0.61</p>
                        </c>
                        <c ca="center">
                           <p>0.67</p>
                        </c>
                     </r>
                     <r>
                        <c indent="1" ca="left">
                           <p>Separation</p>
                        </c>
                        <c ca="center">
                           <p>0.29</p>
                        </c>
                        <c ca="center">
                           <p>0.28</p>
                        </c>
                        <c ca="center">
                           <p>0.26</p>
                        </c>
                        <c ca="center">
                           <p>0.38</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*The overlap is defined as the percentage of proteins in the best-matching MIPS complex that are found in a predicted cluster. Complexes with only one protein are excluded from this analysis. <sup>&#8224;</sup>The percentage of total predicted protein complexes. <sup>&#8225;</sup>The geometric accuracy and separation according to [16].</p>
                  </tblfn>
               </tbl>
               <p>Therefore, to estimate the overall correspondence between the clusters produced by each approach and the collection of annotated complexes, we computed the geometric accuracy and separation as described by Brohee and van Helden <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The results are shown in Table <tblr tid="T3">3</tblr>. The BCD algorithm achieves higher accuracy than the other three algorithms on both the unfiltered (0.70) and filtered (0.73) datasets. In terms of separation, the MCL algorithm performs best among the four algorithms on both datasets (0.27 and 0.38, respectively; Table <tblr tid="T3">3</tblr>).</p>
            </sec>
            <sec>
               <st>
                  <p>GO term enrichment</p>
               </st>
               <p>In addition to the comparison with the MIPS protein complex dataset, we evaluated the biological significance of predicted protein modules by quantifying GO term co-occurrences using the SGD GO Term Finder <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. The GO Term Finder calculates a <it>P </it>value that reflects the probability of observing by chance the co-occurrence of proteins with a given GO annotation in a certain complex based on a binomial distribution. The lower the <it>P </it>value of a GO term, the more significantly a complex is enriched in that GO term. Table <tblr tid="T4">4</tblr> lists the number and percentage of predicted protein modules whose <it>P </it>value falls within <it>P </it>&lt; 1e-15, [1e-15, 1e-10], [1e-10, 1e-5] and [1e-5, 1]. In terms of absolute number, the BCD algorithm produces the most complexes with <it>P </it>value less than 1e-15 on both the unfiltered (58) and filtered (62) datasets.</p>
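               <p>The kind of calculation described above can be sketched as an upper-tail binomial probability. The following Python sketch is an assumption-laden illustration, not the GO Term Finder implementation: the function names are hypothetical, the background frequency p of a GO term is taken as given, no multiple-testing correction is applied, and the treatment of values falling exactly on a range boundary is our own choice.</p>

```python
from math import comb


def binomial_tail(n: int, x: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p): the chance that at least x of the
    n proteins in a module carry a GO term with background frequency p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def p_value_range(p_value: float) -> str:
    """Assign a P value to one of the four ranges used in Table 4.
    A value exactly on a boundary goes to the higher range (an assumption)."""
    if 1e-15 > p_value:
        return "below 1e-15"
    if 1e-10 > p_value:
        return "1e-15 to 1e-10"
    if 1e-5 > p_value:
        return "1e-10 to 1e-5"
    return "1e-5 to 1"
```

               <p>Under this binning, a module with <it>P </it>= 9.8e-17 falls in the lowest range and a module with <it>P </it>= 1.9e-12 falls in the range from 1e-15 to 1e-10.</p>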
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Predicted protein complexes of size &#8805;3 enriched in GO terms</p>
                  </caption>
                  <tblbdy cols="9">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4" ca="center">
                           <p>Unfiltered</p>
                        </c>
                        <c cspan="4" ca="center">
                           <p>Filtered</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>&lt;1e-15</p>
                        </c>
                        <c ca="center">
                           <p>1e-15 to 1e-10</p>
                        </c>
                        <c ca="center">
                           <p>1e-10 to 1e-5</p>
                        </c>
                        <c ca="center">
                           <p>1e-5 to 1</p>
                        </c>
                        <c ca="center">
                           <p>&lt;1e-15</p>
                        </c>
                        <c ca="center">
                           <p>1e-15 to 1e-10</p>
                        </c>
                        <c ca="center">
                           <p>1e-10 to 1e-5</p>
                        </c>
                        <c ca="center">
                           <p>1e-5 to 1</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="9">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>BCD</p>
                        </c>
                        <c ca="center">
                           <p>58 (10.4)</p>
                        </c>
                        <c ca="center">
                           <p>41 (7.4)</p>
                        </c>
                        <c ca="center">
                           <p>118 (21.2)</p>
                        </c>
                        <c ca="center">
                           <p>339 (61.0)</p>
                        </c>
                        <c ca="center">
                           <p>62 (21.1)</p>
                        </c>
                        <c ca="center">
                           <p>38 (13.0)</p>
                        </c>
                        <c ca="center">
                           <p>86 (29.3)</p>
                        </c>
                        <c ca="center">
                           <p>108 (36.7)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>GN</p>
                        </c>
                        <c ca="center">
                           <p>47 (24.1)</p>
                        </c>
                        <c ca="center">
                           <p>23 (11.8)</p>
                        </c>
                        <c ca="center">
                           <p>43 (22.1)</p>
                        </c>
                        <c ca="center">
                           <p>82 (42.1)</p>
                        </c>
                        <c ca="center">
                           <p>60 (24.4)</p>
                        </c>
                        <c ca="center">
                           <p>32 (13.0)</p>
                        </c>
                        <c ca="center">
                           <p>66 (26.8)</p>
                        </c>
                        <c ca="center">
                           <p>88 (35.8)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ECC</p>
                        </c>
                        <c ca="center">
                           <p>47 (10.1)</p>
                        </c>
                        <c ca="center">
                           <p>48 (10.3)</p>
                        </c>
                        <c ca="center">
                           <p>120 (25.9)</p>
                        </c>
                        <c ca="center">
                           <p>249 (53.7)</p>
                        </c>
                        <c ca="center">
                           <p>45 (13.7)</p>
                        </c>
                        <c ca="center">
                           <p>55 (16.7)</p>
                        </c>
                        <c ca="center">
                           <p>114 (34.7)</p>
                        </c>
                        <c ca="center">
                           <p>115 (35.0)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MCL</p>
                        </c>
                        <c ca="center">
                           <p>55 (11.2)</p>
                        </c>
                        <c ca="center">
                           <p>31 (6.3)</p>
                        </c>
                        <c ca="center">
                           <p>96 (19.6)</p>
                        </c>
                        <c ca="center">
                           <p>309 (62.9)</p>
                        </c>
                        <c ca="center">
                           <p>55 (24.1)</p>
                        </c>
                        <c ca="center">
                           <p>33 (14.5)</p>
                        </c>
                        <c ca="center">
                           <p>62 (27.2)</p>
                        </c>
                        <c ca="center">
                           <p>78 (34.2)</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The number in parentheses indicates the percentage of total complexes in that category.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Prediction of possible novel protein complexes</p>
               </st>
               <p>The number of predicted protein complexes is larger than the number of known protein complexes compiled in the MIPS complex dataset, and many predicted protein complexes do not overlap with MIPS complexes. Among these unmatched predicted protein complexes, some are likely to be true functional protein modules because the GO terms in these complexes are greatly enriched as indicated by low <it>P </it>values. Figure <figr fid="F6">6</figr> presents two such modules: a five-member module (<it>P </it>= 1.9e-12) of a spindle-assembly checkpoint complex that is crucial in the checkpoint mechanism required to prevent cell cycle progression into anaphase in the presence of spindle damage <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> (Figure <figr fid="F6">6a</figr>), and a thirteen-member module (<it>P </it>= 9.8e-17) including members from the Set3 histone deacetylase complex (Set3, Hos2, Snt1, Hos4, Hst1, Sif2) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, proteins involved in telomeric silencing (Zds1, Zds2 and Skg6) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, proteins related to sporulation (Spr6 and Bem3) <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp> and two other proteins (YIL055C and Cpr1) (Figure <figr fid="F6">6b</figr>). A complete list of complexes and modules with functional annotation is provided in Additional data files 2 and 3.</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Examples of modules where the GO terms are greatly enriched</p>
                  </caption>
                  <text>
                     <p>Examples of modules where the GO terms are greatly enriched. <b>(a) </b>A five-member module of the spindle-assembly checkpoint complex that is crucial in the checkpoint mechanism required to prevent cell cycle progression into anaphase in the presence of spindle damage. <b>(b) </b>A thirteen-member module including members from the Set3 histone deacetylase complex (Set3, Hos2, Snt1, Hos4, Hst1, Sif2), proteins involved in telomeric silencing (Zds1, Zds2 and Skg6), proteins related to sporulation (Spr6 and Bem3), and two other proteins (YIL055C and Cpr1).</p>
                  </text>
                  <graphic file="gb-2007-8-12-r271-6"/>
               </fig>
               <p>Table <tblr tid="T5">5</tblr> provides, for each of the four algorithms on both datasets, the number of predicted protein modules in which either the GO terms are greatly enriched (<it>P </it>&lt; 1e-15) or all members of a best-matching MIPS complex are found (overlap = 100%). Generally, the protein modules falling within these two categories can be viewed as functional modules. On the unfiltered dataset, the BCD algorithm identifies more functional protein modules than the other three algorithms (95 versus 58 to 87). On the filtered dataset, the MCL algorithm predicts marginally more functional protein modules than the BCD algorithm (91 versus 90). In addition, all four algorithms predict a substantial number of complexes that do not overlap with MIPS complexes or in which GO term co-occurrences are insignificant. These may nevertheless be novel functional complexes for biologists to explore further.</p>
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>Predicted protein modules where either GO terms are greatly enriched (<it>P </it>&lt; 1e-15) or all members of a best-matching MIPS complex are found (overlap = 100%)</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="left">
                           <p>Algorithm</p>
                        </c>
                        <c ca="center">
                           <p>Unfiltered (percentage)</p>
                        </c>
                        <c ca="center">
                           <p>Filtered (percentage)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>BCD</p>
                        </c>
                        <c ca="center">
                           <p>95 (11.2*)</p>
                        </c>
                        <c ca="center">
                           <p>90 (23.0)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>GN</p>
                        </c>
                        <c ca="center">
                           <p>58 (9.4)</p>
                        </c>
                        <c ca="center">
                           <p>80 (27.0)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>ECC</p>
                        </c>
                        <c ca="center">
                           <p>87 (9.9)</p>
                        </c>
                        <c ca="center">
                           <p>83 (16.9)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MCL</p>
                        </c>
                        <c ca="center">
                           <p>84 (11.9)</p>
                        </c>
                        <c ca="center">
                           <p>91 (39.2)</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*The percentage of total predicted protein complexes.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>The effects of filtering false-positive interactions</p>
               </st>
               <p>In all experiments, the results on the filtered data are consistently better than the results on the original data. For example, in Table <tblr tid="T3">3</tblr>, the number of protein modules computed by the BCD algorithm that do not overlap with any known protein complex fell from 601 (70.7%) on the original data to 209 (53.5%) on the filtered data. In Table <tblr tid="T4">4</tblr>, the percentage of modules with a GO term <it>P </it>value &lt;1e-10 is higher in the filtered data than in the original data for all four algorithms.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Protein interaction networks are examples of complex systems that are difficult to understand from raw experimental data alone. Methods to organize, filter, extract significant features from, and display these data are critical to understanding these systems. A number of network partition algorithms have been proposed to find modular structures in protein interaction networks <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. Our work further develops the network decomposition approach <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Our main contribution is to combine the global metric with a local metric in the decomposition procedure. We also resolved several critical technical issues: we propose a new commonality metric based on random graph analysis, a clear definition of protein modules utilizing the decomposition tree structure, and a noise filtering algorithm based on random graph analysis. These advances in methodology result in an effective, consistent, and robust algorithm, as demonstrated on both simulated datasets and the experimental yeast interaction data. Many of the protein modules obtained have clear biological functions, as shown in Table <tblr tid="T5">5</tblr>. Our approach to recovering protein interaction modules is fully self-contained; that is, it does not need other input or parameters to identify protein module boundaries. Our test experiments on yeast show that this method can effectively predict protein interaction modules from a complex interaction network. We plan to further automate this algorithm to compute protein interaction modules for a large number of organisms.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Computing geometric accuracy and separation</p>
            </st>
            <p>We computed the geometric accuracy and separation following the approach described by Brohee and van Helden <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Briefly, each clustering result was compared with the annotated complexes by building a contingency table T, in which row i corresponds to the i<sup>th </sup>annotated complex, column j corresponds to the j<sup>th </sup>cluster, and the value of cell T<sub>ij </sub>is the number of proteins shared by complex i and cluster j. The contingency table has n rows (complexes) and m columns (clusters).</p>
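            <p>As a minimal sketch of this construction (not the authors' implementation), the following Python fragment builds T from lists of protein identifiers. The function name contingency_table and the toy protein labels are assumptions introduced here for illustration only.</p>

```python
def contingency_table(complexes, clusters):
    """Return T with n rows (complexes) and m columns (clusters).

    T[i][j] is the number of proteins shared by complex i and cluster j.
    """
    return [[len(set(cx).intersection(cl)) for cl in clusters]
            for cx in complexes]


# Hypothetical toy data: 2 annotated complexes, 3 computed clusters.
complexes = [["A", "B", "C", "D"], ["E", "F", "G"]]
clusters = [["A", "B", "C"], ["D", "E"], ["F", "G", "H"]]

T = contingency_table(complexes, clusters)
print(T)  # [[3, 1, 0], [0, 1, 2]]
```

            <p>In this toy case protein H belongs to a cluster but to no annotated complex, so it contributes to no cell of T.</p>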
            <sec>
               <st>
                  <p>Accuracy</p>
               </st>
               <p>First, we define complex-wise sensitivity <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i12"><m:semantics><m:mrow><m:mi>S</m:mi><m:msub><m:mi>n</m:mi><m:mrow><m:mi>c</m:mi><m:msub><m:mi>o</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4uaiaad6gadaWgaaWcbaGaam4yaiaad+gadaWgaaadbaGaamyAaaqabaaaleqaaaaa@349A@</m:annotation></m:semantics></m:math></inline-formula> as the maximal fraction of the proteins of complex i that can be found in a single cluster, given by the formula:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i13">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>S</m:mi>
                              <m:msub>
                                 <m:mi>n</m:mi>
                                 <m:mrow>
                                    <m:mi>c</m:mi>
                                    <m:msub>
                                       <m:mi>o</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:msubsup>
                                 <m:mrow>
                                    <m:mi>max</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>m</m:mi>
                              </m:msubsup>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGtbGaamOBamaaBaaaleaacaWGJbGaam4BamaaBaaameaacaWGPbaabeaaaSqabaGccqGH9aqpciGGTbGaaiyyaiaacIhadaqhaaWcbaGaamOAaiabg2da9iaaigdaaeaacaWGTbaaaOWaaeWaaeaadaWcgaqaaiaadsfadaWgaaWcbaGaamyAaiaadQgaaeqaaaGcbaGaamOtamaaBaaaleaacaWGPbaabeaaaaaakiaawIcacaGLPaaaaaa@437B@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>where N<sub>i </sub>is the number of proteins belonging to complex i. To characterize the general sensitivity of a clustering result, we compute a clustering-wise sensitivity as the weighted average of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i12"><m:semantics><m:mrow><m:mi>S</m:mi><m:msub><m:mi>n</m:mi><m:mrow><m:mi>c</m:mi><m:msub><m:mi>o</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaam4uaiaad6gadaWgaaWcbaGaam4yaiaad+gadaWgaaadbaGaamyAaaqabaaaleqaaaaa@349A@</m:annotation></m:semantics></m:math></inline-formula> over all complexes by the formula:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i14">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>S</m:mi>
                              <m:mi>n</m:mi>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>n</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mi>S</m:mi>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mrow>
                                                <m:mi>c</m:mi>
                                                <m:msub>
                                                   <m:mi>o</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>n</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mfrac>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGtbGaamOBaiabg2da9KqbaoaalaaabaWaaabmaeaacaWGobWaaSbaaeaacaWGPbaabeaacaWGtbGaamOBamaaBaaabaGaam4yaiaad+gadaWgaaqaaiaadMgaaeqaaaqabaaabaGaamyAaiabg2da9iaaigdaaeaacaWGUbaacqGHris5aaqaamaaqadabaGaamOtamaaBaaabaGaamyAaaqabaaabaGaamyAaiabg2da9iaaigdaaeaacaWGUbaacqGHris5aaaakiaac6caaaa@4821@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
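               <p>The two sensitivity formulas above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used in this study: the function names and the toy table are hypothetical, and the complex sizes N<sub>i </sub>are passed in separately because they are defined from the annotation and need not equal the row sums of T.</p>

```python
def complex_wise_sensitivity(T, complex_sizes):
    """Sn_co[i] = max over clusters j of T[i][j] / N_i."""
    return [max(row) / n_i for row, n_i in zip(T, complex_sizes)]


def clustering_wise_sensitivity(T, complex_sizes):
    """Sn = sum_i N_i * Sn_co[i] / sum_i N_i."""
    sn_co = complex_wise_sensitivity(T, complex_sizes)
    weighted = sum(n_i * s for n_i, s in zip(complex_sizes, sn_co))
    return weighted / sum(complex_sizes)


# Hypothetical toy table: 2 complexes (rows) by 3 clusters (columns).
T = [[3, 1, 0],
     [0, 1, 2]]
N = [4, 3]  # number of proteins in each annotated complex

sn_co = complex_wise_sensitivity(T, N)
sn = clustering_wise_sensitivity(T, N)
```

               <p>For this toy table the complex-wise sensitivities are 3/4 and 2/3, and the clustering-wise sensitivity is (3 + 2)/7, or about 0.714. Because each term is weighted by N<sub>i</sub>, the numerator reduces to the sum of the row maxima of T.</p>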
               <p>Second, we calculate a cluster-wise positive predictive value <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i15"><m:semantics><m:mrow><m:mi>P</m:mi><m:mi>P</m:mi><m:msub><m:mi>V</m:mi><m:mrow><m:mi>c</m:mi><m:msub><m:mi>l</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaamiuaiaadcfacaWGwbWaaSbaaSqaaiaadogacaWGSbWaaSbaaWqaaiaadQgaaeqaaaWcbeaaaaa@3552@</m:annotation></m:semantics></m:math></inline-formula> as the maximal fraction of proteins of cluster j found in the best-matching complex by the formula:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i16">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>P</m:mi>
                              <m:mi>P</m:mi>
                              <m:msub>
                                 <m:mi>V</m:mi>
                                 <m:mrow>
                                    <m:mi>c</m:mi>
                                    <m:msub>
                                       <m:mi>l</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:msubsup>
                                 <m:mrow>
                                    <m:mi>max</m:mi>
                                    <m:mo>&#8289;</m:mo>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>n</m:mi>
                              </m:msubsup>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mfrac bevelled="true">
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mo>.</m:mo>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGqbGaamiuaiaadAfadaWgaaWcbaGaam4yaiaadYgadaWgaaadbaGaamOAaaqabaaaleqaaOGaeyypa0JaciyBaiaacggacaGG4bWaa0baaSqaaiaadMgacqGH9aqpcaaIXaaabaGaamOBaaaakmaabmaajuaGbaWaaSGaaeaacaWGubWaaSbaaeaacaWGPbGaamOAaaqabaaabaGaamivamaaBaaabaGaaiOlaiaadQgaaeqaaaaaaOGaayjkaiaawMcaaaaa@4556@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>where <it>T</it><sub><it>.j </it></sub>is the marginal sum of column j, given by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i17">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>T</m:mi>
                                 <m:mrow>
                                    <m:mo>.</m:mo>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>n</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>T</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGubWaaSbaaSqaaiaac6cacaWGQbaabeaakiabg2da9maaqahabaGaamivamaaBaaaleaacaWGPbGaamOAaaqabaaabaGaamyAaiabg2da9iaaigdaaeaacaWGUbaaniabggHiLdaaaa@3CB4@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>To characterize the general PPV (positive predictive value) of a clustering result as a whole, we compute a clustering-wise PPV as the weighted average of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i15"><m:semantics><m:mrow><m:mi>P</m:mi><m:mi>P</m:mi><m:msub><m:mi>V</m:mi><m:mrow><m:mi>c</m:mi><m:msub><m:mi>l</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8viVeY=Nipec8Eeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaamiuaiaadcfacaWGwbWaaSbaaSqaaiaadogacaWGSbWaaSbaaWqaaiaadQgaaeqaaaWcbeaaaaa@3552@</m:annotation></m:semantics></m:math></inline-formula> over all clusters by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i18">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>P</m:mi>
                              <m:mi>P</m:mi>
                              <m:mi>V</m:mi>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>m</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mo>.</m:mo>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mi>P</m:mi>
                                          <m:mi>P</m:mi>
                                          <m:msub>
                                             <m:mi>V</m:mi>
                                             <m:mrow>
                                                <m:mi>c</m:mi>
                                                <m:msub>
                                                   <m:mi>l</m:mi>
                                                   <m:mi>j</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>m</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mo>.</m:mo>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGqbGaamiuaiaadAfacqGH9aqpjuaGdaWcaaqaamaaqadabaGaamivamaaBaaabaGaaiOlaiaadQgaaeqaaiaadcfacaWGqbGaamOvamaaBaaabaGaam4yaiaadYgadaWgaaqaaiaadQgaaeqaaaqabaaabaGaamOAaiabg2da9iaaigdaaeaacaWGTbaacqGHris5aaqaamaaqadabaGaamivamaaBaaabaGaaiOlaiaadQgaaeqaaaqaaiaadQgacqGH9aqpcaaIXaaabaGaamyBaaGaeyyeIuoaaaaaaa@4A49@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
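               <p>A corresponding sketch for the positive predictive value is given below. It is a hedged illustration, not the authors' code: the function names are hypothetical, and assigning a PPV of 0 to a cluster that shares no protein with any complex is an assumption made here only to avoid division by zero.</p>

```python
def column_sums(T):
    """T_.j: marginal sum of each column (cluster) of T."""
    return [sum(col) for col in zip(*T)]


def cluster_wise_ppv(T):
    """PPV_cl[j] = max over complexes i of T[i][j] / T_.j.

    Assumption: a cluster sharing no protein with any complex (T_.j == 0)
    gets PPV 0.0 rather than raising a division error.
    """
    ppv = []
    for col in zip(*T):
        total = sum(col)
        ppv.append(max(col) / total if total else 0.0)
    return ppv


def clustering_wise_ppv(T):
    """PPV = sum_j T_.j * PPV_cl[j] / sum_j T_.j."""
    sums = column_sums(T)
    ppv_cl = cluster_wise_ppv(T)
    weighted = sum(t * p for t, p in zip(sums, ppv_cl))
    return weighted / sum(sums)


# Hypothetical toy table: 2 complexes (rows) by 3 clusters (columns).
T = [[3, 1, 0],
     [0, 1, 2]]

ppv_cl = cluster_wise_ppv(T)
ppv = clustering_wise_ppv(T)
```

               <p>Here the column sums are 3, 2 and 2, the cluster-wise values are 1, 0.5 and 1, and the clustering-wise PPV is (3 + 1 + 2)/7, or about 0.857. The sketch assumes that at least one cell of T is non-zero.</p>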
               <p>The geometric accuracy (Acc) indicates the tradeoff between sensitivity and predictive value. It is obtained by computing the geometric mean of the Sn and the PPV by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i19">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>A</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>c</m:mi>
                              <m:mo>=</m:mo>
                              <m:msqrt>
                                 <m:mrow>
                                    <m:mi>S</m:mi>
                                    <m:mi>n</m:mi>
                                    <m:mo>&#8901;</m:mo>
                                    <m:mi>P</m:mi>
                                    <m:mi>P</m:mi>
                                    <m:mi>V</m:mi>
                                 </m:mrow>
                              </m:msqrt>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGbbGaam4yaiaadogacqGH9aqpdaGcaaqaaiaadofacaWGUbGaeyyXICTaamiuaiaadcfacaWGwbaaleqaaaaa@3A94@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
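               <p>The tradeoff captured by the geometric mean can be illustrated with a short sketch. The function name and the input values are hypothetical and are not results from this study.</p>

```python
import math


def geometric_accuracy(sn, ppv):
    """Acc = sqrt(Sn * PPV), the geometric mean of the two scores."""
    return math.sqrt(sn * ppv)


# Hypothetical values chosen only to illustrate the tradeoff.
balanced = geometric_accuracy(0.8, 0.8)
lopsided = geometric_accuracy(1.0, 0.1)
```

               <p>A clustering that places all proteins in one large cluster would reach a sensitivity of 1 but a low PPV. In the second call the geometric mean is about 0.316, well below the arithmetic mean of 0.55, so a lopsided result is penalized.</p>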
            </sec>
            <sec>
               <st>
                  <p>Separation</p>
               </st>
               <p>From the contingency table, we derive relative frequencies with respect to the marginal sums, either per row:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i20">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>F</m:mi>
                                 <m:mrow>
                                    <m:mi>r</m:mi>
                                    <m:mi>o</m:mi>
                                    <m:msub>
                                       <m:mi>w</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>T</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>m</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGgbWaaSbaaSqaaiaadkhacaWGVbGaam4DamaaBaaameaacaWGPbGaamOAaaqabaaaleqaaOGaeyypa0tcfa4aaSaaaeaacaWGubWaaSbaaeaacaWGPbGaamOAaaqabaaabaWaaabmaeaacaWGubWaaSbaaeaacaWGPbGaamOAaaqabaaabaGaamOAaiabg2da9iaaigdaaeaacaWGTbaacqGHris5aaaaaaa@431E@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p>or per column:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i21">
                        <m:semantics>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>F</m:mi>
                                 <m:mrow>
                                    <m:mi>c</m:mi>
                                    <m:mi>o</m:mi>
                                    <m:msub>
                                       <m:mi>l</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>T</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>n</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>T</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGgbWaaSbaaSqaaiaadogacaWGVbGaamiBamaaBaaameaacaWGPbGaamOAaaqabaaaleqaaOGaeyypa0tcfa4aaSaaaeaacaWGubWaaSbaaeaacaWGPbGaamOAaaqabaaabaWaaabmaeaacaWGubWaaSbaaeaacaWGPbGaamOAaaqabaaabaGaamyAaiabg2da9iaaigdaaeaacaWGUbaacqGHris5aaaaaaa@4304@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
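               <p>The row-wise and column-wise relative frequencies can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and returning 0 for a row or column whose marginal sum is 0 is a choice made here, not one stated in the text.</p>

```python
def row_frequencies(T):
    """F_row[i][j] = T[i][j] / sum over j of T[i][j]."""
    out = []
    for row in T:
        total = sum(row)
        # Assumption: an all-zero row yields zeros instead of dividing by 0.
        out.append([t / total if total else 0.0 for t in row])
    return out


def column_frequencies(T):
    """F_col[i][j] = T[i][j] / sum over i of T[i][j]."""
    col_totals = [sum(col) for col in zip(*T)]
    # Assumption: an all-zero column yields zeros instead of dividing by 0.
    return [[t / c if c else 0.0 for t, c in zip(row, col_totals)]
            for row in T]


# Hypothetical toy table: 2 complexes (rows) by 3 clusters (columns).
T = [[3, 1, 0],
     [0, 1, 2]]

F_row = row_frequencies(T)
F_col = column_frequencies(T)
```

               <p>For this toy table each row of F_row and each column of F_col sums to 1, for example F_row[0] = [0.75, 0.25, 0.0] and the first column of F_col is [1.0, 0.0].</p>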
               <p>We then define the separation as the product of column-wise and row-wise frequencies by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2007-8-12-r271-i22">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>S</m:mi>
                              <m:mi>e</m:mi>
                              <m:msub>
                                 <m:mi>p</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:msub>
                                 <m:mi>F</m:mi>
                                 <m:mrow>
                                    <m:mi>c</m:mi>
                                    <m:mi>o</m:mi>
                                    <m:msub>
                                       <m:mi>l</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>&#8901;</m:mo>
                              <m:msub>
                                 <m:mi>F</m:mi>
                                 <m:mrow>
                                    <m:mi>r</m:mi>
                                    <m:mi>o</m:mi>
                                    <m:msub>
                                       <m:mi>w</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGtbGaamyzaiaadchadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeyypa0JaamOramaaBaaaleaacaWGJbGaam4BaiaadYgadaWgaaadbaGaamyAaiaadQgaaeqaaaWcbeaakiabgwSixlaadAeadaWgaaWcbaGaamOCaiaad+gacaWG3bWaaSbaaWqaaiaadMgacaWGQbaabeaaaSqabaaaaa@4433@</m:annotation>
  