<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2009-10-9-r97</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Gene networks in <it>Drosophila melanogaster</it>: integrating experimental data to predict gene function</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Costello</snm>
               <mi>C</mi>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jccostel@indiana.edu</email>
            </au>
            <au id="A2">
               <snm>Dalkilic</snm>
               <mi>M</mi>
               <fnm>Mehmet</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>dalkilic@indiana.edu</email>
            </au>
            <au id="A3">
               <snm>Beason</snm>
               <mi>M</mi>
               <fnm>Scott</fnm>
               <insr iid="I1"/>
               <email>smbeason@indiana.edu</email>
            </au>
            <au id="A4">
               <snm>Gehlhausen</snm>
               <mi>R</mi>
               <fnm>Jeff</fnm>
               <insr iid="I1"/>
               <email>jrgehlha@indiana.edu</email>
            </au>
            <au id="A5">
               <snm>Patwardhan</snm>
               <fnm>Rupali</fnm>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>rpatward@u.washington.edu</email>
            </au>
            <au id="A6">
               <snm>Middha</snm>
               <fnm>Sumit</fnm>
               <insr iid="I3"/>
               <insr iid="I5"/>
               <email>middha.sumit@mayo.edu</email>
            </au>
            <au id="A7">
               <snm>Eads</snm>
               <mi>D</mi>
               <fnm>Brian</fnm>
               <insr iid="I2"/>
               <email>bdeads@indiana.edu</email>
            </au>
            <au ca="yes" id="A8">
               <snm>Andrews</snm>
               <mi>R</mi>
               <fnm>Justen</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jandrew@bio.indiana.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biology, Indiana University, E. Third St, Bloomington, Indiana 47405, USA</p>
            </ins>
            <ins id="I3">
               <p>Center for Genomics and Bioinformatics, Indiana University, E. Third St, Bloomington, Indiana 47405, USA</p>
            </ins>
            <ins id="I4">
               <p>Current address: Department of Genome Sciences, University of Washington, NE Pacific St, Seattle, Washington 98195-5065, USA</p>
            </ins>
            <ins id="I5">
               <p>Current address: Bioinformatics Core, Mayo Clinic, First St SW, Rochester, Minnesota 55905, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>9</issue>
         <fpage>R97</fpage>
         <url>http://genomebiology.com/2009/10/9/R97</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2009-10-9-r97</pubid>
               <pubid idtype="pmpid">19758432</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>6</month>
               <year>2009</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>17</day>
               <month>8</month>
               <year>2009</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>16</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Andrews et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p><it>Drosophila</it> genetic interaction network</p>
      </shorttitle>
      <shortabs>
         <p>The first computational interaction network built from <it>Drosophila melanogaster</it> protein-protein and genetic interaction data allows the functional annotation of orphan genes and reveals clusters of functionally related genes.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Discovering the functions of all genes is a central goal of contemporary biomedical research. Despite considerable effort, we are still far from achieving this goal in any metazoan organism. Collectively, the growing body of high-throughput functional genomics data provides evidence of gene function, but remains difficult to interpret.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We constructed the first network of functional relationships for <it>Drosophila melanogaster </it>by integrating most of the available, comprehensive sets of genetic interaction, protein-protein interaction, and microarray expression data. The complete integrated network covers 85% of the currently known genes; we refined it to a high-confidence network that includes 20,000 functional relationships among 5,021 genes. An analysis of the network revealed a remarkable concordance with prior knowledge. Using the network, we were able to infer a set of high-confidence Gene Ontology biological process annotations for 483 of the roughly 5,000 previously unannotated genes. We also show that this approach is a means of inferring annotations for a class of genes that cannot be annotated based solely on sequence similarity. Lastly, we demonstrate the utility of the network by reanalyzing gene expression data both to discover clusters of coregulated genes and to compile a list of candidate genes related to specific biological processes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Here we present the first genome-wide functional gene network in <it>D. melanogaster</it>. The network enables the exploration, mining, and reanalysis of experimental data, as well as the interpretation of new data. The inferred annotations provide testable hypotheses about previously uncharacterized genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Understanding how a metazoan organism functions requires knowledge of the biochemical, cellular, and overall phenotypic effects of all genes. Despite considerable effort, direct experimental evidence supporting the participation of genes in biological process(es) exists for only a modest proportion of the full complement of metazoan genes (as reflected by Gene Ontology (GO) annotations <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>; see Materials and methods section for details). For instance, of the nearly 29 <it>K </it>(<it>K </it>= 1,000) genes in mouse, there is experimental evidence supporting the functional annotation of less than half, or approximately 12 <it>K </it>genes. Similarly, for <it>Caenorhabditis elegans</it>, experimental evidence exists for about a third (approximately 7.5 <it>K</it>) of its approximately 20 <it>K </it>genes. Even the most experimentally amenable and well-characterized eukaryotic organism, <it>Saccharomyces cerevisiae</it>, though not a metazoan, still has over 1 <it>K </it>of its 6 <it>K </it>genes lacking functional annotation <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>New and improving synthetic and analytic genome-scale technologies can help us determine the biological process(es) of unannotated genes, as well as provide new insight into annotated genes. These approaches include yeast two-hybrid (Y2H) screens to detect physically interacting proteins, expression profiling to detect transcript coexpression, modifier screens to identify genetic interactions, RNA interference screens to measure the genetic effects of gene knockdowns, genome tiling path arrays and next-generation sequencing to discover transcribed genomic elements, and ChIP-chip and ChIP-seq to identify protein-DNA interactions. While these assays have the advantage of being high-throughput, distinguishing the biologically relevant relationships from noise within a single experiment is not a straightforward task. This difficulty, together with the sheer volume of data the assays produce, makes interpretation challenging.</p>
         <p>Methods to derive functional annotation from the available corpora of data have been developed <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp> and those that focus on data integration are among the more successful <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Integrating different types of genomics data has been shown to reveal relationships between genes not distinguishable within single datasets <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. In the context of genomics data, the overarching aim of an integrative model is to distill the available data down to a single value indicating how likely a gene pair is to be functionally related. These methods, pioneered by Troyanskaya <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, Jansen <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, and Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, were heavily based on Bayesian networks to bring together weighted gene-gene relationships across heterogeneous datasets. Here, inspired by this previous work, a functional relationship between genes represents the likelihood that two genes are involved in the same biological process. 
Integrative models have been successfully used to construct molecular networks (that is, transcriptional regulatory and metabolic networks) <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, predict genetic interactions in yeast <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, predict phenotypic effects in worm <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, provide new gene candidates in human disease <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, and make novel predictions of gene function <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B12">12</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. The number of organisms with well-annotated genomes and sufficient experimental data to build integrated networks is limited. Thus, networks constructed from genome-wide data have been restricted to bacteria <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B25">25</abbr></abbrgrp>, <it>S. cerevisiae </it><abbrgrp><abbr bid="B5">5</abbr><abbr bid="B12">12</abbr><abbr bid="B26">26</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, <it>C. elegans </it><abbrgrp><abbr bid="B16">16</abbr><abbr bid="B30">30</abbr></abbrgrp>, mouse <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, and human <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B27">27</abbr></abbrgrp>. <it>Drosophila </it>is among the best-annotated organisms, and the amount of experimental and computational data for it is on par with worm, yeast, and mouse <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. 
Although there exist repositories for flies that provide sophisticated query capability, namely FlyBase <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and FlyMine <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, as well as ongoing attempts at mining disparate sources of fly data <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>, an integrated system that can be interrogated <it>ad hoc </it>to easily deal with large sets of <it>Drosophila </it>genes has not been available until now.</p>
         <p>As one of the preeminent model organisms, <it>Drosophila </it>has been the object of study for more than a century <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. This research has not only increased our understanding of the organism itself <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>, but more importantly increased our knowledge of molecular mechanisms in biology in its broadest sense, particularly in the fields of genetics, development, evolution, and molecular biology. <it>Drosophila </it>has the richest set of sequenced genomes for a metazoan genus <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> and, along with <it>C. elegans </it>and human, will have the most comprehensive inventory of metazoan genomic elements stemming from the modENCODE <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> and ENCODE projects <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Despite these resources, the biological process(es) of many genes remain unknown. At the time of this study (v5.3 of the <it>D. melanogaster </it>genome <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>) there is direct experimental evidence supporting the biological process GO annotations (hereafter referred to as GO:BP) for less than half (approximately 42%) of the more than 15 <it>K </it>protein-coding genes (counted from curator reviewed GO evidence codes). These annotations are mostly based on genetic evidence (that is, mutant phenotypes, genetic interactions, and RNA interference knockdown phenotypes). In addition to the genes with experimental evidence, roughly 26% of the genes have GO:BP terms that are inferred from electronic annotation methods (inferred from electronic annotation (IEA) GO evidence code). Considering all the available methods to determine in which biological process(es) a gene participates, we underscore the fact that nearly one-third of <it>Drosophila </it>protein-coding genes (> 4.6 <it>K</it>) remain unannotated.</p>
         <p>In this study, we bring together experimental data to build the first integrated functional gene networks in <it>Drosophila</it>. We focus specifically on building functional relationships between pairs of genes that are likely to participate in the same biological process and are supported by experimental evidence. We adapt the approach developed by Marcotte and colleagues <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B16">16</abbr><abbr bid="B28">28</abbr></abbrgrp> to integrate three experimental classes of data: genetic interactions, protein-protein interactions, and microarray gene expression. We demonstrate that the integrated networks perform well at recapitulating known functional relationships and outperform networks built exclusively from individual types of data (for example, microarray data alone). We then use the functional relationships in the network to predict GO:BP annotations for unannotated genes using the Markov random field (MRF) method <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and demonstrate through tenfold cross-validation that this approach performs well at predicting annotations. We use this method to infer high-confidence GO:BP terms for 483 uncharacterized genes, and evaluate these predictions with respect to the available independent evidence. Finally, we use the constructed network to reanalyze gene expression data related to nutritional deprivation. We show that the network can be used to discover clusters of functionally related genes among genes identified as differentially expressed.</p>
         <p>All data are made available through supplemental material <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Types of data and datasets</p>
            </st>
            <p>This study includes three classes of data: genetic interactions (GIs); protein-protein interactions (PPIs); and microarray (MA) expression data. All reported GIs were downloaded from FlyBase <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and each GI was weighted equally. PPIs were extracted from the following databases: BIND <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, DIP <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, DroID <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, BioGRID <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, and IntAct <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. The union of the PPIs across these databases was taken and separated by assay type into three classes: direct assay (that is, co-immunoprecipitation, biochemical assay), high-confidence Y2H (high-confidence as defined by Giot <it>et al</it>. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>), and positive Y2H. A total of 18 published MA experiments were used (see Figure S1 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>). These 18 experiments can be divided into individual subcomponents, often reflecting several time-course studies done under the umbrella of one published experiment. Thus, these 18 experiments were broken into 34 individual datasets. The 34 datasets were evaluated using log-likelihood scores (<it>LLS</it>) and several other filters detailed in the 'Calculating the likelihood that gene pairs participate in a common biological process' and Materials and methods sections. From these results, we determined that 20 of the 34 datasets provided <it>LLS</it>s meeting our evaluation criteria; therefore, only these 20 MA datasets were included in the construction of the integrated networks. In total, 24 datasets were used in this study: all GIs (treated as a single dataset), three classes of PPIs, and 20 MA datasets (see Table S4 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> for the number of conditions per MA dataset). 
The datasets are summarized in Table <tblr tid="T1">1</tblr>, and further details of the acquisition and processing of these datasets are provided in the Materials and methods section.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Datasets</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Source</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Dataset</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Pass filter?</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Relationships</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Genetic interactions</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>FlyBase</p>
                     </c>
                     <c ca="left">
                        <p>All reported GIs</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>2,878</p>
                     </c>
                     <c ca="center">
                        <p>6,941</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Protein-protein interactions</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>BIND, DIP, IntAct</p>
                     </c>
                     <c ca="left">
                        <p>Direct assay</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>935</p>
                     </c>
                     <c ca="center">
                        <p>1,234</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>DroID, BioGRID</p>
                     </c>
                     <c ca="left">
                        <p>High-confidence Y2H</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>4,543</p>
                     </c>
                     <c ca="center">
                        <p>4,590</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Positive Y2H</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>6,183</p>
                     </c>
                     <c ca="center">
                        <p>19,584</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Microarray</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Hooper <it>et al</it>. <abbrgrp><abbr bid="B100">100</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>10,460</p>
                     </c>
                     <c ca="center">
                        <p>3,289,275</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Chintapalli <it>et al</it>. <abbrgrp><abbr bid="B102">102</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>10,054</p>
                     </c>
                     <c ca="center">
                        <p>3,618,216</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Parisi <it>et al</it>. <abbrgrp><abbr bid="B92">92</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>9,922</p>
                     </c>
                     <c ca="center">
                        <p>5,656,854</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Edwards <it>et al</it>. <abbrgrp><abbr bid="B99">99</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>Line1</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>8,403</p>
                     </c>
                     <c ca="center">
                        <p>8,072,394</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Line2</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>8,296</p>
                     </c>
                     <c ca="center">
                        <p>8,118,665</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Altenhein <it>et al</it>. <abbrgrp><abbr bid="B98">98</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>8,341</p>
                     </c>
                     <c ca="center">
                        <p>1,030,457</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Gof</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Lof</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Hild <it>et al</it>. <abbrgrp><abbr bid="B97">97</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>8,214</p>
                     </c>
                     <c ca="center">
                        <p>677,746</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Qin <it>et al</it>. <abbrgrp><abbr bid="B94">94</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>6,734</p>
                     </c>
                     <c ca="center">
                        <p>4,187,496</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Tomancak <it>et al</it>. <abbrgrp><abbr bid="B103">103</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>6,288</p>
                     </c>
                     <c ca="center">
                        <p>2,626,310</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Magalhaes <it>et al</it>. <abbrgrp><abbr bid="B59">59</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>5,718</p>
                     </c>
                     <c ca="center">
                        <p>1,102,629</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>De Gregorio <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>5,698</p>
                     </c>
                     <c ca="center">
                        <p>1,561,265</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,920</p>
                     </c>
                     <c ca="center">
                        <p>237,361</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Fungus</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Spaetzle</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Relish</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p><it>Spaetzle </it>&amp;<it>relish</it></p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Sandmann <it>et al</it>. <abbrgrp><abbr bid="B101">101</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>5,474</p>
                     </c>
                     <c ca="center">
                        <p>1,238,924</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Arbeitman <it>et al</it>. <abbrgrp><abbr bid="B61">61</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,354</p>
                     </c>
                     <c ca="center">
                        <p>1,769,479</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Embryo</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,126</p>
                     </c>
                     <c ca="center">
                        <p>1,271,286</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Larva</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Pupal</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Adult male</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Adult female</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Sorensen <it>et al</it>. <abbrgrp><abbr bid="B96">96</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>Heat</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,219</p>
                     </c>
                     <c ca="center">
                        <p>690,181</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>No heat</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,083</p>
                     </c>
                     <c ca="center">
                        <p>701,546</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Beckstead <it>et al</it>. <abbrgrp><abbr bid="B95">95</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>Third instar</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>4,015</p>
                     </c>
                     <c ca="center">
                        <p>1,000,994</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Estrada <it>et al</it>. <abbrgrp><abbr bid="B93">93</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>2,978</p>
                     </c>
                     <c ca="center">
                        <p>657,929</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Wertheim <it>et al</it>. <abbrgrp><abbr bid="B58">58</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>Yes</p>
                     </c>
                     <c ca="center">
                        <p>2,280</p>
                     </c>
                     <c ca="center">
                        <p>551,684</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Beckstead <it>et al</it>. <abbrgrp><abbr bid="B95">95</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Ecr</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Li <it>et al</it>. <abbrgrp><abbr bid="B91">91</abbr></abbrgrp></p>
                     </c>
                     <c ca="left">
                        <p>All conditions</p>
                     </c>
                     <c ca="center">
                        <p>No</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>List of all datasets used in this study. The unit of data which we call a dataset is contained in the 'dataset' column. The filtering criteria apply to the microarray data as described in the Materials and methods section. The number of unique genes and functional relationships that a dataset contributes to the integrated network are listed. A '0' indicates that the dataset was not used for integration. There are two examples, Edwards <it>et al</it>. <abbrgrp><abbr bid="B99">99</abbr></abbrgrp> (all conditions) and Sorensen <it>et al</it>. <abbrgrp><abbr bid="B96">96</abbr></abbrgrp> (all conditions), where the dataset passed the filter but was not used in the integration. This is because all components in these experiments passed the filter criteria, but to remove redundant data, the subcomponent datasets were taken in favor of the dataset defined over the full set of conditions.</p>
               </tblfn>
            </tbl>
            <p>We restricted our use of the GO to the category of biological process (GO:BP). Unless specified, we also required any GO:BP annotations to be examined by a human curator as described on the GO website <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>; therefore, the GO evidence codes of IEA, ND (No biological data available), and NR (Not recorded) were removed. Please refer to the Materials and methods section for details on how annotations are handled given the structure of the GO.</p>
         </sec>
         <sec>
            <st>
               <p>Shared biological processes across datasets</p>
            </st>
            <p>Understanding the degree of overlap in biological processes among the datasets is integral to determining how the information contained in each dataset should be integrated. We explored this overlap by measuring how well a dataset connects genes involved in the same annotated GO:BP. The GI and PPI datasets are each a compendium of all reported interactions, many from largely unbiased screens, that is, Y2H and modifier screens; therefore, we would expect these datasets to provide links between genes across a diverse range of biological processes. On the other hand, individual MA datasets measure gene expression across distinct biological conditions such as time, space, genotype, or stress/treatment. Therefore, we would expect that within each MA dataset, genes with correlated expression profiles will reflect the biological processes that are affected under the experimental conditions. For instance, we expect that genes involved in immune response will show expression changes upon infection with bacteria or fungus, as studied in De Gregorio <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. In order to evaluate the datasets, we first counted the number of gene pairs that were co-annotated with the same GO:BP term. This count was done for each dataset where gene pairs were measured as: statistically significant Pearson correlation coefficients for MAs; all GIs; or all PPIs. The results of performing this test for all GO:BP terms across the GI, PPI (direct assay, high-confidence Y2H, and Y2H are combined in this case), and 20 MA datasets are shown in Figure <figr fid="F1">1</figr> (see Additional data file 1 for the data used to create Figure <figr fid="F1">1</figr>).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Significant GO:BP terms across datasets.</p>
               </caption>
               <text>
                  <p>Significant GO:BP terms across datasets. Visualization of how well a dataset connects genes annotated with the same GO:BP term. The dataset names are listed on the left (see Table 1 for citations) and GO:BP terms are listed across the top. All datasets shown are used in the weighted sum (<it>WS</it>) integration. From black to red represents the least significant to the most significant GO:BP terms within a dataset as measured through statistically significant coherence (see the Materials and methods section). Both GO:BP terms and datasets were hierarchically clustered and visualized using TM4 MEV <abbrgrp><abbr bid="B112">112</abbr></abbrgrp>. The colored blocks on the top of the figure highlight similar GO:BP terms selected to show different patterns of significance across the datasets. Marked in brown are oxidative metabolism GO:BP terms, which are significant in most MA datasets but absent from the genetic interaction and protein interaction datasets. Marked in green are cell cycle GO:BP terms, which are well represented across most datasets. Marked in yellow are development and neurogenesis GO:BP terms, which are overrepresented in the Magalhaes <it>et al</it>. <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> dataset (a microarray experiment on axon guidance). Marked in purple are immune response related GO:BP terms, which are well represented in the De Gregorio <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> and Wertheim <it>et al</it>. <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> datasets, both of which tested gene expression of immune response.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-1"/>
            </fig>
            <p>A large number of statistically significant GO:BP terms were revealed across all the datasets, with some terms being nearly ubiquitously significant. In other words, genes annotated with a particular GO:BP term were much more highly connected than expected at random for almost all datasets. The example of cell cycle related GO:BP terms is marked in green in Figure <figr fid="F1">1</figr>. This is a specific example where functional connections between cell cycle-related genes can be strengthened by looking across multiple datasets. Additionally, there are processes that are only found in MA datasets and not in GIs or PPIs; for example, processes involved in oxidative metabolism, namely electron transport and oxidative phosphorylation (Figure <figr fid="F1">1</figr>, marked in brown). Conversely, we also see GO:BP terms that are uniquely significant to a particular dataset. For instance, De Gregorio <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> and Wertheim <it>et al</it>. <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> performed MA experiments to explore the gene expression responses of flies upon infection with bacteria and fungus, and parasitoid wasps, respectively, and we see that these two datasets are highly significant for immune response GO:BP annotations (Figure <figr fid="F1">1</figr>, marked in purple), while the other MA datasets are largely not well-represented in this class of GO:BP terms. Similarly, Magalhaes <it>et al</it>. <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> sampled gene expression related to axon guidance and we see that this dataset is highly significant for developmental biological processes, particularly neurogenesis (Figure <figr fid="F1">1</figr>, marked in yellow). Overall, the GIs and PPIs have the greatest proportion of significant GO:BP terms, while MA datasets vary in the number and kind of GO:BP terms that are statistically significant. 
Finally, while some GO:BP terms tend to be common to several of the MA datasets, it is clear that none of the MA datasets provides fully redundant information. This is to be expected given the wide range of biological conditions surveyed in the experiments, and indicates that the data are not strongly biased towards a limited range of biological processes.</p>
            <p>These results show that no individual dataset fully represents all biological processes and we see that the datasets both complement and supplement each other, suggesting that integration can be used to more accurately group genes that share biological processes.</p>
         </sec>
         <sec>
            <st>
               <p>Calculating the likelihood that gene pairs participate in a common biological process</p>
            </st>
            <p>While the GI, PPI, and MA data each provide evidence for gene pair involvement in a common biological process, each type of data has a different measure. GIs and PPIs are reported as Boolean, while the correlations between gene expression profiles in MA experiments are continuous (Pearson correlation coefficient [-1, 1]). We utilized the <it>LLS </it>approach, developed by Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>, to convert the gene pair measures from each dataset to a common scale. The <it>LLS </it>(Equation 2) reflects how well the relationships in a given dataset agree with GO:BP annotations (see Materials and methods section for details). This approach achieves two important objectives. First, since we are calculating the <it>LLS </it>with respect to GO:BP annotation, this score reflects the likelihood that any two genes connected within a dataset share a common biological process. Second, because the <it>LLS</it>s for all the classes of data are calculated with respect to the same benchmark set of GO:BP terms, each dataset can now be directly compared.</p>
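            <p>As a minimal sketch of this idea, the following Python function computes a log-likelihood score for one dataset against a toy benchmark. It assumes that the <it>LLS </it>takes the log odds-ratio form used by Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>, that is, the odds that linked gene pairs share a GO:BP term divided by the prior odds over all annotated pairs; Equation 2 in the Materials and methods section remains the authoritative definition. The function name, the gene identifiers, and the annotation dictionary are hypothetical.</p>

```python
import math


def log_likelihood_score(linked_pairs, annotations):
    """Assumed form: LLS = ln( (P(L|E) / P(not L|E)) / (P(L) / P(not L)) ).

    linked_pairs: gene pairs connected by one dataset (the evidence E).
    annotations:  hypothetical benchmark, gene -> set of GO:BP terms.
    """
    genes = sorted(annotations)

    def shares_term(a, b):
        # True when the two genes are co-annotated with at least one term.
        return bool(annotations[a].intersection(annotations[b]))

    # Prior odds: co-annotated versus not co-annotated, over all annotated pairs.
    all_pairs = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]
    prior_pos = sum(shares_term(a, b) for a, b in all_pairs)
    prior_neg = len(all_pairs) - prior_pos

    # Odds among pairs linked by the dataset (both genes must be annotated).
    scored = [(a, b) for a, b in linked_pairs
              if a in annotations and b in annotations]
    pos = sum(shares_term(a, b) for a, b in scored)
    neg = len(scored) - pos

    # Note: this sketch does not guard against neg == 0 or prior_pos == 0.
    return math.log((pos / neg) / (prior_pos / prior_neg))


# Toy benchmark with made-up genes and terms.
toy_annotations = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle"},
    "g3": {"immune response"},
    "g4": {"immune response"},
}
toy_links = [("g1", "g2"), ("g3", "g4"), ("g1", "g3")]
toy_lls = log_likelihood_score(toy_links, toy_annotations)
```

            <p>A positive score indicates that the dataset links co-annotated genes more often than expected from the benchmark alone; a score near zero indicates no enrichment.</p>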
            <p><it>LLS</it>s were calculated for all 24 datasets. We treated all reported GIs as Boolean and then calculated a single <it>LLS </it>of 2.661 for the entire dataset. Although the PPI data are reported as Boolean interactions, assay types differ in reliability <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. We expect direct assay (that is, co-immunoprecipitation, biochemical assay) to be the most reliable, followed by high-confidence Y2H (as defined in Giot <it>et al</it>. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>), then Y2H; therefore, we calculated separate <it>LLS</it>s for each class. Our expectations were borne out with a <it>LLS </it>of 2.389 for direct assay, 1.045 for high-confidence Y2H, and 0.630 for Y2H. As mentioned, the similarity measures for MA data are continuous correlation coefficients. We expect that gene pairs with the most similar expression profiles will have the highest likelihood of sharing a biological process, and the gene pairs with the least similar expression profiles (coefficient of 0) will have the lowest likelihood of sharing a biological process. Therefore, for each MA dataset, we rank ordered the gene pairs with statistically significant correlation coefficients, divided the ranked list into sequential bins of one thousand, then calculated the <it>LLS </it>for each bin. As expected, most MA datasets showed a trend towards increasing <it>LLS </it>as correlation values increased. An example can be seen in Figure <figr fid="F2">2</figr>, which reflects this calculation for the Arbeitman <it>et al</it>. <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> fly life-cycle timecourse (see Figure S1 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> for all additional plots). 
Interestingly, the most positively correlated and statistically significant gene pairs, in the interval [0.3,1], show a trend of increasing <it>LLS </it>with increasing correlation, while the most negatively correlated and statistically significant gene pairs, in the interval [-1,-0.3] (absolute value in Figure <figr fid="F2">2</figr>), show a trend of flat to decreasing <it>LLS </it>with more inversely correlated gene pairs. This trend was observed for all the MA datasets. Given the poor performance reflected by the <it>LLS</it>s, we removed negatively correlated gene expression profiles from the integration process and only considered positively correlated MA gene pairs. For each of the <it>LLS </it>versus positive correlation plots, a polynomial regression was calculated to model the overall trend (blue curve in Figure <figr fid="F2">2</figr>). All pairwise correlation values were then assigned a <it>LLS </it>computed from the regressed curve. <it>LLS</it>s across all microarray datasets range from 0.1 to 2.3. The <it>LLS</it>s calculated for GI, PPI, and MA data indicate that each of these types of data provides evidence for GO:BP annotation shared between gene pairs. We therefore aimed to utilize the <it>LLS</it>s with the expectation that, by integrating across all data, we should observe stronger evidence of shared biological processes between two genes than can be detected in individual types of data.</p>
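            <p>The ranking and binning step for a single MA dataset can be sketched as follows. This is an illustration only: it assumes that the input already contains only statistically significant correlations, it keeps the positively correlated pairs, and it leaves out both the per-bin <it>LLS </it>calculation and the polynomial fit. The function name and the tuple layout are hypothetical.</p>

```python
def bin_by_correlation(scored_pairs, bin_size=1000):
    """Rank positively correlated gene pairs and split them into sequential bins.

    scored_pairs: iterable of (gene_a, gene_b, r) tuples, assumed to be
                  pre-filtered for statistical significance.
    Returns a list of (mean_r, pairs) tuples, highest correlations first.
    """
    # Negatively correlated pairs are dropped, as in the integration step.
    positive = [p for p in scored_pairs if p[2] > 0]
    ranked = sorted(positive, key=lambda p: p[2], reverse=True)

    bins = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]  # the last bin may be smaller
        mean_r = sum(p[2] for p in chunk) / len(chunk)
        bins.append((mean_r, [(a, b) for a, b, _ in chunk]))
    return bins
```

            <p>Each bin's list of pairs could then be scored against the GO:BP benchmark, and a curve fitted to the resulting points of mean correlation against <it>LLS</it>.</p>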
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Log-likelihood score calculated for a microarray dataset</p>
               </caption>
               <text>
                  <p>Log-likelihood score calculated for a microarray dataset. The log-likelihood score (<it>LLS</it>) compared to the significant correlation coefficients for the Arbeitman <it>et al</it>. <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> microarray dataset. Statistically significant correlation coefficients are rank ordered and separated into bins of 1,000 gene pairs. For example, the right-most black dot represents the top 1,000 ranked gene pairs by correlation coefficient. The black dots are positively correlated gene pairs, while the red circles are the absolute value of the negatively correlated expression profiles. The blue line is the polynomial model fit to the data and used to transform all correlation coefficients to <it>LLS</it>s.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Integrating the data to construct functional gene networks</p>
            </st>
            <p>Our analysis of the overlap between datasets indicated that, for most biological processes, multiple datasets provided supporting information, but no single dataset provided the preponderance of information (see the 'Shared biological processes across datasets' section and Figure <figr fid="F1">1</figr>). Based on this observation, we expected that the weighted sum (<it>WS</it>) approach, which has been shown to be effective in integrating data in yeast <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>, worm <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, and mouse <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, would be equally effective for integrating fly data. To test this, we constructed integrated functional networks using the <it>WS </it>method developed by Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>. The <it>WS </it>approach mathematically integrates (through weighting) the <it>LLS</it>s for gene pairs across the multiple datasets into one measure reflecting our confidence that a gene pair is functionally related.</p>
            <p>The <it>WS </it>calculation was performed by first rank ordering the <it>LLS</it>s for a gene pair, then summing the scores (Equation 4). Included in the <it>WS </it>calculation is the parameter, <it>M</it>, that down-weights subsequently ranked <it>LLS</it>s for a gene pair, where <it>M </it>&#8805; 1. Increasing the value of <it>M </it>results in greater emphasis being placed on the datasets that provide the greatest likelihood that the members of a gene pair are functionally related. We evaluated the performance of networks constructed with a range of values for the <it>M </it>parameter (from 1 to approaching infinity (<it>M </it>&#8594; &#8734;), where <it>M </it>&#8594; &#8734; effectively only considers the greatest <it>LLS </it>for a gene pair). We also tested the na&#239;ve approach of summing across all <it>LLS</it>s. By varying the values of <it>M</it>, we assessed the network's performance on tasks described in more detail below to search for an optimal <it>M </it>value.</p>
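            <p>The behaviour of the <it>M </it>parameter can be illustrated with a short Python sketch. The exact form of Equation 4 is not reproduced in this section, so the sketch assumes one plausible rank-based down-weighting: the top-ranked <it>LLS </it>is taken in full and the score at rank <it>i </it>is divided by <it>M </it>times <it>i</it>. The function name is hypothetical, and the example gene pair, supported by the GI, direct assay PPI, and Y2H scores quoted above, is invented for illustration.</p>

```python
def weighted_sum(lls_scores, m):
    """Combine the LLSs of one gene pair into a single weighted sum.

    Assumed form (not confirmed as the paper's Equation 4):
        WS = L_0 + sum over i >= 1 of L_i / (m * i)
    where L_0 is the largest LLS and L_i the i-th ranked score after it.
    """
    ranked = sorted(lls_scores, reverse=True)
    if not ranked:
        return 0.0
    total = ranked[0]  # the strongest evidence counts in full
    for i, score in enumerate(ranked[1:], start=1):
        total += score / (m * i)  # weaker evidence is down-weighted
    return total


# Hypothetical gene pair with GI, direct assay PPI and Y2H support.
example_scores = [0.630, 2.661, 2.389]
ws_optimized = weighted_sum(example_scores, 1.8)
ws_top_only = weighted_sum(example_scores, float("inf"))  # only the top LLS remains
ws_naive = sum(example_scores)  # the naive unweighted sum
```

            <p>Under this assumed form, letting <it>M </it>grow without bound leaves only the greatest <it>LLS</it>, whereas the na&#239;ve sum treats every dataset equally regardless of rank.</p>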
            <p>We additionally evaluated the performance of integrated networks with varying network sizes (number of edges in a network). We were interested in the networks' ability to recapitulate known functional relationships between genes reported in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways database <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. We selected the KEGG pathway database for this evaluation since, despite being biased towards biochemical pathways and not entirely independent of GO annotations, it is nevertheless the most appropriate, large, and high-confidence set of annotated functional relationships available for <it>Drosophila</it>. Networks were constructed by rank ordering the <it>WS </it>scores for all gene pairs and then progressively lowering the threshold on the <it>WS </it>score to add edges to the network. Figure <figr fid="F3">3</figr> shows the performance of the <it>WS </it>integration in relation to network size as measured through KEGG pathway coherence, a measure of how tightly a set of genes is connected in a network (see Materials and methods section for details). The dots in Figure <figr fid="F3">3</figr> represent the average coherence values measured over network size intervals, while the solid lines represent the average coherence values minus the coherence of random sets of genes. The solid lines thus represent the true gain in coherence with increasing network size that is not due to noise. Two important trends are evident. First, the networks constructed with 1 &lt;<it>M </it>&lt;&lt; &#8734; are more effective at constructing coherent networks than the na&#239;ve approach or where <it>M </it>&#8594; &#8734;. Further evaluation revealed an optimized <it>M </it>parameter of <it>M </it>= 1.8. Second, Figure <figr fid="F3">3</figr> shows two points at the network sizes of 20 <it>K </it>and 200 <it>K </it>edges where the slope of the lines flattens. 
These points reflect the two network sizes that show the greatest KEGG pathway coherence relative to network size. We have therefore focused further analysis on 20 <it>K </it>and 200 <it>K </it>networks constructed using <it>M </it>= 1.8. We designate these in the form <inline-formula><graphic file="gb-2009-10-9-r97-i1.gif"/></inline-formula> to account for both the value of <it>M </it>(Equation 4) and the size of network (where the <it>net size </it>is in thousands of edges). Both <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> are supplied at <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>. Also, the full set of integrated data, with over 25 million gene pairs and their associated <it>WS </it>scores covering approximately 85% of the protein-coding genes in v5.3 of the <it>D. melanogaster </it>genome, is supplied at <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>.</p>
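            <p>Building a network of a given size from the ranked <it>WS </it>scores amounts to keeping the top-scoring gene pairs as edges. The sketch below assumes the scores are held in a dictionary keyed by gene pair; the function name and the data layout are hypothetical, and ties at the cutoff are broken by input order.</p>

```python
def top_edges(ws_by_pair, n_edges):
    """Return the n_edges gene pairs with the highest weighted-sum scores.

    ws_by_pair: dict mapping (gene_a, gene_b) -> WS score.
    Lowering the WS threshold corresponds to raising n_edges, so the
    networks produced for increasing n_edges are nested.
    """
    ranked = sorted(ws_by_pair.items(), key=lambda kv: kv[1], reverse=True)
    return [pair for pair, _ in ranked[:n_edges]]
```

            <p>Calling the function with 20,000 and 200,000 edges on the full set of scored pairs would give networks of the two sizes discussed above, with the smaller network contained in the larger.</p>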
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Average KEGG pathway coherence for integration evaluation</p>
               </caption>
               <text>
                  <p>Average KEGG pathway coherence for integration evaluation. The average coherence of 25 KEGG pathways over different weighted sum (<it>WS</it>) integrations at increasing network sizes (number of edges). The dots represent the actual measured values averaged over 25 KEGG pathways, while the lines represent the difference between the actual measured values and random coherence at an equivalent network size. The coherence is measured over networks of increasing size up to one million gene pairs. The grey dashed lines mark the network sizes of 20 <it>K </it>and 200 <it>K</it>, which are the points where the slope (gain in coherence) flattens.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Validation of integration</p>
            </st>
            <p>Although data can be integrated, the derived relationships must be vetted. Validation of the integrated gene network data was done in two ways. First, we evaluated how well the integrated network recovered relationships in individual KEGG pathways. Second, we compared the integrated network to networks built from different, individual datasets to test whether integrating the data results in improved performance.</p>
            <p>All KEGG pathways containing at least 10 <it>D. melanogaster </it>genes were tested against <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. In total, 63 KEGG pathways were tested. Of these, 59 are statistically significant at a corrected <it>P</it>-value &lt; 10<sup>-20 </sup>as assessed using permutation testing and a single-sample Wilcoxon signed-rank test (see Table S5 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> for more details). The number of coherent KEGG pathways and the degree of statistical significance of these pathways provide evidence that the derived functional relationships are biologically meaningful.</p>
            <p>We next tested whether the network constructed using integrated data outperforms networks constructed from separate classes of data and individual datasets. We compared the fully integrated gene network to a network built from integrated MA data while ignoring GIs and PPIs, a network built from exclusively GIs and PPIs while ignoring MA data, and a network of only PPIs. We also examined the relative contribution of individual MA datasets. Across the range of network sizes examined (1 <it>K </it>to 1,000 <it>K</it>; the GI and PPI network and the PPI network have maximum sizes of 32,240 and 25,408, respectively) the average coherence measure (across 63 KEGG pathways) of the fully integrated network was greater than that for the networks based on any subset(s) of data (Figure <figr fid="F4">4a</figr>). This is evident at a network size of 20 <it>K </it>where the fully integrated network (GI, PPI, MA) performed the best (area under the curve (AUC) = 0.1020), followed by the GI and PPI network (AUC = 0.0777), and then a step down to the MA only network (AUC = 0.0396). The KEGG pathway coherence for the networks built using the various datasets and summarized as the AUC at network sizes 20 <it>K </it>and 200 <it>K </it>is provided in Table S6 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. We also see that networks built using the integrated framework outperformed networks based on the individual component datasets. For instance, the integrated MA network performed better (AUC = 0.0396 at 20 <it>K</it>) than all networks based on individual MA datasets (maximum AUC = 0.0314 at 20 <it>K</it>), and much better than the average individual microarray dataset network (AUC = 0.018 at 20 <it>K</it>). In summary, these data indicate that the integrated network performs best in terms of recapitulating known functional relationships across the range of KEGG pathways tested.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Coherence of types of data and datasets on individual KEGG pathways</p>
               </caption>
               <text>
                  <p>Coherence of types of data and datasets on individual KEGG pathways. Examples of how types of data and individual datasets compare to the fully integrated network as measured through coherence of KEGG pathways <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. The average coherence of a given dataset is calculated for a set of genes defined by a KEGG pathway at increasing network sizes up to one million edges. <b>(a) </b>The average coherence over 63 tested KEGG pathways. The full integration of genetic interactions, protein interactions, and microarray data performs best compared to all other data sources and individual datasets. <b>(b) </b>A specific example, the 'purine metabolism' KEGG pathway, where the fully integrated network performs better than all individual datasets. <b>(c) </b>Ribosomal constituents are highly coherent in the microarray data, with many individual microarray datasets performing well. In this instance, the network that excludes genetic interactions and protein interactions performs better than the fully integrated network. <b>(d) </b>An example where the genetic interactions and protein interactions contribute nearly all of the coherent relationships for the 'Hedgehog signaling' KEGG pathway. <b>(e) </b>An example where the integration method performs worse than several individual microarray datasets for the 'phenylpropanoid biosynthesis' KEGG pathway. See Table 1 for citations for the datasets.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-4"/>
            </fig>
            <p>We also examined the performance of networks based on the coherence of the various combinations of data with respect to the 63 individual KEGG pathways examined. Given that the fully integrated network performed best when measured against all 63 pathways, we would expect this to be the case for many individual pathways; this was, indeed, the case. For example, the 'purine metabolism' KEGG pathway shows that most of the individual datasets contribute to the coherence and the fully integrated network performs best (Figure <figr fid="F4">4b</figr>). However, it is also clear that the performance of the different datasets varies across different KEGG pathways. For instance, the MA data contribute most of the coherence among genes in the KEGG category 'ribosome' (Figure <figr fid="F4">4c</figr>), whereas the coherence among genes in the 'Hedgehog signaling' KEGG pathway is based largely on GI and PPI data (Figure <figr fid="F4">4d</figr>). There were also cases where networks based on individual datasets outperformed the fully integrated network. This is the case for the 'phenylpropanoid biosynthesis' KEGG pathway, where several individual MA datasets provide greater coherence than the fully integrated network (Figure <figr fid="F4">4e</figr>). While these examples serve to illustrate the ways in which the datasets vary in their performance across specific biological processes, the observed patterns do not fall simply into distinct classes. Plots of all 63 KEGG pathways can be found in Figure S2 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> and are summarized in Table S6 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. While the fully integrated network performs best across a wide range of biological processes, the contribution of individual datasets varies across biological processes and there are processes that may be better studied with a subset of data.</p>
         </sec>
         <sec>
            <st>
               <p>General network properties</p>
            </st>
            <p><inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> contains 5,021 unique genes and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> contains 9,528 unique genes. It should be noted that these networks include any genetic element defined as a 'gene' in FlyBase <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, and consequently include some elements that have yet to be mapped to the genome (for example, modifier mutations). The inclusion of these elements does not adversely affect the construction of the network; however, it should be kept in mind that while some may represent new genes, many are likely to be alleles of existing genes. Roughly 25% of the genes in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and 13% of the genes in <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> are of this nature. These genes contribute to 9% of the edges in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and 1.2% of the edges in <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. The underlying data used to draw an edge in the networks can be any combination of the three types of data (MA, PPI, and GI). In other words, an edge in the network can be based on MA data, MA and GIs, just PPIs, and so on. The composition of the functional relationships between genes can be seen in Figure <figr fid="F5">5</figr>, where the colors in the pie charts correspond to the edge colors in Figure <figr fid="F6">6</figr>, an image of <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> visualized in Cytoscape <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. 
Overall, in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>, 34.8% of the edges are supported exclusively or partially by GI data, 6.8% are supported exclusively or partially by PPI data, and 82.2% are supported exclusively or partially by MA data. Thus, while the GI and PPI data constitute a very low proportion of the available genomics data, a much greater proportion of these data was used in constructing this network. Specifically, for <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>, 100% of the GI data were used, 5% of the PPI data were used, and 0.004% of the possible edges from MA data were used. As many of the gene pairs used to construct <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> are supported by PPIs and GIs, these data are also in <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>; therefore, the edges gained from increasing the size of the network from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> to <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> are from MA data. This can be seen in that <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> has 60.8% of its edges derived solely from MA data and, as the network increases to <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>, the proportion of edges drawn exclusively from MA data increases to 95.8%.</p>
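            <p>The distinction between edges supported 'exclusively or partially' by a data type and edges supported by exactly one combination of data types can be made concrete with a short sketch. The function name, the data layout, and the toy edges below are hypothetical and serve only to illustrate the bookkeeping.</p>

```python
from collections import Counter


def edge_composition(edge_evidence):
    """Summarize which data types support the edges of a network.

    edge_evidence maps an edge (gene pair) to the set of data types that
    support it, drawn from "MA", "PPI" and "GI".
    Returns two dictionaries of fractions:
      partial   - edges supported exclusively or partially by each type
      exclusive - edges supported by exactly one combination of types
    """
    total = len(edge_evidence)
    partial = {
        data_type: sum(1 for ev in edge_evidence.values() if data_type in ev) / total
        for data_type in ("GI", "PPI", "MA")
    }
    combos = Counter(frozenset(ev) for ev in edge_evidence.values())
    exclusive = {combo: count / total for combo, count in combos.items()}
    return partial, exclusive


# Hypothetical four-edge network, for illustration only.
toy_edges = {
    ("g1", "g2"): {"MA"},
    ("g1", "g3"): {"MA", "GI"},
    ("g2", "g4"): {"GI"},
    ("g3", "g4"): {"PPI", "MA"},
}

partial, exclusive = edge_composition(toy_edges)
print(partial)
print(exclusive[frozenset({"MA"})])
```

            <p>Because an edge backed by several data types is counted once for each of them, the 'exclusively or partially' fractions can sum to more than 100%, as they do for the percentages reported above, whereas the fractions for exact combinations always sum to 100%.</p>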
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Composition of edges in the integrated networks</p>
               </caption>
               <text>
                  <p>Composition of edges in the integrated networks. Relative contribution of the different types of data to the integrated network of <b>(a) </b><inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <b>(b) </b><inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. The teal color represents edges that are drawn solely on microarray data. Dark blue represents edges drawn from genetic interactions only and green from protein interactions only. Orange represents edges drawn from both protein interactions and microarray data. Edges drawn from both genetic interactions and microarray data are in red. Purple represents edges supported by both genetic interactions and protein interactions. Lastly, the light blue represents edges supported by genetic interactions, protein interactions, and microarray data. The colors correspond to the edges in Figure 6.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p><inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> integrated network</p>
               </caption>
               <text>
                  <p><inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> integrated network. Screenshot of <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> visualized in Cytoscape <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. The edge colors correspond to Figure 5, where, for example, the teal edges are built from only microarray data and the red edges are built from genetic interaction and microarray data.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-6"/>
            </fig>
            <p>Since the relationships between genes in the integrated network reflect the likelihood that two genes participate in a biological process, we expect that genes involved in the same biological process will cluster together. Manual inspection of <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> reveals both many connections between gene pairs and gene clusters that are consistent with prior knowledge. In order to examine the most prominent examples, we scored and ranked highly interconnected subnetworks within <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> using Cytoscape <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> and the graph clustering algorithm and visualization tool MCODE <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. Manual inspection of these subnetworks revealed that the annotated genes within them are largely annotated with common, or closely related, GO:BP terms. (Cytoscape <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> formatted session files, including MCODE clusters, are provided at <abbrgrp><abbr bid="B63">63</abbr><abbr bid="B64">64</abbr></abbrgrp>. We have also utilized Java Web Start to make the Cytoscape sessions directly accessible through an internet browser <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.) As an illustration, a subnetwork enriched for genes encoding nuclear ribosomal proteins includes a total of 68 genes, of which 64 encode ribosomal proteins, one encodes a translation initiation factor (<it>Eukaryotic initiation factor 4A </it>[FlyBase:FBgn0001942]), and two encode translation elongation factors (<it>Elongation factor 1&#946; </it>[FlyBase:FBgn0028737], and <it>Elongation factor 2b </it>[FlyBase:FBgn0000559]). 
A striking feature of the most highly interconnected subnetworks is that they are largely enriched for genes that participate in basic cellular processes such as ribosome biogenesis, the ribosome, proteolysis, mitochondrial electron transport, intracellular protein transport, and cell division, which is consistent with the tight clusters in integrated gene networks in yeast <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B26">26</abbr><abbr bid="B28">28</abbr></abbrgrp>, worm <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, mouse <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B68">68</abbr></abbrgrp>, and human <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Since the functional relationships in the network are based mostly on MA data, this suggests that ubiquitously expressed genes - often referred to as 'housekeeping' genes - are, in fact, coordinately and tightly regulated with distinct expression patterns reflecting their respective biological processes. In addition to expected connections, the network also includes many previously unknown (or previously unnoticed) functional connections, including novel connections between previously studied genes, connections between unannotated and annotated genes, and connections between unannotated genes. For instance, the gene <it>Receptor of activated protein kinase C 1 </it>(<it>Rack1 </it>[FlyBase:FBgn0020618]) is present in the ribosomal proteins cluster already mentioned. Of the 68 genes in this cluster, <it>Rack1 </it>is the only gene not annotated with GO:BP terms related to translation. Neither the molecular function ('protein kinase C binding' [GO:0005080]) nor the mutant phenotype (larval lethal and defective oogenesis in germline clones) suggests an involvement in ribosome function <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>, but the functional relationships in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> suggest a role in the ribosome. 
This inference is strongly supported by the findings that, in yeast and mammals, highly conserved orthologous proteins are physically associated with the ribosome <abbrgrp><abbr bid="B70">70</abbr><abbr bid="B71">71</abbr><abbr bid="B72">72</abbr></abbrgrp>. The preceding examples serve to illustrate that the network can be used to identify functional relationships between groups of interconnected genes as well as the immediate neighbors of any given gene. This in turn provides a means of analyzing new genome-wide datasets with respect to gene function and to infer the annotation of previously unannotated genes. In the following sections we utilize the integrated functional gene network to infer the GO:BP annotations of previously unannotated genes, and explore the use of the network in reanalyzing a genome-wide dataset.</p>
         </sec>
         <sec>
            <st>
               <p>Inferring biological process gene annotations</p>
            </st>
            <p>Both the <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> networks contain a mixture of annotated and unannotated genes. Specifically, there are 2,544 annotated and 2,477 unannotated genes within <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>, and 3,691 annotated and 5,837 unannotated genes within <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. A total of 2,673 unique GO:BP terms are associated with the 2,544 annotated genes in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>, and 2,998 unique GO:BP terms are associated with the 3,691 annotated genes in <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. Taken together, the functional relationships within the network and the gene-GO:BP annotations provide a means to make <it>de novo </it>GO:BP predictions on un- and under-annotated genes. A recent assessment of gene function prediction methods using heterogeneous data sources (a competition among seven groups) demonstrated that reasonably accurate predictions can be made for a metazoan <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. However, this study also showed that predicting GO:BP terms is more difficult than predicting GO cellular component or molecular function terms - with an average of 21% precision at 20% recall for biological process terms, an average of 32% precision at 20% recall for cellular component terms, and an average of 42% precision at 20% recall for molecular function terms <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. This assessment provides a useful benchmark for gene function prediction in <it>Drosophila</it>. Based on the functional gene network derived from heterogeneous fly data, we explored whether we could make reasonable GO:BP predictions for un- and under-annotated genes.</p>
            <p>We calculated the probabilities of gene-GO:BP associations based on the MRF method as described by Letovsky and Kasif <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> (see Materials and methods section). Three key aspects of the network topology and gene-GO:BP term associations are considered: the frequency of a GO:BP term with respect to the tested network; how often genes with the same GO:BP annotation(s) are connected; and the immediate neighbors of the gene whose function is being predicted. Taken in concert, the probability for a gene being annotated with a GO:BP term was calculated using Equation 5. Prediction evaluation was done through tenfold cross-validation. All <it>D. melanogaster </it>genes with known GO:BP annotations were divided randomly into ten equally sized groups and GO:BP terms were held-out from one of the ten groups of genes. The <it>LLS</it>s were recalculated from scratch using the annotations from the other nine groups. An integrated network was constructed under the <it>WS </it>framework (<it>M </it>= 1.8) and GO:BP terms were predicted using the MRF method. This procedure was repeated ten times. In the following two sections we use this evaluation to address two questions. First, can we establish a threshold for the prediction posterior probability, denoted <it>t</it><sub><it>p</it></sub>, that provides reasonable <it>de novo </it>predictions? Second, do the predictions from the integrated network outperform predictions made from networks built from individual types of data?</p>
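            <p>The hold-out scheme of the tenfold cross-validation can be sketched as follows. This covers only the splitting step; recalculating the <it>LLS</it>s, rebuilding the network, and running the MRF prediction are not shown. All names, the fixed random seed, and the toy annotations are assumptions made for the example.</p>

```python
import random


def tenfold_groups(genes, k=10, seed=0):
    """Randomly divide genes into k groups whose sizes differ by at most one."""
    shuffled = sorted(genes)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def held_out_rounds(annotations, k=10, seed=0):
    """Yield (held_out_genes, training_annotations) for each of the k rounds.

    annotations maps each annotated gene to its set of GO:BP terms. In every
    round the terms of one group are hidden and the remaining nine groups
    keep their annotations for training.
    """
    for group in tenfold_groups(annotations, k, seed):
        held = set(group)
        training = {g: terms for g, terms in annotations.items() if g not in held}
        yield held, training


# Hypothetical annotations for 25 genes, for illustration only.
toy_annotations = {f"gene{i}": {"GO:0000001"} for i in range(25)}

rounds = list(held_out_rounds(toy_annotations))
print(len(rounds))
print(sorted(len(held) for held, _training in rounds))
```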
            <sec>
               <st>
                  <p>Determining prediction thresholds</p>
               </st>
               <p>We first explored the performance of the MRF GO:BP predictions at various thresholds of <it>t</it><sub><it>p</it></sub>. In order to do this, we calculated the precision and recall of the predicted gene-GO:BP annotations with respect to the held-out gene-GO:BP annotations. It has been observed that measurements of performance on predicted GO terms tend to be quite conservative <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. This stems from the fact that gene annotation is far from being complete, and the extent to which genes are under-annotated, with as yet undiscovered pleiotropic functions, is not known. This under-annotation will lead to an underestimate of true positives and likely an overestimate of false positives, which will result in a lower measure of precision. Nevertheless, while these performance measures need to be interpreted in light of the fact that they are inherently conservative, they do provide a useful relative measure of performance. Here, a predicted gene-GO:BP annotation was called a true positive if the predicted term matched the held-out term, or the parent or child of the held-out term as defined in the GO. A predicted gene-GO:BP annotation was called a false positive if the predicted term did not match a held-out term on that gene, or a parent or child term. Lastly, a false negative was called for all held-out gene-GO:BP associations where we did not predict a term. In addition to measuring precision and recall in relation to all the held out gene-GO:BP annotations, we also measured the precision and recall with respect to the genes with held-out annotations. In this case we called a gene prediction a true positive if at least one predicted annotation for the gene is a true positive gene-GO:BP prediction. A false positive gene prediction was called if predictions were made for the gene but none were correct. 
Lastly, a false negative gene prediction was called if the gene had held-out GO:BP terms but we did not make a prediction for the gene.</p>
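               <p>The two ways of counting described above can be illustrated with a short sketch. It is not the evaluation code used in this study. In particular, it assumes that a held-out term counts as a false negative when no predicted term matches it, its parent, or its child, and it uses a hypothetical <it>parents </it>dictionary in place of the GO structure.</p>

```python
def related(predicted, held, parents):
    """True if predicted equals held or is a direct parent or child of it."""
    return (
        predicted == held
        or predicted in parents.get(held, set())
        or held in parents.get(predicted, set())
    )


def ratio(numerator, denominator):
    """Safe division that returns 0.0 when nothing was counted."""
    return numerator / denominator if denominator else 0.0


def evaluate(predictions, held_out, parents):
    """Compute term-level and gene-level precision and recall."""
    term_tp = term_fp = term_fn = 0
    gene_tp = gene_fp = gene_fn = 0
    for gene, held_terms in held_out.items():
        predicted = predictions.get(gene, set())
        hits = {p for p in predicted if any(related(p, h, parents) for h in held_terms)}
        term_tp += len(hits)
        term_fp += len(predicted) - len(hits)
        # Assumption: a held-out term is missed if no prediction matches it.
        term_fn += sum(
            1 for h in held_terms if not any(related(p, h, parents) for p in predicted)
        )
        if not predicted:
            gene_fn += 1  # held-out terms exist but nothing was predicted
        elif hits:
            gene_tp += 1  # at least one correct prediction for the gene
        else:
            gene_fp += 1  # predictions were made but none were correct
    return {
        "term_precision": ratio(term_tp, term_tp + term_fp),
        "term_recall": ratio(term_tp, term_tp + term_fn),
        "gene_precision": ratio(gene_tp, gene_tp + gene_fp),
        "gene_recall": ratio(gene_tp, gene_tp + gene_fn),
    }


# Hypothetical ontology fragment and predictions, for illustration only.
toy_parents = {"B": {"A"}, "C": {"A"}, "D": {"B"}}
toy_held_out = {"g1": {"B"}, "g2": {"C"}, "g3": {"D"}}
toy_predictions = {"g1": {"A", "X"}, "g2": {"X"}}

result = evaluate(toy_predictions, toy_held_out, toy_parents)
print(result)
```

               <p>In the toy data, gene g1 has one correct prediction (the parent of its held-out term) and one incorrect prediction, g2 has only an incorrect prediction, and g3 has no prediction at all, which gives a precision and recall of one third at the term level and one half at the gene level.</p>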
               <p>Figure <figr fid="F7">7</figr> shows precision (Figure <figr fid="F7">7a</figr>) and recall (Figure <figr fid="F7">7b</figr>) as a function of <it>t</it><sub><it>p</it></sub>. These plots show the general trend that increasing <it>t</it><sub><it>p </it></sub>increases the precision and decreases the recall of gene-GO:BP predictions. In contrast, precision related to gene predictions stays relatively flat over increasing <it>t</it><sub><it>p</it></sub>. This indicates that, for the predictions made for a gene, at least one has a high likelihood of being true regardless of <it>t</it><sub><it>p</it></sub>, but the likelihood that any individual GO:BP prediction is true increases with increasing <it>t</it><sub><it>p</it></sub>. We report a precision for gene-GO:BP predictions of 23% at 20% recall. This is comparable to the average of 21% precision at 20% recall measured over seven different groups predicting GO:BP annotations for mouse <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. While it should be noted that there are slight differences in both the input data and the way precision and recall were measured, this comparison serves to illustrate that the precision of our predictions is similar to that achieved for another metazoan.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>Precision/recall of GO:BP predictions</p>
                  </caption>
                  <text>
                     <p>Precision/recall of GO:BP predictions. Precision and recall plots evaluating GO:BP predictions on unannotated <it>D. melanogaster </it>genes using the MRF method. The black color reflects predictions made from a network size of 20 <it>K </it>and the red color reflects predictions made from a network size of 200 <it>K</it>. For the tenfold cross-validation, <b>(a) </b>precision and <b>(b) </b>recall are shown in relation to the prediction probability (<it>t</it><sub><it>p</it></sub>). Both precision and recall were measured in relation to all GO:BP predictions and also in relation to the gene (see Materials and methods section for distinction).</p>
                  </text>
                  <graphic file="gb-2009-10-9-r97-7"/>
               </fig>
               <p>After establishing the precision and recall for predictions with the integrated networks, we address the first question of establishing a threshold on <it>t</it><sub><it>p </it></sub>that produces reliable predictions. In order to quantify the similarity between the held-out and predicted annotations in the tenfold cross-validation, we used a measure of semantic similarity (<it>SS</it>) and calibrated this measure against a benchmark dataset. In the context of this study, <it>SS </it>provides a quantification of the degree of similarity between two sets of GO:BP terms taking into consideration the structure of GO. The measure of <it>SS </it>was calculated using the program G-SESAME (Gene Semantic Similarity Analysis and Measurement Tool) developed by Wang <it>et al</it>. <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. The scale ranges from 0 to 1, where 0 indicates that two sets of GO:BP terms are unrelated, and 1 indicates that the two sets are the same. As an example, Figure <figr fid="F8">8a</figr> illustrates the overlap of two sets of terms within the structure of the GO where <it>SS </it>= 0.45. In order to calibrate this scale with respect to a known benchmark, we examined the distribution of <it>SS </it>scores between all pairs of genes with reported GIs (Figure <figr fid="F8">8b</figr>). Since GIs are reliable indicators that two genes function in a common biological process - both experimentally and also shown through the <it>LLS </it>- this provided a useful reference set. The median <it>SS </it>of gene pairs with reported GIs is 0.45, which we adopted as a reasonable cut-off for our analysis. We then used G-SESAME to measure the <it>SS </it>between known GO:BP annotations and the predicted GO:BP terms. This was performed for the tenfold cross-validation of both network sizes, 20 <it>K </it>and 200 <it>K</it>, where <it>M </it>= 1.8 over <it>t</it><sub><it>p </it></sub>&#8712; [0, 1].</p>
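               <p>The calibration step, taking the median <it>SS </it>of a benchmark set as the cut-off and then measuring the proportion of genes whose predictions exceed it, can be sketched as follows. The <it>SS </it>values themselves are assumed to have been computed elsewhere (for example, with G-SESAME); the function names and all numbers below are hypothetical.</p>

```python
from statistics import median


def calibrate_cutoff(benchmark_ss):
    """Use the median SS of the benchmark gene pairs as the cut-off."""
    return median(benchmark_ss)


def proportion_above(gene_ss, cutoff):
    """Fraction of genes whose predicted terms have SS strictly above cutoff."""
    above = sum(1 for score in gene_ss.values() if score > cutoff)
    return above / len(gene_ss)


# Hypothetical SS scores, for illustration only.
toy_gi_pair_ss = [0.10, 0.30, 0.45, 0.60, 0.90]
toy_gene_ss = {"g1": 0.50, "g2": 0.20, "g3": 0.80, "g4": 0.45}

cutoff = calibrate_cutoff(toy_gi_pair_ss)
print(cutoff)
print(proportion_above(toy_gene_ss, cutoff))
```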
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>Semantic similarity and GO:BP predictions</p>
                  </caption>
                  <text>
                     <p>Semantic similarity and GO:BP predictions. Series of plots relating the semantic similarity (<it>SS</it>) for tenfold cross-validation to establishing a threshold for the prediction probability, <it>t</it><sub><it>p</it></sub>. <b>(a) </b>An example illustrating the <it>SS </it>calculation. The nodes represent GO:BP terms, where the topmost node is the root. The red edges are 'is-a' and the blue, dashed edges are 'part-of' relationships in the ontology. Green nodes represent terms that are known and held-out for one gene, while the orange nodes are examples of predicted terms for the same gene. The half orange, half green node is an example where the predicted term perfectly matches a held-out term. The light blue nodes are the ancestor terms that fall within the path to the root, but are not annotated to either of the genes in this example. The <it>SS </it>of (a) is measured to be 0.45 through G-SESAME <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. <b>(b) </b>Also, <it>SS </it>= 0.45 is the median <it>SS </it>value when measured over all reported and annotated genetic interactions. With respect to the GO:BP predictions, <it>SS </it>was measured by comparing the set of predicted terms to the set of held-out terms. <b>(c,d) </b>The black color reflects predictions made from a network size of 20 <it>K </it>and the red color reflects predictions made from a network size of 200 <it>K</it>. (c) The proportion of genes at a given threshold <it>t</it><sub><it>p </it></sub>that show a <it>SS </it>measure of > 0.45. (d) The number of predictions made for both integrated networks, <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. 
The top plot in (d) shows the total number of genes with at least one prediction in relation to <it>t</it><sub><it>p </it></sub>and the bottom bar graph shows the average number of GO:BP terms predicted per gene at a given <it>t</it><sub><it>p</it></sub>.</p>
                  </text>
                  <graphic file="gb-2009-10-9-r97-8"/>
               </fig>
               <p>These results can be seen in Figure <figr fid="F8">8c</figr>, where the general trend shows that increasing <it>t</it><sub><it>p </it></sub>also increases the proportion of genes with predictions that have a <it>SS </it>> 0.45 when compared to the held-out annotations for the same set of genes.</p>
               <p>Summaries of GO:BP predictions that were made using both <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> are shown in Figure <figr fid="F8">8d</figr>. We can see that at a <it>t</it><sub><it>p </it></sub>> 0.5, there are an average of 10.5 GO:BP predictions made on 941 genes for <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and an average of 12.7 GO:BP predictions made on 3,816 genes for <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. Extrapolating from the <it>SS </it>results shown in Figure <figr fid="F8">8c</figr>, at a <it>t</it><sub><it>p </it></sub>> 0.5 for <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>, roughly 61% of genes have a set of GO:BP predictions with <it>SS </it>> 0.45, so we would expect about 574 genes (941 &#215; 0.61 = 574) to have a set of GO:BP predictions with <it>SS </it>> 0.45. See Additional data files 2 and 3 for predictions from both integrated networks.</p>
            </sec>
            <sec>
               <st>
                  <p>Integrated network increases the performance of predicted annotations</p>
               </st>
               <p>To address the second question of whether an integrated network built from all three types of data (GI, PPI, and MA) outperformed networks built from individual types of data, we evaluated the predictions in terms of precision and recall with respect to the held-out GO:BP annotations (see Materials and methods section). This was done for three networks built from the following data: fully integrated (GI, PPI and MA); GI and PPI only; and MA only. The GI and PPI only network and the MA only network were each constructed at sizes of 20 <it>K </it>and 200 <it>K </it>gene pairs using the <it>WS </it>framework where <it>M </it>= 1.8. When using the fully integrated network, increasing the value of <it>t</it><sub><it>p </it></sub>increased precision and concomitantly decreased recall. Comparing the results from the three different networks reveals that integrating across all three types of data, on average, outperforms the other two networks (Figure <figr fid="F4">4</figr>; Figure S2 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>). The network constructed from GI and PPI data performs better than the network constructed from MA data only for precision and recall with respect to GO:BP terms and precision with respect to genes; however, the MA network performs better at recalling genes. These results are shown in Figure <figr fid="F9">9</figr>, where the example (<it>t</it><sub><it>p </it></sub>&#8805; 0.5 at a network size of 20 <it>K </it>edges) is fairly representative of the entire set of evaluations. It should also be noted that we tested the precision and recall of random predictions, where a GO:BP prediction from the MRF method was replaced with a random GO:BP term at the same level in the GO hierarchy. These random predictions performed very poorly and consistently returned less than 1% for both precision and recall. 
These results demonstrate that the fully integrated network does, in fact, provide more reliable predictions than either of the other networks.</p>
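               <p>As a rough illustration of the evaluation described above, the following Python sketch computes precision and recall for a single gene by comparing the GO:BP terms predicted at or above a threshold <it>t</it><sub><it>p </it></sub>with the held-out terms. The function names, the example probabilities, and the choice of held-out terms are assumptions made purely for illustration; the term-level and gene-level measures actually used are defined in the Materials and methods section and may differ from this simple set-based form.</p>

```python
def filter_by_threshold(scored_terms, t_p):
    """Keep the GO:BP terms whose prediction probability is at least t_p."""
    return [term for term, prob in scored_terms.items() if prob >= t_p]


def precision_recall(predicted, held_out):
    """Set-based precision and recall of predicted terms against held-out terms."""
    predicted, held_out = set(predicted), set(held_out)
    true_positives = len(predicted.intersection(held_out))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(held_out) if held_out else 0.0
    return precision, recall


# Hypothetical prediction probabilities for one gene (not taken from the study).
scored_terms = {"GO:0000070": 0.9, "GO:0007059": 0.6, "GO:0006457": 0.3}
# Hypothetical held-out annotations for the same gene.
held_out = {"GO:0000070", "GO:0007059"}

low = precision_recall(filter_by_threshold(scored_terms, 0.2), held_out)
high = precision_recall(filter_by_threshold(scored_terms, 0.8), held_out)
print("t_p = 0.2:", low)
print("t_p = 0.8:", high)
```

               <p>In this toy example, raising <it>t</it><sub><it>p </it></sub>from 0.2 to 0.8 raises precision from about 0.67 to 1.0 and lowers recall from 1.0 to 0.5, which is the same direction of trade-off reported for the fully integrated network.</p>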
               <fig id="F9">
                  <title>
                     <p>Figure 9</p>
                  </title>
                  <caption>
                     <p>Comparing precision/recall for different data sources</p>
                  </caption>
                  <text>
                     <p>Comparing precision/recall for different data sources. An example of precision and recall calculated on the tenfold cross-validation where the prediction probability is <it>t</it><sub><it>p </it></sub>&#8805; 0.5. The colors represent three different networks, all with 20 <it>K </it>edges. Blue represents the network built from only microarray data, red represents the network built from only genetic interactions and protein interactions, and green represents the fully integrated network using genetic interactions, protein interactions, and microarray data. The whiskers show the standard deviation of the precision and recall over the tenfold cross-validation. The squares are the precision and recall measures with respect to the GO:BP terms, while the circles are precision and recall as measured for genes (see Materials and methods section for distinction). Predictions of random GO:BP terms are made and the precision and recall are shown as the squares and circles with a plus in the middle.</p>
                  </text>
                  <graphic file="gb-2009-10-9-r97-9"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Qualitative assessment of GO:BP predictions</p>
               </st>
               <p>In order to provide a qualitative assessment of the GO:BP predictions, we manually inspected the set of predictions made on genes without experimental evidence for any GO:BP annotation. Predictions from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> (<it>t</it><sub><it>p </it></sub>&#8805; 0.5) resulted in roughly 3,000 gene-GO:BP predictions over 941 unique genes. Of the 941 genes, we excluded 458 that either could not be localized to the v5.3 <it>D. melanogaster </it>genome or had at least one known GO:BP annotation (not IEA, NR, or ND). Thus, the set of gene predictions consisted of 1,148 gene-GO:BP predictions over 483 unique genes that could be localized to the genome and did not have any experimental annotation (10% of unannotated <it>Drosophila </it>genes).</p>
               <p>These predictions were then examined in light of electronically inferred GO:BP terms, molecular function GO and cellular component GO annotation and also an updated version of gene annotation from v5.7 of the <it>D. melanogaster </it>genome. We also considered the best non-<it>Drosophila </it>sequence matches to the NCBI nr database, along with the respective annotations of these sequences. Over the entire set of 1,148 gene-GO:BP predictions, we found roughly 18% have supporting evidence concordant with our predictions (Additional data file 4). The next two paragraphs provide a few examples of the types of supporting relationships within this 18%.</p>
               <p>In our set of predictions, there are several examples of well-studied genes that provide inadvertent cases of well-supported validation. For instance, there are examples of genes whose annotation was not recorded in v5.3 of the <it>D. melanogaster </it>genome, such as <it>Cenp-C </it>[FlyBase:FBgn0086697] and <it>crossveinless </it>[FlyBase:FBgn0000394]. <it>Cenp-C </it>is known to be a component of the centromere at mitotic anaphase <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>; we predicted it to be involved in 'mitotic sister chromatid segregation' [GO:0000070]. Another example is <it>crossveinless</it>, which is known to function in bone morphogenetic protein (BMP) signaling required for wing crossvein development <abbrgrp><abbr bid="B75">75</abbr><abbr bid="B76">76</abbr></abbrgrp>. We correctly predicted the GO:BP terms 'imaginal disc-derived wing vein morphogenesis' [GO:0008586], 'regulation of BMP signaling pathway' [GO:0030510], 'torso signaling pathway' [GO:0008293], and 'regulation of transforming growth factor &#946; receptor signaling pathway' [GO:0017015]; however, we also, and potentially erroneously, predicted 'blastoderm segmentation' [GO:0007350] and 'terminal region determination' [GO:0007362].</p>
               <p>Further confirmation of prediction quality comes from unannotated genes with additional supporting evidence that is consistent with our predictions. For instance, <it>CG5525 </it>[FlyBase:FBgn0032444] was predicted to be involved in 'protein folding' [GO:0006457] where <it>t</it><sub><it>p </it></sub>= 1. Within the data used from v5.3 of the <it>D. melanogaster </it>genome, there was no experimental evidence for any GO:BP terms, but 'protein folding' [GO:0006457] was inferred from electronic annotation and this gene was also annotated with the cellular component GO term 'chaperonin-containing T-complex' [GO:0005832], inferred from sequence similarity. Additionally, the top BLAST hits (default settings) are chaperonin genes from <it>Culex pipiens </it>and <it>Aedes aegypti</it>. <it>CG5525 </it>is an example where the network prediction is consistent with gene function predicted from sequence similarity. As a final example, <it>Nuf2 </it>[FlyBase:FBgn0031886] was predicted to be involved in 'M phase' [GO:0000279] where <it>t</it><sub><it>p </it></sub>= 0.986. From the v5.3 annotations, this gene was inferred through electronic annotation to be involved in 'immune response' [GO:0006955]. However, when checked against the updated annotation of v5.7, <it>Nuf2 </it>was annotated with 'chromosome segregation' [GO:0007059], 'mitotic metaphase plate congression' [GO:0007080], and 'mitotic spindle organization and biogenesis' [GO:0007052], all of which are inferred from mutant phenotype. <it>Nuf2 </it>is an example where the prediction was validated through experimental evidence that became available after our predictions were made.</p>
               <p>Overall, GO:BP predictions have been evaluated using precision/recall and <it>SS </it>in tenfold cross-validation. We then used these data to extrapolate the expected number of reasonable predictions that were made using the fully integrated networks. We have also evaluated the predictions qualitatively and shown that roughly 18% have independent evidence that supports the predictions. As a complete analysis, this suggests that the GO:BP predictions are valid.</p>
            </sec>
            <sec>
               <st>
                  <p>Function prediction on genes with novel sequence features</p>
               </st>
               <p>The GO:BP predictions are based on the functional relationships drawn from the integrated gene networks. The construction of these relationships does not directly take into account any sequence-based information. Traditionally, function prediction methods have relied heavily on sequence and structural similarity <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. As a comparison, we used sequence similarity to infer GO:BP terms for the set of 483 genes for which we have made high-confidence network-based predictions. The translated proteins from these genes were used to search the NCBI nr database using BLASTp (<it>E</it>-value &lt; 10<sup>-6</sup>). All BLAST hits to <it>Drosophila </it>proteins and all matches under 40% identity were removed, and the top ten remaining hits were taken for each gene. Any associated GO:BP annotations (including IEA, NR, or ND) for the top ten hits were then transferred to the <it>D. melanogaster </it>gene. We were able to transfer GO:BP annotations for 224 of the 483 genes. Interestingly, when the GO evidence codes of IEA, ND, and NR were removed, the number of genes with any transferable annotation dropped to 98 of the 483. The <it>D. melanogaster </it>genes for which we predicted GO:BP terms using the integrated data appear to be in a class of genes where prediction of biological processes based solely on sequence similarity performs poorly. This is not surprising given the broad scope of a biological process compared with sequence features, which often reflect a molecular function (for example, a kinase domain or DNA binding domain). Thus, function prediction using integrated gene networks is a complementary method to make predictions for the class of unannotated genes where traditional function prediction methods perform poorly.</p>
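               <p>The hit-filtering steps described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the <it>Hit </it>record, its field names, the example hits, and the choice to rank surviving hits by ascending <it>E</it>-value are all assumptions, since the text does not state how the top ten hits were ordered.</p>

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTp hit; the field names are hypothetical."""
    subject_id: str
    organism: str
    evalue: float
    identity: float  # percent identity


def top_hits(hits, max_evalue=1e-6, min_identity=40.0, n=10):
    """Apply the E-value, species, and identity filters, then keep n hits."""
    kept = [
        h for h in hits
        if max_evalue > h.evalue                        # E-value below the cutoff
        and not h.organism.startswith("Drosophila")     # drop Drosophila proteins
        and h.identity >= min_identity                  # drop matches under 40% identity
    ]
    kept.sort(key=lambda h: h.evalue)  # assumed ranking: best E-value first
    return kept[:n]


# Hypothetical hits for a single query protein.
example_hits = [
    Hit("a", "Culex pipiens", 1e-50, 80.0),
    Hit("b", "Aedes aegypti", 1e-40, 75.0),
    Hit("c", "Drosophila simulans", 1e-90, 99.0),  # removed: Drosophila
    Hit("d", "Homo sapiens", 1e-3, 60.0),          # removed: E-value too high
    Hit("e", "Mus musculus", 1e-20, 35.0),         # removed: identity under 40%
]
selected_ids = [h.subject_id for h in top_hits(example_hits)]
print(selected_ids)
```

               <p>GO:BP annotations attached to the surviving hits would then be transferred to the query gene, optionally after discarding IEA, ND, and NR evidence codes.</p>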
            </sec>
         </sec>
         <sec>
            <st>
               <p>Interpreting new datasets</p>
            </st>
            <p>Genome-wide functional genomics experiments typically yield lengthy lists of genes that are often difficult to interpret. Common approaches to investigate the biological meaning of these gene lists include GO term enrichment analysis and gene set enrichment analysis (GSEA) (reviewed in <abbrgrp><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr></abbrgrp>). Both approaches are dependent on the completeness and quality of the pre-existing reference data: gene annotations in the case of GO term analysis, and gene sets in the case of GSEA. Given that our functional gene network includes previously unannotated genes and clusters together with genes with shared biological processes, we expect that it can be used for improved interpretation of existing and new genome-wide datasets. In order to test this conjecture, we selected a microarray dataset (not used in the construction of the network) and reanalyzed the data with respect to the integrated <it>Drosophila </it>gene network. We used data from Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>, who examined genes regulated in response to nutrient deprivation in <it>D. melanogaster </it>larvae. In particular, we focused on the genes that were found to be significantly differentially expressed (DE) in the muscle tissue of starved larvae.</p>
            <p>We first examined whether the network might be used as an aid for classifying DE genes into functional categories. Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> identified 1,943 genes that were statistically DE in larval muscle tissue in response to starvation. Of these, 300 genes were classified according to their annotated functions and are explicitly discussed in the text and figures (referred to here as <it>DE-categorized</it>) and the remaining 1,700 genes were not assigned to the categories discussed in the manuscript (referred to here as <it>DE-uncategorized</it>). The <it>DE-categorized </it>genes were assigned to 16 categories, the prominent ones encompassing carbohydrate metabolism, lipid metabolism, mitochondrial biogenesis and function, cellular translational capacity, and cuticle proteins <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. In order to visualize the functional connections among all of the DE genes, we mapped them onto <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and identified 530 genes sharing 1,536 edges within the network (single gene networks were removed). Inspecting this network revealed three observations (Figure <figr fid="F10">10a</figr>). First, a large number of genes grouped together into distinct clusters, and these clusters are largely concordant with the categories reported in Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> (we highlight a few of the prominent categories in Figure <figr fid="F10">10a</figr>). For instance, of the 20 <it>DE-categorized </it>genes in the ribosomal protein category that were found in the network, 19 are tightly clustered (blue in Figure <figr fid="F10">10a</figr>). It should be noted that this was not the case for all categories. For instance, only half of the <it>DE-categorized </it>genes in the cuticle protein category were clustered together in the network. 
Second, the network clusters include <it>DE-uncategorized </it>genes interconnected with the <it>DE-categorized </it>genes. For instance, a single tightly interconnected subnetwork that includes 11 <it>DE-categorized </it>genes in cellular respiration also includes an additional 12 <it>DE-uncategorized </it>genes. Third, there is at least one tightly interconnected subnetwork that is composed almost exclusively of <it>DE-uncategorized </it>genes. The annotated genes in this subnetwork are enriched for terms related to ribosome biogenesis; however, many of the genes in this subnetwork are unannotated. Thus, the functional gene network revealed that many more DE genes can be grouped into the identified categories and also suggests the existence of at least one additional cluster of genes with the putative function of ribosome biogenesis, which is entirely consistent with the functions studied in Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Network analysis in coordination with microarray data</p>
               </caption>
               <text>
                  <p>Network analysis in coordination with microarray data. Analysis combining the integrated <it>Drosophila </it>gene network and microarray data from Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. <b>(a) </b>The network represents the differentially expressed genes in starved versus fed larval muscle tissue that could also be found in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>. Several examples of categories of genes listed in Teleman <it>et al</it>. are highlighted: cuticle, cellular respiration (Cell. Resp.), signal recognition particle (SRP), mitochondrial ribosomal proteins (mRP), ribosomal proteins (RP), and tRNA synthetases (Aats). The clustering of genes is a result of the integrated network and was done irrespective of the gene expression data from Teleman <it>et al</it>. <b>(b) </b>The subnetwork is the network built from a seeded set of SRP-related genes as defined by Teleman <it>et al</it>. and derived from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> (see Materials and methods section for seeded network construction). Gene expression ratios reflect wild-type larval muscle tissue upon starvation over wild-type larval muscle tissue under normal feeding conditions, where green represents genes down-regulated upon starvation and red genes up-regulated upon starvation. All nodes with a dark outline are differentially expressed (DE) genes as defined in Teleman <it>et al</it>. The diamond nodes are the seed genes, the circle nodes are genes reported as DE in Teleman <it>et al</it>. but not used as seed genes, and the hexagon nodes are genes not reported as DE by Teleman <it>et al</it>. The genes in the network in (b) were then treated as a gene set and used as input to GSEA <abbrgrp><abbr bid="B81">81</abbr></abbrgrp>. <b>(c) </b>The enrichment plot for all genes in the network in (b). 
Additionally, we performed a GSEA analysis on the genes in the network in (b) that did not include the seed genes (which corresponds to the set of genes that are circle and hexagon-shaped). <b>(d) </b>The enrichment plot for this set of genes, showing that the network places together similarly regulated genes that are still significantly enriched even when the set of genes defined in Teleman <it>et al</it>. was excluded. See Figure S3 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> for more detail on the global performance of gene sets. The gene set representing (d) corresponds to the purple line in Figure S3a at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2009-10-9-r97-10"/>
            </fig>
            <p>We next examined whether the network could be used to expand the list of genes found to be differentially regulated. To do this, we focused on the set of <it>DE-categorized </it>genes reported in Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> as being associated with signal recognition particle (SRP) function. We used the 14 such genes that could be found in the network as a query set to retrieve tightly connected genes from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> (see Materials and methods section for details on the search algorithm). This retrieved 56 additional genes selected solely on the connections present in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>. This network of 70 genes is shown in Figure <figr fid="F10">10b</figr> and the genes are designated as follows: the set of 14 query genes defined in Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> (shown in Figure <figr fid="F10">10b</figr> as diamond nodes), 30 <it>DE-uncategorized </it>genes (shown in Figure <figr fid="F10">10b</figr> as circular nodes), and an additional 26 genes that were not determined to be DE in response to starvation <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> (referred to here as <it>non-DE </it>genes and shown in Figure <figr fid="F10">10b</figr> as hexagonal nodes). Of the 56 genes added through the integrated network, 18 are annotated as being involved in protein secretion, including the SRP, ER translocon, signal peptidase complex, cargo receptors, and COPI and COPII vesicle components. 
Interestingly, the annotated set of 18 additional genes largely encode components of the COPI and COPII vesicles (for example, <it>CG10882 </it>[FlyBase:FBgn0031408], <it>Arf72A </it>[FlyBase:FBgn0000115], <it>Arf102E </it>[FlyBase:FBgn0013749], <it>&#948;Cop </it>[FlyBase:FBgn0028969], <it>&#950;Cop </it>[FlyBase:FBgn0040512], <it>&#946;'Cop </it>[FlyBase:FBgn0025724], <it>&#946;Cop </it>[FlyBase:FBgn0008635], <it>&#947;Cop </it>[FlyBase:FBgn0028968], <it>Sec13 </it>[FlyBase:FBgn0024509], <it>Sec31 </it>[FlyBase:FBgn0033339], and <it>&#945;Cop </it>[FlyBase:FBgn0025725]) <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>. (See Additional data file 5 for further annotation information on this cluster of 70 genes.) Using GSEA <abbrgrp><abbr bid="B81">81</abbr></abbrgrp>, we tested whether this expanded set of 56 genes was collectively enriched for downregulated genes. Both the full set of 70 genes (Figure <figr fid="F10">10c</figr>) and the subset of 56 (Figure <figr fid="F10">10d</figr>) show a significant enrichment score at a false discovery rate of &lt; 10%. Thus, using the functional gene network, we identified an additional 56 genes that are interconnected in the functional gene network and are collectively significant in GSEA. This example serves to illustrate that the functional gene network can be used effectively to interpret functional genomics datasets. We performed this same analysis for all the categories defined in Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> and found that the gene sets identified using the functional gene network generally performed as well as, if not better than, gene sets identified in the original study or those constructed according to GO or KEGG (Figure S3 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The focus of this work is to produce a resource that provides the most comprehensive set of experimentally supported functional relationships between fly genes. Thus, we present the first comprehensive functional gene networks for <it>D. melanogaster </it>by integrating experimentally disparate sources of data. The integrated networks are a community resource that benefits researchers in three ways. First, we have distilled a major portion of the extant fly data (over 48 million individual measurements) into functional relationships between genes. The <it>WS </it>value of a functional relationship is easily interpretable as the measure of confidence that a gene pair is involved in a shared biological process based on the experimental evidence; in contrast, making sense of the same individual datasets outside the integrative framework is not easily manageable. Second, the functional relationships are built on experimental evidence, which can be easily retrieved to determine the dataset(s) underlying the connection. Third, and as demonstrated in this study, the functional relationships drawn between genes are biologically supported through computational validation. Thus, the networks can be used to derive experimentally testable hypotheses related to gene function.</p>
         <p>Understanding the function of every gene in the genome is a central goal of modern biology and integrated networks are another resource that draws a connection from gene to function. To demonstrate the utility of the integrated functional gene networks, we must show that they provide higher quality information than any individual dataset. We have demonstrated this by showing that KEGG pathways are, on average, more coherent within the integrated network compared to any individual dataset or type of data (Figure <figr fid="F4">4</figr>; Table S6 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>). We have also shown that edges drawn between gene pairs in the network are consistent with our biological expectation by revealing highly interconnected subnetworks of genes that are consistent with a common biological process. We then used the networks to predict GO:BP terms for un- and under-annotated genes. From these predictions we have shown that the integrated networks outperform individual types of data in both precision and recall, and we can predict GO:BP terms that are semantically similar to known annotation. These observations support the idea that integrated functional gene networks can be used to draw more reliable connections between genes and function. Finally, we showed how the integrated gene network can aid in the analysis of microarray data to uncover relationships that would have been missed without the network.</p>
         <p>Additionally, we have shown that there is a class of genes where sequence similarity performs poorly for predicting GO:BP terms. Since sequence information is not included in the construction of the integrated functional gene networks, these networks provide another source of confident relationships that can be used to predict biological processes on this class of genes. Function prediction using gene networks complements sequence-based prediction methods. Although we only discuss the most confident GO:BP predictions for 483 genes, we also make predictions that cover more levels in the GO hierarchy and predictions for genes with already known and experimentally supported annotations. These predictions constitute the first genome-scale attempt to use an integrated set of experimental data to make biological process predictions for <it>D. melanogaster </it>genes. These predictions are another source of data to aid in identifying the associated biological process(es) of the one-third of <it>D. melanogaster </it>protein-coding genes that are currently unannotated.</p>
         <p>The functional gene networks are a resource for exploring functional relationships among genes at both the local and global levels. The network sizes 20 <it>K </it>and 200 <it>K </it>were selected to maximize the number of connected genes that are involved in the same biological process while minimizing the overall number of edges. <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> is restricted to the most highly supported functional relationships at the expense of including fewer genes and edges. Consequently, users interested in exploring high confidence relationships including specific genes of interest are advised to query <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> first. On the other hand, the <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> has a lower threshold that allows for an increased number of genes and connections to be made that are heavily based on microarray data. Thus, <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> is useful for exploring functional relationships at a more global level supported by gene expression data, as well as identifying relationships between genes that may not be present in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>.</p>
         <p>The integrated networks built from fly data tend to perform well at drawing connections between genes involved in core biological processes and components, such as cell cycle, catabolic processes, the ribosome, and the proteasome. This same trend also holds in the integrated networks built from yeast <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B26">26</abbr><abbr bid="B28">28</abbr></abbrgrp>, worm <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, mouse <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B68">68</abbr></abbrgrp>, and human <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> data. Issues with repeatability and false positive rate have been raised with genome-scale data, particularly with microarray <abbrgrp><abbr bid="B82">82</abbr></abbrgrp> and yeast-two-hybrid <abbrgrp><abbr bid="B60">60</abbr><abbr bid="B83">83</abbr></abbrgrp> assays. Integrative methods mitigate the effect of one data source determining a connection between a gene pair by requiring multiple independent datasets to support the relationship between two genes. Finding core biological processes consistently clustered together across different species, which are derived from different experimental datasets, instills confidence that the relationships are both biologically real and computationally detectable.</p>
         <p>Integrative methods are not without their biases. Annotation of genes with GO terms is biased towards well-characterized genes and well-studied processes. For example, 'eye morphogenesis' [GO:0048592] is a widely studied process in <it>Drosophila </it>and is associated with over 200 genes, while 'muscle morphogenesis' [GO:0048644], which is at the same level in the GO hierarchy, is annotated to only four genes. Though the numbers of genes involved in eye and muscle morphogenesis are not expected to be equal, it is likely that they would be on par with each other. Certainly we expect there to be more than four genes involved in muscle morphogenesis. Most integration methods, including the one implemented here, require a gold-standard set of comprehensive and biologically validated gene-gene pairs. Genes sharing GO annotation terms have been used as this gold standard and the biases reflected in annotation will thus be reflected in the final product of data integration methods. Though some biological processes will certainly be underrepresented, integrative methods have been highly productive in constructing networks that both capture the current state of biological knowledge and expand upon this knowledge by drawing connections between genes of unknown function.</p>
         <p>Clearly, the quality, scope, and types of experimental data used are key factors in the integrative framework, and incorporating new data, as well as refining the selection of input data, offers the opportunity to improve and tune future networks. This study focuses on producing a comprehensive global functional gene network using available GI, PPI, and MA datasets for <it>Drosophila</it>. These datasets were selected based on their ability to connect genes that are involved in the same biological process. Overall, the extant GI data provided the greatest likelihood of gene pairs being functionally related, followed closely by direct assay PPI and the most highly correlated gene pairs within several MA datasets (indicated by the calculated <it>LLS</it>). The <it>LLS</it>s for MA data drop as the correlation coefficients within the datasets drop, but the reported values are commensurate with high-confidence Y2H and Y2H PPIs. Thus, while these classes of data did not contribute equally, all three provide high quality information used in constructing the global integrated networks. However, there are many datasets available that were not incorporated into the current version of the networks. There are several reasons for this. First, we tested the usefulness of fluorescent <it>in situ </it>hybridizations <abbrgrp><abbr bid="B84">84</abbr></abbrgrp> and transcription factor binding sites <abbrgrp><abbr bid="B85">85</abbr><abbr bid="B86">86</abbr></abbrgrp> as input data, but these data did not meet the evaluation criteria under the <it>LLS </it>framework. Second, there are datasets, such as RNA interference screens <abbrgrp><abbr bid="B87">87</abbr></abbrgrp>, that are not easily translated into a measure that can be used under the <it>LLS </it>framework. 
Third, this study focuses on experimentally supported datasets; therefore, computational methods to relate genes <abbrgrp><abbr bid="B88">88</abbr><abbr bid="B89">89</abbr><abbr bid="B90">90</abbr></abbrgrp> were ignored. Better utilization of these data sources will likely contribute to increased quality of functional relationships assigned between genes. Additionally, the ongoing modENCODE <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> projects promise an unprecedented increase in high-resolution functional genomics data. Functional gene networks offer one route to help interpret these forthcoming data. On the other hand, we do note that networks constructed using subsets of the data can outperform the global network in identifying relationships among genes in specific KEGG pathways (Figure <figr fid="F4">4e</figr>). Thus, refinement of the current framework, using only selected subsets of the available data, should make it possible to build networks more representative of specific biological processes. Building integrated networks in relation to a particular biological process would likely yield functional relationships more closely related to the specified biological process.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have integrated heterogeneous datasets to produce the first comprehensive functional gene network in <it>D. melanogaster</it>. We have shown that the functional relationships between genes are highly consistent with KEGG pathways and use these results to construct the two networks <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. We have demonstrated that edges drawn between gene pairs are consistent with our biological expectation by revealing highly interconnected subnetworks of genes that are nearly completely consistent with a common biological process. We also show how the network can be used to enhance the interpretation of microarray data by both discovering clusters of genes that are co-regulated and identifying candidate unannotated genes tightly coordinated with a known and co-regulated biological process. The full set of integrated data and networks built from these data (<inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>) are made available. We also provide GO:BP predictions for 2,154 genes in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and for 5,107 genes in <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula>. This community resource can be accessed online <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Data acquisition, cleaning, normalization, filtering</p>
            </st>
            <sec>
               <st>
                  <p>Genetic interactions</p>
               </st>
               <p>GIs were downloaded as a pre-computed file from FlyBase, version FB2007_02 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Interactions containing a gene not belonging to <it>D. melanogaster </it>were removed (for example, a transgenic construct from <it>D. simulans</it>). All reported interactions (6,941) were given the same weight, a value of 1.</p>
            </sec>
            <sec>
               <st>
                  <p>Protein-protein interactions</p>
               </st>
               <p>All PPIs - <it>D. melanogaster</it>-specific where possible - were downloaded from the following databases: BIND <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, DIP (version Dmela20071007) <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, DroID (September 2007) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, BioGRID (version 2.0.32) <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, and IntAct (September 2007) <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. The varying protein IDs across all datasets were mapped to v5.3 FlyBase gene identifiers. Any IDs that did not unambiguously map to a single FlyBase gene ID were removed. The union of reported interactions across all the datasets was taken. The experimental method used to detect an interaction was also considered. If a reported interaction was detected through multiple experimental methods, the most reliable method was ascribed to the interaction. The order of reliability is as follows: direct assays (for example, co-immunoprecipitation, biochemical assay) > high-confidence Y2H (high-confidence as reported in Giot <it>et al</it>. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>) > Y2H. In total, there were 25,408 reported PPIs among pairs of <it>D. melanogaster </it>proteins. These include 1,234 determined by direct assays and 24,174 Y2H interactions. The Y2H interactions were subdivided into 4,590 high-confidence interactions and 19,584 positive interactions.</p>
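               <p>The merging rule described above (take the union of reports, then keep the most reliable detection method for each pair) can be illustrated with a minimal Python sketch. The rank values, function names, and gene identifiers used with it are hypothetical and are not taken from the original pipeline; only the reliability order comes from the text.</p>

```python
# Hypothetical reliability ranks; higher means more reliable (order taken from the text).
RELIABILITY = {"Y2H": 0, "high-confidence Y2H": 1, "direct assay": 2}


def most_reliable_method(methods):
    """Return the most reliable detection method among those reported."""
    return max(methods, key=lambda m: RELIABILITY[m])


def merge_interactions(reports):
    """Take the union of (gene_a, gene_b, method) reports across databases.

    Gene pairs are treated as unordered; each pair keeps its most reliable method.
    """
    merged = {}
    for gene_a, gene_b, method in reports:
        pair = tuple(sorted((gene_a, gene_b)))
        previous = merged.get(pair)
        candidates = [method] if previous is None else [method, previous]
        merged[pair] = most_reliable_method(candidates)
    return merged
```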
            </sec>
            <sec>
               <st>
                  <p>Microarray gene expression</p>
               </st>
               <p>The following raw MA datasets were downloaded: from Gene Expression Omnibus (GEO), [GEO:GSE94] <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, [GEO:GSE541] <abbrgrp><abbr bid="B91">91</abbr></abbrgrp>, [GEO:GSE442] <abbrgrp><abbr bid="B92">92</abbr></abbrgrp>, [GEO:GSE3854] <abbrgrp><abbr bid="B93">93</abbr></abbrgrp>, [GEO:GSE5430] <abbrgrp><abbr bid="B94">94</abbr></abbrgrp>, [GEO:GSE3057] <abbrgrp><abbr bid="B95">95</abbr></abbrgrp>, [GEO:GSE3069] <abbrgrp><abbr bid="B95">95</abbr></abbrgrp>, [GEO:GSE5147] <abbrgrp><abbr bid="B96">96</abbr></abbrgrp>, [GEO:GSE695] <abbrgrp><abbr bid="B97">97</abbr></abbrgrp>, [GEO:GSE3257] <abbrgrp><abbr bid="B98">98</abbr></abbrgrp>, [GEO:GSE5404] <abbrgrp><abbr bid="B99">99</abbr></abbrgrp>, [GEO:GSE6515] <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>, [GEO:GSE6186] <abbrgrp><abbr bid="B100">100</abbr></abbrgrp>; from ArrayExpress, [ArrayExpress:E-TABM-57] <abbrgrp><abbr bid="B101">101</abbr></abbrgrp>, [ArrayExpress:E-MAXD-6] <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>; and from the supplemental pages of De Gregorio <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, Chintapalli <it>et al</it>. <abbrgrp><abbr bid="B102">102</abbr></abbrgrp>, and Tomancak <it>et al</it>. <abbrgrp><abbr bid="B103">103</abbr></abbrgrp>. These data used two distinct platforms: two channel cDNA or oligonucleotide spotted arrays, and single channel Affymetrix short oligonucleotide arrays. All data normalizations were performed in the R statistical programming environment <abbrgrp><abbr bid="B104">104</abbr></abbrgrp>. The datasets selected were required to have at least five conditions so that correlation measures would be reliable. We also excluded any datasets derived from <it>Drosophila </it>cell lines.</p>
               <p>Two channel experiments were normalized using local regression within the OLIN package <abbrgrp><abbr bid="B105">105</abbr></abbrgrp>. OLIN was run with default parameters, scaling turned on, and flagged spots were ignored for any calculations. The results of the full OLIN normalization are log-transformed ratio values for each gene on each individual MA slide.</p>
               <p>The Affymetrix arrays were normalized using the Affy <abbrgrp><abbr bid="B106">106</abbr></abbrgrp> and GCRMA <abbrgrp><abbr bid="B107">107</abbr></abbrgrp> R packages. Affinities for all oligonucleotide sequences were calculated and the 'fullmodel' GCRMA normalization was run, resulting in log-transformed expression values for each probe set on each array.</p>
               <p>All spots or probe sets were mapped to the v5.3 <it>D. melanogaster </it>genome assembly and annotation. Genome sequence files were downloaded from FlyBase under the FB2007_02 release <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Primer-based platforms required two rounds of BLAST: one round to match the primers to the genome (BLASTn; <it>E</it>-value &lt; 10<sup>-2</sup>) and a second round to match the amplicon product to the genome (BLASTn; <it>E</it>-value &lt; 10<sup>-6</sup>). Physical coordinates from the forward and reverse primers were checked for strandedness and to make sure the PCR product would be under 1,000 nucleotides. The segment of DNA between the forward and reverse primers (including the primers) was taken as the amplicon product for that primer pair and searched back against the genome to ensure the amplicon did not align to any other region outside the intended segment, which could lead to cross-hybridization. cDNA-based arrays required that the cDNA sequence be aligned against the genome to test for potential cross-hybridization. Any amplicons or cDNAs whose second best BLAST hit had greater than 80% sequence identity were flagged and removed. Unique BLAST hits mapping to exons of v5.3 annotated genes were assigned the corresponding FlyBase gene ID; otherwise the spot was flagged and removed.</p>
               <p>Sequence files for both Affymetrix <it>Drosophila </it>array platforms (versions 1 and 2) were downloaded from the Affymetrix website <abbrgrp><abbr bid="B108">108</abbr></abbrgrp>. They contain a unique sequence for each probe set, which was searched (BLASTn; <it>E</it>-value &lt; 10<sup>-6</sup>) against the genome to test for potential cross-hybridization. A segment of DNA associated with a probe set was assigned a v5.3 FlyBase gene ID if the BLAST result showed a putative hit to at least one exon, or part of one exon, from a single gene. A probe set was flagged and not assigned a gene ID if the BLAST result was ambiguous, meaning the second best BLAST hit was greater than 80% sequence identity, or if the query sequence did not hit at least one exon.</p>
               <p>For either MA platform, gene expression profiles were constructed using the calculated expression values for a gene across the tested conditions. If a gene expression profile had greater than 25% absent/removed expression values, that gene's profile was removed, otherwise missing values were inferred using KNNimpute <abbrgrp><abbr bid="B109">109</abbr></abbrgrp>.</p>
               <p>We defined an MA dataset to be the full, published unit of data, and, where possible, datasets were additionally defined as the subcomponents of the published dataset. For example, the Arbeitman <it>et al</it>. <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> study contains six datasets: all published conditions, embryo, larva, pupa, adult male, and adult female. See Table <tblr tid="T1">1</tblr> for the breakdown of all datasets.</p>
               <p>Gene expression profiles that did not change over the course of a dataset - referred to as 'flat' - were filtered out. This was done on a gene by gene basis by taking the difference between the maximum and minimum expression values across all conditions in one dataset. For the Affymetrix platform, if the difference between the maximum and minimum expression values was less than 50, then that gene and its corresponding expression profile were removed. For the two channel experiments, if the difference between the maximum and minimum log ratio values was less than 0.5, then the gene and its corresponding expression profile were removed.</p>
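               <p>As an illustration of the 'flat' profile filter, the following Python sketch drops genes whose range of expression values within one dataset falls below the platform threshold (50 for Affymetrix, 0.5 for two channel log ratios). The platform labels and function names are hypothetical; the original analysis was carried out in R and its code is not reproduced here.</p>

```python
# Thresholds quoted in the text; the dictionary keys are hypothetical labels.
FLAT_THRESHOLD = {"affymetrix": 50.0, "two_channel": 0.5}


def is_flat(profile, platform):
    """Return True when max - min across conditions is below the platform threshold."""
    return FLAT_THRESHOLD[platform] > max(profile) - min(profile)


def remove_flat_profiles(profiles, platform):
    """Drop every gene whose expression profile is flat within one dataset."""
    return {gene: values for gene, values in profiles.items()
            if not is_flat(values, platform)}
```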
            </sec>
            <sec>
               <st>
                  <p>Genome annotations: Gene Ontology terms</p>
               </st>
               <p>The counts of genes annotated for the organisms discussed in the introduction were downloaded from the GO website <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The annotation counts were limited to the biological process component of the GO. Additionally, annotations with the evidence codes IEA, ND, and NR were ignored.</p>
               <p>Specific to <it>Drosophila</it>, gene annotations for GO:BP terms were taken from the FB2007_02 version of FlyBase <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. These data provide a mapping from a FlyBase gene ID to the GO:BP term ID(s). GO:BP terms with the following evidence codes were removed: IEA, ND, and NR. The structure of the GO is a directed acyclic graph, meaning each term has a parent term(s) (the root term is the only exception) and each term potentially has a child term(s). As described in Lord <it>et al</it>. <abbrgrp><abbr bid="B110">110</abbr></abbrgrp>, a connection was drawn in the ontology for the link types 'is-a' and 'part-of', then each gene was propagated from its annotated position on the GO to the root. Thus, the number of genes associated with any particular term, <it>t</it><sub><it>i</it></sub>, in the GO includes the genes annotated to <it>t</it><sub><it>i </it></sub>and additionally subsumes any genes that are annotated to the child term(s) of <it>t</it><sub><it>i</it></sub>.</p>
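               <p>The propagation of gene annotations toward the root can be sketched as follows in Python. The sketch assumes the 'is-a' and 'part-of' links have already been collected into a child-to-parents mapping; the data structures and names are illustrative and are not taken from the original implementation.</p>

```python
def ancestors(term, parents):
    """Return all terms reachable from `term` through 'is-a'/'part-of' links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes_by_term = {}
    for gene, terms in direct_annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_by_term.setdefault(t, set()).add(gene)
    return genes_by_term
```

               <p>With this mapping, the gene set of the root term contains every annotated gene, which is the property the text relies on when counting the total number of possible annotations.</p>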
            </sec>
            <sec>
               <st>
                  <p>Additional data</p>
               </st>
               <p>It should be noted that we also evaluated two additional, potential data sources, which include matches to transcription factor binding sites <abbrgrp><abbr bid="B85">85</abbr><abbr bid="B86">86</abbr></abbrgrp> and fluorescent <it>in situ </it>hybridizations <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>; however, these data were not included as they did not meet our evaluation criteria (data not shown).</p>
            </sec>
            <sec>
               <st>
                  <p>Microarray profile correlation, statistical significance</p>
               </st>
               <p>In total, 34 MA gene expression datasets were collected, normalized, and filtered. We define these 34 datasets as <it>D </it>= {<it>D</it><sub>1</sub>, <it>D</it><sub>2</sub>,...,<it>D</it><sub>34</sub>}. The Pearson correlation coefficient was calculated for all gene pairs in a dataset <it>D</it><sub><it>i </it></sub>&#8712; <it>D</it>. For <it>n </it>genes in <it>D</it><sub><it>i </it></sub>= {<it>g</it><sub>1</sub>, <it>g</it><sub>2</sub>,...,<it>g</it><sub><it>n</it></sub>}, each <it>g</it><sub><it>j </it></sub>&#8712; <it>D</it><sub><it>i </it></sub>is a vector of expression values <inline-formula><graphic file="gb-2009-10-9-r97-i4.gif"/></inline-formula> across <it>m </it>conditions. The Pearson correlation coefficient between <it>g</it><sub><it>x</it></sub>, <it>g</it><sub><it>y </it></sub>&#8712; <it>D</it><sub><it>i</it></sub>, where 1 &#8804; <it>x </it>&#8804; <it>n </it>and 1 &#8804; <it>y </it>&#8804; <it>n</it>, was calculated as:</p>
               <p>
                  <display-formula id="M1">
                     <graphic file="gb-2009-10-9-r97-i5.gif"/>
                  </display-formula>
               </p>
               <p>Calculating the correlation between all <it>g</it><sub><it>x</it></sub>, <it>g</it><sub><it>y </it></sub>&#8712; <it>D</it><sub><it>i </it></sub>results in a distribution of correlation values. Since the majority of correlations do not reflect a functional linear relationship between two genes, only statistically significant correlations were used. The significance of the correlations was assessed through permutation testing. Within each condition of a particular dataset, gene expression values were shuffled, thus randomizing the correlation measures for each gene. From the shuffled data, 20% of the genes were selected at random and the pairwise Pearson correlation coefficients were calculated for this subset of genes. This process was then repeated five times to create a stable empirical null distribution of correlation coefficients. Any correlation coefficients with a <it>P</it>-value &lt; 0.01 on the two-tailed null distribution - corresponding to positive and negative correlation values - were considered for further analysis.</p>
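               <p>A minimal Python sketch of this permutation procedure is given below. It makes several assumptions that the text does not state: the data are reshuffled on every repeat, the sampling fraction and repeat count are exposed as parameters, and the two-tailed <it>P</it>-value is taken as the fraction of null correlations at least as extreme in absolute value. All names are hypothetical.</p>

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def null_correlations(profiles, fraction=0.2, repeats=5, seed=0):
    """Build an empirical null by shuffling values within each condition."""
    rng = random.Random(seed)
    genes = list(profiles)
    n_conditions = len(profiles[genes[0]])
    null = []
    for _ in range(repeats):
        # Shuffle each condition (column) independently across genes.
        columns = []
        for c in range(n_conditions):
            column = [profiles[g][c] for g in genes]
            rng.shuffle(column)
            columns.append(column)
        shuffled = [[columns[c][i] for c in range(n_conditions)]
                    for i in range(len(genes))]
        # Correlate all pairs within a random subset of the shuffled profiles.
        subset = rng.sample(shuffled, max(2, int(fraction * len(genes))))
        for i in range(len(subset)):
            for j in range(i + 1, len(subset)):
                null.append(pearson(subset[i], subset[j]))
    return null


def two_tailed_p(r, null):
    """Fraction of null correlations at least as extreme as r in absolute value."""
    return sum(1 for v in null if abs(v) >= abs(r)) / len(null)
```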
            </sec>
         </sec>
         <sec>
            <st>
               <p>Calculating significant biological processes across datasets</p>
            </st>
            <p>A total of 22 individual datasets were tested for over-representation of GO:BP terms (details on the GO:BP terms discussed above): all reported GIs; all reported PPIs (direct assay, high-confidence Y2H, and Y2H combined); and for each of the 20 MA datasets used, gene pairs with significant coexpression correlations as defined in the previous section. (Methods to arrive at 20 MA datasets are discussed below in the 'Integration' section.) For each individual dataset, the number of gene pairs annotated to the same GO:BP term was counted. GO:BP terms were only considered if they were annotated to at least 10 and fewer than 300 <it>D. melanogaster </it>genes. The lower cutoff of 10 genes was set in order to calculate reliable statistics and the upper cutoff of 300 was set so as not to bias the analysis toward highly annotated terms. The cutoff of 300 was determined by the information content (<it>IC</it>) measured over all GO:BP terms meeting the criteria mentioned in the previous paragraph. The <it>IC </it>for <it>t</it><sub><it>i </it></sub>is calculated as <it>IC</it>(<it>t</it><sub><it>i</it></sub>) = &#8722;ln(<it>P</it>(<it>t</it><sub><it>i</it></sub>)), where <it>P</it>(<it>t</it><sub><it>i</it></sub>) is the probability that <it>t</it><sub><it>i </it></sub>is annotated to a gene. <it>P</it>(<it>t</it><sub><it>i</it></sub>) is calculated by finding the fraction of times <it>t</it><sub><it>i </it></sub>is annotated to a gene compared to the total number of possible annotations. The total number of possible annotations is the count of genes annotated at the root, since the root term subsumes all gene annotations. A qualitative assessment of <it>IC </it>measures on GO:BP terms showed a reasonable cutoff corresponding to 300 annotations.</p>
            <p>Each GO:BP term used in this analysis has an associated number of genes, <it>x</it>. To test the significance of a particular GO:BP term within a particular dataset (Figure <figr fid="F1">1</figr>), an empirical null distribution was constructed. For each GO:BP term with <it>x </it>associated genes, a random set of <it>x </it>genes was selected from the dataset being analyzed, and the number of connections between this set of <it>x </it>random genes was determined. This procedure was repeated 100 times. In all cases the counts were normally distributed. The significance of the number of connections between the <it>x </it>genes tested was assessed with a right-tailed, single-sample <it>t</it>-test. This resulted in a matrix of 22 datasets by 1,133 GO:BP terms, where the values in the cells of the matrix are <it>P</it>-values. This matrix was hierarchically clustered on both dimensions using TM4 MEV <abbrgrp><abbr bid="B111">111</abbr><abbr bid="B112">112</abbr></abbrgrp> with average linkage and Euclidean distance. Visualization of the clustered matrix was also done in TM4 MEV.</p>
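            <p>The term-size filter and the information content can be sketched in Python as follows. The sketch assumes the conventional negative-log form of <it>IC</it>, so that rarer terms receive larger values; the function names and the shape of the input mapping (term to set of propagated genes) are hypothetical.</p>

```python
import math


def information_content(n_genes_with_term, n_genes_at_root):
    """IC(t) = -ln(P(t)), with P(t) the fraction of root-annotated genes carrying t."""
    return -math.log(n_genes_with_term / n_genes_at_root)


def usable_terms(genes_by_term, minimum=10, maximum=300):
    """Keep terms annotated to at least `minimum` and fewer than `maximum` genes."""
    return {term for term, genes in genes_by_term.items()
            if len(genes) >= minimum and maximum > len(genes)}
```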
         </sec>
         <sec>
            <st>
               <p>Integration methods</p>
            </st>
            <sec>
               <st>
                  <p>Log-likelihood score</p>
               </st>
               <p>The general procedure for integrating gene-gene relationships across all datasets was adapted from Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>. Datasets and the functional relationships drawn between two genes were scored in relation to GO:BP annotation, where the annotations met the same criteria as mentioned in the previous section. The <it>LLS </it>was calculated for each dataset as follows (we will use the same notation as Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>; in particular ~ denotes 'not'):</p>
               <p>
                  <display-formula id="M2">
                     <graphic file="gb-2009-10-9-r97-i6.gif"/>
                  </display-formula>
               </p>
               <p><it>D </it>represents a dataset of gene pairs and can be PPI, GI, or MA. <it>I </it>represents the set of gene pairs that were annotated and shared at least one GO:BP term, while gene pairs in ~ <it>I </it>were annotated, but there was no overlap between the GO:BP terms annotated to individual genes in a pair. Both <it>I </it>and ~ <it>I </it>are counts taken across all genes in the v5.3 <it>D. melanogaster </it>genome. <it>P</it>(<it>I</it>) is the probability of a gene pair sharing at least one GO:BP annotation, and <it>P</it>(~ <it>I</it>) is the complement. The probability of finding an annotated gene pair sharing at least one GO:BP term restricted to the gene pairs within dataset <it>D </it>is <it>P</it>(<it>I</it>|<it>D</it>), and <it>P</it>(~ <it>I</it>|<it>D</it>) is the complement. In the case of MA data, <it>D </it>represents the dataset after being filtered for significant correlation values and removing 'flat' expression profiles.</p>
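               <p>Because each probability in the score is a ratio of gene-pair counts, the <it>LLS </it>can be computed directly from four counts. The Python sketch below assumes the form used by Lee <it>et al</it>., namely the natural log of the odds of sharing a GO:BP term within <it>D </it>divided by the background odds; the equation itself is given above only as an image, so this form and the argument names are assumptions.</p>

```python
import math


def log_likelihood_score(shared_in_d, not_shared_in_d, shared_total, not_shared_total):
    """LLS = ln( [P(I|D) / P(~I|D)] / [P(I) / P(~I)] ), computed from pair counts.

    shared_in_d / not_shared_in_d: annotated pairs in dataset D that do / do not
    share a GO:BP term. shared_total / not_shared_total: the same counts over
    all annotated gene pairs in the genome.
    """
    dataset_odds = shared_in_d / not_shared_in_d
    background_odds = shared_total / not_shared_total
    return math.log(dataset_odds / background_odds)
```

               <p>A score of 0 means the dataset links functionally related genes no more often than chance, and positive scores indicate enrichment.</p>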
            </sec>
            <sec>
               <st>
                  <p><it>LLS </it>for genetic interactions</p>
               </st>
               <p>A <it>LLS </it>was calculated for the entire GI dataset. Each reported gene pair was weighted equally; therefore, a gene pair within the GI dataset was assigned a <it>LLS </it>score calculated from the entire dataset, where <it>LLS </it>= 2.661.</p>
            </sec>
            <sec>
               <st>
                  <p><it>LLS </it>for protein-protein interactions</p>
               </st>
               <p>The PPI data were separated into three subsets reflecting the expected reliability of the experimental methods to detect interacting proteins. A <it>LLS </it>was then calculated for each subset. Protein pairs within a subset were assigned their respective <it>LLS</it>s. The first class of PPIs reflected interactions reported in a Y2H assay, where <it>LLS </it>= 0.630. The second class reflected interactions defined as high-confidence Y2H, where <it>LLS </it>= 1.045. The most confident class of experimental techniques (noted 'direct assay') included co-immunoprecipitation, affinity methods, biochemical assays, and mass spectrometry, where <it>LLS </it>= 2.389.</p>
            </sec>
            <sec>
               <st>
                  <p><it>LLS </it>for microarray datasets</p>
               </st>
               <p>As described in Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp>, gene pairs from each individual MA dataset (filtered on significant correlations and 'flat' profiles) were first ordered according to their correlation coefficients and then separated into bins of 1,000 gene pairs, where the first bin contains the most significant positively correlated gene pairs. A <it>LLS </it>was then calculated for each bin and plotted against the mean correlation value <inline-formula><graphic file="gb-2009-10-9-r97-i7.gif"/></inline-formula> for bin <it>i </it>(Figure <figr fid="F2">2</figr>). From this plot, we fit the polynomial equation <inline-formula><graphic file="gb-2009-10-9-r97-i8.gif"/></inline-formula>, using the <monospace>lm()</monospace> function in <monospace>R</monospace>. A separate curve was fit for both positively and negatively correlated data. Every point along the curve for a positive correlation was greater than a <it>LLS </it>of 0, while every curve fit to the negative correlations had at least some portion that fell below a <it>LLS </it>of 0. Therefore, only the significant positively correlated data were considered in evaluating each MA dataset. From all fit curves, a measure of the fraction of variance explained by the model was calculated as:</p>
               <p>
                  <display-formula id="M3">
                     <graphic file="gb-2009-10-9-r97-i9.gif"/>
                  </display-formula>
               </p>
               <p>where <it>f</it><sub><it>i </it></sub>is the <it>i</it><sup><it>th </it></sup>fitted value of the model, <it>y</it><sub><it>i </it></sub>is the fitted value plus the residuals for the <it>i</it><sup><it>th </it></sup>bin, and <inline-formula><graphic file="gb-2009-10-9-r97-i10.gif"/></inline-formula> is the average of <it>y</it><sub><it>i </it></sub>over all <it>i </it>bins. Additionally, the value for <it>r</it><sup>2 </sup>was adjusted for the number of coefficients in the model. Datasets that had an adjusted <it>r</it><sup>2 </sup>&lt; 0.5 were removed from further analysis. Also, datasets were required to have a positive linear trend. After applying these criteria to all MA datasets, 20 of the 34 passed and were used in this study, whereas 14 of the 34 did not meet these criteria and were removed (Table <tblr tid="T1">1</tblr>; Figure S1 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> for all datasets). In two cases (Sorensen <it>et al</it>. <abbrgrp><abbr bid="B96">96</abbr></abbrgrp> and Edwards <it>et al</it>. <abbrgrp><abbr bid="B99">99</abbr></abbrgrp>), all datasets related to one experiment passed the above criteria. To remove the redundancy with these two cases, the datasets constituting the subcomponents of the experiment were chosen over the full set of conditions. Specifically, the Sorensen <it>et al</it>. <abbrgrp><abbr bid="B96">96</abbr></abbrgrp> control timecourse and heat-shocked timecourse were used and the dataset consisting of all conditions was not used. Within the Edwards <it>et al</it>. <abbrgrp><abbr bid="B99">99</abbr></abbrgrp> datasets, two lines of flies were tested, so line 1 and line 2 were used and the full set of conditions was not used.</p>
               <p>The positively correlated gene pairs in the 20 datasets passing the above criteria were rescored and assigned a <it>LLS </it>according to the fit polynomial equation. This rescoring transformed a gene pair's correlation coefficient into a <it>LLS</it>.</p>
            </sec>
            <sec>
               <st>
                  <p>Weighted sum</p>
               </st>
               <p>The weighted sum (<it>WS</it>) was adapted from Lee <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B28">28</abbr></abbrgrp> and was calculated as follows:</p>
               <p>
                  <display-formula id="M4">
                     <graphic file="gb-2009-10-9-r97-i11.gif"/>
                  </display-formula>
               </p>
               <p><it>LLS </it>values for a gene pair across all <it>k </it>datasets were ordered from largest to smallest (<it>LLS</it><sub><it>i </it></sub>&#8805; <it>LLS</it><sub><it>i</it>+1</sub>, &#8704;<it>i</it>; 0 &#8804; <it>i </it>&#8804; <it>k </it>&#8722; 1). <it>M </it>is a free parameter and can be adjusted to increase or decrease the contribution of subsequently ranked <it>LLS</it>s. It should be noted that ignoring the denominator (<it>i</it>&#183;<it>M</it>) and simply summing all <it>LLS</it>s across the <it>k </it>datasets is akin to a na&#239;ve Bayesian integration. This assumes uniform priors on each of the <it>k </it>datasets. However, this method of integration is not strictly Bayesian, as the values being summed are <it>LLS</it>s and not probabilities. The opposite of ignoring the denominator is to set <it>M </it>&#8594; &#8734;. This causes the <it>WS </it>calculation to consider only the <it>0</it><sup><it>th </it></sup>ranked <it>LLS </it>(that is, <it>WS </it>= <it>LLS</it><sub>0</sub>). To test a range of integration scores, <it>WS </it>calculations were made for all gene pairs where <it>M </it>&#8712; {1,2,5,10,100}, <it>M </it>&#8594; &#8734;, and also for the na&#239;ve method. These seven <it>WS </it>calculations were selected to cover a range of different weighting schemes.</p>
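               <p>The behaviour of the <it>WS </it>under different values of <it>M </it>can be illustrated with a short Python sketch. The equation is shown above only as an image, so the sketch assumes the form <it>WS </it>= <it>LLS</it><sub>0 </sub>+ sum over <it>i </it>&#8805; 1 of <it>LLS</it><sub><it>i</it></sub>/(<it>i</it>&#183;<it>M</it>), which is consistent with the limiting cases described in the text but is not confirmed by it. Function names are hypothetical.</p>

```python
import math


def weighted_sum(scores, m):
    """WS under the assumed form LLS_0 + sum(LLS_i / (i * M)) for i >= 1."""
    ranked = sorted(scores, reverse=True)
    total = ranked[0]
    if math.isinf(m):
        return total  # M -> infinity keeps only the top-ranked LLS
    for i, score in enumerate(ranked[1:], start=1):
        total += score / (i * m)
    return total


def naive_sum(scores):
    """Naive integration: drop the denominator and add all LLS values."""
    return sum(scores)
```

               <p>Larger values of <it>M </it>shrink the contribution of lower-ranked evidence, so the score moves from the na&#239;ve sum toward the single best <it>LLS</it>.</p>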
               <p>The KEGG pathways were used to validate functional relationships in the integrated network <abbrgrp><abbr bid="B113">113</abbr></abbrgrp>. To test the overlap between KEGG and GO, we compared gene-gene associations derived from KEGG pathways and the set of GO:BP annotated gene pairs used in our analysis. This comparison revealed that roughly a quarter of the gene pairs from KEGG pathways are also present as gene pairs in GO:BP.</p>
               <p>Gene IDs for each KEGG pathway were mapped to the v5.3 genome annotation. The genes in each pathway were tested against a network through the measure of coherence. The network is a graph and can be defined as <it>G</it>&#10216;<it>V</it>, <it>E</it>&#10217; with <it>V </it>vertices (genes) and <it>E </it>edges (functional relationships). The set of KEGG pathways is defined as <it>K </it>= {<it>K</it><sub>1</sub>, <it>K</it><sub>2</sub>,...,<it>K</it><sub><it>n</it></sub>}, where <it>K</it><sub><it>i </it></sub>is the set of genes defined by KEGG pathway <it>K</it><sub><it>i</it></sub>. The greatest connected component for <it>K</it><sub><it>i</it></sub>, noted <inline-formula><graphic file="gb-2009-10-9-r97-i12.gif"/></inline-formula>, was determined by the greatest number of genes in <it>K</it><sub><it>i </it></sub>present and creating a connected component in <it>G</it>&#10216;<it>V</it>, <it>E</it>&#10217;. The coherence for <it>K</it><sub><it>i </it></sub>was then calculated as <inline-formula><graphic file="gb-2009-10-9-r97-i13.gif"/></inline-formula>. Twenty-five pathways were selected to evaluate the <it>WS </it>integrated networks (Figure <figr fid="F3">3</figr>; the 25 pathways are marked with asterisks in Table S5 at <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>). The 25 KEGG pathways were selected because they consistently showed the highest coherence amongst all the KEGG pathways tested.</p>
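               <p>The coherence measure can be sketched in Python as below. The sketch assumes that coherence is the size of the greatest connected component formed by pathway genes, using only edges between pathway genes, divided by the number of genes in the pathway; the defining equation appears above only as an image, so both the ratio and the restriction to pathway-internal edges are assumptions. Names are hypothetical.</p>

```python
def largest_component_size(genes, edges):
    """Size of the largest connected component among `genes`.

    Only edges whose two endpoints are both in `genes` are used; an isolated
    gene counts as a component of size one in this sketch.
    """
    genes = set(genes)
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    best, seen = 0, set()
    for start in genes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        best = max(best, size)
    return best


def coherence(pathway_genes, edges):
    """Fraction of pathway genes that fall in the largest connected component."""
    return largest_component_size(pathway_genes, edges) / len(set(pathway_genes))
```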
               <p>The scores for each of the seven <it>WS </it>calculations were rank ordered, then networks were built starting from the top 1,000 scoring gene pairs in increasing intervals to networks of one million edges. The average coherence of the 25 pathways over each of the size intervals was measured (Figure <figr fid="F3">3</figr>). The curves in Figure <figr fid="F3">3</figr> were then used to determine the smallest network size that provides a high overall coherence across KEGG pathways, since the average coherence varies as a function of the size of the network. We identify the points on the curve where the gain in average coherence flattens as the size of the network increases. These points of the curves occur at network sizes of 20 <it>K </it>and 200 <it>K</it>. These two network sizes are used throughout the rest of this study.</p>
               <p>After establishing the network sizes, we aimed to optimize the <it>M </it>parameter in the <it>WS </it>score to provide the greatest average KEGG pathway coherence. Since most of the coherence was gained by the network size of 200 <it>K </it>gene pairs, this network was used to evaluate the seven <it>WS </it>integration schemes. This was done by measuring the AUC. Large gains of KEGG pathway coherence in the smaller sized networks result in a higher AUC, while slow or little gain in coherence results in a low AUC. Thus, the AUC (Figure <figr fid="F3">3</figr>) is a means of assessing how well a <it>WS </it>integration method recovers KEGG pathway relationships. By iteratively testing networks built with increasing <it>M </it>values from 1, we determined that the <it>WS </it>integration where <it>M </it>= 1.8 maximized the AUC for the network size of 200 <it>K </it>edges.</p>
               <p>All KEGG pathways having at least ten <it>D. melanogaster </it>genes were tested individually against the <it>WS </it>network, where <it>M </it>= 1.8 at a size of 200 <it>K </it>edges. In total, 63 pathways were tested. The statistical significance of each coherence measure was evaluated through permutation testing: an empirical null distribution of coherence values was calculated by randomly sampling, 1,000 times, a set of genes equal in size to |<it>K</it><sub><it>i</it></sub>|. A single-sample Wilcoxon rank-sum statistic was used to measure the significance of <it>K</it><sub><it>i </it></sub>when compared to the null distribution. <it>P</it>-values were adjusted using a Bonferroni correction.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Markov random field method to predict GO:BP</p>
            </st>
            <p>We employed the MRF method implemented by Letovsky and Kasif <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> to predict gene function utilizing an integrated network and known GO:BP terms (excluding IEA, ND, and NR evidence codes). The probability for a gene being annotated with a GO:BP term can be calculated as follows (note that the equations are taken from Letovsky and Kasif <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and further detail can be found in their manuscript):</p>
            <p>
               <display-formula id="M5">
                  <graphic file="gb-2009-10-9-r97-i14.gif"/>
               </display-formula>
            </p>
            <p>where <it>L</it><sub><it>i</it>, <it>t </it></sub>is a Boolean random variable dependent on gene <it>i </it>and term <it>t</it>, <it>N</it><sub><it>i </it></sub>is the number of genes directly adjacent to <it>i</it>, and <it>k</it><sub><it>i</it>, <it>t </it></sub>is the number of genes directly adjacent to <it>i </it>that are annotated with term <it>t</it>. The authors also make the assumption that the degree distribution of nodes labeled with <it>t </it>is not significantly different than the overall degree distribution. While this assumption does not hold for all terms <it>t </it>in our study, it does for the majority; therefore, we also make this assumption. Ultimately, the authors develop the probabilistic neighborhood function:</p>
            <p>
               <display-formula id="M6">
                  <graphic file="gb-2009-10-9-r97-i15.gif"/>
               </display-formula>
            </p>
            <p>where <it>f</it><sub><it>t </it></sub>is the frequency of term <it>t </it>in the network, <it>p</it><sub>0 </sub>is the probability that any given gene in the network annotated with term <it>t </it>is NOT connected to another gene annotated with term <it>t</it>, while <it>p</it><sub>1 </sub>is the probability that any given gene in the network annotated with term <it>t </it>IS connected to another gene annotated with term <it>t</it>. <it>&#955; </it>can be described as the ratio of the weighted frequency of the presence of term <it>t </it>annotated to the neighbors of gene <it>i </it>over the weighted frequency of the neighbor genes not annotated with term <it>t</it>. The ratio relies on the binomial distribution <inline-formula><graphic file="gb-2009-10-9-r97-i16.gif"/></inline-formula>. The MRF method produces a probability for each gene-GO:BP term pair and was run on the networks of size 20 <it>K </it>and 200 <it>K</it>.</p>
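            <p>For orientation, the neighborhood function can be sketched numerically. The equations in this section are provided as images, so the form used below is an assumption based on our reading of Letovsky and Kasif: the posterior is taken to be <it>&#955;</it>/(1 + <it>&#955;</it>), with <it>&#955; </it>the ratio of two binomial terms weighted by <it>f</it><sub><it>t </it></sub>and 1 - <it>f</it><sub><it>t</it></sub>. The function names are hypothetical, and <it>p</it><sub>0 </sub>and <it>p</it><sub>1 </sub>are simply passed in as parameters.</p>

```python
from math import comb


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def mrf_posterior(n_neighbors, k_labeled, f_t, p1, p0):
    """Posterior probability that a gene carries term t, given that
    k_labeled of its n_neighbors neighbors carry it.

    Assumed form: lambda / (1 + lambda), where
    lambda = f_t * B(N; k, p1) / ((1 - f_t) * B(N; k, p0)).
    Written as labeled / (labeled + unlabeled) to avoid dividing by zero.
    """
    labeled = f_t * binom_pmf(n_neighbors, k_labeled, p1)
    unlabeled = (1.0 - f_t) * binom_pmf(n_neighbors, k_labeled, p0)
    total = labeled + unlabeled
    if total == 0.0:
        return f_t  # no information from neighbors; fall back to the prior
    return labeled / total
```

            <p>Under this assumed form, a gene whose neighbors are uninformative (<it>p</it><sub>0 </sub>= <it>p</it><sub>1</sub>) receives a probability equal to the term frequency <it>f</it><sub><it>t</it></sub>.</p>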
            <sec>
               <st>
                  <p>Prediction evaluation (precision/recall)</p>
               </st>
               <p>The GO:BP predictions were evaluated using tenfold cross-validation. All genes annotated with GO:BP terms were randomly divided into ten equal sets, <it>G </it>= {<it>G</it><sub>1</sub>, <it>G</it><sub>2</sub>,...,<it>G</it><sub>10</sub>}. The following procedure was performed for each of the ten sets in <it>G</it>. The annotations for all the genes in set <it>G</it><sub><it>n </it></sub>(where <it>n </it>&#8712; {1, ..., 10}) were masked from their corresponding genes. The <it>LLS </it>and <it>WS </it>integration, where <it>M </it>= 1.8, were recalculated for each dataset. Note that only the annotations were removed from the set of genes; the genes themselves remained in the analysis. The newly calculated <it>WS </it>relationships were rank ordered and networks with the top 20 <it>K </it>values and 200 <it>K </it>values were built. These two networks along with the GO:BP annotations from sets {<it>G</it><sub>1</sub>, ..., <it>G</it><sub>10</sub>}-{<it>G</it><sub><it>n</it></sub>} were then used as input to the MRF prediction method. Predictions were made on all genes in the network, and the predictions for the genes in <it>G</it><sub><it>n </it></sub>were evaluated against the held-out annotations for <it>G</it><sub><it>n</it></sub>.</p>
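               <p>The partitioning and masking steps can be sketched as follows. This is a minimal illustration under stated assumptions: annotations are represented as a dictionary from gene to a set of GO:BP terms, and the function names are hypothetical. The recalculation of <it>LLS </it>and <it>WS </it>for each fold is not shown.</p>

```python
import random


def tenfold_sets(annotated_genes, n_folds=10, seed=0):
    """Randomly divide the annotated genes into n_folds near-equal sets."""
    genes = sorted(annotated_genes)
    random.Random(seed).shuffle(genes)
    return [genes[i::n_folds] for i in range(n_folds)]


def mask_annotations(annotations, held_out):
    """Split annotations into a training part and a masked part.

    Only the annotations of held-out genes are hidden; the genes themselves
    would stay in the network used for prediction.
    """
    held = set(held_out)
    training = {g: set(t) for g, t in annotations.items() if g not in held}
    masked = {g: set(t) for g, t in annotations.items() if g in held}
    return training, masked
```

               <p>Each fold returned by the hypothetical <it>tenfold_sets </it>would be passed in turn to <it>mask_annotations</it>, with the training part used as input to prediction and the masked part kept for evaluation.</p>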
               <p>Two methods were used to evaluate the GO:BP predictions made on the genes in <it>G</it><sub><it>n</it></sub>. First, the precision (<inline-formula><graphic file="gb-2009-10-9-r97-i17.gif"/></inline-formula>) and recall (<inline-formula><graphic file="gb-2009-10-9-r97-i18.gif"/></inline-formula>) were calculated with respect to GO:BP terms and also with respect to the genes (<it>tp </it>= true positives, <it>fp </it>= false positives, and <it>fn </it>= false negatives). The second method measured the semantic similarity (<it>SS</it>) between the known set of annotations for a gene and the predicted terms for that gene.</p>
               <p>Precision and recall with respect to the GO:BP terms were calculated as follows. A true positive prediction was called if the predicted term exactly matched a known, held-out term, or the known term's parent(s), or the known term's child(ren) (&#177; 1 level in the GO with respect to one GO term). A false positive was called if the predicted term did not match a known, held-out term or a parent or child of the known term. A false negative was called for any known, held-out annotation not called a true positive. It should be noted that we also tested a more stringent criterion of requiring predictions to exactly match known GO:BP terms and a less stringent criterion where predictions can match &#177; 2 levels in the GO hierarchy. The evaluation method we used is a fair balance between the more and less stringent criteria and the precision/recall values followed the same trends for each of the three tested criteria.</p>
               <p>A measure of precision and recall was also calculated in relation to the gene. Extrapolating from the evaluation methods for GO:BP terms, we counted a true positive gene prediction if a gene had at least one true positive GO:BP term prediction. In other words, a true positive gene was called if the intersection between previously known, held-out terms and predicted terms contained at least one term. A false positive gene was called if GO:BP terms were predicted on a gene but none matched the known, held-out terms (empty intersection). A false negative gene was called if a gene had known, held-out GO:BP terms but no GO:BP prediction was made on it.</p>
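               <p>The term-level counting rules can be sketched as follows. This is one possible reading of the criteria described above, not the code used in this study: true and false positives are counted over predicted terms, false negatives over held-out terms that no prediction matched, and the dictionaries of direct parents and children are hypothetical inputs.</p>

```python
def neighborhood(term, parents, children):
    """The term itself plus its direct parents and direct children."""
    return {term} | set(parents.get(term, ())) | set(children.get(term, ()))


def term_precision_recall(predicted, known, parents, children):
    """Precision and recall for one gene, allowing matches within one
    level of a known, held-out term."""
    tp = fp = 0
    matched_known = set()
    for pred in predicted:
        hits = {k for k in known if pred in neighborhood(k, parents, children)}
        if hits:
            tp += 1
            matched_known |= hits
        else:
            fp += 1
    # Held-out terms never matched by any prediction count as false negatives.
    fn = len(set(known) - matched_known)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

               <p>Passing empty parent and child dictionaries reduces this sketch to the exact-match criterion.</p>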
            </sec>
            <sec>
               <st>
                  <p>Prediction evaluation (semantic similarity)</p>
               </st>
               <p>In addition to precision and recall, we calculated <it>SS </it>between the set of held-out terms and the set of predicted terms for the same gene. We employed the <it>SS </it>calculation developed by Wang <it>et al</it>. <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. Briefly, each GO term is assigned a semantic value based on the term's location in the GO hierarchy and the relationship types ('is-a' and 'part-of') that link it to its ancestor GO terms. The <it>SS </it>between two GO terms was calculated by considering the location of both terms in the ontology and the relationships between the ancestor GO terms jointly. <it>SS </it>between two sets of GO terms, which represent the annotations of two genes, was calculated by iteratively comparing each GO term from the held-out set to the GO terms from the set of predicted terms, and <it>vice versa</it>. This method calculates a single <it>SS </it>measure on the interval [0,1] for each annotated gene pair compared.</p>
               <p>To determine a reliable <it>SS </it>threshold, we measured the <it>SS </it>between all reported GI gene pairs where each gene in the pair was annotated with at least one GO:BP term. GIs provided the highest <it>LLS </it>of any dataset and, therefore, were used as the benchmark set for <it>SS </it>scores. The median <it>SS </it>for GIs was calculated to be 0.45, which we took as the threshold for considering an <it>SS </it>score reliable.</p>
            </sec>
            <sec>
               <st>
                  <p>Prediction evaluation (comparison with sequence similarity)</p>
               </st>
               <p>The translated protein sequences for each of the 483 genes tested were downloaded from FlyBase FB2007_02 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The sequences were searched against the NCBI nr database using BLASTp with an <it>E</it>-value cutoff of 10<sup>-6</sup>. Sequence hits with less than 40% identity were removed, as were all sequences from the <it>Drosophila </it>genus. The top ten BLAST hits for each of the 483 genes were taken and the GO:BP annotations for these BLAST hits were downloaded from the GO database <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The mapping between BLAST results and GO term annotations was done through UniProt IDs. All GO:BP annotations were directly transferred to the <it>D. melanogaster </it>gene from the top ten BLAST hits.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Analysis of Teleman <it>et al</it>. gene expression data</p>
            </st>
            <p>Processed gene expression data from Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> were downloaded from ArrayExpress <abbrgrp><abbr bid="B114">114</abbr></abbrgrp> under accession number [ArrayExpress:E-TABM-375]. Normalization and filtering were performed following the methods in Teleman <it>et al</it>. Expression ratios for replicate spots were averaged.</p>
            <sec>
               <st>
                  <p>Subnetwork construction algorithm</p>
               </st>
               <p>The goal of the subnetwork construction algorithm is to build a tightly connected subnetwork around a set of query genes. This was done by first defining a set of query genes, <it>Q</it>. This set is user-defined and in this case is a set of genes that share a common biological process. We are given a graph <it>G </it>= &#10216;<it>V</it>, <it>E</it>&#10217;, where <it>v</it><sub><it>i </it></sub>&#8712; <it>V </it>and (<it>v</it><sub><it>i</it></sub>, <it>v</it><sub><it>j</it></sub>) &#8712; <it>E</it>. In this analysis, <it>G </it>= <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <it>Q </it>&#8834; <it>V</it>. We want to find a new set of genes, <it>Q</it>', that contains all <it>v</it><sub><it>i </it></sub>that meet the following criteria: <it>v</it><sub><it>i </it></sub>&#8712; <it>V</it>, <it>v</it><sub><it>i </it></sub>&#8713; <it>Q</it>, (<it>v</it><sub><it>i</it></sub>, <it>v</it><sub><it>j</it></sub>) &#8712; <it>E</it>, (<it>v</it><sub><it>i</it></sub>, <it>v</it><sub><it>k</it></sub>) &#8712; <it>E</it>, <it>v</it><sub><it>j </it></sub>&#8712; <it>Q</it>, and <it>v</it><sub><it>k </it></sub>&#8712; <it>Q</it>. In other words, we want to find all nodes in <it>G </it>that are not already present in <it>Q </it>and share an edge with at least two distinct nodes in <it>Q</it>. This new set of nodes, <it>Q</it>', is then added to <it>Q </it>(<it>Q </it>= <it>Q </it>&#8746; <it>Q</it>'). A second iteration of this procedure is performed to find a new set <it>Q</it>' in relation to the expanded <it>Q</it>. The two sets are again combined to form the final set <it>Q</it>. The subnetwork <it>G</it>' is returned, where <it>G</it>' &#8834; <it>G </it>and <it>G</it>' = &#10216;<it>Q</it>, <it>E</it>'&#10217;, <it>E</it>' &#8834; <it>E</it>.</p>
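               <p>The two-iteration expansion can be sketched as follows. This is a minimal illustration, assuming an undirected network given as a list of gene pairs; the function names are hypothetical, and <it>E</it>' is taken here to be every edge of <it>E </it>with both endpoints in the final <it>Q</it>, which is an assumption.</p>

```python
def expand_once(query, adjacency):
    """Return Q': nodes outside the query set that share an edge with at
    least two distinct query nodes."""
    query = set(query)
    added = set()
    for node, neighbors in adjacency.items():
        if node in query:
            continue
        if len(set(neighbors).intersection(query) - {node}) >= 2:
            added.add(node)
    return added


def build_subnetwork(query, edges, iterations=2):
    """Grow the query set for a fixed number of iterations, then return the
    final node set and the edges whose endpoints both lie in it."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Query genes absent from the network are dropped, so that Q is a subset of V.
    nodes = {q for q in query if q in adjacency}
    for _ in range(iterations):
        nodes |= expand_once(nodes, adjacency)
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges
```

               <p>Because each iteration evaluates candidates against the query set as it stood at the start of that iteration, a node can enter in the second pass through a neighbor that was itself added in the first.</p>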
            </sec>
            <sec>
               <st>
                  <p>Gene set enrichment analysis</p>
               </st>
               <p>All genes from the wild-type muscle tissue gene expression experiment (fed versus starved larvae) were rank ordered according to their log-transformed ratio values. Gene sets were defined for the following categories: category 1, the functional categories reported in Teleman <it>et al</it>.; category 2, the genes from the subnetworks constructed from query seed sets from category 1; category 3, genes listed in KEGG pathways; and category 4, the three GO categories of biological process, molecular function, and cellular component. Gene sets from category 1 were taken directly from the list of genes reported in the figures of Teleman <it>et al</it>. Gene sets from category 2 were defined as the genes present in a seed set (gene set in category 1) in addition to the genes from the network constructed according to the subnetwork construction algorithm. Genes that were present in sets from category 1 but not found in the integrated network were not included in any sets in category 2. Gene sets from category 3 were defined by the genes in individual KEGG pathways. Gene sets from category 4 were defined by the genes annotated to individual GO terms. Gene-GO term sets were parsed directly from all associations defined by FlyBase (including IEA, NR, and ND) <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>.</p>
               <p>The GSEA <abbrgrp><abbr bid="B81">81</abbr></abbrgrp> software was run using the 'GseaPreranked' option, with the rank ordered list of wild-type muscle expression ratios and all gene sets as input. Gene sets with fewer than 15 or more than 500 genes were ignored, and default weighting parameters were used.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>AUC: area under the curve; DE: differentially expressed; GEO: Gene Expression Omnibus; GI: genetic interaction; GSEA: gene set enrichment analysis; GO: Gene Ontology; GO:BP: Gene Ontology biological process; <it>IC</it>: information content; KEGG: Kyoto Encyclopedia of Genes and Genomes; <it>LLS</it>: log-likelihood score; MA: microarray; MRF: Markov random field; ND: no biological data available; NR: not recorded; PPI: protein-protein interaction; SRP: signal recognition particle; <it>SS</it>: semantic similarity; <it>WS</it>: weighted sum; Y2H: yeast-two-hybrid.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JA conceived the project. MMD, BDE, JCC, and JA were involved in developing the project. JCC, SMB, JRG, RP, and SM performed data processing and computation. JCC, JA, and MMD wrote the paper.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data files are available with the online version of this paper: the matrix of values used to create Figure <figr fid="F2">2</figr> (Additional data file <supplr sid="S1">1</supplr>); the full set of GO:BP predictions made for <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> using the MRF method (Additional data file <supplr sid="S2">2</supplr>); the full set of GO:BP predictions made for <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> using the MRF method (Additional data file <supplr sid="S3">3</supplr>); the filtered set of GO:BP predictions made for the 483 genes discussed in the text from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> using the MRF method (Additional data file <supplr sid="S4">4</supplr>); a table of information related to the 70 genes found in Figure <figr fid="F10">10b</figr> (Additional data file <supplr sid="S5">5</supplr>).</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>The matrix of values used to create Figure <figr fid="F2">2</figr></p>
            </caption>
            <text>
               <p>The matrix contains GO:BP terms by dataset with the <it>P</it>-value reported in each cell.</p>
            </text>
            <file name="gb-2009-10-9-r97-S1.TXT">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>The full set of GO:BP predictions made for <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> using the MRF method</p>
            </caption>
            <text>
               <p>All predictions meet the criterion of <it>t</it><sub><it>p </it></sub>&#8805; 0.1.</p>
            </text>
            <file name="gb-2009-10-9-r97-S2.XLS">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>The full set of GO:BP predictions made for <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> using the MRF method</p>
            </caption>
            <text>
               <p>All predictions meet the criterion of <it>t</it><sub><it>p </it></sub>&#8805; 0.1.</p>
            </text>
            <file name="gb-2009-10-9-r97-S3.XLS">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>The filtered set of GO:BP predictions made for the 483 genes discussed in the text from <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> using the MRF method</p>
            </caption>
            <text>
               <p>In addition to gene-GO:BP predictions, the file also contains all GO annotations from v5.3 and v5.7 of the <it>D. melanogaster </it>genome, best BLAST hits to the NCBI nr database, and any GO annotations transferred to the gene based on sequence similarity. Genes having any evidence supporting the GO:BP prediction are marked.</p>
            </text>
            <file name="gb-2009-10-9-r97-S4.XLS">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>Information related to the 70 genes found in Figure <figr fid="F10">10b</figr></p>
            </caption>
            <text>
               <p>A set of signal recognition particle (SRP)-related genes was taken from Teleman <it>et al</it>. <abbrgrp><abbr bid="B79">79</abbr></abbrgrp> and a network was built around this query set of genes from the connections in <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula>. This file contains information related to gene expression, GO terms, and notes on whether a gene is related to the protein secretory pathway. Networks <inline-formula><graphic file="gb-2009-10-9-r97-i2.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-9-r97-i3.gif"/></inline-formula> along with the full set of functional relationships between <it>Drosophila </it>gene pairs are made available at <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
            </text>
            <file name="gb-2009-10-9-r97-S5.XLS">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank the Center for Genomics and Bioinformatics for their computer support. Computing resources provided by the Center for Genomics and Bioinformatics were supported in part by the METACyt Initiative of Indiana University, funded by a major grant from the Lilly Endowment. We would like to thank A Teleman for supplying data. Lastly, we would like to thank the three reviewers for helpful comments.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The Gene Ontology project in 2008.</p>
            </title>
            <aug>
               <au>
                  <cnm>The Gene Ontology Consortium</cnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D440</fpage>
            <lpage>444</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238979</pubid>
                  <pubid idtype="pmpid" link="fulltext">17984083</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm883</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Why are there still over 1000 uncharacterized yeast genes?</p>
            </title>
            <aug>
               <au>
                  <snm>Pena-Castillo</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2007</pubdate>
            <volume>176</volume>
            <fpage>7</fpage>
            <lpage>14</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17435240</pubid>
                  <pubid idtype="doi">10.1534/genetics.107.074468</pubid>
                  <pubid idtype="pmcid">1893027</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Predicting protein function from sequence and structural data.</p>
            </title>
            <aug>
               <au>
                  <snm>Watson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>275</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.sbi.2005.04.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15963890</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Automatic prediction of protein function.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nair</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wrzeszczynski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ofran</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>2003</pubdate>
            <volume>60</volume>
            <fpage>2637</fpage>
            <lpage>2650</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00018-003-3114-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">14685688</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A Bayesian framework for combining heterogeneous data sources for gene function prediction (in <it>Saccharomyces cerevisiae</it>).</p>
            </title>
            <aug>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Owen</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>8348</fpage>
            <lpage>8353</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">166232</pubid>
                  <pubid idtype="pmpid" link="fulltext">12826619</pubid>
                  <pubid idtype="doi">10.1073/pnas.0832373100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A combined algorithm for genome-wide prediction of protein function.</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <fpage>83</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/47048</pubid>
                  <pubid idtype="pmpid" link="fulltext">10573421</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Probabilistic protein function prediction from heterogeneous genome-wide data.</p>
            </title>
            <aug>
               <au>
                  <snm>Nariai</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kolaczyk</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2007</pubdate>
            <volume>2</volume>
            <fpage>e337</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1828618</pubid>
                  <pubid idtype="pmpid" link="fulltext">17396164</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0000337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A Bayesian networks approach for predicting protein-protein interactions from genomic data.</p>
            </title>
            <aug>
               <au>
                  <snm>Jansen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Greenbaum</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kluger</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>449</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1087361</pubid>
                  <pubid idtype="pmpid" link="fulltext">14564010</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Getting connected: analysis and principles of biological networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2007</pubdate>
            <volume>21</volume>
            <fpage>1010</fpage>
            <lpage>1024</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gad.1528707</pubid>
                  <pubid idtype="pmpid" link="fulltext">17473168</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A probabilistic view of gene function.</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>559</fpage>
            <lpage>564</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1370</pubid>
                  <pubid idtype="pmpid" link="fulltext">15167932</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The model organism as a system: integrating 'omics' datasets.</p>
            </title>
            <aug>
               <au>
                  <snm>Joyce</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Palsson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>198</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1857</pubid>
                  <pubid idtype="pmpid" link="fulltext">16496022</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A probabilistic functional network of yeast genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Date</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Adai</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>1555</fpage>
            <lpage>1558</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1099511</pubid>
                  <pubid idtype="pmpid" link="fulltext">15567862</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Integrating high-throughput and computational data elucidates bacterial networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Covert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Herrgard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Palsson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>429</volume>
            <fpage>92</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02456</pubid>
                  <pubid idtype="pmpid" link="fulltext">15129285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Integration of omics data: how well does it work for bacteria?</p>
            </title>
            <aug>
               <au>
                  <snm>De Keersmaecker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thijs</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vanderleyden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Marchal</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <fpage>1239</fpage>
            <lpage>1250</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2006.05453.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17040488</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Combining biological networks to predict genetic interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>Wong</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Tong</snm>
                  <fnm>AHY</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>OD</fnm>
               </au>
               <au>
                  <snm>Lesage</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bussey</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Boone</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>15682</fpage>
            <lpage>15687</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">524818</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496468</pubid>
                  <pubid idtype="doi">10.1073/pnas.0406614101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A single gene network accurately predicts phenotypic effects of gene perturbation in <it>Caenorhabditis elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Crombie</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2008</pubdate>
            <volume>40</volume>
            <fpage>181</fpage>
            <lpage>188</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.2007.70</pubid>
                  <pubid idtype="pmpid" link="fulltext">18223650</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Gene prioritization through genomic data fusion.</p>
            </title>
            <aug>
               <au>
                  <snm>Aerts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lambrechts</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Maity</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van Loo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Coessens</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>De Smet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Tranchevent</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>De Moor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Marynen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hassan</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Carmeliet</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2006</pubdate>
            <volume>24</volume>
            <fpage>537</fpage>
            <lpage>544</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1203</pubid>
                  <pubid idtype="pmpid" link="fulltext">16680138</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>ONCOMINE: a cancer microarray database and integrated data-mining platform.</p>
            </title>
            <aug>
               <au>
                  <snm>Rhodes</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shanker</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Varambally</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barrette</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pandey</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chinnaiyan</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Neoplasia</source>
            <pubdate>2004</pubdate>
            <volume>6</volume>
            <fpage>1</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1635162</pubid>
                  <pubid idtype="pmpid" link="fulltext">15068665</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Franke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>van Bakel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Fokkens</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Egmont-Petersen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wijmenga</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>2006</pubdate>
            <volume>78</volume>
            <fpage>1011</fpage>
            <lpage>1025</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1474084</pubid>
                  <pubid idtype="pmpid" link="fulltext">16685651</pubid>
                  <pubid idtype="doi">10.1086/504300</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Network modeling links breast cancer susceptibility and centrosome dysfunction.</p>
            </title>
            <aug>
               <au>
                  <snm>Pujana</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Starita</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Tewari</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ahn</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Rennert</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Moreno</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kirchhoff</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Assmann</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Elshamy</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Rual</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Levine</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rozek</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Gelman</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Gunsalus</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Greenberg</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Sobhian</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Venkatesan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ayivi-Guedehoussou</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sole</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hernandez</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lazaro</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nathanson</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Cusick</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Offit</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2007</pubdate>
            <volume>39</volume>
            <fpage>1338</fpage>
            <lpage>1349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.2007.2</pubid>
                  <pubid idtype="pmpid" link="fulltext">17922014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Integrating computational biology and forward genetics in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Aerts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vilain</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tranchevent</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Barriot</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moreau</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hassan</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Quan</snm>
                  <fnm>XJ</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2009</pubdate>
            <volume>5</volume>
            <fpage>e1000351</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2628282</pubid>
                  <pubid idtype="pmpid" link="fulltext">19165344</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.1000351</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Prediction of protein function using protein-protein interaction data.</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>947</fpage>
            <lpage>960</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652703322756168</pubid>
                  <pubid idtype="pmpid" link="fulltext">14980019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Genome-scale gene function prediction using multiple sources of high-throughput data in yeast <it>Saccharomyces cerevisiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Joshi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alexandrov</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>OMICS</source>
            <pubdate>2004</pubdate>
            <volume>8</volume>
            <fpage>322</fpage>
            <lpage>333</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/omi.2004.8.322</pubid>
                  <pubid idtype="pmpid" link="fulltext">15703479</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A critical assessment of <it>Mus musculus</it> gene function prediction using integrated genomic evidence.</p>
            </title>
            <aug>
               <au>
                  <snm>Pena-Castillo</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Tasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Joshi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Guan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Leone</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pagnani</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Krumpelman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Obozinski</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mostafavi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Berriz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gibbons</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Lanckriet</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Qiu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grant</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Barutcuoglu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Warde-Farley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Grouios</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ray</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>S2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2447536</pubid>
                  <pubid idtype="pmpid" link="fulltext">18613946</pubid>
                  <pubid idtype="doi">10.1186/gb-2008-9-s1-s2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Comparative genomics for reliable protein-function prediction from genomic data.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van Noort</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>340</fpage>
            <lpage>344</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.06.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15262404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Discovery of biological networks from diverse functional genomic data.</p>
            </title>
            <aug>
               <au>
                  <snm>Myers</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Robson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wible</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hibbs</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Chiriac</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Theesfeld</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1414113</pubid>
                  <pubid idtype="pmpid" link="fulltext">16420673</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-13-r114</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Exploring the human genome with functional maps.</p>
            </title>
            <aug>
               <au>
                  <snm>Huttenhower</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Haley</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Hibbs</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Dumeaux</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>OG</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2009</pubdate>
            <volume>19</volume>
            <fpage>1093</fpage>
            <lpage>1106</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.082214.108</pubid>
                  <pubid idtype="pmpid" link="fulltext">19246570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>An improved, bias-reduced probabilistic functional gene network of Baker's yeast, <it>Saccharomyces cerevisiae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2007</pubdate>
            <volume>2</volume>
            <fpage>e988</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1991590</pubid>
                  <pubid idtype="pmpid" link="fulltext">17912365</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0000988</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Predicting gene function through systematic analysis and quality assessment of high-throughput data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kemmeren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kockelkorn</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bijma</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Donders</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1644</fpage>
            <lpage>1652</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti103</pubid>
                  <pubid idtype="pmpid" link="fulltext">15531615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Predictive models of molecular machines involved in <it>Caenorhabditis elegans</it> early embryogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Gunsalus</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schetter</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chuang</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hyman</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Sonnichsen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Echeverri</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Piano</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>436</volume>
            <fpage>861</fpage>
            <lpage>865</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03876</pubid>
                  <pubid idtype="pmpid" link="fulltext">16094371</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Krumpelman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>S5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2447539</pubid>
                  <pubid idtype="pmpid" link="fulltext">18613949</pubid>
                  <pubid idtype="doi">10.1186/gb-2008-9-s1-s5</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A genomewide functional network for the laboratory mouse.</p>
            </title>
            <aug>
               <au>
                  <snm>Guan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lemischka</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000165</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2527685</pubid>
                  <pubid idtype="pmpid" link="fulltext">18818725</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.1000165</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Research resources for <it>Drosophila</it>: The expanding universe.</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>179</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1554</pubid>
                  <pubid idtype="pmpid" link="fulltext">15738962</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Data pushing: a fly-centric guide to bioinformatics tools.</p>
            </title>
            <aug>
               <au>
                  <snm>Costello</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cash</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dalkilic</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Fly (Austin)</source>
            <pubdate>2008</pubdate>
            <volume>2</volume>
            <note>[Epub ahead of print]</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18849648</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>FlyBase: enhancing <it>Drosophila </it>Gene Ontology annotations.</p>
            </title>
            <aug>
               <au>
                  <snm>Tweedie</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Falls</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Leyland</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McQuilton</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marygold</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Millburn</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Osumi-Sutherland</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Seal</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <cnm>The FlyBase Consortium</cnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <fpage>D555</fpage>
            <lpage>D559</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2686450</pubid>
                  <pubid idtype="pmpid" link="fulltext">18948289</pubid>
                  <pubid idtype="doi">10.1093/nar/gkn788</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>FlyMine: an integrated database for <it>Drosophila </it>and <it>Anopheles</it> genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Lyne</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wakeling</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Varley</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Guillier</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Janssens</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ji</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>McLaren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>North</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rana</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Watkins</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Woodbridge</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lilley</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mizuguchi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Micklem</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R129</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2323218</pubid>
                  <pubid idtype="pmpid" link="fulltext">17615057</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-7-r129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Evidence for large domains of similarly expressed genes in the <it>Drosophila </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>J Biol</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <fpage>5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">117248</pubid>
                  <pubid idtype="pmpid" link="fulltext">12144710</pubid>
                  <pubid idtype="doi">10.1186/1475-4924-1-5</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Prediction of gene expression in embryonic structures of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Samsonova</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Niranjan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brazma</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>e144</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1924873</pubid>
                  <pubid idtype="pmpid" link="fulltext">17658945</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0030144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A brief history of <it>Drosophila</it>'s contribution to genome research.</p>
            </title>
            <aug>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>EB</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>287</volume>
            <fpage>2216</fpage>
            <lpage>2218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.287.5461.2216</pubid>
                  <pubid idtype="pmpid" link="fulltext">10731135</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The BDGP gene disruption project: single transposon insertions associated with 40% of <it>Drosophila </it>genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bellen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Levis</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Liao</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tsang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Evans-Holm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hiesinger</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Schulze</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hoskins</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Spradling</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2004</pubdate>
            <volume>167</volume>
            <fpage>761</fpage>
            <lpage>781</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1470905</pubid>
                  <pubid idtype="pmpid">15238527</pubid>
                  <pubid idtype="doi">10.1534/genetics.104.026427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>A genome-wide transgenic RNAi library for conditional gene inactivation in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Dietzl</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schnorrer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Su</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Barinova</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Fellner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gasser</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinsey</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Oppel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Scheiblauer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Couto</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Keleman</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dickson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>448</volume>
            <fpage>151</fpage>
            <lpage>156</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05954</pubid>
                  <pubid idtype="pmpid" link="fulltext">17625558</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Discovery of functional elements in 12 <it>Drosophila </it>genomes using evolutionary signatures.</p>
            </title>
            <aug>
               <au>
                  <snm>Stark</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kheradpour</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Parts</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rasmussen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Deoras</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ruby</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brennecke</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <cnm>Harvard FlyBase curators, Berkeley Drosophila Genome Project</cnm>
               </au>
               <au>
                  <snm>Hodges</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Caspi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paten</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maeder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Polansky</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Robson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Aerts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hassan</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eastman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weir</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>450</volume>
            <fpage>219</fpage>
            <lpage>232</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2474711</pubid>
                  <pubid idtype="pmpid" link="fulltext">17994088</pubid>
                  <pubid idtype="doi">10.1038/nature06340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Evolution of genes and genomes on the <it>Drosophila </it>phylogeny.</p>
            </title>
            <aug>
               <au>
                  <cnm>Drosophila 12 Genomes Consortium</cnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Markow</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kaufman</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gelbart</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pollard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sackton</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Larracuente</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Singh</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Abad</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abt</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Adryan</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Aguade</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Akashi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Aquadro</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ardell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Arguello</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Artieri</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Barbash</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barsanti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Batterham</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>450</volume>
            <fpage>203</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature06341</pubid>
                  <pubid idtype="pmpid" link="fulltext">17994087</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Unlocking the secrets of the genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Celniker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dillon</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gunsalus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Karpen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lieb</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>MacAlpine</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Micklem</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Piano</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <cnm>modENCODE Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2009</pubdate>
            <volume>459</volume>
            <fpage>927</fpage>
            <lpage>930</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/459927a</pubid>
                  <pubid idtype="pmpid" link="fulltext">19536255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The ENCODE (ENCyclopedia Of DNA Elements) Project.</p>
            </title>
            <aug>
               <au>
                  <cnm>The ENCODE Project Consortium</cnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>636</fpage>
            <lpage>640</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1105136</pubid>
                  <pubid idtype="pmpid" link="fulltext">15499007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>FlyBase</p>
            </title>
            <url>http://www.flybase.net</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Predicting protein function from protein/protein interaction data: a probabilistic approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Letovsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>i197</fpage>
            <lpage>i204</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1026</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855458</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Supplemental</p>
            </title>
            <url>http://www.indigene.org</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>BIND - The Biomolecular Interaction Network Database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Donaldson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Wolting</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ouellette</snm>
                  <fnm>BFF</fnm>
               </au>
               <au>
                  <snm>Pawson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>CWV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>242</fpage>
            <lpage>245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29820</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125103</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.242</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Salwinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>XJ</fnm>
               </au>
               <au>
                  <snm>Higney</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>303</fpage>
            <lpage>305</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99070</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752321</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>A <it>Drosophila </it>protein-interaction map centered on cell-cycle regulators.</p>
            </title>
            <aug>
               <au>
                  <snm>Stanyon</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mangiola</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kuang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Finley</snm>
                  <fnm>RL</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R96</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545799</pubid>
                  <pubid idtype="pmpid" link="fulltext">15575970</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-12-r96</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>BioGRID: a general repository for interaction datasets.</p>
            </title>
            <aug>
               <au>
                  <snm>Stark</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Breitkreutz</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Reguly</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Boucher</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Breitkreutz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tyers</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D535</fpage>
            <lpage>D539</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347471</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381927</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj109</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>IntAct-open source resource for molecular interaction data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kerrien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Alam-Faruque</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aranda</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bancarz</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bridge</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Derow</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Feuermann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Friedrichsen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huntley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kohler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Khadake</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Leroy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Liban</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lieftink</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Montecchi-Palazzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Orchard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Risse</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Robbe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Roechert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Thorneycroft</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hermjakob</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D561</fpage>
            <lpage>D565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1751531</pubid>
                  <pubid idtype="pmpid" link="fulltext">17145710</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>A protein interaction map of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Brouwer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kuang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Vitols</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Vijayadamodar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Machineni</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Welsh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zerhusen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Malcolm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Varrone</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Collis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Minto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Burgess</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McDaniel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stimpson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Spriggs</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Neurath</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ioime</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Agee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Voss</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Furtak</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>1727</fpage>
            <lpage>1736</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090289</pubid>
                  <pubid idtype="pmpid" link="fulltext">14605208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Supplemental Figures and Tables</p>
            </title>
            <url>http://www.indigene.org/downloads/Costello_Suppl_Data.pdf</url>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Gene Ontology</p>
            </title>
            <url>http://www.geneontology.org/</url>
         </bibl>
         <bibl id="B57">
            <title>
               <p>The Toll and Imd pathways are the major regulators of the immune response in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>De Gregorio</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Tzou</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Lemaitre</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2002</pubdate>
            <volume>21</volume>
            <fpage>2568</fpage>
            <lpage>2579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">126042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12032070</pubid>
                  <pubid idtype="doi">10.1093/emboj/21.11.2568</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Genome-wide gene expression in response to parasitoid attack in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Wertheim</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kraaijeveld</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Schuster</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Blanc</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hopkins</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pletcher</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Strand</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Partridge</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Godfray</snm>
                  <fnm>HC</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R94</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1297650</pubid>
                  <pubid idtype="pmpid" link="fulltext">16277749</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-11-r94</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Transcriptional control in embryonic <it>Drosophila</it> midline guidance assessed through a whole genome approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Magalhaes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Palmer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pollard</snm>
                  <fnm>KS</fnm>
               </au>
            </aug>
            <source>BMC Neurosci</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>59</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17672901</pubid>
                  <pubid idtype="doi">10.1186/1471-2202-8-59</pubid>
                  <pubid idtype="pmcid">1950096</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Assessment of the reliability of protein-protein interactions and protein function prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2003</pubdate>
            <fpage>140</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12603024</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Gene expression during the life cycle of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Arbeitman</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Imam</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Null</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Krasnow</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>2270</fpage>
            <lpage>2275</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1072152</pubid>
                  <pubid idtype="pmpid" link="fulltext">12351791</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>KEGG: Kyoto Encyclopedia of Genes and Genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>27</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102409</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592173</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.27</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>Integrated <it>Drosophila</it> Gene Network with 20 K Edges</p>
            </title>
            <url>http://www.indigene.org/downloads/Costello_20K_network.cys</url>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Integrated <it>Drosophila</it> Gene Network with 200 K Edges</p>
            </title>
            <url>http://www.indigene.org/downloads/Costello_200K_network.cys</url>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Full Set of Integrated <it>Drosophila</it> Data</p>
            </title>
            <url>http://www.indigene.org/downloads/Costello_All_Data.tar.gz</url>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Cytoscape: a software environment for integrated models of biomolecular interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Markiel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ozier</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Baliga</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ramage</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Amin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Schwikowski</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ideker</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2498</fpage>
            <lpage>2504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403769</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597658</pubid>
                  <pubid idtype="doi">10.1101/gr.1239303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>An automated method for finding molecular complexes in large protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Bader</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149346</pubid>
                  <pubid idtype="pmpid" link="fulltext">12525261</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Mutation of TweedleD, a member of an unconventional cuticle protein family, alters body shape in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Guan</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Middlebrooks</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Alexander</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>16794</fpage>
            <lpage>16799</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1636534</pubid>
                  <pubid idtype="pmpid" link="fulltext">17075064</pubid>
                  <pubid idtype="doi">10.1073/pnas.0607616103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Characterization of RACK1 function in <it>Drosophila</it> development.</p>
            </title>
            <aug>
               <au>
                  <snm>Kadrmas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pronovost</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Beckerle</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Dev Dyn</source>
            <pubdate>2007</pubdate>
            <volume>236</volume>
            <fpage>2207</fpage>
            <lpage>2215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17584887</pubid>
                  <pubid idtype="doi">10.1002/dvdy.21217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>Cpc2/RACK1 is a ribosome-associated protein that promotes efficient translation in <it>Schizosaccharomyces pombe</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Shor</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Calaycay</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rushbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McLeod</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>278</volume>
            <fpage>49119</fpage>
            <lpage>49128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M303968200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12972434</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Yeast Asc1p and mammalian RACK1 are functionally orthologous core 40S ribosomal proteins that repress gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Gerbasi</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Weaver</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Link</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>24</volume>
            <fpage>8276</fpage>
            <lpage>8287</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">515043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15340087</pubid>
                  <pubid idtype="doi">10.1128/MCB.24.18.8276-8287.2004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Identification of the versatile scaffold protein RACK1 on the eukaryotic ribosome by cryo-EM.</p>
            </title>
            <aug>
               <au>
                  <snm>Sengupta</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gursky</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Spahn</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nissen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>957</fpage>
            <lpage>962</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb822</pubid>
                  <pubid idtype="pmpid" link="fulltext">15334071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>A new method to measure the semantic similarity of GO terms.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>JZ</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Payattakool</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CF</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>1274</fpage>
            <lpage>1281</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm087</pubid>
                  <pubid idtype="pmpid" link="fulltext">17344234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Incorporation of <it>Drosophila</it> CID/CENP-A and CENP-C into centromeres during early embryonic anaphase.</p>
            </title>
            <aug>
               <au>
                  <snm>Schuh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Heidmann</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>237</fpage>
            <lpage>243</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cub.2006.11.051</pubid>
                  <pubid idtype="pmpid" link="fulltext">17222555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>The mutant crossveinless in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bridges</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1920</pubdate>
            <volume>6</volume>
            <fpage>660</fpage>
            <lpage>663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1084670</pubid>
                  <pubid idtype="pmpid">16576553</pubid>
                  <pubid idtype="doi">10.1073/pnas.6.11.660</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>The <it>crossveinless</it> gene encodes a new member of the Twisted gastrulation family of BMP-binding proteins which, with Short gastrulation, promotes BMP signaling in the crossveins of the <it>Drosophila</it> wing.</p>
            </title>
            <aug>
               <au>
                  <snm>Shimmi</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Ralston</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Blair</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>O'Connor</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2005</pubdate>
            <volume>282</volume>
            <fpage>70</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ydbio.2005.02.029</pubid>
                  <pubid idtype="pmpid" link="fulltext">15936330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Microarray data analysis: from disarray to consolidation and consensus.</p>
            </title>
            <aug>
               <au>
                  <snm>Allison</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sabripour</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>55</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1749</pubid>
                  <pubid idtype="pmpid" link="fulltext">16369572</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.</p>
            </title>
            <aug>
               <au>
                  <snm>Huang</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sherman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lempicki</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <fpage>1</fpage>
            <lpage>13</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2615629</pubid>
                  <pubid idtype="pmpid" link="fulltext">19033363</pubid>
                  <pubid idtype="doi">10.1093/nar/gkn923</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B79">
            <title>
               <p>Nutritional control of protein biosynthetic capacity by insulin via Myc in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Teleman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hietakangas</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sayadian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Cell Metab</source>
            <pubdate>2008</pubdate>
            <volume>7</volume>
            <fpage>21</fpage>
            <lpage>32</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cmet.2007.11.010</pubid>
                  <pubid idtype="pmpid" link="fulltext">18177722</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B80">
            <title>
               <p>CrebA regulates secretory activity in the <it>Drosophila </it>salivary gland and epidermis.</p>
            </title>
            <aug>
               <au>
                  <snm>Abrams</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Andrew</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>2005</pubdate>
            <volume>132</volume>
            <fpage>2743</fpage>
            <lpage>2758</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15901661</pubid>
                  <pubid idtype="doi">10.1242/dev.01863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B81">
            <title>
               <p>Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ebert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gillette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Paulovich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pomeroy</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>15545</fpage>
            <lpage>15550</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1239896</pubid>
                  <pubid idtype="pmpid" link="fulltext">16199517</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506580102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>Repeatability of published microarray gene expression analyses.</p>
            </title>
            <aug>
               <au>
                  <snm>Ioannidis</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Allison</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Coulibaly</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Culhane</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Falchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furlanello</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Game</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jurman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mangion</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nitzberg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Petretto</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>van Noort</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2009</pubdate>
            <volume>41</volume>
            <fpage>149</fpage>
            <lpage>155</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng.295</pubid>
                  <pubid idtype="pmpid" link="fulltext">19174838</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>How reliable are experimental protein-protein interaction data?</p>
            </title>
            <aug>
               <au>
                  <snm>Sprinzak</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sattath</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>327</volume>
            <fpage>919</fpage>
            <lpage>923</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12662919</pubid>
                  <pubid idtype="doi">10.1016/S0022-2836(03)00239-0</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>Global analysis of mRNA localization reveals a prominent role in organizing cellular architecture and function.</p>
            </title>
            <aug>
               <au>
                  <snm>L&#233;cuyer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Parthasarathy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Alm</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Babak</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cerovina</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>131</volume>
            <fpage>174</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.08.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">17923096</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Halfon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gallo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bergman</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D594</fpage>
            <lpage>D598</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238825</pubid>
                  <pubid idtype="pmpid" link="fulltext">18039705</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p><it>Drosophila </it>DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Bergman</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1747</fpage>
            <lpage>1749</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti173</pubid>
                  <pubid idtype="pmpid" link="fulltext">15572468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B87">
            <title>
               <p>FlyRNAi: the <it>Drosophila </it>RNAi screening center database.</p>
            </title>
            <aug>
               <au>
                  <snm>Flockhart</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Booker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kiger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boutros</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Armknecht</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ramadan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Perrimon</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mathey-Prevot</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D489</fpage>
            <lpage>D494</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347476</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381918</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj114</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B88">
            <title>
               <p>Predicting protein function by genomic context: quantitative evaluation and qualitative inferences.</p>
            </title>
            <aug>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lathe</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1204</fpage>
            <lpage>1210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310926</pubid>
                  <pubid idtype="pmpid" link="fulltext">10958638</pubid>
                  <pubid idtype="doi">10.1101/gr.10.8.1204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B89">
            <title>
               <p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4285</fpage>
            <lpage>4288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16324</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200254</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B90">
            <title>
               <p>Prolinks: a database of protein functional linkages derived from coevolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Bowers</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fierro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R35</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">416471</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128449</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-5-r35</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B91">
            <title>
               <p>Tissue-specific gene expression and ecdysone-regulated genomic networks in <it>Drosophila</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
            </aug>
            <source>Dev Cell</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <fpage>59</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1534-5807(03)00192-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12852852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B92">
            <title>
               <p>Paucity of genes on the <it>Drosophila </it>X chromosome showing male-biased expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Parisi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nuttall</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Naiman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bouffard</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Malley</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Eastman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <fpage>697</fpage>
            <lpage>700</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1363366</pubid>
                  <pubid idtype="pmpid" link="fulltext">12511656</pubid>
                  <pubid idtype="doi">10.1126/science.1079190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B93">
            <title>
               <p>An integrated strategy for analyzing the unique developmental programs of different myoblast subtypes.</p>
            </title>
            <aug>
               <au>
                  <snm>Estrada</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Choe</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Gisselbrecht</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Michaud</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Raj</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Busser</snm>
                  <fnm>BW</fnm>
               </au>
               <au>
                  <snm>Halfon</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Michelson</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1366495</pubid>
                  <pubid idtype="pmpid" link="fulltext">16482229</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B94">
            <title>
               <p>Global analyses of mRNA translational control during early <it>Drosophila </it>embryogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Qin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ahn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>R63</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1896012</pubid>
                  <pubid idtype="pmpid" link="fulltext">17448252</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-4-r63</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B95">
            <title>
               <p>The genomic response to 20-hydroxyecdysone at the onset of <it>Drosophila </it>metamorphosis.</p>
            </title>
            <aug>
               <au>
                  <snm>Beckstead</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Thummel</snm>
                  <fnm>CS</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R99</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1414087</pubid>
                  <pubid idtype="pmpid" link="fulltext">16356271</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-12-r99</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B96">
            <title>
               <p>Full genome gene expression analysis of the heat stress response in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Sorensen</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Kruhoffer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Justesen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Loeschcke</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cell Stress Chaperones</source>
            <pubdate>2005</pubdate>
            <volume>10</volume>
            <fpage>312</fpage>
            <lpage>328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">16333985</pubid>
                  <pubid idtype="doi">10.1379/CSC-128R1.1</pubid>
                  <pubid idtype="pmcid">1283958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B97">
            <title>
               <p>An integrated gene annotation and transcriptional profiling approach towards the full gene content of the <it>Drosophila </it>genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Hild</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Beckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Haas</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Koch</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Busold</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fellenberg</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Boutros</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sauer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hoheisel</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Paro</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <fpage>R3</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395735</pubid>
                  <pubid idtype="pmpid" link="fulltext">14709175</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-5-1-r3</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B98">
            <title>
               <p>Expression profiling of glial genes during <it>Drosophila </it>embryogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Altenhein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Busold</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Beckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hoheisel</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Technau</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2006</pubdate>
            <volume>296</volume>
            <fpage>545</fpage>
            <lpage>560</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ydbio.2006.04.460</pubid>
                  <pubid idtype="pmpid" link="fulltext">16762338</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B99">
            <title>
               <p>Quantitative genomics of aggressive behavior in <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Edwards</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Rollmann</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Morgan</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Mackay</snm>
                  <fnm>TF</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e154</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1564424</pubid>
                  <pubid idtype="pmpid" link="fulltext">17044737</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020154</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B100">
            <title>
               <p>Identification of tightly regulated groups of genes during <it>Drosophila melanogaster </it>embryogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Hooper</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Bou&#233;</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Mason</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Ghanim</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Mol Syst Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>72</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17224916</pubid>
                  <pubid idtype="pmcid">1800352</pubid>
                  <pubid idtype="doi">10.1038/msb4100112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B101">
            <title>
               <p>A temporal map of transcription factor activity: <it>mef2 </it>directly regulates target genes at all stages of muscle development.</p>
            </title>
            <aug>
               <au>
                  <snm>Sandmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Jakobsen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Karzynski</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Eichenlaub</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Dev Cell</source>
            <pubdate>2006</pubdate>
            <volume>10</volume>
            <fpage>797</fpage>
            <lpage>807</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.devcel.2006.04.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">16740481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B102">
            <title>
               <p>Using FlyAtlas to identify better <it>Drosophila melanogaster </it>models of human disease.</p>
            </title>
            <aug>
               <au>
                  <snm>Chintapalli</snm>
                  <fnm>VR</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dow</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2007</pubdate>
            <volume>39</volume>
            <fpage>715</fpage>
            <lpage>720</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng2049</pubid>
                  <pubid idtype="pmpid" link="fulltext">17534367</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B103">
            <title>
               <p>Systematic determination of patterns of gene expression during <it>Drosophila </it>embryogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Beaton</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Weiszmann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kwan</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Shu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hartenstein</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0088</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151190</pubid>
                  <pubid idtype="pmpid" link="fulltext">12537577</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-12-research0088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B104">
            <title>
               <p>The R Project for Statistical Computing</p>
            </title>
            <url>http://www.r-project.org/</url>
         </bibl>
         <bibl id="B105">
            <title>
               <p>OLIN: optimized normalization, visualization and quality testing of two-channel microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Futschik</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Crompton</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1724</fpage>
            <lpage>1726</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti199</pubid>
                  <pubid idtype="pmpid" link="fulltext">15585527</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B106">
            <aug>
               <au>
                  <cnm>Affymetrix</cnm>
               </au>
            </aug>
            <source>Affymetrix Microarray Suite User's Guide. Version 5.0</source>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B107">
            <title>
               <p>A model-based background adjustment for oligonucleotide expression arrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Irizarry</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Martinez-Murillo</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Spencer</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>J Am Stat Assoc</source>
            <pubdate>2004</pubdate>
            <volume>99</volume>
            <fpage>909</fpage>
            <lpage>917</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1198/016214504000000683</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B108">
            <title>
               <p>Affymetrix <it>Drosophila </it>Platform Files</p>
            </title>
            <url>http://www.affymetrix.com/support/technical/byproduct.affx?cat=arrays</url>
         </bibl>
         <bibl id="B109">
            <title>
               <p>Missing value estimation methods for DNA microarrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Troyanskaya</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cantor</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>520</fpage>
            <lpage>525</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.6.520</pubid>
                  <pubid idtype="pmpid" link="fulltext">11395428</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B110">
            <title>
               <p>Semantic similarity measures as tools for exploring the Gene Ontology.</p>
            </title>
            <aug>
               <au>
                  <snm>Lord</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Brass</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goble</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2003</pubdate>
            <fpage>601</fpage>
            <lpage>612</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12603061</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B111">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns.</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24541</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B112">
            <title>
               <p>TM4: a free, open-source system for microarray data management and analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Saeed</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sharov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bhagabati</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Braisted</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Klapa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Currier</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Thiagarajan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sturn</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Snuffin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rezantsev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Popov</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ryltsov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kostukovich</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Borisovsky</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Vinsavich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Trush</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Biotechniques</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <fpage>374</fpage>
            <lpage>378</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12613259</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B113">
            <title>
               <p>KEGG <it>Drosophila </it>Download FTP Directory</p>
            </title>
            <url>ftp://ftp.genome.jp/pub/kegg/pathway/organisms/dme/</url>
         </bibl>
         <bibl id="B114">
            <title>
               <p>ArrayExpress</p>
            </title>
            <url>http://www.ebi.ac.uk/microarray-as/ae/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
