<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2008-9-1-r22</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Computational discovery of <it>cis</it>-regulatory modules in <it>Drosophila </it>without prior knowledge of motifs</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Ivan</snm>
               <fnm>Andra</fnm>
               <insr iid="I1"/>
               <email>andrucu@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Halfon</snm>
               <mi>S</mi>
               <fnm>Marc</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>mshalfon@buffalo.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Sinha</snm>
               <fnm>Saurabh</fnm>
               <insr iid="I1"/>
               <email>sinhas@cs.uiuc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, N. Goodwin Ave, Urbana, IL 61801, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biochemistry, State University of New York at Buffalo, Main St, Buffalo, NY 14214, USA</p>
            </ins>
            <ins id="I3">
               <p>New York State Center of Excellence in Bioinformatics and the Life Sciences, Ellicott St, Buffalo, NY 14203, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>R22</fpage>
         <url>http://genomebiology.com/2008/9/1/R22</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18226245</pubid>
               <pubid idtype="doi">10.1186/gb-2008-9-1-r22</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>12</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>18</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Ivan et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Discovery of <it>cis</it>-regulatory modules</p>
      </shorttitle>
      <shortabs>
         <p>Prediction of <it>cis</it>-regulatory modules <it>ab initio</it>, without any input of relevant motifs, is achieved with two novel methods.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>We consider the problem of predicting <it>cis</it>-regulatory modules without knowledge of motifs. We formulate this problem in a pragmatic setting, and create over 30 new data sets, using <it>Drosophila </it>modules, to use as a 'benchmark'. We propose two new methods for the problem, and evaluate these, as well as two existing methods, on our benchmark. We find that the challenge of predicting <it>cis</it>-regulatory modules <it>ab initio</it>, without any input of relevant motifs, is a realizable goal.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Understanding the richness and complexity of the transcriptional network underlying the early stages of fruitfly development is a success story of developmental molecular biology. It is also an inspiration for bioinformaticians working on sequence analysis. This transcriptional regulatory network is implemented through '<it>cis</it>-regulatory modules' (CRMs), which are approximately 500-1,000 bp long sequences in the vicinity of genes harboring one to many binding sites for multiple transcription factors. These CRMs serve to mediate the activating and repressing action of the different transcription factors, and enforce the complex expression pattern of the adjacent gene. Discovery and analysis of CRMs is, therefore, a crucial step in understanding gene regulatory networks in the fruitfly and, more generally, in metazoans.</p>
         <p>Starting with early advances <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, a host of computational approaches to discover CRMs in a genome have been proposed recently <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. These methods typically rely on prior characterization of the binding affinities ('motifs') of the relevant transcription factors. For instance, one may search for CRMs involved in anterior-posterior segmentation of the embryo, if one knows the five to ten key transcription factors orchestrating this process, as well as their binding site motifs. However, the more common scenario, arising whenever one explores a relatively uncharted regulatory network, is that the relevant transcription factors and their motifs are unknown. The usual strategy of looking for clusters of (putative) binding sites is inapplicable, because we do not have a way to predict the binding sites in the first place. We explore here this more common version of the CRM prediction problem, where the relevant motifs are unknown.</p>
         <p>Clearly, the new problem is less tractable than its traditional version with known motifs, and the 'genome-wide scan' approach of programs like Cis-analyst <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, Ahab <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, Stubb <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, or Cluster-Buster <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> seems infeasible. We therefore investigate a special variant of the problem, where the entire genome is not scanned; rather, the regions around a small set of genes are searched. To define this problem variant, we need to understand the notion of a 'gene battery'. This term was used by Britten and Davidson <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> to refer to a group of genes that are coordinately expressed because their regulatory regions respond to the same transcription factor inputs (see also <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>). In molecular terms, a gene battery is a group of genes that are regulated by CRMs containing similar transcription factor binding sites. The CRMs associated with genes in a battery are usually not identical in terms of either number or arrangement of binding sites, nor do they harbor sites for exactly the same set of transcription factors. Nevertheless, these CRMs share some level of similarity in terms of the collection of binding sites present within, and this similarity may be the basis for their computational discovery <it>ab initio</it>. This observation is the crucial insight that motivates attempting CRM prediction in the absence of motifs. The gene battery CRM discovery problem is defined as follows: given a gene battery, and the 'control regions' of each gene, find in these control regions the CRMs that coordinate the expression of genes in the battery.</p>
         <p>Here, the control region of a gene is the candidate sequence in which we must search for a gene's CRMs. A possible definition of a gene's control region may be 'the 10 Kbp sequence upstream of the gene', since CRMs are often found to be located in these regions. A more inclusive definition might be 'the 10 Kbp upstream and downstream sequences, and introns'. Under the new definition of the CRM discovery problem, we do not search the entire genome with known motifs; instead, we harness our prior knowledge about gene co-expression to narrow down the search space to the control regions of a gene battery.</p>
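         <p>The two example definitions of a control region given above can be made concrete with a small sketch. The <it>Gene</it> record, the function names and the 0-based half-open coordinates are assumptions made for illustration only; they are not part of the methods evaluated in this paper. The 'inclusive' variant is simplified to a single interval spanning both flanks and the whole gene body, so it covers exons as well as introns.</p>
         <p><![CDATA[
```python
from typing import NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 0-based, half-open."""
    name: str
    start: int   # first base of the gene
    end: int     # one past the last base of the gene
    strand: str  # "+" or "-"


def upstream_control_region(gene: Gene, chrom_length: int,
                            flank: int = 10_000) -> Tuple[int, int]:
    """Interval covering `flank` bp upstream of the gene, clipped to the chromosome."""
    if gene.strand == "+":
        # Upstream of a forward-strand gene lies at lower coordinates.
        return max(0, gene.start - flank), gene.start
    if gene.strand == "-":
        # Upstream of a reverse-strand gene lies at higher coordinates.
        return gene.end, min(chrom_length, gene.end + flank)
    raise ValueError(f"unknown strand: {gene.strand!r}")


def inclusive_control_region(gene: Gene, chrom_length: int,
                             flank: int = 10_000) -> Tuple[int, int]:
    """Single interval spanning both flanks and the gene body (simplification)."""
    return max(0, gene.start - flank), min(chrom_length, gene.end + flank)
```
]]></p>
         <p>For a forward-strand gene occupying positions 25,000-27,000, the first function would return the interval 15,000-25,000 and the second the interval 15,000-37,000.</p>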
         <p>The gene battery CRM discovery problem is highly practical and arises frequently in genomic biology. A biologist often has microarray data that identify clusters of co-expressed genes. Such a gene set may be treated as a gene battery, and the scientist may wish to find out how its genes are regulated. This is a classic example of the gene battery CRM discovery problem. Whole-mount <it>in situ </it>hybridization data <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> comprise another source for defining potential gene batteries. For instance, a biologist interested in <it>Drosophila </it>dorsal-ventral axis specification may take a set of genes whose <it>in situ </it>images show dorsal-ventral expression patterns in the embryo, treat these genes as a gene battery, and proceed to identify the CRMs that regulate the gene battery. Once the CRMs have been identified, more detailed analysis of the modules may be conducted through binding site analysis and computational motif discovery, or direct experimental tests of the expression pattern driven by them, for example, through reporter gene assays <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <sec>
            <st>
               <p>Outline</p>
            </st>
            <p>This paper is a comprehensive investigation into the gene battery CRM discovery problem. We ask several questions related to this problem, assuming that the relevant motifs are unknown. What are the data sets available for testing solutions to this problem? How do we evaluate the performance of any given algorithm on a given data set? What are the existing computational methods to solve the problem? Can we design new algorithms to solve this problem? How do the existing and new algorithms perform on the data sets?</p>
            <p>In a previous study <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, we explored CRM properties and found that CRMs belonging to different gene batteries can have distinct characteristics. Our data indicated that several existing approaches to computational CRM discovery would be effective only for finding CRMs of certain subtypes, suggesting that CRM discovery methods need to be evaluated on a diverse selection of data sets. We show here how to use the REDfly database <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to construct useful data sets for this purpose and present a 'benchmark' collection of 33 such data sets, a substantial increase in coverage over the two or three data sets currently available. We define normalized measures to evaluate the performance of any CRM prediction method. We identify and evaluate existing approaches for the problem, such as the 'CisModule' program of Zhou and Wong <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and the Markov chain-based approach of Grad <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. We then propose and assess two novel algorithms for the problem, based on statistical properties of CRMs that we have reported in previous work <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B17">17</abbr></abbrgrp>. The hallmark of each of these algorithms is that CRM prediction does not depend on accurate motif discovery, which is a notoriously difficult problem <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. This marks a clear departure from previous methods like CisModule and EMCModule <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, where motif-finding and CRM discovery are tightly coupled. We find that our two new methods achieve significant accuracy on a majority of the benchmark data sets, despite not using any input motifs. This gives us the first clear indications that <it>ab initio </it>CRM prediction may be a realizable goal in several gene batteries, beyond the two or three widely studied examples (<it>Drosophila </it>segmentation <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and human muscle-specific <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> or liver-specific <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> enhancers), where motifs were either known <it>a priori </it>or relatively easy to discover.</p>
            <p>Our work opens up a new line of research by clearly focusing on a practical version of the CRM discovery problem, creating extensive benchmarks for it, and providing effective strategies and novel insights for attacking the problem.</p>
         </sec>
         <sec>
            <st>
               <p>Related work</p>
            </st>
            <p>The literature on computational CRM discovery is dominated by algorithms that require well-characterized motifs <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. One such example is our previously published algorithm, called 'Stubb' <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, which uses a probabilistic model parameterized by the given motifs to predict CRMs in a genome-wide scan. However, there are very few prior studies on the problem in the absence of motif information. Not surprisingly, each of these studies, discussed below, is designed for the 'gene battery CRM discovery problem', rather than genome-wide search.</p>
            <p>To our knowledge, one of the first attempts to solve the gene battery CRM discovery problem was made by Grad <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Their 'PFRSearcher' program used Gibbs sampling to find CRMs in control regions of <it>Drosophila </it>segmentation genes. However, no other gene batteries were tested in that work, making it unclear if the approach is generalizable. (Our previous work <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> found that this gene battery has CRMs with unique sequence characteristics that may not be representative of CRMs in other gene batteries.) Also, the PFRSearcher method relied crucially on inter-species comparison. Another algorithm that leverages evolutionary comparisons for CRM prediction (without motif knowledge) is 'CisPlusFinder', developed by Pierstorff <it>et al</it>. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. More recently, Sosinsky <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> have proposed a method that uses pattern discovery from seven <it>Drosophila </it>genomes to predict CRMs genome-wide, followed by validation on a data set of blastoderm segmentation-related CRMs. The method development and assessment in our work is exclusively based on a single genome. We recognize the potential of evolutionary information for CRM discovery, but this being a complex, phylogeny-dependent issue, we leave it for future research.</p>
            <p>A model-based approach to CRM discovery (without motif knowledge) has been espoused by Zhou and Wong <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, whose CisModule program learns the motifs and the CRMs simultaneously from the data. The underlying idea is that spatial clustering of binding sites in a CRM should aid motif discovery, and that motif discovery should aid CRM prediction. Hence, both steps are performed in a combined probabilistic framework. The EMCModule program of Gupta and Liu <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> is similar; however, it begins with a generously large set of motifs (from a motif database or a separate motif-finding program), and learns which ones are relevant to the gene battery, and where the CRMs are located. Both these methods (CisModule and EMCModule) intertwine the motif discovery and CRM discovery tasks. These programs have been shown to discover functional motifs and binding sites related to <it>Drosophila </it>segmentation, but were not tested for discovery of entire (experimentally delineated) CRMs. Also, the tests were performed on the two to three popular data sets available then and, hence, did not provide a comprehensive evaluation. The Gibbs Module Finder program of Thompson <it>et al</it>. <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> is another model-based approach in this genre. However, this work uses the term '<it>cis</it>-regulatory module' in a different manner, that is, to mean any region with at least two binding sites with a spacing of less than 100 bp. This definition differs from our notion of a CRM, which is based on the expression pattern driven by the CRM rather than its binding site architecture. The Gibbs Module Finder was tested on a single gene battery (human skeletal muscle genes), and shown to find known binding sites and pairs thereof. This does not automatically imply its applicability to our problem setting.</p>
            <p>There is another variant of the CRM discovery problem, which we do not address here. This is the 'supervised learning' approach of Chan and Kibler <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> or Nazina and Papatsenko <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> (also explored by Grad <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>), where a set of known CRMs is available as 'training data'. These programs use such known CRMs to train their parameters before predicting new CRMs in any test sequences.</p>
            <p>In summary, the gene battery CRM prediction problem is a relatively less studied, yet highly practical formulation of computational CRM discovery. There exist only a handful of methods, outlined above, that may be applied to this problem, but no such method has been tested on a large collection of data sets. The model-based approaches that have been proposed previously have focused on prediction of binding sites (and motifs), and have used the notion of CRMs as an aid to this discovery process. Here, our objective is to predict the CRMs themselves rather than their constituent binding sites or motifs.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Benchmarks for the gene battery CRM discovery problem</p>
            </st>
            <p>We first describe a classic example of this problem. In <it>Drosophila</it>, meticulous experimentation has led to a rich collection of CRMs involved in the gene battery for anterior-posterior segmentation of the blastoderm stage embryo <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. We refer to this set of approximately 50 CRMs as the <smcaps>blastoderm</smcaps> set of CRMs. All CRMs in this set drive some pattern of gene expression along the anterior-posterior axis, at the blastoderm stage of development. Their target genes, and respective control regions, make for a natural data set to evaluate CRM prediction methods. Indeed, the <smcaps>blastoderm</smcaps> set has been extensively used as a 'benchmark' in the past <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Here, our goal was to create several new benchmarks similar to this classic example.</p>
            <p>The REDfly database <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> is an up-to-date, comprehensive collection of experimentally verified CRMs in <it>Drosophila </it>mediating regulation in a broad spectrum of gene batteries. The database also records the gene expression pattern driven by each CRM. We grouped REDfly CRMs based on common gene expression annotation, and took their target genes to be a gene battery. The natural way to construct a data set is to take the control regions of each of these genes. However, this choice makes the task of evaluating CRM predictions complicated, for the following reasons.</p>
            <p>It has been widely observed, especially in the context of the <smcaps>blastoderm</smcaps> set of modules, that a control region may have multiple CRMs. In general, some of these may be unknown. Therefore, we will not know for sure if predictions that do not coincide with the known CRMs are true or false positives.</p>
            <p>If multiple known CRMs lie in the same control region, the prediction task is more demanding than when each control region has exactly one CRM. The predictor must additionally decide whether a given input sequence contains one CRM or several. In this first take on the problem, we leave that ability out of the assessment in order to simplify the evaluation.</p>
            <p>Using the native control regions of the gene battery allows us less control over the 'difficulty level' of a data set. Some control regions will have a substantially greater ratio of signal (CRM positions) to noise (non-CRM positions) compared to other sequences. While this is indeed a fact of real genomes, in this initial evaluation we want to have data sets where every input sequence has the same 'signal-to-noise' ratio.</p>
            <p>We address the above issues in our design of data sets. Once the set of CRMs (with common expression annotation) has been decided, we plant each CRM in a carefully chosen artificial 'control region', built from the genome itself. This control region is constructed from the non-coding part of the <it>D. melanogaster </it>genome, and is required to have G/C content similar to the native context of the CRM. By constructing data sets in this manner, we minimize the chances of uncharacterized CRMs influencing the false positive estimation. The non-native control region still has the odd chance of containing an uncharacterized CRM, but it is extremely unlikely that such a CRM will be in the same gene battery as the planted CRMs of the data set. We create one control region for each CRM, requiring each control region (with CRM planted within) to be of a length ten times the length of the CRM. These choices were dictated by our need to 'standardize' the difficulty of the benchmark data sets, as discussed above. Given that a typical CRM has a length of approximately 500-1,000 bp, and a typical control region is 5-10 Kbp long, a 1:10 ratio of CRM length to total length seems realistic.</p>
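            <p>The planting step can be illustrated with a minimal sketch. Only the 1:10 length ratio, the use of non-coding background and the G/C matching come from the description above. The function names, the G/C tolerance (<it>gc_tol</it>), the use of a single contiguous background window and the uniformly random insertion point are assumptions made for illustration, and the sketch should not be read as the exact procedure used to build the benchmark.</p>
            <p><![CDATA[
```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in `seq` (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def plant_crm(crm: str, native_context: str, noncoding_pool: str,
              rng: random.Random, fold: int = 10, gc_tol: float = 0.05,
              max_tries: int = 10_000):
    """Embed `crm` in a background window drawn from `noncoding_pool`.

    A window is accepted only if its G/C content is within `gc_tol`
    (an assumed tolerance) of that of `native_context`.
    Returns (sequence, crm_start), with len(sequence) == fold * len(crm).
    """
    bg_len = (fold - 1) * len(crm)
    if bg_len > len(noncoding_pool):
        raise ValueError("non-coding pool is shorter than the required background")
    target = gc_content(native_context)
    for _ in range(max_tries):
        # Draw a random contiguous window of background sequence.
        start = rng.randrange(len(noncoding_pool) - bg_len + 1)
        background = noncoding_pool[start:start + bg_len]
        if abs(gc_content(background) - target) <= gc_tol:
            # Insert the CRM at a random position within the accepted window.
            offset = rng.randrange(bg_len + 1)
            return background[:offset] + crm + background[offset:], offset
    raise RuntimeError("no background window with matching G/C content found")
```
]]></p>
            <p>Because the background is always nine times the CRM length, every sequence produced this way has the same ratio of CRM to non-CRM positions, which is the property the benchmark design aims for.</p>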
            <p>We obtained 33 data sets in this manner, with 4-77 sequences (an average of 16) in a data set, and where the CRM lengths range from 83 bp to 2,013 bp. Details of these data sets are presented in Table <tblr tid="T1">1</tblr>. The entire collection of data sets is available in Additional data file 1. Note that each data set name is prefixed by a 'mapping number', which we explain now. Data sets were constructed using the expression pattern information provided in REDfly, by grouping CRMs with similar tissue specificity. Different mappings represent different levels of tissue specificity, and correspond to Figures S1-1b, S1-1c, and S1-2 in Li <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. 'Mapping3' represents the highest level clustering of CRMs, such as 'adult' or 'larva'. On the other hand, 'mapping1' represents the lowest level, that is, the most tissue-specific grouping, such as 'ventral ectoderm' or 'cardiac mesoderm'. 'Mapping2' is an intermediate level of specificity. Thus, for example, 'mapping2.mesoderm' includes all CRMs that regulate gene expression in the mesoderm, whereas in mapping1 these CRMs are divided between 'adult mesoderm', 'cardiac mesoderm', 'larval mesoderm', 'somatic mesoderm' and 'visceral mesoderm'. Mappings at different levels may refer to the same tissue (for example, mapping1.mesoderm and mapping2.mesoderm), in which case the mapping with the higher numbering refers to a more inclusive definition of specificity to that tissue. We also note that data sets defined by us are potentially non-exclusive, that is, the same CRM can belong to more than one data set. This is possible if the CRM regulates expression in more than one tissue, or if one data set is subsumed by another data set at a higher level mapping.</p>
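            <p>The grouping of CRMs into potentially overlapping data sets at different mapping levels can be sketched as follows. The tissue names are taken from the text, but the lookup table, the CRM identifiers in the usage note and the <it>min_size</it> parameter are hypothetical and serve only to illustrate how a CRM may end up in more than one data set.</p>
            <p><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Toy lookup from a mapping1 term to its mapping2 parent (illustrative only).
MAPPING1_TO_MAPPING2 = {
    "cardiac mesoderm": "mesoderm",
    "somatic mesoderm": "mesoderm",
    "visceral mesoderm": "mesoderm",
    "ventral ectoderm": "ectoderm",
}


def build_data_sets(annotations: Iterable[Tuple[str, str]],
                    min_size: int = 1) -> Dict[str, Set[str]]:
    """Group CRMs by expression annotation at two mapping levels.

    `annotations` holds (crm_id, mapping1_term) pairs; a CRM may appear
    with several terms. Returns {data_set_name: set of crm_ids}.
    """
    data_sets: Dict[str, Set[str]] = defaultdict(set)
    for crm_id, term in annotations:
        # Most specific level: one data set per mapping1 term.
        data_sets[f"mapping1.{term}"].add(crm_id)
        # More inclusive level: the same CRM also joins its parent tissue.
        parent = MAPPING1_TO_MAPPING2.get(term)
        if parent is not None:
            data_sets[f"mapping2.{parent}"].add(crm_id)
    # Drop data sets that are too small to be useful.
    return {name: crms for name, crms in data_sets.items() if len(crms) >= min_size}
```
]]></p>
            <p>With annotations such as ("crmA", "cardiac mesoderm"), ("crmB", "somatic mesoderm") and ("crmA", "ventral ectoderm"), the hypothetical CRM "crmA" falls into both mesoderm and ectoderm data sets, and every mapping1 mesoderm data set is a subset of mapping2.mesoderm.</p>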
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Statistics for the data sets in our benchmark</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Name</p>
                     </c>
                     <c ca="center">
                        <p>Number of CRMs</p>
                     </c>
                     <c ca="center">
                        <p>Minimum CRM length</p>
                     </c>
                     <c ca="center">
                        <p>Maximum CRM length</p>
                     </c>
                     <c ca="center">
                        <p>Average CRM length</p>
                     </c>
                     <c ca="center">
                        <p>Total CRM length (Kbp)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping3.adult</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>2,013</p>
                     </c>
                     <c ca="center">
                        <p>748</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.adult mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>126</p>
                     </c>
                     <c ca="center">
                        <p>927</p>
                     </c>
                     <c ca="center">
                        <p>561</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.amnioserosa</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>469</p>
                     </c>
                     <c ca="center">
                        <p>1,500</p>
                     </c>
                     <c ca="center">
                        <p>708</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.blastoderm</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>126</p>
                     </c>
                     <c ca="center">
                        <p>1,833</p>
                     </c>
                     <c ca="center">
                        <p>906</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.cardiac mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>1,513</p>
                     </c>
                     <c ca="center">
                        <p>536</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.cns</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>304</p>
                     </c>
                     <c ca="center">
                        <p>1,986</p>
                     </c>
                     <c ca="center">
                        <p>1,034</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.dorsal ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>267</p>
                     </c>
                     <c ca="center">
                        <p>1,657</p>
                     </c>
                     <c ca="center">
                        <p>842</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>839</p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>815</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.endoderm</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>220</p>
                     </c>
                     <c ca="center">
                        <p>1,373</p>
                     </c>
                     <c ca="center">
                        <p>579</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.eye</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>187</p>
                     </c>
                     <c ca="center">
                        <p>1,930</p>
                     </c>
                     <c ca="center">
                        <p>824</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.eye</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>187</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>868</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.fat body</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>375</p>
                     </c>
                     <c ca="center">
                        <p>529</p>
                     </c>
                     <c ca="center">
                        <p>456</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.female gonad</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>1,657</p>
                     </c>
                     <c ca="center">
                        <p>442</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.glia</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>515</p>
                     </c>
                     <c ca="center">
                        <p>1,890</p>
                     </c>
                     <c ca="center">
                        <p>899</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.imaginal disc</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>938</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.imaginal disc</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>490</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>1,248</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping3.larva</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>176</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>892</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.male gonad</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>200</p>
                     </c>
                     <c ca="center">
                        <p>1,319</p>
                     </c>
                     <c ca="center">
                        <p>862</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.malpighian tubules</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>540</p>
                     </c>
                     <c ca="center">
                        <p>1,373</p>
                     </c>
                     <c ca="center">
                        <p>782</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.mesectoderm</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>601</p>
                     </c>
                     <c ca="center">
                        <p>1,415</p>
                     </c>
                     <c ca="center">
                        <p>913</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>1,415</p>
                     </c>
                     <c ca="center">
                        <p>544</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="center">
                        <p>105</p>
                     </c>
                     <c ca="center">
                        <p>1,513</p>
                     </c>
                     <c ca="center">
                        <p>518</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.neuroectoderm</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>343</p>
                     </c>
                     <c ca="center">
                        <p>1,360</p>
                     </c>
                     <c ca="center">
                        <p>575</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.neuronal</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>2,013</p>
                     </c>
                     <c ca="center">
                        <p>988</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.pns</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>2,013</p>
                     </c>
                     <c ca="center">
                        <p>976</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.reproductive system</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>1,801</p>
                     </c>
                     <c ca="center">
                        <p>734</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.salivary gland</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>295</p>
                     </c>
                     <c ca="center">
                        <p>1,890</p>
                     </c>
                     <c ca="center">
                        <p>786</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.somatic muscle</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>312</p>
                     </c>
                     <c ca="center">
                        <p>1,513</p>
                     </c>
                     <c ca="center">
                        <p>718</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.tracheal system</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>515</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>1,236</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.ventral ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>343</p>
                     </c>
                     <c ca="center">
                        <p>1,657</p>
                     </c>
                     <c ca="center">
                        <p>700</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.visceral mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>183</p>
                     </c>
                     <c ca="center">
                        <p>1,104</p>
                     </c>
                     <c ca="center">
                        <p>451</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.wing</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>177</p>
                     </c>
                     <c ca="center">
                        <p>2,015</p>
                     </c>
                     <c ca="center">
                        <p>1,029</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Each control region is ten times the CRM length.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Performance evaluation</p>
            </st>
            <p>Each data set consists of a set of control regions, with a single CRM located within each control region. In evaluating any module prediction algorithm, we require it to predict one CRM per input sequence, and that each predicted module be of the same length (for reasons explained below). This length, calculated as the mean of the known CRM lengths in the data set, is given as input to the prediction tool. Most tools evaluated here conform to these requirements, with the exception of CisModule. This program can predict multiple, variable-length CRMs per sequence, and its output is post-processed (as described in Materials and methods) to meet our requirements.</p>
            <p>For each data set, we have a set of positions (<it>I</it><sub><it>k</it></sub>) known to be CRM positions, and a set of positions (<it>I</it><sub><it>p</it></sub>) predicted by a method. We may compute the positive predictive value <smcaps>ppv</smcaps> (or precision) and sensitivity <smcaps>sens</smcaps> (or recall) as per the following formulas:</p>
            <p>
               <display-formula id="M1">
                  <m:math name="gb-2008-9-1-r22-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mi>p</m:mi>
                           <m:mi>v</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo>&#8745;</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>p</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>p</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>,</m:mo>
                           <m:mspace width="1em"/>
                           <m:mi>s</m:mi>
                           <m:mi>e</m:mi>
                           <m:mi>n</m:mi>
                           <m:mi>s</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo>&#8745;</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>p</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>I</m:mi>
                                    <m:mi>k</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2Caerbhv2BYDwAHbqedmvETj2BSbqee0evGueE0jxyaibaiKI8=vI8GiVeY=Pipec8Eeeu0xXdbba9frFj0xb9Lqpepeea0xd9q8qiYRWxGi6xij=hbbc9s8aq0=yqpe0xbbG8A8frFve9Fve9Fj0dmeaabaqaciaacaGaaeqabaqabeGadaaakeaacaWGWbGaamiCaiaadAhacqGH9aqpjuaGdaWcaaqaaiaacYhacaWGjbWaaSbaaeaacaWGRbaabeaacqGHPiYXcaWGjbWaaSbaaeaacaWGWbaabeaacaGG8baabaGaaiiFaiaadMeadaWgaaqaaiaadchaaeqaaiaacYhaaaGccaWGZbGaamyzaiaad6gacaWGZbGaeyypa0tcfa4aaSaaaeaacaGG8bGaamysamaaBaaabaGaam4AaaqabaGaeyykICSaamysamaaBaaabaGaamiCaaqabaGaaiiFaaqaaiaacYhacaWGjbWaaSbaaeaacaWGRbaabeaacaGG8baaaaaa@50CC@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Note that, by design of the experiments, we have |<it>I</it><sub><it>p</it></sub>| = |<it>I</it><sub><it>k</it></sub>|: one window of the mean CRM length is predicted per control region, so the total predicted length equals the total known CRM length. Precision and recall are therefore identical. This convenient scenario was the motivation for choosing the mean CRM length as the window length given to the evaluated methods; it spares us from comparing methods that outdo each other on only one of these dimensions (precision or recall). In real-world applications, a program has to predict not only the locations of CRMs but also their lengths. Here, however, we chose not to test the ability to predict CRM lengths, and required each program to predict CRMs of a given length. This length was the same for all control regions, to mimic real applications where the true CRM lengths are not known <it>a priori</it>.</p>
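            <p>As an illustration of the two formulas, the following Python sketch computes <smcaps>ppv</smcaps> and <smcaps>sens</smcaps> from lists of intervals. It is not the evaluation code used in this study: the function names, the half-open (start, end) interval convention, the single shared coordinate system for all control regions, and the toy coordinates are assumptions made for the example.</p>
            <p>
```python
def positions(intervals):
    """Expand half-open (start, end) intervals into a set of nucleotide positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def ppv_and_sens(known, predicted):
    """Return (ppv, sens) for two lists of half-open (start, end) intervals."""
    i_k = positions(known)        # positions known to be CRM positions
    i_p = positions(predicted)    # positions predicted by a method
    overlap = len(i_k.intersection(i_p))
    return overlap / len(i_p), overlap / len(i_k)


# Toy data (not from the benchmark): two CRMs of lengths 200 and 400,
# and one prediction per CRM of the mean length, 300.
known = [(100, 300), (1500, 1900)]
predicted = [(150, 450), (1400, 1700)]
ppv, sens = ppv_and_sens(known, predicted)
```
            </p>
            <p>In this toy example the predictions overlap the known CRMs at 350 of 600 positions, and because the predicted and known position sets have the same size, both measures equal 350/600.</p>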
            <p>In light of the above discussion, the sensitivity <smcaps>sens</smcaps> is used as the measure of performance in the rest of this paper. The sensitivity allows us to compare the performance of several methods on the same data set, but is not comparable across data sets: the expected sensitivity of a random prediction depends on several aspects of the data set, most notably its total length. Therefore, to normalize against this chance expectation, we compute an 'empirical <it>p</it>-value' of the sensitivity, as follows. In each control region we randomly select a window of the same length as the module prediction, and calculate the sensitivity of this random set of window locations. The process is repeated 100,000 times, and the empirical <it>p</it>-value is defined as the fraction of repetitions in which the random sensitivity was greater than that observed for the actual predictions. We consider the predictions of a method to be significant if its sensitivity <it>p</it>-value is less than 0.05.</p>
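            <p>The resampling procedure can be sketched in Python as follows. This is a minimal illustration rather than the code used for the benchmark: the function names, the use of coordinates relative to each control region, uniform sampling of window start positions, the fixed random seed, and the toy data are all assumptions.</p>
            <p>
```python
import random


def overlap(a, b):
    """Length of the overlap between two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def sensitivity(known, windows):
    """Nucleotide-level sensitivity with one known CRM and one window per region."""
    hit = sum(overlap(k, w) for k, w in zip(known, windows))
    return hit / sum(k[1] - k[0] for k in known)


def empirical_p_value(region_lengths, known, predicted, window,
                      trials=100000, seed=0):
    """Fraction of random window placements with sensitivity above the observed one."""
    rng = random.Random(seed)
    observed = sensitivity(known, predicted)
    exceed = 0
    for _ in range(trials):
        # One uniformly placed window per control region (assumed sampling scheme).
        starts = [rng.randint(0, n - window) for n in region_lengths]
        windows = [(s, s + window) for s in starts]
        if sensitivity(known, windows) > observed:
            exceed += 1
    return exceed / trials


# Toy data (hypothetical): two control regions of 3,000 bp each.
region_lengths = [3000, 3000]
known = [(1000, 1200), (500, 900)]
predicted = [(950, 1250), (2000, 2300)]
p_value = empirical_p_value(region_lengths, known, predicted,
                            window=300, trials=2000)
```
            </p>
            <p>The toy run uses 2,000 repetitions only to keep the example fast; the default of 100,000 matches the number of repetitions described above.</p>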
            <sec>
               <st>
                  <p>Maximum sensitivity</p>
               </st>
               <p>We note that, because of the way the evaluation is done and because the true CRMs vary in length, a sensitivity of 100% is usually impossible to achieve. If every predicted CRM has length equal to the mean CRM length, modules longer than the mean cannot be covered in their entirety. Therefore, when reporting results on a data set, we also note the maximum sensitivity achievable on that data set. We point out that the sensitivity <it>p</it>-value automatically accounts for the fact that a 100% sensitivity is usually not achievable.</p>
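               <p>One way to compute this upper bound is sketched below in Python. The function name and the toy lengths are hypothetical, and the sketch assumes that the window length is the unrounded mean and that every control region is large enough to place the window optimally.</p>
               <p>
```python
def max_sensitivity(crm_lengths):
    """Best nucleotide-level sensitivity when every prediction has the mean CRM length."""
    window = sum(crm_lengths) / len(crm_lengths)
    # A window can cover at most min(CRM length, window length) of each CRM.
    covered = sum(min(length, window) for length in crm_lengths)
    return covered / sum(crm_lengths)


# Toy lengths (hypothetical): the mean is 500, so the 900 bp CRM
# can be covered only in part.
bound = max_sensitivity([200, 400, 900])
```
               </p>
               <p>For these toy lengths, at most 200 + 400 + 500 = 1,100 of the 1,500 CRM positions can be covered, so the bound is 1,100/1,500.</p>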
            </sec>
            <sec>
               <st>
                  <p>CRM-level sensitivity</p>
               </st>
               <p>Apart from the nucleotide-level sensitivity, we also assess sensitivity at the CRM level, as follows. We declare a predicted module (in a control region) a 'hit' if its overlap with the known module is at least half the length of the shorter of the two (known and predicted) modules. We then count the number (and percentage) of hits in a data set, and call it the 'CRM-level sensitivity'. This measure has an intuitive appeal, since partial identification of a module is often enough for follow-up experiments to refine. Also, some of the known CRMs are likely to be 'too long', that is, the true CRM is only a part of the annotated delineation <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. In such cases, even perfectly accurate predictions would earn less than 100% sensitivity at the nucleotide level. Considering the CRM-level sensitivity addresses this issue.</p>
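               <p>The hit criterion can be expressed compactly in Python, as in the sketch below. The function names, the half-open (start, end) interval convention, and the toy coordinates are assumptions made for illustration.</p>
               <p>
```python
def is_hit(known, predicted):
    """True if the overlap is at least half the length of the shorter module."""
    overlap = max(0, min(known[1], predicted[1]) - max(known[0], predicted[0]))
    shorter = min(known[1] - known[0], predicted[1] - predicted[0])
    return 2 * overlap >= shorter


def crm_level_sensitivity(known_crms, predictions):
    """Return (number of hits, fraction of hits), pairing one prediction per CRM."""
    hits = sum(is_hit(k, p) for k, p in zip(known_crms, predictions))
    return hits, hits / len(known_crms)


# Toy data (hypothetical): three control regions, one CRM and one prediction each.
known_crms = [(100, 300), (1000, 1800), (50, 250)]
predictions = [(150, 550), (1100, 1500), (600, 1000)]
hits, fraction = crm_level_sensitivity(known_crms, predictions)
```
               </p>
               <p>Here the first two predictions are hits (overlaps of 150 and 400 positions against shorter-module lengths of 200 and 400), while the third does not overlap its CRM at all.</p>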
            </sec>
         </sec>
         <sec>
            <st>
               <p>Existing methods and their performance</p>
            </st>
            <sec>
               <st>
                  <p>Stubb</p>
               </st>
               <p>We begin our evaluations with a program that uses the knowledge of motifs to scan for modules, since this is currently the standard approach to CRM discovery, and provides a useful reference point for programs that do not rely on known motifs. The Stubb program <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> takes a set of known position weight matrix (PWM) motifs and scans the input sequences in sliding windows of a fixed length. It scores each such window by its likelihood of being generated by a certain probabilistic model parameterized by the input PWMs. In our tests, the highest scoring window in each control region was taken as Stubb's prediction. As a preliminary test, we evaluated Stubb on the well-studied blastoderm data set (<smcaps>mapping1.blastoderm</smcaps>) of 77 CRMs, using a small set of 8 PWMs for transcription factors known to regulate this gene battery. We obtained a sensitivity of 46% (compared to a maximum achievable sensitivity of 77%), with <it>p</it>-value ~0. This is consistent with the expectation that knowledge of relevant motifs leads to high accuracy. We also point out that a sensitivity of 46%, though not phenomenal in its absolute value, is highly significant, and represents the state-of-the-art in motif-driven CRM prediction. Such predictions have been reported in the literature to lead to novel CRM discoveries <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
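               <p>The selection of the highest scoring fixed-length window can be sketched generically in Python. The sketch below does not implement Stubb's probabilistic model: the scoring function is a deliberately simple placeholder that counts occurrences of given words, and all names, the step size of one position, and the toy sequence are assumptions.</p>
               <p>
```python
def best_window(sequence, window, score):
    """Return (start, end, score) of the highest-scoring fixed-length window."""
    best = None
    for start in range(len(sequence) - window + 1):
        value = score(sequence[start:start + window])
        # Keep the first window that reaches the maximum score.
        if best is None or value > best[2]:
            best = (start, start + window, value)
    return best


def word_count_score(words):
    """Placeholder score (NOT Stubb's model): count non-overlapping word matches."""
    def score(segment):
        return sum(segment.count(word) for word in words)
    return score


# Toy control region (hypothetical): three copies of a word flanked by poly-A.
sequence = "A" * 20 + "TAATCC" * 3 + "A" * 20
prediction = best_window(sequence, 18, word_count_score(["TAATCC"]))
```
               </p>
               <p>Any window-level score, including a model-based likelihood, could be passed in place of the placeholder, since the scan itself only requires a function from a sequence segment to a number.</p>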
               <p>For the remaining data sets of our benchmark, we typically do not know the relevant motifs. Hence, in the full-scale evaluation on all data sets, Stubb was run with a large collection of 53 PWMs from the FlyREG database (see Additional data file 1 for a list of these PWMs). Most of these 53 motifs will be largely irrelevant to any particular data set, and may cause Stubb to predict biologically incoherent combinations of transcription factor binding sites as modules. The sensitivity of Stubb predictions and their empirical <it>p</it>-values are shown in Table <tblr tid="T2">2</tblr>. Stubb performed significantly better than chance (<it>p</it>-value &#8804;0.05) on 12 of the 33 data sets. These results, from an approach where the relevant motifs are not known, but a modest collection of motifs is utilized, provide an interesting baseline for other approaches, where no motif information is utilized.</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Performance of Stubb, D2Z-set, and CSam on 33 data sets in our benchmark</p>
                  </caption>
                  <tblbdy cols="10">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Stubb<sup>&#167;</sup></p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>D2Z-set<sup>&#167;</sup></p>
                        </c>
                        <c cspan="2" ca="center">
                           <p>CSam<sup>&#167;</sup></p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Data set</p>
                        </c>
                        <c ca="center">
                           <p>Number of sequences*</p>
                        </c>
                        <c ca="center">
                           <p>Length<sup>&#8224;</sup></p>
                        </c>
                        <c ca="center">
                           <p>Maximum sensitivity<sup>&#8225;</sup></p>
                        </c>
                        <c ca="center">
                           <p><it>P</it>-value</p>
                        </c>
                        <c ca="center">
                           <p>Sensitivity</p>
                        </c>
                        <c ca="center">
                           <p><it>P</it>-value</p>
                        </c>
                        <c ca="center">
                           <p>Sensitivity</p>
                        </c>
                        <c ca="center">
                           <p><it>P</it>-value</p>
                        </c>
                        <c ca="center">
                           <p>Sensitivity</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping3.adult</p>
                        </c>
                        <c ca="center">
                           <p>34</p>
                        </c>
                        <c ca="center">
                           <p>254,800</p>
                        </c>
                        <c ca="center">
                           <p>0.71</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>0.72</p>
                        </c>
                        <c ca="center">
                           <p>0.07</p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>0.13</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.adult mesoderm</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>28,085</p>
                        </c>
                        <c ca="center">
                           <p>0.76</p>
                        </c>
                        <c ca="center">
                           <p>0.51</p>
                        </c>
                        <c ca="center">
                           <p>0.05</p>
                        </c>
                        <c ca="center">
                           <p>0.11</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>0.51</p>
                        </c>
                        <c ca="center">
                           <p>0.05</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.amnioserosa</p>
                        </c>
                        <c ca="center">
                           <p>7</p>
                        </c>
                        <c ca="center">
                           <p>49,635</p>
                        </c>
                        <c ca="center">
                           <p>0.84</p>
                        </c>
                        <c ca="center">
                           <p>0.25</p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>0.34</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>0.09</p>
                        </c>
                        <c ca="center">
                           <p>0.23</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.BLASTODERM</p>
                        </c>
                        <c ca="center">
                           <p>77</p>
                        </c>
                        <c ca="center">
                           <p>698,840</p>
                        </c>
                        <c ca="center">
                           <p>0.77</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.36</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>0.13</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.26</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.CARDIAC MESODERM</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>42,979</p>
                        </c>
                        <c ca="center">
                           <p>0.76</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.03</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.28</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>0.19</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.CNS</p>
                        </c>
                        <c ca="center">
                           <p>34</p>
                        </c>
                        <c ca="center">
                           <p>352,108</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                        <c ca="center">
                           <p>0.48</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.02</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.dorsal ectoderm</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>67,490</p>
                        </c>
                        <c ca="center">
                           <p>0.77</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>0.88</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.ECTODERM</p>
                        </c>
                        <c ca="center">
                           <p>37</p>
                        </c>
                        <c ca="center">
                           <p>311,000</p>
                        </c>
                        <c ca="center">
                           <p>0.72</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING2.ECTODERM</p>
                        </c>
                        <c ca="center">
                           <p>51</p>
                        </c>
                        <c ca="center">
                           <p>416,473</p>
                        </c>
                        <c ca="center">
                           <p>0.74</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.05</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.23</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.ENDODERM</p>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                        <c ca="center">
                           <p>92,723</p>
                        </c>
                        <c ca="center">
                           <p>0.82</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.24</p>
                        </c>
                        <c ca="center">
                           <p>0.31</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.26</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.EYE</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c ca="center">
                           <p>49,494</p>
                        </c>
                        <c ca="center">
                           <p>0.70</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.48</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.02</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.32</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping2.eye</p>
                        </c>
                        <c ca="center">
                           <p>18</p>
                        </c>
                        <c ca="center">
                           <p>156,531</p>
                        </c>
                        <c ca="center">
                           <p>0.69</p>
                        </c>
                        <c ca="center">
                           <p>0.19</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.68</p>
                        </c>
                        <c ca="center">
                           <p>0.07</p>
                        </c>
                        <c ca="center">
                           <p>0.88</p>
                        </c>
                        <c ca="center">
                           <p>0.04</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.fat body</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>22,831</p>
                        </c>
                        <c ca="center">
                           <p>0.93</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.45</p>
                        </c>
                        <c ca="center">
                           <p>0.09</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.FEMALE GONAD</p>
                        </c>
                        <c ca="center">
                           <p>10</p>
                        </c>
                        <c ca="center">
                           <p>44,269</p>
                        </c>
                        <c ca="center">
                           <p>0.62</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.03</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.24</p>
                        </c>
                        <c ca="center">
                           <p>0.97</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.86</p>
                        </c>
                        <c ca="center">
                           <p>0.02</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.glia</p>
                        </c>
                        <c ca="center">
                           <p>7</p>
                        </c>
                        <c ca="center">
                           <p>63,008</p>
                        </c>
                        <c ca="center">
                           <p>0.82</p>
                        </c>
                        <c ca="center">
                           <p>0.49</p>
                        </c>
                        <c ca="center">
                           <p>0.09</p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                        <c ca="center">
                           <p>0.19</p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                        <c ca="center">
                           <p>0.17</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.IMAGINAL DISC</p>
                        </c>
                        <c ca="center">
                           <p>47</p>
                        </c>
                        <c ca="center">
                           <p>441,597</p>
                        </c>
                        <c ca="center">
                           <p>0.77</p>
                        </c>
                        <c ca="center">
                           <p>0.55</p>
                        </c>
                        <c ca="center">
                           <p>0.09</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>0.24</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping2.imaginal disc</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>149,915</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                        <c ca="center">
                           <p>0.57</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                        <c ca="center">
                           <p>0.33</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING3.LARVA</p>
                        </c>
                        <c ca="center">
                           <p>69</p>
                        </c>
                        <c ca="center">
                           <p>616,635</p>
                        </c>
                        <c ca="center">
                           <p>0.76</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.05</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.02</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.male gonad</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c ca="center">
                           <p>69,044</p>
                        </c>
                        <c ca="center">
                           <p>0.85</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>0.46</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>0.15</p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.malpighian tubules</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c ca="center">
                           <p>31,338</p>
                        </c>
                        <c ca="center">
                           <p>0.81</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>0.25</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.30</p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.MESECTODERM</p>
                        </c>
                        <c ca="center">
                           <p>5</p>
                        </c>
                        <c ca="center">
                           <p>45,712</p>
                        </c>
                        <c ca="center">
                           <p>0.83</p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>0.43</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.46</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.MESODERM</p>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                        <c ca="center">
                           <p>87,140</p>
                        </c>
                        <c ca="center">
                           <p>0.72</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.02</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                        <c ca="center">
                           <p>0.09</p>
                        </c>
                        <c ca="center">
                           <p>0.17</p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>0.13</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING2.MESODERM</p>
                        </c>
                        <c ca="center">
                           <p>45</p>
                        </c>
                        <c ca="center">
                           <p>233,441</p>
                        </c>
                        <c ca="center">
                           <p>0.75</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.20</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.02</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.NEUROECTODERM</p>
                        </c>
                        <c ca="center">
                           <p>7</p>
                        </c>
                        <c ca="center">
                           <p>40,315</p>
                        </c>
                        <c ca="center">
                           <p>0.80</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.34</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.51</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING2.NEURONAL</p>
                        </c>
                        <c ca="center">
                           <p>54</p>
                        </c>
                        <c ca="center">
                           <p>534,081</p>
                        </c>
                        <c ca="center">
                           <p>0.78</p>
                        </c>
                        <c ca="center">
                           <p>0.24</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.19</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.26</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.PNS</p>
                        </c>
                        <c ca="center">
                           <p>24</p>
                        </c>
                        <c ca="center">
                           <p>234,532</p>
                        </c>
                        <c ca="center">
                           <p>0.78</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.03</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.19</p>
                        </c>
                        <c ca="center">
                           <p>0.07</p>
                        </c>
                        <c ca="center">
                           <p>0.17</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping2.reproductive system</p>
                        </c>
                        <c ca="center">
                           <p>21</p>
                        </c>
                        <c ca="center">
                           <p>154,400</p>
                        </c>
                        <c ca="center">
                           <p>0.69</p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.34</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>0.24</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.salivary gland</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c ca="center">
                           <p>47,232</p>
                        </c>
                        <c ca="center">
                           <p>0.74</p>
                        </c>
                        <c ca="center">
                           <p>0.55</p>
                        </c>
                        <c ca="center">
                           <p>0.06</p>
                        </c>
                        <c ca="center">
                           <p>1.00</p>
                        </c>
                        <c ca="center">
                           <p>0.00</p>
                        </c>
                        <c ca="center">
                           <p>0.36</p>
                        </c>
                        <c ca="center">
                           <p>0.11</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.SOMATIC MUSCLE</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>86,317</p>
                        </c>
                        <c ca="center">
                           <p>0.79</p>
                        </c>
                        <c ca="center">
                           <p>0.29</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>0.05</p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.28</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>mapping1.tracheal system</p>
                        </c>
                        <c ca="center">
                           <p>9</p>
                        </c>
                        <c ca="center">
                           <p>111,351</p>
                        </c>
                        <c ca="center">
                           <p>0.85</p>
                        </c>
                        <c ca="center">
                           <p>0.55</p>
                        </c>
                        <c ca="center">
                           <p>0.08</p>
                        </c>
                        <c ca="center">
                           <p>0.21</p>
                        </c>
                        <c ca="center">
                           <p>0.16</p>
                        </c>
                        <c ca="center">
                           <p>0.18</p>
                        </c>
                        <c ca="center">
                           <p>0.17</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.VENTRAL ECTODERM</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>84,154</p>
                        </c>
                        <c ca="center">
                           <p>0.77</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.38</p>
                        </c>
                        <c ca="center">
                           <p>0.32</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.27</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING1.VISCERAL MESODERM</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>54,278</p>
                        </c>
                        <c ca="center">
                           <p>0.77</p>
                        </c>
                        <c ca="center">
                           <p>0.46</p>
                        </c>
                        <c ca="center">
                           <p>0.10</p>
                        </c>
                        <c ca="center">
                           <p>0.32</p>
                        </c>
                        <c ca="center">
                           <p>0.12</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.01</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.28</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MAPPING2.WING</p>
                        </c>
                        <c ca="center">
                           <p>33</p>
                        </c>
                        <c ca="center">
                           <p>340,094</p>
                        </c>
                        <c ca="center">
                           <p>0.78</p>
                        </c>
                        <c ca="center">
                           <p>0.14</p>
                        </c>
                        <c ca="center">
                           <p>0.13</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.23</p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>0.00</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.22</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>*The number of sequences in a data set; <sup>&#8224;</sup>the total sequence length; <sup>&#8225;</sup>the maximum sensitivity possible. <sup>&#167;</sup>The sensitivity and its empirical <it>p</it>-value are given for each method tested. Data set names are capitalized if at least one of the three methods performs significantly (<it>p</it>-value &#8804;0.05; shown in bold) on it.</p>
                  </tblfn>
               </tbl>
               <p>The program EMCModule <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> is functionally similar to Stubb: it uses a given database of motifs to find CRMs. Because of this similarity, we chose not to evaluate it here and instead focused on Stubb, a program with which we are much more familiar.</p>
            </sec>
            <sec>
               <st>
                  <p>CisModule</p>
               </st>
               <p>CisModule is a powerful CRM prediction program that does not require input motifs: it attempts to learn the relevant PWMs while searching for modules. When run on our benchmark with default settings, CisModule consistently overpredicted modules, leading to very low positive predictive value (PPV; precision) and very high sensitivity (data not shown). Since our evaluations require every method to predict a single, fixed-length window in each control region, we then processed CisModule's output as described in Materials and methods. After this post-processing, however, the predictions were significant (sensitivity <it>p</it>-value &#8804;0.05) on only one data set (Table S1 in Additional data file 1). We explored alternative settings of the CisModule parameters (such as five motifs instead of three), but the results were similar.</p>
               <p>The poor performance of CisModule on our data sets is possibly the result of an incorrect choice of parameters (we used default parameters), or of our post-processing step that forces a fixed-length window to be predicted in each input sequence, or both. More insight into the workings of this program should lead to better predictions, which we leave as a future exercise. It is also worth noting that CisModule has previously been tested <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> as a 'motif finding application' that uses clustering of binding sites to improve the extremely difficult motif finding task. In a separate paper <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, the authors used the CisModule-predicted motifs as input to another program called CisModScan, which searches for significant clusters of matches to the motifs, similar to Stubb. Our preliminary tests with this strategy, followed by the post-processing step to obtain equal-length predicted CRMs, did not show improved performance. Again, we speculate that a carefully designed combination of CisModule and CisModScan may achieve high accuracy on our data sets. The public availability of our benchmark and evaluation tools should facilitate testing of CisModule and similar methods by other researchers.</p>
            </sec>
            <sec>
               <st>
                  <p>Markov chain discrimination method</p>
               </st>
               <p>The 'Markov chain discrimination' (MCD) method is our implementation of the 'PFRSampler' algorithm of Grad <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This method considers the word frequency distribution in the given set of candidate CRMs and a set of background sequences, and uses a Markov chain approach to discriminate between the two. More specifically, the MCD score is obtained by training a fifth-order Markov chain on the given set of sequences, evaluating the likelihood of these sequences being generated by the trained Markov chain, and contrasting this likelihood with the likelihood of their generation by a null (background) model. The stronger the contrast, the more different the sequences are from the background, and the higher their chances of being CRMs. Our implementation uses a simulated annealing search strategy to find the highest scoring set of windows in the control regions. Details of the algorithm are presented in Materials and methods. We note that unlike the original PFRSampler algorithm, which exploits evolutionary conservation, our implementation is designed for single species data. The MCD method performed significantly well on only 3 of the 33 data sets; its sensitivity <it>p</it>-values are shown in Table S1 in Additional data file 1.</p>
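               <p>The scoring idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated here: the function names, the add-one smoothing, and the use of a log-likelihood difference as the 'contrast' are assumptions made for illustration, and the simulated annealing search is omitted.</p>

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=5, pseudocount=1.0):
    """Estimate P(next base | preceding `order` bases) with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seqs, prob, order=5):
    """Sum of log transition probabilities over all sequences."""
    total = 0.0
    for s in seqs:
        for i in range(order, len(s)):
            total += math.log(prob(s[i - order:i], s[i]))
    return total


def mcd_score(candidates, background, order=5):
    """Hypothetical contrast: log-likelihood of the candidates under a model
    trained on themselves minus their log-likelihood under a background model."""
    fg = train_markov(candidates, order)
    bg = train_markov(background, order)
    return log_likelihood(candidates, fg, order) - log_likelihood(candidates, bg, order)
```

               <p>In this sketch, a candidate set whose word usage differs strongly from the background receives a large positive score, while a candidate set identical to the background scores zero.</p>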
            </sec>
         </sec>
         <sec>
            <st>
               <p>Design of new methods</p>
            </st>
            <p>We designed and implemented two new strategies for the gene battery CRM discovery problem that do not require given PWM motifs. In fact, their common theme is that they do not attempt to discover accurate PWMs as part of their module search. We briefly describe these new methods next. Details are presented in Materials and methods.</p>
            <sec>
               <st>
                  <p>CSam</p>
               </st>
               <p>We propose a new strategy, called CSam (short for CRM Sampler; pronounced see-sam), to predict CRMs in given control regions. Here, a set of candidate CRMs is evaluated by the number of statistically overrepresented short words in that set. The intuition is that if a set of CRMs share binding sites for the same factor, this will cause many short words (that are similar to the true binding motif for the factor) to be statistically overrepresented. Note that not all overrepresented words in a set of CRMs need represent transcription factor binding motifs, nor are we interested in determining which words are real motifs; all that matters is that the count of such words be greater in a collection of related CRMs than in random windows of the same size. The new approach is motivated by our recent work <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, where we found the count of overrepresented words to be significantly higher in CRMs than in random non-coding sequences.</p>
               <p>As a design principle in CSam, we avoid determining the precise form of the true motif(s), for example, learning a few distinct, high-confidence PWMs. (This 'motif-finding' problem has been demonstrated empirically to be extremely hard to solve <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.) We instead rely on broad statistical effects of the shared binding sites on the word frequency distribution in the set of CRMs. This is what sets this method clearly apart from the other approaches to this problem, such as CisModule or EMCModule. Also, there is no need in this approach to know the number of distinct functional motifs <it>a priori</it>. With a clearly defined score for any set of candidate CRMs, the CSam algorithm searches for the highest scoring set using a technique called 'simulated annealing' (see Materials and methods). We also experimented with a different search strategy, namely, 'Gibbs sampling' in conjunction with the same scoring scheme.</p>
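               <p>A rough sketch of a score of this kind is given below. The actual CSam statistic is defined in Materials and methods; here the word length, the binomial z-score, the cutoff of 3, and all function names are illustrative assumptions only.</p>

```python
import math
from collections import Counter


def word_counts(seqs, k):
    """Count every overlapping word of length k in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_word_count(candidates, background, k=6, z_cutoff=3.0):
    """Number of words whose count in the candidate set exceeds the count
    expected from background frequencies by more than z_cutoff standard
    deviations (binomial approximation; an assumed statistic)."""
    fg = word_counts(candidates, k)
    bg = word_counts(background, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    n_words = 4 ** k
    score = 0
    for word, observed in fg.items():
        # Smoothed background frequency, so unseen words keep a nonzero probability.
        p = (bg.get(word, 0) + 1.0) / (n_bg + n_words)
        expected = n_fg * p
        sd = math.sqrt(n_fg * p * (1.0 - p))
        if (observed - expected) / sd > z_cutoff:
            score += 1
    return score
```

               <p>The score deliberately ignores which words are overrepresented; only their number matters, in keeping with the motif-agnostic design principle stated above.</p>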
            </sec>
            <sec>
               <st>
                  <p>D2Z-set</p>
               </st>
               <p>In the D2Z-set method, we make use of our previous work <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> on measuring the similarity between any two regulatory sequences based on their word frequency distributions. In a set of functionally related CRMs (for example, those belonging to a gene battery), many or all pairs of CRMs should share binding sites. The challenge is to capture the resulting similarity between CRMs by a suitable statistical measure. The 'D2 score' <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> is the number of <it>k</it>-mer matches between two given sequences, and the 'D2Z score' introduced in our earlier work <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> computes the statistical significance (z-score) of this number. The z-score is a way to normalize the raw D2 score for dependence on the nucleotide frequencies ('background models') of the sequences. The D2Z score was found in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> to perform favorably in comparison to a modest number of existing methods for alignment-free sequence comparison <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>.</p>
               <p>The D2Z score measures the similarity between two sequences that results from the shared binding sites within them. Here, we build upon this pairwise measure to develop a score for an arbitrary set of candidate CRMs, called the 'D2Z-set' score (see Materials and methods). We then devised a search algorithm based on 'simulated annealing' that looks for the highest scoring set in the given control regions. This entire method is called the 'D2Z-set' method.</p>
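               <p>To make the pairwise measure concrete, the sketch below computes the D2 score as the number of matching <it>k</it>-mer pairs between two sequences. The z-score in our earlier work is derived from background models; as a stand-in, this sketch estimates the mean and standard deviation by shuffling both sequences, which is an assumption for illustration and not the D2Z computation itself. Function names are hypothetical.</p>

```python
import random
import statistics
from collections import Counter


def kmer_counts(seq, k):
    """Counts of all overlapping k-mers in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k=5):
    """D2 score: number of (position in A, position in B) pairs whose k-mers match."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * cb[w] for w, n in ca.items())


def d2z_by_shuffling(seq_a, seq_b, k=5, n_shuffles=200, seed=0):
    """Approximate z-score of D2 against a null obtained by shuffling both
    sequences (preserves base composition only; an illustrative assumption)."""
    rng = random.Random(seed)
    observed = d2(seq_a, seq_b, k)
    null = []
    for _ in range(n_shuffles):
        a = list(seq_a)
        b = list(seq_b)
        rng.shuffle(a)
        rng.shuffle(b)
        null.append(d2("".join(a), "".join(b), k))
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0
    return (observed - statistics.mean(null)) / sd
```

               <p>A set-level score could then be built by combining such pairwise values over all pairs of candidate CRMs; the specific combination used by D2Z-set is the one given in Materials and methods.</p>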
            </sec>
         </sec>
         <sec>
            <st>
               <p>Performance of new methods</p>
            </st>
            <p>The sensitivity <it>p</it>-values for CSam and D2Z-set, along with those of Stubb, are shown in Table <tblr tid="T2">2</tblr>. At a <it>p</it>-value threshold of 0.05, a method would be expected to perform significantly well on about two of the 33 data sets by chance alone (0.05 &#215; 33 = 1.65). CSam performs significantly well on 16 of the 33 data sets, while D2Z-set does so on 9 data sets. Both compare well with Stubb's predictions (significant for 12 data sets). Of particular interest is the observation that CSam outperforms Stubb in these tests. This suggests that if the set of PWMs relevant to a gene battery is not known, it may be more advantageous to predict CRMs using a motif-agnostic method (CSam) than a state-of-the-art motif-driven approach (Stubb) that relies on a broad collection of PWMs.</p>
            <p>We make a few observations on Table <tblr tid="T2">2</tblr>. First, we consider the performance figures for the new motif-agnostic methods CSam and D2Z-set, and find as many as 25 (of the 33 &#215; 2 = 66 entries) to be 0.05 or below. To get a rough idea of how significant this is, consider these numbers as independently obtained <it>p</it>-values (which should follow a uniform distribution): one would expect only about 3 entries (0.05 &#215; 66 = 3.3) at 0.05 or below. Second, we examine to what extent the different methods perform well on the same data sets. This is shown in Table <tblr tid="T3">3</tblr>. We find a substantial overlap (Hypergeometric test, <it>p </it>&lt; 0.03) among the data sets on which CSam and D2Z-set perform well. In fact, there is only one data set on which D2Z-set performs significantly and CSam does not. Similarly, there is a significant overlap (Hypergeometric test, <it>p </it>&lt; 0.06) between the data sets on which Stubb and CSam perform well.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Entry for any pair of methods is the number of data sets on which both methods performed significantly well (sensitivity <it>p</it>-value &lt;0.05)</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Stubb</p>
                     </c>
                     <c ca="center">
                        <p>CSam</p>
                     </c>
                     <c ca="center">
                        <p>D2Z-set</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Stubb</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CSam</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D2Z-set</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Diagonal entries indicate the number of data sets on which the corresponding method performed significantly well.</p>
               </tblfn>
            </tbl>
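            <p>The chance expectations and overlap tests discussed with Tables 2 and 3 can be reproduced approximately with a few lines of code. The sketch below assumes independent tests and a one-sided (upper-tail) hypergeometric test; the exact variant of the test used for the reported bounds is not specified here, so the values it prints should be read as a sanity check rather than as the reported <it>p</it>-values.</p>

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def hypergeom_tail(total, marked, drawn, overlap):
    """P(X >= overlap) when `drawn` data sets are chosen at random from
    `total`, of which `marked` are significant for the other method."""
    upper = min(marked, drawn)
    hits = sum(comb(marked, i) * comb(total - marked, drawn - i)
               for i in range(overlap, upper + 1))
    return hits / comb(total, drawn)


# Expected number of entries at or below 0.05 among 66 uniform p-values.
expected_by_chance = 0.05 * 66
# Chance of seeing 25 or more such entries if all 66 were independent nulls.
p_25_of_66 = binomial_tail(66, 25, 0.05)
# Overlaps taken from Table 3: CSam (16) with D2Z-set (9) share 8 data sets;
# CSam (16) with Stubb (12) share 9 data sets; 33 data sets in total.
p_csam_d2z = hypergeom_tail(33, 16, 9, 8)
p_stubb_csam = hypergeom_tail(33, 16, 12, 9)

print(expected_by_chance, p_25_of_66, p_csam_d2z, p_stubb_csam)
```

            <p>Under these assumptions, both overlap probabilities fall below the bounds quoted in the text (0.03 and 0.06, respectively).</p>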
            <p>We also noted, from Table <tblr tid="T2">2</tblr>, that data sets with larger numbers of CRMs tended to show better performance overall. To quantify this, we partitioned the 33 data sets into those where at least one of the two methods (CSam or D2Z-set) performed significantly well, and those where neither method performed well. The data sets in the second partition were significantly smaller than those in the first (Wilcoxon rank-sum test, <it>p </it>&lt; 0.009).</p>
            <p>Next, we turn our attention to the raw values of the sensitivities achieved on these data sets. Limiting ourselves to the cases where the <it>p</it>-value is significant, we find that CSam achieves a raw sensitivity in the range 16-51%, with an average of 27%. Recall that due to the way our tests are designed, 100% sensitivity is often impossible to achieve; in fact, as Table <tblr tid="T2">2</tblr> reveals, the maximum possible sensitivity is about 77% on average. Next, to get an idea of the practical importance of the observed sensitivity levels, consider a typical 500 bp module in a typical 5,000 bp control region. A sensitivity of approximately 27% means that the predicted window overlaps the known module in about 135 positions. To be able to find the location of the module to this resolution, in a 5,000 bp search region, is clearly useful from a biological perspective. The precise delineation of that module may be recovered from follow-up experiments.</p>
            <p>We next look at the performance of our CRM prediction methods pictorially, to get a better understanding of the sensitivity values of Table <tblr tid="T2">2</tblr>. Figure <figr fid="F1">1</figr> shows the known and CSam-predicted modules in five different data sets. These are selected from the data sets where CSam performed significantly well (<it>p </it>&lt; 0.05), but with raw sensitivity values ranging from 0.21 to 0.51. The plotted data sets are a representative sample, and not the ones with the five highest sensitivity values. Figure <figr fid="F1">1a</figr> ('mapping1.neuroectoderm') has the highest sensitivity (0.51), and we see that the known CRM (red rectangle below line) is correctly predicted (green rectangle above line) in five of the seven sequences (these cases are marked with ovals). Note that even though the nucleotide-level sensitivity is 51%, the method has identified 71% of the modules in the data set. We find the same theme in the other data sets shown in Figure <figr fid="F1">1</figr>. For example, the mapping1.mesectoderm data set (Figure <figr fid="F1">1b</figr>) has three of five (that is, 60%) of its modules correctly identified while the nucleotide-level sensitivity is 46%. The next two panels (Figure <figr fid="F1">1c,d</figr>) show mapping1.ventral ectoderm and mapping1.eye, where CSam has sensitivity values of 27% and 32%, respectively. In these two data sets, the percentage of modules discovered is 50% (6 of 12, and 3 of 6, respectively). Finally, we look at the data set mapping1.ectoderm (Figure <figr fid="F1">1e</figr>), which has 'only' 21% sensitivity, but at the CRM level this translates to 16 of the 37 modules (that is, 43%) being correctly identified. Thus, visual inspection reveals that the data sets assessed as showing 'significant' performance indeed show a high rate of correct module discovery.</p>
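            <p>The arithmetic behind this example can be written out as a short sketch. It assumes half-open coordinate intervals and pools positions over all control regions in a data set; the function names and the example coordinates are hypothetical.</p>

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of positions shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def nucleotide_sensitivity(known, predicted):
    """Fraction of known-module positions covered by the predicted windows.
    `known` and `predicted` are parallel lists of (start, end) pairs,
    one pair per control region."""
    covered = sum(overlap_length(k[0], k[1], p[0], p[1])
                  for k, p in zip(known, predicted))
    total = sum(k[1] - k[0] for k in known)
    return covered / total


# A 500 bp module in a 5,000 bp region, with a 500 bp predicted window
# that is offset so that 135 positions are shared.
known = [(2000, 2500)]
predicted = [(2365, 2865)]
print(nucleotide_sensitivity(known, predicted))  # 135 of 500 positions covered
```

            <p>A prediction that misses the module entirely contributes zero covered positions, so a data set with a few well-placed windows and several misses can still show a modest nucleotide-level sensitivity.</p>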
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Performance of CSam on five data sets where its sensitivity <it>p</it>-value was below 0.05</p>
               </caption>
               <text>
                  <p>Performance of CSam on five data sets where its sensitivity <it>p</it>-value was below 0.05. The data sets are <b>(a) </b>mapping1.neuroectoderm, <b>(b) </b>mapping1.mesectoderm, <b>(c) </b>mapping1.ventral ectoderm, <b>(d) </b>mapping1.eye and <b>(e) </b>mapping1.ectoderm. In each panel, every sequence is shown as a blue line, the location of a known module is shown as a red rectangle below the line and the location of a predicted module is shown as a green rectangle above the line. The displays of different panels are to different scales.</p>
               </text>
               <graphic file="gb-2008-9-1-r22-1"/>
            </fig>
            <p>We next extended the above analysis to all data sets and methods. We counted the number (and percentage) of CRMs that are correctly predicted (as described in the section 'Performance evaluation'), thereby obtaining a CRM-level sensitivity. These results are shown in Table <tblr tid="T4">4</tblr>. We find CSam to provide the best CRM-level sensitivity for 18 of the 33 data sets - more than any other method, including the motif-driven program Stubb. Restricting ourselves to the 16 data sets in which CSam performed significantly well (sensitivity <it>p</it>-value &lt;0.05), we find 13 data sets (81%) to have a CRM-level sensitivity of 30% or above, and 6 data sets (38%) to have over 40% of their CRMs correctly predicted. This clearly shows that the statistically significant nucleotide-level sensitivity values of Table <tblr tid="T2">2</tblr> correspond to high accuracy in predicting CRMs.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>CRM-level sensitivity of data sets</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Set name</p>
                     </c>
                     <c ca="center">
                        <p>CRMs*</p>
                     </c>
                     <c ca="center">
                        <p>Stubb<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>CSam<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>D2Z-set<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>CisModule<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>MCD<sup>&#8224;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping3.adult</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p><b>0.35 </b>(12)</p>
                     </c>
                     <c ca="center">
                        <p>0.24 (8)</p>
                     </c>
                     <c ca="center">
                        <p>0.21 (7)</p>
                     </c>
                     <c ca="center">
                        <p>0.24 (8)</p>
                     </c>
                     <c ca="center">
                        <p>0.26 (9)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.adult mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0.20 (1)</p>
                     </c>
                     <c ca="center">
                        <p>0.20 (1)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.40 </b>(2)</p>
                     </c>
                     <c ca="center">
                        <p>0.00 (0)</p>
                     </c>
                     <c ca="center">
                        <p>0.20 (1)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.amnioserosa</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>0.14 (1)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.29 </b>(2)</p>
                     </c>
                     <c ca="center">
                        <p>0.14 (1)</p>
                     </c>
                     <c ca="center">
                        <p>0.00 (0)</p>
                     </c>
                     <c ca="center">
                        <p>0.29 (2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.blastoderm</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p><b>0.53 </b>(41)</p>
                     </c>
                     <c ca="center">
                        <p>0.42 (32)</p>
                     </c>
                     <c ca="center">
                        <p>0.21 (16)</p>
                     </c>
                     <c ca="center">
                        <p>0.14 (11)</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (9)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.cardiac mesoderm</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>0.38 (3)</p>
                     </c>
                     <c ca="center">
                        <p>0.25 (2)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.50 </b>(4)</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (1)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.50 </b>(4)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.cns</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (4)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.26 </b>(9)</p>
                     </c>
                     <c ca="center">
                        <p>0.24 (8)</p>
                     </c>
                     <c ca="center">
                        <p>0.15 (5)</p>
                     </c>
                     <c ca="center">
                        <p>0.15 (5)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.dorsal ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p><b>0.38 </b>(3)</p>
                     </c>
                     <c ca="center">
                        <p>0.25 (2)</p>
                     </c>
                     <c ca="center">
                        <p>0.00 (0)</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (1)</p>
                     </c>
                     <c ca="center">
                        <p>0.25 (2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping1.ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>0.30 (11)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.38 </b>(14)</p>
                     </c>
                     <c ca="center">
                        <p>0.35 (13)</p>
                     </c>
                     <c ca="center">
                        <p>0.22 (8)</p>
                     </c>
                     <c ca="center">
                        <p>0.19 (7)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mapping2.ectoderm</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>0.24 (12)</p>
                     </c>
                     <c ca="center">
                        <p><b>0.39 </b>(20)</p>
                     </c>
                     <c ca="center">
                        <p>0.25 (13)</p>
                     </c>
                     <c ca="center">
                        <p>0.14 (7)</p>