<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2009-10-4-r44</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>Choosing the right path: enhancement of biologically relevant sets of genes or proteins using pathway structure</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Thomas</snm>
               <fnm>Reuben</fnm>
               <insr iid="I1"/>
               <email>thomasr3@niehs.nih.gov</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Gohlke</snm>
               <mi>M</mi>
               <fnm>Julia</fnm>
               <insr iid="I1"/>
               <email>gohlkej@niehs.nih.gov</email>
            </au>
            <au id="A3">
               <snm>Stopper</snm>
               <mi>F</mi>
               <fnm>Geffrey</fnm>
               <insr iid="I2"/>
               <email>stopperg@sacredheart.edu</email>
            </au>
            <au id="A4">
               <snm>Parham</snm>
               <mi>M</mi>
               <fnm>Frederick</fnm>
               <insr iid="I1"/>
               <email>parham@niehs.nih.gov</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Portier</snm>
               <mi>J</mi>
               <fnm>Christopher</fnm>
               <insr iid="I1"/>
               <email>portier@niehs.nih.gov</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Environmental Systems Biology Group, Laboratory of Molecular Toxicology, National Institute of Environmental Health Sciences, RTP, NC 27709, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biology, Sacred Heart University, Fairfield, CT 06825, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>4</issue>
         <fpage>R44</fpage>
         <url>http://genomebiology.com/2009/10/4/R44</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19393085</pubid>
               <pubid idtype="doi">10.1186/gb-2009-10-4-r44</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>21</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>19</day>
               <month>3</month>
               <year>2009</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>4</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Thomas et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Finding enriched pathways</p>
      </shorttitle>
      <shortabs>
         <p>A method is proposed that finds enriched pathways relevant to a studied condition, using molecular and network data.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>A method is proposed that finds enriched pathways relevant to a studied condition using the measured molecular data and also the structural information of the pathway viewed as a network of nodes and edges. Tests are performed using simulated data and genomic data sets and the method is compared to two existing approaches. The analysis demonstrates that the proposed method is competitive with current approaches and provides biologically relevant results.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Data on the molecular scale obtained under different sampling conditions are becoming increasingly available from platforms like DNA microarrays. Generally, molecular data are obtained to understand the behavior of a system under insult or during perturbations, such as occur following exposure to certain toxicants or when studying the cause and progression of certain diseases. Toxicants or diseases will hereafter be commonly referred to as perturbations to the biological system. Genomics is capable of providing information on the gene expression levels for an entire cellular system. When faced with such large amounts of molecular data, there are two options available that can enable one to focus on a small number of interesting sets of genes or proteins. One can cluster the data <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and use the clusters to identify sets of genes that were significantly affected by the perturbations. This represents an unsupervised approach. Other similar approaches include principal component analysis <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and self-organizing maps <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>Alternatively, biologically relevant sets of genes/proteins are deduced to exist <it>a priori </it>in the form of biochemical pathways and cytogenetic sets. A supervised approach can be linked with the data to identify these <it>a priori</it>-defined sets that are significantly affected by the perturbations seen in the data. The method proposed in this paper is an example of this approach applied to the scenario of distinguishing between two conditions (such as normal patient versus disease patient, or unexposed versus exposed). The data we wish to link to a given set of pathways are assumed to be genomic data such as gene expression levels or the presence of gene polymorphisms known to be associated with diseases.</p>
         <p>Supervised approaches for the identification of biologically relevant gene expression sets have typically been identified as 'gene set' or 'pathway enrichment' methods in the literature. Recent years have seen significant work on proposals for new approaches guided by criticisms and limitations of the existing ones; references <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> provide critical reviews of the existing methods in terms of their different features, such as the null hypotheses of the underlying statistical tests used and the independence assumption between genes. These reviews essentially inform us that the pathway enrichment methods can be viewed as falling on two sides of a number of different coins. A few of these classifications are given below.</p>
         <p>Firstly, methods could be interested in testing either whether the genes in a specific pathway of interest are affected as a result of a treatment (the implied null hypothesis has been referred to as 'self-contained' <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> or denoted as 'Q2' <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>) or whether the genes in the pathway of interest are more affected than the other genes in the system (this implied null hypothesis has been referred to as 'competitive' <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> or as 'class 1, 2, 3' <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> or denoted as 'Q1' <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>). There are of course good reasons for preferring either of these null hypotheses. One would prefer the 'competitive' hypothesis if the treatment had a wide-ranging impact on the genes in the system. Such an impact could have the undesirable consequence of randomly chosen (and hence not biologically relevant) sets of genes attaining significance in the 'self-contained' tests; a nice illustration of a case like this is provided in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. One could use a 'self-contained' test if one believes that the treatment had quite a restricted impact on the genes in the system and/or if the focus is only on one or a small number of pathways.</p>
         <p>Some of the pathway enrichment methods treat the genes in the system as being independent of each other <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Ignoring the gene-gene correlations has been shown to elevate the number of false-positive discoveries <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B6">6</abbr></abbrgrp>. However, the need to prioritize the different biological pathways with respect to their relevance to the treatment, combined with the lack of a sufficient number of biological replicates (one in some cases), may force this independence assumption. Examples of methods that try to take into account the gene-gene correlations include <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>.</p>
         <p>Pathway enrichment methods can be distinguished by the use or the absence of an explicit gene-wise statistic to measure the gene's association with the treatment in determining a pathway's relevance to the treatment. Examples of gene-wise statistics used include the two-sample <it>t</it>-statistic, log of fold change <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, the significance analysis of microarrays (SAM) statistic <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and the <it>maxmean </it>statistic <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Methods like those in <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B34">34</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp> treat the problem as a multivariate statistical one and avoid the need for an explicit definition of a gene-wise statistic.</p>
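         <p>To make the idea of summarizing gene-wise statistics concrete, the following is a minimal sketch of a <it>maxmean</it>-style set statistic, assuming the commonly used form in which the positive and negative parts of the gene-wise scores are each averaged over all genes in the set and the larger of the two averages is reported. The function name and the example scores are hypothetical and are not taken from this paper.</p>
         <p>
```python
def maxmean(scores):
    """Return a maxmean-style statistic for a list of gene-wise scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("gene set must contain at least one score")
    # Positive and negative parts are each averaged over ALL genes in the set,
    # so scores of opposite sign do not cancel each other out.
    mean_pos = sum(max(s, 0.0) for s in scores) / n
    mean_neg = sum(max(-s, 0.0) for s in scores) / n
    return max(mean_pos, mean_neg)


# Hypothetical t-like scores for a five-gene set (illustrative values only).
example_scores = [2.0, -1.0, 3.0, 0.0, -0.5]
print(maxmean(example_scores))
```
         </p>
         <p>Unlike a plain mean of the scores, this summary stays large when a set contains both strongly up-regulated and strongly down-regulated genes.</p>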
         <p>The method proposed in this paper defines versions for both the 'self-contained' and the 'competitive' null hypotheses and utilizes the idea of the <it>maxmean </it>statistic <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. It improves upon the previous methods by its use of structural information present in biochemical pathways. A pathway is said to have structural information if its components can be placed on a network of nodes and edges. For example, a gene set corresponding to a pathway can be viewed as associated with a network where the nodes represent the gene products (that is, proteins, protein complexes, mRNAs) while the edges represent either signal transfer between the gene products in signaling pathways or the activity of a catalyst between two metabolites in metabolic pathways.</p>
         <p>Classic signal transduction pathways, such as the mitogen-activated protein kinase (MAPK) pathways, transduce a large variety of external signals, leading to a wide range of cellular responses, including growth, differentiation, inflammation and apoptosis. In part, the specificity of these pathways is thought to be regulated at the ligand/receptor level (for example, different cells express different receptors and/or ligands). Furthermore, the ultimate response is dictated by the downstream activation of transcription factors. In contrast, intermediate kinase components are shared by numerous pathways and, in general, neither convey specificity nor directly dictate the ultimate response (see <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> for a review). Therefore, we test the value of implementing a Heavy Ends Rule (<it>HER</it>) in which the initial and final components of a signaling pathway are given a higher weight than intermediate components.</p>
         <p>Signal transduction relies on the sequential activation of components in order to implement an ultimate response. We therefore hypothesize that activation of components that are directly connected to each other in a pathway conveys greater significance than activation of components that are not closely connected to each other. Accordingly, we also test a Distance Rule (<it>DR</it>) in which genes that are closely connected to each other are given a higher score.</p>
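         <p>The intuition behind a distance-based score can be illustrated with a small sketch. The scoring function below is a hypothetical stand-in chosen only for illustration (a sum of inverse shortest-path distances over pairs of affected genes); it is not the <it>DR </it>definition given in Materials and methods, and the five-gene linear network is likewise an assumption.</p>
         <p>
```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in edges) from source on an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_score(adjacency, affected):
    """Hypothetical score: sum of 1/distance over connected pairs of affected genes."""
    total = 0.0
    for a, b in combinations(sorted(affected), 2):
        d = shortest_path_lengths(adjacency, a).get(b)
        if d:  # unreachable pairs give None and are skipped
            total += 1.0 / d
    return total


# Hypothetical linear pathway g1 - g2 - g3 - g4 - g5.
linear = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
clustered = closeness_score(linear, {"g1", "g2", "g3"})  # neighbouring genes
scattered = closeness_score(linear, {"g1", "g3", "g5"})  # spread-out genes
print(clustered, scattered)
```
         </p>
         <p>With the same number of affected genes, the set whose members are adjacent on the network receives the higher score, which is the behavior the hypothesis above calls for.</p>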
         <p>The use of structural information based on an underlying network in an analysis of gene expression data is not new. Similar ideas have been used to identify activated pathways from time profile data (here the attempt was to distinguish between two phenotypes) <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, while structural information of the pathways has been used to enhance the clusters deduced from the gene expression data <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and to find differentially expressed genes <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The study by Draghici <it>et al</it>. <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> appears to be the only existing work that incorporates pathway network information into the problem of pathway enrichment. However, that approach appears to be limited by the need to define an arbitrary cut-off for differential expression, the assumption of independence between genes and the parametric assumption of an exponential distribution for computing the significance.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The method proposed in this paper is named 'structurally enhanced pathway enrichment analysis' (<it>SEPEA</it>). It is a pathway enrichment method that incorporates the associated network information of the biochemical pathway using two rules, the <it>HER </it>and <it>DR</it>. <it>SEPEA </it>provides three options for null hypothesis testing (<it>SEPEA_NT1</it>, <it>SEPEA_NT2 </it>and <it>SEPEA_NT3</it>) that depend on the goal of the pathway enrichment analysis and the properties of genomic data available. <it>SEPEA_NT1 </it>and <it>SEPEA_NT2 </it>require multiple array samples per gene and are tests that take into account inherent gene-gene correlations. <it>SEPEA_NT3 </it>requires only a summary statistic per gene (that indicates association with the treatment) but assumes that genes are independent of each other. The test <it>SEPEA_NT3 </it>is motivated by situations where the data are insufficient to estimate gene-gene correlations, such as when the only information available is whether a gene is or is not affected by the treatment; a set of gene polymorphisms known to be associated with breast cancer is one such example. <it>SEPEA_NT1 </it>and <it>SEPEA_NT3 </it>are intended for situations where the goal is to compare the genes in the pathway of interest to the other genes in the system in terms of their associations with the treatment. <it>SEPEA_NT2 </it>is used for analyses involving only the genes in the pathway in relation to the treatment. The main objective of this paper is to demonstrate the utility of incorporating pathway network information in a pathway enrichment analysis. Therefore, comparisons are made with results from corresponding versions of <it>SEPEA </it>that do not use the network information - <it>SEPEA_NT1</it>*, <it>SEPEA_NT2</it>* and <it>SEPEA_NT3</it>*. 
In addition, two literature methods are used for comparison with the results from <it>SEPEA_NT1 </it>- gene set enrichment analysis (<it>GSEA</it>) <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and the <it>maxmean </it>method <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> - the null hypotheses of <it>GSEA </it>and <it>maxmean </it>being very similar to <it>SEPEA_NT1</it>.</p>
         <sec>
            <st>
               <p>Motivation for the Heavy Ends Rule score</p>
            </st>
            <p>By giving greater weight to genes whose products are nearest to the terminal gene products of a pathway, the <it>HER </it>score gives more weight to genes specific to a particular pathway. This is illustrated in Figure <figr fid="F1">1</figr>, which uses the concept of terminal gene products. Terminal gene products are gene products such as receptors that initiate the pathway activity or transcription factors that initiate transcription as a result of the pathway activity (see Materials and methods for a more mathematical definition). The genes involved in each of the signaling pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> were evaluated for the position of their gene products with respect to the terminal gene products and the total number of signaling pathways that these genes are involved in. It is clear from Figure <figr fid="F1">1</figr> that genes associated with products that are closer to the terminal gene products are more pathway-specific.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Empirical distribution function of number of pathways associated with genes at given distances from terminal nodes</p>
               </caption>
               <text>
                  <p>Empirical distribution function of number of pathways associated with genes at given distances from terminal nodes. Empirical cumulative distribution function of the number of pathways that are associated with genes that have gene products located at a given distance, <it>d </it>(= 0, 1, 2, 3, 4), from a terminal node of the pathway network. Gene products that are at a distance <it>d </it>= 0 are the terminal gene products. The data used were those of all the genes associated with human signaling pathways in the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2009-10-4-r44-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Justification for the Distance Rule score</p>
            </st>
            <p>To illustrate the utility of the <it>DR </it>as a scoring method, we consider the linkage between the full set of pathways in KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>; that is, the pathways themselves can be viewed as part of a higher level network, the nodes of which are pathways while the edges indicate the transfer of signal or material between pathways (Figure S1 in Additional data file 2). For example, the MAPK signaling pathway and the p53 signaling pathway can be considered to be linked. It seems reasonable to expect that after perturbation of the system, the affected pathways that are linked are more likely to respond similarly. We test this intuition using different microarray data (from the Gene Expression Omnibus (GEO) database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>) in a statistical test on the above network of pathways. The details are provided in the Materials and methods section. The <it>P</it>-values for the eight comparisons (estimated using 1,000 random networks) are given in Table <tblr tid="T1">1</tblr>. Significant <it>P</it>-values across the comparisons support our use of the <it>DR </it>as a reasonable score for differentiating between pathways.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Significance of observed pattern of <it>DR </it>scores across all KEGG pathways for different GEO datasets</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>GEO accession number</p>
                     </c>
                     <c ca="left">
                        <p>Description</p>
                     </c>
                     <c ca="center">
                        <p><it>P</it>-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2744]</p>
                     </c>
                     <c ca="left">
                        <p>MCF-7 breast cancer cells - dioxin treatment versus control</p>
                     </c>
                     <c ca="center">
                        <p>0.005</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2649](1)</p>
                     </c>
                     <c ca="left">
                        <p>Early HIV infection CD8+T cells versus uninfected</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2649](2)</p>
                     </c>
                     <c ca="left">
                        <p>Chronic HIV infection CD8+T cells versus uninfected</p>
                     </c>
                     <c ca="center">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2649](3)</p>
                     </c>
                     <c ca="left">
                        <p>Non-progressive HIV infection CD8+T cells versus uninfected</p>
                     </c>
                     <c ca="center">
                        <p>0.004</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2852](1)</p>
                     </c>
                     <c ca="left">
                        <p>Bronchial A549 cells - cytokine treatment at 0 h versus control</p>
                     </c>
                     <c ca="center">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2852](2)</p>
                     </c>
                     <c ca="left">
                        <p>Bronchial A549 cells - cytokine treatment at 4 h versus control</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2852](3)</p>
                     </c>
                     <c ca="left">
                        <p>Bronchial A549 cells - cytokine treatment at 12 h versus control</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[GEO:GDS2852](4)</p>
                     </c>
                     <c ca="left">
                        <p>Bronchial A549 cells - cytokine treatment at 24 h versus control</p>
                     </c>
                     <c ca="center">
                        <p>0.016</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Different control versus treated conditions in three microarray datasets, indicated by the GDS accession numbers [GEO:GDS2744], [GEO:GDS2649] and [GEO:GDS2852] from the GEO database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, were used to compare the <it>DR </it>scores across all the pathways on the pathway network (Figure S1 in Additional data file 2) using the <it>meta_DR </it>term in Equation 9. The <it>P</it>-value for the significance of <it>meta_DR </it>is computed using 1,000 random networks whose generation is described in the Materials and methods section.</p>
               </tblfn>
            </tbl>
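            <p>As a rough sketch of how a <it>P</it>-value can be estimated from random networks, the snippet below computes the fraction of null scores that are at least as large as an observed score. It assumes that larger scores are more extreme; the function name and the normally distributed stand-in null scores are hypothetical, and the actual <it>meta_DR </it>term and random network generation are those described in Materials and methods.</p>
            <p>
```python
import random


def empirical_p_value(observed, null_scores):
    """Fraction of null scores that are at least as large as the observed score."""
    if not null_scores:
        raise ValueError("need at least one null score")
    exceed = sum(1 for s in null_scores if s >= observed)
    return exceed / len(null_scores)


# Hypothetical stand-in: 1,000 null scores drawn from a standard normal,
# in place of scores recomputed on 1,000 random networks.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(empirical_p_value(2.5, null_scores))
```
            </p>
            <p>With 1,000 random networks the smallest non-zero estimate is 0.001, so an estimate of zero is naturally reported as a bound (&lt;0.001) rather than as an exact value.</p>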
         </sec>
         <sec>
            <st>
               <p>Analysis using simulated data</p>
            </st>
            <p>Simulated data were generated from two pathway networks with different patterns of correlation between the genes in the pathway; in each case, the pathway genes belong to a larger pool of genes representing a biological system. The two networks and the correlation patterns of genes in the pathway, denoted by pattern numbers, are listed in Table <tblr tid="T2">2</tblr>. Patterns 1, 2, 3 and 4 have non-zero correlation between a subset of genes in the system. All genes in pattern 5 are assumed to be independent of each other. Patterns 1 and 3 are biased in favor of the scoring rules proposed here, whereas patterns 2 and 4 are not. The treatments increased the expression of certain genes in the system, with the size of the increase given by the variable <it>pert</it>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Simulation conditions for comparing various methods for pathway enrichment</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Pattern number</p>
                     </c>
                     <c ca="left">
                        <p>Network</p>
                     </c>
                     <c ca="left">
                        <p>Correlated set (&#931;)</p>
                     </c>
                     <c ca="left">
                        <p>Target set (&#934;)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Linear</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>{<it>g</it><sub>1</sub>,..., <it>g</it><sub>9</sub>}</p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i1.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Linear</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>U</it>
                           <sup>
                              <it>L</it>
                           </sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i2.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ErbbSignaling</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i3.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i4.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>ErbbSignaling</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>U</it>
                           <sup>
                              <it>E</it>
                           </sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i5.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Linear</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>&#216;</p>
                     </c>
                     <c ca="left">
                        <p>
                           <inline-formula>
                              <graphic file="gb-2009-10-4-r44-i6.gif"/>
                           </inline-formula>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Different correlation patterns (1-5) considered for the generation of simulated data along with the underlying networks, the set of correlated genes, &#931;, and the set of genes that are the targets of the treatment, &#934;. <it>U</it><sup><it>L </it></sup>denotes a set of nine genes drawn uniformly at random from the set of genes associated with the pathway displayed in Figure <figr fid="F1">1a</figr>. <it>V</it><sub><it>41</it></sub><sup><it>L </it></sup>denotes a set of 41 randomly drawn genes from the set of 470 genes not associated with the pathway displayed in Figure <figr fid="F1">1a</figr>. <it>U</it><sup><it>E </it></sup>denotes a set of seven genes drawn uniformly at random from the set of genes associated with the pathway displayed in Figure <figr fid="F1">1b</figr>. <it>V</it><sub><it>3</it></sub><sup><it>E </it></sup>denotes a set of three randomly drawn genes from the set of 413 genes not associated with the pathway displayed in Figure <figr fid="F1">1b</figr>. &#216; denotes the empty set. The symbol &#8746; denotes the set union operation.</p>
               </tblfn>
            </tbl>
            <p>Table <tblr tid="T3">3</tblr> gives estimates of the type 1 errors of the five methods, at the 0.01 and 0.05 significance levels, for patterns 1 and 5. Table <tblr tid="T4">4</tblr> gives estimates of the power of the <it>SEPEA_NT1</it>, <it>GSEA </it>and <it>SEPEA_NT2 </it>methods at the 0.01 and 0.05 significance levels, for a <it>pert </it>value of 1.2 and for patterns 1-4. The empirical sizes of the <it>maxmean </it>and <it>SEPEA_NT3 </it>methods do not match their nominal sizes, so their power results are provided at empirical sizes of 0.07 and 0.05, respectively (each corresponding to a nominal size of 0.001).</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Type 1 error of different pathway enrichment methods</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Pattern number</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#945;</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>GSEA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Maxmean</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT3</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>135</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>187</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>266</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Type 1 errors (in terms of the number of experiments out of 1,000 that gave <it>P</it>-values for the randomization tests below <it>&#945; </it>= 0.01 and 0.05 levels) for each of the five methods and for correlation patterns 1 and 5.</p>
               </tblfn>
            </tbl>
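            <p>The counts in Table <tblr tid="T3">3</tblr> can be read as empirical sizes by dividing by the 1,000 simulated experiments. The following is a minimal sketch, not part of the original analysis, that checks whether the nominal size falls inside an approximate 95% interval for the empirical size. The choice of a Wilson score interval and the function names are assumptions made for illustration only.</p>
            <p><![CDATA[
```python
import math


def wilson_interval(rejections, n_experiments, z=1.96):
    """Approximate 95% Wilson score interval for a rejection rate."""
    p_hat = rejections / n_experiments
    z2_over_n = z * z / n_experiments
    center = (p_hat + z2_over_n / 2) / (1 + z2_over_n)
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / n_experiments
            + z * z / (4 * n_experiments ** 2)
        )
        / (1 + z2_over_n)
    )
    return center - half_width, center + half_width


def matches_nominal(rejections, nominal_alpha, n_experiments=1000):
    """True if the nominal size lies inside the interval for the empirical size."""
    low, high = wilson_interval(rejections, n_experiments)
    return low <= nominal_alpha <= high


# Pattern 1 counts at nominal alpha = 0.01, copied from Table 3.
pattern1_alpha_001 = {
    "SEPEA_NT1": 8,
    "GSEA": 5,
    "maxmean": 85,
    "SEPEA_NT2": 13,
    "SEPEA_NT3": 135,
}

for method, count in pattern1_alpha_001.items():
    print(method, count / 1000, matches_nominal(count, 0.01))
```
]]></p>
            <p>Under this check, the pattern 1 counts for <it>maxmean </it>and <it>SEPEA_NT3 </it>are the only ones whose interval excludes the nominal size of 0.01.</p>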
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Power of different pathway enrichment methods</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Pattern number</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#945;</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT1</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>GSEA</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Maxmean</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT2</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>SEPEA_NT3</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>328</p>
                        <p>610</p>
                     </c>
                     <c ca="center">
                        <p>188</p>
                        <p>510</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="center">
                        <p>357</p>
                        <p>686</p>
                     </c>
                     <c ca="center">
                        <p>321</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>271</p>
                        <p>505</p>
                     </c>
                     <c ca="center">
                        <p>189</p>
                        <p>508</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>295</p>
                        <p>580</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>344</p>
                        <p>692</p>
                     </c>
                     <c ca="center">
                        <p>222</p>
                        <p>496</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>347</p>
                        <p>712</p>
                     </c>
                     <c ca="center">
                        <p>480</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>166</p>
                        <p>361</p>
                     </c>
                     <c ca="center">
                        <p>212</p>
                        <p>468</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>157</p>
                        <p>379</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Power estimates for the <it>SEPEA_NT1</it>, <it>GSEA </it>and <it>SEPEA_NT2 </it>methods (in terms of the number of experiments out of 1,000 that gave <it>P</it>-values for the randomization tests below nominal sizes of <it>&#945; </it>= 0.01 and 0.05). The estimates for <it>maxmean </it>are given at an empirical size of 0.07 (nominal size of 0.001) and those for <it>SEPEA_NT3 </it>at an empirical size of 0.05 (nominal size of 0.001). These are results from simulations in which the treatment resulted in an over-expression of the mean expression of the target genes by the factor <it>pert </it>= 1.2. The methods were evaluated on correlation patterns 1-4.</p>
               </tblfn>
            </tbl>
            <p>Only patterns 1 and 5 were used to analyze the type 1 error behavior because they represent the two scenarios (presence or absence of gene-gene correlations) in which pathway enrichment methods have been shown to behave differently <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B10">10</abbr></abbrgrp>. Because of the presence of correlations in the data, <it>SEPEA_NT3 </it>gives an incorrect type 1 error value for pattern 1 (Table <tblr tid="T3">3</tblr>). As stated previously, despite this incorrect behavior, there are situations (such as those in which the only information available for each gene is a summary statistic representing the effect of the treatment) where methods like <it>SEPEA_NT3 </it>must be used to generate relevant hypotheses about the processes affected by the treatment. <it>SEPEA_NT1</it>, <it>SEPEA_NT2 </it>and <it>GSEA </it>maintain the correct type 1 error behavior in both the presence and absence of gene-gene correlations. In the presence of gene-gene correlations, the <it>maxmean </it>method <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> also fails to maintain the appropriate type 1 error behavior. As expected, the power estimates of all three <it>SEPEA </it>methods for patterns 1 and 3 were significantly higher (<it>P </it>&lt; 0.05, two-sample test of proportions) than those for patterns 2 and 4, respectively. The power estimates for patterns 1 and 3 using <it>SEPEA_NT1 </it>were higher than those for <it>GSEA</it>, demonstrating an improved ability to detect these biologically relevant patterns. For the other two 'not-so-relevant' patterns (2 and 4), <it>SEPEA_NT1 </it>was not always more powerful than the <it>GSEA </it>method. This loss of power can again be explained by the bias of <it>SEPEA </it>towards detecting conditions favored by the scoring rules. 
For example, the power estimates of <it>SEPEA_NT1 </it>were higher than those for <it>GSEA </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp> for pattern 2 at the 0.01 level, whereas this was not the case for pattern 4 (Table <tblr tid="T4">4</tblr>). At an empirical size of 0.07, <it>maxmean </it>does not appear to be competitive with the other methods. <it>SEPEA_NT1 </it>is also more powerful than <it>GSEA </it>on pattern 1 across a range of perturbation levels and signal-to-noise levels (Tables S3 and S4 in Additional data file 1). In addition, power results for four other correlation patterns are presented in Table S2 in Additional data file 1.</p>
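            <p>The comparison of power estimates between patterns relies on a two-sample test of proportions. The text does not specify the exact form of the test, so the following sketch assumes a pooled two-sided <it>z</it>-test with a normal approximation. The function name is hypothetical, and the example counts are the <it>SEPEA_NT1 </it>entries at the 0.01 level for patterns 1 and 2 in Table <tblr tid="T4">4</tblr>.</p>
            <p><![CDATA[
```python
import math


def two_proportion_z_test(count_a, count_b, n_a=1000, n_b=1000):
    """Pooled two-sided z-test for the difference of two proportions.

    Assumption: this pooled normal approximation stands in for the
    'two-sample test of proportions' named in the text.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# SEPEA_NT1 at alpha = 0.01: pattern 1 (328) versus pattern 2 (271), Table 4.
z_stat, p_val = two_proportion_z_test(328, 271)
print(f"z = {z_stat:.2f}, two-sided P = {p_val:.4f}")
```
]]></p>
            <p>A one-sided version of the test would halve the reported <it>P</it>-value; which variant was used is not stated in the text.</p>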
         </sec>
         <sec>
            <st>
               <p>Analysis using lung cancer data</p>
            </st>
            <p>The study by Raponi <it>et al</it>. <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> analyzed gene expression data from 130 lung cancer patients at different stages of the disease and also provided survival times for each patient. The data were divided into two groups of 85 patients (training set) and 45 patients (test set) such that the proportion of patients in each stage was approximately the same in the two groups. Using these data, the Cox proportional hazards statistic was computed for each gene on the microarray (indicating how predictive the gene is of the survival time of a patient). The next logical step was to find which biochemical pathways are predictive of survival. All of the human KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways were used in this analysis. The methods used were <it>SEPEA_NT1</it>, <it>GSEA </it>and <it>maxmean</it>. Also, to estimate the value of including information on the network structure, <it>SEPEA_NT1 </it>was applied to the data assuming that all the genes in the pathway are given equal weight and the <it>DR </it>score is zero. This analysis is denoted by <it>SEPEA_NT1*</it>. The goal of our analysis was to evaluate the consistency between the 'significant' pathways found using the training set and those found using the test set. Curves for sensitivity versus '1 - specificity' and positive predictive value versus negative predictive value were obtained by using different cut-offs for the log of the <it>P</it>-values obtained using each method; the results are shown in Figure <figr fid="F2">2</figr>. The sensitivity, specificity, positive predictive and negative predictive values for the <it>SEPEA </it>analyses have better ranges than those for <it>GSEA </it>and <it>maxmean</it>. 
For a significant portion of the ranges of sensitivity and specificity for <it>GSEA </it>and <it>maxmean</it>, the <it>SEPEA </it>analyses provide higher sensitivity for a given level of false positives (a point on the '1 - specificity' axis). The same can be said about the portion of the ranges of the positive and negative predictive values of <it>maxmean </it>dominated by the <it>SEPEA </it>analyses. From the curves for <it>SEPEA_NT1 </it>and <it>SEPEA_NT1*</it>, we also observe the benefit of incorporating pathway network information. An extended version of Figure <figr fid="F2">2</figr> that also includes results from <it>SEPEA_NT2 </it>and <it>SEPEA_NT3 </it>is provided as Figure S2 in Additional data file 3.</p>
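            <p>The curves described above can be sketched by sweeping a cut-off over per-pathway <it>P</it>-values. The sketch below assumes that pathways called significant in the training set serve as the reference labels and that calls from the test set are scored against them at the same cut-off; this pairing, the function name and the toy <it>P</it>-values are all assumptions for illustration and are not taken from the study. Thresholding the <it>P</it>-value is equivalent to thresholding its logarithm.</p>
            <p><![CDATA[
```python
def consistency_rates(train_p, test_p, cutoff):
    """Compare pathway calls from two data splits at one P-value cut-off."""
    # Assumption: training-set calls are treated as the reference labels.
    reference = [p <= cutoff for p in train_p]
    called = [p <= cutoff for p in test_p]
    pairs = list(zip(reference, called))
    tp = sum(1 for r, c in pairs if r and c)
    fp = sum(1 for r, c in pairs if not r and c)
    fn = sum(1 for r, c in pairs if r and not c)
    tn = sum(1 for r, c in pairs if not r and not c)

    def ratio(num, den):
        # The rate is undefined when its denominator is zero.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "one_minus_specificity": ratio(fp, fp + tn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical P-values for five pathways (not taken from the paper).
toy_train_p = [0.001, 0.02, 0.2, 0.6, 0.04]
toy_test_p = [0.004, 0.3, 0.03, 0.8, 0.01]

# Each cut-off yields one point on each of the two curves.
for cutoff in (0.01, 0.05, 0.5):
    print(cutoff, consistency_rates(toy_train_p, toy_test_p, cutoff))
```
]]></p>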
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Receiver-operator characteristic and positive predictive power versus negative predictive power plots for lung cancer data</p>
               </caption>
               <text>
                  <p>Receiver-operator characteristic and positive predictive power versus negative predictive power plots for lung cancer data. <b>(a) </b>Sensitivity versus '1 - specificity' of enriched pathways that are predictive of survival from lung cancer for four methods: <it>SEPEA_NT1</it>, <it>SEPEA_NT1*</it>, <it>GSEA </it>and <it>maxmean</it>. <it>SEPEA_NT1* </it>is the same analysis as <it>SEPEA_NT1 </it>except that the pathway network information was not used. <b>(b) </b>Positive predictive power (ppp) versus negative predictive power (npp) for the same data and using the same methods of analysis as in (a).</p>
               </text>
               <graphic file="gb-2009-10-4-r44-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Analysis using exposure of <it>Xenopus laevis </it>to cyclopamine data</p>
            </st>
            <p>Enriched KEGG pathways were determined using the <it>SEPEA_NT2 </it>and <it>SEPEA_NT2</it>* methods for a microarray dataset (see Materials and methods section) examining the consequences of inhibition of Sonic hedgehog (SHH) signaling by cyclopamine treatment of developing <it>Xenopus laevis </it>(Tables <tblr tid="T5">5</tblr> and <tblr tid="T6">6</tblr>). <it>SEPEA_NT2</it>* is essentially the <it>SEPEA_NT2 </it>analysis without the network information of the pathways, and is identical to the analysis of the Q2 test in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Given the specificity of cyclopamine as an inhibitor of the SHH pathway, we expected to see the SHH signaling pathway significantly enriched; however, the <it>P</it>-value for this pathway was not significant using either method (<it>SEPEA_NT2 </it>and <it>SEPEA_NT2</it>*). This may be due to the time point at which gene expression was evaluated, which was optimized to evaluate downstream effectors of SHH pathway inhibition. Alternatively, this result may reflect a limitation of the method when using only gene expression datasets, as several components of the SHH pathway, including Hedgehog (Hh) and Patched (PTCH), are known to be regulated at the protein level. Finally, when the results obtained using <it>SEPEA_NT2 </it>and <it>SEPEA_NT2</it>* are examined in the context of pathways linked to the SHH pathway (Figure S1 in Additional data file 2), we see that only the MAPK and Proteasome pathways are reachable from the SHH pathway, by two and three edges, respectively, suggesting that results from <it>SEPEA_NT2 </it>may be more consistent with targets downstream of the SHH pathway. None of the other pathways listed in Tables <tblr tid="T5">5</tblr> and <tblr tid="T6">6</tblr> were reachable along the network of pathways (Figure S1 in Additional data file 2) from the SHH pathway. 
In fact, recent evidence suggests that SHH promotion of proliferation and differentiation in muscle <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and gastric mucosal cells <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> occurs through transcription-independent activation of the MAPK/ERK pathway. This analysis suggests that using pathway network information is beneficial. Additional results from analysis of these data with <it>SEPEA_NT1</it>, <it>SEPEA_NT3</it>, <it>GSEA </it>and <it>maxmean </it>are provided in Additional data file 4.</p>
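            <p>Reachability along a network of pathways, measured in number of edges, can be computed with a breadth-first search. The sketch below is illustrative only: the toy network is hypothetical and does not reproduce Figure S1 in Additional data file 2. Its edges, the intermediate node name and the assumption that links are directed were chosen solely so that the distances mirror those quoted in the text.</p>
            <p><![CDATA[
```python
from collections import deque


def edge_distances(graph, source):
    """Breadth-first search returning the fewest edges from source to each node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical directed links between pathways (NOT the network of Figure S1).
toy_network = {
    "SHH": ["hypothetical_intermediate"],
    "hypothetical_intermediate": ["MAPK"],
    "MAPK": ["Proteasome"],
    "Ribosome": ["hypothetical_intermediate"],
}

reachable = edge_distances(toy_network, "SHH")
for pathway in ("MAPK", "Proteasome", "Ribosome"):
    # A value of None marks a pathway that cannot be reached from SHH.
    print(pathway, reachable.get(pathway))
```
]]></p>
            <p>In this toy network, pathways absent from the returned dictionary are unreachable from the source, which corresponds to the situation described for the remaining pathways in Tables <tblr tid="T5">5</tblr> and <tblr tid="T6">6</tblr>.</p>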
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Enriched <it>X. laevis </it>pathways due to cyclopamine treatment using <it>SEPEA_NT2</it></p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>KEGG pathway ID</p>
                     </c>
                     <c ca="left">
                        <p>Pathway description</p>
                     </c>
                     <c ca="center">
                        <p><it>P</it>-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla03022]</p>
                     </c>
                     <c ca="left">
                        <p>Basal transcription factors</p>
                     </c>
                     <c ca="center">
                        <p>0.01</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04010]</p>
                     </c>
                     <c ca="left">
                        <p>MAPK signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00460]</p>
                     </c>
                     <c ca="left">
                        <p>Cyanoamino acid metabolism</p>
                     </c>
                     <c ca="center">
                        <p>0.024</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00550]</p>
                     </c>
                     <c ca="left">
                        <p>Peptidoglycan biosynthesis</p>
                     </c>
                     <c ca="center">
                        <p>0.031</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla02010]</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporters</p>
                     </c>
                     <c ca="center">
                        <p>0.045</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla03050]</p>
                     </c>
                     <c ca="left">
                        <p>Proteasome</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00982]</p>
                     </c>
                     <c ca="left">
                        <p>Drug metabolism - cytochrome P450</p>
                     </c>
                     <c ca="center">
                        <p>0.053</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00830]</p>
                     </c>
                     <c ca="left">
                        <p>Retinol metabolism</p>
                     </c>
                     <c ca="center">
                        <p>0.059</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04630]</p>
                     </c>
                     <c ca="left">
                        <p>Jak-STAT signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.07</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04012]</p>
                     </c>
                     <c ca="left">
                        <p>ErbB signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Enriched KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways (with <it>P</it>-value &#8804; 0.1) due to cyclopamine treatment of developing <it>X. laevis</it>, designed to inhibit SHH signaling, using microarray data from GEO <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> [GEO:GSE8293]. <it>P</it>-values were obtained using the <it>SEPEA_NT2 </it>analysis with 1,000 randomizations to compute significance.</p>
               </tblfn>
            </tbl>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Enriched <it>X. laevis </it>pathways due to cyclopamine treatment using <it>SEPEA_NT2</it>*</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>KEGG pathway ID</p>
                     </c>
                     <c ca="left">
                        <p>Pathway description</p>
                     </c>
                     <c ca="center">
                        <p><it>P</it>-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00930]</p>
                     </c>
                     <c ca="left">
                        <p>Caprolactam degradation</p>
                     </c>
                     <c ca="center">
                        <p>0.006</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla03030]</p>
                     </c>
                     <c ca="left">
                        <p>DNA replication</p>
                     </c>
                     <c ca="center">
                        <p>0.011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00480]</p>
                     </c>
                     <c ca="left">
                        <p>Glutathione metabolism</p>
                     </c>
                     <c ca="center">
                        <p>0.016</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00561]</p>
                     </c>
                     <c ca="left">
                        <p>Glycerolipid metabolism</p>
                     </c>
                     <c ca="center">
                        <p>0.023</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla03010]</p>
                     </c>
                     <c ca="left">
                        <p>Ribosome</p>
                     </c>
                     <c ca="center">
                        <p>0.045</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00982]</p>
                     </c>
                     <c ca="left">
                        <p>Drug metabolism - cytochrome P450</p>
                     </c>
                     <c ca="center">
                        <p>0.057</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00983]</p>
                     </c>
                     <c ca="left">
                        <p>Drug metabolism - other enzymes</p>
                     </c>
                     <c ca="center">
                        <p>0.057</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04012]</p>
                     </c>
                     <c ca="left">
                        <p>ErbB signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.067</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla03060]</p>
                     </c>
                     <c ca="left">
                        <p>Protein export</p>
                     </c>
                     <c ca="center">
                        <p>0.072</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla00562]</p>
                     </c>
                     <c ca="left">
                        <p>Inositol phosphate metabolism</p>
                     </c>
                     <c ca="center">
                        <p>0.086</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04914]</p>
                     </c>
                     <c ca="left">
                        <p>Progesterone-mediated oocyte maturation</p>
                     </c>
                     <c ca="center">
                        <p>0.087</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:xla04020]</p>
                     </c>
                     <c ca="left">
                        <p>Calcium signaling pathway</p>
                     </c>
                     <c ca="center">
                        <p>0.089</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Enriched KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways (with <it>P</it>-value &#8804; 0.1) due to cyclopamine treatment of developing <it>X. laevis</it>, designed to inhibit SHH signaling, using microarray data from GEO <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> [GEO:GSE8293]. <it>P</it>-values were obtained using the <it>SEPEA_NT2</it>* analysis with 1,000 randomizations to compute significance.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Analysis using OMIM breast cancer data</p>
            </st>
            <p>Genes associated with breast cancer were downloaded from the Online Mendelian Inheritance in Man (OMIM) database <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. This group of genes was pruned to include only those genes that participate in a pathway in the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The list of genes used is provided in Table S5 in Additional data file 1. The <it>SEPEA </it>analysis was used to test whether there is an overabundance of 'important' (as defined by the scoring rules) breast cancer genes in pathways relative to the remaining set of genes that participate in some pathway in the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Using these data, <it>SEPEA_NT3 </it>and <it>SEPEA_NT3</it>* (which is essentially the <it>SEPEA_NT3 </it>analysis but does not make use of the network information of the pathways and is very similar to the methods used in <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>) were used to find the enriched human pathways; the results are given in Table <tblr tid="T7">7</tblr>. Several of the pathways known to be important for breast cancer initiation and progression are significant using either method, such as the ErbB, p53, and apoptosis pathways. In contrast, the adherens junction, regulation of actin cytoskeleton, cell adhesion molecules, and focal adhesion pathways are significant using <it>SEPEA_NT3 </it>but not using the <it>SEPEA_NT3</it>* method (at <it>P </it>&#8804; 0.05). 
These pathways, in particular the focal and cell adhesion pathways, all deal with cell-to-cell communication and are thought to be key modulators of progression and invasion of malignant phenotypic characteristics <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. In fact, several novel cancer chemotherapy drugs are being designed to act specifically on the focal adhesion pathway, and many standard chemotherapy drugs modulate this pathway in conjunction with their primary mode of action <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. This analysis therefore again suggests that pathway enrichment analysis gains from incorporating the network details of pathways.</p>
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Enriched human pathways for susceptibility to breast cancer</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>KEGG pathway ID</p>
                     </c>
                     <c ca="left">
                        <p>Pathway description</p>
                     </c>
                     <c ca="center">
                        <p>SEPEA_NT3</p>
                     </c>
                     <c ca="center">
                        <p>SEPEA_NT3*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04370]</p>
                     </c>
                     <c ca="left">
                        <p>VEGF signaling pathway</p>
                     </c>
                     <c ca="center">
                        <p>1.69E-04</p>
                     </c>
                     <c ca="center">
                        <p>5.14E-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04662]</p>
                     </c>
                     <c ca="left">
                        <p>B-cell receptor signaling</p>
                     </c>
                     <c ca="center">
                        <p>3.32E-04</p>
                     </c>
                     <c ca="center">
                        <p>3.51E-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04630]</p>
                     </c>
                     <c ca="left">
                        <p>Jak-STAT signaling</p>
                     </c>
                     <c ca="center">
                        <p>8.91E-04</p>
                     </c>
                     <c ca="center">
                        <p>0.0417</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04520]</p>
                     </c>
                     <c ca="left">
                        <p>Adherens junction</p>
                     </c>
                     <c ca="center">
                        <p>0.0014</p>
                     </c>
                     <c ca="center">
                        <p>0.1438</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04810]</p>
                     </c>
                     <c ca="left">
                        <p>Regulation of actin cytoskeleton</p>
                     </c>
                     <c ca="center">
                        <p>0.0027</p>
                     </c>
                     <c ca="center">
                        <p>0.0717</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04150]</p>
                     </c>
                     <c ca="left">
                        <p>mTOR signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0047</p>
                     </c>
                     <c ca="center">
                        <p>0.0052</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04664]</p>
                     </c>
                     <c ca="left">
                        <p>Fc epsilon RI signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0081</p>
                     </c>
                     <c ca="center">
                        <p>5.99E-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04510]</p>
                     </c>
                     <c ca="left">
                        <p>Focal adhesion</p>
                     </c>
                     <c ca="center">
                        <p>0.0103</p>
                     </c>
                     <c ca="center">
                        <p>0.0648</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04012]</p>
                     </c>
                     <c ca="left">
                        <p>ErbB signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0103</p>
                     </c>
                     <c ca="center">
                        <p>8.51E-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04210]</p>
                     </c>
                     <c ca="left">
                        <p>Apoptosis</p>
                     </c>
                     <c ca="center">
                        <p>0.0108</p>
                     </c>
                     <c ca="center">
                        <p>7.97E-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa03440]</p>
                     </c>
                     <c ca="left">
                        <p>Homologous recombination</p>
                     </c>
                     <c ca="center">
                        <p>0.0147</p>
                     </c>
                     <c ca="center">
                        <p>0.0016</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04660]</p>
                     </c>
                     <c ca="left">
                        <p>T cell receptor signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0182</p>
                     </c>
                     <c ca="center">
                        <p>0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04010]</p>
                     </c>
                     <c ca="left">
                        <p>MAPK signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0183</p>
                     </c>
                     <c ca="center">
                        <p>0.0183</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04910]</p>
                     </c>
                     <c ca="left">
                        <p>Insulin signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0191</p>
                     </c>
                     <c ca="center">
                        <p>0.0032</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04514]</p>
                     </c>
                     <c ca="left">
                        <p>Cell adhesion molecules</p>
                     </c>
                     <c ca="center">
                        <p>0.0274</p>
                     </c>
                     <c ca="center">
                        <p>0.2407</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04115]</p>
                     </c>
                     <c ca="left">
                        <p>p53 signaling</p>
                     </c>
                     <c ca="center">
                        <p>0.0306</p>
                     </c>
                     <c ca="center">
                        <p>0.0093</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[path:hsa04620]</p>
                     </c>
                     <c ca="left">
                        <p>Toll-like receptor signaling pathway</p>
                     </c>
                     <c ca="center">
                        <p>0.0391</p>
                     </c>
                     <c ca="center">
                        <p>0.0193</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Enriched KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways (with <it>P</it>-value &#8804; 0.05 for at least one of the two analyses) obtained using genes from the OMIM database <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> that confer susceptibility to breast cancer. <it>P</it>-values were obtained using the <it>SEPEA_NT3 </it>and <it>SEPEA_NT3</it>* analyses.</p>
               </tblfn>
            </tbl>
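            <p>The comparison drawn from Table <tblr tid="T7">7</tblr> can be reproduced with a short filter. The following is a minimal sketch, not the authors' code: the class and method names are hypothetical, and only a subset of the rows of Table <tblr tid="T7">7</tblr> is copied in. It lists the pathways that pass the 0.05 threshold with <it>SEPEA_NT3 </it>but not with <it>SEPEA_NT3</it>*.</p>
            <p><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative comparison of two P-value columns; hypothetical helper, not the authors' code. */
final class Table7Comparison {

    /** Returns the IDs that are significant with the network information only. */
    static List<String> gainedWithNetwork(String[] ids, double[] pNetwork,
                                          double[] pNoNetwork, double alpha) {
        List<String> gained = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (pNetwork[i] <= alpha && pNoNetwork[i] > alpha) {
                gained.add(ids[i]);
            }
        }
        return gained;
    }

    public static void main(String[] args) {
        // Seven rows copied from Table 7 (KEGG ID, SEPEA_NT3, SEPEA_NT3*).
        String[] ids = {"hsa04520", "hsa04810", "hsa04510", "hsa04012",
                        "hsa04210", "hsa04514", "hsa04115"};
        double[] nt3 = {0.0014, 0.0027, 0.0103, 0.0103, 0.0108, 0.0274, 0.0306};
        double[] nt3Star = {0.1438, 0.0717, 0.0648, 8.51E-04, 7.97E-04, 0.2407, 0.0093};
        // Prints [hsa04520, hsa04810, hsa04510, hsa04514]
        System.out.println(gainedWithNetwork(ids, nt3, nt3Star, 0.05));
    }
}
```
]]></p>
            <p>The four IDs returned correspond to the adherens junction, regulation of actin cytoskeleton, focal adhesion, and cell adhesion molecules pathways.</p>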
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>This paper presents a new method that uses biological data to find biochemical pathways that are relevant to the different responses of an organism to two different conditions. Biochemical pathways, instead of being treated as just sets of genes, are viewed as networks of interactions between proteins or metabolites. The analyses of simulated and real data demonstrate the utility of incorporating information on the interactions between the genes present in a pathway network.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Notation</p>
            </st>
            <p>Assume there are <it>m </it>genes (identified by indices in the set <it>G </it>= {1, 2,..., <it>m</it>}) in the system and <it>n </it>array measurements (<it>n</it><sub><it>c </it></sub>control and <it>n</it><sub><it>t </it></sub>treated, <it>n</it><sub><it>c </it></sub>+ <it>n</it><sub><it>t </it></sub>= <it>n</it>) per gene. We will analyze one particular pathway made up of a subset of <it>m</it><sub><it>P </it></sub>of the <it>m </it>genes in the system. Without loss of generality, assume that these genes correspond to the first <it>m</it><sub><it>P </it></sub>gene indices in <it>G</it>. The genes in this pathway are part of an underlying network of their gene products. On the basis of this network, gene <it>i </it>of the pathway is assigned a weight <it>w</it><sub><it>i </it></sub>and a gene pair (<it>i </it>and <it>j</it>) is assigned two weights: <it>d</it><sub><it>ij </it></sub>(a measure of the distance between these two genes on the network) and <it>e</it><sub><it>ij </it></sub>(which is equal to 1 for a non-zero value of <it>d</it><sub><it>ij</it></sub>). Each of the <it>m </it>genes is also assigned a value, <it>t</it><sub><it>stat</it>, <it>k </it></sub>for gene <it>k</it>, that captures the treatment effect on that gene in the observed data. The corresponding value obtained under the different null distributions (defined in the next section) is denoted by <it>T</it><sub><it>stat</it>, <it>i</it></sub>. The two scores, from the Heavy Ends Rule and the Distance Rule, are denoted by <it>HER </it>and <it>DR</it>, respectively. Both are functions of <it>t</it><sub><it>stat</it>, <it>k</it></sub>. <it>HER</it><sub><it>obs </it></sub>and <it>DR</it><sub><it>obs </it></sub>denote the scores obtained from the observed experimental data, while <it>HER</it><sub><it>rand </it></sub>and <it>DR</it><sub><it>rand </it></sub>denote those obtained from the different null distributions.</p>
         </sec>
         <sec>
            <st>
               <p>Null hypotheses</p>
            </st>
            <p>Null hypotheses for the three statistical tests performed are given below and share similarities with those stated in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
            <p>Network test 1 (<it>NT1</it>): <it>T</it><sub><it>stat</it>, <it>i</it></sub>, <it>i </it>= 1, 2,...<it>m </it>are identically distributed (and possibly dependent) with common distribution, <it>F</it><sub><it>0 </it></sub>corresponding to the lack of association with the treatment, for each gene.</p>
            <p>Network test 2 (<it>NT2</it>): <it>T</it><sub><it>stat</it>, <it>i</it></sub>, <it>i </it>= 1, 2,...<it>m</it><sub><it>p </it></sub>(only genes in the pathway) are identically distributed (and possibly dependent) with common distribution, <it>F</it><sub><it>0 </it></sub>corresponding to the lack of association with the treatment, for each gene.</p>
            <p>Network test 3 (<it>NT3</it>): <it>T</it><sub><it>stat</it>, <it>i</it></sub>, <it>i </it>= 1, 2,...<it>m </it>are independent and identically distributed with a common distribution, <it>F </it>(which can take any form).</p>
            <p>In all three hypotheses, <it>HER</it><sub><it>obs </it></sub>and <it>DR</it><sub><it>obs </it></sub>are each drawn from the distribution of <it>HER</it><sub><it>rand </it></sub>and <it>DR</it><sub><it>rand</it></sub>, respectively.</p>
         </sec>
         <sec>
            <st>
               <p>Association value computations</p>
            </st>
            <p>For each gene we define a pair of values (<inline-formula><graphic file="gb-2009-10-4-r44-i7.gif"/></inline-formula>, <inline-formula><graphic file="gb-2009-10-4-r44-i8.gif"/></inline-formula>) corresponding to its association with the treatment in the context of the observed data. The association of any given gene with treatment is given in terms of the square of the two-sample t-statistic (similar to what has been done in <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B25">25</abbr><abbr bid="B35">35</abbr></abbrgrp>) and shares similarities with the <it>maxmean </it>statistic defined in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Mathematically:</p>
            <p>
               <display-formula id="M1">
                  <graphic file="gb-2009-10-4-r44-i9.gif"/>
               </display-formula>
            </p>
            <p>
               <display-formula id="M2">
                  <graphic file="gb-2009-10-4-r44-i10.gif"/>
               </display-formula>
            </p>
            <p>
               <display-formula id="M3">
                  <graphic file="gb-2009-10-4-r44-i11.gif"/>
               </display-formula>
            </p>
            <p>where <inline-formula><graphic file="gb-2009-10-4-r44-i12.gif"/></inline-formula>, <inline-formula><graphic file="gb-2009-10-4-r44-i13.gif"/></inline-formula> are the sample mean gene expression for gene <it>g</it><sub><it>i </it></sub>of the control and treated data, respectively, <inline-formula><graphic file="gb-2009-10-4-r44-i14.gif"/></inline-formula>, <inline-formula><graphic file="gb-2009-10-4-r44-i15.gif"/></inline-formula> are the associated standard deviations, <it>I</it><sub><it>NT1 </it></sub>is equal to 1 when the <it>NT1 </it>test is being used and is equal to zero otherwise, <inline-formula><graphic file="gb-2009-10-4-r44-i16.gif"/></inline-formula> denotes the position of gene <it>i </it>in the sorted (in descending order) list of max(<it>t</it><sub><it>stat</it>, <it>k</it></sub>, 0) over all the <it>m </it>genes, and, similarly, <inline-formula><graphic file="gb-2009-10-4-r44-i17.gif"/></inline-formula> denotes the position of gene <it>i </it>in the sorted (in ascending order) list of min(<it>t</it><sub><it>stat</it>, <it>k</it></sub>, 0). <it>a </it>and <it>b </it>are parameters chosen empirically in order to control for the selection of the pathway with the most significant genes (relative to the other genes in the system). The first terms in the products on the right-hand side of Equation 2 will be called <it>importance </it>factors for a gene. These are values between 0 and 1. The functions 'mean' and 'var' refer to the standard definitions of mean and variance. The term <it>CF </it>denotes a (competitive) factor that is a measure of difference in the mean of differential expression of the genes in the pathway and that of the other genes in the system. Higher <it>CF </it>values indicate higher individual association values for genes in the pathway relative to the other genes and vice versa. 
Therefore, for similar changes in gene expression (<it>t</it><sub><it>stat</it>, <it>i </it></sub>s), the power to detect a treatment effect decreases as the <it>CF </it>factor decreases (or as more genes in the system are affected as a result of the treatment). For high values of the <it>CF </it>factor, parameter <it>a </it>controls the (decreasing) <it>importance </it>of genes along the sorted list. For small values of the <it>CF </it>factor, parameter <it>b </it>provides a much steeper decrease in the <it>importance </it>of genes down the sorted list.</p>
            <p>Here, <it>t</it><sub><it>stat</it>, <it>i </it></sub>is the standard two-sample t-statistic. In some instances, the only information on the association of a gene with a treated condition may be a summary statistic. For example, there is a set of known gene polymorphisms associated with breast cancer; in trying to identify pathways relevant for breast cancer, these genes would be arbitrarily assigned a <it>t</it><sub><it>stat</it>, <it>i </it></sub>equal to 1 while the other genes would be given values of 0. Note that in these situations, <it>n</it>, the number of array measurements per gene, is zero.</p>
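            <p>As an illustration of the per-gene input, the sketch below computes a two-sample t-statistic and the 0/1 assignment used when only a gene list is available. It rests on assumptions that the text does not settle: the unequal-variance (Welch) form of the statistic is used, the sign is taken as treated minus control, genes are indexed from 0, and the class and method names are hypothetical.</p>
            <p><![CDATA[
```java
/** Illustrative per-gene statistics; hypothetical helper, not the authors' code. */
final class TwoSampleT {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /**
     * Assumed Welch form: (mean treated - mean control)
     * divided by sqrt(s_c^2 / n_c + s_t^2 / n_t).
     */
    static double tStat(double[] control, double[] treated) {
        double se = Math.sqrt(variance(control) / control.length
                + variance(treated) / treated.length);
        return (mean(treated) - mean(control)) / se;
    }

    /** Summary-statistic case: listed genes get 1, all other genes get 0. */
    static double[] geneListScores(int m, int[] listedGenes) {
        double[] t = new double[m];
        for (int g : listedGenes) {
            t[g] = 1.0; // g is a 0-based gene index (assumption)
        }
        return t;
    }
}
```
]]></p>
            <p>For example, control values {1, 2, 3} and treated values {3, 4, 5} give a squared statistic of 6 under these assumptions.</p>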
         </sec>
         <sec>
            <st>
               <p>Definition of the scoring rules</p>
            </st>
            <p>The score for linking the observed expression data to a given pathway has two components. The first component, the Heavy Ends Rule score <it>HER</it><sub><it>obs</it></sub>, has a high value when a combination of the more 'important' genes (those associated with gene products close to a terminal of a pathway) is significantly associated with the treated condition. The second component, the Distance Rule score <it>DR</it><sub><it>obs</it></sub>, has a high value when the genes that are significantly associated with the treated condition have their gene products located close together. It is in fact the reciprocal of the weighted average distance between the genes in the network. The weights <it>w</it><sub><it>i</it></sub>, <it>d</it><sub><it>ij </it></sub>and <it>e</it><sub><it>ij </it></sub>are defined in a subsequent section. Each score is defined as the maximum of two expressions, one dependent only on the genes whose expression increased due to the treatment and the other only on the genes whose expression decreased as a result of the treatment. This should make the score more robust in detecting changes in both scale and location, as discussed in <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The two scores are defined as:</p>
            <p>
               <display-formula id="M4">
                  <graphic file="gb-2009-10-4-r44-i18.gif"/>
               </display-formula>
            </p>
            <p>For the <it>DR </it>score computation, 0/0 is defined to be equal to zero. The scores obtained under the null distributions are denoted by <it>HER</it><sub><it>rand </it></sub>and <it>DR</it><sub><it>rand </it></sub>and are defined as in Equation 4 with <it>t</it><sub><it>i </it></sub>replaced by <it>T</it><sub><it>i</it></sub>.</p>
         </sec>
         <sec>
            <st>
               <p>Test statistic and significance evaluation</p>
            </st>
            <p>For each of the three hypotheses (<it>NT1</it>, <it>NT2 </it>or <it>NT3</it>) the test statistic is defined as:</p>
            <p>
               <display-formula id="M5">
                  <graphic file="gb-2009-10-4-r44-i19.gif"/>
               </display-formula>
            </p>
            <p>where mean(<it>HER</it>) and std(<it>HER</it>) refer to the mean and standard deviation of the <it>HER </it>score for the given test and mean(<it>DR</it>) and std(<it>DR</it>) are those for the <it>DR </it>score.</p>
            <p>For the <it>NT1 </it>and <it>NT2 </it>tests, multiple random samples of arrays are taken from the common set of treated and control data (without replacement) and randomly assigned to control or treated groups. For each random sample, the <it>T</it><sub><it>stat</it>, <it>i</it></sub>s are calculated and then <it>HER</it><sub><it>rand </it></sub>and <it>DR</it><sub><it>rand </it></sub>are computed. The <it>NT1 </it>test requires <it>T</it><sub><it>stat</it>, <it>i </it></sub>to be computed for all the <it>m </it>genes while the <it>NT2 </it>test requires computation for just the <it>m</it><sub><it>P </it></sub>genes that are part of the pathway. For the <it>NT3 </it>test, multiple random samples of <it>m</it><sub><it>P </it></sub><it>T</it><sub><it>stat</it>, <it>i </it></sub>s are drawn from the global set of <it>m </it>observed <it>t</it><sub><it>stat</it>, <it>i </it></sub>values.</p>
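            <p>The two kinds of randomization described above can be sketched as follows. This is a minimal illustration with hypothetical class and method names. For the <it>NT1 </it>and <it>NT2 </it>tests, the array indices are shuffled and the first <it>n</it><sub><it>c </it></sub>are treated as control. For the <it>NT3 </it>test, the text does not say whether the <it>m</it><sub><it>P </it></sub>values are drawn with or without replacement; sampling without replacement is assumed here.</p>
            <p><![CDATA[
```java
import java.util.Random;

/** Illustrative null-distribution sampling; hypothetical helper, not the authors' code. */
final class NullSampling {

    /**
     * Fisher-Yates shuffle of the array indices 0..n-1.
     * For NT1/NT2, the first nC entries can be used as the control group
     * and the remaining entries as the treated group.
     */
    static int[] shuffledArrayIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    /**
     * NT3-style draw: mP values taken from the m observed statistics.
     * Sampling without replacement is an assumption of this sketch.
     */
    static double[] drawForPathway(double[] observedT, int mP, Random rng) {
        int[] idx = shuffledArrayIndices(observedT.length, rng);
        double[] draw = new double[mP];
        for (int k = 0; k < mP; k++) {
            draw[k] = observedT[idx[k]];
        }
        return draw;
    }
}
```
]]></p>
            <p>Passing a seeded <it>Random </it>instance makes a randomization run repeatable, which helps when comparing methods on the same simulated experiments.</p>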
            <p>The estimate of the <it>P</it>-value for each of the tests is computed as:</p>
            <p>
               <display-formula id="M6">
                  <graphic file="gb-2009-10-4-r44-i20.gif"/>
               </display-formula>
            </p>
            <p>where <it>I</it>(<it>S</it><sub><it>i </it></sub>&#8805; <it>S</it><sub><it>obs</it></sub>) is an indicator function that equals 1 when the <it>i</it><sup>th </sup>randomly estimated test statistic value, <it>S</it><sub><it>i</it></sub>, equals or exceeds the observed value and 0 otherwise. The estimation procedure used for the special case when the data are in the form of a list of differentially expressed genes or a list of genes associated with a disease is provided in Additional data file 1.</p>
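            <p>A minimal sketch of this estimate is given below. It assumes that the estimate is the plain fraction of randomized statistics <it>S</it><sub><it>i </it></sub>that equal or exceed <it>S</it><sub><it>obs</it></sub>, with no added pseudo-count; the class and method names are hypothetical.</p>
            <p><![CDATA[
```java
/** Illustrative randomization P-value; hypothetical helper, not the authors' code. */
final class EmpiricalPValue {

    /**
     * Fraction of randomized statistics that equal or exceed the observed one.
     * Assumes the plain average of the indicator I(S_i >= S_obs).
     */
    static double estimate(double[] randomized, double observed) {
        int count = 0;
        for (double s : randomized) {
            if (s >= observed) {
                count++;
            }
        }
        return (double) count / randomized.length;
    }
}
```
]]></p>
            <p>With 1,000 randomizations, the smallest non-zero value this estimate can take is 0.001.</p>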
            <p>Given the way the significance computations are performed, tests <it>NT1 </it>and <it>NT3 </it>can be viewed as belonging to the class of 'competitive' hypotheses (as elaborated in the Background section), while <it>NT2 </it>can be viewed as a 'self-contained' hypothesis.</p>
            <p>The method when applied to each of the three null hypotheses <it>NT1</it>, <it>NT2 </it>and <it>NT3 </it>is denoted by <it>SEPEA_NT1</it>, <it>SEPEA_NT2 </it>and <it>SEPEA_NT3</it>, respectively.</p>
         </sec>
         <sec>
            <st>
               <p>Generation of simulated data</p>
            </st>
            <p>Data were simulated from two genetic systems (<it>Linear </it>(<it>L</it>) and <it>ErbbSignaling </it>(<it>E</it>)), each of 500 genes (<inline-formula><graphic file="gb-2009-10-4-r44-i21.gif"/></inline-formula> and <inline-formula><graphic file="gb-2009-10-4-r44-i22.gif"/></inline-formula>). Each system had two subnetworks of interest and each subnetwork was assumed to have no interactions with the other subnetwork. The <it>Linear </it>network had a set of 30 genes <inline-formula><graphic file="gb-2009-10-4-r44-i23.gif"/></inline-formula> that were connected in a linear fashion (Figure <figr fid="F3">3a</figr>). A set of 87 genes <inline-formula><graphic file="gb-2009-10-4-r44-i24.gif"/></inline-formula> in the <it>ErbbSignaling </it>network interacted in the same manner as described by the ErbB signaling pathway in the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> (Figure <figr fid="F3">3b</figr>). Pathway enrichment analysis was performed on these two subnetworks.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Schematic of networks used to generate simulated data</p>
               </caption>
               <text>
                  <p>Schematic of networks used to generate simulated data. Illustrative schematic of the two pathways used to generate the simulated data. <b>(a) </b>The <it>Linear </it>network of 30 nodes/gene products, each of which is associated with one gene. The pair of squiggly lines across some arrows indicates that there are more nodes that are not shown. <b>(b) </b>The ErbB signaling pathway from the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The expressions of the genes associated with the nodes circled in red are correlated with each other; these are the genes that were affected by the treatment.</p>
               </text>
               <graphic file="gb-2009-10-4-r44-3"/>
            </fig>
            <p>Each of the sets &#923; and <it>H </it>had a subset of genes (with indices <inline-formula><graphic file="gb-2009-10-4-r44-i25.gif"/></inline-formula>), <inline-formula><graphic file="gb-2009-10-4-r44-i26.gif"/></inline-formula>, whose expressions were perfectly correlated with each other (&#931;<sup><it>L </it></sup>had <it>n</it><sub><it>corr </it></sub>= 0 or 9 genes and &#931;<sup><it>E </it></sup>had <it>n</it><sub><it>corr </it></sub>= 7 genes). The gene expressions in the complement of each of the sets &#931;<sup><it>L </it></sup>and &#931;<sup><it>E</it></sup>, (&#931;<sup><it>L</it></sup>)<sup><it>c </it></sup>and (&#931;<sup><it>E</it></sup>)<sup><it>c</it></sup>, were assumed to be independent of each other, even though the gene products of some of these genes could be assumed to interact with the gene products of genes in &#931;<sup><it>L </it></sup>and &#931;<sup><it>E</it></sup>. This could be justified if the interaction were not at the gene expression level but instead involved, for example, changes in the phosphorylation/binding states of the protein. Let <inline-formula><graphic file="gb-2009-10-4-r44-i27.gif"/></inline-formula> denote the set of gene indices associated with the proteins circled in Figure <figr fid="F3">3b</figr>, ordered from left to right. The random variable defining the gene expression of gene <it>g</it><sub><it>n </it></sub>is denoted by <it>X</it><sub><it>n</it></sub>. Let N(<it>&#956;</it>, <it>&#963;</it>) represent the normal probability distribution with mean <it>&#956; </it>and standard deviation <it>&#963;</it>. Data for all the 500 genes in each of the two systems were then generated for one experiment under control conditions in the following manner:</p>
            <p>
               <display-formula id="M7">
                  <graphic file="gb-2009-10-4-r44-i28.gif"/>
               </display-formula>
            </p>
            <p>Let <it>&#934;</it> (<it>&#934;</it><sup><it>L </it></sup>and <it>&#934;</it><sup><it>E</it></sup>) denote the set of genes that are direct targets of the treatment. The total number of genes in the system affected by the treatment (that includes the set <it>&#934;</it>) was chosen to be 50 and 10 for the <it>Linear </it>and <it>ErbbSignaling </it>networks, respectively. The effect of the treatment was to increase the mean of the expressions of the direct targets by a factor <it>pert</it>, <it>&#956;</it>' = <it>pert</it>&#183;<it>&#956;</it>. Results from the assignment <it>pert </it>= 1.2 are discussed here while those resulting from other assignments are discussed in Table S3 in Additional data file 1. Let <it>U</it><sup><it>L </it></sup>and <it>U</it><sup><it>E </it></sup>denote a uniformly random selection of <it>n</it><sub><it>corr </it></sub>genes from the sets <it>&#923;</it> and <it>H</it>, respectively, let <it>V</it><sub><it>n</it></sub><sup><it>L </it></sup>and <it>V</it><sub><it>n</it></sub><sup><it>E </it></sup>denote sets of <it>n </it>genes drawn from the complements of the sets <it>&#923;</it> and <it>H</it>, respectively, and let <it>&#216;</it> denote the empty set. The details of the different correlation patterns considered here are given in Table <tblr tid="T1">1</tblr>. Patterns 1 and 3 were the correlation patterns that were favored by the scoring rules described in this paper.</p>
            <p>All methods in this paper were coded using the Java programming language. For each combination of correlation pattern and <it>pert </it>assignment, 1,000 independent experiments were simulated. Each experiment involved the generation of <it>n</it><sub><it>c </it></sub>= 5 control samples and <it>n</it><sub><it>t </it></sub>= 5 treatment samples. For the randomization tests of each method, 1,000 randomizations were used. The performance measure was the number of experiments, out of the 1,000 performed, that resulted in <it>P</it>-values for the test (Equation 6) below each of several chosen significance levels. The methods evaluated were <it>GSEA </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, <it>maxmean </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, <it>SEPEA_NT1</it>, <it>SEPEA_NT2 </it>and <it>SEPEA_NT3</it>. For the <it>SEPEA_NT1 </it>method, the parameters <it>a </it>and <it>b </it>in Equation 2 were empirically set to 2 and 5, respectively. Parameter <it>a </it>= 2 provides a quadratic decrease in the <it>importance </it>of genes along the sorted list for high values of the <it>CF </it>factor (when the mean changes in expression of the genes in the pathway are higher than those of the rest of the genes in the system). For low values of the <it>CF </it>factor, the value <it>b </it>= 5 was chosen such that the top 20% of genes in the sorted list receive <it>importance </it>values approximately in the interval (0.2, 1) while the remaining genes receive values in the interval (0, 0.2). Results from <it>GSEA </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, <it>maxmean </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and <it>SEPEA_NT1 </it>are comparable because they all test a similar null hypothesis. The main difference between these methods is that while <it>GSEA </it>and <it>maxmean </it>are blind to the structure of the biochemical pathway, <it>SEPEA_NT1 </it>is not.</p>
         </sec>
         <sec>
            <st>
               <p>Assignment of network weights</p>
            </st>
            <p>The pathway network is represented by a set of nodes and a set of edges between these nodes. The nodes represent gene products such as individual proteins or protein complexes. In signaling pathways, there is an edge from node/protein <it>u </it>to node/protein <it>v </it>if <it>u </it>transfers the signal it received immediately to <it>v </it>(by increasing the transcription of genes associated with <it>v</it>, changing the phosphorylation state of <it>v</it>, or causing dissociation of <it>v </it>from a complex that it is part of). In metabolic pathways, there is an edge from <it>u </it>to <it>v </it>if <it>u </it>and <it>v </it>catalyze two successive reactions.</p>
            <p>Let <inline-formula><graphic file="gb-2009-10-4-r44-i29.gif"/></inline-formula> denote the set of <it>P </it>nodes of the network and <inline-formula><graphic file="gb-2009-10-4-r44-i30.gif"/></inline-formula> denote the set of <it>N </it>genes associated with the nodes. The number of edges entering node <it>v</it><sub><it>i </it></sub>is defined as its in-degree and the number of edges leaving <it>v</it><sub><it>i </it></sub>is defined as its out-degree. We define a node to be a terminal node if either its in-degree or out-degree is zero.</p>
            <p>Assume that each edge represents a unit distance between the two nodes that it connects. Thus, if the shortest route between two nodes is via two edges in the pathway network, then the two nodes are said to be 2 units of distance apart. Note that the phrase 'distance between a pair of nodes' is used to mean 'shortest distance between this pair', considering that there may be more than one path connecting the two nodes in the pathway network. Let <it>&#948;</it><sub><it>j </it></sub>denote the shortest distance from node <it>v</it><sub><it>j </it></sub>to a terminal node of the pathway. Let <it>G</it>(<it>v</it><sub><it>i</it></sub>, <it>g</it><sub><it>a</it></sub>) denote the indicator function, which is equal to 1 when gene <it>g</it><sub><it>a </it></sub>is associated with node <it>v</it><sub><it>i </it></sub>and is equal to 0 otherwise. The number of genes associated with node <it>v</it><sub><it>i </it></sub>is denoted by <it>N</it><sub><it>i</it></sub>. Let <it>s</it><sub><it>ij </it></sub>denote the distance from node <it>v</it><sub><it>i </it></sub>to node <it>v</it><sub><it>j </it></sub>in the network. The distance <it>s</it><sub><it>ij </it></sub>is assigned a value of 0 either when <it>i </it>= <it>j </it>or when node <it>v</it><sub><it>j </it></sub>is unreachable from node <it>v</it><sub><it>i</it></sub>. Define the positive indicator function, <it>I</it><sup>+</sup>(<it>x</it>), which is equal to 1 when <it>x </it>is positive and equal to 0 otherwise.</p>
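            <p>The distances defined above can be sketched with a breadth-first search over the directed edge list. The fragment below is an illustration in Python, not the original Java code, and all function names are hypothetical. One point is an assumption rather than something stated in the text: the distance to a terminal node is taken as the shorter of the directed distances to or from any terminal node.</p>

```python
from collections import deque

def directed_distances(nodes, edges):
    """Return s[i][j], the number of edges on the shortest directed path from i to j.

    Following the text, s[i][j] is 0 when i == j or when j is unreachable from i.
    """
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    s = {i: {j: 0 for j in nodes} for i in nodes}
    for source in nodes:
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            node, dist = queue.popleft()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    s[source][nxt] = dist + 1
                    queue.append((nxt, dist + 1))
    return s

def distance_to_terminal(nodes, edges, s):
    """Return delta[j], the shortest distance between j and any terminal node.

    Assumption (not stated in the text): the path may run in either direction,
    from j to a terminal node or from a terminal node to j.
    """
    has_incoming = {v for _, v in edges}
    has_outgoing = {u for u, _ in edges}
    terminals = [n for n in nodes if n not in has_incoming or n not in has_outgoing]
    delta = {}
    for j in nodes:
        if j in terminals:
            delta[j] = 0
            continue
        candidates = [d for t in terminals for d in (s[j][t], s[t][j]) if d > 0]
        delta[j] = min(candidates) if candidates else 0
    return delta

# Toy pathway: a chain A, B, C, D, E with a shortcut edge from B to D.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
s = directed_distances(nodes, edges)
delta = distance_to_terminal(nodes, edges, s)
print(s["A"]["E"], delta)
```

            <p>The shortcut edge shows why the shortest route matters: A reaches E in three steps through B and D rather than in four steps along the chain.</p>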
            <p>The weights for gene <it>g</it><sub><it>a</it></sub>, <it>w</it><sub><it>a</it></sub>, and gene pair (<it>g</it><sub><it>a</it></sub>, <it>g</it><sub><it>b</it></sub>), <it>d</it><sub><it>ab </it></sub>and <it>e</it><sub><it>ab</it></sub>, are given by:</p>
            <p>
               <display-formula id="M8">
                  <graphic file="gb-2009-10-4-r44-i31.gif"/>
               </display-formula>
            </p>
            <p>The weight <it>w</it><sub><it>a </it></sub>is defined such that genes associated with nodes closer to the terminal nodes have higher weights than those that are farther away. The choice of a linear function to capture the intuition behind the <it>HER </it>is arbitrary, and other functions will be explored in future work. The non-zero weights <it>d</it><sub><it>ab </it></sub>for genes <it>a </it>and <it>b </it>are smaller when the associated gene products are closer together in the pathway network than when they are farther apart.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical test for Distance Rule justification</p>
            </st>
            <p>Let the total number of pathways (nodes) in the network in Figure S1 in Additional data file 2 be denoted by <it>N</it><sub><it>p</it></sub>. Denote the distance between pathways <it>i </it>and <it>j </it>on this pathway network by <inline-formula><graphic file="gb-2009-10-4-r44-i32.gif"/></inline-formula>. Define <inline-formula><graphic file="gb-2009-10-4-r44-i32.gif"/></inline-formula> to be equal to zero if pathway <it>j </it>is not reachable from pathway <it>i</it>. Also define the variable <inline-formula><graphic file="gb-2009-10-4-r44-i33.gif"/></inline-formula>, which is equal to 1 for all non-zero values of the corresponding <inline-formula><graphic file="gb-2009-10-4-r44-i32.gif"/></inline-formula> and 0 otherwise. Perturbations to one pathway are transferred across the edges of the network to multiple pathways. Using human microarray data randomly chosen from the GEO database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, we considered eight comparisons between two conditions (Table <tblr tid="T1">1</tblr>). For each comparison, the <it>DR </it>score was computed (Equation 4) for every human pathway on the network of pathways described above. To make the scores comparable across pathways, the <it>DR </it>scores obtained from the experimental data were normalized by the <it>DR </it>scores obtained by setting the <it>T</it><sub><it>stat</it>, <it>i </it></sub>values for all the genes equal to 1. Let the normalized <it>DR </it>score for pathway <it>i </it>be denoted by <inline-formula><graphic file="gb-2009-10-4-r44-i34.gif"/></inline-formula>. A meta score can now be defined on the pathway network as follows:</p>
            <p>
               <display-formula id="M9">
                  <graphic file="gb-2009-10-4-r44-i35.gif"/>
               </display-formula>
            </p>
            <p>Higher values of <it>meta_DR </it>indicate that pathways with higher values of the normalized <it>DR </it>scores <inline-formula><graphic file="gb-2009-10-4-r44-i34.gif"/></inline-formula> are closer to each other. The significance of the obtained <it>meta_DR </it>scores is tested using random networks generated by the Markov-chain switching algorithm <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. These random networks have the same number of nodes and edges as the original pathway network, and the degree sequence among all the nodes is also maintained. They differ from the original network, however, through a number of random edge swaps across the network.</p>
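            <p>A generic form of degree-preserving edge switching is sketched below in Python. It is an illustration of the general technique under stated assumptions, not the implementation used for the results: the network is assumed to be directed and free of duplicate edges, and the function name, the rejection rules and the number of attempted swaps are choices made for this example.</p>

```python
import random

def switch_edges(edges, n_swaps, seed=0):
    """Randomize a directed network by repeated edge switching.

    Two edges (a, b) and (c, d) are drawn at random and rewired to (a, d) and
    (c, b). Moves that would create a self-loop or a duplicate edge are
    rejected, so every node keeps its in-degree and its out-degree.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):  # n_swaps counts attempted, not accepted, swaps
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if a == d or c == b:
            continue  # rewiring would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # rewiring would duplicate an existing edge
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([(a, d), (c, b)])
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

# Toy network with seven directed edges on five nodes.
original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3), (2, 5)]
randomized = switch_edges(original, n_swaps=200, seed=1)
print(sorted(randomized))
```

            <p>In a significance test of this kind, the score of interest would be recomputed on each randomized network and the observed score compared against the resulting distribution.</p>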
         </sec>
         <sec>
            <st>
               <p>GeneChip experiments</p>
            </st>
            <p>Cyclopamine powder (11-deoxyjervine; Toronto Research Chemicals Inc., North York, Ontario, Canada) was dissolved in 100% ethanol to a concentration of 5 mg/ml and this stock solution was stored at -20&#176;C. A similar volume of 100% ethanol was stored at -20&#176;C for use in vehicle control exposures. Approximately 200 tadpoles from each of two clutches (designated 'clutch A' and 'clutch B') of the species <it>Xenopus laevis </it>were obtained from Nasco Biology (Fort Atkinson, WI, USA) for a total of approximately 400 tadpoles. Animals were raised at an air temperature of 25 &#177; 1&#176;C in tanks of 9 liters of tap water treated with Stress Coat (Aquarium Pharmaceuticals, Chalfont, PA, USA) and aged 1 day. Each day for three consecutive days, as animals reached stage 52 <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, the population of stage 52 individuals from each clutch was removed from the clutch tanks and divided in half indiscriminately, resulting in four exposure groups per day: a control group for clutch A; an experimental group for clutch A; a control group for clutch B; and an experimental group for clutch B. Each exposure tank had between 10 and 20 individuals, with 150 ml treated water per individual. After sorting into exposure tanks, 30 &#956;l per animal of 5 mg/ml cyclopamine solution was added to all experimental tanks, and 30 &#956;l per animal of 100% ethanol was added to each control tank. After 24 hours of exposure, animals were sacrificed by over-anesthesia with MS222, dried on a paper towel, then put into vials of RNAlater (Ambion, Austin, TX, USA). Vials were kept at 4&#176;C overnight, then moved to -20&#176;C for storage. Both hindlimb buds were dissected off each animal at the base of the limb using surgical scissors, placed in fresh vials of RNAlater, and returned to -20&#176;C for continued storage. 
RNA extractions were performed using the RNeasy Mini Kit and optional RNase-Free DNase Set (QIAGEN, Valencia, CA, USA), with the following notes: limbs were put into a 1.5 ml microcentrifuge tube, residual RNAlater was pipetted off, and limbs were crushed with a homogenizer in 200 &#956;l buffer RLT, then 300 &#956;l more buffer RLT was added; and elution was carried out with two washes of 50 &#956;l RNase-free water. Extracted total RNA was stored at -80&#176;C and transferred to the WM Keck Foundation Biotechnology Resource Center, Affymetrix Resource Center (Yale University, New Haven, CT, USA), where it was again treated with DNase. Four control-experimental pairs of samples were chosen, from a total of 12 pairs, based on quantity and quality of RNA as determined by analysis on an Agilent 2100 Bioanalyzer RNA Nano chip (Agilent Technologies Inc., Santa Clara, CA, USA). Samples in each pairwise comparison were extracted from the same number of limbs, were from the same clutch, were exposed to cyclopamine solution or ethanol on the same day, and had their total RNA extracted in the same batch of extractions. The eight chosen samples were each hybridized to an Affymetrix<sup>&#174; </sup>GeneChip<sup>&#174; </sup><it>Xenopus laevis </it>Genome Array (Affymetrix, Santa Clara, CA, USA) using 3 &#956;g total RNA. Data have been deposited in the National Center for Biotechnology Information (NCBI) GEO database under series record ID [GEO:GSE8293].</p>
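            <p>The nominal exposure concentrations implied by the dosing volumes given above can be worked out as follows. This is a back-of-the-envelope sketch in Python that assumes complete mixing of the added solution into the tank water and ignores the small added volume; the variable names are illustrative.</p>

```python
# Quantities taken from the exposure protocol.
STOCK_UG_PER_UL = 5         # a 5 mg/ml stock is 5 micrograms per microliter
DOSE_UL_PER_ANIMAL = 30     # volume of stock (or ethanol vehicle) per animal
WATER_ML_PER_ANIMAL = 150   # treated water per animal

# Cyclopamine delivered per animal, in micrograms.
cyclopamine_ug = STOCK_UG_PER_UL * DOSE_UL_PER_ANIMAL

# Nominal cyclopamine concentration in the tank, in micrograms per milliliter.
cyclopamine_ug_per_ml = cyclopamine_ug / WATER_ML_PER_ANIMAL

# Nominal ethanol vehicle concentration, in percent by volume.
ethanol_percent = 100 * DOSE_UL_PER_ANIMAL / (WATER_ML_PER_ANIMAL * 1000)

print(cyclopamine_ug, cyclopamine_ug_per_ml, ethanol_percent)
```

            <p>Under these assumptions, each animal receives 150 &#956;g of cyclopamine in 150 ml of water, a nominal concentration of 1 &#956;g/ml, with ethanol at a nominal 0.02% by volume in both experimental and control tanks.</p>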
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p><it>DR</it>: Distance Rule; GEO: Gene Expression Omnibus; <it>GSEA</it>: gene set enrichment analysis; <it>HER</it>: Heavy Ends Rule; KEGG: Kyoto Encyclopedia of Genes and Genomes; MAPK: mitogen-activated protein kinase; <it>NT</it>: network test; OMIM: Online Mendelian Inheritance in Man; <it>SEPEA</it>: structurally enhanced pathway enrichment analysis; SHH: Sonic hedgehog.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>RT and JMG designed and evaluated the research with important suggestions from CJP and FMP. RT implemented the research and drafted the manuscript. GFS performed the <it>Xenopus laevis </it>experiments. All the authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper: a Word document that provides a section on a particular estimation of <it>P</it>-values and additional tables of results (Additional data file <supplr sid="S1">1</supplr>); a figure of the network of pathways in the KEGG pathway database <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> (Additional data file <supplr sid="S2">2</supplr>); a figure that demonstrates the receiver-operator characteristics of the different methodologies used (Additional data file <supplr sid="S3">3</supplr>); a table with the <it>P</it>-values for KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways after cyclopamine treatment of developing <it>X. laevis</it>, designed to inhibit SHH signaling, using microarray data from GEO <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> [GEO:GSE8293] (Additional data file <supplr sid="S4">4</supplr>).</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Estimation of <it>P</it>-values and additional tables of results</p>
            </caption>
            <text>
               <p>Estimation of <it>P</it>-values and additional tables of results.</p>
            </text>
            <file name="gb-2009-10-4-r44-S1.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Network of pathways in the KEGG pathway database</p>
            </caption>
            <text>
               <p>The nodes of this network are pathways while the edges indicate the transfer of signal or material between the pathways.</p>
            </text>
            <file name="gb-2009-10-4-r44-S2.png">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Receiver-operator characteristics of the different methodologies used</p>
            </caption>
            <text>
               <p><b>(a) </b>Sensitivity versus '1 - specificity' of enriched pathways that are predictive of survival from lung cancer for the six methods: <it>SEPEA_NT1</it>, <it>SEPEA_NT1*</it>, <it>SEPEA_NT2</it>, <it>SEPEA_NT3</it>, <it>GSEA </it>and <it>maxmean</it>. <it>SEPEA_NT1* </it>is the same analysis as in <it>SEPEA_NT1 </it>except that the pathway network information was not used. <b>(b) </b>Positive predictive power (ppp) versus negative predictive power (npp) for the same data and using the same methods of analysis as in (a).</p>
            </text>
            <file name="gb-2009-10-4-r44-S3.eps">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p><it>P</it>-values for KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways after cyclopamine treatment of developing <it>X. laevis</it></p>
            </caption>
            <text>
               <p><it>P</it>-values for KEGG <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> pathways after cyclopamine treatment of developing <it>X. laevis</it>, designed to inhibit SHH signaling, using microarray data from GEO <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> [GEO:GSE8293]. <it>P</it>-values were obtained using <it>SEPEA_NT1</it>, <it>SEPEA_NT2</it>, <it>SEPEA_NT3</it>, <it>GSEA </it>and <it>maxmean </it>analyses with 1,000 randomizations to compute significance.</p>
            </text>
            <file name="gb-2009-10-4-r44-S4.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (NIEHS). RT especially thanks Dr Shyamal Peddada of the Biostatistics Branch at NIEHS for valuable suggestions regarding the methodology. The <it>Xenopus </it>cyclopamine exposure experiments were performed by GFS in Yale University's Department of Ecology and Evolutionary Biology, and were supported in part by a grant to Gunter P Wagner from the Yale Core Center for Musculoskeletal Disorders, which is funded by NIH grant P30 AR-46032 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns.</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
                  <pubid idtype="pmcid">24541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Principal components analysis to summarize microarray experiments: application to sporulation time series.</p>
            </title>
            <aug>
               <au>
                  <snm>Raychaudhuri</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>455</fpage>
            <lpage>466</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">10902193</pubid>
                  <pubid idtype="pmcid">2669932</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Analysis of gene expression data using self-organizing maps.</p>
            </title>
            <aug>
               <au>
                  <snm>Toronen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kolehmainen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Castren</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1999</pubdate>
            <volume>451</volume>
            <fpage>142</fpage>
            <lpage>146</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">10371154</pubid>
                  <pubid idtype="doi">10.1016/S0014-5793(99)00524-4</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Analyzing gene expression data in terms of gene sets: methodological issues.</p>
            </title>
            <aug>
               <au>
                  <snm>Goeman</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Buhlmann</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>980</fpage>
            <lpage>987</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17303618</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm051</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Comparative evaluation of gene-set analysis methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Dinu</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Adewale</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Potter</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Yasui</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>431</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17988400</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-431</pubid>
                  <pubid idtype="pmcid">2238724</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A statistical framework for testing functional categories in microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Barry</snm>
                  <fnm>WT</fnm>
               </au>
               <au>
                  <snm>Nobel</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>FA</fnm>
               </au>
            </aug>
            <source>Ann Appl Stat</source>
            <pubdate>2008</pubdate>
            <volume>2</volume>
            <fpage>286</fpage>
            <lpage>315</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/07-AOAS146</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Newton</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Quintana</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>den Boon</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Sengupta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ahlquist</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Ann Appl Stat</source>
            <pubdate>2007</pubdate>
            <volume>1</volume>
            <fpage>85</fpage>
            <lpage>106</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/07-AOAS104</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Gene-set approach for expression pattern analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Nam</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SY</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>189</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18202032</pubid>
                  <pubid idtype="doi">10.1093/bib/bbn001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Discovering statistically significant pathways in expression profiling studies.</p>
            </title>
            <aug>
               <au>
                  <snm>Tian</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Greenberg</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Kong</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Altschuler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kohane</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>13544</fpage>
            <lpage>13549</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16174746</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506577102</pubid>
                  <pubid idtype="pmcid">1200092</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>On testing the significance of sets of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Ann Appl Stat</source>
            <pubdate>2007</pubdate>
            <volume>1</volume>
            <fpage>107</fpage>
            <lpage>129</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/07-AOAS101</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>From genes to functional classes in the study of biological systems.</p>
            </title>
            <aug>
               <au>
                  <snm>Al-Shahrour</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Arbiza</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Huerta-Cepas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Minguez</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Montaner</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>114</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17407596</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-114</pubid>
                  <pubid idtype="pmcid">1853114</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Al-Shahrour</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Diaz-Uriarte</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>578</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14990455</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btg455</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>GOstat: find statistically overrepresented Gene Ontologies within a group of genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Beissbarth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Speed</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1464</fpage>
            <lpage>1465</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14962934</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bth088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Pathway processor: A tool for integrating whole-genome expression results into metabolic networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Grosu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Townsend</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Hartl</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Cavalieri</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1121</fpage>
            <lpage>1126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12097350</pubid>
                  <pubid idtype="doi">10.1101/gr.226602</pubid>
                  <pubid idtype="pmcid">186628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>GEPAS: A web-based resource for microarray gene expression data analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Herrero</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Al-Shahrour</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Diaz-Uriarte</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mateos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vaquerizas</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Santoyo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dopazo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3461</fpage>
            <lpage>3467</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12824345</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg591</pubid>
                  <pubid idtype="pmcid">168997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Khatri</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bhavsar</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bawa</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Draghici</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W449</fpage>
            <lpage>W456</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15215428</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh409</pubid>
                  <pubid idtype="pmcid">441547</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for <it>Arabidopsis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>DY</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cheung</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Guan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Holford</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Deng</snm>
                  <fnm>XW</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>HY</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>56</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14604444</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-56</pubid>
                  <pubid idtype="pmcid">302111</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Pathway Miner: extracting gene association networks from molecular pathways for predicting the biological significance of gene expression microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Pandey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Guru</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Mount</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2156</fpage>
            <lpage>2158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15145817</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bth215</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A module map showing conditional activity of expression modules in cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Segal</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Koller</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Regev</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>1090</fpage>
            <lpage>1098</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15448693</pubid>
                  <pubid idtype="doi">10.1038/ng1434</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology.</p>
            </title>
            <aug>
               <au>
                  <snm>Shah</snm>
                  <fnm>NH</fnm>
               </au>
               <au>
                  <snm>Fedoroff</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>1196</fpage>
            <lpage>1197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14764555</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bth056</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>GoMiner: a resource for biological interpretation of genomic and proteomic data.</p>
            </title>
            <aug>
               <au>
                  <snm>Zeeberg</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Fojo</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Sunshine</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Narasimhan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kane</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Reinhold</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Lababidi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bussey</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Riss</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>JN</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R28</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12702209</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-4-r28</pubid>
                  <pubid idtype="pmcid">154579</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>ChipInfo: software for extracting gene annotation and gene ontology information for microarray analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhong</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3483</fpage>
            <lpage>3486</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12824349</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg598</pubid>
                  <pubid idtype="pmcid">169004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Significance analysis of functional categories in gene expression studies: a structured permutation approach.</p>
            </title>
            <aug>
               <au>
                  <snm>Barry</snm>
                  <fnm>WT</fnm>
               </au>
               <au>
                  <snm>Nobel</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Wright</snm>
                  <fnm>FA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1943</fpage>
            <lpage>1949</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15647293</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bti260</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Significance analysis of groups of genes in expression profiling studies.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Delongchamp</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tsai</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>2104</fpage>
            <lpage>2112</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17553853</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm310</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Improving gene set analysis of microarray data by SAM-GS.</p>
            </title>
            <aug>
               <au>
                  <snm>Dinu</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Potter</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Mueller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Adewale</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Jhangri</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Einecke</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Famulski</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Halloran</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yasui</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>242</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17612399</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-8-242</pubid>
                  <pubid idtype="pmcid">1931607</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Testing association of a pathway with survival using gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Goeman</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Oosting</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cleton-Jansen</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Anninga</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>van Houwelingen</snm>
                  <fnm>HC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1950</fpage>
            <lpage>1957</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15657105</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/bti267</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Extensions to gene set enrichment.</p>
            </title>
            <aug>
               <au>
                  <snm>Jiang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gentleman</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>306</fpage>
            <lpage>313</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17127676</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btl599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>GAzer: gene set analyzer.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Woo</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Volsky</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>IS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>1697</fpage>
            <lpage>1699</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17468122</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>PAGE: Parametric analysis of gene set enrichment.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Volsky</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>144</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15941488</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-144</pubid>
                  <pubid idtype="pmcid">1183189</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Semiparametric regression of multidimensional genetic pathway data: Least-squares kernel machines and linear mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>2007</pubdate>
            <volume>63</volume>
            <fpage>1079</fpage>
            <lpage>1088</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2665800</pubid>
                  <pubid idtype="pmpid" link="fulltext">18078480</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>XH</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>292</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18577223</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-9-292</pubid>
                  <pubid idtype="pmcid">2483287</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Statistical assessment of functional categories of genes deregulated in pathological conditions by using microarray data.</p>
            </title>
            <aug>
               <au>
                  <snm>Maglietta</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Piepoli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Catalano</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Licciulli</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Carella</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liuni</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Perri</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ancona</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>2063</fpage>
            <lpage>2072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17540679</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>PGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes.</p>
            </title>
            <aug>
               <au>
                  <snm>Mootha</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Lindgren</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Eriksson</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sihag</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lehar</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Puigserver</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Carlsson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ridderstrale</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Laurila</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Houstis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Spiegelman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Hirschhorn</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Altshuler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Groop</snm>
                  <fnm>LC</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2003</pubdate>
            <volume>34</volume>
            <fpage>267</fpage>
            <lpage>273</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12808457</pubid>
                  <pubid idtype="doi">10.1038/ng1180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Nettleton</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Recknor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reecy</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>192</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18042553</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm583</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Mukherjee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ebert</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Gillette</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Paulovich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pomeroy</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>15545</fpage>
            <lpage>15550</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16199517</pubid>
                  <pubid idtype="doi">10.1073/pnas.0506580102</pubid>
                  <pubid idtype="pmcid">1239896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Pathway level analysis of gene expression using singular value decomposition.</p>
            </title>
            <aug>
               <au>
                  <snm>Tomfohr</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kepler</snm>
                  <fnm>TB</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>225</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16156896</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-225</pubid>
                  <pubid idtype="pmcid">1261155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>An integrated approach for the analysis of biological pathways using mixed models.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wolfinger</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <fpage>e1000115</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18852846</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.1000115</pubid>
                  <pubid idtype="pmcid">2565842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A global test for groups of genes: testing association with a clinical outcome.</p>
            </title>
            <aug>
               <au>
                  <snm>Goeman</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Geer</snm>
                  <mnm>van de</mnm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>de Kort</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>van Houwelingen</snm>
                  <fnm>HC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>93</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">14693814</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btg382</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Mitogen-activated protein kinases: Specific messages from ubiquitous messengers.</p>
            </title>
            <aug>
               <au>
                  <snm>Schaeffer</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Mol Cell Biol</source>
            <pubdate>1999</pubdate>
            <volume>19</volume>
            <fpage>2435</fpage>
            <lpage>2444</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">84036</pubid>
                  <pubid idtype="pmpid" link="fulltext">10082509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Extracting active pathways from gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Vert</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>Suppl 2</issue>
            <fpage>ii238</fpage>
            <lpage>ii244</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14534196</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Co-clustering of biological networks and gene expression data.</p>
            </title>
            <aug>
               <au>
                  <snm>Hanisch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zien</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zimmer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lengauer</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>Suppl 1</issue>
            <fpage>S145</fpage>
            <lpage>S154</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169542</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model.</p>
            </title>
            <aug>
               <au>
                  <snm>Wei</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>404</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18083717</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>A systems biology approach for pathway level analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Draghici</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khatri</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tarca</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Amin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Done</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Voichita</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Georgescu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>1537</fpage>
            <lpage>1545</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17785539</pubid>
                  <pubid idtype="doi">10.1101/gr.6202607</pubid>
                  <pubid idtype="pmcid">1987343</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>From genomics to chemical genomics: new developments in KEGG.</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aoki-Kinoshita</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Araki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hirakawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D354</fpage>
            <lpage>D357</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16381885</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj102</pubid>
                  <pubid idtype="pmcid">1347464</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>NCBI GEO: mining tens of millions of expression profiles - database and tools update.</p>
            </title>
            <aug>
               <au>
                  <snm>Barrett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Troup</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Wilhite</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Ledoux</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rudnev</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Evangelista</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Soboleva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tomashevsky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>D760</fpage>
            <lpage>D765</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17099226</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl887</pubid>
                  <pubid idtype="pmcid">1669752</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung.</p>
            </title>
            <aug>
               <au>
                  <snm>Raponi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Macdonald</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Moskaluk</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Beer</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2006</pubdate>
            <volume>66</volume>
            <fpage>7466</fpage>
            <lpage>7472</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16885343</pubid>
                  <pubid idtype="doi">10.1158/0008-5472.CAN-06-1191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Sonic hedgehog promotes proliferation and differentiation of adult muscle cells: Involvement of MAPK/ERK and PI3K/Akt pathways.</p>
            </title>
            <aug>
               <au>
                  <snm>Elia</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Madhala</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ardon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Reshef</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Halevy</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>2007</pubdate>
            <volume>1773</volume>
            <fpage>1438</fpage>
            <lpage>1446</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17688959</pubid>
                  <pubid idtype="doi">10.1016/j.bbamcr.2007.06.006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Sonic hedgehog stimulates the proliferation of rat gastric mucosal cells through ERK activation by elevating intracellular calcium concentration.</p>
            </title>
            <aug>
               <au>
                  <snm>Osawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ohnishi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Takano</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Noguti</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mashima</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hoshino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kita</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Matsui</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Biochem Biophys Res Commun</source>
            <pubdate>2006</pubdate>
            <volume>344</volume>
            <fpage>680</fpage>
            <lpage>687</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16630542</pubid>
                  <pubid idtype="doi">10.1016/j.bbrc.2006.03.188</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Online Mendelian Inheritance in Man (OMIM)</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim</url>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Early stage cancer cell invasion: signaling, biomarkers and therapeutic targeting.</p>
            </title>
            <aug>
               <au>
                  <snm>Behmoaram</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bijian</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bismar</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Alaoui-Jamali</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Front Biosci</source>
            <pubdate>2008</pubdate>
            <volume>13</volume>
            <fpage>6314</fpage>
            <lpage>6325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18508662</pubid>
                  <pubid idtype="doi">10.2741/3156</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Focal adhesion kinase: a promising target for anticancer therapy.</p>
            </title>
            <aug>
               <au>
                  <snm>Chatzizacharias</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Kouraklis</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Theocharis</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Expert Opin Ther Targets</source>
            <pubdate>2007</pubdate>
            <volume>11</volume>
            <fpage>1315</fpage>
            <lpage>1328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17907961</pubid>
                  <pubid idtype="doi">10.1517/14728222.11.10.1315</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Specificity and stability in topology of protein networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Maslov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sneppen</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>910</fpage>
            <lpage>913</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">11988575</pubid>
                  <pubid idtype="doi">10.1126/science.1065103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Normal Table of <it>Xenopus laevis</it> (Daudin): A Systematical and Chronological Survey of the Development from the Fertilized Egg Till the End of Metamorphosis.</p>
            </title>
            <aug>
               <au>
                  <snm>Nieuwkoop</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Faber</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>New York: Routledge</publisher>
            <pubdate>1994</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
