<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2009-10-12-r139</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods</p>
         </title>
         <aug>
            <au id="A1" ce="yes"><snm>Adler</snm><fnm>Priit</fnm><insr iid="I1"/><email>adler@ut.ee</email></au>
            <au id="A2" ce="yes"><snm>Kolde</snm><fnm>Raivo</fnm><insr iid="I2"/><insr iid="I3"/><email>kolde@ut.ee</email></au>
            <au id="A3"><snm>Kull</snm><fnm>Meelis</fnm><insr iid="I2"/><insr iid="I3"/><email>mkull@ut.ee</email></au>
            <au id="A4"><snm>Tkachenko</snm><fnm>Aleksandr</fnm><insr iid="I2"/><insr iid="I3"/><email>aleksandr.tkatsenko@ut.ee</email></au>
            <au id="A5"><snm>Peterson</snm><fnm>Hedi</fnm><insr iid="I1"/><insr iid="I3"/><email>peterson@quretec.com</email></au>
            <au id="A6"><snm>Reimand</snm><fnm>J&#252;ri</fnm><insr iid="I2"/><email>reimand@ut.ee</email></au>
            <au ca="yes" id="A7"><snm>Vilo</snm><fnm>Jaak</fnm><insr iid="I2"/><insr iid="I3"/><email>vilo@ut.ee</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Institute of Molecular and Cell Biology, Riia 23, 51010 Tartu, Estonia</p></ins>
            <ins id="I2"><p>Institute of Computer Science, University of Tartu, Liivi 2-314, 50409 Tartu, Estonia</p></ins>
            <ins id="I3"><p>Quretec, &#220;likooli 6a, 51003 Tartu, Estonia</p></ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>12</issue>
         <fpage>R139</fpage>
         <url>http://genomebiology.com/2009/10/12/R139</url>
         <xrefbib><pubidlist><pubid idtype="pmpid">19961599</pubid><pubid idtype="doi">10.1186/gb-2009-10-12-r139</pubid></pubidlist></xrefbib>
      </bibl>
      <history><rec><date><day>13</day><month>8</month><year>2009</year></date></rec><revrec><date><day>25</day><month>10</month><year>2009</year></date></revrec><acc><date><day>4</day><month>12</month><year>2009</year></date></acc><pub><date><day>4</day><month>12</month><year>2009</year></date></pub></history>
      <cpyrt><year>2009</year><collab>Adler et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <shorttitle>
         <p>Multiple-experiment matrix</p>
      </shorttitle>
      <shortabs>
         <p>The MEM web resource allows users to search for co-expressed genes across all microarray datasets in the ArrayExpress database.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>We present MEM (Multi-Experiment Matrix), a web resource for gene expression similarity searches across many datasets. MEM features large collections of microarray datasets and uses rank aggregation to merge information from different datasets into a single global ordering with simultaneous estimation of statistical significance. Unique features of MEM include automatic detection, characterization and visualization of the datasets that contain the strongest coexpression patterns. MEM is freely available at <url>http://biit.cs.ut.ee/mem/</url>.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
         <classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
         <classification id="300100013" subtype="man_spc_id" type="BMC">Methods</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Rationale</p>
         </st>
          <p>During the last decade, gene expression microarrays have become a standard tool for studying a large variety of biological questions <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Since the first experiments <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, microarrays have been used for pinpointing disease-specific genes and drug targets <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, uncovering signaling networks <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and describing cellular processes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, among many other applications. While methods for single-experiment analysis are well established and popular <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the information extracted from a single experiment is clearly constrained by details of the experimental design, such as conditions and cell types. Integrating data from different experiments widens the spectrum of biological conditions and increases the power to find subtler effects.</p>
          <p>Coexpression is one of the central ideas in gene expression analysis. The 'guilt by association' principle states that gene coexpression may indicate shared regulatory mechanisms and roles in related biological processes. The principle has been supported by several studies; see, for example, <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. The idea can be applied in many tasks of computational biology, such as inferring the functions of poorly characterized genes <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>, discovering new putative members of metabolic pathways <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, or predicting and validating protein-protein interactions <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Many <it>de novo </it>regulatory motif discovery methods use gene expression similarity as a primary input for identifying co-regulated genes <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. More recently, gene expression similarity search has been used in a pathway reconstruction study <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
          <p>Multi-experiment coexpression analysis can be a labour-intensive and computationally challenging task. The first steps involve collecting suitable datasets, downloading the data, preprocessing, normalization and managing gene annotations. Then methodological and technical questions arise, namely how to integrate different datasets, merge cross-platform data and handle ambiguous mappings between genes and probesets. Finally, the sheer size of the targeted data requires efficient computational strategies or caching of precalculated results. The complexity of multi-experiment microarray analysis is likely its main limitation, as researchers often lack the time and resources to take on such a task. Consequently, there is a clear need for services that provide coexpression information in an easy and accessible format.</p>
         <p>Surprisingly, the resources and tools for finding genes with similar expression profiles in multiple experiments are still rather scarce.</p>
          <p>The microarray databases ArrayExpress <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and Gene Expression Omnibus (GEO) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> have implemented a data mining layer for finding and analyzing the most relevant datasets, but neither yet provides a comprehensive gene coexpression search over many datasets simultaneously. Gemma is a web-based resource that uses a global inference strategy to detect genes with similar expression profiles in all covered datasets <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. However, global coexpression analysis is likely to miss similarities that occur in a tissue- or condition-specific manner <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. SPELL is a resource that puts a strong emphasis on selecting the appropriate datasets for the query <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The method identifies the subset of most relevant datasets by analyzing the coexpression of a user-defined list of genes, and uses this subset to find additional genes. Unfortunately, detecting relevant datasets relies on the user's knowledge of genes that are likely to have similar expression profiles. Furthermore, SPELL currently features a relatively small number of datasets, all of them describing yeast.</p>
          <p>We have developed MEM, a query engine that detects coexpressed genes in large platform-specific microarray collections. The Affymetrix microarray data originate from ArrayExpress and also include datasets that were submitted to GEO and automatically imported into ArrayExpress. MEM encompasses a variety of conditions, tissues and disease states, and incorporates nearly a thousand datasets for both human and mouse, as well as hundreds of datasets for other model organisms.</p>
          <p>A MEM coexpression search requires two inputs: first, the user enters the ID of a gene of interest; second, the user chooses a collection of relevant datasets. The datasets may be picked manually by browsing their annotations, or MEM can select them automatically based on statistical criteria such as gene variability. MEM performs the coexpression analysis individually for each dataset and assembles the final list of similar genes using a novel statistical rank aggregation algorithm. An efficient implementation ensures rapid performance of this computationally intensive real-time analysis, which does not rely on precomputed or indexed data. The results are presented in a highly interactive graphical format with a strong emphasis on further data mining. Query results and datasets can be ordered by significance or clustered. The MEM visualization highlights the datasets with the highest coexpression to the input gene and helps the user distinguish evidence with poor or negative correlation. Datasets are additionally characterized by automatic text analysis of experiment descriptions and represented as word clouds that highlight predominant terms. With MEM we aim to make multi-experiment coexpression analysis accessible to a wider community of researchers.</p>
      </sec>
      <sec>
         <st>
            <p>MEM web interface</p>
         </st>
         <sec>
            <st>
               <p>Input</p>
            </st>
            <sec>
               <st>
                  <p>Primary input</p>
               </st>
                <p>The primary input of MEM is a single query gene that acts as the template pattern for the coexpression search. The tool recognizes common gene identifiers and automatically retrieves the corresponding probesets; the conversion is based on g:Profiler <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and Ensembl <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> ID mappings. When several probesets map to a gene, the user needs to choose one of them for further analysis.</p>
                <p>Second, the user needs to select the collection of datasets in which similarities between expression profiles are detected (the search space). ArrayExpress datasets are organized into platform-specific collections, and the user may choose to search over all datasets of a specific platform. The search space may be narrowed further by browsing dataset annotations and composing a collection that covers a specific disease or tissue type.</p>
            </sec>
            <sec>
               <st>
                  <p>Dataset selection</p>
               </st>
                <p>In multi-experiment coexpression analysis, some individual datasets may produce noisy or even entirely random results, caused either by poor data quality or by low expression levels of the query gene. The quality of the analysis can be improved considerably by eliminating the datasets that introduce such noise for the query gene. Low dataset-wide variability of the query gene's expression is one of the key indicators of spurious results, because minute changes in gene expression are often caused by experimental noise rather than by cellular mechanisms. Similarity searches in such datasets are therefore likely to be less informative about gene function.</p>
                <p>We have included a standard deviation filter in the MEM interface that allows users to detect and disregard datasets in which the variability of the query gene is low. Based on extensive simulations detailed in the Methods section, we conclude that a standard deviation of <it>&#963; </it>= 0.29 is a reasonable threshold for distinguishing informative datasets. A single threshold can be used throughout the analysis because all datasets are normalized and preprocessed using the same algorithm.</p>
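                <p>The filter can be sketched as follows, assuming that each dataset is available as a list of normalized expression values of the query probeset. The use of the sample standard deviation, the dictionary layout and the dataset names are illustrative assumptions; only the threshold of 0.29 comes from the text.</p>
```python
import statistics

# Threshold quoted in the text; how MEM computes the deviation is assumed here.
SD_THRESHOLD = 0.29


def informative_datasets(query_profiles, threshold=SD_THRESHOLD):
    """Return the names of datasets in which the query gene varies enough.

    query_profiles maps a dataset name to the normalized expression values
    of the query probeset across that dataset's samples (assumed layout).
    """
    kept = []
    for name, values in query_profiles.items():
        # Sample standard deviation; a single value carries no variability.
        if len(values) > 1 and statistics.stdev(values) >= threshold:
            kept.append(name)
    return kept


# Hypothetical datasets: the first is nearly flat, the second varies strongly.
profiles = {
    "dataset_flat": [5.0, 5.1, 4.9, 5.0],
    "dataset_variable": [4.0, 6.5, 5.2, 7.1],
}
print(informative_datasets(profiles))  # ['dataset_variable']
```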
            </sec>
            <sec>
               <st>
                  <p>Search algorithm parameters</p>
               </st>
                <p>The first step of MEM multi-experiment coexpression analysis detects the most similar candidate genes in each individual dataset. The most important parameter at this stage is the distance measure, which defines the similarity between expression profiles and has a significant impact on the content and interpretation of the results. Pearson correlation is the default distance measure in MEM. It evaluates the dynamic similarity of expression profiles and has become a standard method of measuring coexpression <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Another useful measure is the anti-correlation distance, which detects inverse expression patterns, such as those of genes responding to repressor activity. For example, anti-correlation queries have been used to validate predicted microRNA targets <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. The absolute correlation distance combines the two measures above, as it detects both direct and inverse similarity.</p>
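                <p>The three measures can be illustrated with a minimal sketch that ranks the genes of one dataset against the query profile. The conversions of the correlation coefficient r into distances (1 - r, 1 + r and 1 - |r|), the handling of flat profiles and the alphabetical tie-breaking are assumptions made here rather than details taken from MEM, and the gene names are placeholders.</p>
```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a flat profile; treated as no correlation
    return sxy / math.sqrt(sxx * syy)


# Assumed conversions of the correlation r into distances (smaller = more similar).
DISTANCES = {
    "pearson": lambda r: 1.0 - r,          # direct coexpression ranks first
    "anticorrelation": lambda r: 1.0 + r,  # inverse patterns rank first
    "absolute": lambda r: 1.0 - abs(r),    # direct or inverse similarity
}


def rank_genes(query, profiles, measure="pearson"):
    """Order the genes of one dataset by distance to the query profile."""
    dist = DISTANCES[measure]
    # Sorting (distance, gene) tuples breaks ties by gene name.
    scored = sorted((dist(pearson(query, p)), gene) for gene, p in profiles.items())
    return [gene for _, gene in scored]


query = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "up": [2.0, 4.0, 6.0, 8.0],
    "down": [4.0, 3.0, 2.0, 1.0],
    "mixed": [1.0, 3.0, 2.0, 4.0],
}
for measure in DISTANCES:
    print(measure, rank_genes(query, profiles, measure))
# pearson ['up', 'mixed', 'down']
# anticorrelation ['down', 'mixed', 'up']
# absolute ['down', 'up', 'mixed']
```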
                <p>After detecting the most similar genes in individual datasets, we apply a novel rank aggregation algorithm that merges the candidates from different datasets into the final list of coexpressed genes. The algorithm assigns a <it>P</it>-value to each gene to evaluate its similarity to the query gene across the given collection of datasets. Statistically, the <it>P</it>-value reflects the probability of the gene appearing with the observed ranks, or better ones, in the datasets if the similarity lists were shuffled randomly. Selecting the expression profiles with the most significant <it>P</it>-values accurately retrieves genes with high expression similarity and functional relevance to the query gene (Figure <figr fid="F1">1</figr>).</p>
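                <p>The aggregation formula is not spelled out at this point, so the following is only a sketch of one way to turn the described null model into a <it>P</it>-value. It assumes that each rank is divided by the number of genes, so that for a randomly shuffled list it behaves like a uniform random number. For every k, it asks how likely it is that the k-th smallest of n uniform values is at most the observed k-th smallest normalized rank, keeps the smallest such probability and applies a Bonferroni-style correction for the n values of k tried. The function names are hypothetical, and the result should not be read as the exact MEM statistic.</p>
```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n independent Uniform(0, 1) values is at most x)."""
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(k, n + 1))


def aggregate_pvalue(ranks, n_genes):
    """Score one gene from its per-dataset similarity ranks (1 = most similar).

    Returns the corrected P-value and k_best, the number of smallest ranks
    that produced the minimum, i.e. a per-gene cutoff between informative
    and non-informative ranks.
    """
    n = len(ranks)
    r = sorted(rank / n_genes for rank in ranks)  # normalized ranks in (0, 1]
    probs = [order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1)]
    best = min(probs)
    k_best = probs.index(best) + 1
    # Bonferroni-style correction for having tried n values of k.
    return min(1.0, n * best), k_best


# Hypothetical gene ranked 1st and 2nd in two of four datasets of 1000 genes.
p, k = aggregate_pvalue([1, 2, 900, 950], n_genes=1000)
print(k)  # 2
```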
               <fig id="F1"><title><p>Figure 1</p></title><caption><p>MEM user interface and results for the transcription factor <it>NANOG</it></p></caption><text>
   <p>MEM user interface and results for the transcription factor <it>NANOG</it>. The top of the page contains the query controls: the gene input field, dataset selection and advanced options. The bottom of the page shows the results of the query. The genes, displayed as rows, are ordered by multi-experiment similarity to the query gene. The single-experiment similarity ranks are displayed as a matrix of colored squares, where red and blue denote small and large ranks, respectively. The larger squares indicate the ranks that contributed to the final <it>P</it>-value. Each column corresponds to an experiment, and the columns are clustered. Hovering over the results brings up context-specific information: <b>(a) </b>a word cloud that characterizes the corresponding experiments; <b>(b) </b>single dataset annotations; <b>(c) </b>gene names with short descriptions. The row of links above the results facilitates further analysis. For example, the user can visualize the expression of selected datasets (marked with green ticks) as a heat map <b>(d)</b>.</p>
</text><graphic file="gb-2009-10-12-r139-1"/></fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Output</p>
            </st>
             <p>The principal output of MEM is a ranked list of genes that are coexpressed with the query gene in the provided datasets. For each resulting gene, MEM provides a <it>P</it>-value that reflects the significance of its expression similarity to the query gene across the collection of analyzed datasets. Much additional information is presented in the graphical rank matrix (Figure <figr fid="F1">1</figr>). Each column of the matrix stands for a dataset, each row represents a gene, and each matrix element shows the individual similarity rank of the given gene in the given dataset. Visual inspection of the rank matrix allows the researcher to detect patterns of correlation across datasets and to spot datasets with markedly stronger coexpression. The rank aggregation algorithm provides a natural cutoff between informative and non-informative ranks for each gene. Color and cell size are used to highlight datasets in which the given gene was particularly similar to the query gene and hence contributed significantly to the final <it>P</it>-value.</p>
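             <p>A minimal sketch of how such a rank matrix could be assembled from per-dataset ranked lists, assuming that each dataset yields a list of genes ordered from most to least similar to the query gene. The per-gene cutoff k is taken as an input here (for example, from a rank aggregation step); the function names and the use of None for genes absent from a dataset are illustrative assumptions.</p>
```python
def rank_matrix(ranked_lists, genes):
    """Build rows (genes) by columns (datasets) of 1-based similarity ranks.

    ranked_lists maps a dataset name to its genes ordered from most to least
    similar to the query gene. Genes missing from a dataset get None.
    """
    datasets = list(ranked_lists)
    position = {
        d: {g: i + 1 for i, g in enumerate(ranked_lists[d])} for d in datasets
    }
    rows = {g: [position[d].get(g) for d in datasets] for g in genes}
    return datasets, rows


def contributing_columns(row, k):
    """Column indices of the k smallest ranks in a row (the enlarged squares)."""
    present = sorted((r, j) for j, r in enumerate(row) if r is not None)
    return sorted(j for _, j in present[:k])


lists = {
    "D1": ["A", "B", "C"],
    "D2": ["C", "A", "B"],
    "D3": ["B", "C", "A"],
}
datasets, rows = rank_matrix(lists, ["A", "B", "C"])
print(rows["A"])                           # [1, 2, 3]
print(contributing_columns(rows["C"], 1))  # [1]
```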
             <p>Genes with the highest similarity rankings are frequently strongly correlated with the query gene only within a relatively small fraction of datasets, namely those that are biologically relevant to its function. If the contributing datasets can be related through their experimental design, one may learn additional information about the query gene and its association with the resulting genes. Columns of the rank matrix are clustered hierarchically, so that datasets with similar correlation patterns are grouped together in a tree visualization, and datasets with the most impact are aligned to the left. While the default policy is to filter datasets based on the standard deviation criterion, one may take advantage of the high contribution of a few datasets and manually remove experiments that have little impact on the final list of correlated genes. A single click on a dataset or a tree node toggles whether that experiment or the entire experiment group is included in downstream analysis.</p>
             <p>A word cloud, generated by text mining of the experimental design descriptions, gives a compact semantic overview of a selected group of datasets. The word cloud detects keywords that are enriched in the experimental descriptions of the group and uses larger font sizes to highlight terms with stronger statistical significance. One may study the experiment descriptions of single datasets and dataset clusters by moving the mouse over the dataset clustering tree.</p>
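             <p>The keyword enrichment behind the word cloud can be sketched with a hypergeometric test that compares how many datasets in the selected group mention a word with how many datasets mention it overall. The text does not state which test or font scaling MEM uses, so the test, the tokenization, the significance cutoff and the linear size scaling below are all assumptions, and the descriptions in the example are invented.</p>
```python
import math
import re


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = math.comb(N, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def word_cloud(group, descriptions, min_size=10, max_size=40, min_score=1.0):
    """Map enriched words of a dataset group to font sizes.

    descriptions maps a dataset ID to its free-text experiment description.
    A word's score is -log10 of its enrichment P-value; only words scoring
    above min_score (an assumed cutoff) are kept.
    """
    words = {d: set(re.findall(r"[a-z]+", text.lower()))
             for d, text in descriptions.items()}
    N, n = len(words), len(group)
    sizes = {}
    for w in set().union(*(words[d] for d in group)):
        K = sum(w in ws for ws in words.values())  # datasets mentioning w
        k = sum(w in words[d] for d in group)      # ... within the group
        score = -math.log10(hypergeom_sf(k, N, K, n))
        if score > min_score:
            sizes[w] = min(max_size, min_size + round(5 * score))
    return sizes


descriptions = {
    "E1": "embryonic stem cell pluripotent",
    "E2": "embryonic stem cell differentiation",
    "E3": "liver tissue",
    "E4": "liver injury",
    "E5": "kidney tissue",
    "E6": "brain tissue",
}
print(sorted(word_cloud(["E1", "E2"], descriptions)))  # ['cell', 'embryonic', 'stem']
```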
             <p>Additional features of the tool reveal finer details of the underlying data and create multiple pointers for further analysis. Besides the coexpression associations in the rank matrix, MEM also displays standard heat maps with expression profiles and experimental details of individual datasets. The heat maps provide easy visual validation of the detected coexpression patterns. MEM includes filters that constrain the output to selected genes, allowing the researcher to address specific questions. For instance, one may study the association of the query gene with a certain pathway or biological process by comparing the expression patterns of its members. The URLMap feature provides easy access to external resources by automatically linking the resulting genes to multiple genomic databases <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Coexpressed genes can be passed to the g:Profiler toolset for functional enrichment analysis of Gene Ontology terms, pathways and <it>cis</it>-regulatory motifs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Case studies</p>
         </st>
         <sec>
            <st>
                <p>MEM query with embryonic stem cell regulator NANOG retrieves ES cell-related genes and datasets</p>
            </st>
            <p>The homeobox transcription factor <it>NANOG </it>is a key regulator of differentiation and pluripotency maintenance in mammalian embryonic stem cells <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. <it>NANOG </it>forms a complex circuitry together with the factors <it>OCT4 </it>and <it>SOX2 </it>and is involved in the combinatorial regulation of a range of downstream developmental processes.</p>
             <p>We demonstrate the power of the MEM toolset by analyzing the genes that show strong coexpression with <it>NANOG </it>across multiple datasets (see Figure <figr fid="F1">1</figr>). We chose a collection of 487 mouse datasets of the Affymetrix 430-2 platform, as this platform includes the largest number of ES cell-related experiments. After applying the default standard deviation filter (<it>&#963; </it>= 0.29), MEM automatically removed 419 datasets in which the expression of <it>NANOG </it>varied too little for coexpression analysis. As the role of <it>NANOG </it>is believed to be restricted to embryonic stem cells, datasets covering other tissues and conditions are expected to be uninformative, and they indeed provide no results of statistical significance (data not shown). On the other hand, the datasets that MEM considers relevant appear to be related to the role of <it>NANOG</it>. Keyword analysis of the experimental annotations reveals enriched terms such as 'embryonic', 'pluripotent' and 'stem cell' (see word cloud, Figure <figr fid="F1">1a</figr>).</p>
             <p>In response to the <it>NANOG </it>query, MEM retrieves a list of coexpressed genes that appear to be functionally related to embryonic stem cells. Enrichment analysis of the top 50 probesets reveals important functional terms from Gene Ontology (for example, stem cell development, <it>P </it>&lt; 10<sup>-12</sup>, and regulation of transcription, <it>P </it>&lt; 10<sup>-6</sup>). The top of the list includes the key transcription factors <it>OCT4 </it>(position 1) and <it>SOX2 </it>(position 7), as well as other genes with known roles in stem cell regulation and maintenance of pluripotency. For instance, <it>UTF1 </it>is an ES cell-specific transcriptional coactivator <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, while <it>DPPA2/3/4/5A </it>are nuclear factors with a role in regulating pluripotency <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. <it>NODAL </it>is a member of the TGF-beta superfamily whose signaling is required for maintaining pluripotency in human embryonic stem cells <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. <it>TDGF </it>(Cripto) signaling, in a <it>NODAL</it>-dependent manner, directs the differentiation and fate determination of ES cells <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. <it>TGF3 </it>is another growth factor that has been shown to be involved in patterning the anterior-posterior axis and to exhibit signaling similar to that of NODAL <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
             <p>In a previous study, Sharov <it>et al</it>. inferred direct targets of <it>NANOG </it>by computationally integrating gene expression and chromatin immunoprecipitation data <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Of the 281 targets reported in that study, 14 are also among the 50 most significant genes detected by MEM (<it>P </it>&lt; 10<sup>-13</sup>). To put this result into context, we performed a similarity search in each of the 487 datasets individually, and found that every dataset yielded fewer targets than the composite MEM query (Figure <figr fid="F2">2</figr>). To show the utility of the standard deviation filter, we highlighted the datasets that passed it. Only 20 of the 487 datasets had an overlap larger than 4, and only two of these did not pass the standard deviation filter, confirming the accuracy of the filter in selecting relevant datasets.</p>
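             <p>The comparison between the composite query and the single-dataset queries reduces to counting known targets among the top 50 genes of each ranked list. The sketch below assumes that the per-dataset ranked lists, the target set and the names of datasets passing the standard deviation filter are already available; the function names are hypothetical, and the gene and dataset names in the example are placeholders, not data from the study.</p>
```python
def overlap_counts(ranked_lists, targets, top_n=50):
    """Count known targets among the first top_n genes of each ranked list."""
    targets = set(targets)
    return {name: len(targets.intersection(genes[:top_n]))
            for name, genes in ranked_lists.items()}


def compare_to_composite(composite_count, per_dataset, passed_filter, min_overlap=4):
    """Summarize single-dataset overlaps relative to the composite query.

    Returns whether the composite query beats every single dataset, the
    datasets whose overlap exceeds min_overlap, and those among them that
    failed the standard deviation filter.
    """
    beats_all = composite_count > max(per_dataset.values())
    strong = [d for d, c in per_dataset.items() if c > min_overlap]
    failing = [d for d in strong if d not in passed_filter]
    return beats_all, strong, failing


# Toy example with top_n=3 instead of 50.
lists = {"D1": ["a", "b", "x"], "D2": ["a", "y", "z"]}
per_dataset = overlap_counts(lists, {"a", "b", "c"}, top_n=3)
print(per_dataset)  # {'D1': 2, 'D2': 1}
print(compare_to_composite(3, per_dataset, {"D1"}, min_overlap=1))  # (True, ['D1'], [])
```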
            <fig id="F2"><title><p>Figure 2</p></title><caption><p><it>NANOG </it>targets among first 50 MEM results</p></caption><text>
   <p><it>NANOG </it>targets among the first 50 MEM results. A MEM query with the transcription factor <it>NANOG </it>retrieves more of its targets among the top 50 genes than queries on any single dataset. Each point represents the overlap between <it>NANOG </it>targets and the top 50 query results in one of the 487 datasets. The datasets are sorted by variation, and those that pass the standard deviation filter are highlighted. Most of the datasets that retrieve a high number of <it>NANOG </it>targets pass the filter, which shows the specificity of the filter.</p>
</text><graphic file="gb-2009-10-12-r139-2"/></fig>
         </sec>
         <sec>
            <st>
               <p>Analysis of MEM coexpression network reveals functional modules of cell cycle, proteasome and the immune system</p>
            </st>
             <p>Coexpression information can be used to reconstruct biological networks and regulatory pathways <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. In such a network, genes are nodes that are connected by edges if their expression patterns are strongly correlated. Coexpression networks have been shown to contain densely connected modules of functionally related genes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
             <p>We used MEM to build a coexpression network of the mouse genome, using a collection of 89 datasets (Additional file 1) of the Affymetrix U74Av2 platform as the search space. In the first stage, we retrieved the list of coexpressed genes for every mouse gene and constructed the network by connecting gene pairs in which each gene had a significant MEM similarity score with the other. After applying a Bonferroni multiple testing correction and a significance threshold of 0.001, we obtained a dense network of 115,664 edges between 5,440 genes. In the second stage, we applied the Markov Cluster (MCL) algorithm <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> via the GraphWeb tool <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> to prune the network and find gene modules. The MCL algorithm simulates a stochastic flow in the coexpression graph and removes edges that are visited infrequently, resulting in a collection of densely connected groups of genes. In the third stage, we assessed the functional relevance of the detected modules with GraphWeb by finding significantly enriched Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome biological pathways, and <it>cis</it>-regulatory motifs.</p>
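             <p>The first two stages can be sketched in plain Python. The edge rule follows the text (each gene must be significant in the other's MEM query after Bonferroni correction, with a 0.001 threshold), while the input layout is an assumption. The clustering step is a textbook dense-matrix version of MCL (expansion by matrix squaring, inflation by element-wise powering followed by column normalization) with an assumed inflation of 2; it is not the GraphWeb implementation, and at the scale of thousands of genes a sparse implementation would be required.</p>
```python
def mutual_edges(pvalues, n_tests, alpha=0.001):
    """Edges between genes that are significant in both query directions.

    pvalues maps (query_gene, hit_gene) to a MEM P-value (assumed layout);
    the Bonferroni correction multiplies each P-value by n_tests.
    """
    sig = {pair for pair, p in pvalues.items() if alpha > p * n_tests}
    return {tuple(sorted(pair)) for pair in sig if (pair[1], pair[0]) in sig}


def _normalize(m):
    """Scale every column to sum to 1 (column-stochastic flow matrix)."""
    size = len(m)
    sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(size)]
            for i in range(size)]


def _square(m):
    """Matrix product m x m (the MCL expansion step)."""
    size = len(m)
    return [[sum(m[i][t] * m[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def mcl(nodes, edges, inflation=2.0, iterations=50, eps=1e-6):
    """Dense Markov Cluster sketch; returns a set of frozenset clusters."""
    idx = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    # Adjacency matrix with self-loops, a common MCL preprocessing step.
    m = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for a, b in edges:
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = 1.0
    m = _normalize(m)
    for _ in range(iterations):
        m = _square(m)  # expansion: flow spreads along longer paths
        m = _normalize([[x ** inflation for x in row] for row in m])  # inflation
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return clusters


pvals = {("A", "B"): 1e-9, ("B", "A"): 1e-8, ("A", "C"): 1e-9, ("C", "A"): 0.5}
print(mutual_edges(pvals, n_tests=1000))  # {('A', 'B')}
```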
             <p>The size, density and functional descriptions of the six largest modules are shown in Figure <figr fid="F3">3a</figr>. All have strong and clear functional annotations, namely proteasome (KEGG, <it>P </it>&lt; 10<sup>-11</sup>), mitochondria (GO, <it>P </it>&lt; 10<sup>-146</sup>), cell cycle (GO, <it>P </it>&lt; 10<sup>-50</sup>), biological adhesion (GO, <it>P </it>&lt; 10<sup>-18</sup>), immune system process (GO, <it>P </it>&lt; 10<sup>-21</sup>) and protein transport (GO, <it>P </it>&lt; 10<sup>-5</sup>). Several smaller modules with interesting functional annotations were also detected, for instance one related to T-cell generation (Figure <figr fid="F3">3b</figr>, <it>P </it>&lt; 10<sup>-12</sup>) and one related to regulation of heart contraction (Figure <figr fid="F3">3c</figr>, <it>P </it>&lt; 10<sup>-7</sup>).</p>
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Functional descriptions of the modules found in the mouse coexpression network constructed with MEM</p></caption><text>
   <p>Functional descriptions of the modules found in the mouse coexpression network constructed with MEM. Annotations of the six largest modules are shown in <b>(a)</b>. Two smaller modules and their functional annotations are shown in <b>(b) </b>and <b>(c)</b>.</p>
</text><graphic file="gb-2009-10-12-r139-3"/></fig>
         </sec>
         <sec>
            <st>
               <p>MCM complex of DNA replication initiation shows consistent expression patterns with ORC, GMNN and CDC6L/45L</p>
            </st>
             <p>Stable protein complexes are made up of several physically interacting proteins. To keep essential complexes intact, the corresponding subunits need to have consistent expression patterns across many diverse conditions and tissues. Hence, a MEM query with one subunit of a complex should retrieve the remaining subunits at high ranks, and queries with different subunits are expected to retrieve similar lists of well-correlated genes whose functional roles are related to that of the complex. To validate MEM performance on protein complexes, we studied the expression patterns of the essential <it>MCM </it>(Mini Chromosome Maintenance) complex, which is conserved in eukaryotes from yeast to human. <it>MCM </it>is involved in the regulation of DNA replication during the cell cycle, a complex multistep process that requires the cooperation of a number of proteins <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. <it>MCM </it>is a helicase of six subunits (<it>MCM2</it>-<it>MCM7</it>) that forms the Pre-Replicative Complex (preRC) together with the Origin Recognition Complex (<it>ORC1</it>-<it>ORC6</it>) and cell division cycle proteins (<it>CDC6</it>, <it>CDC45</it>) <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The preRC binds to replication origins on the DNA during the G1 phase of the cell cycle and prepares them for the initiation of replication. The <it>MCM </it>complex acts as the licensing factor of replication, ensuring that DNA is synthesized only once per cell cycle <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Besides its role in initiating DNA replication, <it>MCM </it>also acts later, during DNA synthesis, in strand elongation. The presence of the complex appears to be correlated with cell proliferation, suggesting roles in cancer <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>.</p>
             <p>We composed a compendium of 145 cancer-related microarray datasets (Additional file 2) of the human Affymetrix U133A platform from ArrayExpress to analyze the expression profiles of the <it>MCM </it>complex subunits <it>MCM2</it>-<it>MCM7</it>. For each <it>MCM </it>subunit, we used MEM to retrieve a ranked list of the 100 probesets most correlated with the subunit, referred to as its cohort. Where several probesets corresponded to a subunit, we picked the probeset whose cohort contained the most cell cycle-related genes. We excluded <it>MCM7</it>, as the corresponding probeset also maps to several unrelated genes.</p>
             <p>The subunits of the <it>MCM </it>complex have extremely consistent expression profiles across the compendium of cancer-related datasets. In the cohorts of <it>MCM </it>subunits, the other <it>MCM </it>probesets always appear at high ranks (median rank 17.5). The <it>MCM </it>cohorts are generally very similar: on average, a pair of <it>MCM </it>subunits shares 65 of the 100 probesets in their cohorts, and the six 100-probeset cohorts contain a total of 116 probesets that occur in more than two cohorts (Additional file 3). These overlaps are very unlikely to occur by chance, as even the subunit pair with the fewest shared probesets has a highly significant <it>P</it>-value (<it>MCM5 </it>and <it>MCM6</it>, 47 shared probesets, <it>P </it>&lt; 10<sup>-87</sup>).</p>
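             <p>The overlap statistics quoted above (the average number of probesets shared by a pair of cohorts, the least overlapping pair and the probesets that recur in more than two cohorts) can be computed with a few set operations. This is a sketch assuming that each cohort is available as a set of probeset IDs; the function name is hypothetical, the probeset IDs in the example are placeholders, and the significance test for the overlaps is not reproduced.</p>
```python
from itertools import combinations
from statistics import mean


def cohort_overlap_stats(cohorts):
    """Summarize overlaps between cohorts (subunit -> set of probeset IDs)."""
    shared = {
        (a, b): len(cohorts[a].intersection(cohorts[b]))
        for a, b in combinations(sorted(cohorts), 2)
    }
    counts = {}
    for members in cohorts.values():
        for probe in members:
            counts[probe] = counts.get(probe, 0) + 1
    recurrent = {p for p, c in counts.items() if c > 2}  # in more than two cohorts
    least_pair = min(shared, key=shared.get)
    return mean(shared.values()), least_pair, recurrent


cohorts = {
    "MCM2": {"p1", "p2", "p3"},
    "MCM3": {"p2", "p3", "p4"},
    "MCM4": {"p3", "p4", "p5"},
}
avg, least_pair, recurrent = cohort_overlap_stats(cohorts)
print(least_pair, recurrent)  # ('MCM2', 'MCM4') {'p3'}
```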
             <p>The cohorts are also functionally coherent. Their probesets show strong enrichments related both to the role of the <it>MCM </it>complex and to the cancer-specific context of the analyzed datasets. g:Profiler reveals enrichment of generic terms such as cell cycle (GO, <it>P </it>&lt; 10<sup>-42</sup>) and DNA replication (GO, <it>P </it>&lt; 10<sup>-37</sup>), as well as more specific functions like DNA replication pre-initiation (Reactome, <it>P </it>&lt; 10<sup>-11</sup>) and DNA strand elongation (Reactome, <it>P </it>&lt; 10<sup>-21</sup>). The promoters of the coexpressed genes are enriched for the binding site of <it>E2F1</it>, a transcription factor with a recognized role in replication regulation and oncogenesis (for example, Transfac, M00427, consensus sequence TTTSGCGS, <it>P </it>&lt; 10<sup>-6</sup>) <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. The enrichment of the <it>P53 </it>pathway (KEGG, <it>P </it>= 10<sup>-4</sup>) suggests a link with this well-known tumor suppressor gene <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Moreover, the cohorts contain microRNAs and are enriched for microRNA target sites that may have cancer-specific roles. For instance, the coexpressed genes have a greater than expected proportion of target sites for the microRNA <it>miR-142-5p </it>(miRBase, <it>P </it>&lt; 10<sup>-4</sup>), a regulatory RNA that has been detected in the context of leukemia <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>.</p>
            <p>To investigate the advantage of MEM analysis for coexpression over multiple datasets, we conducted a computational experiment in which varying numbers of datasets were used to compute <it>MCM </it>cohorts (Figure <figr fid="F4">4</figr>). For each number of datasets from 2 to 125, we drew 300 random collections of input datasets from the above cancer compendium and measured the median distance between <it>MCM </it>subunits in the individual cohorts. As expected, adding more datasets to the MEM analysis brings <it>MCM </it>subunits closer together in the resulting ranked gene lists. According to one-sided Kolmogorov-Smirnov tests, MEM queries over several datasets always give significantly better results (that is, greater similarity between <it>MCM </it>subunits) than correlation over any of the individual datasets. The advantage of MEM analysis appears to increase exponentially with the number of analyzed datasets. Importantly, the MEM query over all 145 cancer-specific datasets yields a smaller median distance between <it>MCM </it>subunits (<it>m </it>= 17.5) than correlation over the concatenation of the corresponding datasets (<it>m </it>= 22.5).</p>
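            <p>To make the distance measure concrete, the following Python sketch computes the median position at which the other subunits of a complex appear in each subunit's ranked list. It assumes that "distance" means the 1-based position in a ranked list and that the query gene is omitted from its own list; the function name and the toy lists are hypothetical and are not MEM code or data.</p>
```python
from statistics import median


def median_subunit_distance(ranked_lists: dict[str, list[str]]) -> float:
    """Median 1-based position of each subunit in the ranked lists of the
    other subunits.

    ranked_lists maps every subunit (used as a query) to its genes ordered
    from most to least similar, with the query itself left out.
    """
    positions = []
    for query, ranking in ranked_lists.items():
        for other in ranked_lists:
            if other != query:
                # list.index raises ValueError if a subunit is missing
                positions.append(ranking.index(other) + 1)
    return median(positions)


# Invented lists for three hypothetical subunits A, B and C
toy = {
    "A": ["B", "x1", "C", "x2"],
    "B": ["A", "C", "x1", "x2"],
    "C": ["x1", "A", "B", "x2"],
}
print(median_subunit_distance(toy))  # 2.0
```
            <p>Repeating such a computation over many random collections of datasets gives a distribution of median distances for each collection size, which can then be compared with one-sided tests.</p>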
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>Increasing the number of datasets for MEM queries improves prediction of Mini Chromosome Maintenance (<it>MCM</it>) subunits</p></caption><text>
   <p>Increasing the number of datasets for MEM queries improves prediction of Mini Chromosome Maintenance (<it>MCM</it>) subunits. As additional datasets are incorporated for MEM analysis, <it>MCM </it>complex subunits show more consistent expression patterns as measured by median distance between subunits in MEM ranked lists of most correlated genes (decreasing bar height). According to one-sided Kolmogorov-Smirnov tests, MEM analysis with different numbers of datasets (left bars) significantly outperforms correlation (rightmost bar). In addition, MEM analysis for all the 145 selected datasets gives improved results compared to plain correlation across the concatenated dataset (light blue and orange lines).</p>
</text><graphic file="gb-2009-10-12-r139-4"/></fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>As the amount of publicly available microarray data grows, methods that extract useful information from multiple datasets become ever more valuable. However, without specialized tools, analyzing hundreds of datasets can be very labour-intensive. With the development of the MEM resource, we have solved many of the technical challenges and aim to make high-throughput coexpression mining accessible to a larger audience.</p>
         <p>MEM includes a large collection of up-to-date microarray datasets from the ArrayExpress database. We have developed a flexible strategy for coexpression analysis that puts great emphasis on selecting the most appropriate datasets for the query and uses a novel statistical algorithm to detect significant correlation patterns. Finally, MEM results are presented in an interactive graphical user interface that opens up several paths for further data analysis.</p>
         <p>Still, MEM analysis has some limitations and room for further development. Its main limitation is the lack of cross-platform similarity search, which stems from the complexity of mapping probesets between different platforms and the limited comparability of their normalizations. Fortunately, the number of different platforms for each model organism is relatively low, and the bulk of experiments is often available on a single platform. In a number of network reconstruction applications, one might be interested in the coexpression of units of multiple genes, such as protein complexes. Therefore, providing methods that allow comparison of groups of genes would be a natural development of MEM.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Rank aggregation</p>
            </st>
            <p>Rank aggregation is the heart of MEM coexpression analysis. It uses the statistical distribution of orderings to integrate individual lists of similar genes into final lists with significance <it>P</it>-values for each gene. The rank aggregation problem has been studied mainly in the context of voting and social choice, but there are also several bioinformatics applications, for example, <abbrgrp><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>.</p>
            <p>Most classical methods assume that each individual ranking is reasonable and should be taken into account in composing the final ordering. However, in gene coexpression analysis, some rankings contain considerable noise because they are derived from genes and conditions with low variation. To overcome this, we first identify reliable gene lists, that is, lists based on sufficient variation of the query gene, and then compute the rank aggregation over this limited set of lists.</p>
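            <p>A minimal Python sketch of this filtering step, assuming that a list counts as reliable when the query gene's standard deviation across the samples of a dataset exceeds a fixed cutoff. The cutoff 0.29 is the value selected in the simulation described under Standard deviation threshold selection; the use of the sample standard deviation, the function name and the toy dataset IDs are assumptions for illustration.</p>
```python
from statistics import stdev

SD_THRESHOLD = 0.29  # cutoff reported in the threshold simulation (Methods)


def reliable_datasets(query_profiles: dict[str, list[float]],
                      threshold: float = SD_THRESHOLD) -> list[str]:
    """IDs of datasets in which the query gene's sample standard deviation
    exceeds threshold; only these rankings enter the aggregation.

    query_profiles maps a dataset ID to the query gene's normalized
    expression values across that dataset's samples.
    """
    return [ds for ds, values in query_profiles.items()
            if len(values) > 1 and stdev(values) > threshold]


# Invented profiles: nearly flat in ds1, variable in ds2
profiles = {"ds1": [5.0, 5.1, 4.9], "ds2": [4.0, 6.0, 5.0]}
print(reliable_datasets(profiles))  # ['ds2']
```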
            <p>The input of rank aggregation is a collection of ordered lists, where every element in a list corresponds to a gene in a specific experiment and gives its rank of similarity to the query gene <it>g*</it>, relative to all other genes in the organism. We normalize the lists into the range [0,1] by dividing each individual rank by the maximal rank, that is, the number of genes on the microarray platform. We then transform the ranks so that for each gene <it>g</it><sub><it>i</it></sub> we have a rank vector <it>r</it>(<it>g*, g</it><sub><it>i</it></sub>) = [<inline-formula><graphic file="gb-2009-10-12-r139-i1.gif"/></inline-formula>, ..., <inline-formula><graphic file="gb-2009-10-12-r139-i2.gif"/></inline-formula>], where <inline-formula><graphic file="gb-2009-10-12-r139-i3.gif"/></inline-formula> corresponds to the position of <it>g</it><sub><it>i </it></sub>in the query on dataset <it>j</it>.</p>
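            <p>The normalization and the construction of rank vectors can be sketched in Python as follows. The sketch assumes that each dataset supplies a similarity score (for example, a correlation coefficient) between the query gene and every other gene, that larger scores mean greater similarity, and that ties are broken arbitrarily; the function names are hypothetical.</p>
```python
def normalized_ranks(similarity: dict[str, float]) -> dict[str, float]:
    """Rank genes by similarity to the query (rank 1 = most similar) and
    divide by the number of genes, giving values in (0, 1]."""
    n = len(similarity)
    ordered = sorted(similarity, key=lambda gene: similarity[gene], reverse=True)
    return {gene: (pos + 1) / n for pos, gene in enumerate(ordered)}


def rank_vectors(per_dataset: list[dict[str, float]]) -> dict[str, list[float]]:
    """Build r(g*, g_i) = [r_1, ..., r_m] for every gene over m datasets."""
    normalized = [normalized_ranks(sim) for sim in per_dataset]
    shared = set.intersection(*(set(d) for d in normalized))  # same platform
    return {gene: [d[gene] for d in normalized] for gene in sorted(shared)}


# Invented similarity scores to a query gene in two datasets
ds1 = {"a": 0.9, "b": 0.1, "c": 0.5, "d": -0.2}
ds2 = {"a": 0.3, "b": 0.8, "c": 0.6, "d": 0.0}
print(rank_vectors([ds1, ds2]))
# {'a': [0.25, 0.75], 'b': [0.75, 0.25], 'c': [0.5, 0.5], 'd': [1.0, 1.0]}
```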
            <p>A straightforward solution for rank aggregation is to reorder the genes <it>g</it><sub><it>i </it></sub>by the arithmetic mean of their individual ranks <it>r</it>(<it>g*, g</it><sub><it>i</it></sub>). Unfortunately, this approach is rather sensitive to noise, since the mean is heavily influenced by large ranks that indicate no strong correlation. The geometric mean is more sensitive to small ranks and robust to fluctuations among large, uninformative ranks. An alternative and empirically more successful approach uses a trimmed mean that only considers the <it>k </it>smallest elements, but it requires estimating the parameter <it>k</it>.</p>
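            <p>The toy comparison below, a Python sketch with invented rank vectors (smaller values mean stronger similarity), illustrates the difference between these averages. A gene ranked near the top in two of five datasets but uninformative elsewhere scores worse than a uniformly mediocre gene under the arithmetic mean, whereas the geometric mean and a trimmed mean with <it>k </it>= 2 prefer it. The choice <it>k </it>= 2 is arbitrary here.</p>
```python
from statistics import geometric_mean, mean


def trimmed_mean(ranks: list[float], k: int) -> float:
    """Mean of the k smallest normalized ranks."""
    return mean(sorted(ranks)[:k])


# Invented normalized rank vectors over five datasets
strong_in_few = [0.01, 0.02, 0.90, 0.80, 0.95]
mediocre_everywhere = [0.30, 0.30, 0.30, 0.30, 0.30]

for name, ranks in [("strong_in_few", strong_in_few),
                    ("mediocre_everywhere", mediocre_everywhere)]:
    # arithmetic mean, geometric mean, trimmed mean with k = 2
    print(name,
          round(mean(ranks), 3),
          round(geometric_mean(ranks), 3),
          round(trimmed_mean(ranks, 2), 3))
```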
            <p>We developed a statistical strategy for robust rank aggregation that overcomes the problems of mean-based methods and allows us to evaluate the statistical significance of the detected similarity. As the null hypothesis, we consider a model in which genes are ranked randomly, so that the elements of each rank vector <it>r</it>(<it>g*, g</it><sub><it>i</it></sub>) are approximately uniformly distributed. In the biological case of strong coexpression, we observe an unexpectedly large number of small ranks between genes with correlated expression patterns, so that the distribution of <it>r</it>(<it>g*, g</it><sub><it>i</it></sub>) is skewed towards small values and significantly different from a uniform distribution. We sort the rank vector <it>r</it>(<it>g*, g</it><sub><it>i</it></sub>) in increasing order to obtain the vector of order statistics <inline-formula><graphic file="gb-2009-10-12-r139-i4.gif"/></inline-formula>, ranging from the smallest to the largest value of <it>r</it>(<it>g*</it>, <it>g</it><sub><it>i</it></sub>). Under the null hypothesis, we can use the binomial distribution to calculate, for every <it>k</it>, the probability that <it>k </it>or more ranks are smaller than <inline-formula><graphic file="gb-2009-10-12-r139-i5.gif"/></inline-formula>:</p>
            <p>
               <display-formula id="M1">
                  <graphic file="gb-2009-10-12-r139-i6.gif"/>
               </display-formula>
            </p>
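            <p>Written out, the probability above is the upper tail of a binomial distribution. The expression below is a reconstruction from the description, assuming that <it>n </it>is the number of datasets in the rank vector and that, under the null hypothesis, each normalized rank is approximately uniform on [0,1], so that the number of ranks not exceeding <it>r</it><sub>(<it>k</it>)</sub> is binomially distributed with parameters <it>n </it>and <it>r</it><sub>(<it>k</it>)</sub>. The symbol <it>p</it><sub><it>k</it></sub> is introduced here for convenience.</p>
```latex
p_k = \sum_{l=k}^{n} \binom{n}{l}\, r_{(k)}^{\,l} \bigl(1 - r_{(k)}\bigr)^{n-l},
\qquad k = 1, \dots, n
```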
            <p>The final similarity score <it>&#961; </it>between <it>g</it>* and <it>g</it><sub><it>i </it></sub>is defined as follows:</p>
            <p>
               <display-formula id="M2">
                  <graphic file="gb-2009-10-12-r139-i7.gif"/>
               </display-formula>
            </p>
            <p>In other words, for every value of <it>k</it>, we compute the <it>P</it>-value of the order statistic <it>r</it><sub>(<it>k</it>) </sub>being as small as observed under the null hypothesis, and use the minimal <it>P</it>-value as the final score.</p>
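            <p>A minimal Python sketch of this score under the same uniform null assumption, using only the standard library: the binomial tail is summed directly with math.comb for each order statistic, and the smallest tail probability is returned together with the index <it>k </it>at which it occurs. The function names are hypothetical, and the sketch is not the MEM implementation.</p>
```python
from math import comb


def binomial_tail(k: int, n: int, r: float) -> float:
    """P(at least k of n independent Uniform(0, 1) values are at most r)."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rho_score(ranks: list[float]) -> tuple[float, int]:
    """Return (rho, k_star): the minimal binomial tail over the order
    statistics r_(1), ..., r_(n), and the 1-based k at which it occurs."""
    ordered = sorted(ranks)
    n = len(ordered)
    p_values = [binomial_tail(k, n, r) for k, r in enumerate(ordered, start=1)]
    best = min(range(n), key=p_values.__getitem__)
    return p_values[best], best + 1


rho, k_star = rho_score([0.01, 0.02, 0.90, 0.80, 0.95])
print(k_star, round(rho, 6))  # 2 0.003842
```
            <p>For this invented rank vector, the minimum is attained at the second order statistic: the probability that at least two of five uniform ranks fall at or below 0.02 is about 0.0038, which is smaller than the tail probability for any other <it>k</it>.</p>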
            <p>The final <it>&#961; </it>score itself is not a <it>P</it>-value, since it is a minimum of <it>P</it>-values. Still, we may use a multiple testing correction to remove false positives that arise from performing many tests. In calculating the <it>&#961; </it>scores for each gene, we effectively compute a <it>P</it>-value for each element of the rank matrix. According to the Bonferroni correction, an individual <it>P</it>-value is significant if, after multiplication by the number of elements of the rank matrix (its number of rows times its number of columns), it is smaller than the desired significance level. We cannot use a less stringent correction, since the <it>P</it>-values for the same gene are strongly correlated.</p>
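            <p>The Bonferroni rule above amounts to a one-line check. In this Python sketch, rows of the rank matrix are assumed to correspond to genes and columns to datasets, and the significance level of 0.05 is an illustrative default rather than a value taken from MEM; the function name is hypothetical.</p>
```python
def bonferroni_significant(rho: float, n_genes: int, n_datasets: int,
                           alpha: float = 0.05) -> bool:
    """True if rho, multiplied by the number of rank-matrix elements
    (genes x datasets), stays below the significance level alpha."""
    return alpha > rho * n_genes * n_datasets


# Invented score: 1e-9 * 20000 * 100 = 0.002, below alpha = 0.05
print(bonferroni_significant(1e-9, 20000, 100))  # True
```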
            <p>As a byproduct of the above computation, we gain information about the datasets that contain significant coexpression between any two genes. A dataset with a ranking <inline-formula><graphic file="gb-2009-10-12-r139-i3.gif"/></inline-formula> that is smaller than the ranking that gave rise to <it>&#961;</it>(<it>g*</it>, <it>g</it><sub><it>i</it></sub>) can be considered significant. This allows us to highlight the contributions of different datasets to the final similarity ranking and to observe interesting patterns among related datasets. The score <it>&#961; </it>also has the advantage of being non-parametric, as it makes no assumptions about the number of input datasets or the magnitude of relevant ranks. In a sense, the <it>&#961; </it>score strikes a natural balance between two scenarios: a gene that strongly correlates with the query gene in a small number of samples, and a gene that shows weak correlation in a large range of samples.</p>
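            <p>Given the order statistic at which <it>&#961; </it>attains its minimum, flagging the contributing datasets is a simple filter. The Python sketch below assumes that the dataset holding that order statistic is itself counted (rank at most the cutoff); the function name, dataset IDs and rank values are invented. For these values, the minimal binomial tail is attained at the second order statistic, 0.02, which is used as the cutoff.</p>
```python
def contributing_datasets(ranks: dict[str, float], cutoff: float) -> list[str]:
    """IDs of datasets whose normalized rank of the gene does not exceed
    cutoff, the order statistic at which rho attains its minimum."""
    return [ds for ds, r in ranks.items() if cutoff >= r]


ranks = {"ds1": 0.01, "ds2": 0.02, "ds3": 0.90, "ds4": 0.80, "ds5": 0.95}
print(contributing_datasets(ranks, cutoff=0.02))  # ['ds1', 'ds2']
```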
         </sec>
         <sec>
            <st>
               <p>Microarray data</p>
            </st>
            <p>All data used in the analyses were obtained from ArrayExpress, including datasets that were originally submitted to GEO. We only included Affymetrix datasets for which raw data were available, and performed a uniform Robust Multi-array Average (RMA) normalization <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> with the Bioconductor <it>affy </it>package <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> using the default parameters. MEM also includes biological annotations of the datasets, recorded according to the Minimum Information About a Microarray Experiment (MIAME) standard <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. These annotations are used for building word clouds and annotation tracks in the heat map visualization of gene expression data.</p>
         </sec>
         <sec>
            <st>
               <p>Standard deviation threshold selection</p>
            </st>
            <p>We performed a simulation study to find the threshold for query gene variation that best identifies the datasets in which the gene has meaningful expression patterns. All experiments in MEM are normalized and preprocessed in the same way, so a single threshold can be applied to all datasets. In the simulation, we chose random sets of 2000 genes and 140 experiments on the human Affymetrix platform HG-U133A, and calculated the standard deviation of each gene in each experiment. We also performed a MEM query with each of the genes, using a similarity score cutoff that yielded on average 20 genes per query. We then tried several thresholds for the standard deviation and, for each threshold, calculated the correlation between the number of experiments exceeding the threshold and the number of genes in the query result. The strongest coexpression patterns between the query genes and the resulting genes were obtained with standard deviation cutoffs between 0.25 and 0.39, with peak performance at a threshold of 0.29 (Additional file 4).</p>
         </sec>
         <sec>
            <st>
               <p>Dataset annotation word cloud</p>
            </st>
            <p>MEM uses word clouds to display aggregated annotations of multiple datasets. As a first step in generating a word cloud, we process the textual annotations of each dataset to extract words and multi-word expressions. Of all the words in a dataset description, we keep only nouns, adjectives and some other words matching predefined patterns. The selected words are then normalized to ignore inflected forms (for example, gene and genes) using the WordNet lemmatizer <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. Besides single words, we also extract noun and adjective phrases. Syntactic analysis is performed using the MedPost part-of-speech tagger <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>.</p>
            <p>Next, for a given group of datasets, we determine a set of descriptive terms (words and phrases) that are over-represented in this group compared to all available datasets. We use the hypergeometric <it>P</it>-value to identify such group-specific terms. The word cloud is then composed of the terms with the lowest <it>P</it>-values, and the font size of each term reflects the extent of its over-representation in the corresponding group of datasets.</p>
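            <p>A minimal Python sketch of such a hypergeometric <it>P</it>-value for a single term. It assumes the population is the set of all datasets in the resource, with each dataset counted once whether or not its description mentions the term several times; the function name and numbers are hypothetical.</p>
```python
from math import comb


def term_enrichment_p(k: int, group_size: int,
                      n_with_term: int, n_total: int) -> float:
    """Hypergeometric upper tail P(X >= k).

    X counts datasets annotated with a term when group_size datasets are
    drawn without replacement from n_total datasets, n_with_term of which
    carry the term.
    """
    top = min(group_size, n_with_term)
    hits = sum(comb(n_with_term, i) * comb(n_total - n_with_term, group_size - i)
               for i in range(k, top + 1))
    return hits / comb(n_total, group_size)


# Invented example: 4 of 10 selected datasets mention a term that
# appears in 20 of 1000 datasets overall
print(term_enrichment_p(4, 10, 20, 1000))
```
            <p>Terms would then be sorted by this value, smallest first, to choose the contents of the cloud.</p>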
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ES: embryonic stem; GEO: gene expression omnibus; GO: gene ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; MCL: Markov cluster; MCM: mini chromosome maintenance; MEM: multi experiment matrix; MIAME: minimum information about a microarray experiment; ORC: origin recognition complex; preRC: pre-replicative complex; RMA: robust multi-array average.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>PA and MK implemented the resource. RK and PA developed the methods for the query. AT provided the annotation word clouds. PA, RK and JR performed the case studies. RK and JR drafted the manuscript. JV and HP conceived the study and provided general guidance. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Additional files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional file <supplr sid="S1">1</supplr> is a table listing the datasets used for network reconstruction. The datasets were all on the mouse platform Affymetrix U74Av2. In addition, the analysis included an unpublished dataset that is not available in any database. Additional file <supplr sid="S2">2</supplr> is a table listing the datasets used for the MCM complex study. Additional file <supplr sid="S3">3</supplr> is a table listing the 116 genes that occur in more than two of the six cohorts of subunits MCM1-MCM6, where each cohort contains the 100 probesets most correlated with the corresponding subunit. Additional file <supplr sid="S4">4</supplr> is a figure describing the selection of the standard deviation cutoff. The figure shows the correlation between the number of significant query results and the number of datasets in which the query gene's standard deviation exceeds a given threshold. The maximal correlation is achieved at a threshold of 0.29.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <caption>
               <p>A table listing datasets used for network reconstruction</p>
            </caption>
            <text>
               <p>The datasets were all on the mouse platform Affymetrix U74Av2. In addition, the analysis included an unpublished dataset that is not available in any database.</p>
            </text>
            <file name="gb-2009-10-12-r139-S1.xls">
</file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional file 2</p>
            </title>
            <caption>
               <p>A table listing datasets used for MCM complex study</p>
            </caption>
            <text>
               <p>A table listing datasets used for MCM complex study.</p>
            </text>
            <file name="gb-2009-10-12-r139-S2.xls">
</file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional file 3</p>
            </title>
            <caption>
               <p>A table listing the 116 genes that occur in more than two of the six cohorts of subunits MCM1-MCM6</p>
            </caption>
            <text>
               <p>A table listing the 116 genes that occur in more than two of the six cohorts of subunits MCM1-MCM6, where each cohort contains the 100 probesets most correlated with the corresponding subunit.</p>
            </text>
            <file name="gb-2009-10-12-r139-S3.xls">
</file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional file 4</p>
            </title>
            <caption>
               <p>A figure describing the selection of standard deviation cutoff</p>
            </caption>
            <text>
               <p>The figure shows the correlation between the number of significant query results and the number of datasets in which the query gene's standard deviation exceeds a given threshold. The maximal correlation is achieved at a threshold of 0.29.</p>
            </text>
            <file name="gb-2009-10-12-r139-S4.pdf">
</file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors wish to thank Tambet Arak for technical ingenuity and support, Sven Laur for proofreading, Toomas Neuman for the initial biological setup and Misha Kapushesky for help with the ArrayExpress data download. Financial support was provided by EU FP6 grants (ENFIN LSHG-CT-2005-518254 and COBRED LSHB-CT-2007-037730), ERDF through the Estonian Centre of Excellence in Computer Science project and Estonian Science Foundation ETF7427. JR acknowledges funding from Ustus Agur and Artur Lind foundations.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.</p></title><aug><au><snm>Brazma</snm><fnm>A</fnm></au><au><snm>Hingamp</snm><fnm>P</fnm></au><au><snm>Quackenbush</snm><fnm>J</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Spellman</snm><fnm>P</fnm></au><au><snm>Stoeckert</snm><fnm>C</fnm></au><au><snm>Aach</snm><fnm>J</fnm></au><au><snm>Ansorge</snm><fnm>W</fnm></au><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Causton</snm><fnm>HC</fnm></au><au><snm>Gaasterland</snm><fnm>T</fnm></au><au><snm>Glenisson</snm><fnm>P</fnm></au><au><snm>Holstege</snm><fnm>FC</fnm></au><au><snm>Kim</snm><fnm>IF</fnm></au><au><snm>Markowitz</snm><fnm>V</fnm></au><au><snm>Matese</snm><fnm>JC</fnm></au><au><snm>Parkinson</snm><fnm>H</fnm></au><au><snm>Robinson</snm><fnm>A</fnm></au><au><snm>Sarkans</snm><fnm>U</fnm></au><au><snm>Schulze-Kremer</snm><fnm>S</fnm></au><au><snm>Stewart</snm><fnm>J</fnm></au><au><snm>Taylor</snm><fnm>R</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Vingron</snm><fnm>M</fnm></au></aug><source>Nat Genet</source><pubdate>2001</pubdate><volume>29</volume><fpage>365</fpage><lpage>371</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1201-365</pubid><pubid idtype="pmpid" link="fulltext">11726920</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Quantitative monitoring of gene expression patterns with a complementary DNA microarray.</p></title><aug><au><snm>Schena</snm><fnm>M</fnm></au><au><snm>Shalon</snm><fnm>D</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Science</source><pubdate>1995</pubdate><volume>270</volume><fpage>467</fpage><lpage>470</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.270.5235.467</pubid><pubid idtype="pmpid" link="fulltext">7569999</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Gene expression profiles in normal and cancer 
cells.</p></title><aug><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Zhou</snm><fnm>W</fnm></au><au><snm>Velculescu</snm><fnm>VE</fnm></au><au><snm>Kern</snm><fnm>SE</fnm></au><au><snm>Hruban</snm><fnm>RH</fnm></au><au><snm>Hamilton</snm><fnm>SR</fnm></au><au><snm>Vogelstein</snm><fnm>B</fnm></au><au><snm>Kinzler</snm><fnm>KW</fnm></au></aug><source>Science</source><pubdate>1997</pubdate><volume>276</volume><fpage>1268</fpage><lpage>1272</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.276.5316.1268</pubid><pubid idtype="pmpid" link="fulltext">9157888</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer.</p></title><aug><au><snm>Welsh</snm><fnm>JB</fnm></au><au><snm>Sapinoso</snm><fnm>LM</fnm></au><au><snm>Su</snm><fnm>AI</fnm></au><au><snm>Kern</snm><fnm>SG</fnm></au><au><snm>Wang-Rodriguez</snm><fnm>J</fnm></au><au><snm>Moskaluk</snm><fnm>CA</fnm></au><au><snm>Frierson</snm><fnm>HF</fnm></au><au><snm>Hampton</snm><fnm>GM</fnm></au></aug><source>Cancer Res</source><pubdate>2001</pubdate><volume>61</volume><fpage>5974</fpage><lpage>5978</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11507037</pubid></xrefbib></bibl><bibl id="B5"><title><p>Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data.</p></title><aug><au><snm>Segal</snm><fnm>E</fnm></au><au><snm>Shapira</snm><fnm>M</fnm></au><au><snm>Regev</snm><fnm>A</fnm></au><au><snm>Pe'er</snm><fnm>D</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au><au><snm>Friedman</snm><fnm>N</fnm></au></aug><source>Nat Genet</source><pubdate>2003</pubdate><volume>34</volume><fpage>166</fpage><lpage>176</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1165</pubid><pubid idtype="pmpid" link="fulltext">12740579</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Comprehensive identification of 
cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.</p></title><aug><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Zhang</snm><fnm>MQ</fnm></au><au><snm>Iyer</snm><fnm>VR</fnm></au><au><snm>Anders</snm><fnm>K</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Futcher</snm><fnm>B</fnm></au></aug><source>Mol Biol Cell</source><pubdate>1998</pubdate><volume>9</volume><fpage>3273</fpage><lpage>3297</lpage><xrefbib><pubidlist><pubid idtype="pmcid">25624</pubid><pubid idtype="pmpid" link="fulltext">9843569</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Microarray data analysis: from disarray to consolidation and consensus.</p></title><aug><au><snm>Allison</snm><fnm>DB</fnm></au><au><snm>Cui</snm><fnm>X</fnm></au><au><snm>Page</snm><fnm>GP</fnm></au><au><snm>Sabripour</snm><fnm>M</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2006</pubdate><volume>7</volume><fpage>55</fpage><lpage>65</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1749</pubid><pubid idtype="pmpid" link="fulltext">16369572</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks.</p></title><aug><au><snm>Wolfe</snm><fnm>CJ</fnm></au><au><snm>Kohane</snm><fnm>IS</fnm></au><au><snm>Butte</snm><fnm>AJ</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2005</pubdate><volume>6</volume><fpage>227</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-6-227</pubid><pubid idtype="pmcid">1239911</pubid><pubid idtype="pmpid" link="fulltext">16162296</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Functional discovery via a compendium of expression 
profiles.</p></title><aug><au><snm>Hughes</snm><fnm>TR</fnm></au><au><snm>Marton</snm><fnm>MJ</fnm></au><au><snm>Jones</snm><fnm>AR</fnm></au><au><snm>Roberts</snm><fnm>CJ</fnm></au><au><snm>Stoughton</snm><fnm>R</fnm></au><au><snm>Armour</snm><fnm>CD</fnm></au><au><snm>Bennett</snm><fnm>HA</fnm></au><au><snm>Coffey</snm><fnm>E</fnm></au><au><snm>Dai</snm><fnm>H</fnm></au><au><snm>He</snm><fnm>YD</fnm></au><au><snm>Kidd</snm><fnm>MJ</fnm></au><au><snm>King</snm><fnm>AM</fnm></au><au><snm>Meyer</snm><fnm>MR</fnm></au><au><snm>Slade</snm><fnm>D</fnm></au><au><snm>Lum</snm><fnm>PY</fnm></au><au><snm>Stepaniants</snm><fnm>SB</fnm></au><au><snm>Shoemaker</snm><fnm>DD</fnm></au><au><snm>Gachotte</snm><fnm>D</fnm></au><au><snm>Chakraburtty</snm><fnm>K</fnm></au><au><snm>Simon</snm><fnm>J</fnm></au><au><snm>Bard</snm><fnm>M</fnm></au><au><snm>Friend</snm><fnm>SH</fnm></au></aug><source>Cell</source><pubdate>2000</pubdate><volume>102</volume><fpage>109</fpage><lpage>126</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(00)00015-5</pubid><pubid idtype="pmpid" link="fulltext">10929718</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>A gene-coexpression network for global discovery of conserved genetic modules.</p></title><aug><au><snm>Stuart</snm><fnm>JM</fnm></au><au><snm>Segal</snm><fnm>E</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au><au><snm>Kim</snm><fnm>SK</fnm></au></aug><source>Science</source><pubdate>2003</pubdate><volume>302</volume><fpage>249</fpage><lpage>255</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1087447</pubid><pubid idtype="pmpid" link="fulltext">12934013</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Identification of novel pathway partners of p68 and p72 RNA helicases through Oncomine meta-analysis.</p></title><aug><au><snm>Wilson</snm><fnm>BJ</fnm></au><au><snm>Gigu&#232;re</snm><fnm>V</fnm></au></aug><source>BMC 
Genomics</source><pubdate>2007</pubdate><volume>8</volume><fpage>419</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-8-419</pubid><pubid idtype="pmpid" link="fulltext">18005418</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Reverse engineering of regulatory networks in human B cells.</p></title><aug><au><snm>Basso</snm><fnm>K</fnm></au><au><snm>Margolin</snm><fnm>AA</fnm></au><au><snm>Stolovitzky</snm><fnm>G</fnm></au><au><snm>Klein</snm><fnm>U</fnm></au><au><snm>Dalla-Favera</snm><fnm>R</fnm></au><au><snm>Califano</snm><fnm>A</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>382</fpage><lpage>390</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1532</pubid><pubid idtype="pmpid" link="fulltext">15778709</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Probabilistic model of the human protein-protein interaction network.</p></title><aug><au><snm>Rhodes</snm><fnm>DR</fnm></au><au><snm>Tomlins</snm><fnm>SA</fnm></au><au><snm>Varambally</snm><fnm>S</fnm></au><au><snm>Mahavisno</snm><fnm>V</fnm></au><au><snm>Barrette</snm><fnm>T</fnm></au><au><snm>Kalyana-Sundaram</snm><fnm>S</fnm></au><au><snm>Ghosh</snm><fnm>D</fnm></au><au><snm>Pandey</snm><fnm>A</fnm></au><au><snm>Chinnaiyan</snm><fnm>AM</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2005</pubdate><volume>23</volume><fpage>951</fpage><lpage>959</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1103</pubid><pubid idtype="pmpid" link="fulltext">16082366</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Protein interaction verification and functional annotation by integrated analysis of genome-scale data.</p></title><aug><au><snm>Kemmeren</snm><fnm>P</fnm></au><au><snm>van 
Berkum</snm><fnm>NL</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Bijma</snm><fnm>T</fnm></au><au><snm>Donders</snm><fnm>R</fnm></au><au><snm>Brazma</snm><fnm>A</fnm></au><au><snm>Holstege</snm><fnm>FCP</fnm></au></aug><source>Mol Cell</source><pubdate>2002</pubdate><volume>9</volume><fpage>1133</fpage><lpage>1143</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1097-2765(02)00531-2</pubid><pubid idtype="pmpid" link="fulltext">12049748</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Predicting tissue-specific enhancers in the human genome.</p></title><aug><au><snm>Pennacchio</snm><fnm>LA</fnm></au><au><snm>Loots</snm><fnm>GG</fnm></au><au><snm>Nobrega</snm><fnm>MA</fnm></au><au><snm>Ovcharenko</snm><fnm>I</fnm></au></aug><source>Genome Res</source><pubdate>2007</pubdate><volume>17</volume><fpage>201</fpage><lpage>211</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.5972507</pubid><pubid idtype="pmcid">1781352</pubid><pubid idtype="pmpid" link="fulltext">17210927</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Predicting gene regulatory elements in silico on a genomic scale.</p></title><aug><au><snm>Brazma</snm><fnm>A</fnm></au><au><snm>Jonassen</snm><fnm>I</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Ukkonen</snm><fnm>E</fnm></au></aug><source>Genome Res</source><pubdate>1998</pubdate><volume>8</volume><fpage>1202</fpage><lpage>1215</lpage><xrefbib><pubidlist><pubid idtype="pmcid">310790</pubid><pubid idtype="pmpid" link="fulltext">9847082</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Ranking genes by their co-expression to subsets of pathway members.</p></title><aug><au><snm>Adler</snm><fnm>P</fnm></au><au><snm>Peterson</snm><fnm>H</fnm></au><au><snm>Agius</snm><fnm>P</fnm></au><au><snm>Reimand</snm><fnm>J</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au></aug><source>Ann NY Acad 
Sci</source><pubdate>2009</pubdate><volume>1158</volume><fpage>1</fpage><lpage>13</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1749-6632.2008.03747.x</pubid><pubid idtype="pmpid" link="fulltext">19348627</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>ArrayExpress - a public database of microarray experiments and gene expression profiles.</p></title><aug><au><snm>Parkinson</snm><fnm>H</fnm></au><au><snm>Kapushesky</snm><fnm>M</fnm></au><au><snm>Shojatalab</snm><fnm>M</fnm></au><au><snm>Abeygunawardena</snm><fnm>N</fnm></au><au><snm>Coulson</snm><fnm>R</fnm></au><au><snm>Farne</snm><fnm>A</fnm></au><au><snm>Holloway</snm><fnm>E</fnm></au><au><snm>Kolesnykov</snm><fnm>N</fnm></au><au><snm>Lilja</snm><fnm>P</fnm></au><au><snm>Lukk</snm><fnm>M</fnm></au><au><snm>Mani</snm><fnm>R</fnm></au><au><snm>Rayner</snm><fnm>T</fnm></au><au><snm>Sharma</snm><fnm>A</fnm></au><au><snm>William</snm><fnm>E</fnm></au><au><snm>Sarkans</snm><fnm>U</fnm></au><au><snm>Brazma</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D747</fpage><lpage>D750</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl995</pubid><pubid idtype="pmcid">1716725</pubid><pubid idtype="pmpid" link="fulltext">17132828</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>NCBI GEO: mining tens of millions of expression profiles - database and tools update.</p></title><aug><au><snm>Barrett</snm><fnm>T</fnm></au><au><snm>Troup</snm><fnm>DB</fnm></au><au><snm>Wilhite</snm><fnm>SE</fnm></au><au><snm>Ledoux</snm><fnm>P</fnm></au><au><snm>Rudnev</snm><fnm>D</fnm></au><au><snm>Evangelista</snm><fnm>C</fnm></au><au><snm>Kim</snm><fnm>IF</fnm></au><au><snm>Soboleva</snm><fnm>A</fnm></au><au><snm>Tomashevsky</snm><fnm>M</fnm></au><au><snm>Edgar</snm><fnm>R</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D760</fpage><lpage>D765</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/nar/gkl887</pubid><pubid idtype="pmcid">1669752</pubid><pubid idtype="pmpid" link="fulltext">17099226</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Coexpression analysis of human genes across many microarray data sets.</p></title><aug><au><snm>Lee</snm><fnm>HK</fnm></au><au><snm>Hsu</snm><fnm>AK</fnm></au><au><snm>Sajdak</snm><fnm>J</fnm></au><au><snm>Qin</snm><fnm>J</fnm></au><au><snm>Pavlidis</snm><fnm>P</fnm></au></aug><source>Genome Res</source><pubdate>2004</pubdate><volume>14</volume><fpage>1085</fpage><lpage>1094</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.1910904</pubid><pubid idtype="pmcid">419787</pubid><pubid idtype="pmpid" link="fulltext">15173114</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>A scalable method for integration and functional analysis of multiple microarray datasets.</p></title><aug><au><snm>Huttenhower</snm><fnm>C</fnm></au><au><snm>Hibbs</snm><fnm>M</fnm></au><au><snm>Myers</snm><fnm>C</fnm></au><au><snm>Troyanskaya</snm><fnm>OG</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>2890</fpage><lpage>2897</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl492</pubid><pubid idtype="pmpid" link="fulltext">17005538</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Exploring the functional landscape of gene expression: directed search of large microarray compendia.</p></title><aug><au><snm>Hibbs</snm><fnm>MA</fnm></au><au><snm>Hess</snm><fnm>DC</fnm></au><au><snm>Myers</snm><fnm>CL</fnm></au><au><snm>Huttenhower</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>K</fnm></au><au><snm>Troyanskaya</snm><fnm>OG</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><fpage>2692</fpage><lpage>2699</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm403</pubid><pubid idtype="pmpid" link="fulltext">17724061</pubid></pubidlist></xrefbib></bibl><bibl 
id="B23"><title><p>g:Profiler - a web-based toolset for functional profiling of gene lists from large-scale experiments.</p></title><aug><au><snm>Reimand</snm><fnm>J</fnm></au><au><snm>Kull</snm><fnm>M</fnm></au><au><snm>Peterson</snm><fnm>H</fnm></au><au><snm>Hansen</snm><fnm>J</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>W193</fpage><lpage>W200</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm226</pubid><pubid idtype="pmcid">1933153</pubid><pubid idtype="pmpid" link="fulltext">17478515</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Ensembl 2009.</p></title><aug><au><snm>Hubbard</snm><fnm>TJP</fnm></au><au><snm>Aken</snm><fnm>BL</fnm></au><au><snm>Ayling</snm><fnm>S</fnm></au><au><snm>Ballester</snm><fnm>B</fnm></au><au><snm>Beal</snm><fnm>K</fnm></au><au><snm>Bragin</snm><fnm>E</fnm></au><au><snm>Brent</snm><fnm>S</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Clapham</snm><fnm>P</fnm></au><au><snm>Clarke</snm><fnm>L</fnm></au><au><snm>Coates</snm><fnm>G</fnm></au><au><snm>Fairley</snm><fnm>S</fnm></au><au><snm>Fitzgerald</snm><fnm>S</fnm></au><au><snm>Fernandez-Banet</snm><fnm>J</fnm></au><au><snm>Gordon</snm><fnm>L</fnm></au><au><snm>Graf</snm><fnm>S</fnm></au><au><snm>Haider</snm><fnm>S</fnm></au><au><snm>Hammond</snm><fnm>M</fnm></au><au><snm>Holland</snm><fnm>R</fnm></au><au><snm>Howe</snm><fnm>K</fnm></au><au><snm>Jenkinson</snm><fnm>A</fnm></au><au><snm>Johnson</snm><fnm>N</fnm></au><au><snm>Kahari</snm><fnm>A</fnm></au><au><snm>Keefe</snm><fnm>D</fnm></au><au><snm>Keenan</snm><fnm>S</fnm></au><au><snm>Kinsella</snm><fnm>R</fnm></au><au><snm>Kokocinski</snm><fnm>F</fnm></au><au><snm>Kulesha</snm><fnm>E</fnm></au><au><snm>Lawson</snm><fnm>D</fnm></au><au><snm>Longden</snm><fnm>I</fnm></au><etal/></aug><source>Nucleic Acids 
Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D690</fpage><lpage>D697</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn828</pubid><pubid idtype="pmcid">2686571</pubid><pubid idtype="pmpid" link="fulltext">19033362</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>A robust measure of correlation between two genes on a microarray.</p></title><aug><au><snm>Hardin</snm><fnm>J</fnm></au><au><snm>Mitani</snm><fnm>A</fnm></au><au><snm>Hicks</snm><fnm>L</fnm></au><au><snm>Vankoten</snm><fnm>B</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><fpage>220</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-220</pubid><pubid idtype="pmcid">1929126</pubid><pubid idtype="pmpid" link="fulltext">17592643</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>MicroRNA target prediction by expression analysis of host genes.</p></title><aug><au><snm>Gennarino</snm><fnm>VA</fnm></au><au><snm>Sardiello</snm><fnm>M</fnm></au><au><snm>Avellino</snm><fnm>R</fnm></au><au><snm>Meola</snm><fnm>N</fnm></au><au><snm>Maselli</snm><fnm>V</fnm></au><au><snm>Anand</snm><fnm>S</fnm></au><au><snm>Cutillo</snm><fnm>L</fnm></au><au><snm>Ballabio</snm><fnm>A</fnm></au><au><snm>Banfi</snm><fnm>S</fnm></au></aug><source>Genome Res</source><pubdate>2009</pubdate><volume>19</volume><fpage>481</fpage><lpage>490</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.084129.108</pubid><pubid idtype="pmcid">2661810</pubid><pubid idtype="pmpid" link="fulltext">19088304</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Expression profiler.</p></title><aug><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Kapushesky</snm><fnm>M</fnm></au><au><snm>Kemmeren</snm><fnm>P</fnm></au><au><snm>Sarkans</snm><fnm>U</fnm></au><au><snm>Brazma</snm><fnm>A</fnm></au></aug><source>The Analysis of Gene Expression Data: Methods and Software</source><publisher>New York: 
Springer</publisher><pubdate>2003</pubdate></bibl><bibl id="B28"><title><p>Core transcriptional regulatory circuitry in human embryonic stem cells.</p></title><aug><au><snm>Boyer</snm><fnm>LA</fnm></au><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Cole</snm><fnm>MF</fnm></au><au><snm>Johnstone</snm><fnm>SE</fnm></au><au><snm>Levine</snm><fnm>SS</fnm></au><au><snm>Zucker</snm><fnm>JP</fnm></au><au><snm>Guenther</snm><fnm>MG</fnm></au><au><snm>Kumar</snm><fnm>RM</fnm></au><au><snm>Murray</snm><fnm>HL</fnm></au><au><snm>Jenner</snm><fnm>RG</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Melton</snm><fnm>DA</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Cell</source><pubdate>2005</pubdate><volume>122</volume><fpage>947</fpage><lpage>956</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2005.08.020</pubid><pubid idtype="pmpid" link="fulltext">16153702</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells.</p></title><aug><au><snm>Loh</snm><fnm>YH</fnm></au><au><snm>Wu</snm><fnm>Q</fnm></au><au><snm>Chew</snm><fnm>JL</fnm></au><au><snm>Vega</snm><fnm>VB</fnm></au><au><snm>Zhang</snm><fnm>W</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au><au><snm>Bourque</snm><fnm>G</fnm></au><au><snm>George</snm><fnm>J</fnm></au><au><snm>Leong</snm><fnm>B</fnm></au><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Wong</snm><fnm>KY</fnm></au><au><snm>Sung</snm><fnm>KW</fnm></au><au><snm>Lee</snm><fnm>CWH</fnm></au><au><snm>Zhao</snm><fnm>XD</fnm></au><au><snm>Chiu</snm><fnm>KP</fnm></au><au><snm>Lipovich</snm><fnm>L</fnm></au><au><snm>Kuznetsov</snm><fnm>VA</fnm></au><au><snm>Robson</snm><fnm>P</fnm></au><au><snm>Stanton</snm><fnm>LW</fnm></au><au><snm>Wei</snm><fnm>CL</fnm></au><au><snm>Ruan</snm><fnm>Y</fnm></au><au><snm>Lim</snm><fnm>B</fnm></au><au><snm>Ng</snm><fnm>HH</fnm></au></aug><source>Nat 
Genet</source><pubdate>2006</pubdate><volume>38</volume><fpage>431</fpage><lpage>440</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1760</pubid><pubid idtype="pmpid" link="fulltext">16518401</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>UTF1, a novel transcriptional coactivator expressed in pluripotent embryonic stem cells and extra-embryonic cells.</p></title><aug><au><snm>Okuda</snm><fnm>A</fnm></au><au><snm>Fukushima</snm><fnm>A</fnm></au><au><snm>Nishimoto</snm><fnm>M</fnm></au><au><snm>Orimo</snm><fnm>A</fnm></au><au><snm>Yamagishi</snm><fnm>T</fnm></au><au><snm>Nabeshima</snm><fnm>Y</fnm></au><au><snm>Kuro-o</snm><fnm>M</fnm></au><au><snm>i Nabeshima</snm><fnm>Y</fnm></au><au><snm>Boon</snm><fnm>K</fnm></au><au><snm>Keaveney</snm><fnm>M</fnm></au><au><snm>Stunnenberg</snm><fnm>HG</fnm></au><au><snm>Muramatsu</snm><fnm>M</fnm></au></aug><source>EMBO J</source><pubdate>1998</pubdate><volume>17</volume><fpage>2019</fpage><lpage>2032</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/emboj/17.7.2019</pubid><pubid idtype="pmcid">1170547</pubid><pubid idtype="pmpid" link="fulltext">9524124</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Dppa2 and Dppa4 are closely linked SAP motif genes restricted to pluripotent cells and the germ line.</p></title><aug><au><snm>Maldonado-Saldivia</snm><fnm>J</fnm></au><au><snm>Bergen</snm><mnm>van den </mnm><fnm>J</fnm></au><au><snm>Krouskos</snm><fnm>M</fnm></au><au><snm>Gilchrist</snm><fnm>M</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>R</fnm></au><au><snm>Sinclair</snm><fnm>AH</fnm></au><au><snm>Surani</snm><fnm>MA</fnm></au><au><snm>Western</snm><fnm>PS</fnm></au></aug><source>Stem Cells</source><pubdate>2007</pubdate><volume>25</volume><fpage>19</fpage><lpage>28</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1634/stemcells.2006-0269</pubid><pubid idtype="pmpid" link="fulltext">16990585</pubid></pubidlist></xrefbib></bibl><bibl 
id="B32"><title><p>TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells.</p></title><aug><au><snm>James</snm><fnm>D</fnm></au><au><snm>Levine</snm><fnm>AJ</fnm></au><au><snm>Besser</snm><fnm>D</fnm></au><au><snm>Hemmati-Brivanlou</snm><fnm>A</fnm></au></aug><source>Development</source><pubdate>2005</pubdate><volume>132</volume><fpage>1273</fpage><lpage>1282</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1242/dev.01706</pubid><pubid idtype="pmpid" link="fulltext">15703277</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Nodal-dependent Cripto signaling promotes cardiomyogenesis and redirects the neural fate of embryonic stem cells.</p></title><aug><au><snm>Parisi</snm><fnm>S</fnm></au><au><snm>D'Andrea</snm><fnm>D</fnm></au><au><snm>Lago</snm><fnm>CT</fnm></au><au><snm>Adamson</snm><fnm>ED</fnm></au><au><snm>Persico</snm><fnm>MG</fnm></au><au><snm>Minchiotti</snm><fnm>G</fnm></au></aug><source>J Cell Biol</source><pubdate>2003</pubdate><volume>163</volume><fpage>303</fpage><lpage>314</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1083/jcb.200303010</pubid><pubid idtype="pmcid">2173524</pubid><pubid idtype="pmpid" link="fulltext">14581455</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>The Vg1-related protein Gdf3 acts in a Nodal signaling pathway in the pre-gastrulation mouse embryo.</p></title><aug><au><snm>Chen</snm><fnm>C</fnm></au><au><snm>Ware</snm><fnm>SM</fnm></au><au><snm>Sato</snm><fnm>A</fnm></au><au><snm>Houston-Hawkins</snm><fnm>DE</fnm></au><au><snm>Habas</snm><fnm>R</fnm></au><au><snm>Matzuk</snm><fnm>MM</fnm></au><au><snm>Shen</snm><fnm>MM</fnm></au><au><snm>Brown</snm><fnm>CW</fnm></au></aug><source>Development</source><pubdate>2006</pubdate><volume>133</volume><fpage>319</fpage><lpage>329</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1242/dev.02210</pubid><pubid idtype="pmpid" link="fulltext">16368929</pubid></pubidlist></xrefbib></bibl><bibl 
id="B35"><title><p>Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data.</p></title><aug><au><snm>Sharov</snm><fnm>AA</fnm></au><au><snm>Masui</snm><fnm>S</fnm></au><au><snm>Sharova</snm><fnm>LV</fnm></au><au><snm>Piao</snm><fnm>Y</fnm></au><au><snm>Aiba</snm><fnm>K</fnm></au><au><snm>Matoba</snm><fnm>R</fnm></au><au><snm>Xin</snm><fnm>L</fnm></au><au><snm>Niwa</snm><fnm>H</fnm></au><au><snm>Ko</snm><fnm>MSH</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><fpage>269</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-269</pubid><pubid idtype="pmcid">2424064</pubid><pubid idtype="pmpid" link="fulltext">18522731</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Functional annotation and network reconstruction through cross-platform integration of microarray data.</p></title><aug><au><snm>Zhou</snm><fnm>XJ</fnm></au><au><snm>Kao</snm><fnm>MCJ</fnm></au><au><snm>Huang</snm><fnm>H</fnm></au><au><snm>Wong</snm><fnm>A</fnm></au><au><snm>Nunez-Iglesias</snm><fnm>J</fnm></au><au><snm>Primig</snm><fnm>M</fnm></au><au><snm>Aparicio</snm><fnm>OM</fnm></au><au><snm>Finch</snm><fnm>CE</fnm></au><au><snm>Morgan</snm><fnm>TE</fnm></au><au><snm>Wong</snm><fnm>WH</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2005</pubdate><volume>23</volume><fpage>238</fpage><lpage>243</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1058</pubid><pubid idtype="pmpid" link="fulltext">15654329</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Towards reconstruction of gene networks from expression data by supervised learning.</p></title><aug><au><snm>Soinov</snm><fnm>LA</fnm></au><au><snm>Krestyaninova</snm><fnm>MA</fnm></au><au><snm>Brazma</snm><fnm>A</fnm></au></aug><source>Genome 
Biol</source><pubdate>2003</pubdate><volume>4</volume><fpage>R6</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2003-4-1-r6</pubid><pubid idtype="pmcid">151290</pubid><pubid idtype="pmpid" link="fulltext">12540298</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection.</p></title><aug><au><snm>Guthke</snm><fnm>R</fnm></au><au><snm>M&#246;ller</snm><fnm>U</fnm></au><au><snm>Hoffmann</snm><fnm>M</fnm></au><au><snm>Thies</snm><fnm>F</fnm></au><au><snm>T&#246;pfer</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>1626</fpage><lpage>1634</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti226</pubid><pubid idtype="pmpid" link="fulltext">15613398</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Graph clustering by flow simulation.</p></title><aug><au><snm>van Dongen</snm><fnm>S</fnm></au></aug><source>PhD thesis</source><publisher>University of Utrecht</publisher><pubdate>2000</pubdate></bibl><bibl id="B40"><title><p>GraphWeb: mining heterogeneous biological networks for gene modules with functional significance.</p></title><aug><au><snm>Reimand</snm><fnm>J</fnm></au><au><snm>Tooming</snm><fnm>L</fnm></au><au><snm>Peterson</snm><fnm>H</fnm></au><au><snm>Adler</snm><fnm>P</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>W452</fpage><lpage>W459</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn230</pubid><pubid idtype="pmcid">2447774</pubid><pubid idtype="pmpid" link="fulltext">18460544</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Regulation of chromosome replication.</p></title><aug><au><snm>Kelly</snm><fnm>TJ</fnm></au><au><snm>Brown</snm><fnm>GW</fnm></au></aug><source>Annu Rev 
Biochem</source><pubdate>2000</pubdate><volume>69</volume><fpage>829</fpage><lpage>880</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.biochem.69.1.829</pubid><pubid idtype="pmpid" link="fulltext">10966477</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>MCM proteins in DNA replication.</p></title><aug><au><snm>Tye</snm><fnm>BK</fnm></au></aug><source>Annu Rev Biochem</source><pubdate>1999</pubdate><volume>68</volume><fpage>649</fpage><lpage>686</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.biochem.68.1.649</pubid><pubid idtype="pmpid" link="fulltext">10872463</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Preventing re-replication of chromosomal DNA.</p></title><aug><au><snm>Blow</snm><fnm>JJ</fnm></au><au><snm>Dutta</snm><fnm>A</fnm></au></aug><source>Nat Rev Mol Cell Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>476</fpage><lpage>486</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrm1663</pubid><pubid idtype="pmcid">2688777</pubid><pubid idtype="pmpid" link="fulltext">15928711</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Analysis of minichromosome maintenance proteins as a novel method for detection of colorectal cancer in stool.</p></title><aug><au><snm>Davies</snm><fnm>RJ</fnm></au><au><snm>Freeman</snm><fnm>A</fnm></au><au><snm>Morris</snm><fnm>LS</fnm></au><au><snm>Bingham</snm><fnm>S</fnm></au><au><snm>Dilworth</snm><fnm>S</fnm></au><au><snm>Scott</snm><fnm>I</fnm></au><au><snm>Laskey</snm><fnm>RA</fnm></au><au><snm>Miller</snm><fnm>R</fnm></au><au><snm>Coleman</snm><fnm>N</fnm></au></aug><source>Lancet</source><pubdate>2002</pubdate><volume>359</volume><fpage>1917</fpage><lpage>1919</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0140-6736(02)08739-1</pubid><pubid idtype="pmpid" link="fulltext">12057556</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Inhibiting the expression of DNA replication-initiation proteins induces apoptosis in human 
cancer cells.</p></title><aug><au><snm>Feng</snm><fnm>D</fnm></au><au><snm>Tu</snm><fnm>Z</fnm></au><au><snm>Wu</snm><fnm>W</fnm></au><au><snm>Liang</snm><fnm>C</fnm></au></aug><source>Cancer Res</source><pubdate>2003</pubdate><volume>63</volume><fpage>7356</fpage><lpage>7364</lpage><xrefbib><pubidlist><pubid idtype="pmpid" link="fulltext">14612534</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Minichromosome maintenance protein 2 is a strong independent prognostic marker in breast cancer.</p></title><aug><au><snm>Gonzalez</snm><fnm>MA</fnm></au><au><snm>Pinder</snm><fnm>SE</fnm></au><au><snm>Callagy</snm><fnm>G</fnm></au><au><snm>Vowler</snm><fnm>SL</fnm></au><au><snm>Morris</snm><fnm>LS</fnm></au><au><snm>Bird</snm><fnm>K</fnm></au><au><snm>Bell</snm><fnm>JA</fnm></au><au><snm>Laskey</snm><fnm>RA</fnm></au><au><snm>Coleman</snm><fnm>N</fnm></au></aug><source>J Clin Oncol</source><pubdate>2003</pubdate><volume>21</volume><fpage>4306</fpage><lpage>4313</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1200/JCO.2003.04.121</pubid><pubid idtype="pmpid" link="fulltext">14645419</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Oncogenic capacity of the E2F1 gene.</p></title><aug><au><snm>Johnson</snm><fnm>DG</fnm></au><au><snm>Cress</snm><fnm>WD</fnm></au><au><snm>Jakoi</snm><fnm>L</fnm></au><au><snm>Nevins</snm><fnm>JR</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1994</pubdate><volume>91</volume><fpage>12823</fpage><lpage>12827</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.91.26.12823</pubid><pubid idtype="pmcid">45532</pubid><pubid idtype="pmpid">7809128</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Cellular targets for activation by the E2F1 transcription factor include DNA synthesis- and G1/S-regulatory genes.</p></title><aug><au><snm>DeGregori</snm><fnm>J</fnm></au><au><snm>Kowalik</snm><fnm>T</fnm></au><au><snm>Nevins</snm><fnm>JR</fnm></au></aug><source>Mol Cell 
Biol</source><pubdate>1995</pubdate><volume>15</volume><fpage>4215</fpage><lpage>4224</lpage><xrefbib><pubidlist><pubid idtype="pmcid">230660</pubid><pubid idtype="pmpid" link="fulltext">7623816</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>p53 mutations in human cancers.</p></title><aug><au><snm>Hollstein</snm><fnm>M</fnm></au><au><snm>Sidransky</snm><fnm>D</fnm></au><au><snm>Vogelstein</snm><fnm>B</fnm></au><au><snm>Harris</snm><fnm>CC</fnm></au></aug><source>Science</source><pubdate>1991</pubdate><volume>253</volume><fpage>49</fpage><lpage>53</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1905840</pubid><pubid idtype="pmpid" link="fulltext">1905840</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Altered expression profiles of microRNAs during TPA-induced differentiation of HL-60 cells.</p></title><aug><au><snm>Kasashima</snm><fnm>K</fnm></au><au><snm>Nakamura</snm><fnm>Y</fnm></au><au><snm>Kozu</snm><fnm>T</fnm></au></aug><source>Biochem Biophys Res Commun</source><pubdate>2004</pubdate><volume>322</volume><fpage>403</fpage><lpage>410</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.bbrc.2004.07.130</pubid><pubid idtype="pmpid" link="fulltext">15325244</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Finding common genes in multiple cancer types through meta-analysis of microarray experiments: a rank aggregation approach.</p></title><aug><au><snm>Pihur</snm><fnm>V</fnm></au><au><snm>Datta</snm><fnm>S</fnm></au><au><snm>Datta</snm><fnm>S</fnm></au></aug><source>Genomics</source><pubdate>2008</pubdate><volume>92</volume><fpage>400</fpage><lpage>403</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ygeno.2008.05.003</pubid><pubid idtype="pmpid" link="fulltext">18565726</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA 
studies.</p></title><aug><au><snm>Lin</snm><fnm>S</fnm></au><au><snm>Ding</snm><fnm>J</fnm></au></aug><source>Biometrics</source><pubdate>2009</pubdate><volume>65</volume><fpage>9</fpage><lpage>18</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1541-0420.2008.01044.x</pubid><pubid idtype="pmpid" link="fulltext">18479487</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Summaries of Affymetrix GeneChip probe level data.</p></title><aug><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Bolstad</snm><fnm>BM</fnm></au><au><snm>Collin</snm><fnm>F</fnm></au><au><snm>Cope</snm><fnm>LM</fnm></au><au><snm>Hobbs</snm><fnm>B</fnm></au><au><snm>Speed</snm><fnm>TP</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>e15</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gng015</pubid><pubid idtype="pmcid">150247</pubid><pubid idtype="pmpid" link="fulltext">12582260</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Affy-analysis of Affymetrix GeneChip data at the probe level.</p></title><aug><au><snm>Gautier</snm><fnm>L</fnm></au><au><snm>Cope</snm><fnm>L</fnm></au><au><snm>Bolstad</snm><fnm>BM</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>307</fpage><lpage>315</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg405</pubid><pubid idtype="pmpid" link="fulltext">14960456</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>Five papers on WordNet.</p></title><aug><au><snm>Miller</snm><fnm>G</fnm></au><au><snm>Beckwith</snm><fnm>R</fnm></au><au><snm>Fellbaum</snm><fnm>C</fnm></au><au><snm>Gross</snm><fnm>D</fnm></au><au><snm>Miller</snm><fnm>K</fnm></au></aug><source>CSL Report 43</source><publisher>Cognitive Science Laboratory, Princeton University</publisher><pubdate>1990</pubdate></bibl><bibl id="B56"><title><p>MedPost: a part-of-speech tagger for bioMedical 
text.</p></title><aug><au><snm>Smith</snm><fnm>L</fnm></au><au><snm>Rindflesch</snm><fnm>T</fnm></au><au><snm>Wilbur</snm><fnm>WJ</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>2320</fpage><lpage>2321</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth227</pubid><pubid idtype="pmpid" link="fulltext">15073016</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>