<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
<ui>gb-2011-12-6-r57</ui>
<ji>1465-6906</ji>
<fm>
<dochead>Software</dochead>
<bibl>
<title><p>BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation</p></title>
<aug>
<au ca="yes" id="A1"><snm>Liekens</snm><mi>ML</mi><fnm>Anthony</fnm><insr iid="I1"/><email>anthony@liekens.net</email></au>
<au id="A2"><snm>De Knijf</snm><fnm>Jeroen</fnm><insr iid="I2"/><email>jeroen.deknijf@ua.ac.be</email></au>
<au id="A3"><snm>Daelemans</snm><fnm>Walter</fnm><insr iid="I3"/><email>walter.daelemans@ua.ac.be</email></au>
<au id="A4"><snm>Goethals</snm><fnm>Bart</fnm><insr iid="I2"/><email>bart.goethals@ua.ac.be</email></au>
<au id="A5"><snm>De Rijk</snm><fnm>Peter</fnm><insr iid="I1"/><email>Peter.DeRijk@molgen.vib-ua.be</email></au>
<au id="A6"><snm>Del-Favero</snm><fnm>Jurgen</fnm><insr iid="I1"/><email>jurgen.delfavero@molgen.vib-ua.be</email></au>
</aug>
<insg>
<ins id="I1"><p>Applied Molecular Genomics group, VIB Department of Molecular Genetics, Universiteit Antwerpen, Universiteitsplein 1, 2610 Wilrijk, Belgium</p></ins>
<ins id="I2"><p>Advanced Database Research and Modelling group, Department of Mathematics and Computer Science, Universiteit Antwerpen, Groenenborgerlaan 171, 2020 Antwerpen, Belgium</p></ins>
<ins id="I3"><p>Computational Linguistics and Psycholinguistics Research Center, Universiteit Antwerpen, Prinsstraat 13, 2000, Antwerpen, Belgium</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>6</issue>
<fpage>R57</fpage>
<url>http://genomebiology.com/2011/12/6/R57</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2011-12-6-r57</pubid><pubid idtype="pmpid">21696594</pubid></pubidlist></xrefbib></bibl>
<history><rec><date><day>24</day><month>1</month><year>2011</year></date></rec><revrec><date><day>24</day><month>3</month><year>2011</year></date></revrec><acc><date><day>22</day><month>6</month><year>2011</year></date></acc><pub><date><day>22</day><month>6</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Liekens et al; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited</note></cpyrt>
<abs>
<sec><st><p>Abstract</p></st>
<p>We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at <url>http://www.biograph.be</url>.</p>
</sec>
</abs>
</fm>
<bdy>
<sec><st><p>Rationale</p></st>
<p>High-throughput methods for large-scale and genome-wide identification of disease-related genes often yield large sets of potential targets that require expensive and arduous experimental validation <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. For the high-throughput discovery of genes associated with disease (hereafter 'disease genes'), functionally interesting research targets must be identified among large sets of candidates. This often requires a thorough understanding of possibly indirect functional relations between the research subject and its putative targets. However, one of the most common problems facing biomedical researchers today is finding, and keeping up with, the knowledge relevant to their research interests within the sheer amount of available literature and data. The data deluge becomes especially unmanageable when the required information is only indirectly connected to a researcher's main field of interest.</p>
<p>Building on the availability of large curated biomedical databases, various methods for gene prioritization have emerged in recent years <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. These computational technologies rank putative disease genes with the goal of placing true disease genes near the top of the ranking. They complement conventional 'wet lab' gene discovery technologies in that they can support the prioritization and interpretation of, for example, associated regions from genome-wide association studies or linkage studies, allowing researchers to select the most compelling variants for further study more efficiently. A common prioritization approach is to identify potential causative genes that complement sets of genes already known to be associated with a disease, using genetic interaction networks, regulatory networks or high-throughput datasets <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Statistically fusing prioritizations from multiple, heterogeneous resources allows rankings to incorporate diverse types of knowledge <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Literature mining is a related research theme that uses natural language processing to extract biomedical information from the literature and to apply this information to the discovery of new knowledge <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
<p>Prioritization platforms commonly lack an easily accessible user interface for formulating queries and interpreting the results. Moreover, most data mining platforms are supervised, that is, they require prior domain knowledge from the user. For example, disease gene prioritization techniques commonly require the user to define a set of known disease genes on which the system is trained to identify new genes. Because such training gene sets are chosen subjectively, they vary between users; since the outcomes depend strongly on them, the robustness of the predictions suffers. These platforms offer rankings of possible susceptibility genes, but often without comprehensible support for these prioritizations. Rankings of research targets are frequently offered without references to the literature, preventing users from evaluating the rationale behind the predictions. Platforms that do offer a rationale and incentives for researching functional support are mostly limited to a specific domain of interactions. A common paradigm, for example, is to use protein or gene interaction networks to construct functional hypotheses, which excludes alternative functional explanations in support of the predictions. Here, we propose BioGraph, a user-friendly computational platform that strives to overcome such deficiencies by applying novel data mining techniques to integrated databases of diverse types of biomedical knowledge.</p>
<p>In summary, BioGraph provides an online resource and data mining method for the automated inference of functional hypotheses between biomedical entities. Assessing these hypotheses can then be used to rank targets in the context of a research domain, such as a disease. BioGraph's resource is a knowledge base that integrates many biomedical databases into a common network of heterogeneous relations. These databases were selected for their practice of manual curation by experts, to ensure that the integrated knowledge is accurate and valid. Our methodology generates a map of relations linking biomedical research subjects to potential targets, such as diseases, genes, ontology annotations and pathways, and offers literature support for these putative functional hypotheses. Assessing the plausibility of these hypotheses and their specificity to the source and targets allows for various applications in the identification of promising research targets. Here, we focus on the genome-wide identification of susceptibility genes for heritable disorders. The overall framework of BioGraph's methodology is represented schematically in Figure <figr fid="F1">1</figr>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Schematic representation of the data integration and data mining methodology</p></caption><text>
   <p><b>Schematic representation of the data integration and data mining methodology</b>. <b>(a) </b>Public databases with heterogeneous biomedical relations are integrated into a common network. <b>(b) </b>For illustration, genes (green circles), diseases (red boxes) and protein domains (blue diamonds) are related through gene-disease associations, gene-gene interactions and gene-domain annotations and integrated into a unified graph. <b>(c) </b>The <it>a priori </it>accessibility of each concept is computed by performing stochastic random walks to detect highly connected hubs in the network (the area of a node scales with its rank score). <b>(d) </b>The <it>a posteriori </it>rank of each concept with respect to a source concept, in this case disease A, is computed by performing random walks with restarts in the source. <b>(e) </b>The posterior probabilities are adjusted using the prior probabilities to score the importance of each concept specific to the source concept (the area of a node scales with the log of its rank score). Genes (green circles) are ranked according to this score, with gene 1 being most specific to disease A and gene 8 least specific.</p>
</text><graphic file="gb-2011-12-6-r57-1"/></fig>
</sec>
<sec><st><p>Methods and principles</p></st>
<sec><st><p>Integration of heterogeneous knowledge sources</p></st>
<p>BioGraph is based on the data integration of 21 publicly available curated databases containing biomedical relations (Table <tblr tid="T1">1</tblr>; Additional materials and methods in Additional file <supplr sid="S1">1</supplr>) between heterogeneous biomedical entities such as genes, diseases, compounds, pathways, ontology terms, protein domains, disease and gene families, and microRNAs.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Integrated databases</p></caption><tblbdy cols="6">
      <r>
         <c ca="left">
            <p>
               <b>Database</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Concept 1</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Relation</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Concept 2</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Literature references</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Number of relations</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>BioGRID <abbrgrp><abbr bid="B25">25</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>PPI</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>29,566</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CTD <abbrgrp><abbr bid="B26">26</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Compound</p>
         </c>
         <c ca="left">
            <p>Association</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>62,336</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Compound</p>
         </c>
         <c ca="left">
            <p>Association</p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>5,438</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Association</p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>8,123</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DIP <abbrgrp><abbr bid="B27">27</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>PPI</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>1,524</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>GOA <abbrgrp><abbr bid="B28">28</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Annotation</p>
         </c>
         <c ca="left">
            <p>Gene Ontology term</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>26,949</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>HPRD <abbrgrp><abbr bid="B29">29</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>PPI</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>149,036</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>IntAct <abbrgrp><abbr bid="B30">30</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>PPI</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>37,258</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>InterPro <abbrgrp><abbr bid="B31">31</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Contains</p>
         </c>
         <c ca="left">
            <p>Protein domain/repeat/region</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>26,652</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Is member of</p>
         </c>
         <c ca="left">
            <p>Gene family</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>22,988</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Gene/gene family/protein domain/repeat/region</p>
         </c>
         <c ca="left">
            <p>Annotation</p>
         </c>
         <c ca="left">
            <p>Gene Ontology term</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>18,446</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>KEGG <abbrgrp><abbr bid="B32">32</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Is part of</p>
         </c>
         <c ca="left">
            <p>Pathway</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>14,100</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Has metabolite</p>
         </c>
         <c ca="left">
            <p>Compound</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>19,073</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MeSH <abbrgrp><abbr bid="B33">33</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="left">
            <p>Belongs to</p>
         </c>
         <c ca="left">
            <p>Disease (family)</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>21,282</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MINT <abbrgrp><abbr bid="B34">34</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>PPI</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>11,389</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>miR2Disease <abbrgrp><abbr bid="B35">35</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>MicroRNA</p>
         </c>
         <c ca="left">
            <p>Targets</p>
         </c>
         <c ca="left">
            <p>Gene</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>2,615</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>MicroRNA</p>
         </c>
         <c ca="left">
            <p>Association</p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>344</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NetworKIN <abbrgrp><abbr bid="B36">36</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Phosphorylates</p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>2,811</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>OMIM Morbid Map <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Gene/protein</p>
         </c>
         <c ca="left">
            <p>Association</p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>6,199</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>OMIM <abbrgrp><abbr bid="B11">11</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="left">
            <p>Is related to</p>
         </c>
         <c ca="left">
            <p>Disease</p>
         </c>
         <c ca="center">
            <p>Yes</p>
         </c>
         <c ca="right">
            <p>2,467</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TarBase <abbrgrp><abbr bid="B37">37</abbr></abbrgrp></p>
         </c>
         <c ca="left">
            <p>MicroRNA</p>
         </c>
         <c ca="left">
            <p>Targets</p>
         </c>
         <c ca="left">
            <p>Gene</p>
         </c>
         <c ca="center">
            <p>No</p>
         </c>
         <c ca="right">
            <p>858</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Overview of the 21 publicly available curated databases used to create BioGraph's heterogeneous knowledge base. Specific concept types were extracted from the various databases and integrated into a central graph. Note that these represent relations selected for <it>Homo sapiens </it>only. OMIM's disease-disease relations were added after the data freeze of March 2010. CTD, Comparative Toxicogenomics Database; DIP, Database of Interacting Proteins; GOA, Gene Ontology Annotations; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MeSH, Medical Subject Headings; MINT, Molecular Interactions Database; OMIM, Online Mendelian Inheritance in Man; PPI, protein-protein interaction.</p>
   </tblfn></tbl>
<suppl id="S1">
<title><p>Additional file 1</p></title>
<text><p><b>Additional materials and methods and Additional Tables</b>. Detailed methods describing the technical details of the database integration and algorithms, with the following sections. Knowledge integration: detecting hub nodes by computing <it>a priori </it>probabilities with random walks; computing <it>a posteriori </it>probabilities and ranking relations; backtracking heuristic for the automated generation of functional hypotheses; additional results. Additional Table 1: the top 50 hubs, that is, the highest-ranking concepts by <it>a priori </it>rank score in the integrated network. Additional Table 2: area under the receiver operating characteristic (ROC) curve (AUC) for the prioritization of disease genes in the Endeavour benchmark. Additional Table 3: effect on the Endeavour benchmark of leaving out each individual database from the data integration process.</p></text>
<file name="gb-2011-12-6-r57-S1.DOC">
   <p>Click here for file</p>
</file>
</suppl>
<p>The integrated databases were selected for the quality of their relations with respect to curation methods and peer-reviewed references to the literature. Producers of curated databases employ domain experts to read the peer-reviewed scientific literature and extract proven knowledge from it. Such indexing processes, albeit time-consuming, ensure that the collected knowledge is accurate and complete, providing a reliable basis for establishing new relations, for example, with BioGraph or related prioritization algorithms. We did not integrate databases constructed from high-throughput experiments with statistical or computational inferences in which the indexed relations were not manually curated. Such databases may include information of lower quality and consequently impair the predictive quality of subsequent data mining. We provide an assessment of each database's quality in the Results section.</p>
<p>The databases integrated in BioGraph are of three types: (1) curated databases (for example, Online Mendelian Inheritance in Man (OMIM) and various protein-protein interaction databases), constructed by manually extracting published, peer-reviewed information about a specific type of knowledge, which guarantees the quality of the relations in these databases; (2) curated ontology databases (for example, Gene Ontology (GO) and Medical Subject Headings), which classify subjects hierarchically; and (3) curated annotation databases (for example, GO Annotations (GOA) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database), which relate biomedical entities or concepts to ontology terms.</p>
<p>Because the integrated databases use diverse identifiers for the concepts, each concept is assigned a distinct accession number based on the Unified Medical Language System (UMLS) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> to guarantee its uniqueness. Some of the integrated concepts (especially microRNAs and pathways) are underrepresented in UMLS; in these cases, we have extended the index of UMLS identifiers with these concepts' originating identifiers (for example, miRBase and KEGG pathway accession numbers). Relations between concepts are extracted from the knowledge resources, represented in a common format, and annotated with semantic relation types (denoting the meaning of a relation, for example, 'protein interaction' or 'disease drug') and with references to supporting literature, as provided by the integrated databases. All relations in the network are weighted equally, independent of their support in the databases or the literature. We have experimented with weighting relations differently depending on the quality of the source database, the semantic type or the references in the literature, but did not observe a significant effect of such weights on test benchmarks, as discussed later. To sanitize the resulting network for the subsequent data mining algorithms, concepts outside the largest connected component are removed and dangling concepts (that is, concepts connected to only one other concept) are pruned. The resulting integrated network comprises 54,567 biomedical entities representing unique biomedical concepts and 425,353 unique relations among these entities, supported by 244,258 references to 52,866 items from the biomedical literature. The integrated network is updated frequently as the resources it depends on are updated, and additional resources may be added to the list of integrated databases.</p>
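<p>The sanitization step described above can be sketched in Python as follows. This is a minimal illustration, not BioGraph's implementation: it assumes relations arrive as unweighted pairs of concept identifiers (consistent with all relations being weighted equally), and the function name sanitize is hypothetical. Whether dangling concepts are pruned once or repeatedly is not stated here; the sketch assumes pruning is repeated until no concept with a single neighbor remains.</p>

```python
from collections import deque


def sanitize(relations):
    """Keep the largest connected component, then prune dangling concepts.

    `relations` is an iterable of (concept, concept) pairs. Duplicate pairs
    collapse into one undirected, unweighted edge; self-loops are ignored.
    Returns an adjacency map {concept: set of neighboring concepts}.
    """
    adj = {}
    for a, b in relations:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    # Find the largest connected component with breadth-first search.
    seen, largest = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for nb in adj[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(largest):
            largest = component
    # Neighbors of a component's members lie in the same component.
    adj = {n: set(adj[n]) for n in largest}

    # Assumption: prune degree-1 concepts repeatedly, since removing one
    # dangling concept can leave its neighbor dangling in turn.
    dangling = deque(n for n, nbs in adj.items() if len(nbs) == 1)
    while dangling:
        n = dangling.popleft()
        if n not in adj or len(adj[n]) != 1:
            continue
        (nb,) = adj.pop(n)
        adj[nb].discard(n)
        if len(adj[nb]) == 1:
            dangling.append(nb)
    return adj


# Hypothetical toy relations: a triangle with a two-step tail, plus a separate pair.
toy = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("X", "Y")]
network = sanitize(toy)
print(sorted(network))  # ['A', 'B', 'C']
```

<p>The separate pair X-Y is dropped because it lies outside the largest component, and the tail E-D disappears in two pruning rounds, leaving only the triangle.</p>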
</sec>
<sec><st><p>Prioritization principle</p></st>
<p>Given the integrated network, one can intuitively conjecture that nearby concepts in the network are related. Indeed, since functionally related concepts are connected in the graph, we may assume that concepts that are close but only indirectly connected in the network may also be functionally related in the real world. However, empirical analysis shows that most concepts in the network are interconnected in only a few steps, indicating that the network has so-called small-world properties. Indeed, the network contains a considerable abundance of highly connected nodes. For example, interactions of proteins with compounds such as water and ATP, or functional annotations such as membrane localization or protein binding, are prevalent. These unspecific hubs act as ubiquitous connections that create short paths between functionally unrelated concepts, which prevents successful prioritization using simple shortest-path methods. Our prioritization technique therefore still relies on detecting concepts near a source concept in the network, but it corrects the ranking of concepts for their global accessibility in the network.</p>
<p>We provide a short technical summary of the methods here and refer the interested reader to the full implementation details in the Additional materials and methods in Additional file <supplr sid="S1">1</supplr>. We use stochastic random walks (trajectories on the network that consist of successive steps from one entity to a randomly chosen related entity) on the knowledge network to measure the <it>a priori </it>importance, or accessibility, of concepts in the graph. This technique determines the global centrality of concepts in our integrated network. For this purpose, we compute the limit distribution, which gives the probability of visiting each concept during an infinite random walk on the integrated network. Google's PageRank algorithm <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> uses a similar link analysis approach to rank web pages by their relative importance. Network hubs (top-ranked concepts with a high prior probability) are generic, unspecific target concepts in the network (Additional Table <tblr tid="T1">1</tblr> in Additional file <supplr sid="S1">1</supplr>). These hubs represent concepts that are important for diverse biomedical processes, but they should be avoided when searching for relevant, non-obvious links between seemingly unrelated concepts.</p>
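<p>As an illustration of the <it>a priori </it>step, the following Python sketch approximates the limit distribution of an infinite random walk by power iteration on an adjacency map. It is a minimal sketch under our own assumptions, not the exact procedure described in Additional file <supplr sid="S1">1</supplr>: the walk is made "lazy" (it stays on its current concept with probability 1/2), which leaves the limit distribution unchanged but guarantees convergence on bipartite graphs, and the name prior_distribution is hypothetical.</p>

```python
def prior_distribution(adj, tol=1e-12, max_iter=10_000):
    """Limit distribution of a random walk on an undirected, connected graph.

    adj maps each concept to the set of its neighbors; every concept needs at
    least one neighbor. Assumption: a lazy walk (stay put with probability 1/2)
    is iterated so that the power iteration also converges on bipartite graphs.
    """
    nodes = list(adj)
    p = {n: 1.0 / len(nodes) for n in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = {n: 0.5 * p[n] for n in nodes}  # lazy half: stay on the same concept
        for n in nodes:
            share = 0.5 * p[n] / len(adj[n])  # other half: move to a random neighbor
            for nb in adj[n]:
                nxt[nb] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if tol > diff:  # converged
            break
    return p


# Hypothetical star network: one hub concept linked to three specific concepts.
star = {"hub": {"g1", "g2", "g3"}, "g1": {"hub"}, "g2": {"hub"}, "g3": {"hub"}}
prior = prior_distribution(star)
print(round(prior["hub"], 6), round(prior["g1"], 6))  # 0.5 0.166667
```

<p>For an unweighted, undirected, connected network the result is proportional to each concept's number of relations, which is why highly connected hubs receive the largest prior probabilities.</p>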
<p>To compute the proximity of targets to a source concept, analogously to the prior probabilities, we compute the limit distribution of a stochastic model of random walks that restart in the source concept (with probability 0.25 at each step). This yields the <it>a posteriori </it>accessibility of each concept from the source concept, that is, the probability of visiting each target concept from the source disease, pathway, and so on. Each concept is scored by its posterior probability divided by the square root of its prior probability, and concepts are ranked by this score. In practice, for a gene prioritization query, a user of the web application provides a 'research subject' (for example, a disease; a pathway, a GO annotation or a gene may also serve as a research subject) and a list of 'research targets' (for example, putative genes or compounds) to be ranked in relation to the research subject. Our algorithm then assesses and ranks the relations between the source concept and each of the target concepts as described above. Since any type of concept can be provided as the subject or target of a prioritization, our method does not require prior domain knowledge from the user; that is, there is no need to define a set of known disease-causing genes in order to identify related genes, which results in a more reproducible and robust user experience.</p>
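<p>The scoring rule above (posterior probability divided by the square root of the prior probability) can be sketched in Python as follows. This is a minimal illustration rather than BioGraph's code: the function names are hypothetical, the posterior is obtained by simple power iteration with the restart probability of 0.25 given above, and the prior is taken in closed form as a concept's number of relations divided by twice the number of relations in the network. That closed form is the limit distribution of a plain random walk on an unweighted, undirected, connected network, and using it here is our assumption about how the prior is obtained.</p>

```python
import math


def restart_distribution(adj, source, restart=0.25, tol=1e-12, max_iter=10_000):
    """Limit distribution of a walk that jumps back to `source` with probability `restart` per step."""
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = restart  # restart mass always lands on the source concept
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])  # otherwise step to a random neighbor
            for nb in adj[n]:
                nxt[nb] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if tol > diff:  # converged
            break
    return p


def rank_targets(adj, source, targets, restart=0.25):
    """Rank targets by posterior / sqrt(prior); returns (ranking, scores)."""
    posterior = restart_distribution(adj, source, restart)
    two_m = sum(len(nbs) for nbs in adj.values())  # twice the number of relations
    prior = {n: len(adj[n]) / two_m for n in adj}  # assumed closed-form prior
    scores = {t: posterior[t] / math.sqrt(prior[t]) for t in targets}
    ranking = sorted(targets, key=lambda t: scores[t], reverse=True)
    return ranking, scores


# Hypothetical toy network: disease D, genes g1..g4, pathway P and a hub H.
toy = {
    "D": {"g1", "P"},
    "g1": {"D", "P"},
    "P": {"D", "g1", "H"},
    "H": {"P", "g2", "g3", "g4"},
    "g2": {"H", "g3"},
    "g3": {"H", "g2"},
    "g4": {"H"},
}
ranking, scores = rank_targets(toy, "D", ["g4", "g2", "g1"])
print(ranking)  # ['g1', 'g2', 'g4']
```

<p>In this toy network g4 is reached less often from D than g2, but dividing by the square root of the prior narrows the gap for the less connected g4; in general, the correction discounts targets that are visited mainly because they are globally well connected.</p>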
</sec>
<sec><st><p>Automated generation of functional hypotheses</p></st>
<p>Performing random walks to determine the accessibility of target concepts implicitly generates ensembles of indirect paths between source and target concepts, which may serve as functional hypotheses for highly ranked targets. Using a backtracking heuristic (Figure <figr fid="F2">2</figr>), we can identify highly probable simple paths (that is, paths without cycles) of the random walk that starts in the source concept and ends in the target concept. The backtracking heuristic incrementally builds partial candidate paths from the target toward the source, abandoning the least likely paths along the way, which leads to valid and specific paths that offer incentives for further functional research. A detailed description of the heuristic is available in the Additional materials and methods in Additional file <supplr sid="S1">1</supplr>.</p>
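<p>The backtracking idea can be sketched as a beam search that grows simple paths from the target toward the source. The Python sketch below is a hypothetical simplification, not the heuristic from Additional file <supplr sid="S1">1</supplr>: it assumes the accessibility of every concept from the source is already known (for example, from a random walk with restarts), scores a partial path by the sum of the negative log accessibilities of the concepts it has added (so paths through highly accessible concepts score best), keeps only the best beam partial paths after each expansion, and stops once k complete paths have been found. The parameters beam and max_len are illustrative.</p>

```python
import heapq
import math


def backtrack_paths(adj, accessibility, source, target, k=3, beam=10, max_len=6):
    """Return up to k simple source-to-target paths, most probable first.

    adj maps each concept to its neighbors; accessibility maps each concept to a
    positive accessibility from `source`. Lower cost means a more probable path.
    """
    frontier = [(0.0, [target])]  # (cost, path listed from target back toward source)
    found = []
    while frontier and k > len(found):
        candidates = []
        for cost, path in frontier:
            for nb in adj[path[-1]]:
                if nb in path:  # keep paths simple: no revisited concepts
                    continue
                extended = (cost - math.log(accessibility[nb]), path + [nb])
                if nb == source:
                    found.append(extended)
                elif max_len > len(extended[1]):
                    candidates.append(extended)
        frontier = heapq.nsmallest(beam, candidates)  # prune the least probable paths
    found.sort()
    return [list(reversed(path)) for _, path in found[:k]]


# Hypothetical toy network and accessibility values from source "s".
adj = {"s": {"a", "b"}, "a": {"s", "b", "t"}, "b": {"s", "a", "t"}, "t": {"a", "b"}}
access = {"s": 0.4, "a": 0.3, "b": 0.1, "t": 0.05}
print(backtrack_paths(adj, access, "s", "t", k=2))  # [['s', 'a', 't'], ['s', 'b', 't']]
```

<p>The path through the more accessible concept a is returned first, mirroring how the heuristic in Figure 2 favors neighbors that lead toward the source.</p>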
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Schematic representation of the backtracking heuristic to find most probable paths from a source concept <it>s </it>to a target <it>t</it></p></caption><text>
   <p><b>Schematic representation of the backtracking heuristic to find most probable paths from a source concept <it>s </it>to a target <it>t</it></b>. <b>(a) </b>Assume a network with source and target concepts. For clarity, the nodes are ordered by their accessibility from <it>s </it>(leftmost nodes are most accessible, rightmost nodes least accessible). <b>(b) </b>As a first step in the backtracking process, we find the neighbors of the target <it>t </it>that lead in the direction of the source, that is, the neighbors of <it>t </it>with the highest accessibility with respect to <it>s</it>. <b>(c) </b>The paths from the target are repeatedly expanded to include highly accessible nodes leading toward the source concept. Pruning the least probable paths keeps the growing set of paths to a workable size (not shown). <b>(d) </b>The most probable paths that arrive in the source (continuous lines) are considered as functional hypotheses linking the target to the source concept. Unfinished paths (dashed lines) continue to be expanded until <it>k </it>paths between <it>s </it>and <it>t </it>have been found.</p>
</text><graphic file="gb-2011-12-6-r57-2"/></fig>
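<p>The backtracking step can be illustrated with a short Python sketch. It is a minimal, beam-search-style illustration under stated assumptions, not BioGraph's implementation: the network is assumed to be a dictionary of transition probabilities, the accessibility of each node from the source is assumed to be precomputed, and the scoring rule and the parameters beam_width and max_len are hypothetical choices. Partial paths are grown from the target toward the source, only simple paths are kept, and the least promising partial paths are pruned at every step.</p>

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical representation: node -> {successor: P(node -> successor)}
Graph = Dict[str, Dict[str, float]]
Path = Tuple[float, List[str]]  # (product of transition probabilities, nodes)


def predecessors(graph: Graph) -> Graph:
    """Invert the graph: node -> {predecessor: P(predecessor -> node)}."""
    preds: Graph = {}
    for u, successors in graph.items():
        for v, p in successors.items():
            preds.setdefault(v, {})[u] = p
    return preds


def backtrack_paths(graph: Graph, acc: Dict[str, float], source: str,
                    target: str, k: int = 10, beam_width: int = 50,
                    max_len: int = 6) -> List[Path]:
    """Return up to k simple source-to-target paths, most probable first.

    Paths are grown backwards from the target. A partial path
    [head, ..., target] is ranked by acc[head] times the product of its
    transition probabilities, so expansion favours nodes that are highly
    accessible from the source; only beam_width partial paths survive.
    """
    preds = predecessors(graph)
    beam: List[Path] = [(1.0, [target])]
    found: List[Path] = []
    for _ in range(max_len):
        candidates: List[Path] = []
        for prob, path in beam:
            for u, p in preds.get(path[0], {}).items():
                if u in path:  # keep paths simple (no cycles)
                    continue
                extended = (prob * p, [u] + path)
                if u == source:
                    found.append(extended)       # complete hypothesis
                else:
                    candidates.append(extended)  # still unfinished
        if len(found) >= k or not candidates:
            break
        # Prune: keep the partial paths whose head is most accessible from source.
        beam = heapq.nlargest(beam_width, candidates,
                              key=lambda c: c[0] * acc.get(c[1][0], 0.0))
    return sorted(found, key=lambda c: c[0], reverse=True)[:k]
```

<p>Ranking partial paths by the accessibility of their head node is what steers the search toward the source, in the spirit of panel (b) of Figure 2; the product of transition probabilities then orders the completed paths.</p>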
<p>The resulting set of paths is presented to the user as a network of putative hypotheses linking the source to the target. Each directed edge represents a supporting relation between intermediate concepts and is annotated with its semantic meaning and with the literature references supporting it, so that the user can evaluate the relation. When the target is highly ranked, the constructed hypotheses include specific and relevant connections and concepts. If the functional hypotheses linking two concepts pass only through general hub concepts, this usually indicates that the source and target concepts are unrelated, which is reflected by a poor ranking score.</p>
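<p>Assembling the retained paths into a single hypothesis network can be sketched as follows. This is an illustrative assumption rather than BioGraph's method: each directed edge is weighted by the summed probability of the retained paths that traverse it, which only approximates the visiting probabilities used to style the links in the figures of this article. The input format matches a list of (probability, node list) pairs.</p>

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def merge_paths(paths: List[Tuple[float, List[str]]]) -> Dict[Edge, float]:
    """Merge paths into one hypothesis network.

    Each directed edge is weighted by the summed probability of the paths
    that traverse it; edges are returned strongest first.
    """
    weights: Dict[Edge, float] = defaultdict(float)
    for prob, nodes in paths:
        for u, v in zip(nodes, nodes[1:]):
            weights[(u, v)] += prob
    return dict(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))
```

<p>Edges shared by several hypotheses accumulate weight, so relations that support many paths surface first when the network is displayed.</p>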
</sec>
</sec>
<sec><st><p>Results</p></st>
<p>To assess how well BioGraph prioritizes interesting research targets, we study its application to the identification of genes known to be associated with disease. Test sets of proven disease-related genes were selected from the OMIM Morbid Map database and the Comparative Toxicogenomics Database (CTD). OMIM Morbid Map contains several thousand diseases and disease genes with a proven underlying molecular basis, manually selected and indexed by experts from the peer-reviewed medical literature <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Similar to the curation process for OMIM Morbid Map, CTD employs professional biocurators who read and manually curate the literature to derive proven relations between genotypes and phenotypes, which helps ensure that the indexed data are valid and accurate.</p>
<p>We used the BioGraph framework to prioritize all human genes in the context of diseases selected from these databases and evaluated the positions of each disease's proven susceptibility genes in this ranking. We then compute sensitivity and specificity values across ranking cutoffs and report the area under the receiver operating characteristic (ROC) curve (AUC), the standard performance measure for analyzing the quality of prioritizations or classifications <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. A perfect ranking algorithm that places all true disease genes at the top scores 100% on such a test, whereas a random ranking scores 50% on average. The AUC can be interpreted as the probability that, when one positive and one negative example are picked at random, the prioritization algorithm ranks the positive example above the negative one. An algorithm that scores well on this assessment is thus likely to rank disease-associated genes high and unrelated genes low.</p>
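<p>The probabilistic interpretation of the AUC can be made concrete with a small Python sketch; it is an illustration, not the evaluation code used for BioGraph. The first function counts, over all positive-negative pairs, how often the positive example scores higher, with ties counted as one half. The second gives the equivalent closed form when a single held-out disease gene is ranked among n_genes candidates, assuming no ties.</p>

```python
from itertools import product
from typing import Sequence


def auc_from_scores(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count half."""
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_single_rank(rank: int, n_genes: int) -> float:
    """AUC for one held-out gene at position rank (1 = top) among n_genes.

    The n_genes - 1 other genes act as negatives, and n_genes - rank of
    them are ranked below the test gene (assuming no ties).
    """
    return (n_genes - rank) / (n_genes - 1)
```

<p>For example, a disease gene ranked 10th among 16,912 genes would receive an AUC of about 99.95%, while a gene ranked exactly in the middle would receive 50%.</p>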
<sec><st><p>Disease-gene prioritization benchmark</p></st>
<p>As a first application, we analyzed how well our platform prioritizes known disease genes among all genes in our integrated knowledge base. To test a known disease-gene association, we first removed the link between the disease and its susceptibility gene from the knowledge base. We then ranked all genes in the network in relation to the disease and evaluated the rank of the test gene. A high rank indicates that BioGraph can retrieve the gene as a valid disease gene from the remaining integrated information linking the disease to its susceptibility gene. For this test, we adopted published benchmarks and compared our prediction performance to that of Endeavour.</p>
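<p>The hold-out procedure can be sketched in Python as follows. The network is assumed to be a dictionary of undirected adjacency sets, and the scorer is a deliberately simple stand-in (counting shared neighbours) that takes the place of BioGraph's random-walk accessibility; both are hypothetical. The point is the protocol: hide the tested link, rank every gene against the disease, and report the test gene's relative rank.</p>

```python
from typing import Callable, Dict, Iterable, List, Set

Network = Dict[str, Set[str]]  # hypothetical undirected adjacency sets


def remove_links(net: Network, node: str, others: Iterable[str]) -> Network:
    """Copy net and delete every edge between node and the given others."""
    pruned = {n: set(nbrs) for n, nbrs in net.items()}
    for other in others:
        pruned.get(node, set()).discard(other)
        pruned.get(other, set()).discard(node)
    return pruned


def shared_neighbours(net: Network, a: str, b: str) -> float:
    """Toy stand-in scorer; NOT BioGraph's random-walk accessibility."""
    return float(len(net.get(a, set()).intersection(net.get(b, set()))))


def held_out_rank(net: Network, disease: str, gene: str, genes: List[str],
                  score: Callable[[Network, str, str], float]) -> float:
    """Hide the disease-gene link, rank all genes, return rank / len(genes)."""
    held_out = remove_links(net, disease, [gene])
    scores = {g: score(held_out, disease, g) for g in genes}
    # rank 1 is the top; ties are resolved against the test gene
    rank = 1 + sum(1 for g in genes if g != gene and scores[g] >= scores[gene])
    return rank / len(genes)
```

<p>Copying the network before removing links keeps each test independent, so one held-out relation never leaks into the evaluation of another.</p>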
<p>Endeavour is a related and mature technology for gene prioritization <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> that uses data fusion to build statistical models of known disease-causing genes with respect to various data sets. Test genes are ranked by how well they match these training profiles, and the rankings obtained from the individual data sets are combined using order statistics.</p>
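<p>Rank fusion with order statistics can be illustrated by the joint order-statistic recursion commonly attributed to Stuart et al. We assume this formula for illustration only. Endeavour's exact implementation may differ, and BioGraph itself does not use this statistic. Each input value is a gene's rank ratio in one data source (its rank divided by the number of ranked genes). A small Q means that the gene ranks consistently well across sources.</p>

```python
from math import factorial
from typing import Sequence


def q_statistic(rank_ratios: Sequence[float]) -> float:
    """Joint cumulative distribution of N order statistics (assumed formula).

    With r sorted ascending (1-based), V_0 = 1 and
    V_k = sum over i = 1..k of (-1)**(i - 1) * V_{k-i} / i! * r_{N-k+1}**i,
    the statistic is Q = N! * V_N.
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        x = r[n - k]  # r_{N-k+1} in 1-based notation
        v[k] = sum((-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
                   for i in range(1, k + 1))
    return factorial(n) * v[n]
```

<p>For a single source, Q reduces to the rank ratio itself; for two sources with rank ratios r1 and r2 (r1 the smaller), it equals 2·r1·r2 − r1². The alternating sum can lose precision for many sources, so a production implementation would need more care.</p>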
<p>We evaluated BioGraph's prioritization method on the disease-gene prioritization benchmark originally published to evaluate Endeavour <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. This benchmark consists of 627 genes known to cause 29 diseases, selected from the OMIM database; 609 of these disease genes are present in our integrated knowledge base.</p>
<p>To benchmark a disease gene with BioGraph, we first remove the direct relations between the gene and the disease from the integrated network, so that the relation to be prioritized is not already in the network. Moreover, variants of the disease (for example, subtypes, or syndromes that have the disease as one of their symptoms) are also disconnected from the disease gene. To identify these related diseases, we select the diseases for which at least one UMLS synonym contains the original disease's name as a substring. For example, Charcot-Marie-Tooth disease, type 4C is identified as related to Charcot-Marie-Tooth, and Alport's disease as related to Deafness, since one synonym of Alport's disease is Nephritis with nerve deafness. This procedure ensures that no prior direct information can be exploited by our prioritization algorithm. We then rank the integrated network's 16,912 known human genes with respect to the disease concept.</p>
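<p>The synonym-based detection of related diseases can be sketched in a few lines of Python. The input format (a dictionary from disease concept to its list of UMLS synonyms) is a hypothetical simplification, and matching is assumed to be case-insensitive, which the Alport's disease example requires ("Deafness" versus "deafness").</p>

```python
from typing import Dict, List


def related_diseases(disease_name: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Diseases with at least one synonym containing disease_name.

    Matching is a case-insensitive substring test (an assumption).
    """
    needle = disease_name.lower()
    return [disease for disease, names in synonyms.items()
            if any(needle in name.lower() for name in names)]
```

<p>Every disease returned by this test is disconnected from the held-out gene together with the original disease. A plain substring test is simple and transparent, but it can also match unrelated concepts whose names happen to share a word.</p>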
<p>For the Endeavour benchmark, each disease-gene relation was tested by removing the gene from the disease's set of known genes, training Endeavour on the remaining disease genes and ranking the held-out gene among a set of 99 random test genes. For both platforms, we use the AUC to analyze the quality of these prioritizations.</p>
<p>The mean AUC for BioGraph's prioritization of disease genes among all human genes is 92.92%, whereas the reported AUC for Endeavour in prioritizing disease genes among 99 random genes is 86.6% <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Additional Table <tblr tid="T2">2</tblr> in Additional file <supplr sid="S1">1</supplr> lists the AUC scores for the prioritization results per disease. Of the 609 disease genes in the benchmark, 181 (29.72%) are ranked in the top 1% of all genes and 449 (73.73%) are ranked in the top 10%. In other words, in an experimental setting where a causative gene is hidden among 99 random genes, BioGraph would be expected to rank the causative gene first in 29.72% of cases and among the top 10 in 73.73% of cases.</p>
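<p>The top-1% and top-10% summaries can be computed from relative ranks, that is, each held-out gene's rank divided by the number of ranked genes, so that a relative rank of at most 0.01 falls in the top 1%. The following Python sketch assumes this definition. It also checks the rounding of the percentages reported above.</p>

```python
from statistics import median
from typing import Dict, Sequence


def summarize_ranks(rel_ranks: Sequence[float]) -> Dict[str, float]:
    """Summarize relative ranks (rank / number of ranked genes, 1 = worst)."""
    n = len(rel_ranks)
    return {
        "top_1pct": sum(1 for r in rel_ranks if 0.01 >= r) / n,
        "top_10pct": sum(1 for r in rel_ranks if 0.10 >= r) / n,
        "median": median(rel_ranks),
    }


# Reported counts for the 609 benchmark genes
print(round(100 * 181 / 609, 2), round(100 * 449 / 609, 2))
```

<p>The final line prints 29.72 and 73.73, matching the percentages given in the text.</p>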
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Top inferred genes for schizophrenia</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Number</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Gene</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Prioritization hypothesis</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>SZ association studies</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>
               <it>PRL</it>
            </p>
         </c>
         <c ca="left">
            <p>Affected by the antipsychotics aripiprazole and risperidone, neuroactive ligand-receptor interaction, associated with autistic disorder</p>
         </c>
         <c ca="left">
            <p>No association studies. Associated with autistic disorder <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>
               <it>ARID4B</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-20b</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>
               <it>HTR1A</it>
            </p>
         </c>
         <c ca="left">
            <p>Related to <it>HTR2A</it></p>
         </c>
         <c ca="left">
            <p>Positive association <abbrgrp><abbr bid="B19">19</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>4</p>
         </c>
         <c ca="left">
            <p>
               <it>DRD2</it>
            </p>
         </c>
         <c ca="left">
            <p>Related to <it>DRD3</it></p>
         </c>
         <c ca="left">
            <p>Positive association <abbrgrp><abbr bid="B20">20</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>5</p>
         </c>
         <c ca="left">
            <p>
               <it>DNMT3B</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-29*, related to <it>COMT</it>, folic acid</p>
         </c>
         <c ca="left">
            <p>Positive association <abbrgrp><abbr bid="B21">21</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>6</p>
         </c>
         <c ca="left">
            <p>
               <it>DNMT3A</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-29*, related to <it>COMT</it>, folic acid</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>7</p>
         </c>
         <c ca="left">
            <p>
               <it>FSTL1</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-206</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>8</p>
         </c>
         <c ca="left">
            <p>
               <it>SYN3</it>
            </p>
         </c>
         <c ca="left">
            <p>Related to <it>SYN2</it></p>
         </c>
         <c ca="left">
            <p>No association found <abbrgrp><abbr bid="B23">23</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>9</p>
         </c>
         <c ca="left">
            <p>
               <it>MYLIP</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-20b, involved in CNS development</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>10</p>
         </c>
         <c ca="left">
            <p>
               <it>EFEMP2</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-346</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>11</p>
         </c>
         <c ca="left">
            <p>
               <it>UTRN</it>
            </p>
         </c>
         <c ca="left">
            <p>Interacts with <it>DISC1</it>, target of mir-206</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>12</p>
         </c>
         <c ca="left">
            <p>
               <it>OMG</it>
            </p>
         </c>
         <c ca="left">
            <p>Myelin sheath, interacts with <it>RTN4R</it>, axonogenesis</p>
         </c>
         <c ca="left">
            <p>Weak positive association <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Putatively associated with mental retardation <abbrgrp><abbr bid="B38">38</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>13</p>
         </c>
         <c ca="left">
            <p>
               <it>BACE1</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-29*, Alzheimer's disease</p>
         </c>
         <c ca="left">
            <p>No association studies. Schizophrenia-like phenotypes in <it>BACE1</it>-null mice <abbrgrp><abbr bid="B39">39</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>14</p>
         </c>
         <c ca="left">
            <p>
               <it>HIPK3</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-20b</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>15</p>
         </c>
         <c ca="left">
            <p>
               <it>TAC1</it>
            </p>
         </c>
         <c ca="left">
            <p>Target of mir-206, axonal and synaptic transmission</p>
         </c>
         <c ca="left">
            <p>No association studies. Down-regulated in psychosis <abbrgrp><abbr bid="B40">40</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>16</p>
         </c>
         <c ca="left">
            <p>
               <it>ATXN1</it>
            </p>
         </c>
         <c ca="left">
            <p>Interacts with <it>ZNF804A </it>and <it>AKT1</it></p>
         </c>
         <c ca="left">
            <p>Positive association <abbrgrp><abbr bid="B18">18</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>17</p>
         </c>
         <c ca="left">
            <p>
               <it>SYN1</it>
            </p>
         </c>
         <c ca="left">
            <p>Related to <it>SYN2</it></p>
         </c>
         <c ca="left">
            <p>No association studies. Associated with epilepsy <abbrgrp><abbr bid="B41">41</abbr></abbrgrp></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>18</p>
         </c>
         <c ca="left">
            <p>
               <it>RTN4IP1</it>
            </p>
         </c>
         <c ca="left">
            <p>Interacts with <it>RTN4R</it>, neurite growth</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>19</p>
         </c>
         <c ca="left">
            <p>
               <it>CDKN1A</it>
            </p>
         </c>
         <c ca="left">
            <p>Interacts with <it>AKT1</it>, target of mir-20b</p>
         </c>
         <c ca="left">
            <p>No association studies</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>20</p>
         </c>
         <c ca="left">
            <p>
               <it>LINGO1</it>
            </p>
         </c>
         <c ca="left">
            <p>Interacts with <it>RTN4R</it>, axonogenesis, CNS development</p>
         </c>
         <c ca="left">
            <p>No association studies. Associated with essential tremor and Parkinson's disease <abbrgrp><abbr bid="B42">42</abbr></abbrgrp></p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Top genes inferred by BioGraph for schizophrenia that are not directly related to schizophrenia in the integrated network. Prioritizations are based on a data freeze of September 2009, so that predictions can be verified retrospectively against more recent literature. CNS, central nervous system; SZ, schizophrenia.</p>
   </tblfn></tbl>
<p>The benchmark indicates that our prioritization approach yields a considerable improvement over Endeavour, although the two benchmarking designs differ in two noteworthy ways. First, our platform does not require a training set of known disease-causing genes, since it implicitly bases prioritizations on integrated disease-gene associations in addition to other heterogeneous types of integrated knowledge of the disease. This is a major advantage for the user, since no prior knowledge of the disease is required. Second, our platform ranks the disease gene among all known genes, whereas Endeavour ranks disease genes among a random set of 99 non-disease genes.</p>
<p>As a quality control of the integrated databases, we assessed the effect of each database on the benchmark results by leaving out one database at a time and re-evaluating the prioritization algorithm on the Endeavour benchmark. This experiment showed that none of the included databases significantly harms the overall prediction performance. Conversely, some databases (most notably CTD gene-disease relations, GOA and Medical Subject Headings) are essential for successful prioritization, since leaving them out significantly degrades benchmark performance. More information on these quality checks is available in the Additional materials and methods in Additional file <supplr sid="S1">1</supplr>.</p>
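<p>The leave-one-database-out check can be expressed as a small Python loop. The evaluate callback is hypothetical: it stands for rebuilding the integrated network from a given set of databases and returning the mean benchmark AUC. The sketch only shows how the per-database effects are derived. A negative delta means that removing the database lowers performance, that is, the database helps.</p>

```python
from typing import Callable, Dict, FrozenSet, Sequence


def ablation(databases: Sequence[str],
             evaluate: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Change in benchmark AUC when each database is left out in turn.

    evaluate(dbs) is a hypothetical callback that rebuilds the network
    from dbs and returns the mean AUC on the benchmark.
    """
    full = frozenset(databases)
    baseline = evaluate(full)
    return {db: evaluate(full - {db}) - baseline for db in databases}
```

<p>A database whose removal raises the AUC would be flagged as harmful. Databases whose removal sharply lowers it are the essential ones.</p>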
</sec>
<sec><st><p>Ranking recently discovered disease genes</p></st>
<p>In the above benchmark tests, well-known disease genes are expected to rank high, because important susceptibility genes usually become the subject of intensive research. Consequently, the integrated databases may be biased toward indirect evidence linking these well-studied genes to their diseases. Since BioGraph exploits such indirect evidence, this literature bias may inflate the apparent predictive power of our algorithm. To reduce this bias, we evaluated the platform more objectively by ranking recently discovered disease-gene relations that are not present in the knowledge base.</p>
<p>Using the integrated network built from resource datasets frozen in March 2010, we identified all human disease-gene relations newly curated in the July 2010 releases of OMIM Morbid Map (15 new disease genes) and CTD (830 direct, non-inferred relations) that are not present as direct relations in the March 2010 knowledge base. This yielded 845 recent disease-gene relations, and for each we determined the gene's rank in the genome-wide prioritization for its disease, based on the March 2010 network.</p>
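<p>Selecting the recent relations amounts to a set difference between each newer release and the frozen network, as in the Python sketch below. It assumes, hypothetically, that relations are represented as (disease, gene) pairs using the same identifiers in every source. The union across sources is deduplicated.</p>

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str]  # hypothetical (disease, gene) pair


def recent_relations(frozen: Set[Relation],
                     releases: Dict[str, Set[Relation]]
                     ) -> Tuple[Set[Relation], Dict[str, int]]:
    """Collect relations in newer releases that are absent from the frozen network.

    Returns the deduplicated union of new relations and a per-source count.
    """
    combined: Set[Relation] = set()
    counts: Dict[str, int] = {}
    for source, relations in releases.items():
        fresh = relations - frozen  # set difference: not yet in the frozen data
        counts[source] = len(fresh)
        combined |= fresh
    return combined, counts
```

<p>Each relation in the combined set is then evaluated by ranking its gene within the genome-wide prioritization for its disease on the frozen network.</p>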
<p>Figure <figr fid="F3">3</figr> shows the ROC curve of the combined results, with an AUC of 86.14%. Of the 845 curated disease genes, 189 (22.37%) are ranked in the top 1% of the genome-wide ranking for the corresponding disease and 524 (62.01%) are ranked in the top 10%. The median disease gene is ranked within the top 6.04%.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>ROC curve of prioritization performance on 845 recent disease-gene relations</p></caption><text>
   <p><b>ROC curve of prioritization performance on 845 recent disease-gene relations</b>. The AUC of BioGraph prioritizations is 86.14%, confirming relations that were recently added to the resource databases but are not present in the integrated database. The diagonal dashed line represents a theoretical random algorithm.</p>
</text><graphic file="gb-2011-12-6-r57-3"/></fig>
</sec>
</sec>
<sec><st><p>Applications</p></st>
<p>The above benchmarks demonstrate that BioGraph can retrospectively find or confirm existing disease genes, indicating that the method can be used to predict putative susceptibility genes for heritable diseases. Possible applications of the framework include the identification of functionally interesting genes from sets of candidate genes, for example, promising genes in linkage regions or copy number variation regions, or genes identified through genome-wide association or expression studies.</p>
<p>Additionally, the automated construction of hypotheses is useful for finding peer-reviewed functional support for genetic and genomic findings. Collecting functional support for newly discovered disease-gene associations is not always straightforward, especially when the functional evidence is indirect and spans several fields. With the advent of high-throughput methodologies and the torrent of published material, detecting relevant information has become a laborious process, which computational techniques such as those presented here can help automate.</p>
<p>Beyond genetics and genomics, the framework can similarly be used to prioritize or find functional support for biomedical relations other than disease-gene associations, for example, relations involving drug compounds, annotation terms or pathways, which makes it a versatile tool for discovering diverse types of biomedical knowledge. In one possible application, BioGraph could be used to determine functional interactions between drug compounds, for the <it>in silico </it>exploration of drug-drug interactions or for the prioritization of candidate compounds in screening pipelines. Another example application is the computational inference of clinical biomarkers related to pathways, biochemical functions or disease processes, building on the integrated concepts, relations and literature references to detect promising candidates.</p>
<sec><st><p>Genome-wide prioritization of genes related to schizophrenia</p></st>
<p>To illustrate possible applications of the framework, we used the platform to predict candidate genes for schizophrenia (SZ) and to substantiate the top predictions with the automatically generated functional hypotheses.</p>
<p>SZ is a common heritable neuropsychiatric disorder with a prevalence of approximately 1% and an estimated heritability of 64%. It is characterized by a constellation of symptoms, including hallucinations and delusions, as well as severely inappropriate emotional responses, disordered thinking and concentration, erratic behavior, and social and occupational deterioration <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
<p>The genes discussed here are inferred indirectly from the integrated knowledge and are not directly associated with SZ in our gene-disease resource databases. The predictions in this section are based on a freeze of the integrated databases from September 2009, which allows us to test whether the predicted genes have been reported in genetic studies published after the freeze. Table <tblr tid="T2">2</tblr> shows the top 20 genes inferred by BioGraph for the SZ concept, designated by its UMLS accession ID [UMLS:C0036341], together with a short summary of the hypotheses generated by our platform linking each gene to SZ.</p>
<p><it>PRL</it>, the top inferred gene that is not a known disease-causing gene for SZ, encodes the hormone prolactin, whose best-known function is to stimulate lactogenesis. Prolactin is relevant to SZ because its secretion is regulated by dopamine, which is central to the current dopamine hypothesis of SZ. Accordingly, the dopamine-regulating drugs aripiprazole and risperidone affect the expression of prolactin, and these drugs have adverse hyperprolactinemia-associated side effects <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. We did not find published association studies of prolactin with SZ, although an association with autistic spectrum disorder has been reported <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Additionally, <it>PRL </it>is located on chromosome 6p22.3, a region linked to SZ through <it>DTNBP1 </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and <it>ATXN1 </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Although no causal association between prolactin function and SZ has been shown, BioGraph hypothesizes <it>PRL </it>to be a likely candidate gene for SZ.</p>
<p>The hypotheses automatically inferred by BioGraph to support the high ranking of <it>PRL </it>for SZ are in line with current understanding and are shown schematically in Figure <figr fid="F4">4</figr>. The most likely indirect links between SZ and <it>PRL </it>pass through the antipsychotic compounds aripiprazole and risperidone, which both act on dopamine receptors and affect the expression of prolactin. This hypothesis also shows that both compounds are used as drugs for attention deficit disorder, Asperger syndrome and autistic disorders. Additionally, <it>PRL </it>is associated with autistic disorder, supporting its relevance to psychiatric disorders. Additional paths from SZ visit SZ-associated genes and their commonalities with <it>PRL</it>: <it>TAAR6 </it>and <it>PRL </it>are both genes in the neuroactive ligand-receptor interaction pathway; <it>CCL2 </it>and <it>PRL </it>are both regulated by 8-bromo cAMP, a derivative of cyclic AMP; and <it>DRD3 </it>and <it>PRL </it>share the GO annotation 'Regulation of multicellular organism growth'. These relations may serve as indicators of the putative functional involvement of <it>PRL </it>in the etiology of SZ.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of <it>PRL </it>in relation to schizophrenia</p></caption><text>
   <p><b>Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of <it>PRL </it>in relation to schizophrenia</b>. Solid, dashed and dotted lines indicate the importance of each link in descending order, that is, the probability of traversing the relation to reach the target gene concept during random walks from the source schizophrenia concept. All links are grounded in the curated knowledge bases they originate from, annotated with their semantic meanings and enriched with references to the literature (not shown).</p>
</text><graphic file="gb-2011-12-6-r57-4"/></fig>
<p>Figure <figr fid="F5">5</figr> presents putative evidence for the involvement in SZ of <it>HTR1A </it>(serotonin receptor 1A), the third-ranked inferred candidate gene in Table <tblr tid="T2">2</tblr>. The main hypothesis is driven by the receptor's interaction with the antipsychotic drugs aripiprazole and chlorprothixene. <it>HTR1A </it>is additionally linked to its paralog <it>HTR2A</it>, a known susceptibility gene for schizophrenia, via GO annotations for serotonin binding activity. Although our integrated disease-gene databases (OMIM Morbid Map and CTD) have not indexed <it>HTR1A </it>as a schizophrenia susceptibility gene, variants in the gene have previously been shown to be associated with schizophrenia and other psychopathologies <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. This example shows that BioGraph can identify known disease genes even when these gene-disease associations are absent from the integrated resources.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of <it>HTR1A </it>in relation to schizophrenia</p></caption><text>
   <p><b>Schematic representation of the top ten automatically generated hypotheses supporting the susceptibility of <it>HTR1A </it>in relation to schizophrenia</b>.</p>
</text><graphic file="gb-2011-12-6-r57-5"/></fig>
<p>Significant associations between SZ and polymorphisms in 4 of the top 20 ranked genes, namely <it>HTR1A </it><abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, <it>DRD2 </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, <it>DNMT3B </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and <it>ATXN1 </it><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, have previously been reported. These known disease-gene relations are not indexed by our integrated databases, but were successfully prioritized by our data mining platform. Most notably, a significant association of polymorphisms in <it>DNMT3B </it>with SZ was only reported in October 2009, whereas our predictions are based on integrated databases from September 2009, which illustrates the usefulness of the proposed prioritization technique. Additionally, the highly ranked <it>OMG </it>gene has shown a weak association with SZ that warrants replication studies for confirmation <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. <it>SYN3 </it>is the only gene in the top 20 for which several association studies have been performed without finding support for it as an SZ susceptibility gene <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
<p>For the remaining 14 of the top 20 genes, we did not find published association studies supporting or contradicting a role in SZ, although for some of these genes associations with SZ-like phenotypes or with related psychiatric and neurological disorders have been shown, supporting their putative role in SZ (Table <tblr tid="T2">2</tblr>).</p>
</sec>
</sec>
<sec><st><p>Discussion</p></st>
<p>We have constructed BioGraph, an integrated network of curated relations from heterogeneous knowledge sources, such as disease-gene-compound associations, protein-protein interactions, GO and pathway annotations, microRNA targets, protein domains, and so on. In order to guarantee the accurateness of the integrated knowledge, the integrated databases were selected based on their curation processes for the indexing of knowledge from the peer-reviewed scientific literature. We show that the automated generation of functional hypotheses in this integrated network of biomedical knowledge allows the successful prioritization and identification of research targets in the context of a research subject. More specifically, we can successfully identify proven disease genes for hereditary diseases as highly ranking genes among all human genes in the context of their disease and vice versa. We have shown that ensembles of highly probable walks through this network can be adopted to successfully rank putative relations among non-obvious and indirectly associated concepts, with a focus on adopting these automatically generated hypotheses for the prioritization of possible susceptibility genes of diseases. The prioritization and automated hypothesis generation platform is available as a web service <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
<p>BioGraph offers a range of significant improvements over leading prioritization platforms for <it>in silico </it>identification of disease-related genes. Most notably, and in contrast with other methods, our approach is unsupervised and does not require prior domain knowledge from the user. This removes possible user biases and problems with prediction robustness in common supervised machine learning approaches that require, for example, training sets of known disease causing genes to define the subject of an analysis. Furthermore, highly ranked targets are grounded in comprehensible functional hypotheses, consisting of refereed relation paths in support of the prioritization. Since our method is based on the integration of heterogeneous knowledge sources, the generated hypotheses offer richer semantics about inferred biomedical relations compared to related data mining efforts in, for example, gene and protein interaction networks.</p>
<p>Tests on published benchmarks (AUC 92.92%) show that our prioritization method outperforms leading technologies and notable differences in the rankings are supported by comprehensible hypotheses that confidently support the prioritization. In experimental cases where an accountable gene needs to be identified in a set of 100 genes, BioGraph prioritizes the gene among the top 10 genes in 73.73% of the cases. We showed that BioGraph is able to retrospectively confirm recent disease-gene associations to the integrated databases (AUC 86.14%). Additionally, relations that have been confirmed in recent publications were successfully predicted. For example, BioGraph ranked <it>DNMT3B </it>as a top ranking SZ susceptibility gene using integrated data frozen in September 2009 while this association was published in October 2009. Additionally, of the top 20 prioritized inferred genes for schizophrenia, 4 disease genes were not indexed by the integrated resources but are confirmed as true associations by the literature.</p>
<p>Finally, although the applications of BioGraph in this paper focus on ranking disease-gene relations, the presented methodology is generic and applicable in various biological research settings that require intelligent and intelligible hypotheses to be constructed among interrogated concepts. One may use the platform, for example, to identify diseases related to a pathway of interest, or to determine the ontology terms, compounds or protein domains related to <it>a priori </it>defined gene sets.</p>
</sec>
<sec><st><p>Abbreviations</p></st>
<p>AUC: area under the ROC curve; CTD: Comparative Toxicogenomics Database; GO: Gene Ontology; GOA: Gene Ontology Annotations; KEGG: Kyoto Encyclopedia of Genes and Genomes; OMIM: Online Mendelian Inheritance in Man; ROC: receiver operating characteristic; SZ: schizophrenia; UMLS: Unified Medical Language System.</p>
</sec>
<sec><st><p>Competing interests</p></st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec><st><p>Authors' contributions</p></st>
<p>WD, PDR, JDF and BG conceived the project. AL, JDK and PDR created the BioGraph resource, data miner and hypothesis generator, designed and carried out the performance tests and built the web service. All authors have read and approved the manuscript for publication.</p>
</sec>
</bdy>
<bm>
<ack>
<sec><st><p>Acknowledgements</p></st>
<p>This work was supported by the GOA project 'BioGraph: Text mining on heterogeneous databases: An application to optimized discovery of disease relevant genetic variants' of the University of Antwerp, Belgium. We thank Leonardo de Almeida Souza and Mojca Stra&#382;i&#353;ar for fruitful discussions of the draft manuscript.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Searching for genetic determinants in the new millennium.</p></title><aug><au><snm>Risch</snm><fnm>NJ</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>405</volume><fpage>847</fpage><lpage>856</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35015718</pubid><pubid idtype="pmpid" link="fulltext">10866211</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Role of <it>in silico </it>tools in gene discovery.</p></title><aug><au><snm>Yu</snm><fnm>B</fnm></au></aug><source>Mol Biotechnol</source><pubdate>2009</pubdate><volume>41</volume><fpage>296</fpage><lpage>306</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s12033-008-9134-8</pubid><pubid idtype="pmpid" link="fulltext">19101827</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Associating genes and protein complexes with disease via network propagation.</p></title><aug><au><snm>Vanunu</snm><fnm>O</fnm></au><au><snm>Magger</snm><fnm>O</fnm></au><au><snm>Ruppin</snm><fnm>E</fnm></au><au><snm>Shlomi</snm><fnm>T</fnm></au><au><snm>Sharan</snm><fnm>R</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2010</pubdate><volume>6</volume><fpage>e1000641</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000641</pubid><pubid idtype="pmcid">2797085</pubid><pubid idtype="pmpid" link="fulltext">20090828</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Disease gene characterization through large-scale co-expression analysis.</p></title><aug><au><snm>Day</snm><fnm>A</fnm></au><au><snm>Dong</snm><fnm>J</fnm></au><au><snm>Funari</snm><fnm>VA</fnm></au><au><snm>Harry</snm><fnm>B</fnm></au><au><snm>Strom</snm><fnm>SP</fnm></au><au><snm>Cohn</snm><fnm>DH</fnm></au><au><snm>Nelson</snm><fnm>SF</fnm></au></aug><source>PLoS ONE</source><pubdate>2009</pubdate><volume>4</volume><fpage>e8491</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0008491</pubid><pubid idtype="pmcid">2797297</pubid><pubid 
idtype="pmpid" link="fulltext">20046828</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>A similarity-based method for genome-wide prediction of disease-relevant human genes.</p></title><aug><au><snm>Freudenberg</snm><fnm>J</fnm></au><au><snm>Propping</snm><fnm>P</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><issue>Suppl 2</issue><fpage>S110</fpage><lpage>115</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.suppl_2.S110</pubid><pubid idtype="pmpid" link="fulltext">12385992</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Gene prioritization through genomic data fusion.</p></title><aug><au><snm>Aerts</snm><fnm>S</fnm></au><au><snm>Lambrechts</snm><fnm>D</fnm></au><au><snm>Maity</snm><fnm>S</fnm></au><au><snm>Van Loo</snm><fnm>P</fnm></au><au><snm>Coessens</snm><fnm>B</fnm></au><au><snm>De Smet</snm><fnm>F</fnm></au><au><snm>Tranchevent</snm><fnm>L</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Marynen</snm><fnm>P</fnm></au><au><snm>Hassan</snm><fnm>B</fnm></au><au><snm>Carmeliet</snm><fnm>P</fnm></au><au><snm>Moreau</snm><fnm>Y</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2006</pubdate><volume>24</volume><fpage>537</fpage><lpage>544</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1203</pubid><pubid idtype="pmpid" link="fulltext">16680138</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Integration of multiple data sources to prioritize candidate genes using discounted rating system.</p></title><aug><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Patra</snm><fnm>JC</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><issue>Suppl 1</issue><fpage>S20</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-S1-S20</pubid><pubid idtype="pmcid">3026368</pubid><pubid idtype="pmpid" link="fulltext">20946604</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Literature mining for the 
biologist: from information retrieval to biological discovery.</p></title><aug><au><snm>Jensen</snm><fnm>LJ</fnm></au><au><snm>Saric</snm><fnm>J</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2006</pubdate><volume>7</volume><fpage>119</fpage><lpage>129</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1768</pubid><pubid idtype="pmpid" link="fulltext">16418747</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>The Unified Medical Language System (UMLS): integrating biomedical terminology.</p></title><aug><au><snm>Bodenreider</snm><fnm>O</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2004</pubdate><volume>32</volume><fpage>D267</fpage><lpage>270</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh061</pubid><pubid idtype="pmcid">308795</pubid><pubid idtype="pmpid" link="fulltext">14681409</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>The anatomy of a large-scale hypertextual Web search engine.</p></title><aug><au><snm>Brin</snm><fnm>S</fnm></au><au><snm>Page</snm><fnm>L</fnm></au></aug><source>Comput Netw ISDN Syst</source><pubdate>1998</pubdate><volume>30</volume><fpage>107</fpage><lpage>117</lpage><xrefbib><pubid idtype="doi">10.1016/S0169-7552(98)00110-X</pubid></xrefbib></bibl><bibl id="B11"><title><p>McKusick's Online Mendelian Inheritance in Man (OMIM).</p></title><aug><au><snm>Amberger</snm><fnm>J</fnm></au><au><snm>Bocchini</snm><fnm>CA</fnm></au><au><snm>Scott</snm><fnm>AF</fnm></au><au><snm>Hamosh</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D793</fpage><lpage>796</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn665</pubid><pubid idtype="pmcid">2686440</pubid><pubid idtype="pmpid" link="fulltext">18842627</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive 
models.</p></title><aug><au><snm>Zou</snm><fnm>KH</fnm></au><au><snm>O&apos;Malley</snm><fnm>AJ</fnm></au><au><snm>Mauri</snm><fnm>L</fnm></au></aug><source>Circulation</source><pubdate>2007</pubdate><volume>115</volume><fpage>654</fpage><lpage>657</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1161/CIRCULATIONAHA.105.594929</pubid><pubid idtype="pmpid" link="fulltext">17283280</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study.</p></title><aug><au><snm>Lichtenstein</snm><fnm>P</fnm></au><au><snm>Yip</snm><fnm>BH</fnm></au><au><snm>Bj&#246;rk</snm><fnm>C</fnm></au><au><snm>Pawitan</snm><fnm>Y</fnm></au><au><snm>Cannon</snm><fnm>TD</fnm></au><au><snm>Sullivan</snm><fnm>PF</fnm></au><au><snm>Hultman</snm><fnm>CM</fnm></au></aug><source>Lancet</source><pubdate>2009</pubdate><volume>373</volume><fpage>234</fpage><lpage>239</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0140-6736(09)60072-6</pubid><pubid idtype="pmpid" link="fulltext">19150704</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Rare structural variants in schizophrenia: one disorder, multiple mutations; one mutation, multiple disorders.</p></title><aug><au><snm>Sebat</snm><fnm>J</fnm></au><au><snm>Levy</snm><fnm>DL</fnm></au><au><snm>McCarthy</snm><fnm>SE</fnm></au></aug><source>Trends Genet</source><pubdate>2009</pubdate><volume>25</volume><fpage>528</fpage><lpage>535</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.tig.2009.10.004</pubid><pubid idtype="pmpid" link="fulltext">19883952</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Prolactin awareness: an essential consideration for physical health in schizophrenia.</p></title><aug><au><snm>Montejo</snm><fnm>AL</fnm></au></aug><source>Eur Neuropsychopharmacol</source><pubdate>2008</pubdate><volume>18</volume><issue>Suppl 2</issue><fpage>S108</fpage><lpage>114</lpage><xrefbib><pubid 
idtype="pmpid" link="fulltext">18346598</pubid></xrefbib></bibl><bibl id="B16"><title><p>Genes controlling affiliative behavior as candidate genes for autism.</p></title><aug><au><snm>Yrigollen</snm><fnm>CM</fnm></au><au><snm>Han</snm><fnm>SS</fnm></au><au><snm>Kochetkova</snm><fnm>A</fnm></au><au><snm>Babitz</snm><fnm>T</fnm></au><au><snm>Chang</snm><fnm>JT</fnm></au><au><snm>Volkmar</snm><fnm>FR</fnm></au><au><snm>Leckman</snm><fnm>JF</fnm></au><au><snm>Grigorenko</snm><fnm>EL</fnm></au></aug><source>Biol Psychiatry</source><pubdate>2008</pubdate><volume>63</volume><fpage>911</fpage><lpage>916</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.biopsych.2007.11.015</pubid><pubid idtype="pmcid">2386897</pubid><pubid idtype="pmpid" link="fulltext">18207134</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Genetic variation in the 6p22.3 gene DTNBP1, the human ortholog of the mouse dysbindin gene, is associated with schizophrenia.</p></title><aug><au><snm>Straub</snm><fnm>RE</fnm></au><au><snm>Jiang</snm><fnm>Y</fnm></au><au><snm>MacLean</snm><fnm>CJ</fnm></au><au><snm>Ma</snm><fnm>Y</fnm></au><au><snm>Webb</snm><fnm>BT</fnm></au><au><snm>Myakishev</snm><fnm>MV</fnm></au><au><snm>Harris-Kerr</snm><fnm>C</fnm></au><au><snm>Wormley</snm><fnm>B</fnm></au><au><snm>Sadek</snm><fnm>H</fnm></au><au><snm>Kadambi</snm><fnm>B</fnm></au><au><snm>Cesare</snm><fnm>AJ</fnm></au><au><snm>Gibberman</snm><fnm>A</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>O&apos;Neill</snm><fnm>FA</fnm></au><au><snm>Walsh</snm><fnm>D</fnm></au><au><snm>Kendler</snm><fnm>KS</fnm></au></aug><source>Am J Hum Genet</source><pubdate>2002</pubdate><volume>71</volume><fpage>337</fpage><lpage>348</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/341750</pubid><pubid idtype="pmcid">379166</pubid><pubid idtype="pmpid" link="fulltext">12098102</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Bipolar I disorder and schizophrenia: a 440-single-nucleotide polymorphism 
screen of 64 candidate genes among Ashkenazi Jewish case-parent trios.</p></title><aug><au><snm>Fallin</snm><fnm>MD</fnm></au><au><snm>Lasseter</snm><fnm>VK</fnm></au><au><snm>Avramopoulos</snm><fnm>D</fnm></au><au><snm>Nicodemus</snm><fnm>KK</fnm></au><au><snm>Wolyniec</snm><fnm>PS</fnm></au><au><snm>McGrath</snm><fnm>JA</fnm></au><au><snm>Steel</snm><fnm>G</fnm></au><au><snm>Nestadt</snm><fnm>G</fnm></au><au><snm>Liang</snm><fnm>K</fnm></au><au><snm>Huganir</snm><fnm>RL</fnm></au><au><snm>Valle</snm><fnm>D</fnm></au><au><snm>Pulver</snm><fnm>AE</fnm></au></aug><source>Am J Hum Genet</source><pubdate>2005</pubdate><volume>77</volume><fpage>918</fpage><lpage>936</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/497703</pubid><pubid idtype="pmcid">1285177</pubid><pubid idtype="pmpid" link="fulltext">16380905</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Human 5-HT1A receptor C(-1019)G polymorphism and psychopathology.</p></title><aug><au><snm>Huang</snm><fnm>Y</fnm></au><au><snm>Battistuzzi</snm><fnm>C</fnm></au><au><snm>Oquendo</snm><fnm>MA</fnm></au><au><snm>Harkavy-Friedman</snm><fnm>J</fnm></au><au><snm>Greenhill</snm><fnm>L</fnm></au><au><snm>Zalsman</snm><fnm>G</fnm></au><au><snm>Brodsky</snm><fnm>B</fnm></au><au><snm>Arango</snm><fnm>V</fnm></au><au><snm>Brent</snm><fnm>DA</fnm></au><au><snm>Mann</snm><fnm>JJ</fnm></au></aug><source>Int J Neuropsychopharmacol</source><pubdate>2004</pubdate><volume>7</volume><fpage>441</fpage><lpage>451</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1017/S1461145704004663</pubid><pubid idtype="pmpid" link="fulltext">15469667</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>The genetics of schizophrenia.</p></title><aug><au><snm>Bertolino</snm><fnm>A</fnm></au><au><snm>Blasi</snm><fnm>G</fnm></au></aug><source>Neuroscience</source><pubdate>2009</pubdate><volume>164</volume><fpage>288</fpage><lpage>299</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1016/j.neuroscience.2009.04.038</pubid><pubid idtype="pmpid" link="fulltext">19393294</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>DNA methyltransferase 3B gene increases risk of early onset schizophrenia.</p></title><aug><au><snm>Zhang</snm><fnm>C</fnm></au><au><snm>Fang</snm><fnm>Y</fnm></au><au><snm>Xie</snm><fnm>B</fnm></au><au><snm>Cheng</snm><fnm>W</fnm></au><au><snm>Du</snm><fnm>Y</fnm></au><au><snm>Wang</snm><fnm>D</fnm></au><au><snm>Yu</snm><fnm>S</fnm></au></aug><source>Neurosci Lett</source><pubdate>2009</pubdate><volume>462</volume><fpage>308</fpage><lpage>311</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.neulet.2009.06.085</pubid><pubid idtype="pmpid" link="fulltext">19576953</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Genetic study of the myelin oligodendrocyte glycoprotein (MOG) gene in schizophrenia.</p></title><aug><au><snm>Zai</snm><fnm>G</fnm></au><au><snm>King</snm><fnm>N</fnm></au><au><snm>Wigg</snm><fnm>K</fnm></au><au><snm>Couto</snm><fnm>J</fnm></au><au><snm>Wong</snm><fnm>GWH</fnm></au><au><snm>Honer</snm><fnm>WG</fnm></au><au><snm>Barr</snm><fnm>CL</fnm></au><au><snm>Kennedy</snm><fnm>JL</fnm></au></aug><source>Genes Brain Behav</source><pubdate>2005</pubdate><volume>4</volume><fpage>2</fpage><lpage>9</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">15660663</pubid></xrefbib></bibl><bibl id="B23"><title><p>Association and expression study of synapsin III and schizophrenia.</p></title><aug><au><snm>Chen</snm><fnm>Q</fnm></au><au><snm>Che</snm><fnm>R</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>O&apos;Neill</snm><fnm>FA</fnm></au><au><snm>Walsh</snm><fnm>D</fnm></au><au><snm>Tang</snm><fnm>W</fnm></au><au><snm>Shi</snm><fnm>Y</fnm></au><au><snm>He</snm><fnm>L</fnm></au><au><snm>Kendler</snm><fnm>KS</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au></aug><source>Neurosci 
Lett</source><pubdate>2009</pubdate><volume>465</volume><fpage>248</fpage><lpage>251</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.neulet.2009.09.032</pubid><pubid idtype="pmcid">2777515</pubid><pubid idtype="pmpid" link="fulltext">19766700</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Biomedical Knowledge Discovery Server</p></title><url>http://www.biograph.be</url></bibl><bibl id="B25"><title><p>BioGRID: a general repository for interaction datasets.</p></title><aug><au><snm>Stark</snm><fnm>C</fnm></au><au><snm>Breitkreutz</snm><fnm>BJ</fnm></au><au><snm>Reguly</snm><fnm>T</fnm></au><au><snm>Boucher</snm><fnm>L</fnm></au><au><snm>Breitkreutz</snm><fnm>A</fnm></au><au><snm>Tyers</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><fpage>D535</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj109</pubid><pubid idtype="pmcid">1347471</pubid><pubid idtype="pmpid" link="fulltext">16381927</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>The comparative toxicogenomics database: a cross-species resource for building chemical-gene interaction networks.</p></title><aug><au><snm>Mattingly</snm><fnm>CJ</fnm></au><au><snm>Rosenstein</snm><fnm>MC</fnm></au><au><snm>Davis</snm><fnm>AP</fnm></au><au><snm>Colby</snm><fnm>GT</fnm></au><au><snm>Forrest</snm><fnm>JN</fnm></au><au><snm>Boyer</snm><fnm>JL</fnm></au></aug><source>Toxicol Sci</source><pubdate>2006</pubdate><volume>92</volume><fpage>587</fpage><lpage>595</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/toxsci/kfl008</pubid><pubid idtype="pmcid">1586111</pubid><pubid idtype="pmpid" link="fulltext">16675512</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>The Database of Interacting Proteins: 2004 
update.</p></title><aug><au><snm>Salwinski</snm><fnm>L</fnm></au><au><snm>Miller</snm><fnm>CS</fnm></au><au><snm>Smith</snm><fnm>AJ</fnm></au><au><snm>Pettit</snm><fnm>FK</fnm></au><au><snm>Bowie</snm><fnm>JU</fnm></au><au><snm>Eisenberg</snm><fnm>D</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2004</pubdate><volume>32</volume><fpage>D449</fpage><lpage>451</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh086</pubid><pubid idtype="pmcid">308820</pubid><pubid idtype="pmpid" link="fulltext">14681454</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>The GOA database in 2009--an integrated Gene Ontology Annotation resource.</p></title><aug><au><snm>Barrell</snm><fnm>D</fnm></au><au><snm>Dimmer</snm><fnm>E</fnm></au><au><snm>Huntley</snm><fnm>RP</fnm></au><au><snm>Binns</snm><fnm>D</fnm></au><au><snm>O&apos;Donovan</snm><fnm>C</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D396</fpage><lpage>403</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn803</pubid><pubid idtype="pmcid">2686469</pubid><pubid idtype="pmpid" link="fulltext">18957448</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology.</p></title><aug><au><snm>Prasad</snm><fnm>TSK</fnm></au><au><snm>Kandasamy</snm><fnm>K</fnm></au><au><snm>Pandey</snm><fnm>A</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2009</pubdate><volume>577</volume><fpage>67</fpage><lpage>79</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/978-1-60761-232-2_6</pubid><pubid idtype="pmpid" link="fulltext">19718509</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>IntAct--open source resource for molecular interaction 
data.</p></title><aug><au><snm>Kerrien</snm><fnm>S</fnm></au><au><snm>Alam-Faruque</snm><fnm>Y</fnm></au><au><snm>Aranda</snm><fnm>B</fnm></au><au><snm>Bancarz</snm><fnm>I</fnm></au><au><snm>Bridge</snm><fnm>A</fnm></au><au><snm>Derow</snm><fnm>C</fnm></au><au><snm>Dimmer</snm><fnm>E</fnm></au><au><snm>Feuermann</snm><fnm>M</fnm></au><au><snm>Friedrichsen</snm><fnm>A</fnm></au><au><snm>Huntley</snm><fnm>R</fnm></au><au><snm>Kohler</snm><fnm>C</fnm></au><au><snm>Khadake</snm><fnm>J</fnm></au><au><snm>Leroy</snm><fnm>C</fnm></au><au><snm>Liban</snm><fnm>A</fnm></au><au><snm>Lieftink</snm><fnm>C</fnm></au><au><snm>Montecchi-Palazzi</snm><fnm>L</fnm></au><au><snm>Orchard</snm><fnm>S</fnm></au><au><snm>Risse</snm><fnm>J</fnm></au><au><snm>Robbe</snm><fnm>K</fnm></au><au><snm>Roechert</snm><fnm>B</fnm></au><au><snm>Thorneycroft</snm><fnm>D</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Hermjakob</snm><fnm>H</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D561</fpage><lpage>565</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl958</pubid><pubid idtype="pmcid">1751531</pubid><pubid idtype="pmpid" link="fulltext">17145710</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>InterPro: the integrative protein signature 
database.</p></title><aug><au><snm>Hunter</snm><fnm>S</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Attwood</snm><fnm>TK</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Binns</snm><fnm>D</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Das</snm><fnm>U</fnm></au><au><snm>Daugherty</snm><fnm>L</fnm></au><au><snm>Duquenne</snm><fnm>L</fnm></au><au><snm>Finn</snm><fnm>RD</fnm></au><au><snm>Gough</snm><fnm>J</fnm></au><au><snm>Haft</snm><fnm>D</fnm></au><au><snm>Hulo</snm><fnm>N</fnm></au><au><snm>Kahn</snm><fnm>D</fnm></au><au><snm>Kelly</snm><fnm>E</fnm></au><au><snm>Laugraud</snm><fnm>A</fnm></au><au><snm>Letunic</snm><fnm>I</fnm></au><au><snm>Lonsdale</snm><fnm>D</fnm></au><au><snm>Lopez</snm><fnm>R</fnm></au><au><snm>Madera</snm><fnm>M</fnm></au><au><snm>Maslen</snm><fnm>J</fnm></au><au><snm>McAnulla</snm><fnm>C</fnm></au><au><snm>McDowall</snm><fnm>J</fnm></au><au><snm>Mistry</snm><fnm>J</fnm></au><au><snm>Mitchell</snm><fnm>A</fnm></au><au><snm>Mulder</snm><fnm>N</fnm></au><au><snm>Natale</snm><fnm>D</fnm></au><au><snm>Orengo</snm><fnm>C</fnm></au><au><snm>Quinn</snm><fnm>AF</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D211</fpage><lpage>215</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn785</pubid><pubid idtype="pmcid">2686546</pubid><pubid idtype="pmpid" link="fulltext">18940856</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>KEGG for linking genomes to life and the 
environment.</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Araki</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Hattori</snm><fnm>M</fnm></au><au><snm>Hirakawa</snm><fnm>M</fnm></au><au><snm>Itoh</snm><fnm>M</fnm></au><au><snm>Katayama</snm><fnm>T</fnm></au><au><snm>Kawashima</snm><fnm>S</fnm></au><au><snm>Okuda</snm><fnm>S</fnm></au><au><snm>Tokimatsu</snm><fnm>T</fnm></au><au><snm>Yamanishi</snm><fnm>Y</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D480</fpage><lpage>484</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2238879</pubid><pubid idtype="pmpid" link="fulltext">18077471</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Medical Subject Headings (MeSH).</p></title><aug><au><snm>Lipscomb</snm><fnm>CE</fnm></au></aug><source>Bull Med Libr Assoc</source><pubdate>2000</pubdate><volume>88</volume><fpage>265</fpage><lpage>266</lpage><xrefbib><pubidlist><pubid idtype="pmcid">35238</pubid><pubid idtype="pmpid">10928714</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>MINT: the Molecular INTeraction database.</p></title><aug><au><snm>Chatr-aryamontri</snm><fnm>A</fnm></au><au><snm>Ceol</snm><fnm>A</fnm></au><au><snm>Palazzi</snm><fnm>LM</fnm></au><au><snm>Nardelli</snm><fnm>G</fnm></au><au><snm>Schneider</snm><fnm>MV</fnm></au><au><snm>Castagnoli</snm><fnm>L</fnm></au><au><snm>Cesareni</snm><fnm>G</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D572</fpage><lpage>574</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl950</pubid><pubid idtype="pmcid">1751541</pubid><pubid idtype="pmpid" link="fulltext">17135203</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>miR2Disease: a manually curated database for microRNA deregulation in human 
disease.</p></title><aug><au><snm>Jiang</snm><fnm>Q</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Hao</snm><fnm>Y</fnm></au><au><snm>Juan</snm><fnm>L</fnm></au><au><snm>Teng</snm><fnm>M</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Li</snm><fnm>M</fnm></au><au><snm>Wang</snm><fnm>G</fnm></au><au><snm>Liu</snm><fnm>Y</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D98</fpage><lpage>104</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn714</pubid><pubid idtype="pmcid">2686559</pubid><pubid idtype="pmpid" link="fulltext">18927107</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Systematic discovery of in vivo phosphorylation networks.</p></title><aug><au><snm>Linding</snm><fnm>R</fnm></au><au><snm>Jensen</snm><fnm>LJ</fnm></au><au><snm>Ostheimer</snm><fnm>GJ</fnm></au><au><snm>van Vugt</snm><fnm>MATM</fnm></au><au><snm>J&#248;rgensen</snm><fnm>C</fnm></au><au><snm>Miron</snm><fnm>IM</fnm></au><au><snm>Diella</snm><fnm>F</fnm></au><au><snm>Colwill</snm><fnm>K</fnm></au><au><snm>Taylor</snm><fnm>L</fnm></au><au><snm>Elder</snm><fnm>K</fnm></au><au><snm>Metalnikov</snm><fnm>P</fnm></au><au><snm>Nguyen</snm><fnm>V</fnm></au><au><snm>Pasculescu</snm><fnm>A</fnm></au><au><snm>Jin</snm><fnm>J</fnm></au><au><snm>Park</snm><fnm>JG</fnm></au><au><snm>Samson</snm><fnm>LD</fnm></au><au><snm>Woodgett</snm><fnm>JR</fnm></au><au><snm>Russell</snm><fnm>RB</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Yaffe</snm><fnm>MB</fnm></au><au><snm>Pawson</snm><fnm>T</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>129</volume><fpage>1415</fpage><lpage>1426</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.05.052</pubid><pubid idtype="pmcid">2692296</pubid><pubid idtype="pmpid" link="fulltext">17570479</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>The database of experimentally supported targets: a functional update of 
TarBase.</p></title><aug><au><snm>Papadopoulos</snm><fnm>GL</fnm></au><au><snm>Reczko</snm><fnm>M</fnm></au><au><snm>Simossis</snm><fnm>VA</fnm></au><au><snm>Sethupathy</snm><fnm>P</fnm></au><au><snm>Hatzigeorgiou</snm><fnm>AG</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D155</fpage><lpage>158</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn809</pubid><pubid idtype="pmcid">2686456</pubid><pubid idtype="pmpid" link="fulltext">18957447</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Mutations and novel polymorphisms in coding regions and UTRs of CDK5R1 and OMG genes in patients with non-syndromic mental retardation.</p></title><aug><au><snm>Venturin</snm><fnm>M</fnm></au><au><snm>Moncini</snm><fnm>S</fnm></au><au><snm>Villa</snm><fnm>V</fnm></au><au><snm>Russo</snm><fnm>S</fnm></au><au><snm>Bonati</snm><fnm>MT</fnm></au><au><snm>Larizza</snm><fnm>L</fnm></au><au><snm>Riva</snm><fnm>P</fnm></au></aug><source>Neurogenetics</source><pubdate>2006</pubdate><volume>7</volume><fpage>59</fpage><lpage>66</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s10048-005-0026-9</pubid><pubid idtype="pmpid" link="fulltext">16425041</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Alteration of BACE1-dependent NRG1/ErbB4 signaling and schizophrenia-like phenotypes in BACE1-null mice.</p></title><aug><au><snm>Savonenko</snm><fnm>AV</fnm></au><au><snm>Melnikova</snm><fnm>T</fnm></au><au><snm>Laird</snm><fnm>FM</fnm></au><au><snm>Stewart</snm><fnm>K</fnm></au><au><snm>Price</snm><fnm>DL</fnm></au><au><snm>Wong</snm><fnm>PC</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2008</pubdate><volume>105</volume><fpage>5585</fpage><lpage>5590</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0710373105</pubid><pubid idtype="pmcid">2291091</pubid><pubid idtype="pmpid" link="fulltext">18385378</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Putative psychosis 
genes in the prefrontal cortex: combined analysis of gene expression microarrays.</p></title><aug><au><snm>Choi</snm><fnm>KH</fnm></au><au><snm>Elashoff</snm><fnm>M</fnm></au><au><snm>Higgs</snm><fnm>BW</fnm></au><au><snm>Song</snm><fnm>J</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Sabunciyan</snm><fnm>S</fnm></au><au><snm>Diglisic</snm><fnm>S</fnm></au><au><snm>Yolken</snm><fnm>RH</fnm></au><au><snm>Knable</snm><fnm>MB</fnm></au><au><snm>Torrey</snm><fnm>EF</fnm></au><au><snm>Webster</snm><fnm>MJ</fnm></au></aug><source>BMC Psychiatry</source><pubdate>2008</pubdate><volume>8</volume><fpage>87</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-244X-8-87</pubid><pubid idtype="pmcid">2585075</pubid><pubid idtype="pmpid" link="fulltext">18992145</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Identification of a mutation in synapsin I, a synaptic vesicle protein, in a family with epilepsy.</p></title><aug><au><snm>Garcia</snm><fnm>CC</fnm></au><au><snm>Blair</snm><fnm>HJ</fnm></au><au><snm>Seager</snm><fnm>M</fnm></au><au><snm>Coulthard</snm><fnm>A</fnm></au><au><snm>Tennant</snm><fnm>S</fnm></au><au><snm>Buddles</snm><fnm>M</fnm></au><au><snm>Curtis</snm><fnm>A</fnm></au><au><snm>Goodship</snm><fnm>JA</fnm></au></aug><source>J Med Genet</source><pubdate>2004</pubdate><volume>41</volume><fpage>183</fpage><lpage>186</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1136/jmg.2003.013680</pubid><pubid idtype="pmcid">1735688</pubid><pubid idtype="pmpid" link="fulltext">14985377</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Tremor</p></title><aug><au><snm>Raethjen</snm><fnm>J</fnm></au><au><snm>Deuschl</snm><fnm>G</fnm></au></aug><source>Curr Opin Neurol</source><pubdate>2009</pubdate><volume>22</volume><fpage>400</fpage><lpage>405</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1097/WCO.0b013e32832dc056</pubid><pubid idtype="pmpid" link="fulltext">19553813</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Gene 
Ontology: tool for the unification of biology.</p></title><aug><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Blake</snm><fnm>JA</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><au><snm>Cherry</snm><fnm>JM</fnm></au><au><snm>Davis</snm><fnm>AP</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Dwight</snm><fnm>SS</fnm></au><au><snm>Eppig</snm><fnm>JT</fnm></au><au><snm>Harris</snm><fnm>MA</fnm></au><au><snm>Hill</snm><fnm>DP</fnm></au><au><snm>Issel-Tarver</snm><fnm>L</fnm></au><au><snm>Kasarskis</snm><fnm>A</fnm></au><au><snm>Lewis</snm><fnm>S</fnm></au><au><snm>Matese</snm><fnm>JC</fnm></au><au><snm>Richardson</snm><fnm>JE</fnm></au><au><snm>Ringwald</snm><fnm>M</fnm></au><au><snm>Rubin</snm><fnm>GM</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au></aug><source>Nat Genet</source><pubdate>2000</pubdate><volume>25</volume><fpage>25</fpage><lpage>29</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/75556</pubid><pubid idtype="pmcid">3037419</pubid><pubid idtype="pmpid" link="fulltext">10802651</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>The Universal Protein Resource (UniProt) 2009.</p></title><aug><au><cnm>UniProt Consortium</cnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D169</fpage><lpage>174</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2686606</pubid><pubid idtype="pmpid" link="fulltext">18836194</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>miRBase: tools for microRNA genomics.</p></title><aug><au><snm>Griffiths-Jones</snm><fnm>S</fnm></au><au><snm>Saini</snm><fnm>HK</fnm></au><au><snm>van Dongen</snm><fnm>S</fnm></au><au><snm>Enright</snm><fnm>AJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D154</fpage><lpage>158</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn221</pubid><pubid idtype="pmcid">2238936</pubid><pubid 
idtype="pmpid" link="fulltext">17991681</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>NCBI Reference Sequences: current status, policy and new initiatives.</p></title><aug><au><snm>Pruitt</snm><fnm>KD</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Klimke</snm><fnm>W</fnm></au><au><snm>Maglott</snm><fnm>DR</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><fpage>D32</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn721</pubid><pubid idtype="pmcid">2686572</pubid><pubid idtype="pmpid" link="fulltext">18927115</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Annotating the human genome with Disease Ontology.</p></title><aug><au><snm>Osborne</snm><fnm>JD</fnm></au><au><snm>Flatow</snm><fnm>J</fnm></au><au><snm>Holko</snm><fnm>M</fnm></au><au><snm>Lin</snm><fnm>SM</fnm></au><au><snm>Kibbe</snm><fnm>WA</fnm></au><au><snm>Zhu</snm><fnm>LJ</fnm></au><au><snm>Danila</snm><fnm>MI</fnm></au><au><snm>Feng</snm><fnm>G</fnm></au><au><snm>Chisholm</snm><fnm>RL</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><issue>Suppl 1</issue><fpage>S6</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-S1-S6</pubid><pubid idtype="pmcid">2788393</pubid><pubid idtype="pmpid" link="fulltext">19958504</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Estimating an Eigenvector by the Power Method with a Random Start.</p></title><aug><au><snm>Del Corso</snm><fnm>GM</fnm></au></aug><source>SIAM J Matrix Anal Appl</source><pubdate>1997</pubdate><volume>18</volume><fpage>913</fpage><lpage>937</lpage><xrefbib><pubid idtype="doi">10.1137/S0895479895296689</pubid></xrefbib></bibl><bibl id="B49"><title><p>BioGRID</p></title><url>http://www.thebiogrid.org/downloads.php</url></bibl><bibl id="B50"><title><p>The Comparative Toxicogenomics Database, Gene Compound 
Relations</p></title><url>http://ctd.mdibl.org/reports/CTD_chem_gene_ixns.tsv.gz</url></bibl><bibl id="B51"><title><p>The Comparative Toxicogenomics Database, Disease Compound Relations</p></title><url>http://ctd.mdibl.org/reports/CTD_chem_disease_relations.tsv.gz</url></bibl><bibl id="B52"><title><p>The Comparative Toxicogenomics Database, Gene Disease Relations</p></title><url>http://ctd.mdibl.org/reports/CTD_gene_disease_relations.tsv.gz</url></bibl><bibl id="B53"><title><p>DIP Protein-Protein Interactions File</p></title><url>http://dip.doe-mbi.ucla.edu/dip/File.cgi?FN=2009/tab25/Hsapi20091230.txt</url></bibl><bibl id="B54"><title><p>OA Gene Ontology Annotations File</p></title><url>ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/gene_association.goa_human.gz</url></bibl><bibl id="B55"><title><p>HPRD Protein-Protein Interactions File</p></title><url>http://www.hprd.org/edownload/HPRD_Release_8_070609</url></bibl><bibl id="B56"><title><p>IntAct Protein-Protein Interactions File</p></title><url>ftp://ftp.ebi.ac.uk/pub/databases/intact/current/psimitab/intact.txt</url></bibl><bibl id="B57"><title><p>InterPro Gene-Domain Associations File</p></title><url>ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz</url></bibl><bibl id="B58"><title><p>InterPro Gene-Gene Family Annotations File</p></title><url>ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml.gz</url></bibl><bibl id="B59"><title><p>KEGG Gene-Pathway Associations File</p></title><url>ftp://ftp.genome.jp/pub/kegg/pathway/pathway</url></bibl><bibl id="B60"><title><p>KEGG Gene Compound Associations File</p></title><url>http://soap.genome.jp/KEGG.wsdl</url></bibl><bibl id="B61"><title><p>MeSH Protein-Protein Annotations File</p></title><url>http://www.nlm.nih.gov/cgi/request.meshdata</url></bibl><bibl id="B62"><title><p>MINT Protein-Protein Interactions 
File</p></title><url>ftp://mint.bio.uniroma2.it/pub/release/mitab26/current/2010-12-15-mint-human-binary.mitab26.txt</url></bibl><bibl id="B63"><title><p>miR2Disease microRNA-Disease Associations File</p></title><url>http://watson.compbio.iupui.edu:8080/miR2Disease/download/AllEntries.txt</url></bibl><bibl id="B64"><title><p>miR2Disease microRNA-Gene Targeting File</p></title><url>http://watson.compbio.iupui.edu:8080/miR2Disease/download/miRtar.txt</url></bibl><bibl id="B65"><title><p>NetworKIN Kinase-Substrate Annotations File</p></title><url>http://networkin.info/Linding_et_al_NetworKIN_preds_filtered.tsv.gz.php</url></bibl><bibl id="B66"><title><p>OMIM Morbid Map Disease-Gene Associations File</p></title><url>ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/morbidmap</url></bibl><bibl id="B67"><title><p>OMIM Disease-Disease Relations File</p></title><url>ftp://ftp.ncbi.nih.gov/repository/OMIM/ARCHIVE/omim.txt.Z</url></bibl><bibl id="B68"><title><p>TarBase miRNA - Gene Targeting</p></title><url>http://diana.cslab.ece.ntua.gr/data/public/TarBase_V5.0.rar</url></bibl></refgrp>
</bm>
</art>