|
Figure 2.
Any concept in the biomedical literature, for instance a protein or a disease,
can be treated as a source concept (depicted as a blue ball throughout the picture
and the system). There may be curated information in authoritative databases such
as UMLS or UniProtKB/Swiss-Prot concerning the concept and its factual relationships
with other concepts. This information is captured and all concepts that have a 'factual'
relationship with the source concept in any of the participating databases are thus
included in the Knowlet of that concept. These 'factually associated concepts' are
depicted in the Knowlet visualisation as solid green balls. In addition, the source
concept may be mentioned together with other concepts in the same sentence in the literature.
In that case, especially when the two concepts co-occur in multiple sentences,
there is a high chance of a meaningful, sometimes causal, relationship
between the two concepts. Most concepts that have a factual relationship are likely
to be mentioned in one or more sentences in the literature at large, but as we have
mined only PubMed so far, there might be many other factual associations that are
not easy to recover from PubMed abstracts alone. For instance, many protein-protein
interactions described in UniProtKB/Swiss-Prot cannot be found as co-occurrences in
PubMed. Target concepts that co-occur at least once in the same sentence as the source
concept are depicted as green rings in the visualisation of the Knowlet. The last
category consists of concepts that never co-occur with the source concept in a sentence
in the indexed resources but whose own Knowlets share enough concepts with the Knowlet
of the source concept to be of potential interest. These concepts are depicted as yellow
rings and could represent implicit associations. Over one million Knowlets have been
created so far. Each source concept has a relationship of varying strength with other
(target) concepts, and each of these relationships has been assigned a value for the factual
(F), co-occurrence (C) and associative (A) parameters. All Knowlets are dynamically
coupled into the concept space. The semantic association between each concept pair
is computed based on these values. In the near future additional data will be added,
such as co-expression statistics between genes.
Mons et al. Genome Biology 2008 9:R89 doi:10.1186/gb-2008-9-5-r89 |
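
The three categories of target concept described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `TargetLink` record, the `classify` function and the `min_shared` threshold are hypothetical names introduced here, the caption only says "sufficient" concepts in common without giving a number, and the order of precedence (factual before co-occurrence before associative) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Marker(Enum):
    """Visual markers named in the caption."""
    SOLID_GREEN_BALL = "factual association"
    GREEN_RING = "sentence co-occurrence"
    YELLOW_RING = "implicit association"


@dataclass
class TargetLink:
    """Hypothetical record for one source-target pair in a Knowlet."""
    factual: bool          # curated relationship in a participating database
    cooccurrences: int     # sentences in which both concepts are mentioned
    shared_concepts: int   # concepts found in both Knowlets


def classify(link: TargetLink, min_shared: int = 3) -> Optional[Marker]:
    """Return the marker for a target concept, or None if it is not shown.

    min_shared is an assumed cut-off, not a value taken from the source.
    """
    if link.factual:
        return Marker.SOLID_GREEN_BALL
    if link.cooccurrences >= 1:
        return Marker.GREEN_RING
    if link.shared_concepts >= min_shared:
        return Marker.YELLOW_RING
    return None


# A curated interaction that never co-occurs in a sentence is still factual.
print(classify(TargetLink(factual=True, cooccurrences=0, shared_concepts=0)))
```

Checking the factual flag first mirrors the caption's remark that many curated protein-protein interactions cannot be found as co-occurrences in PubMed; a real system would presumably keep the F, C and A values side by side rather than collapse them into a single label.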