Figure 4.
Hierarchical correction using Markov blanket structure. (a) Schematic of the local Markov blanket surrounding a GO term (Y1 is the node of interest
in this example). Each GO term is represented by an unshaded node, and the SVM classifier
output for that GO term is represented by a shaded node. To account for the hierarchical
relationships between GO terms, we constructed, for each GO term (here Y1), a Bayesian network
containing all nodes in its Markov blanket. The distributions of SVM outputs (observed nodes)
for positive and negative examples were encoded in the conditional probability tables of the
Bayesian network. We then inferred the probability that a particular gene is involved in each
GO term (a hidden node) given that gene's SVM outputs at these observed
nodes. (b) AUC improvement of the HIER-MB classifiers over single SVM predictions on the novel set,
for selected biological process terms of size 101 to 300 (size being the number of genes
annotated to the GO term in the training set). For each GO term, the best-performing
sub-hierarchy was selected; only terms for which it outperformed the single SVM (as judged
by held-out performance on the training set) are plotted. (c) Median improvement of
predictions for selected GO terms across biological process term sizes. When selected,
hierarchical correction using Markov blanket structure yields larger improvements for smaller
terms. AUC, area under the receiver operating characteristic curve; GO, Gene Ontology;
HIER-MB, Markov blanket hierarchical correction; SVM, support vector machine.
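The inference step in panel (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. SVM outputs are discretized into hypothetical bins (`BIN_EDGES`). The output CPTs P(bin | positive) and P(bin | negative) are estimated with Laplace smoothing. Hierarchy edges are assumed to run from the more general GO term to the more specific one, with an assumed true-path constraint (a gene in a term is also in its parents). Exact inference is done by brute-force enumeration, which is only practical for a small Markov blanket. All function names and numbers are hypothetical.

```python
import bisect
from itertools import product

# Hypothetical discretization of SVM decision values into 4 bins:
# (-inf, -1), [-1, 0), [0, 1), [1, inf). The caption does not specify bins.
BIN_EDGES = [-1.0, 0.0, 1.0]


def to_bin(score):
    """Map a real-valued SVM output to a discrete bin index."""
    return bisect.bisect_right(BIN_EDGES, score)


def fit_output_cpt(pos_scores, neg_scores, alpha=1.0):
    """Estimate P(bin | Y=1) and P(bin | Y=0) from SVM outputs of known
    positive and negative genes, with Laplace smoothing `alpha`."""
    n_bins = len(BIN_EDGES) + 1
    cpt = {}
    for label, scores in ((1, pos_scores), (0, neg_scores)):
        counts = [alpha] * n_bins
        for s in scores:
            counts[to_bin(s)] += 1
        total = sum(counts)
        cpt[label] = [c / total for c in counts]
    return cpt


def label_prob(node, value, assignment, parents, prior):
    """P(Y_node = value | parent labels) under an ASSUMED true-path constraint:
    a gene cannot belong to a term unless it belongs to all of its parents.
    prior[node] is the assumed P(Y_node = 1 | all parents = 1)."""
    if any(assignment[p] == 0 for p in parents.get(node, [])):
        p1 = 0.0
    else:
        p1 = prior[node]
    return p1 if value == 1 else 1.0 - p1


def posterior(query, nodes, parents, prior, output_cpts, svm_scores):
    """P(Y_query = 1 | SVM outputs of all nodes), by exact enumeration over
    all 2**len(nodes) hidden label assignments."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        a = dict(zip(nodes, values))
        joint = 1.0
        for n in nodes:
            joint *= label_prob(n, a[n], a, parents, prior)      # hidden GO-term node
            joint *= output_cpts[n][a[n]][to_bin(svm_scores[n])]  # observed SVM node
        den += joint
        if a[query] == 1:
            num += joint
    return num / den


# Toy example with made-up numbers: Y0 -> Y1 -> Y2, Y1 is the node of interest.
nodes = ["Y0", "Y1", "Y2"]
parents = {"Y1": ["Y0"], "Y2": ["Y1"]}
prior = {"Y0": 0.5, "Y1": 0.5, "Y2": 0.5}
cpt = fit_output_cpt([1.5, 0.8, 1.2, -0.2, 2.0], [-1.5, -0.7, -1.2, 0.3, -2.0])
output_cpts = {n: cpt for n in nodes}
scores = {"Y0": 1.2, "Y1": -0.5, "Y2": 1.8}
print(posterior("Y1", nodes, parents, prior, output_cpts, scores))
```

In this toy setup, Y1's own SVM output falls in a bin that is equally likely for positives and negatives, so its posterior is driven by the strong outputs of its neighbors. Enumeration grows exponentially with blanket size; a practical implementation would use a standard exact inference algorithm such as variable elimination.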
Guan et al. Genome Biology 2008 9(Suppl 1):S3 doi:10.1186/gb-2008-9-s1-s3
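The selection and summary behind panels (b) and (c) might be sketched as below. The sketch makes several assumptions. "Improvement" is taken to be the novel-set AUC of the chosen HIER-MB sub-hierarchy minus that of the single SVM. Each term's record uses hypothetical keys (`size`, `svm_heldout`, `hier_heldout`, `svm_novel`, `hier_novel`) holding held-out and novel-set AUCs per candidate sub-hierarchy. Size ranges other than 101 to 300 would be the caller's choice. AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
from statistics import median


def auc(scores, labels):
    """Rank-based AUC: probability that a random positive scores higher
    than a random negative, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def selected_improvements(terms):
    """Pick each term's sub-hierarchy with the best held-out AUC; keep the term
    only if that beats the single SVM's held-out AUC. Returns a list of
    (term size, novel-set AUC gain) pairs. Dict keys are hypothetical."""
    kept = []
    for t in terms:
        best = max(t["hier_heldout"], key=t["hier_heldout"].get)
        if t["hier_heldout"][best] > t["svm_heldout"]:
            kept.append((t["size"], t["hier_novel"][best] - t["svm_novel"]))
    return kept


def median_gain_by_size(kept, size_bins):
    """Median novel-set AUC gain for kept terms in each inclusive (low, high) range."""
    out = {}
    for low, high in size_bins:
        gains = [g for size, g in kept if low <= size <= high]
        if gains:
            out[(low, high)] = median(gains)
    return out


# Toy records (made-up values): only the first term passes held-out selection.
terms = [
    {"size": 150, "svm_heldout": 0.80, "hier_heldout": {"a": 0.82, "b": 0.79},
     "svm_novel": 0.70, "hier_novel": {"a": 0.75, "b": 0.60}},
    {"size": 250, "svm_heldout": 0.85, "hier_heldout": {"a": 0.84},
     "svm_novel": 0.70, "hier_novel": {"a": 0.90}},
]
print(median_gain_by_size(selected_improvements(terms), [(101, 300)]))
```

Selecting on held-out training performance rather than on the novel set keeps the plotted gains from being inflated by choosing the sub-hierarchy that happens to score best on the evaluation data.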