|
Figure 8.
Boolean implication extraction process. The expression levels of each probeset are
sorted, and a step function is fitted (using StepMiner) to the sorted expression levels
so as to minimize the squared error between the original and the fitted values. A threshold
t is chosen where the step crosses the original data. The region between t - 0.5 and t + 0.5 is classified as 'intermediate', the region below t - 0.5 as 'low', and the region above t + 0.5 as 'high'. The examples show probesets for two genes, CDH1 and
CDC2. CDH1 expression rises sharply between 6 and 9, and the StepMiner algorithm
assigns a threshold in this region. CDC2 expression, however, increases almost linearly, and
the StepMiner algorithm assigns the threshold approximately in the middle of the line.
A scatter plot is shown to illustrate the analysis. Each point in the scatter plot
corresponds to a microarray experiment, with CDC2 expression on the x-axis
and CDH1 expression on the y-axis. Boolean implication discovery
is performed on a pair of probesets: it ignores all points that lie in the
intermediate region and analyzes the four quadrants of the scatter plot. Four asymmetric
relationships (low ⇒ low, low ⇒ high, high ⇒ low, high ⇒ high) are discovered, each
corresponding to exactly one sparse quadrant in the scatter plot; and two symmetric
relationships (equivalent and opposite) are discovered, each corresponding to two
diagonally opposite sparse quadrants.
Sahoo et al. Genome Biology 2008 9:R157 doi:10.1186/gb-2008-9-10-r157 |
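
The procedure described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the StepMiner implementation. The function names (`fit_step`, `threshold`, `classify`, `quadrant_counts`, `relationships`) are hypothetical. The threshold is taken as the midpoint of the two sorted values adjacent to the fitted step, which is only one reading of "where the step crosses the original data". The caption does not describe how a quadrant is judged sparse, so the set of sparse quadrants is supplied by the caller.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Quadrant = Tuple[str, str]  # (level of x, level of y)


def fit_step(values: Sequence[float]) -> Tuple[int, float, float]:
    """Least-squares fit of a single step to the sorted values.

    Returns (k, low_mean, high_mean): the first k sorted values are fitted
    by low_mean and the remaining ones by high_mean.
    """
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values to fit a step")
    prefix, prefix_sq = [0.0], [0.0]
    for v in s:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    best = None
    for k in range(1, n):
        left, right = prefix[k], prefix[n] - prefix[k]
        # Sum of squared errors around each segment's own mean.
        sse = (prefix_sq[k] - left * left / k) + (
            prefix_sq[n] - prefix_sq[k] - right * right / (n - k))
        if best is None or sse < best[0]:
            best = (sse, k, left / k, right / (n - k))
    return best[1], best[2], best[3]


def threshold(values: Sequence[float]) -> float:
    """Assumption: t is halfway between the sorted values around the step."""
    s = sorted(values)
    k, _, _ = fit_step(s)
    return (s[k - 1] + s[k]) / 2.0


def classify(value: float, t: float, margin: float = 0.5) -> str:
    """Label a value 'low', 'intermediate' or 'high' relative to t."""
    if value < t - margin:
        return "low"
    if value > t + margin:
        return "high"
    return "intermediate"


def quadrant_counts(xs: Sequence[float], ys: Sequence[float],
                    tx: float, ty: float) -> Dict[Quadrant, int]:
    """Count points per quadrant, ignoring intermediate points."""
    counts = {(a, b): 0 for a in ("low", "high") for b in ("low", "high")}
    for x, y in zip(xs, ys):
        q = (classify(x, tx), classify(y, ty))
        if "intermediate" not in q:
            counts[q] += 1
    return counts


# A sparse quadrant (x level, y level) implies the opposite y level.
ASYMMETRIC = {
    ("low", "high"): "x low => y low",
    ("low", "low"): "x low => y high",
    ("high", "high"): "x high => y low",
    ("high", "low"): "x high => y high",
}


def relationships(sparse: Iterable[Quadrant]) -> List[str]:
    """Map caller-supplied sparse quadrants to relationship names."""
    sparse = set(sparse)
    if {("low", "high"), ("high", "low")} <= sparse:
        return ["equivalent"]
    if {("low", "low"), ("high", "high")} <= sparse:
        return ["opposite"]
    return sorted(ASYMMETRIC[q] for q in sparse)
```

For a pair of probesets, one would compute a threshold for each with `threshold`, tally the scatter plot with `quadrant_counts`, decide which quadrants are sparse using a test of one's choosing, and pass that set to `relationships`.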