|
Figure 9.
The effect of missing values on the accuracy of quantitative GI prediction. (a-c) The three scenarios producing missing values in E-MAP data. (a) In the 'Random' scenario,
the GIs of a random subset of gene pairs were hidden. (b) In the 'Submatrix' scenario, a
random subset of genes was selected and all the interactions between them were hidden.
(c) In the 'Cross' scenario, two random disjoint subsets of genes were selected and
all the interactions between them were hidden. In all three examples, 20% of the gene
pairs were hidden. (d, e) Performance for different fractions of missing values in the three scenarios, using
only the GSG+MATRIX features (d) or all the features (e). Performance was tested using
the ER E-MAP, as it contained the fewest missing values. Imputation was performed by
linear regression. The procedure was repeated 30 times. Performance was evaluated
as the average Pearson correlation between the hidden and the predicted S-scores.
Note that the 'Cross' scenario is not applicable for cases with ≥ 50% missing values.
Ulitsky et al. Genome Biology 2009 10:R140 doi:10.1186/gb-2009-10-12-r140 |
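
The three hiding scenarios and the evaluation measure described in the caption can be sketched as boolean masks over a symmetric gene-by-gene score matrix. This is a minimal illustration, not the authors' code: the function names, the use of NumPy, and the rules for sizing the gene subsets (the smallest subset reaching the target fraction for 'Submatrix', two equal-size subsets for 'Cross') are assumptions that the caption does not specify.

```python
import numpy as np


def random_mask(n, frac, rng):
    """'Random' scenario: hide a random subset of gene pairs."""
    iu, ju = np.triu_indices(n, k=1)          # all unordered pairs i < j
    n_hide = int(round(frac * iu.size))
    pick = rng.choice(iu.size, size=n_hide, replace=False)
    mask = np.zeros((n, n), dtype=bool)
    mask[iu[pick], ju[pick]] = True
    return mask | mask.T                      # keep the mask symmetric


def submatrix_mask(n, frac, rng):
    """'Submatrix' scenario: hide all pairs within one random gene subset."""
    total = n * (n - 1) // 2
    # Assumed sizing rule: smallest k with k*(k-1)/2 >= frac * total.
    k = min(n, int(np.ceil((1 + np.sqrt(1 + 8 * frac * total)) / 2)))
    genes = rng.choice(n, size=k, replace=False)
    mask = np.zeros((n, n), dtype=bool)
    mask[np.ix_(genes, genes)] = True
    np.fill_diagonal(mask, False)             # self-pairs are not interactions
    return mask


def cross_mask(n, frac, rng):
    """'Cross' scenario: hide all pairs between two disjoint gene subsets."""
    total = n * (n - 1) // 2
    # Assumed sizing rule: two equal-size subsets, giving k*k hidden pairs.
    k = int(np.ceil(np.sqrt(frac * total)))
    if 2 * k > n:
        raise ValueError("fraction too large for two disjoint subsets")
    perm = rng.permutation(n)
    a, b = perm[:k], perm[k:2 * k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.ix_(a, b)] = True
    return mask | mask.T


def hidden_fraction(mask):
    """Fraction of unordered gene pairs that the mask hides."""
    iu = np.triu_indices(mask.shape[0], k=1)
    return mask[iu].mean()


def hidden_pearson(true_scores, predicted, mask):
    """Pearson correlation between true and predicted scores on hidden pairs."""
    iu = np.triu_indices(mask.shape[0], k=1)
    hidden = mask[iu]
    return np.corrcoef(true_scores[iu][hidden], predicted[iu][hidden])[0, 1]
```

Under these sizing assumptions, two disjoint subsets of n genes span at most about n²/4 of the n(n-1)/2 gene pairs, that is, roughly half of them, which is consistent with the caption's note that the 'Cross' scenario does not apply at high missing fractions.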