|
Figure 3.
Effect of the number of microarray experiments on the compendium data subset with
215 genes. We compared how well co-regulated genes are recovered from clustering results when different numbers of
microarray experiments are used on the subset of compendium data with 215 genes. To
produce typical datasets with E experiments (where E = 5, 10, 20, 50, 100), we randomly sampled (with replacement) 100 different subsets
of E experiments from the compendium data with 215 genes and 273 experiments. The ability
to identify co-regulated genes from clustering results is summarized by the median
z-scores over the 100 randomly sampled datasets. A high median z-score indicates a
high proportion of co-regulated genes from clustering results compared to those from
random partitions. (a) We compared the median z-scores obtained with hierarchical complete-link clustering using different numbers of experiments (E) over a range of numbers of clusters (from
5 to 100). The transcription factor database SCPD is used as the evaluation criterion
for co-regulated genes. The median z-scores generally increase as E increases over different numbers of clusters. This shows that higher proportions of
co-regulated genes are identified on microarray datasets with higher numbers of experiments.
(b) Using SCPD as our evaluation criterion, we compared the median z-scores using different
numbers of experiments (E) and different clustering algorithms (hierarchical average-link and complete-link
using correlation, model-based clustering algorithms MCLUST and IMM on standardized
data) on the compendium data subset with 215 genes at 25 clusters. We estimated the
optimal number of clusters on this dataset to be 25 using IMM, and we observed similar
results at different numbers of clusters. (c) Using ChIP data as our evaluation criterion, we compared the median z-scores using
different numbers of experiments (E) and different clustering algorithms on the compendium data subset with 215 genes
at 25 clusters. Using either SCPD or ChIP as our evaluation criterion, the median
z-scores typically increase as E increases, and MCLUST typically produces relatively high median z-scores.
Yeung et al. Genome Biology 2004 5:R48 doi:10.1186/gb-2004-5-7-r48 |
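The resampling design described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that "with replacement" means the 100 subsets are drawn independently of one another while the E experiments within a subset are distinct. The data matrix is random placeholder data with the caption's dimensions, and `placeholder_score` is a hypothetical stand-in for the real step of clustering the genes and computing a z-score against SCPD or ChIP annotations.

```python
import numpy as np


def sample_experiment_subsets(n_experiments, subset_size, n_subsets, seed=0):
    """Draw n_subsets independent subsets of experiment (column) indices.

    Assumption: experiments are distinct within a subset, and subsets are
    drawn independently, so the same experiment may appear in many subsets.
    """
    rng = np.random.default_rng(seed)
    return [
        np.sort(rng.choice(n_experiments, size=subset_size, replace=False))
        for _ in range(n_subsets)
    ]


def median_score(data, subsets, score_fn):
    """Median of score_fn over the column-subsampled datasets."""
    return float(np.median([score_fn(data[:, cols]) for cols in subsets]))


def placeholder_score(sub_data):
    """Hypothetical stand-in for 'cluster the genes, then compute a z-score'.

    It returns the number of experiments so the sketch runs end to end;
    it is NOT the z-score used in the paper.
    """
    return float(sub_data.shape[1])


# Placeholder data with the caption's dimensions: 215 genes x 273 experiments.
data = np.random.default_rng(1).normal(size=(215, 273))

medians = {}
for E in (5, 10, 20, 50, 100):
    subsets = sample_experiment_subsets(data.shape[1], E, n_subsets=100)
    medians[E] = median_score(data, subsets, placeholder_score)
```

Replacing `placeholder_score` with a function that clusters the rows of `sub_data` and scores the partition against an annotation source would give one median per value of E, which is the quantity plotted in each panel.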