## Figure 1.
Gene expression data of [11] for the P. falciparum 48 h erythrocytic cycle. Each point in the figures corresponds to an expression profile plotted according
to its first and second principal components. All points keep the same coordinates
throughout the six panels (only the coloring changes). In the top row, colors
indicate the cluster membership of each profile after k-means clustering of the original
data (that is, prior to dimensionality reduction) using a Pearson correlation distance,
for k = 3, 5 and 7. We observe that the clusters are nearly equal in size and that their
boundaries are largely arbitrary: they do not follow low-density regions, and they change radically
with k. Panels in the bottom row show regions of the expression space enriched for
three regulatory motifs identified in previous studies [6,7,18]. Enrichment is computed on the original data (that is, prior to dimensionality reduction)
as the proportion of genes carrying the motif in their upstream sequence (1 kb)
among the 200 nearest neighbors of each gene, with neighbors determined by expression profile.
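The neighborhood proportion used for the bottom-row panels can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "nearest" means highest Pearson correlation (distance 1 - r), that a gene is excluded from its own neighborhood, and that motif presence is given as a hypothetical boolean vector `has_motif`. The function name `motif_neighbor_proportion` is likewise hypothetical.

```python
import numpy as np

def motif_neighbor_proportion(profiles, has_motif, n_neighbors=200):
    """For each gene, return the fraction of its n_neighbors nearest profiles
    (by Pearson correlation distance 1 - r) whose upstream sequence carries
    the motif. Hypothetical helper; builds an N x N matrix, so memory is O(N^2)."""
    x = np.asarray(profiles, dtype=float)
    # Center and unit-normalize each profile so that dot products equal Pearson r.
    x = x - x.mean(axis=1, keepdims=True)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    r = x @ x.T                       # pairwise Pearson correlations
    np.fill_diagonal(r, -np.inf)      # assumption: a gene is not its own neighbor
    # Column indices of the n_neighbors most correlated profiles for each gene.
    nn = np.argsort(-r, axis=1)[:, :n_neighbors]
    motif = np.asarray(has_motif, dtype=float)
    return motif[nn].mean(axis=1)
```

With 200 neighbors per gene this matches the neighborhood size stated in the caption; on a small toy set, a smaller `n_neighbors` makes the behavior easy to inspect.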
Colored points correspond to profiles for which this proportion is three standard deviations
above the value expected under the hypergeometric law (see motif density in Material
and methods). Uncolored points are shaded for clarity. We observe that (i) each motif
corresponds to a contiguous region of the expression space, (ii) these regions do
not coincide with the clusters of any of the above clusterings, and (iii) regions
defined by different motifs can overlap strongly. Together, these observations highlight the weaknesses of clustering-based
approaches for motif discovery.
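The three-standard-deviation cutoff can be made concrete with the mean and variance of the hypergeometric law. As a sketch under stated assumptions: drawing n neighbors without replacement from N genes, of which K carry the motif, the motif count has mean nK/N and variance n(K/N)(1 - K/N)(N - n)/(N - 1). The helper name `hypergeom_threshold` is hypothetical, and the exact conventions (for example, whether the gene itself is counted) may differ from those described in Material and methods.

```python
import math

def hypergeom_threshold(n_genes, n_motif, n_neighbors=200, n_sd=3.0):
    """Return the neighbor proportion lying n_sd standard deviations above the
    hypergeometric expectation (hypothetical helper, illustrative only).

    n_genes:     total number of genes N
    n_motif:     number of genes K whose upstream sequence carries the motif
    n_neighbors: neighborhood size n (200 in the caption)
    """
    p = n_motif / n_genes
    mean = n_neighbors * p
    # Hypergeometric variance, including the finite-population correction.
    var = n_neighbors * p * (1 - p) * (n_genes - n_neighbors) / (n_genes - 1)
    # Convert the count threshold into a proportion of the neighborhood.
    return (mean + n_sd * math.sqrt(var)) / n_neighbors
```

For example, with a hypothetical 5000 genes of which 500 carry the motif, the expected proportion among 200 neighbors is 0.1 and the cutoff is roughly 0.162; points whose neighborhood proportion exceeds this value would be colored.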
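For the top-row panels, one common way to run k-means with a Pearson correlation distance (1 - r) is spherical k-means on centered, unit-norm profiles, since for such vectors the dot product equals r. The sketch below assumes that formulation and a farthest-first initialization; both are illustrative choices, not necessarily the procedure used to produce the figure, and `kmeans_pearson` is a hypothetical name.

```python
import numpy as np

def standardize_rows(x):
    # Center each profile and scale it to unit norm. For such rows the dot
    # product equals the Pearson correlation r, and the squared Euclidean
    # distance equals 2 * (1 - r).
    x = x - x.mean(axis=1, keepdims=True)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kmeans_pearson(profiles, k, n_iter=100, seed=0):
    """Cluster profiles by k-means under the correlation distance 1 - r,
    implemented as spherical k-means on centered profiles (assumption)."""
    z = standardize_rows(np.asarray(profiles, dtype=float))
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: start from a random profile, then add the
    # profile least correlated with the centroids chosen so far.
    idx = [int(rng.integers(len(z)))]
    while len(idx) < k:
        idx.append(int((z @ z[idx].T).max(axis=1).argmin()))
    centroids = z[idx].copy()
    labels = np.full(len(z), -1)
    for _ in range(n_iter):
        # Assign each profile to the centroid it is most correlated with.
        new_labels = (z @ centroids.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid as the re-standardized mean of its members.
        for j in range(k):
            members = z[labels == j]
            if len(members):
                centroids[j] = standardize_rows(members.mean(axis=0, keepdims=True))[0]
    return labels
```

Running this for k = 3, 5 and 7 on the same data would produce one partition per top-row panel. Because every profile is forced into some cluster, the resulting boundaries need not follow low-density regions, which is consistent with the observation made in the caption.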
Lajoie |