Figure 2.
Comparison of k-means and fuzzy k-means clustering. Genes are represented as points
in space, where genes that are similarly expressed are close together. (a) An overview of standard k-means clustering. (1)The process is initiated by randomly
partitioning the genes (small circles) into three groups, indicated by the three colors.
(2) The average expression profile of each group of genes is calculated as the centroid
(large circles), and the genes are reassigned to the centroid to which they are closest.
(4-6) Steps 2 and 3 are iterated until the centroids are stable, at which point the
genes are assigned to the cluster to which they are most similar. (b) An overview of fuzzy k-means clustering. (1) The process is initiated by seeding
each centroid with an eigen vector identified by PCA, as shown here for one centroid
(large circle). (2) For a given centroid, the membership of each gene is calculated
from the distance (or similarity) between each gene-expression pattern and the centroid.
(3) A new centroid is calculated as the weighted average of all of the gene-expression
patterns in the dataset, where each gene's weight is proportionate to its membership
in the cluster. Genes that are closer to the centroid will contribute more to the
cluster mean; therefore the centroid position migrates toward those genes. (4-6) The
process is iterated until convergence, and the membership of each gene in each cluster
is calculated (as shown here for one cluster).
Gasch and Eisen Genome Biology 2002 3:research0059.1 doi:10.1186/gb-2002-3-11-research0059 |