|
Figure 2.
PACK flowchart. (a) A schematic diagram of PACK, as used in this study. For each gene expression profile,
an unbiased estimate of its kurtosis, K, is computed. Genes with negative kurtosis
are selected because only these define large subgroups (of sizes >22% of the total
sample size). Further unsupervised clustering may then be performed on this subset
of negative kurtosis profiles to find novel tumor subclasses. Alternatively, to find
robust prognostic markers, negative kurtosis profiles are filtered further based on
whether there is evidence of bimodality (C = 2). This step requires a cluster inference
algorithm and a model selection criterion to discard those profiles that are best
described by a single Gaussian (C = 1; by chance, Gaussian profiles may have
negative kurtosis). Correlation to phenotypes is assessed with Fisher's
test, which evaluates whether the distribution of the categorical phenotype across the
two clusters differs significantly from random. (b) Density curves of typical bimodal negative- and positive-kurtosis gene expression profiles.
X-axis shows gene expression on a log2 scale. PACK, Profile Analysis using Clustering
and Kurtosis.
Teschendorff et al. Genome Biology 2007 8:R157 doi:10.1186/gb-2007-8-8-r157 |
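
The kurtosis filter and the phenotype test described in panel (a) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the "unbiased estimate" is the common bias-corrected sample excess kurtosis (often written G2) and that "Fisher's test" is the two-sided Fisher exact test on a 2 x 2 table. The function names, gene names, expression values, and table counts are all hypothetical. The bimodality step (C = 2 versus C = 1) is left out because the caption does not name the cluster inference algorithm or the model selection criterion.

```python
import math


def excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (G2); assumed estimator for K."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # biased 2nd central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # biased 4th central moment
    if m2 == 0:
        raise ValueError("constant profile has undefined kurtosis")
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def select_negative_kurtosis(profiles):
    """Keep only profiles with K < 0; returns {gene: K}."""
    selected = {}
    for gene, values in profiles.items():
        k = excess_kurtosis(values)
        if k < 0:
            selected[gene] = k
    return selected


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows are the two expression clusters, columns the two phenotype classes.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical log2 expression profiles for two genes across 10 samples.
profiles = {
    "bimodal_gene": [0.1, -0.2, 0.0, 0.3, -0.1, 4.1, 3.8, 4.0, 4.2, 3.9],
    "outlier_gene": [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.02, 5.0],
}
selected = select_negative_kurtosis(profiles)  # only "bimodal_gene" passes

# Hypothetical counts: cluster 1 has 8 vs 2 samples per phenotype class,
# cluster 2 has 1 vs 5.
p_value = fisher_exact_two_sided(8, 2, 1, 5)  # about 0.035
```

The two example profiles mirror panel (b): two balanced subgroups give negative kurtosis, whereas a single outlying sample gives positive kurtosis and is filtered out.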