A gene expression profile of stem cell pluripotentiality and differentiation is conserved across diverse solid and hematopoietic cancers

Nathan P Palmer1, Patrick R Schmid1, Bonnie Berger1,2* and Isaac S Kohane3*

Author Affiliations

1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge MA 02139, USA

2 Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge MA 02139, USA

3 Center for Biomedical Informatics, Harvard Medical School, 25 Shattuck Street, Boston MA 02115, USA

Genome Biology 2012, 13:R71  doi:10.1186/gb-2012-13-8-r71

Published: 21 August 2012

Additional files

Additional file 1:

Supplementary tables. Tables s1 to s4: genes in the SCGS, organized by the functional module to which they belong. Tables s5 to s8: GO enrichment statistics for each functional module in the SCGS. The file also includes a complete listing of all of the GEO sample identifiers for the microarray data comprising our database.

Additional file 2:

Animation demonstrating the effect of varying the FIR score threshold for including genes in the SCGS. For each number of top-scoring stem genes from 3 to 502 (displayed at the top of the animation frame), we project all of the samples in the database onto the first two PCs of gene space (top right panel) and highlight in color six relevant phenotypes (as in Figure 3): embryonic/induced pluripotent stem cells in magenta; mesenchymal stem cells in cyan; immortalized cell line samples in blue; blood precursor cells in orange; leukemia samples in green; normal blood in red. The panel below the PCA scatter plot shows the distribution of stemness index values (PC1 projection coordinates) for each highlighted phenotype. The plot on the left of the frame shows the analysis of variance (ANOVA) scores for all of the depicted FIR thresholds; a magenta dot on this curve marks the score, computed over all highlighted phenotypes, for the clustering defined by the current stemness index. Higher ANOVA scores indicate better multi-way separation of the individual phenotypes along the stemness index. ANOVA was calculated and all plots were generated in the R statistical environment [46,47].
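
The threshold sweep shown in the animation can be sketched roughly as follows. This is an illustrative Python/NumPy sketch, not the authors' R code. The function names, the `gene_scores` input (standing in for per-gene FIR scores), and the choice to fit the PCA on the same samples that are projected are all assumptions made for the example. The ANOVA score is taken here to be the one-way F statistic of the PC1 coordinates grouped by phenotype.

```python
import numpy as np


def stemness_index(expr, gene_idx):
    """Return PC1 coordinates of samples, using only the selected genes.

    expr: array of shape (n_samples, n_genes); gene_idx: column indices.
    Assumption: PCA is fitted on the same samples that are projected.
    """
    sub = expr[:, gene_idx]
    centered = sub - sub.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


def anova_f(values, labels):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == g] for g in np.unique(labels)]
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def sweep_thresholds(expr, gene_scores, labels, k_min=3, k_max=502):
    """Map each gene-set size k to the ANOVA F score of its stemness index.

    gene_scores: hypothetical per-gene scores; higher means more stem-like.
    """
    order = np.argsort(gene_scores)[::-1]  # best-scoring genes first
    k_max = min(k_max, expr.shape[1])
    return {
        k: anova_f(stemness_index(expr, order[:k]), labels)
        for k in range(k_min, k_max + 1)
    }
```

The sign of a principal component is arbitrary, but the F statistic is unchanged when every coordinate is negated, so the sweep does not depend on the orientation of PC1. The gene-set size with the highest score corresponds to the peak of the curve in the left-hand plot.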

