Figure 4.

Comparison of prediction accuracy across different cell lines. (a) Boxplot of correlation coefficients for seven cell lines (K562, GM12878, H1-hESC, HeLa-S3, HepG2, HUVEC and NHEK) with different types of expression quantification (CAGE, RNA-PET and RNA-Seq). The strong quantitative relationship between chromatin features and expression exists in various cell lines and holds across different expression quantification methods. Paired Wilcoxon tests between H1-hESC and other cell lines show that H1-hESC has significantly lower prediction accuracy (P-value = 0.02, 0.02, 0.07, 0.02 and 0.05 for K562, GM12878, HeLa-S3, HepG2 and HUVEC, respectively). (b) Applying the model learned from K562 to other cell lines (GM12878, H1-hESC, HeLa-S3 and NHEK) shows that the model performs well across cell lines (r = 0.82, 0.86, 0.87 and 0.84, respectively). This indicates that the quantitative relationship between chromatin features and gene expression is not cell line-specific, but rather a general feature.
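
For readers unfamiliar with the paired Wilcoxon test used in panel (a), the following is a minimal sketch of an exact two-sided signed-rank test in plain Python. It is not the authors' code: the function name `paired_wilcoxon_exact` and the example correlation values are hypothetical, chosen only to illustrate how paired accuracies from two cell lines might be compared.

```python
from itertools import product


def paired_wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. Enumerates all 2^n sign
    assignments, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    observed = abs(w_plus - total / 2)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical per-experiment correlation coefficients (not from the paper).
h1_hesc = [0.80, 0.78, 0.82, 0.75, 0.79, 0.81]
k562 = [0.85, 0.84, 0.86, 0.83, 0.82, 0.88]
w_plus, p_value = paired_wilcoxon_exact(h1_hesc, k562)
print(f"W+ = {w_plus}, two-sided P = {p_value:.3f}")
```

Because the test uses only the signs and ranks of the paired differences, the smallest two-sided P-value it can return depends on the number of pairs: with n pairs that all differ in the same direction and have no zero differences, it is 2/2^n.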

Dong et al. Genome Biology 2012 13:R53   doi:10.1186/gb-2012-13-9-r53