Figure 6.

Comparison of prediction accuracy for high- and low-CpG content promoter gene categories. (a) Summary of prediction accuracy for all high-CpG content promoter (HCP) genes in 78 RNA expression experiments on whole-cell, cytosolic or nuclear RNA; the median correlation coefficient across all experiments is r = 0.8. Each bar is divided into colors corresponding to the relative contribution of each variable in the regression model. (b) Same as (a), but for low-CpG content promoter (LCP) genes; the median correlation coefficient across all experiments is r = 0.66. HCP genes are therefore better predicted than LCP genes. Comparing the relative contribution of the chromatin features in each experiment indicates that promoter marks (red and light red) are more important for predicting LCP genes from TSS-based data (for example, CAGE and RNA-PET), while structural marks (green) are most important for predicting LCP genes from transcript-based data. Code for cell lines: K, K562; G, GM12878; 1, H1-hESC; H, HepG2; E, HeLa-S3; N, NHEK; U, HUVEC. Code for RNA extraction: +, PolyA+; -, PolyA-. Code for cell compartment: W, whole cell; C, cytosol; N, nucleus.

Dong et al. Genome Biology 2012 13:R53 doi:10.1186/gb-2012-13-9-r53