
Modeling gene expression using chromatin features in various cellular contexts

Xianjun Dong1, Melissa C Greven1, Anshul Kundaje2, Sarah Djebali3, James B Brown4, Chao Cheng5, Thomas R Gingeras6, Mark Gerstein5, Roderic Guigó3, Ewan Birney7 and Zhiping Weng1*

Author affiliations

1 Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA

2 Department of Computer Science, Stanford University, 318 Campus Drive, Stanford, CA 94304, USA

3 Centre for Genomic Regulation (CRG) and UPF, Dr. Aiguader, 88, 08003 Barcelona, Spain

4 Department of Statistics, University of California, Berkeley, 367 Evans Hall, University of California, Berkeley, Berkeley, CA 94720, USA

5 Computational Biology and Bioinformatics Program, Yale University, 266 Whitney Ave, New Haven, CT 06511, USA

6 Cold Spring Harbor Laboratory, Genome Center, Woodbury, New York 11797, USA

7 Vertebrate Genomics Group, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK


Citation and License

Genome Biology 2012, 13:R53 doi:10.1186/gb-2012-13-9-r53

Published: 5 September 2012

Abstract

Background

Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated genome-wide maps of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.

Results

We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study confirms that the general relationships found in previous studies hold across various cell lines, and it also offers new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can each be predicted with high accuracy, but by different groups of chromatin features. We also found that expression levels measured by CAGE are better predicted than those measured by RNA-PET or RNA-Seq, and that different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA across cell compartments; PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA.
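To make the distinction between predicting expression status and predicting expression level concrete, the following is a minimal sketch on synthetic data, not the model used in this study. It assumes one hypothetical chromatin signal (`status_signal`) that separates expressed from silent promoters and a second hypothetical signal (`level_signal`) that scales with log expression among expressed promoters. The fixed threshold, the simulated values, and all names are illustrative assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def predict(status_signal, level_signal, threshold, intercept, slope):
    """Step 1: call the promoter silent if status_signal is at or below threshold.
    Step 2: otherwise predict log expression from level_signal."""
    if status_signal <= threshold:
        return 0.0
    return intercept + slope * level_signal


# Synthetic promoters (all values are made up for illustration).
random.seed(0)
promoters = []
for _ in range(500):
    expressed = random.random() < 0.5
    status_signal = random.gauss(2.0 if expressed else 0.0, 0.5)
    level_signal = random.gauss(3.0 if expressed else 0.5, 1.0)
    noise = random.gauss(0.0, 0.3)
    log_expr = (1.0 + 0.8 * level_signal + noise) if expressed else 0.0
    promoters.append((expressed, status_signal, level_signal, log_expr))

train, test = promoters[:250], promoters[250:]

# Fit the level model only on promoters known to be expressed in training.
train_on = [p for p in train if p[0]]
intercept, slope = fit_line([p[2] for p in train_on], [p[3] for p in train_on])

# Assumed threshold: midpoint of the two simulated status_signal means.
threshold = 1.0
predicted = [predict(p[1], p[2], threshold, intercept, slope) for p in test]
measured = [p[3] for p in test]
r = pearson(predicted, measured)
print(f"slope={slope:.2f}, Pearson r on held-out promoters={r:.2f}")
```

In this toy setup, the status call and the level estimate rely on different signals, which loosely mirrors the finding that different groups of chromatin features are informative for each task.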

Conclusions

Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.