Method (Open Access)

A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets

Chao Cheng1, Koon-Kiu Yan1, Kevin Y Yip1,2, Joel Rozowsky1, Roger Alexander1, Chong Shou1 and Mark Gerstein1,3,4*

Author Affiliations

1 Department of Molecular Biophysics and Biochemistry, Yale University, 260 Whitney Avenue, New Haven, CT 06520, USA

2 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Rm 1006, Ho Sin-Hang Engineering Bldg, Shatin, New Territories, Hong Kong

3 Program in Computational Biology and Bioinformatics, Yale University, 260 Whitney Avenue, New Haven, CT 06520, USA

4 Department of Computer Science, Yale University, PO Box 208285, New Haven, CT 06520, USA


Genome Biology 2011, 12:R15  doi:10.1186/gb-2011-12-2-r15

Published: 16 February 2011

Additional files

Additional file 1:

Signal patterns of Pol II around TSS and TTS regions (from -4 kb to 4 kb) at different developmental stages. At each stage, the signals were normalized by subtracting the mean and dividing by the standard deviation of the signals over all 160 bins. The locations of the TSS and TTS are marked with dotted lines.

Format: PDF Size: 288KB
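
The per-stage normalization described for Additional file 1 is a standard z-score over the bins. The following is a minimal sketch, not the authors' code: the function name `zscore_bins` and the toy signal are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator was used.

```python
from statistics import mean, pstdev


def zscore_bins(signals):
    """Subtract the mean and divide by the standard deviation over all bins.

    Assumption: population standard deviation (pstdev); the source text
    does not specify population versus sample SD.
    """
    mu = mean(signals)
    sd = pstdev(signals)
    if sd == 0:
        raise ValueError("signal is constant across bins; z-score undefined")
    return [(s - mu) / sd for s in signals]


# Hypothetical toy signal for one stage: 160 bins, as stated in the description.
toy_signal = [float(i % 8) for i in range(160)]
normalized = zscore_bins(toy_signal)
```

After this transformation the 160 values for a stage have mean 0 and standard deviation 1, which puts the profiles of different stages on the same scale.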

Additional file 2:

Correlation patterns of chromatin features with gene expression at the EEMB stage based on long transcript genes only. Only genes longer than 8 kb were used for correlation computations so that there is no overlap between the TSS and TTS bins.

Format: PDF Size: 876KB

Additional file 3:

Correlation patterns of chromatin features with gene expression at the EEMB stage based on transcripts that are far away from any other transcripts. Only the transcripts that are at least 4 kb away from any other transcripts were used for correlation computations so that there is no overlap between bins of nearby transcripts.

Format: PDF Size: 385KB

Additional file 4:

Correlation patterns of chromatin features with gene expression at the L3 stage. Correlation was calculated based on long transcripts (>8 kb).

Format: PDF Size: 1.7MB

Additional file 5:

Correlation patterns of chromatin features with gene expression at the EEMB stage based on single-transcript genes only.

Format: PDF Size: 1003KB

Additional file 6:

Prediction of gene expression using chromatin features in all 40 bins around the TSS (from -2 kb to 2 kb). (a) ROC curve of the SVM classification model. (b) Predicted expression levels versus actual expression levels measured by RNA-seq experiments. PCC, Pearson correlation coefficient.

Format: PDF Size: 743KB

Additional file 7:

Interactions between all possible pairs of histone modifications, as indicated by linear models in bin 1. For each pair, results are shown for the linear model with the interaction term (Interaction models) and without it (Singleton models).

Format: XLS Size: 34KB
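
The comparison of Interaction and Singleton models in Additional file 7 can be illustrated with ordinary least squares on two features. This is a sketch under stated assumptions rather than the authors' procedure: the helper names (`solve`, `ols`, `fit_pair`), the toy data, and the use of the residual sum of squares as the comparison measure are all hypothetical choices made for illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_pair(h1, h2, y, interaction=True):
    """Fit y ~ 1 + h1 + h2 (+ h1*h2); return coefficients and residual sum of squares."""
    rows = [[1.0, a, b] + ([a * b] if interaction else []) for a, b in zip(h1, h2)]
    beta = ols(rows, y)
    rss = sum((t - sum(c * v for c, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    return beta, rss


# Hypothetical toy data: expression depends on both marks and on their product.
h1 = [float(a) for a in range(4) for _ in range(3)]
h2 = [float(b) for _ in range(4) for b in range(3)]
y = [1.0 + 2.0 * a - 1.0 * b + 0.5 * a * b for a, b in zip(h1, h2)]

beta_int, rss_int = fit_pair(h1, h2, y, interaction=True)
beta_single, rss_single = fit_pair(h1, h2, y, interaction=False)
```

On this toy data the interaction model recovers the product coefficient and fits exactly, while the singleton model leaves residual error. A real analysis would also need a significance test for the interaction term.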

Additional file 8:

The significant interactions between chromatin features, based on a linear model with 12 different chromatin features and their pairwise interaction terms.

Format: XLS Size: 18KB

Additional file 9:

Mutual information between expression and pairwise histone modification signals. For each pair of histone modifications (denoted as H1, H2), the heat map shows the normalized mutual information I(E, H1 AND H2)/max(I(E,H1),I(E,H2)). For pairs such as H3K4me2 and H3K36me3, the combination of the two features gives higher predictive power than either feature alone.

Format: PDF Size: 157KB
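
The ratio I(E, H1 AND H2)/max(I(E,H1),I(E,H2)) from Additional file 9 can be computed directly from discretized signals. The sketch below assumes that expression and both modification signals have been binarized (0/1) and that "H1 AND H2" means the logical AND of the two binary signals. Both are assumptions, as the description does not state the discretization. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def normalized_pair_mi(e, h1, h2):
    """I(E, H1 AND H2) / max(I(E, H1), I(E, H2)) for binary signals."""
    both = [a and b for a, b in zip(h1, h2)]
    denom = max(mutual_information(e, h1), mutual_information(e, h2))
    if denom == 0:
        raise ValueError("neither feature is informative on its own")
    return mutual_information(e, both) / denom


# Hypothetical toy case: expression is high only when both marks are present.
h1 = [0, 0, 1, 1]
h2 = [0, 1, 0, 1]
e = [0, 0, 0, 1]
ratio = normalized_pair_mi(e, h1, h2)
```

A ratio above 1 means the combined signal carries more information about expression than the better of the two individual marks, as it does in this toy case.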

Additional file 10:

Interactions among chromatin features and expression. (a) Node colors indicate the correlation of the corresponding features with gene expression. Edge colors indicate the correlation between the two connected features. Only interactions with a strong correlation (|PCC| >0.3) are shown. (b) The directional relationships inferred from Bayesian network analysis. Arrow sizes indicate the confidence scores of the directed edges. Only interactions with a confidence score (combined for both directions) of at least 80% are shown.

Format: PDF Size: 2MB
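
The edge filter used in panel (a) of Additional file 10, which keeps only feature pairs with |PCC| > 0.3, can be sketched as follows. The feature names and values are hypothetical placeholders, and the function names `pearson` and `strong_edges` are illustrative, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strong_edges(features, threshold=0.3):
    """Return {(name_a, name_b): PCC} for pairs with |PCC| above the threshold."""
    edges = {}
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical per-gene signals for four placeholder features.
toy_features = {
    "feat_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_B": [2.1, 3.9, 6.2, 8.0, 9.9],    # rises with feat_A
    "feat_C": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as feat_A rises
    "feat_D": [1.0, -1.0, 0.0, -1.0, 1.0],  # essentially unrelated to the others
}
edges = strong_edges(toy_features)
```

The sign of each retained coefficient would determine the edge color in such a network, and pairs below the threshold, such as those involving `feat_D` here, are simply not drawn.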

Additional file 11:

Supplementary documentation, including additional information about the Bayesian network analysis.

Format: PDF Size: 154KB

Additional file 12:

Correlation patterns of chromatin features in 40 bins around the TSS and TTS (from -2 kb to 2 kb) of the first and the second genes in 881 worm operons.

Format: PDF Size: 4KB

Additional file 13:

Predicted expression levels of microRNAs at stage L3. MicroRNAs are divided into high (red) and low (green) groups based on their measured expression levels in small RNA-seq experiments.

Format: PDF Size: 1.4MB

Additional file 14:

Stage specificity of chromatin models for microRNA expression predictions. The chromatin model was trained using the chromatin and expression data of protein-coding genes at the EEMB stage. The model was then used to predict microRNA expression levels at six stages. R indicates the Pearson correlation coefficient between the predicted expression levels and the actual expression levels from RNA-seq experiments.

Format: PDF Size: 51KB

Additional file 15:

Gene Expression Omnibus accession IDs of the data sets used in this work.

Format: XLS Size: 44KB