<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>gb-2011-12-2-r15</ui><ji>GBJ</ji><fm>
<dochead>Method</dochead>
<bibl>
<title>
<p>A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets</p>
</title>
<aug>
<au id="A1"><snm>Cheng</snm><fnm>Chao</fnm><insr iid="I1"/><email>chao.cheng@yale.edu</email></au>
<au id="A2"><snm>Yan</snm><fnm>Koon-Kiu</fnm><insr iid="I1"/><email>koon-kiu.yan@yale.edu</email></au>
<au id="A3"><snm>Yip</snm><mi>Y</mi><fnm>Kevin</fnm><insr iid="I1"/><insr iid="I2"/><email>kevinyip@cse.cuhk.edu.hk</email></au>
<au id="A4"><snm>Rozowsky</snm><fnm>Joel</fnm><insr iid="I1"/><email>joel.rozowsky@yale.edu</email></au>
<au id="A5"><snm>Alexander</snm><fnm>Roger</fnm><insr iid="I1"/><email>roger.alexander@yale.edu</email></au>
<au id="A6"><snm>Shou</snm><fnm>Chong</fnm><insr iid="I1"/><email>chong.shou@yale.edu</email></au>
<au ca="yes" id="A7"><snm>Gerstein</snm><fnm>Mark</fnm><insr iid="I1"/><insr iid="I3"/><insr iid="I4"/><email>mark.gerstein@yale.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Molecular Biophysics and Biochemistry, Yale University, 260 Whitney Avenue, New Haven, CT 06520, USA</p></ins>
<ins id="I2"><p>Department of Computer Science and Engineering, The Chinese University of Hong Kong, Rm 1006, Ho Sin-Hang Engineering Bldg, Shatin, New Territories, Hong Kong</p></ins>
<ins id="I3"><p>Program in Computational Biology and Bioinformatics, Yale University, 260 Whitney Avenue, New Haven, CT 06520, USA</p></ins>
<ins id="I4"><p>Department of Computer Science, Yale University, PO Box 208285, New Haven, CT 06520, USA</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>2</issue>
<fpage>R15</fpage>
<url>http://genomebiology.com/2011/12/2/R15</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2011-12-2-r15</pubid><pubid idtype="pmpid">21324173</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>21</day><month>12</month><year>2010</year></date></rec><revrec><date><day>26</day><month>1</month><year>2011</year></date></revrec><acc><date><day>16</day><month>2</month><year>2011</year></date></acc><pub><date><day>16</day><month>2</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Cheng et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<p>We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict gene expression of protein coding genes, as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution around genes (upstream or downstream) of distinct chromatin features to the overall prediction of expression levels.</p>
</sec>
</abs>
</fm><meta>
<classifications>
<classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
<classification id="30010010" subtype="man_spc_id" type="BMC">Genome studies</classification>
<classification id="30010013" subtype="man_spc_id" type="BMC">Methods</classification>
<classification id="30010015" subtype="man_spc_id" type="BMC">Model organisms</classification>
<classification id="30010016" subtype="man_spc_id" type="BMC">Molecular biology</classification>
</classifications>
</meta><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>In eukaryotes, nuclear chromosomes are organized into chains of nucleosomes, each of which consists of 147 bp of DNA wrapped around an octamer of four types of core histones. Modifications of these core histones are central to many biological processes, including transcriptional regulation <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, replication <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>, alternative splicing <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>, DNA repair <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>, apoptosis <abbrgrp>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp>, gene silencing <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, X-chromosome inactivation <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp> and carcinogenesis <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. Among these, transcriptional regulation is one of the most important and therefore one of the most intensively investigated processes <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
</abbrgrp>. Histone modifications have been demonstrated to regulate gene transcription in positive or negative manners depending on the modification site and type <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. For example, a genome-wide map of 18 histone acetylation and 19 histone methylation sites in human T cells indicates that H3K9me2, H3K9me3, H3K27me2, H3K27me3 and H4K20me3 are negatively correlated with gene expression, whereas most other modifications, including all the acetylations, are correlated with gene activation <abbrgrp>
<abbr bid="B18">18</abbr>
<abbr bid="B19">19</abbr>
</abbrgrp>. As an extreme case, histone modifications play critical roles in X-chromosome inactivation in females, which equalizes the expression of X-linked genes with that in males <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. Histone modifications are thought to affect transcription through two mechanisms: modifying the accessibility of DNA to transcription factors by altering the local chromatin structure; and providing specific binding surfaces for the recruitment of transcriptional activators and repressors <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>.</p>
<p>The large number of possible histone modifications has led to the 'histone code' hypothesis, which states that combinations of different histone modifications specify distinct chromatin states and bring about distinct downstream effects <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
</abbrgrp>. Moreover, one histone modification may influence another by recruiting or activating chromatin-modifying complexes <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. However, a study in yeast revealed only simple and cumulative functional consequences for combinations of histone H4 acetylation rather than a complicated synergistic histone code <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. Two other studies, one in yeast and the other in <it>Drosophila</it>, also demonstrated that histone modifications are highly correlated with each other and are partially redundant in function <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B17">17</abbr>
</abbrgrp>, presumably conferring robustness in relation to epigenetic regulation <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. Alternatively, the high correlation between histone modifications may have been overestimated as a result of differences in nucleosome density or other unknown biases <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. So far, knowledge about the effect of histone modifications on transcriptional regulation is still limited, and the degree of complexity of the histone code is far from clear. To further understand the relationship between histone modifications and gene expression, we require a systematic analysis that integrates histone modification maps with other genome-wide datasets.</p>
<p>The model organism encyclopedia of DNA elements (modENCODE) project was launched in 2007 for the purpose of generating a comprehensive annotation of functional elements in the <it>Caenorhabditis elegans </it>and <it>Drosophila melanogaster </it>genomes <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>. By using recently developed genome-wide experimental techniques such as ChIP-chip, ChIP-seq and RNA-seq <abbrgrp>
<abbr bid="B31">31</abbr>
<abbr bid="B32">32</abbr>
</abbrgrp>, modENCODE has generated a large amount of data, including gene expression profiles, histone modification profiles, and DNA binding data for transcription factors and histone-modifying proteins. This large compendium of datasets provides an unprecedented opportunity to investigate the relationship between chromatin modifications and transcriptional regulation using an integrative approach.</p>
<p>In this study, we endeavor to construct a general framework for relating chromatin features with gene expression. We apply a multitude of supervised and unsupervised statistical methods to investigate different aspects of gene regulation by chromatin features. Leveraging the rich data generated by the modENCODE project, we use <it>C. elegans </it>as a primary model to illustrate our formalism. Nevertheless, we tested the generality of our methods using a variety of species ranging from yeast to human. More specifically, we show that chromatin features can accurately predict the expression levels of genes and collectively account for at least 50% of the variation in gene expression. We also study the importance of individual features, examine the combinatorial effects of chromatin features, and investigate to what extent the histone code hypothesis is valid. By applying the chromatin-based model to predict the expression of coding genes and microRNAs at different developmental stages, we further address the developmental stage specificity of chromatin modifications and suggest that chromatin features regulate transcription of coding genes and microRNAs in a similar fashion.</p>
<p>As more and more genome-wide ChIP-seq and RNA-seq data will be generated by the modENCODE project and the ENCODE project <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp> in the near future, the methods of data integration proposed in this work have various potential applications.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Chromatin features show distinct signal patterns around genic regions</p>
</st>
<p>To systematically study the genome-wide properties of various chromatin features, we collected more than 50 ChIP-chip and ChIP-seq profiles of histone modifications and DNA binding factors in <it>C. elegans </it>from the modENCODE project (see Materials and methods). We divided the DNA regions around (&#177; 4 kb) the transcription start site (TSS) and transcription termination site (TTS) of each transcript into small 100-bp bins and calculated the average signal of the chromatin features in each bin. As a result, each bin was assigned a matrix whose elements are the average signals of different features in different transcripts (Figure <figr fid="F1">1</figr>). Figure <figr fid="F2">2a</figr> shows the rich spatial pattern of 16 features in the early embryonic (EEMB) stage, where the signals are averaged over all transcripts. We first observed that the upstream and downstream regions of TSSs and TTSs are clearly distinct. Most chromatin features have higher signals in the transcribed regions (downstream of TSSs and upstream of TTSs). Interestingly, we found that RNA polymerase II (Pol II) has the strongest binding signal in regions right after the TTS, rather than within the transcribed region (Figure <figr fid="F2">2a</figr>). The enriched binding signals right after the TTS may indicate the importance of anti-sense transcription as a regulatory mechanism for gene expression <abbrgrp>
<abbr bid="B14">14</abbr>
<abbr bid="B33">33</abbr>
</abbrgrp>. A strong Pol II signal was also observed in regions upstream of the TSS at some other developmental stages (Figure S1 in Additional file <supplr sid="S1">1</supplr>), as previously reported in <it>C. elegans </it><abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>, and was thought to be related to the accumulation of TSS-associated RNAs in mouse and human <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. The signal pattern of histone H3 suggests that nucleosome occupancy is lower in regions around the TSS and TTS than within the transcribed regions. H3K4me2 and H3K4me3 are enriched upstream of the TSS, consistent with their reported role as histone marks for active promoters <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. On the other hand, signals for H3K9me2 and H3K9me3 are depleted around the TSS compared with neighboring regions, which may reflect the low density of nucleosomes around the TSS of genes <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>.</p>
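<p>The binning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function name <it>bin_signal</it>, the per-base list representation of the signal, the 0-based coordinates and the handling of strand and chromosome ends are all assumptions made for the example.</p>
<p><![CDATA[
```python
def bin_signal(signal, anchor, strand="+", flank=4000, bin_size=100):
    """Average a per-base signal in fixed-width bins around an anchor (TSS or TTS).

    signal : list of per-base values for one chromosome (0-based coordinates)
    anchor : 0-based coordinate of the TSS or TTS (hypothetical input format)
    strand : '+' or '-'; bins are always ordered from upstream to downstream
    Returns 2 * flank // bin_size averages; bins that extend beyond the
    chromosome are reported as None.
    """
    n_bins = 2 * flank // bin_size
    averages = []
    for i in range(n_bins):
        # Offset of the bin start relative to the anchor, in transcript orientation.
        offset = -flank + i * bin_size
        if strand == "+":
            start, end = anchor + offset, anchor + offset + bin_size
        else:
            # On the minus strand, upstream lies at higher coordinates.
            start, end = anchor - offset - bin_size + 1, anchor - offset + 1
        if start < 0 or end > len(signal):
            averages.append(None)
        else:
            averages.append(sum(signal[start:end]) / bin_size)
    return averages


# Toy chromosome: no signal before position 5000, uniform signal afterwards.
toy_signal = [0.0] * 5000 + [1.0] * 5000
tss_bins = bin_signal(toy_signal, 5000, "+")   # 80 bins around a TSS
tts_bins = bin_signal(toy_signal, 5500, "+")   # 80 bins around a TTS
profile = tss_bins + tts_bins                  # 160 bins per transcript
```
]]></p>
<p>Applying such a function to every transcript and every chromatin feature would give, for each of the 160 bins, a transcript-by-feature matrix of average signals.</p>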
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Schematic diagram of our data binning and supervised analysis</p></caption><text>
   <p><b>Schematic diagram of our data binning and supervised analysis</b>. <b>(a) </b>DNA regions around the transcription start site (TSS) and transcription termination site (TTS) of each transcript were separated into 160 bins of 100 bp in size. Average signal of each chromatin feature was calculated for all transcripts, resulting in a predictor matrix for each bin. These predictor matrices were used to predict expression of transcripts by support vector machine (SVM) or support vector regression (SVR) models. The genome-wide data for chromatin features and gene expression were generated by the modENCODE project using ChIP-chip/ChIP-seq and RNA-seq experiments, respectively. <b>(b) </b>A summary of datasets used in our analysis. L, larval; TF, transcription factor; YA, young adult.</p>
</text><graphic file="gb-2011-12-2-r15-1"/></fig>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Chromatin feature patterns</p></caption><text>
   <p><b>Chromatin feature patterns</b>. <b>(a,b) </b>Signal pattern (a) and correlation pattern (b) of each chromatin feature in the 160 bins around the TSS and TTS (from 4 kb upstream to 4 kb downstream) of worm transcripts at the EEMB stage. In (a), the signal of each chromatin feature for each bin is averaged across all transcripts. In (b), the Spearman correlation coefficient of each chromatin feature with gene expression levels was calculated for each bin. Ab1 and Ab2 represent experimental results using different antibodies for a chromatin feature. DNA region from 2 kb upstream of the TSS to 2 kb downstream of the TTS is shown in the rectangle.</p>
</text><graphic file="gb-2011-12-2-r15-2"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Signal patterns of Pol II around TSS and TTS regions (from -4 kb to 4 kb) at different developmental stages</b>. At each stage, the signals were normalized by subtracting the mean and dividing by the standard deviation of the signals over all 160 bins. The locations of the TSS and TTS are marked by dotted lines.</p>
</text>
<file name="gb-2011-12-2-r15-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Chromatin features exhibit distinct spatial correlation patterns with gene expression levels</p>
</st>
<p>The different chromatin features display distinct spatial patterns. It is thus worthwhile to explore the relationship between these patterns and the level of gene expression. Making use of RNA-seq data obtained from the different stages of <it>C. elegans</it>, we quantified the expression level of each gene. For each bin, we then calculated the correlation between the gene expression levels and the average signals of each chromatin feature of the bin. Figure <figr fid="F2">2b</figr> shows the spatial variation of these correlation coefficients around TSSs and TTSs. According to the correlation patterns, there are two main types of chromatin features: ones that are positively correlated with gene expression (such as H3K79me1, H3K79me2 and H3K79me3); and ones that are negatively correlated with gene expression (such as H3K9me2 and H3K9me3). While some features show largely uniform correlations across the 16-kb regions, some others are more variable across the regions. For example, H3K79me2 has a high correlation coefficient (0.65) near the TSS, but rather a low correlation (0.10) downstream of the TTS. It is interesting to observe that the negative features tend to have more uniform spatial patterns while the positive features tend to show greater variation. In addition, for chromatin features such as H3K79me2, although the average signal intensity decreases with distance downstream from the TSS, the correlation between the feature signal and the expression level remains high. This pattern suggests that, while some chromatin features have the strongest average signals only at some highly specific regions, the differences of their signals between genes with low and high expression levels remain strong over much broader regions.</p>
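<p>The per-bin correlation analysis can be illustrated with a short, dependency-free sketch of the Spearman correlation coefficient (the measure named in the legend of Figure 2b). The function names and the nested-list layout of the input are assumptions chosen for the example, not a description of the code used in this study.</p>
<p><![CDATA[
```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_profile(feature_by_bin, expression):
    """feature_by_bin[b][g] is the signal of one feature in bin b for gene g;
    expression[g] is the expression level of gene g (hypothetical layout)."""
    return [spearman(bin_signals, expression) for bin_signals in feature_by_bin]
```
]]></p>
<p>Calling <it>correlation_profile</it> once per chromatin feature would give one row of a heat map like the one in Figure 2b, with one coefficient per bin.</p>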
<p>We chose the long window size of 4 kb in order to inspect how fast the signals of the chromatin features fade out as we move away from the TSS and TTS. Indeed, the correlations of some chromatin features (for example, H3K9me3) remain strong a few kilobases away from the TSS and TTS, and the fading could only be observed at the 4-kb boundaries. In short genes, some bins lie both within 4 kb downstream of the TSS and within 4 kb upstream of the TTS. To make sure that our conclusions are not affected by such genes, we also performed the correlation analysis only on transcripts longer than 8 kb, and found that the correlation patterns are the same (Figure S2 in Additional file <supplr sid="S2">2</supplr>). Also, as the <it>C. elegans </it>genome is quite compact, the region 4 kb upstream of a TSS or downstream of a TTS could overlap with another gene. We thus repeated the analysis using transcripts that are at least 4 kb away from any other known transcripts, and again obtained similar correlation patterns (Figure S3 in Additional file <supplr sid="S3">3</supplr>). Furthermore, analysis based on bins within intergenic regions again resulted in a similar correlation pattern. Therefore, the high correlation of gene expression with feature signal at distant locations reflects the long-range effects of these features on regulation, rather than an artifact caused by the chromatin structure of nearby genes.</p>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Correlation patterns of chromatin features with gene expression at the EEMB stage based on long transcript genes only</b>. Only genes longer than 8 kb were used for correlation computations so that there is no overlap between the TSS and TTS bins.</p>
</text>
<file name="gb-2011-12-2-r15-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Correlation patterns of chromatin features with gene expression at the EEMB stage based on transcripts that are far away from any other transcripts</b>. Only the transcripts that are at least 4 kb away from any other transcripts were used for correlation computations so that there is no overlap between bins of nearby transcripts.</p>
</text>
<file name="gb-2011-12-2-r15-S3.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>Furthermore, to assess whether the trends we observed are universal to all developmental stages rather than specific to the EEMB stage, we repeated the analysis in other stages, including late embryo, larval stages and young adult. Although the exact values of correlation coefficients vary across stages, the spatial patterns are consistent in all stages (Figure S4 in Additional file <supplr sid="S4">4</supplr>). In addition, a large number of genes are associated with multiple transcripts corresponding to different alternative splicing isoforms. In many cases, the overlap between these transcripts is substantial, which might affect the correlation patterns between chromatin features and expression. We thus repeated the correlation analysis using only genes with a single transcript, and obtained the same qualitative results (Figure S5 in Additional file <supplr sid="S5">5</supplr>).</p>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Correlation patterns of chromatin features with gene expression at the L3 stage</b>. Correlation was calculated based on long transcripts (&gt;8 kb).</p>
</text>
<file name="gb-2011-12-2-r15-S4.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S5">
<title>
<p>Additional file 5</p>
</title>
<text>
<p>
<b>Correlation patterns of chromatin features with gene expression at the EEMB stage based on single-transcript genes only</b>.</p>
</text>
<file name="gb-2011-12-2-r15-S5.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>Among the chromatin features shown in Figure <figr fid="F2">2</figr>, MES-4 and MRG-1 are factors associated with X-chromosome inactivation <abbrgrp>
<abbr bid="B37">37</abbr>
<abbr bid="B38">38</abbr>
</abbrgrp>. These factors are expected to have different binding patterns on the X chromosome than on autosomes. We therefore analyzed their correlation patterns in X-linked genes and autosomal genes separately. As expected, we found that MES-4 and MRG-1 associate predominantly with autosomal DNA, while the dosage compensation complex (DCC) subunits bind specifically to X-chromosomal DNA (data not shown), which is in line with previous reports <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. Consistent with this finding, MES-4 and MRG-1 show stronger positive correlation with autosomal gene expression.</p>
</sec>
<sec>
<st>
<p>Unsupervised clustering reveals general activating and repressing chromatin features for individual genes</p>
</st>
<p>As some chromatin features are positively correlated with gene expression levels and some are negatively correlated, the two groups potentially represent general active and repressive marks of gene expression. Yet since these correlations capture only the average behavior across all genes, it is still not clear if these features are strong indicators of the expression levels of individual genes.</p>
<p>In order to examine the relationship between chromatin features and the expression levels of all individual genes, we performed a two-way hierarchical clustering of both the chromatin features and the annotated genes, according to the feature signals at the TSS bins (bin 1). As shown in Figure <figr fid="F3">3a</figr>, genes can be divided into two clusters (labeled as H and L, respectively) based on the signals of the 16 features. We found that the two clusters roughly correspond to genes with high expression levels (H) and genes with low expression levels (L), respectively (Figure <figr fid="F3">3b</figr>). These two clusters are characterized by complementary patterns of chromatin features. Cluster H is characterized by high signals of 11 features (the right component of the upper dendrogram), and low signals for the other 5 features. We note in particular that highly expressed genes tend to have a strong H3K36me3 signal, which is consistent with the role of H3K36me3 as a chromatin mark that activates transcription of associated genes. Similarly, the well-known repressive mark H3K9me3 shows a low signal. Compared to cluster H, genes in cluster L show the opposite pattern of chromatin signals.</p>
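<p>Splitting a hierarchical clustering tree at its top branch is equivalent to stopping an agglomerative procedure when two clusters remain. The sketch below illustrates this for the gene dimension only. The distance measure and linkage rule are not specified in the text above, so the Euclidean distance and average linkage used here are assumptions, as are the function names; the cubic running time makes it suitable only for toy data.</p>
<p><![CDATA[
```python
def euclidean(a, b):
    """Euclidean distance between two feature profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def two_clusters(profiles):
    """Average-linkage agglomerative clustering, stopped at two clusters.

    profiles[g] is the list of chromatin feature signals of gene g in one bin.
    Returns two lists of gene indices (the two sides of the top branch).
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(euclidean(profiles[i], profiles[j])
                            for i in clusters[a] for j in clusters[b])
                distance = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy example: five genes described by three feature signals each.
toy_profiles = [[5, 5, 0], [5, 4, 0], [0, 0, 5], [0, 1, 5], [4, 5, 1]]
toy_clusters = two_clusters(toy_profiles)
```
]]></p>
<p>The expression levels of the genes in the two resulting clusters can then be compared to check whether the clusters separate highly and lowly expressed genes.</p>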
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Hierarchical clustering using either chromatin feature profiles (a-c) or bin profiles (d-f) discriminates highly and lowly expressed genes</p></caption><text>
   <p><b>Hierarchical clustering using either chromatin feature profiles (a-c) or bin profiles (d-f) discriminates highly and lowly expressed genes</b>. <b>(a) </b>Hierarchical clustering of 16 chromatin features in bin 1 (0 to 100 nucleotides upstream of a TSS). The resulting tree is split at the top branch, which divides genes into two clusters, cluster H and cluster L, as labeled. <b>(b) </b>Distributions of expression levels of genes in cluster H (red) and cluster L (green). Expression levels are significantly different between the two clusters according to <it>t</it>-test (<it>P </it>= 3E-202). Expression levels were measured by RNA-seq (see Materials and methods). <b>(c) </b>T-scores for the differential expression of the top two gene clusters based on hierarchical clustering of chromatin features in each of the 160 bins. For each bin, hierarchical clustering was performed to separate genes into two clusters. Expression levels between the two clusters were compared and a t-score calculated to measure the capability of the bin to discriminate between genes with high and low expression levels. <b>(d) </b>Hierarchical clustering of the genes based on the signal profiles of H3K79me2 across the 160 bins. The resulting tree is also split at the top branch, leading to two gene clusters. <b>(e) </b>Distributions of expression levels of genes in the two clusters in (d). The expression levels are significantly different according to <it>t</it>-test (<it>P </it>= 4E-93). <b>(f) </b>T-scores for the differential expression of the two gene clusters based on hierarchical clustering of bin profiles for each individual chromatin feature. Cyan and blue colors indicate a significant positive and negative correlation between a chromatin feature and gene expression levels, respectively. Black color indicates that a chromatin feature could not significantly discriminate between genes with high and low expression levels. 
To visualize the clustering, 2,000 randomly selected genes are shown. The data for gene expression levels and chromatin features are from the EEMB stage.</p>
</text><graphic file="gb-2011-12-2-r15-3"/></fig>
<p>To explore which regions around the TSS and TTS provide the greatest power in determining gene expression levels, we repeated the two-way clustering procedure for each of the 160 bins around TSSs and TTSs. Figure <figr fid="F3">3c</figr> shows the resulting t-statistics. We observe that the signals slightly downstream of TSSs are the most informative. In general, the t-statistics decrease as the distance from the TSS or TTS increases. The decay is steeper at the region downstream of TTSs.</p>
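<p>The t-statistic used to score each bin can be sketched as follows. The text does not state which variant of the <it>t</it>-test was used, so Welch's unequal-variance form is an assumption here, and the function names and the 0/1 cluster labels are hypothetical.</p>
<p><![CDATA[
```python
def welch_t(group_a, group_b):
    """Welch's t-statistic for the difference in means of two samples."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances.
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    return (mean_a - mean_b) / (var_a / n_a + var_b / n_b) ** 0.5


def bin_t_scores(labels_by_bin, expression):
    """Score each bin by how well its two gene clusters separate expression.

    labels_by_bin[b][g] is 0 or 1, the cluster assigned to gene g when the
    clustering is based on bin b; expression[g] is the expression of gene g.
    The sign of each score depends on the arbitrary 0/1 labeling.
    """
    scores = []
    for labels in labels_by_bin:
        cluster_0 = [e for e, l in zip(expression, labels) if l == 0]
        cluster_1 = [e for e, l in zip(expression, labels) if l == 1]
        scores.append(welch_t(cluster_0, cluster_1))
    return scores
```
]]></p>
<p>A larger absolute score for a bin indicates that clustering on that bin alone separates genes with high and low expression more cleanly.</p>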
<p>The above integrative analysis involves all chromatin features. To examine how each feature individually affects gene expression, for each feature we performed hierarchical clustering of the genes based on the collective signals of the feature at all 160 bins. An example is shown in Figure <figr fid="F3">3d</figr>, in which signals of the single feature H3K79me2 at the different bins were used to cluster the genes. As in the case when all chromatin features were used, the signals from single chromatin features can divide genes into two clusters (that are not exactly the same as, but similar to, the ones obtained from all features) with a significant difference in expression level (Figure <figr fid="F3">3e</figr>). Again we quantified the power of each feature in distinguishing genes with high and low expression levels using t-statistics. As shown in Figure <figr fid="F3">3f</figr>, apart from a few exceptions (black bars), most features are informative. The most informative features are H3K79me2, H3K79me3 and H3K4me2. The informative features can be further grouped into two classes. Activating features are those that are positively correlated with gene expression (cyan) and repressive features are those that are negatively correlated (blue).</p>
</sec>
<sec>
<st>
<p>Chromatin features can statistically predict gene expression levels with high accuracy using supervised integrative models</p>
</st>
<p>The above analyses suggest that gene expression levels can be at least partially deduced from chromatin features. To examine how much of gene expression is determined by chromatin features, we tried to predict gene expression levels using the features. We started with the simplified task of distinguishing highly expressed and lowly expressed transcripts, where the two classes of transcripts were constructed by discretizing gene expression levels (see Materials and methods). We divided all the transcripts into training and testing sets, and learned a support vector machine (SVM) model from the signals of all 16 chromatin features of the training transcripts at a certain bin (Figure <figr fid="F1">1</figr>). The model was then used to predict to which class each transcript in the testing set belongs. We repeated the procedure for all 160 bins, with 100 different random splits of the transcripts into training and testing sets for each bin (see Materials and methods). We represented the overall performance of the model using the receiver operating characteristic (ROC) curve and further quantified the accuracy using the area under the curve (AUC). Figure <figr fid="F4">4a</figr> shows the ROC curves corresponding to the prediction performance of five different bins. Compared to random ordering, which would give a diagonal ROC curve on average with an expected AUC of 0.5, all five curves are much better than random but differ in performance, which indicates that all the bins are useful for classifying gene expression but are not equally informative. This result is consistent with what we have observed using the unsupervised method described above (Figure <figr fid="F3">3c</figr>). Instead of using SVM, we also learned support vector regression (SVR) models using similar procedures (see Materials and methods) to predict expression values directly.
Figure <figr fid="F4">4b</figr> shows that there is a high positive correlation (0.75) between the predicted levels from an SVR model and the actual expression levels measured by RNA-seq. This analysis suggests that chromatin features explain at least 50% of gene expression variation (see Materials and methods).</p>
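<p>Two of the quantities used above can be made concrete with a small sketch. The AUC is computed here as the probability that a randomly chosen highly expressed transcript receives a higher score than a randomly chosen lowly expressed one, which equals the area under the ROC curve. The second function squares a correlation coefficient to obtain a fraction of variance; whether the 50% figure was derived exactly in this way is an assumption, as are the function names and the 0/1 class labels.</p>
<p><![CDATA[
```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores : classifier outputs, higher meaning 'more likely highly expressed'
    labels : 1 for highly expressed transcripts, 0 for lowly expressed ones
    Tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def variance_explained(correlation):
    """Squared correlation between predicted and measured expression levels."""
    return correlation ** 2


# Toy predictions for four transcripts (two per class).
toy_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
# A correlation of 0.75 corresponds to 0.75 ** 2 = 0.5625 of the variance.
toy_r2 = variance_explained(0.75)
```
]]></p>
<p>Under this reading, a correlation of 0.75 between predicted and measured expression corresponds to roughly 56% of the variance, which is in line with the statement that chromatin features explain at least 50% of the variation.</p>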
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Prediction power of the supervised models</p></caption><text>
   <p><b>Prediction power of the supervised models</b>. <b>(a) </b>ROC curves for five different bins based on the results of the SVM classification models. <b>(b) </b>Predicted versus experimentally measured expression levels. The SVR regression model was applied to bin 1 for predicting gene expression levels. (PCC, Pearson correlation coefficient). <b>(c) </b>The prediction accuracy of SVM classification models for all the 160 bins. For each bin, we constructed an SVM classification model and summarized its accuracy using the AUC score. The AUC scores were calculated based on cross-validation repeated 100 times for each bin. The red curve shows the average AUC scores (mean of 100 repeats) of the bins and the blue bars indicate their standard deviations. The positions of the TSS and TTS are marked by dotted lines.</p>
</text><graphic file="gb-2011-12-2-r15-4"/></fig>
<p>We then compared the prediction accuracy of all 160 SVM models learned from the different bins. As shown in Figure <figr fid="F4">4c</figr>, the models learned from regions around the TSS (-300 bp to 500 bp) and upstream of the TTS (-200 bp to 0 bp) have the highest accuracy, with AUC values greater than 0.9. Prediction accuracy decreases gradually as we move away from these regions, which confirms the spatial effects that we observed in the unsupervised analysis (Figure <figr fid="F3">3c</figr>).</p>
<p>We have also tested more comprehensive models that combine the chromatin features in 40 bins around the TSS (-2 kb to 2 kb). These comprehensive models achieve slightly higher prediction accuracy than those based on single bins, yet the enhancement is not dramatic, with an average AUC of 0.94 for the classification model (SVM) and an average correlation coefficient of 0.75 for the regression model (SVR) (Figure S6 in Additional file <supplr sid="S6">6</supplr>).</p>
<suppl id="S6">
<title>
<p>Additional file 6</p>
</title>
<text>
<p>
<b>Prediction of gene expression using chromatin features in all the 40 bins around the TSS (from -2 kb to 2 kb)</b>. <b>(a) </b>ROC curve of the SVM classification model. <b>(b) </b>Predicted expression levels versus actual expression levels measured by RNA-seq experiment. PCC, Pearson correlation coefficient.</p>
</text>
<file name="gb-2011-12-2-r15-S6.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>We then learned SVM models using only features of individual types. As shown in Figure <figr fid="F5">5a</figr>, the AUC obtained by using all features (black) is comparable to the AUCs obtained from models using only particular subsets of features. Strikingly, the model involving only the 11 histone modification features is almost as accurate as the model involving all 21 features. We further divided the histone modification features into four subsets: modifications on K4, K9, K36 and K79, respectively. While the integrated model with all histone modifications achieves an AUC value of 0.9, using just one of the subsets can yield an AUC higher than 0.8 (Figure <figr fid="F5">5b</figr>). In particular, the H3K79 set is found to be the most predictive, which again confirms our previous finding of the importance of these histone modifications in regulating gene expression (Figure <figr fid="F3">3f</figr>).</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Prediction power of the SVM models using the signals from different subsets of chromatin features in the 100 nucleotides around the TSS (bin 1). The results are based on cross-validation with 100 trials</p></caption><text>
   <p><b>Prediction power of the SVM models using the signals from different subsets of chromatin features in the 100 nucleotides around the TSS (bin 1). The results are based on cross-validation with 100 trials</b>. <b>(a) </b>ALL, all 21 chromatin features; H3, the two H3 features; HIS, the 11 chromatin modification features; XIF, the seven binding profile features for X-inactivation factors; POLII, the binding profile feature for RNA polymerase II. <b>(b) </b>HIS, the 11 chromatin modification features; H3K79ME, H3K79me1, H3K79me2 and H3K79me3; H3K9ME, H3K9me2, H3K9me3(Ab1) and H3K9me3(Ab2); H3K36ME, H3K36me2(Ab1), H3K36me2(Ab2) and H3K36me3; H3K4ME, H3K4me2 and H3K4me3.</p>
</text><graphic file="gb-2011-12-2-r15-5"/></fig>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Co-regulation of transcription by pairs of histone modifications</p></caption><text>
   <p><b>Co-regulation of transcription by pairs of histone modifications</b>. <b>(a) </b>Categorization of genes into four groups based on signals of H3K4me3 and H3K36me3: HH (magenta), HL (green), LH (cyan) and LL (blue). The signals of histone marks H3K36me3 and H3K4me3 are bimodally distributed and are therefore classified into H and L by a Gaussian mixture model. The distributions of expression levels of the four gene groups are shown on the right. <b>(b) </b>Same as (a), based on signals of H3K9me3 and H3K79me3. As in (a), the signal of H3K79me3 is classified by a Gaussian mixture model. The signals of H3K9me3 are not bimodally distributed and are instead classified into H and L based on whether the value is higher or lower than the median.</p>
</text><graphic file="gb-2011-12-2-r15-6"/></fig>
<p>The results of the supervised analysis suggest that chromatin features are not only correlated with expression but are also predictive of the expression levels of individual genes with good accuracy and could explain a large portion of the expression differences between different genes. We note that histone modifications may have other regions of enrichment that are informative about gene expression: for instance, the percentage of gene length with strong histone modification signals. We therefore examined the power of using these features for predicting gene expression levels. Specifically, we calculated the percentage of transcribed regions with strong signals (&gt;10%) for all genes. Using them as predictors, we obtained high prediction accuracy (AUC = 0.90). However, a combination of these percentage features with the original chromatin features does not lead to obvious improvement in prediction accuracy, indicating that they are redundant.</p>
</sec>
<sec>
<st>
<p>Combinations of chromatin features contribute to gene expression prediction</p>
</st>
<p>Both the unsupervised and supervised analyses above suggest that chromatin features possess a certain level of redundancy. In the unsupervised clustering (Figure <figr fid="F3">3a</figr>), different chromatin features show similar signal patterns around the TSS regions of genes. In the supervised predictions (Figure <figr fid="F5">5</figr>), high accuracy was achieved by multiple features as well as feature subsets. Though the SVR model offers good prediction power, it may be instructive to build a simpler linear regression model to explore to what extent the chromatin features are redundant, and to what extent they are interacting in a combinatorial fashion. Specifically, for each bin, we modeled the expression level <it>y </it>as a linear combination of the effects of individual histone modification features <it>x</it>
<sub>
<it>i </it>
</sub>and their products <it>x</it>
<sub>
<it>i</it>
</sub>
<it>x</it>
<sub>
<it>j</it>
</sub>:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-2-r15-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>y</m:mi>
   <m:mo>~</m:mo>
   <m:mstyle displaystyle="true">
      <m:mo>&#8721;</m:mo>
      <m:mrow>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
         <m:mo>+</m:mo>
         <m:mstyle displaystyle="true">
            <m:munder>
               <m:mo>&#8721;</m:mo>
               <m:mrow>
                  <m:mi>i</m:mi>
                  <m:mo>&lt;</m:mo>
                  <m:mi>j</m:mi>
               </m:mrow>
            </m:munder>
            <m:mrow>
               <m:msub>
                  <m:mi>x</m:mi>
                  <m:mi>i</m:mi>
               </m:msub>
               <m:msub>
                  <m:mi>x</m:mi>
                  <m:mi>j</m:mi>
               </m:msub>
            </m:mrow>
         </m:mstyle>
      </m:mrow>
   </m:mstyle>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>We found that among the 66 (12 &#215; 11/2) possible interactions between the 12 distinct histone modification features, many interactions are statistically significant. For example, for bin 1, we detected 12 significant interactions (<it>P </it>&lt; 0.001, linear regression) between the histone modifications (Table S7 in Additional file <supplr sid="S7">7</supplr>).</p>
<suppl id="S7">
<title>
<p>Additional file 7</p>
</title>
<text>
<p>
<b>Interaction between all possible pairs of histone modifications</b>. Interaction between all possible pairs of histone modification as indicated by linear model in bin 1. For each pair, both the results of linear models with the interaction terms (Interaction models) and without the interaction terms (Singleton models) are shown.</p>
</text>
<file name="gb-2011-12-2-r15-S7.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>To quantify the importance of these interactions in determining gene expression levels, we compared the above regression model with a singleton model that does not contain the interaction terms:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-2-r15-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>y</m:mi>
   <m:mo>~</m:mo>
   <m:mstyle displaystyle="true">
      <m:mo>&#8721;</m:mo>
      <m:mrow>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
      </m:mrow>
   </m:mstyle>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>By evaluating the prediction power of the two models using a cross-validation method, we found that with respect to the singleton model the interaction model improves prediction accuracy by 4%. Thus, the contribution of interactions among chromatin features to gene expression prediction is not substantial.</p>
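<p>The comparison between the interaction model and the singleton model can be illustrated with a small sketch for a single pair of features. This is not the code used in this study: the data are synthetic, a single hold-out split stands in for the cross-validation, and all function names (<it>solve</it>, <it>fit</it>, <it>design</it>, <it>holdout_r2</it>) are hypothetical.</p>

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def design(x1, x2, interaction):
    """Design-matrix row: intercept, two main effects, optional product term."""
    row = [1.0, x1, x2]
    return row + [x1 * x2] if interaction else row

def r_squared(beta, rows, y):
    """Fraction of variance in y explained by the fitted coefficients."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def holdout_r2(x1, x2, y, interaction):
    """Fit on the even-indexed genes, score on the odd-indexed genes."""
    rows = [design(a, b, interaction) for a, b in zip(x1, x2)]
    beta = fit(rows[0::2], y[0::2])
    return r_squared(beta, rows[1::2], y[1::2])

# Synthetic stand-in data (not from the paper): y depends on x1, x2 and x1*x2.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(400)]
x2 = [rng.gauss(0, 1) for _ in range(400)]
y = [a + 0.5 * b + 0.8 * a * b + rng.gauss(0, 0.3) for a, b in zip(x1, x2)]

singleton_r2 = holdout_r2(x1, x2, y, interaction=False)
interaction_r2 = holdout_r2(x1, x2, y, interaction=True)
```

<p>The difference between <it>interaction_r2</it> and <it>singleton_r2</it> on held-out genes plays the role of the accuracy gain reported above. The synthetic data deliberately contain a strong product term, so the gain in this toy example is much larger than the one observed for the real chromatin features.</p>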
<p>We further examined each pair of modifications individually to see if there is any redundancy between any of the modifications. Using simplified models each involving only two modification features, we found that no two histone modifications are completely redundant (Table S8 in Additional file <supplr sid="S8">8</supplr>). These results were confirmed by a similar analysis based on mutual information (Figure S9 in Additional file <supplr sid="S9">9</supplr>). Two examples are shown in Figure <figr fid="F6">6</figr>. In each example, we considered a specific pair of histone modification features, and divided all genes into four categories based on the signals of the two features at their TSS bins. In the first example (Figure <figr fid="F6">6a</figr>), expression levels are the lowest when both H3K4me3 and H3K36me3 are low but moderate if either one of them is high. This suggests that both features are activators. When both features have high signals, an even higher expression level is observed, showing that the two are not totally redundant. In the second example (Figure <figr fid="F6">6b</figr>), H3K9me3 is found to repress gene expression in general, while H3K79me3 is found to activate gene expression. As expected, a combination of high H3K9me3 signal and low H3K79me3 signal results in a lower expression level than when both signals are low. When the signals of both features are high, we observe a significant difference in gene expression compared to the other three cases, indicating that the features contribute to gene expression regulation in a collective manner.</p>
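<p>The four-group comparison used in these examples can be sketched as follows. The sketch is a simplification under stated assumptions: it splits every signal at its median, whereas a Gaussian mixture model was used for the bimodal marks, and the toy values and function names are hypothetical.</p>

```python
from statistics import mean, median

def split_high_low(signal):
    """Label each gene 'H' or 'L' relative to the median signal.

    The median is used throughout as a simplification; the paper uses a
    Gaussian mixture model for bimodal marks and the median only for H3K9me3.
    """
    cut = median(signal)
    return ["H" if s > cut else "L" for s in signal]

def group_expression(signal_a, signal_b, expression):
    """Mean expression for the HH, HL, LH and LL gene groups."""
    groups = {"HH": [], "HL": [], "LH": [], "LL": []}
    labels_a = split_high_low(signal_a)
    labels_b = split_high_low(signal_b)
    for a, b, e in zip(labels_a, labels_b, expression):
        groups[a + b].append(e)
    # Empty groups are omitted from the summary.
    return {name: mean(vals) for name, vals in groups.items() if vals}

# Toy values (hypothetical, for illustration only).
k4 = [0.1, 0.2, 2.0, 2.2, 0.3, 2.1, 0.2, 1.9]
k36 = [0.1, 1.8, 0.2, 2.0, 0.2, 1.9, 1.7, 0.3]
expr = [1.0, 4.0, 4.5, 9.0, 1.2, 8.5, 3.8, 4.2]
summary = group_expression(k4, k36, expr)
```

<p>In this toy example the HH group has the highest mean expression, the HL and LH groups are intermediate and the LL group is the lowest, which is the qualitative pattern described for two activating marks that are not fully redundant.</p>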
<suppl id="S8">
<title>
<p>Additional file 8</p>
</title>
<text>
<p>
<b>The significant interactions between chromatin features based on a linear model</b>. The significant interactions between chromatin features based on a linear model with 12 different chromatin features and their pairwise interaction terms.</p>
</text>
<file name="gb-2011-12-2-r15-S8.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S9">
<title>
<p>Additional file 9</p>
</title>
<text>
<p>
<b>Mutual information between expression and pairwise histone modification signals</b>. For each pair of histone modifications (denoted as H1, H2), the heat map shows the normalized mutual information I(E, H1 AND H2)/max(I(E,H1),I(E,H2)). For pairs such as H3K4me2 and H3K36me3, the combination of two features gives a higher predictive power than the two individual features.</p>
</text>
<file name="gb-2011-12-2-r15-S9.PDF">
   <p>Click here for file</p>
</file>
</suppl>
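<p>The normalized mutual information described for Additional file 9 can be sketched for discretized signals as follows. Treating "H1 AND H2" as the joint variable (H1, H2) is an assumption of this sketch, as are the function names and the binary toy data.</p>

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def normalized_pair_information(expr, h1, h2):
    """I(E; (H1, H2)) / max(I(E; H1), I(E; H2)) for discretized signals.

    The pair (H1, H2) is treated as one joint variable, which is an
    assumed reading of "H1 AND H2". The ratio is undefined when neither
    single feature carries any information about expression.
    """
    joint = list(zip(h1, h2))
    best_single = max(mutual_information(expr, h1), mutual_information(expr, h2))
    return mutual_information(expr, joint) / best_single

# Toy example (hypothetical): expression is high only when both marks are high.
h1 = [0, 0, 1, 1]
h2 = [0, 1, 0, 1]
expr = [0, 0, 0, 1]
ratio = normalized_pair_information(expr, h1, h2)
```

<p>A ratio well above 1, as in this toy example, indicates that the pair of marks is more informative about expression than the better of the two marks alone; a ratio close to 1 indicates that the second mark adds little.</p>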
<p>Our analyses of the interactions between the above chromatin features only considered binary interactions between two features. For higher-order relationships involving more features, it is infeasible to perform the same type of analyses, as the number of feature combinations would become intractable. Also, the above analyses only suggest which features interact with each other, but do not explain how the features interact. In particular, the complex correlations between features and gene expression make it difficult to extract directional relationships between them (Figure S10 in Additional file <supplr sid="S10">10</supplr>). We therefore used Bayesian networks to study the higher order relationships between the chromatin features and gene expression (see Additional file <supplr sid="S11">11</supplr> for details).</p>
<suppl id="S10">
<title>
<p>Additional file 10</p>
</title>
<text>
<p>
<b>Interactions among chromatin features and expression</b>. <b>(a) </b>Node colors indicate the correlation of the corresponding features with gene expression. Edge colors indicate the correlation between the two connected features. Only interactions with a strong correlation (|PCC| &gt;0.3) are shown. <b>(b) </b>The directional relationships inferred from Bayesian network analysis. Arrow sizes indicate the confidence scores of the directed edges. Only interactions with a confidence score (combined for both directions) of at least 80% are shown.</p>
</text>
<file name="gb-2011-12-2-r15-S10.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S11">
<title>
<p>Additional file 11</p>
</title>
<text>
<p>
<b>Supplementary information on the Bayesian network analysis</b>. The file contains additional information about the Bayesian network analysis.</p>
</text>
<file name="gb-2011-12-2-r15-S11.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>The chromatin model is developmental stage-specific</p>
</st>
<p>We have previously constructed an integrative model using chromatin features at the EEMB stage of <it>C. elegans </it>development and used it to predict gene expression levels at the same stage. How well can we predict gene expression levels at other developmental stages using the chromatin feature data from EEMB? To answer this question, we applied the model to predict gene expression at EEMB, L1 (larva stage 1), L2, L3, L4, and adult. Specifically, the chromatin feature data from EEMB were combined with expression data from a given stage to train an SVM model, which was then used to predict gene expression levels of other genes at that stage. As shown in Figure <figr fid="F7">7</figr>, the chromatin model based on EEMB data is able to predict the expression at other developmental stages with reasonable accuracy (AUC = 0.8). However, the predictions of gene expression levels in all these stages have lower accuracy than the predictions for EEMB itself. This result suggests that signals from chromatin features are developmental stage-specific and regulate biological processes in a dynamic manner depending on the particular stage. The stage specificity is more apparent when we apply the model to genes that are differentially expressed between stages. For example, we have identified 4,042 genes that differ in expression levels by at least four-fold between the EEMB and L3 stages. When the EEMB stage chromatin model is used to predict the expression levels of these genes, the prediction accuracy decreases further (AUC = 0.70).</p>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>Developmental stage specificity of the chromatin model</p></caption><text>
   <p><b>Developmental stage specificity of the chromatin model</b>. The EEMB model was constructed using the chromatin features and gene expression data both at the EEMB stage. The model was then used to predict gene expression levels at the EEMB stage and five other developmental stages: L1, L2, L3, L4 and adult. ROC curves are plotted based on the results of 100 trials of cross-validation. For each trial, the dataset was randomly separated into two halves: one half as training data and the other as testing data to estimate the accuracy of the model. The values in parentheses are AUC scores.</p>
</text><graphic file="gb-2011-12-2-r15-7"/></fig>
</sec>
<sec>
<st>
<p>Chromatin features show different correlation patterns with different genes in an operon</p>
</st>
<p>In <it>C. elegans </it>some neighboring genes are organized into operons. The genes in an operon are co-transcribed as a polycistronic pre-messenger RNA and processed into monocistronic mRNAs <abbrgrp>
<abbr bid="B39">39</abbr>
<abbr bid="B40">40</abbr>
</abbrgrp>. Here we investigate the differential signals of chromatin features among genes in operons and how this organization affects their expression levels. We collected the first, second and last genes in 881 <it>C. elegans </it>operons and calculated the signals of chromatin features in each of the 160 bins around their annotated TSS and TTS. We observed strong correlations between expression levels and chromatin feature signals for the first genes (Figure <figr fid="F8">8</figr>). In comparison, the correlation patterns for the second and last genes of the operons are not as apparent (Figure S12 in Additional file <supplr sid="S12">12</supplr>). The weaker correlations could be caused by the lack of signals for some histone modification types. As we observed, the mark for active promoters, H3K4me3, demonstrates strong signals around the TSS of the first genes, which is the shared promoter of genes in the same operon. In the upstream region of the internal genes, the H3K4me3 signal is often relatively weak. Alternatively, the weak correlation for internal genes may also be explained by the intensive post-transcriptional regulation of these genes, which cannot be captured by our chromatin feature-based model <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. In fact there is only weak correlation (Pearson correlation coefficient (PCC) = 0.10) between the expression levels of the first and the second genes. Moreover, on average the first genes are two-fold and three-fold more highly expressed than the second genes and the last genes, respectively. Taken together, although genes in the operons are co-transcribed, they are regulated post-transcriptionally to achieve distinct expression levels <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>.</p>
<fig id="F8"><title><p>Figure 8</p></title><caption><p>Correlation patterns of H3K4me3 and H3K79me3 in the 160 bins around the TSS and TTS (from 4 kb upstream to 4 kb downstream) with the expression levels of the first, second and last genes of 881 <it>C. elegans </it>operons</p></caption><text>
   <p><b>Correlation patterns of H3K4me3 and H3K79me3 in the 160 bins around the TSS and TTS (from 4 kb upstream to 4 kb downstream) with the expression levels of the first, second and last genes of 881 <it>C. elegans </it>operons</b>.</p>
</text><graphic file="gb-2011-12-2-r15-8"/></fig>
<suppl id="S12">
<title>
<p>Additional file 12</p>
</title>
<text>
<p>
<b>Correlation patterns of chromatin features in 40 bins around the TSS and TTS (from -2 kb to 2 kb) of the first and the second genes in 881 worm operons</b>.</p>
</text>
<file name="gb-2011-12-2-r15-S12.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Chromatin models learned from protein-coding genes are able to predict microRNA expression levels with high accuracy</p>
</st>
<p>Do chromatin features influence the transcription of microRNAs in the same way as that of protein-coding genes? As a way to study the similarity of the two mechanisms, we investigated the effectiveness of the chromatin model learned from protein-coding genes in predicting microRNA expression. Since precise TSSs are not available for most worm microRNAs, we calculated the signals of chromatin features in the genomic regions corresponding to pre-microRNAs, and used them as the input features for our chromatin model.</p>
<p>We predicted the expression levels of 162 worm microRNAs with genomic locations obtained from miRBASE <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. We then compared our predictions with the experimental measurements performed by Kato <it>et al. </it>
<abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. As shown in Figure <figr fid="F9">9</figr>, our predictions are in good agreement with the experimental results in the EEMB stage (see also the prediction results for the L3 stage in Figure S13 in Additional file <supplr sid="S13">13</supplr>). Some microRNAs are located within or near gene loci, which may confound the prediction of microRNA expression. To address this issue, we also checked the prediction accuracy using only microRNAs that are located away from any known gene, and obtained similar prediction accuracy (PCC = 0.62).</p>
<fig id="F9"><title><p>Figure 9</p></title><caption><p>Prediction of expression levels of microRNAs at the EEMB stage</p></caption><text>
   <p><b>Prediction of expression levels of microRNAs at the EEMB stage</b>. <b>(a) </b>Predicted expression levels of the experimentally measured highly and lowly expressed microRNAs based on small RNA-seq results. Expression levels of microRNAs at the EEMB stage were predicted using an SVR regression model trained on data for protein-coding genes at the same stage. <b>(b) </b>Predicted versus experimentally measured expression levels of microRNAs at the EEMB stage. R is the Pearson correlation coefficient.</p>
</text><graphic file="gb-2011-12-2-r15-9"/></fig>
<suppl id="S13">
<title>
<p>Additional file 13</p>
</title>
<text>
<p>
<b>Predicted expression levels of microRNAs at stage L3</b>. MicroRNAs are divided into high (red) and low (green) groups based on their measured expression levels in small RNA-seq experiments.</p>
</text>
<file name="gb-2011-12-2-r15-S13.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>It is interesting to see that the expression of microRNAs can be accurately predicted using a chromatin model trained on data for protein-coding genes. Consistent with previous reports on microRNA transcriptional regulation <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
</abbrgrp>, this result suggests that microRNAs and protein-coding genes share a similar mechanism of transcriptional regulation by chromatin modifications.</p>
<p>As with the prediction of expression levels of protein-coding genes, the prediction accuracy of microRNA expression also shows developmental stage specificity. When the signals of the chromatin features from the EEMB stage were used, the resulting model achieved the best accuracy when predicting microRNA expression at the same stage (PCC = 0.60), whereas for stages L1, L2, L3, L4 and adult, the accuracy is much lower (PCC &lt; 0.50) (Figure S14 in Additional file <supplr sid="S14">14</supplr>). Similarly, when chromatin features at L3 were used to train the model, the model achieved better prediction results in L3 than in other stages.</p>
<suppl id="S14">
<title>
<p>Additional file 14</p>
</title>
<text>
<p>
<b>Stage specificity of chromatin models for microRNA expression predictions</b>. The chromatin model was trained using the chromatin and expression data of protein-coding genes at the EEMB stage. The model was then used to predict microRNA expression levels at six stages. R indicates the Pearson correlation coefficient between the predicted expression levels and the actual expression levels from RNA-seq experiments.</p>
</text>
<file name="gb-2011-12-2-r15-S14.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Application to other organisms</p>
</st>
<p>The models described above provide a useful tool to integrate gene expression and chromatin data. Currently, the <it>C. elegans </it>dataset is the best one to demonstrate the utility of the method and we have focused on it here. However, further integrated genomic datasets (comprising matched genome-wide histone features and expression measurements) are forthcoming for many other organisms. Thus, to illustrate the broad utility of our method, we demonstrate here how readily it can be applied in other contexts. Specifically, we have packaged our methods as a tool and applied it to data sets from four other organisms: yeast, fruit fly, mouse and human. The results indicate that chromatin features, in particular histone modifications, are highly correlated with gene expression levels in all these organisms (Figure <figr fid="F10">10</figr>). More importantly, the relative statistical contribution of each histone modification type to expression is similar across the tested organisms (and also across different tissues, cell lines, and developmental stages). For example, H3K4me3 signals around the TSS of genes show high predictive capability in all the analyses we have performed. We also found that the models based on expression levels measured by RNA-seq achieved higher prediction accuracy than those based on microarrays, consistent with the higher measurement accuracy of RNA-seq compared to microarrays. Our method can, of course, be applied to multiple data sets in each species (for example, different developmental stages in fruit fly). Figure <figr fid="F10">10</figr> shows only a single illustrative example for each species. We show only an initial statistical analysis here; further biological interpretation will be the subject of future studies.</p>
<fig id="F10"><title><p>Figure 10</p></title><caption><p>Prediction accuracy of the chromatin model in four other species</p></caption><text>
   <p><b>Prediction accuracy of the chromatin model in four other species</b>. <b>(a-d) </b>Expression levels of genes are predicted using the SVR method. In yeast, average signals of chromatin features from the TSS to 500 bp upstream were used as predictors (a); in the other species, signals of chromatin features within the bin at the TSS (bin 1) were used as predictors (b-d). E4-8 h: embryonic stage at 4 to 8 h; ESC, embryonic stem cell.</p>
</text><graphic file="gb-2011-12-2-r15-10"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>In this study, we present a systematic analysis of the genome-wide relationship between chromatin features and gene expression. We have shown that, in terms of gene expression prediction, information from different histone modification features is considerably redundant. In this paper, we use the modENCODE worm data to exemplify our analysis. In fact, we have applied our methods to two other histone modification data sets: human CD4+ T-cell data <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp> and mouse embryonic stem cell data <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>. In both data sets, we found that histone modifications account for more than 50% of variation of gene expression and distinct modification types were redundant for predicting gene expression levels. This is consistent with a recent study by Karlic <it>et al. </it>
<abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp> performed in human CD4+ T cells.</p>
<p>The existence of a 'histone code' has been intensively debated since the time that the hypothesis was first proposed 10 years ago <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp>. Previous studies have provided evidence both for and against the hypothesis <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B28">28</abbr>
<abbr bid="B49">49</abbr>
<abbr bid="B50">50</abbr>
</abbrgrp>. Indeed, for some specific genes, it has been demonstrated that the patterns of a subset of histone marks could be viewed as an accurate predictor of gene regulation in non-trivial manners <abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp>. Nevertheless, the readout of these patterns is largely gene specific and dependent on the cellular context, which makes it difficult for these cooperative effects to be viewed as a universal 'code'. Therefore, by using the term histone code, we might have underestimated the complexity and over-generalized the meaning of chromatin modifications and their roles in biological processes. On the other hand, at a global level, previous studies have reported substantial correlations among distinct chromatin features <abbrgrp>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B28">28</abbr>
<abbr bid="B51">51</abbr>
</abbrgrp>. These results, and the information redundancy we observed, are consistent with the simple 'histone code' argument <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>, in which the combinatorial effects are cumulative rather than synergistic.</p>
<p>We have shown that chromatin features are strongly correlated with gene expression. Nevertheless, it should be noted that our models could not reveal if histone modifications are the 'cause' or 'consequence' of transcription. In fact, both directions of causality have been previously reported. Some studies have proposed that some histone modifications are the memory of past transcriptional events resulting from previous active transcription <abbrgrp>
<abbr bid="B52">52</abbr>
<abbr bid="B53">53</abbr>
<abbr bid="B54">54</abbr>
</abbrgrp>. For instance, it has been shown that phosphorylation in the tail of Pol II is required for H3K4me3, revealing that it is a direct consequence of Pol II passing through the TSS <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>. Other studies, however, have shown that chromatin modification changes precede changes in gene expression <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp>. A recent study in human T cells suggested that, for both protein-coding and miRNA genes, activating histone marks were already in place before induction of expression, and these marks were maintained even after the genes were silenced <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. This finding shows that histone modification can be both cause and consequence of gene transcription, and that a full explanation will require incorporation of additional data. Generalizing our model to follow a time course of changing histone modifications might be helpful for understanding this issue.</p>
<p>The supervised chromatin model trained from expression data for protein-coding genes can accurately predict the abundance of both protein-coding genes and microRNAs, which suggests that microRNAs and protein-coding genes share similar mechanisms of transcriptional regulation by chromatin modifications <abbrgrp>
<abbr bid="B44">44</abbr>
<abbr bid="B45">45</abbr>
</abbrgrp>. To predict the expression levels of microRNAs, we used the signal of chromatin features around the start sites associated with pre-microRNAs, which might be several kilobases from the actual TSS of microRNA genes. Despite this caveat, our model still achieved high prediction power. We expect to obtain more accurate predictions if more precise annotation for microRNA genes becomes available in the future.</p>
<p>In summary, we have presented a series of supervised and unsupervised methods for analyzing multiple aspects of the regulation of gene expression by chromatin features. Apart from predicting gene expression, these methods can be used to address important biological questions such as combinatorial regulation and microRNA transcription. These and other statistical methods will be essential to gaining new understanding of biological processes from the tremendous amount of data that will soon be made available by large collaborative projects such as modENCODE.</p>
</sec>
<sec>
<st>
<p>Materials and methods</p>
</st>
<sec>
<st>
<p>Datasets and gene annotation</p>
</st>
<p>Expression levels for all annotated worm transcripts at different stages of development, including EEMB, mid-L1, mid-L2, mid-L3, mid-L4 and young adult stages, were quantified using RNA-seq. Pol II binding across the genome at different stages was profiled using ChIP-seq. All the other chromatin features were profiled using ChIP-chip experiments. These chromatin features include histone H3 occupancy, histone methylations (H3K4me2, H3K4me3, H3K9me2, H3K9me3, H3K27me3, H3K36me2, H3K36me3, H3K79me1, H3K79me2 and H3K79me3), binding of dosage compensation complex (DCC) proteins (SDC2, SDC3, DPY27, DPY28 and MIX1) and other X-chromosome inactivation factors (MES4 and MRG1). For some chromatin features such as H3K9me3, biological replicates using different antibodies were available. Profiles of these chromatin features were measured for different developmental stages, in particular at the EEMB and L3 stages. A list of the data, with their Gene Expression Omnibus IDs, can be found in Additional file <supplr sid="S15">15</supplr>. All these data are available from the modENCODE website at <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>. Operon information for <it>C. elegans </it>was obtained from a previous study by Blumenthal <it>et al. </it>
<abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. The dataset contains a total of 881 operons with 2.6 genes in each of them on average.</p>
<suppl id="S15">
<title>
<p>Additional file 15</p>
</title>
<text>
<p>
<b>Gene Expression Omnibus accession ID of data sets used in this work</b>.</p>
</text>
<file name="gb-2011-12-2-r15-S15.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<p>MicroRNA expression levels at different developmental stages of <it>C. elegans </it>were obtained from small RNA-seq measurements performed by Kato <it>et al. </it>
<abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. Annotation of worm transcripts was downloaded from WormBase at <abbrgrp>
<abbr bid="B58">58</abbr>
<abbr bid="B59">59</abbr>
</abbrgrp>. Annotation of nematode microRNAs was downloaded from the microRNA database miRBase at <abbrgrp>
<abbr bid="B42">42</abbr>
<abbr bid="B60">60</abbr>
</abbrgrp>. Assembly version WS180 of <it>C. elegans </it>was used for gene and microRNA annotations and data processing of all the chromatin features.</p>
</sec>
<sec>
<st>
<p>Binning DNA regions</p>
</st>
<p>We obtained the genomic locations and structures of 27,310 protein-coding transcripts of <it>C. elegans </it>from WormBase. The contribution of each chromatin feature to gene expression is thought to be affected by many factors, in particular its position relative to the TSS. We therefore divided the DNA region from 4 kb upstream to 4 kb downstream of the TSS of each transcript into 80 bins of 100 bp each. The DNA region around the TTS of each transcript was likewise divided into 80 bins of 100 bp, giving 160 bins per transcript. For each bin of each transcript, we calculated the average signal of each chromatin feature. Specifically, for chromatin features profiled by ChIP-chip experiments, the signals of the probes that fall into a bin were averaged. For features profiled by ChIP-seq experiments, the reads that cover a bin were counted, with each read weighted according to its overlap with the bin. We note that for transcripts shorter than 8 kb, some bins around the TSS and TTS overlap, and that bins can also overlap for transcripts that represent alternative splicing isoforms of the same gene or that are located close to each other in the genome. To ensure that these issues do not affect our main findings, we repeated the analysis using only genes that are longer than 8 kb and genes that are far away from other coding genes (see main text). It should also be noted that the precise TSS and TTS of worm transcripts are largely unknown, and the locations used here usually represent the start and end positions of the protein-coding regions.</p>
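<p>The binning scheme described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in this work: the function names, the use of probe centre positions, half-open read intervals, and the choice to weight each read by the fraction of its length that overlaps a bin are all assumptions made for the example.</p>

```python
from collections import defaultdict

BIN_SIZE = 100                    # bp per bin, as in the text
FLANK = 4000                      # bp on each side of the anchor (TSS or TTS)
N_BINS = 2 * FLANK // BIN_SIZE    # 80 bins per anchor


def bin_index(position, anchor, strand="+"):
    """Return the 0-based bin of a position relative to an anchor,
    or None if the position lies outside the 8 kb window."""
    offset = position - anchor if strand == "+" else anchor - position
    if offset >= FLANK or -FLANK > offset:
        return None
    return (offset + FLANK) // BIN_SIZE


def chip_chip_bin_means(probes, anchor, strand="+"):
    """Average ChIP-chip probe signals per bin.
    probes: iterable of (probe_centre_position, signal) pairs (assumed layout).
    Bins without any probe are reported as None."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, signal in probes:
        b = bin_index(position, anchor, strand)
        if b is not None:
            sums[b] += signal
            counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else None for b in range(N_BINS)]


def chip_seq_bin_coverage(reads, anchor):
    """Overlap-weighted read counts per bin for a plus-strand transcript.
    reads: iterable of half-open (start, end) intervals (assumed layout).
    Each read contributes the fraction of its length that overlaps the bin."""
    coverage = [0.0] * N_BINS
    window_start = anchor - FLANK
    for start, end in reads:
        length = end - start
        for b in range(N_BINS):
            bin_start = window_start + b * BIN_SIZE
            overlap = min(end, bin_start + BIN_SIZE) - max(start, bin_start)
            if overlap > 0:
                coverage[b] += overlap / length
    return coverage


# Toy example: three probes and one read near a hypothetical TSS at 10,000.
probes = [(10010, 2.0), (10090, 4.0), (9950, 1.0)]
tss_means = chip_chip_bin_means(probes, anchor=10000)
read_weights = chip_seq_bin_coverage([(10050, 10150)], anchor=10000)
```

<p>In this sketch, bin 40 is the first bin downstream of the anchor, so the two probes at 10,010 and 10,090 are averaged together, while the read spanning 10,050 to 10,150 is split equally between bins 40 and 41.</p>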
</sec>
<sec>
<st>
<p>Hierarchical clustering</p>
</st>
<p>The data processing described above results in a matrix A<sub>n &#215; m </sub>for each of the 160 bins, where n is the number of transcripts and m is the number of chromatin features. To make the signals for different chromatin features comparable, we normalized each column of A by subtracting its median and then dividing by its standard deviation across all transcripts. We performed hierarchical clustering analysis using the normalized matrix for a given bin. To evaluate the capability of a bin to discriminate between genes with high and low expression levels, we divided the transcripts into two clusters by splitting the resulting hierarchical tree at the top level. The expression levels of transcripts in the two clusters, as measured by RNA-seq experiments, were compared using a <it>t</it>-test. We repeated this procedure for all 160 bins, which resulted in a t-score for each bin. These t-scores reflect the capability of the chromatin features in each bin to separate genes with low and high expression levels.</p>
<p>Similarly, for each chromatin feature, we performed hierarchical clustering using its signals across all 160 bins. The clustering analysis was conducted for all chromatin features, and the capability of each feature to predict gene expression was evaluated and compared using t-scores calculated as described above.</p>
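<p>The per-bin procedure above (column normalization, a two-cluster split of the hierarchical tree, and a t-score on expression) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the text does not specify the distance metric, the linkage rule or the variant of the <it>t</it>-test, so Euclidean distance, average linkage and Welch's statistic are assumed here, and all function names and data values are hypothetical.</p>

```python
import math
import statistics


def normalize_columns(matrix):
    """Subtract the column median and divide by the column standard deviation."""
    columns = list(zip(*matrix))
    medians = [statistics.median(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, medians, sds)] for row in matrix]


def two_clusters(rows):
    """Average-linkage agglomerative clustering, stopped when two clusters
    remain (equivalent to cutting the tree at its top split).
    Assumed linkage and metric; suitable only for small toy inputs."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        return statistics.fmean(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > 2:
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def welch_t(a, b):
    """Welch's two-sample t statistic (assumed variant of the t-test)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


# Toy data: six transcripts, two chromatin features in one bin (made-up values).
signals = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [2.0, 2.1], [2.2, 1.9], [2.1, 2.3]]
expression = [1.0, 1.2, 0.8, 5.0, 5.5, 4.9]

cluster_a, cluster_b = two_clusters(normalize_columns(signals))
t_score = welch_t([expression[i] for i in cluster_a],
                  [expression[i] for i in cluster_b])
```

<p>A large absolute t-score, as in this toy example, indicates that the chromatin signals in the bin separate transcripts with low expression from those with high expression.</p>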
</sec>
<sec>
<st>
<p>Supervised models for gene expression prediction</p>
</st>
<p>We constructed supervised learning models to integrate the chromatin features for gene expression prediction. In principle, the chromatin features of each of the 160 bins could contribute to regulation of gene expression. We therefore constructed the models in a bin-specific manner to investigate the relative importance of each bin for regulation of gene expression. We devised both classification and regression models, implemented using the SVM and SVR <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp> methods, respectively.</p>
<p>In the classification model, the expression levels of transcripts at a particular developmental stage (measured by RNA-seq and quantified as RPKM (reads per kilobase per million mapped reads)) were discretized into two classes, high and low expression, using the median expression level as the cutoff. The chromatin features in a given bin were then used as predictors of the two classes. The prediction power of the classification model was evaluated using cross-validation. Specifically, we split the whole dataset into two halves, the training data and the testing data. The SVM model was first trained on the training data and then used to predict the expression classes of the transcripts in the testing data. The predicted classes at various thresholds were compared with the actual classes to calculate the sensitivity (also called the true positive rate, the proportion of actual positives that are correctly identified) and the specificity (also called the true negative rate, the proportion of actual negatives that are correctly identified). The tradeoff between sensitivity and specificity can be visualized as a plot of the sensitivity against 1 - specificity, which is called a ROC curve. The area under the ROC curve (AUC) is a frequently used summary statistic for measuring the prediction power of classification models.</p>
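<p>The evaluation step described above can be illustrated with a short Python sketch that discretizes expression at the median, sweeps a decision threshold to obtain the ROC curve and integrates it to obtain the AUC. The SVM itself is not reproduced here: the scores below are made-up stand-ins for classifier decision values, and the function names are hypothetical.</p>

```python
import statistics


def discretize_by_median(expression):
    """Label a transcript 1 (high) if above the median expression, else 0 (low)."""
    cutoff = statistics.median(expression)
    return [1 if value > cutoff else 0 for value in expression]


def roc_points(labels, scores):
    """Sweep the decision threshold from high to low and return the
    (1 - specificity, sensitivity) points of the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item that shares this score (ties).
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy testing set: RPKM values and hypothetical classifier decision values.
rpkm = [0.5, 8.0, 1.2, 6.5, 0.9, 7.1]
scores = [-1.2, 0.9, 0.3, 0.1, -0.4, 1.5]

labels = discretize_by_median(rpkm)
area = auc(roc_points(labels, scores))
```

<p>In this toy example, eight of the nine possible pairs of one highly and one lowly expressed transcript are ranked correctly by the scores, so the AUC is 8/9.</p>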
<p>In the regression model, we directly predicted the expression levels of transcripts rather than classifying them into two broad expression categories. The prediction power of the regression model was also assessed using cross-validation. The SVR model was trained on the training data and applied to the testing data. The predicted expression levels for transcripts in the testing data were then compared with their actual levels measured by RNA-seq experiments. The correlation between predicted and actual expression levels indicates the prediction power of the model.</p>
<p>In a linear regression model, the square of the correlation (R<sup>2</sup>) between the predicted values and the actual values is equal to the fraction of total variance in the observed data explained by the predictions. We used this quantity to estimate how much variation of gene expression can be explained by the chromatin features.</p>
<p>To estimate the predictive power of the classification and regression models for each of the 160 bins, we repeated the cross-validation procedure 100 times. The mean and standard deviation of the resulting 100 AUC scores were calculated for each bin as a measure of the predictive power of the SVM classification model. Similarly, the accuracy of the SVR model for a bin was summarized by the mean and standard deviation of the 100 correlation coefficients.</p>
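<p>The repeated cross-validation procedure above can be sketched as follows. This Python sketch is illustrative only: a one-feature least-squares line is used as a simple stand-in for the SVR model, the data are simulated, and the function names, the seeds and the 50/50 random split implementation are assumptions made for the example.</p>

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for training an SVR)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def repeated_holdout(x, y, repeats=100, seed=0):
    """Repeat a random half/half split, train on one half, and correlate
    predictions with the actual values on the other half."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    half = len(indices) // 2
    correlations = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:half], indices[half:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [slope * x[i] + intercept for i in test]
        correlations.append(pearson(predicted, [y[i] for i in test]))
    return statistics.fmean(correlations), statistics.stdev(correlations)


# Simulated data: one chromatin signal and a noisy linear expression response.
data_rng = random.Random(1)
signal = [data_rng.gauss(0, 1) for _ in range(200)]
expression = [2 * s + data_rng.gauss(0, 1) for s in signal]

mean_r, sd_r = repeated_holdout(signal, expression, repeats=100)
variance_explained = mean_r ** 2
```

<p>The mean and standard deviation of the 100 correlation coefficients summarize the accuracy of the model, and the squared correlation gives a rough estimate of the fraction of expression variance explained.</p>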
</sec>
<sec>
<st>
<p>Detecting combinatorial effects of chromatin features using linear models</p>
</st>
<p>To investigate the interaction between chromatin features, we constructed and compared the following two linear models:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-2-r15-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>y</m:mi>
   <m:mo>~</m:mo>
   <m:mstyle displaystyle="true">
      <m:mo>&#8721;</m:mo>
      <m:mrow>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
      </m:mrow>
   </m:mstyle>
   <m:mo>+</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>i</m:mi>
            <m:mo>&lt;</m:mo>
            <m:mi>j</m:mi>
         </m:mrow>
      </m:munder>
      <m:mrow>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>j</m:mi>
         </m:msub>
      </m:mrow>
   </m:mstyle>
   <m:mtext>&#8195;</m:mtext>
   <m:mrow>
      <m:mo>(</m:mo>
      <m:mrow>
         <m:mtext>Interaction&#160;model</m:mtext>
      </m:mrow>
      <m:mo>)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>
<display-formula>
<m:math name="gb-2011-12-2-r15-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>y</m:mi>
   <m:mo>~</m:mo>
   <m:mstyle displaystyle="true">
      <m:mo>&#8721;</m:mo>
      <m:mrow>
         <m:msub>
            <m:mi>x</m:mi>
            <m:mi>i</m:mi>
         </m:msub>
      </m:mrow>
   </m:mstyle>
   <m:mtext>&#8195;</m:mtext>
   <m:mrow>
      <m:mo>(</m:mo>
      <m:mrow>
         <m:mtext>Singleton&#160;model</m:mtext>
      </m:mrow>
      <m:mo>)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>The Interaction model takes into account the interaction terms. Based on the Interaction model, we identified significant interactions in each bin.</p>
<p>The power of the two models for predicting gene expression was evaluated by cross-validation. Data were randomly split into training and testing data sets. The models were trained on the training data and then applied to the testing data for validation. The accuracy of the models was measured by the correlation between predicted and experimentally measured expression levels.</p>
<p>To investigate the interactions between individual pairs of chromatin features, we constructed simplified models involving only two features:</p>
<p>
<display-formula>
<m:math name="gb-2011-12-2-r15-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>y</m:mi>
   <m:mo>~</m:mo>
   <m:msub>
      <m:mi>x</m:mi>
      <m:mi>i</m:mi>
   </m:msub>
   <m:mo>+</m:mo>
   <m:msub>
      <m:mi>x</m:mi>
      <m:mi>j</m:mi>
   </m:msub>
   <m:mo>+</m:mo>
   <m:msub>
      <m:mi>x</m:mi>
      <m:mi>i</m:mi>
   </m:msub>
   <m:msub>
      <m:mi>x</m:mi>
      <m:mi>j</m:mi>
   </m:msub>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>A significant interaction term would indicate that the interaction between the two features has a significant effect on gene expression.</p>
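<p>The pairwise model above can be illustrated with a small Python sketch that fits the two-feature model with and without the interaction term by ordinary least squares and compares the residual sums of squares. The solver, the function names and the noise-free toy data are assumptions made for the example. The significance test on the interaction coefficient is not computed here.</p>

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (adequate for small, well-conditioned systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def singleton_design(xi, xj):
    """Design matrix for y ~ xi + xj (with intercept)."""
    return [[1.0, a, b] for a, b in zip(xi, xj)]


def interaction_design(xi, xj):
    """Design matrix for y ~ xi + xj + xi*xj (with intercept)."""
    return [[1.0, a, b, a * b] for a, b in zip(xi, xj)]


def residual_ss(design, coef, y):
    """Residual sum of squares of a fitted linear model."""
    return sum((t - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, t in zip(design, y))


# Toy data on a 4 x 3 grid; y has a true interaction coefficient of 3.
xi = [float(a) for a in range(4) for b in range(3)]
xj = [float(b) for a in range(4) for b in range(3)]
y = [1 + 2 * a - b + 3 * a * b for a, b in zip(xi, xj)]

coef_single = ols(singleton_design(xi, xj), y)
coef_inter = ols(interaction_design(xi, xj), y)
rss_single = residual_ss(singleton_design(xi, xj), coef_single, y)
rss_inter = residual_ss(interaction_design(xi, xj), coef_inter, y)
```

<p>In this toy example the interaction model recovers the product-term coefficient and leaves essentially no residual, whereas the singleton model cannot account for the product term. With real data, a t-test or F-test on the interaction coefficient would be needed to judge whether the improvement is significant.</p>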
</sec>
<sec>
<st>
<p>Predicting expression levels of microRNAs</p>
</st>
<p>We downloaded the annotation of 162 <it>C. elegans </it>microRNAs from the miRBase database <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. For most microRNAs, the annotation provides no information about the TSSs. Instead, only the start and end positions of the corresponding pre-microRNAs (about 100 nucleotides in length) are available. To predict the expression levels of microRNAs, we calculated the signals of all chromatin features within the associated pre-microRNAs and applied our model trained on chromatin features associated with protein-coding genes. We applied both the SVM classification and the SVR regression models to predict microRNA expression. The resulting predictions were validated using measured microRNA expression levels from small RNA sequencing performed by Kato <it>et al. </it>
<abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Data sets for other organisms</p>
</st>
<p>In yeast, the expression levels of genes were measured by microarrays and are available from Wang <it>et al. </it>
<abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>; the histone modification data were generated by Pokholok <it>et al. </it>
<abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp>. In fruit fly, the gene expression and chromatin data at 12 different developmental stages were obtained by using RNA-seq and ChIP-seq experiments, respectively, which are available from the modENCODE website at <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>. In mouse, the expression data for embryonic stem cells and neural progenitor cells were from Cloonan <it>et al. </it>
<abbrgrp>
<abbr bid="B64">64</abbr>
</abbrgrp>; and the histone modification data for matched cell lines were obtained from Mikkelsen <it>et al. </it>
<abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp> and Meissner <it>et al. </it>
<abbrgrp>
<abbr bid="B65">65</abbr>
</abbrgrp>. In human, the gene expression data for the K562 and GM12878 cell lines were generated by Mortazavi <it>et al. </it>
<abbrgrp>
<abbr bid="B66">66</abbr>
</abbrgrp>, and chromatin data were downloaded from the ENCODE project at <abbrgrp>
<abbr bid="B2">2</abbr>
<abbr bid="B67">67</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Availability of our code</p>
</st>
<p>All analyses described in this paper were performed using R. The related R code and example data sets are available for download from <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp>.</p>
</sec>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>AUC: area under the curve; bp: base pairs; ChIP: chromatin immunoprecipitation; ChIP-chip: ChIP-on-chip; ChIP-seq: ChIP-sequencing; DCC: dosage compensation complex; EEMB: early embryonic; modENCODE: model organism encyclopedia of DNA elements; PCC: Pearson correlation coefficient; Pol II: RNA polymerase II; RNA-seq: RNA-sequencing; ROC: receiver operating characteristic; RPKM: reads per kilobase per million mapped reads; SVM: support vector machine; SVR: support vector regression; TSS: transcription start site; TTS: transcription termination site.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>CC and MG conceived and designed the study. CC and KKY performed the full analysis. CC, KKY, KYY, RA, JR, CS and MG wrote the manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was supported by the NHGRI modENCODE project and the AL Williams Professorship funds. We thank Jason Lieb, Robert Waterston and Frank Slack for their comments and suggestions.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>The role of chromatin during transcription.</p></title><aug><au><snm>Li</snm><fnm>B</fnm></au><au><snm>Carey</snm><fnm>M</fnm></au><au><snm>Workman</snm><fnm>JL</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>128</volume><fpage>707</fpage><lpage>719</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.01.015</pubid><pubid idtype="pmpid" link="fulltext">17320508</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.</p></title><aug><au><snm>Birney</snm><fnm>E</fnm></au><au><snm>Stamatoyannopoulos</snm><fnm>JA</fnm></au><au><snm>Dutta</snm><fnm>A</fnm></au><au><snm>Guigo</snm><fnm>R</fnm></au><au><snm>Gingeras</snm><fnm>TR</fnm></au><au><snm>Margulies</snm><fnm>EH</fnm></au><au><snm>Weng</snm><fnm>Z</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Dermitzakis</snm><fnm>ET</fnm></au><au><snm>Thurman</snm><fnm>RE</fnm></au><au><snm>Kuehn</snm><fnm>MS</fnm></au><au><snm>Taylor</snm><fnm>CM</fnm></au><au><snm>Neph</snm><fnm>S</fnm></au><au><snm>Koch</snm><fnm>CM</fnm></au><au><snm>Asthana</snm><fnm>S</fnm></au><au><snm>Malhotra</snm><fnm>A</fnm></au><au><snm>Adzhubei</snm><fnm>I</fnm></au><au><snm>Greenbaum</snm><fnm>JA</fnm></au><au><snm>Andrews</snm><fnm>RM</fnm></au><au><snm>Flicek</snm><fnm>P</fnm></au><au><snm>Boyle</snm><fnm>PJ</fnm></au><au><snm>Cao</snm><fnm>H</fnm></au><au><snm>Carter</snm><fnm>NP</fnm></au><au><snm>Clelland</snm><fnm>GK</fnm></au><au><snm>Davis</snm><fnm>S</fnm></au><au><snm>Day</snm><fnm>N</fnm></au><au><snm>Dhami</snm><fnm>P</fnm></au><au><snm>Dillon</snm><fnm>SC</fnm></au><au><snm>Dorschner</snm><fnm>MO</fnm></au><au><snm>Fiegler</snm><fnm>H</fnm></au><etal/></aug><source>Nature</source><pubdate>2007</pubdate><volume>447</volume><fpage>799</fpage><lpage>816</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature05874</pubid><pubid 
idtype="pmcid">2212820</pubid><pubid idtype="pmpid">17571346</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Regulation of alternative splicing by histone modifications.</p></title><aug><au><snm>Luco</snm><fnm>RF</fnm></au><au><snm>Pan</snm><fnm>Q</fnm></au><au><snm>Tominaga</snm><fnm>K</fnm></au><au><snm>Blencowe</snm><fnm>BJ</fnm></au><au><snm>Pereira-Smith</snm><fnm>OM</fnm></au><au><snm>Misteli</snm><fnm>T</fnm></au></aug><source>Science</source><pubdate>2010</pubdate><volume>327</volume><fpage>996</fpage><lpage>1000</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1184208</pubid><pubid idtype="pmcid">2913848</pubid><pubid idtype="pmpid">20133523</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>The histone code at DNA breaks: a guide to repair?</p></title><aug><au><snm>van Attikum</snm><fnm>H</fnm></au><au><snm>Gasser</snm><fnm>SM</fnm></au></aug><source>Nat Rev Mol Cell Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>757</fpage><lpage>765</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrm1737</pubid><pubid idtype="pmpid" link="fulltext">16167054</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Sterile 20 kinase phosphorylates histone H2B at serine 10 during hydrogen peroxide-induced apoptosis in <it>S. 
cerevisiae</it>.</p></title><aug><au><snm>Ahn</snm><fnm>SH</fnm></au><au><snm>Cheung</snm><fnm>WL</fnm></au><au><snm>Hsu</snm><fnm>JY</fnm></au><au><snm>Diaz</snm><fnm>RL</fnm></au><au><snm>Smith</snm><fnm>MM</fnm></au><au><snm>Allis</snm><fnm>CD</fnm></au></aug><source>Cell</source><pubdate>2005</pubdate><volume>120</volume><fpage>25</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2004.11.016</pubid><pubid idtype="pmpid" link="fulltext">15652479</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Apoptotic phosphorylation of histone H2B is mediated by mammalian sterile twenty kinase.</p></title><aug><au><snm>Cheung</snm><fnm>WL</fnm></au><au><snm>Ajiro</snm><fnm>K</fnm></au><au><snm>Samejima</snm><fnm>K</fnm></au><au><snm>Kloc</snm><fnm>M</fnm></au><au><snm>Cheung</snm><fnm>P</fnm></au><au><snm>Mizzen</snm><fnm>CA</fnm></au><au><snm>Beeser</snm><fnm>A</fnm></au><au><snm>Etkin</snm><fnm>LD</fnm></au><au><snm>Chernoff</snm><fnm>J</fnm></au><au><snm>Earnshaw</snm><fnm>WC</fnm></au><au><snm>Allis</snm><fnm>CD</fnm></au></aug><source>Cell</source><pubdate>2003</pubdate><volume>113</volume><fpage>507</fpage><lpage>517</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(03)00355-6</pubid><pubid idtype="pmpid" link="fulltext">12757711</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Genome regulation by polycomb and trithorax proteins.</p></title><aug><au><snm>Schuettengruber</snm><fnm>B</fnm></au><au><snm>Chourrout</snm><fnm>D</fnm></au><au><snm>Vervoort</snm><fnm>M</fnm></au><au><snm>Leblanc</snm><fnm>B</fnm></au><au><snm>Cavalli</snm><fnm>G</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>128</volume><fpage>735</fpage><lpage>745</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.02.009</pubid><pubid idtype="pmpid" link="fulltext">17320510</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Histone modification patterns associated with the human X 
chromosome.</p></title><aug><au><snm>Brinkman</snm><fnm>AB</fnm></au><au><snm>Roelofsen</snm><fnm>T</fnm></au><au><snm>Pennings</snm><fnm>SW</fnm></au><au><snm>Martens</snm><fnm>JH</fnm></au><au><snm>Jenuwein</snm><fnm>T</fnm></au><au><snm>Stunnenberg</snm><fnm>HG</fnm></au></aug><source>EMBO Rep</source><pubdate>2006</pubdate><volume>7</volume><fpage>628</fpage><lpage>634</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1479594</pubid><pubid idtype="pmpid">16648823</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer.</p></title><aug><au><snm>Fraga</snm><fnm>MF</fnm></au><au><snm>Ballestar</snm><fnm>E</fnm></au><au><snm>Villar-Garea</snm><fnm>A</fnm></au><au><snm>Boix-Chornet</snm><fnm>M</fnm></au><au><snm>Espada</snm><fnm>J</fnm></au><au><snm>Schotta</snm><fnm>G</fnm></au><au><snm>Bonaldi</snm><fnm>T</fnm></au><au><snm>Haydon</snm><fnm>C</fnm></au><au><snm>Ropero</snm><fnm>S</fnm></au><au><snm>Petrie</snm><fnm>K</fnm></au><au><snm>Iyer</snm><fnm>NG</fnm></au><au><snm>Perez-Rosado</snm><fnm>A</fnm></au><au><snm>Calvo</snm><fnm>E</fnm></au><au><snm>Lopez</snm><fnm>JA</fnm></au><au><snm>Cano</snm><fnm>A</fnm></au><au><snm>Calasanz</snm><fnm>MJ</fnm></au><au><snm>Colomer</snm><fnm>D</fnm></au><au><snm>Piris</snm><fnm>MA</fnm></au><au><snm>Ahn</snm><fnm>N</fnm></au><au><snm>Imhof</snm><fnm>A</fnm></au><au><snm>Caldas</snm><fnm>C</fnm></au><au><snm>Jenuwein</snm><fnm>T</fnm></au><au><snm>Esteller</snm><fnm>M</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>391</fpage><lpage>400</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1531</pubid><pubid idtype="pmpid" link="fulltext">15765097</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Cancer epigenomics: DNA methylomes and histone-modification maps.</p></title><aug><au><snm>Esteller</snm><fnm>M</fnm></au></aug><source>Nat Rev 
Genet</source><pubdate>2007</pubdate><volume>8</volume><fpage>286</fpage><lpage>298</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg2005</pubid><pubid idtype="pmpid" link="fulltext">17339880</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>The complex language of chromatin regulation during transcription.</p></title><aug><au><snm>Berger</snm><fnm>SL</fnm></au></aug><source>Nature</source><pubdate>2007</pubdate><volume>447</volume><fpage>407</fpage><lpage>412</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature05915</pubid><pubid idtype="pmpid" link="fulltext">17522673</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Histone modifications as key regulators of transcription.</p></title><aug><au><snm>Khan</snm><fnm>AU</fnm></au><au><snm>Krishnamurthy</snm><fnm>S</fnm></au></aug><source>Front Biosci</source><pubdate>2005</pubdate><volume>10</volume><fpage>866</fpage><lpage>872</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2741/1580</pubid><pubid idtype="pmpid" link="fulltext">15569624</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote.</p></title><aug><au><snm>Schubeler</snm><fnm>D</fnm></au><au><snm>MacAlpine</snm><fnm>DM</fnm></au><au><snm>Scalzo</snm><fnm>D</fnm></au><au><snm>Wirbelauer</snm><fnm>C</fnm></au><au><snm>Kooperberg</snm><fnm>C</fnm></au><au><snm>van Leeuwen</snm><fnm>F</fnm></au><au><snm>Gottschling</snm><fnm>DE</fnm></au><au><snm>O&apos;Neill</snm><fnm>LP</fnm></au><au><snm>Turner</snm><fnm>BM</fnm></au><au><snm>Delrow</snm><fnm>J</fnm></au><au><snm>Bell</snm><fnm>SP</fnm></au><au><snm>Groudine</snm><fnm>M</fnm></au></aug><source>Genes Dev</source><pubdate>2004</pubdate><volume>18</volume><fpage>1263</fpage><lpage>1271</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.1198204</pubid><pubid idtype="pmcid">420352</pubid><pubid 
idtype="pmpid">15175259</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Genomic maps and comparative analysis of histone modifications in human and mouse.</p></title><aug><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Kamal</snm><fnm>M</fnm></au><au><snm>Lindblad-Toh</snm><fnm>K</fnm></au><au><snm>Bekiranov</snm><fnm>S</fnm></au><au><snm>Bailey</snm><fnm>DK</fnm></au><au><snm>Huebert</snm><fnm>DJ</fnm></au><au><snm>McMahon</snm><fnm>S</fnm></au><au><snm>Karlsson</snm><fnm>EK</fnm></au><au><snm>Kulbokas</snm><fnm>EJ</fnm></au><au><snm>Gingeras</snm><fnm>TR</fnm></au><au><snm>Schreiber</snm><fnm>SL</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Cell</source><pubdate>2005</pubdate><volume>120</volume><fpage>169</fpage><lpage>181</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2005.01.001</pubid><pubid idtype="pmpid" link="fulltext">15680324</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Single-nucleosome mapping of histone modifications in <it>S. 
cerevisiae</it>.</p></title><aug><au><snm>Liu</snm><fnm>CL</fnm></au><au><snm>Kaplan</snm><fnm>T</fnm></au><au><snm>Kim</snm><fnm>M</fnm></au><au><snm>Buratowski</snm><fnm>S</fnm></au><au><snm>Schreiber</snm><fnm>SL</fnm></au><au><snm>Friedman</snm><fnm>N</fnm></au><au><snm>Rando</snm><fnm>OJ</fnm></au></aug><source>PLoS Biol</source><pubdate>2005</pubdate><volume>3</volume><fpage>e328</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0030328</pubid><pubid idtype="pmcid">1195719</pubid><pubid idtype="pmpid">16122352</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Genome-wide patterns of histone modifications in yeast.</p></title><aug><au><snm>Millar</snm><fnm>CB</fnm></au><au><snm>Grunstein</snm><fnm>M</fnm></au></aug><source>Nat Rev Mol Cell Biol</source><pubdate>2006</pubdate><volume>7</volume><fpage>657</fpage><lpage>666</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrm1986</pubid><pubid idtype="pmpid" link="fulltext">16912715</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Mapping global histone acetylation patterns to gene expression.</p></title><aug><au><snm>Kurdistani</snm><fnm>SK</fnm></au><au><snm>Tavazoie</snm><fnm>S</fnm></au><au><snm>Grunstein</snm><fnm>M</fnm></au></aug><source>Cell</source><pubdate>2004</pubdate><volume>117</volume><fpage>721</fpage><lpage>733</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2004.05.023</pubid><pubid idtype="pmpid" link="fulltext">15186774</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Combinatorial patterns of histone acetylations and methylations in the human 
genome.</p></title><aug><au><snm>Wang</snm><fnm>Z</fnm></au><au><snm>Zang</snm><fnm>C</fnm></au><au><snm>Rosenfeld</snm><fnm>JA</fnm></au><au><snm>Schones</snm><fnm>DE</fnm></au><au><snm>Barski</snm><fnm>A</fnm></au><au><snm>Cuddapah</snm><fnm>S</fnm></au><au><snm>Cui</snm><fnm>K</fnm></au><au><snm>Roh</snm><fnm>TY</fnm></au><au><snm>Peng</snm><fnm>W</fnm></au><au><snm>Zhang</snm><fnm>MQ</fnm></au><au><snm>Zhao</snm><fnm>K</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>897</fpage><lpage>903</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.154</pubid><pubid idtype="pmcid">2769248</pubid><pubid idtype="pmpid">18552846</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>X chromosome repression by localization of the <it>C. elegans </it>dosage compensation machinery to sites of transcription initiation.</p></title><aug><au><snm>Ercan</snm><fnm>S</fnm></au><au><snm>Giresi</snm><fnm>PG</fnm></au><au><snm>Whittle</snm><fnm>CM</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Green</snm><fnm>RD</fnm></au><au><snm>Lieb</snm><fnm>JD</fnm></au></aug><source>Nat Genet</source><pubdate>2007</pubdate><volume>39</volume><fpage>403</fpage><lpage>408</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1983</pubid><pubid idtype="pmcid">2753834</pubid><pubid idtype="pmpid">17293863</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>The <it>C. 
elegans </it>dosage compensation complex propagates dynamically and independently of X chromosome sequence.</p></title><aug><au><snm>Ercan</snm><fnm>S</fnm></au><au><snm>Dick</snm><fnm>LL</fnm></au><au><snm>Lieb</snm><fnm>JD</fnm></au></aug><source>Curr Biol</source><pubdate>2009</pubdate><volume>19</volume><fpage>1777</fpage><lpage>1787</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cub.2009.09.047</pubid><pubid idtype="pmcid">2783177</pubid><pubid idtype="pmpid">19853451</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>The logic of chromatin architecture and remodelling at promoters.</p></title><aug><au><snm>Cairns</snm><fnm>BR</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>461</volume><fpage>193</fpage><lpage>198</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08450</pubid><pubid idtype="pmpid" link="fulltext">19741699</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Role of histone modifications in defining chromatin structure and function.</p></title><aug><au><snm>Gelato</snm><fnm>KA</fnm></au><au><snm>Fischle</snm><fnm>W</fnm></au></aug><source>Biol Chem</source><pubdate>2008</pubdate><volume>389</volume><fpage>353</fpage><lpage>363</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1515/BC.2008.048</pubid><pubid idtype="pmpid" link="fulltext">18225984</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Chromatin remodelling: the industrial revolution of DNA around histones.</p></title><aug><au><snm>Saha</snm><fnm>A</fnm></au><au><snm>Wittmeyer</snm><fnm>J</fnm></au><au><snm>Cairns</snm><fnm>BR</fnm></au></aug><source>Nat Rev Mol Cell Biol</source><pubdate>2006</pubdate><volume>7</volume><fpage>437</fpage><lpage>447</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrm1945</pubid><pubid idtype="pmpid" link="fulltext">16723979</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>The language of covalent histone 
modifications.</p></title><aug><au><snm>Strahl</snm><fnm>BD</fnm></au><au><snm>Allis</snm><fnm>CD</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>403</volume><fpage>41</fpage><lpage>45</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/47412</pubid><pubid idtype="pmpid" link="fulltext">10638745</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Translating the histone code.</p></title><aug><au><snm>Jenuwein</snm><fnm>T</fnm></au><au><snm>Allis</snm><fnm>CD</fnm></au></aug><source>Science</source><pubdate>2001</pubdate><volume>293</volume><fpage>1074</fpage><lpage>1080</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1063127</pubid><pubid idtype="pmpid" link="fulltext">11498575</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Defining an epigenetic code.</p></title><aug><au><snm>Turner</snm><fnm>BM</fnm></au></aug><source>Nat Cell Biol</source><pubdate>2007</pubdate><volume>9</volume><fpage>2</fpage><lpage>6</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ncb0107-2</pubid><pubid idtype="pmpid" link="fulltext">17199124</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Crosstalk among histone modifications.</p></title><aug><au><snm>Suganuma</snm><fnm>T</fnm></au><au><snm>Workman</snm><fnm>JL</fnm></au></aug><source>Cell</source><pubdate>2008</pubdate><volume>135</volume><fpage>604</fpage><lpage>607</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2008.10.036</pubid><pubid idtype="pmpid" link="fulltext">19013272</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Genomic characterization reveals a simple histone H4 acetylation code.</p></title><aug><au><snm>Dion</snm><fnm>MF</fnm></au><au><snm>Altschuler</snm><fnm>SJ</fnm></au><au><snm>Wu</snm><fnm>LF</fnm></au><au><snm>Rando</snm><fnm>OJ</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2005</pubdate><volume>102</volume><fpage>5501</fpage><lpage>5506</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1073/pnas.0500136102</pubid><pubid idtype="pmcid">555684</pubid><pubid idtype="pmpid">15795371</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Histone modifications: from genome-wide maps to functional insights.</p></title><aug><au><snm>van Leeuwen</snm><fnm>F</fnm></au><au><snm>van Steensel</snm><fnm>B</fnm></au></aug><source>Genome Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>113</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2005-6-6-113</pubid><pubid idtype="pmcid">1175962</pubid><pubid idtype="pmpid">15960810</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Unlocking the secrets of the genome.</p></title><aug><au><snm>Celniker</snm><fnm>SE</fnm></au><au><snm>Dillon</snm><fnm>LA</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au><au><snm>Gunsalus</snm><fnm>KC</fnm></au><au><snm>Henikoff</snm><fnm>S</fnm></au><au><snm>Karpen</snm><fnm>GH</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Lai</snm><fnm>EC</fnm></au><au><snm>Lieb</snm><fnm>JD</fnm></au><au><snm>MacAlpine</snm><fnm>DM</fnm></au><au><snm>Micklem</snm><fnm>G</fnm></au><au><snm>Piano</snm><fnm>F</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Stein</snm><fnm>L</fnm></au><au><snm>White</snm><fnm>KP</fnm></au><au><snm>Waterston</snm><fnm>RH</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>459</volume><fpage>927</fpage><lpage>930</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/459927a</pubid><pubid idtype="pmcid">2843545</pubid><pubid idtype="pmpid">19536255</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>ChIP on chip assays: genome-wide analysis of transcription factor binding and histone modifications.</p></title><aug><au><snm>Pillai</snm><fnm>S</fnm></au><au><snm>Chellappan</snm><fnm>SP</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2009</pubdate><volume>523</volume><fpage>341</fpage><lpage>366</lpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid 
idtype="pmpid" link="fulltext">19381927</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Genome-wide approaches to studying chromatin modifications.</p></title><aug><au><snm>Schones</snm><fnm>DE</fnm></au><au><snm>Zhao</snm><fnm>K</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2008</pubdate><volume>9</volume><fpage>179</fpage><lpage>191</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg2270</pubid><pubid idtype="pmpid" link="fulltext">18250624</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Antisense transcription in the mammalian transcriptome.</p></title><aug><au><snm>Katayama</snm><fnm>S</fnm></au><au><snm>Tomaru</snm><fnm>Y</fnm></au><au><snm>Kasukawa</snm><fnm>T</fnm></au><au><snm>Waki</snm><fnm>K</fnm></au><au><snm>Nakanishi</snm><fnm>M</fnm></au><au><snm>Nakamura</snm><fnm>M</fnm></au><au><snm>Nishida</snm><fnm>H</fnm></au><au><snm>Yap</snm><fnm>CC</fnm></au><au><snm>Suzuki</snm><fnm>M</fnm></au><au><snm>Kawai</snm><fnm>J</fnm></au><au><snm>Suzuki</snm><fnm>H</fnm></au><au><snm>Carninci</snm><fnm>P</fnm></au><au><snm>Hayashizaki</snm><fnm>Y</fnm></au><au><snm>Wells</snm><fnm>C</fnm></au><au><snm>Frith</snm><fnm>M</fnm></au><au><snm>Ravasi</snm><fnm>T</fnm></au><au><snm>Pang</snm><fnm>KC</fnm></au><au><snm>Hallinan</snm><fnm>J</fnm></au><au><snm>Mattick</snm><fnm>J</fnm></au><au><snm>Hume</snm><fnm>DA</fnm></au><au><snm>Lipovich</snm><fnm>L</fnm></au><au><snm>Batalov</snm><fnm>S</fnm></au><au><snm>Engstrom</snm><fnm>PG</fnm></au><au><snm>Mizuno</snm><fnm>Y</fnm></au><au><snm>Faghihi</snm><fnm>MA</fnm></au><au><snm>Sandelin</snm><fnm>A</fnm></au><au><snm>Chalk</snm><fnm>AM</fnm></au><au><snm>Mottagui-Tabar</snm><fnm>S</fnm></au><au><snm>Liang</snm><fnm>Z</fnm></au><au><snm>Lenhard</snm><fnm>B</fnm></au><etal/></aug><source>Science</source><pubdate>2005</pubdate><volume>309</volume><fpage>1564</fpage><lpage>1566</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1112009</pubid><pubid idtype="pmpid" 
link="fulltext">16141073</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>RNA Pol II accumulates at promoters of growth genes during developmental arrest.</p></title><aug><au><snm>Baugh</snm><fnm>LR</fnm></au><au><snm>Demodena</snm><fnm>J</fnm></au><au><snm>Sternberg</snm><fnm>PW</fnm></au></aug><source>Science</source><pubdate>2009</pubdate><volume>324</volume><fpage>92</fpage><lpage>94</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1169628</pubid><pubid idtype="pmpid" link="fulltext">19251593</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters.</p></title><aug><au><snm>Core</snm><fnm>LJ</fnm></au><au><snm>Waterfall</snm><fnm>JJ</fnm></au><au><snm>Lis</snm><fnm>JT</fnm></au></aug><source>Science</source><pubdate>2008</pubdate><volume>322</volume><fpage>1845</fpage><lpage>1848</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1162228</pubid><pubid idtype="pmcid">2833333</pubid><pubid idtype="pmpid">19056941</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Divergent transcription from active promoters.</p></title><aug><au><snm>Seila</snm><fnm>AC</fnm></au><au><snm>Calabrese</snm><fnm>JM</fnm></au><au><snm>Levine</snm><fnm>SS</fnm></au><au><snm>Yeo</snm><fnm>GW</fnm></au><au><snm>Rahl</snm><fnm>PB</fnm></au><au><snm>Flynn</snm><fnm>RA</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au><au><snm>Sharp</snm><fnm>PA</fnm></au></aug><source>Science</source><pubdate>2008</pubdate><volume>322</volume><fpage>1849</fpage><lpage>1851</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1162253</pubid><pubid idtype="pmcid">2692996</pubid><pubid idtype="pmpid">19056940</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>MES-4: an autosome-associated histone methyltransferase that participates in silencing the X chromosomes in the <it>C. 
elegans </it>germ line.</p></title><aug><au><snm>Bender</snm><fnm>LB</fnm></au><au><snm>Suh</snm><fnm>J</fnm></au><au><snm>Carroll</snm><fnm>CR</fnm></au><au><snm>Fong</snm><fnm>Y</fnm></au><au><snm>Fingerman</snm><fnm>IM</fnm></au><au><snm>Briggs</snm><fnm>SD</fnm></au><au><snm>Cao</snm><fnm>R</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Reinke</snm><fnm>V</fnm></au><au><snm>Strome</snm><fnm>S</fnm></au></aug><source>Development</source><pubdate>2006</pubdate><volume>133</volume><fpage>3907</fpage><lpage>3917</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1242/dev.02584</pubid><pubid idtype="pmcid">2435371</pubid><pubid idtype="pmpid">16968818</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>MRG-1, an autosome-associated protein, silences X-linked genes and protects germline immortality in <it>Caenorhabditis elegans</it>.</p></title><aug><au><snm>Takasaki</snm><fnm>T</fnm></au><au><snm>Liu</snm><fnm>Z</fnm></au><au><snm>Habara</snm><fnm>Y</fnm></au><au><snm>Nishiwaki</snm><fnm>K</fnm></au><au><snm>Nakayama</snm><fnm>J</fnm></au><au><snm>Inoue</snm><fnm>K</fnm></au><au><snm>Sakamoto</snm><fnm>H</fnm></au><au><snm>Strome</snm><fnm>S</fnm></au></aug><source>Development</source><pubdate>2007</pubdate><volume>134</volume><fpage>757</fpage><lpage>767</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1242/dev.02771</pubid><pubid idtype="pmcid">2435364</pubid><pubid idtype="pmpid">17215300</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>A global analysis of <it>Caenorhabditis elegans 
</it>operons.</p></title><aug><au><snm>Blumenthal</snm><fnm>T</fnm></au><au><snm>Evans</snm><fnm>D</fnm></au><au><snm>Link</snm><fnm>CD</fnm></au><au><snm>Guffanti</snm><fnm>A</fnm></au><au><snm>Lawson</snm><fnm>D</fnm></au><au><snm>Thierry-Mieg</snm><fnm>J</fnm></au><au><snm>Thierry-Mieg</snm><fnm>D</fnm></au><au><snm>Chiu</snm><fnm>WL</fnm></au><au><snm>Duke</snm><fnm>K</fnm></au><au><snm>Kiraly</snm><fnm>M</fnm></au><au><snm>Kim</snm><fnm>SK</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>417</volume><fpage>851</fpage><lpage>854</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature00831</pubid><pubid idtype="pmpid" link="fulltext">12075352</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Functional exploration of the <it>C. elegans </it>genome using DNA microarrays.</p></title><aug><au><snm>Reinke</snm><fnm>V</fnm></au></aug><source>Nat Genet</source><pubdate>2002</pubdate><volume>32</volume><issue>Suppl</issue><fpage>541</fpage><lpage>546</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1039</pubid><pubid idtype="pmpid" link="fulltext">12454651</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p><it>Caenorhabditis elegans </it>operons: form and function.</p></title><aug><au><snm>Blumenthal</snm><fnm>T</fnm></au><au><snm>Gleason</snm><fnm>KS</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2003</pubdate><volume>4</volume><fpage>112</fpage><lpage>120</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg995</pubid><pubid idtype="pmpid" link="fulltext">12560808</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>miRBase: tools for microRNA genomics.</p></title><aug><au><snm>Griffiths-Jones</snm><fnm>S</fnm></au><au><snm>Saini</snm><fnm>HK</fnm></au><au><snm>van Dongen</snm><fnm>S</fnm></au><au><snm>Enright</snm><fnm>AJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D154</fpage><lpage>158</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/nar/gkm952</pubid><pubid idtype="pmcid">2238936</pubid><pubid idtype="pmpid">17991681</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during <it>Caenorhabditis elegans </it>development.</p></title><aug><au><snm>Kato</snm><fnm>M</fnm></au><au><snm>de Lencastre</snm><fnm>A</fnm></au><au><snm>Pincus</snm><fnm>Z</fnm></au><au><snm>Slack</snm><fnm>FJ</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><fpage>R54</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-5-r54</pubid><pubid idtype="pmcid">2718520</pubid><pubid idtype="pmpid">19460142</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>A <it>C. elegans </it>genome-scale microRNA network contains composite feedback motifs with high flux capacity.</p></title><aug><au><snm>Martinez</snm><fnm>NJ</fnm></au><au><snm>Ow</snm><fnm>MC</fnm></au><au><snm>Barrasa</snm><fnm>MI</fnm></au><au><snm>Hammell</snm><fnm>M</fnm></au><au><snm>Sequerra</snm><fnm>R</fnm></au><au><snm>Doucette-Stamm</snm><fnm>L</fnm></au><au><snm>Roth</snm><fnm>FP</fnm></au><au><snm>Ambros</snm><fnm>VR</fnm></au><au><snm>Walhout</snm><fnm>AJ</fnm></au></aug><source>Genes Dev</source><pubdate>2008</pubdate><volume>22</volume><fpage>2535</fpage><lpage>2549</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.1678608</pubid><pubid idtype="pmcid">2546694</pubid><pubid idtype="pmpid">18794350</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Chromatin poises miRNA- and protein-coding genes for expression.</p></title><aug><au><snm>Barski</snm><fnm>A</fnm></au><au><snm>Jothi</snm><fnm>R</fnm></au><au><snm>Cuddapah</snm><fnm>S</fnm></au><au><snm>Cui</snm><fnm>K</fnm></au><au><snm>Roh</snm><fnm>TY</fnm></au><au><snm>Schones</snm><fnm>DE</fnm></au><au><snm>Zhao</snm><fnm>K</fnm></au></aug><source>Genome 
Res</source><pubdate>2009</pubdate><volume>19</volume><fpage>1742</fpage><lpage>1751</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.090951.109</pubid><pubid idtype="pmcid">2765269</pubid><pubid idtype="pmpid">19713549</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>High-resolution profiling of histone methylations in the human genome.</p></title><aug><au><snm>Barski</snm><fnm>A</fnm></au><au><snm>Cuddapah</snm><fnm>S</fnm></au><au><snm>Cui</snm><fnm>K</fnm></au><au><snm>Roh</snm><fnm>TY</fnm></au><au><snm>Schones</snm><fnm>DE</fnm></au><au><snm>Wang</snm><fnm>Z</fnm></au><au><snm>Wei</snm><fnm>G</fnm></au><au><snm>Chepelev</snm><fnm>I</fnm></au><au><snm>Zhao</snm><fnm>K</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>129</volume><fpage>823</fpage><lpage>837</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.05.009</pubid><pubid idtype="pmpid" link="fulltext">17512414</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Genome-wide maps of chromatin state in pluripotent and lineage-committed 
cells.</p></title><aug><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Ku</snm><fnm>M</fnm></au><au><snm>Jaffe</snm><fnm>DB</fnm></au><au><snm>Issac</snm><fnm>B</fnm></au><au><snm>Lieberman</snm><fnm>E</fnm></au><au><snm>Giannoukos</snm><fnm>G</fnm></au><au><snm>Alvarez</snm><fnm>P</fnm></au><au><snm>Brockman</snm><fnm>W</fnm></au><au><snm>Kim</snm><fnm>TK</fnm></au><au><snm>Koche</snm><fnm>RP</fnm></au><au><snm>Lee</snm><fnm>W</fnm></au><au><snm>Mendenhall</snm><fnm>E</fnm></au><au><snm>O&apos;Donovan</snm><fnm>A</fnm></au><au><snm>Presser</snm><fnm>A</fnm></au><au><snm>Russ</snm><fnm>C</fnm></au><au><snm>Xie</snm><fnm>X</fnm></au><au><snm>Meissner</snm><fnm>A</fnm></au><au><snm>Wernig</snm><fnm>M</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Nusbaum</snm><fnm>C</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au></aug><source>Nature</source><pubdate>2007</pubdate><volume>448</volume><fpage>553</fpage><lpage>560</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06008</pubid><pubid idtype="pmcid">2921165</pubid><pubid idtype="pmpid">17603471</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Histone modification levels are predictive for gene expression.</p></title><aug><au><snm>Karlic</snm><fnm>R</fnm></au><au><snm>Chung</snm><fnm>HR</fnm></au><au><snm>Lasserre</snm><fnm>J</fnm></au><au><snm>Vlahovicek</snm><fnm>K</fnm></au><au><snm>Vingron</snm><fnm>M</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2010</pubdate><volume>107</volume><fpage>2926</fpage><lpage>2931</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0909344107</pubid><pubid idtype="pmcid">2814872</pubid><pubid idtype="pmpid">20133639</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Chromatin modifications and their 
function.</p></title><aug><au><snm>Kouzarides</snm><fnm>T</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>128</volume><fpage>693</fpage><lpage>705</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.02.005</pubid><pubid idtype="pmpid" link="fulltext">17320507</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Is there a code embedded in proteins that is based on post-translational modifications?</p></title><aug><au><snm>Sims</snm><fnm>RJ</fnm></au><au><snm>Reinberg</snm><fnm>D</fnm></au></aug><source>Nat Rev Mol Cell Biol</source><pubdate>2008</pubdate><volume>9</volume><fpage>815</fpage><lpage>820</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrm2502</pubid><pubid idtype="pmpid" link="fulltext">18784729</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Signaling network model of chromatin.</p></title><aug><au><snm>Schreiber</snm><fnm>SL</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au></aug><source>Cell</source><pubdate>2002</pubdate><volume>111</volume><fpage>771</fpage><lpage>778</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(02)01196-0</pubid><pubid idtype="pmpid" link="fulltext">12526804</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity.</p></title><aug><au><snm>Ng</snm><fnm>HH</fnm></au><au><snm>Robert</snm><fnm>F</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au><au><snm>Struhl</snm><fnm>K</fnm></au></aug><source>Mol Cell</source><pubdate>2003</pubdate><volume>11</volume><fpage>709</fpage><lpage>719</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1097-2765(03)00092-3</pubid><pubid idtype="pmpid" link="fulltext">12667453</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Association of the histone methyltransferase Set2 with RNA polymerase II plays a role in transcription 
elongation.</p></title><aug><au><snm>Li</snm><fnm>J</fnm></au><au><snm>Moazed</snm><fnm>D</fnm></au><au><snm>Gygi</snm><fnm>SP</fnm></au></aug><source>J Biol Chem</source><pubdate>2002</pubdate><volume>277</volume><fpage>49383</fpage><lpage>49388</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1074/jbc.M209294200</pubid><pubid idtype="pmpid" link="fulltext">12381723</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Combinatorial effects of four histone modifications in transcription and differentiation.</p></title><aug><au><snm>Fischer</snm><fnm>JJ</fnm></au><au><snm>Toedling</snm><fnm>J</fnm></au><au><snm>Krueger</snm><fnm>T</fnm></au><au><snm>Schueler</snm><fnm>M</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Sperling</snm><fnm>S</fnm></au></aug><source>Genomics</source><pubdate>2008</pubdate><volume>91</volume><fpage>41</fpage><lpage>51</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ygeno.2007.08.010</pubid><pubid idtype="pmpid" link="fulltext">17997276</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>Protein modifications in transcription elongation.</p></title><aug><au><snm>Fuchs</snm><fnm>SM</fnm></au><au><snm>Laribee</snm><fnm>RN</fnm></au><au><snm>Strahl</snm><fnm>BD</fnm></au></aug><source>Biochim Biophys Acta</source><pubdate>2009</pubdate><volume>1789</volume><fpage>26</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2641038</pubid><pubid idtype="pmpid">18718879</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>Chromatin decondensation and nuclear reorganization of the HoxB locus upon induction of transcription.</p></title><aug><au><snm>Chambeyron</snm><fnm>S</fnm></au><au><snm>Bickmore</snm><fnm>WA</fnm></au></aug><source>Genes Dev</source><pubdate>2004</pubdate><volume>18</volume><fpage>1119</fpage><lpage>1130</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.292104</pubid><pubid idtype="pmcid">415637</pubid><pubid 
idtype="pmpid">15155579</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>modENCODE.</p></title><url>http://www.modencode.org</url></bibl><bibl id="B58"><title><p>WormBase.</p></title><url>http://www.wormbase.org</url></bibl><bibl id="B59"><title><p>WormBase: a comprehensive resource for nematode research.</p></title><aug><au><snm>Harris</snm><fnm>TW</fnm></au><au><snm>Antoshechkin</snm><fnm>I</fnm></au><au><snm>Bieri</snm><fnm>T</fnm></au><au><snm>Blasiar</snm><fnm>D</fnm></au><au><snm>Chan</snm><fnm>J</fnm></au><au><snm>Chen</snm><fnm>WJ</fnm></au><au><snm>De La Cruz</snm><fnm>N</fnm></au><au><snm>Davis</snm><fnm>P</fnm></au><au><snm>Duesbury</snm><fnm>M</fnm></au><au><snm>Fang</snm><fnm>R</fnm></au><au><snm>Fernandes</snm><fnm>J</fnm></au><au><snm>Han</snm><fnm>M</fnm></au><au><snm>Kishore</snm><fnm>R</fnm></au><au><snm>Lee</snm><fnm>R</fnm></au><au><snm>Muller</snm><fnm>HM</fnm></au><au><snm>Nakamura</snm><fnm>C</fnm></au><au><snm>Ozersky</snm><fnm>P</fnm></au><au><snm>Petcherski</snm><fnm>A</fnm></au><au><snm>Rangarajan</snm><fnm>A</fnm></au><au><snm>Rogers</snm><fnm>A</fnm></au><au><snm>Schindelman</snm><fnm>G</fnm></au><au><snm>Schwarz</snm><fnm>EM</fnm></au><au><snm>Tuli</snm><fnm>MA</fnm></au><au><snm>Van Auken</snm><fnm>K</fnm></au><au><snm>Wang</snm><fnm>D</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Williams</snm><fnm>G</fnm></au><au><snm>Yook</snm><fnm>K</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au><au><snm>Stein</snm><fnm>LD</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><fpage>D463</fpage><lpage>467</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkp952</pubid><pubid idtype="pmcid">2808986</pubid><pubid idtype="pmpid">19910365</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>miRBase.</p></title><url>http://www.mirbase.org</url></bibl><bibl 
id="B61"><aug><au><snm>Cristianini</snm><fnm>N</fnm></au><au><snm>Shawe-Taylor</snm><fnm>J</fnm></au></aug><source>An Introduction to Support Vector Machines and Other Kernel-based Learning Methods</source><publisher>Cambridge University Press</publisher><pubdate>2000</pubdate></bibl><bibl id="B62"><title><p>Precision and functional specificity in mRNA decay.</p></title><aug><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Liu</snm><fnm>CL</fnm></au><au><snm>Storey</snm><fnm>JD</fnm></au><au><snm>Tibshirani</snm><fnm>RJ</fnm></au><au><snm>Herschlag</snm><fnm>D</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2002</pubdate><volume>99</volume><fpage>5860</fpage><lpage>5865</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.092538799</pubid><pubid idtype="pmcid">122867</pubid><pubid idtype="pmpid">11972065</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Genome-wide map of nucleosome acetylation and methylation in yeast.</p></title><aug><au><snm>Pokholok</snm><fnm>DK</fnm></au><au><snm>Harbison</snm><fnm>CT</fnm></au><au><snm>Levine</snm><fnm>S</fnm></au><au><snm>Cole</snm><fnm>M</fnm></au><au><snm>Hannett</snm><fnm>NM</fnm></au><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Bell</snm><fnm>GW</fnm></au><au><snm>Walker</snm><fnm>K</fnm></au><au><snm>Rolfe</snm><fnm>PA</fnm></au><au><snm>Herbolsheimer</snm><fnm>E</fnm></au><au><snm>Zeitlinger</snm><fnm>J</fnm></au><au><snm>Lewitter</snm><fnm>F</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Cell</source><pubdate>2005</pubdate><volume>122</volume><fpage>517</fpage><lpage>527</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2005.06.026</pubid><pubid idtype="pmpid" link="fulltext">16122420</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Stem cell transcriptome profiling via massive-scale mRNA 
sequencing.</p></title><aug><au><snm>Cloonan</snm><fnm>N</fnm></au><au><snm>Forrest</snm><fnm>AR</fnm></au><au><snm>Kolle</snm><fnm>G</fnm></au><au><snm>Gardiner</snm><fnm>BB</fnm></au><au><snm>Faulkner</snm><fnm>GJ</fnm></au><au><snm>Brown</snm><fnm>MK</fnm></au><au><snm>Taylor</snm><fnm>DF</fnm></au><au><snm>Steptoe</snm><fnm>AL</fnm></au><au><snm>Wani</snm><fnm>S</fnm></au><au><snm>Bethel</snm><fnm>G</fnm></au><au><snm>Robertson</snm><fnm>AJ</fnm></au><au><snm>Perkins</snm><fnm>AC</fnm></au><au><snm>Bruce</snm><fnm>SJ</fnm></au><au><snm>Lee</snm><fnm>CC</fnm></au><au><snm>Ranade</snm><fnm>SS</fnm></au><au><snm>Peckham</snm><fnm>HE</fnm></au><au><snm>Manning</snm><fnm>JM</fnm></au><au><snm>McKernan</snm><fnm>KJ</fnm></au><au><snm>Grimmond</snm><fnm>SM</fnm></au></aug><source>Nat Methods</source><pubdate>2008</pubdate><volume>5</volume><fpage>613</fpage><lpage>619</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1223</pubid><pubid idtype="pmpid" link="fulltext">18516046</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>Genome-scale DNA methylation maps of pluripotent and differentiated cells.</p></title><aug><au><snm>Meissner</snm><fnm>A</fnm></au><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Gu</snm><fnm>H</fnm></au><au><snm>Wernig</snm><fnm>M</fnm></au><au><snm>Hanna</snm><fnm>J</fnm></au><au><snm>Sivachenko</snm><fnm>A</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Nusbaum</snm><fnm>C</fnm></au><au><snm>Jaffe</snm><fnm>DB</fnm></au><au><snm>Gnirke</snm><fnm>A</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>454</volume><fpage>766</fpage><lpage>770</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2896277</pubid><pubid idtype="pmpid">18600261</pubid></pubidlist></xrefbib></bibl><bibl id="B66"><title><p>Mapping and quantifying mammalian transcriptomes by 
RNA-Seq.</p></title><aug><au><snm>Mortazavi</snm><fnm>A</fnm></au><au><snm>Williams</snm><fnm>BA</fnm></au><au><snm>McCue</snm><fnm>K</fnm></au><au><snm>Schaeffer</snm><fnm>L</fnm></au><au><snm>Wold</snm><fnm>B</fnm></au></aug><source>Nat Methods</source><pubdate>2008</pubdate><volume>5</volume><fpage>621</fpage><lpage>628</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1226</pubid><pubid idtype="pmpid" link="fulltext">18516045</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>ENCODE.</p></title><url>http://genome.ucsc.edu/ENCODE/</url></bibl><bibl id="B68"><title><p>Chromodel.</p></title><url>http://archive.gersteinlab.org/proj/chromodel/index.html</url></bibl></refgrp>
</bm></art>