<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2012-13-3-r16</ui>
   <ji>1465-6906</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl>
         <title>
            <p>MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets</p>
         </title>
         <aug>
            <au id="A1" ce="yes"><snm>Shao</snm><fnm>Zhen</fnm><insr iid="I1"/><insr iid="I2"/><email>zhenshao@jimmy.harvard.edu</email></au>
            <au id="A2" ce="yes"><snm>Zhang</snm><fnm>Yijing</fnm><insr iid="I3"/><email>yijing@chgr.mgh.harvard.edu</email></au>
            <au id="A3"><snm>Yuan</snm><fnm>Guo-Cheng</fnm><insr iid="I1"/><email>gcyuan@jimmy.harvard.edu</email></au>
            <au id="A4" ca="yes"><snm>Orkin</snm><mi>H</mi><fnm>Stuart</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I4"/><email>orkin@bloodgroup.tch.harvard.edu</email></au>
            <au id="A5" ca="yes"><snm>Waxman</snm><mi>J</mi><fnm>David</fnm><insr iid="I3"/><email>djw@bu.edu</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Departments of Pediatric Oncology and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA</p></ins>
            <ins id="I2"><p>Division of Pediatric Hematology-Oncology, The Karp Family Research Laboratories, Children's Hospital, 300 Longwood Ave, Boston, MA 02115, USA</p></ins>
            <ins id="I3"><p>Division of Cell and Molecular Biology, Department of Biology, Boston University, 5 Cummington Street, Boston, MA 02215, USA</p></ins>
            <ins id="I4"><p>Harvard Stem Cell Institute and the Howard Hughes Medical Institute, 1 Blackfan Circle, Karp Research Building, Children's Hospital, Boston, MA 02115, USA</p></ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2012</pubdate>
         <volume>13</volume>
         <issue>3</issue>
         <fpage>R16</fpage>
         <url>http://genomebiology.com/2012/13/3/R16</url>
         <xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2012-13-3-r16</pubid><pubid idtype="pmpid">22424423</pubid></pubidlist></xrefbib>
      </bibl>
      <history><rec><date><day>3</day><month>11</month><year>2011</year></date></rec><revrec><date><day>4</day><month>3</month><year>2012</year></date></revrec><acc><date><day>16</day><month>3</month><year>2012</year></date></acc><pub><date><day>16</day><month>3</month><year>2012</year></date></pub></history>
      <cpyrt><year>2012</year><collab>Shao et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>ChIP-Seq is widely used to characterize genome-wide binding patterns of transcription factors and other chromatin-associated proteins. Although comparison of ChIP-Seq data sets is critical for understanding cell type-dependent and cell state-specific binding, and thus the study of cell-specific gene regulation, few quantitative approaches have been developed. Here, we present a simple and effective method, MAnorm, for quantitative comparison of ChIP-Seq data sets describing transcription factor binding sites and epigenetic modifications. The quantitative binding differences inferred by MAnorm showed strong correlation with both the changes in expression of target genes and the binding of cell type-specific regulators.</p>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) has become the preferred method to determine genome-wide binding patterns of transcription factors and other chromatin-associated proteins <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. With the rapid accumulation of ChIP-Seq data, comparison of multiple ChIP-Seq data sets is becoming increasingly critical for addressing important biological questions. For example, comparison of biological replicates is commonly used to find robust binding sites, and the identification of sites that are differentially bound by chromatin-associated proteins in different cellular contexts is important for elucidating underlying mechanisms of cell type-specific regulation. Although ChIP-Seq data generally exhibit high signal-to-background noise (S/N) ratios compared to ChIP-on-chip data sets, there are still significant challenges in data analysis due to variation in sample preparation and errors introduced in sequencing <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>Several methods have been proposed for finding ChIP-enriched regions in a ChIP-Seq sample compared to a suitable negative control (for example, mock or non-specific immunoprecipitation). These involve fitting a model derived from negative control and/or sample low read intensity (background) regions, and then applying this model to identify ChIP-enriched regions (peaks) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. However, few methods have been proposed for comparison of ChIP-Seq samples. The simplest approach classifies the peaks from each sample as either common or unique, based on whether or not the peak overlaps with peaks in other samples <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Although this method can identify general relationships between peak sets from different samples, the results are highly dependent on the cutoff used in peak calling, which is difficult to select in a completely objective manner. Moreover, common peaks may show differential binding between the samples being compared, while other peaks may be identified as unique to one sample simply because they fall below an arbitrary cutoff in the other sample. Differences in background levels further confound analysis. Consequently, quantitative comparison of ChIP-Seq samples, while important for extracting maximal biological information, is fraught with numerous challenges.</p>
         <p>An intuitive and widely used approach to quantitative comparison relies on rescaling data on the basis of the total number of sequence reads. However, this method is inadequate and may introduce errors when the S/N ratio varies between samples. Recently, statistical tools have been developed to discover regions that exhibit significant differences between two ChIP-Seq data sets. For example, Xu <it>et al. </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp> proposed a hidden Markov model-based method to detect broad chromatin domains associated with distinct levels of histone modifications between two cell types. Other peak calling programs identify differential binding regions between two ChIP-Seq data sets by using one data set as sample and the other as control <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Since these methods also rely on the total number of reads (or background region reads) to rescale the data, they fail to circumvent problems associated with different S/N ratios. In an alternative approach, Taslim <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp> proposed a nonlinear method that uses locally weighted regression (LOWESS) for ChIP-Seq data normalization. The underlying assumption of this method is that the genome-wide distribution of read densities has equal mean value and variance across samples <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. A potential problem with this approach is that it imposes global symmetry after normalization, an assumption that may not be valid when comparing biological samples with different numbers of binding sites. 
In addition, this method normalizes samples based on the absolute difference of read counts instead of the log<sub>2 </sub>ratio commonly used in traditional MA plot methods <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, and thus the differences deduced by this method cannot be used directly for quantitative comparison with other observations of biological significance, such as fold changes in gene expression.</p>
         <p>Here, we describe a simple and effective model, termed MAnorm, to quantitatively compare ChIP-Seq data sets. To circumvent the issue of differences in S/N ratio between samples, we focused on ChIP-enriched regions (peaks), and introduced a novel idea, that ChIP-Seq common peaks could serve as a reference to build the rescaling model for normalization. This approach is based on the empirical assumption that if a chromatin-associated protein has a substantial number of peaks shared in two conditions, the binding at these common regions will tend to be determined by similar mechanisms, and thus should exhibit similar global binding intensities across samples. This idea is further supported by motif analysis that we present. MAnorm exhibits good performance when applied to ChIP-Seq data for both epigenetic modifications and transcription factor binding site identification. Importantly, quantitative differences inferred by MAnorm are strongly correlated with differential expression of target genes and the binding of cell type-specific regulators. Comparisons to prior methods using genome-wide signals for normalization reveal that MAnorm is free of bias and better reflects authentic biological changes. Therefore, MAnorm should serve as a powerful tool in probing mechanisms of gene regulation.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Model description</p>
            </st>
            <p>Data normalization is an important step in sequencing data analysis. However, normalization of ChIP-Seq data is a difficult task due to the differential S/N ratio across samples (see Discussion). These differences cannot simply be addressed using traditional microarray data normalization methods, such as quantile normalization <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and MA plot followed by LOWESS regression <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Here we borrow the idea of the MA plot and propose a novel method for quantitative comparison of ChIP-Seq data sets based on two empirical assumptions. First, we assume the true intensities of most common peaks are the same between two ChIP-Seq samples. This assumption is valid when the binding regions represented by the common peaks show a much higher level of co-localization between samples than that expected at random, and thus binding at the common peaks should be determined by similar mechanisms and exhibit similar global binding intensity between samples. Second, the observed differences in sequence read density in common peaks are presumed to reflect the scaling relationship of ChIP-Seq signals between two samples, which can thus be applied to all peaks. Based on these hypotheses, the log<sub>2 </sub>ratio of read density between two samples (<it>M</it>) was plotted against the average log<sub>2 </sub>read density (<it>A</it>) for all peaks, and robust linear regression was applied to fit the global dependence between the <it>M-A </it>values of common peaks. Finally, the derived linear model was used as a reference for normalization and extrapolated to all peaks. The normalized <it>M </it>value was then used as a quantitative measure of differential binding in each peak region between two samples, with peak regions associated with larger absolute <it>M </it>values exhibiting greater differences in binding. The workflow of the method, MAnorm, is shown in Figure <figr fid="F1">1</figr>. 
The MAnorm package is available for download in Additional file <supplr sid="S1">1</supplr>.</p>
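            <p>As an illustration only, the normalization step described above can be sketched in a few lines of Python. This sketch is not the MAnorm package in Additional file 1. The function names are hypothetical, the robust fit is assumed to be an iteratively reweighted least squares with Huber weights (the text specifies only robust linear regression), and normalization is assumed to mean subtracting the fitted line from each <it>M </it>value, which is one simple reading of the procedure.</p>
            <p><![CDATA[
```python
import math
from statistics import median


def ma_values(density1, density2):
    """Return lists (M, A) for paired read densities, as defined in Figure 1."""
    m = [math.log2(d1 / d2) for d1, d2 in zip(density1, density2)]
    a = [0.5 * math.log2(d1 * d2) for d1, d2 in zip(density1, density2)]
    return m, a


def weighted_line(a, m, w):
    """Closed-form weighted least squares for m = intercept + slope * a.

    Requires at least two distinct A values with non-zero weight.
    """
    total = sum(w)
    mean_a = sum(wi * ai for wi, ai in zip(w, a)) / total
    mean_m = sum(wi * mi for wi, mi in zip(w, m)) / total
    sxx = sum(wi * (ai - mean_a) ** 2 for wi, ai in zip(w, a))
    sxy = sum(wi * (ai - mean_a) * (mi - mean_m) for wi, ai, mi in zip(w, a, m))
    slope = sxy / sxx
    return mean_m - slope * mean_a, slope


def robust_line(a, m, n_iter=30, c=1.345):
    """Fit a line by IRLS with Huber weights (an assumed estimator)."""
    intercept, slope = weighted_line(a, m, [1.0] * len(a))
    for _ in range(n_iter):
        resid = [mi - (intercept + slope * ai) for ai, mi in zip(a, m)]
        center = median(resid)
        # Median absolute deviation, rescaled to estimate a standard deviation.
        scale = median(abs(r - center) for r in resid) / 0.6745
        if scale == 0.0:
            break  # degenerate residuals; keep the current fit
        w = [1.0 / max(1.0, abs(r) / (c * scale)) for r in resid]
        intercept, slope = weighted_line(a, m, w)
    return intercept, slope


def normalize_m(density1, density2, is_common):
    """Fit the line on common peaks only, then apply it to all peaks."""
    m, a = ma_values(density1, density2)
    common_a = [ai for ai, flag in zip(a, is_common) if flag]
    common_m = [mi for mi, flag in zip(m, is_common) if flag]
    intercept, slope = robust_line(common_a, common_m)
    return [mi - (intercept + slope * ai) for ai, mi in zip(a, m)]
```
]]></p>
            <p>In this sketch, the input lists hold one read density per peak for each sample, and the Boolean list marks which peaks are common. Fitting on the common peaks alone and then applying the line to every peak mirrors the extrapolation step of the model.</p>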
            <fig id="F1"><title><p>Figure 1</p></title><caption><p>Workflow of MAnorm</p></caption><text>
   <p><b>Workflow of MAnorm</b>. MAnorm takes the coordinates of all peaks and aligned reads in both samples as input. The (<it>M, A</it>) value of each common peak is then calculated and plotted, where <it>M </it>= log<sub>2 </sub>(Read density in sample 1/Read density in sample 2) and <it>A </it>= <it>0.5 </it>&#215; log<sub>2 </sub>(Read density in sample 1 &#215; Read density in sample 2). Robust regression is subsequently applied to the (<it>M, A</it>) values of all common peaks and a linear model is derived. Finally, the linear model is extrapolated to all peaks for normalization. A <it>P</it>-value is also calculated for each peak to describe the statistical significance of the read intensity difference between the two samples being compared.</p>
</text><graphic file="gb-2012-13-3-r16-1"/></fig>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>MAnorm package written in MATLAB and R</b>.</p>
               </text>
               <file name="gb-2012-13-3-r16-S1.RAR">
   <p>Click here for file</p>
</file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Comparison of cell line-dependent epigenetic modifications using MAnorm</p>
            </st>
            <p>Differential epigenetic modifications are closely associated with many developmental and disease processes <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. As such, quantitative comparison of ChIP-Seq signals across multiple cell types may help elucidate underlying epigenetic mechanisms of disease and tissue-specific regulation. We applied MAnorm to analyze the differences between H1 human embryonic stem (ES) cells and two disease-related cell lines, K562 and HeLaS3, for two histone modifications positively associated with gene expression, H3K4me3 and H3K27ac. For each chromatin mark, peaks identified in each cell line showed substantial overlap with those from the other two cell lines, with the overlap ranging from 16- to 24-fold greater than the overlap observed for random permutations (Figure <figr fid="F2">2a</figr>; Supplementary Figure <figr fid="F1">1</figr> in Additional file <supplr sid="S2">2</supplr>). Before normalization, the MA plots exhibited an overall global dependence of <it>M </it>value on <it>A</it>, which was closely fitted by a linear model derived by robust regression (Figure <figr fid="F2">2b</figr>; Supplementary Figure <figr fid="F2">2</figr> in Additional file <supplr sid="S2">2</supplr>). A similar global dependence was evident in comparisons of biological replicates (Supplementary Figure 7 in Additional file <supplr sid="S2">2</supplr>; discussed below), indicating that the dependence of <it>M </it>on <it>A </it>does not reflect biological changes but is due mainly to systematic bias and noise. After application of MAnorm to remove this dependence from the set of common peaks, the distribution of common peaks became highly symmetric with respect to the new <it>A </it>axis. Furthermore, the two sets of unique peaks became more symmetric in all comparisons (Figure <figr fid="F2">2c</figr>; Supplementary Figure <figr fid="F2">2</figr> in Additional file <supplr sid="S2">2</supplr>). 
These observations suggest that the ChIP-Seq signals in all peaks follow a similar scaling relationship and that the extrapolation of the linear model from common peaks to all peaks is valid. The significance of differential binding in each peak region was determined using a <it>P</it>-value calculated based on a Bayesian model developed by Audic and Claverie <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> (Figure <figr fid="F2">2d</figr>; Supplementary Figure <figr fid="F2">2</figr> in Additional file <supplr sid="S2">2</supplr>).</p>
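            <p>For reference, the probability proposed by Audic and Claverie <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> has a closed form: given <it>x </it>reads in a region in one sample, the probability of <it>y </it>reads in the same region in a second sample is (N<sub>2</sub>/N<sub>1</sub>)<sup>y </sup>(x + y)!/(x! y! (1 + N<sub>2</sub>/N<sub>1</sub>)<sup>x + y + 1</sup>), where N<sub>1 </sub>and N<sub>2 </sub>are the total read counts of the two samples. The Python sketch below evaluates this expression in log space. How MAnorm converts it into the reported <it>P</it>-value is derived in Additional file 3 and is not reproduced here, so the cumulative sum shown is an illustrative assumption, as are the function names.</p>
            <p><![CDATA[
```python
from math import exp, lgamma, log


def audic_claverie_prob(x, y, n1=1.0, n2=1.0):
    """p(y | x): probability of y reads in sample 2 given x reads in sample 1.

    n1 and n2 are the total read counts of the two samples; after
    normalization they can be treated as equal (ratio of 1).
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1.0 + ratio)
    )
    return exp(log_p)


def lower_tail(x, y, n1=1.0, n2=1.0):
    """Sum p(k | x) for k = 0..y: the chance of y or fewer reads in sample 2.

    Using a one-sided cumulative sum as the significance measure is an
    assumption made for illustration; y must be a non-negative integer.
    """
    return sum(audic_claverie_prob(x, k, n1, n2) for k in range(y + 1))
```
]]></p>
            <p>Working with log-gamma functions avoids overflow of the factorials for peaks with thousands of reads. For a peak with more reads in the second sample than in the first, the roles of the two samples would be swapped before summing.</p>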
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>Normalization of H3K4me3 ChIP-Seq data in H1 ES cells and K562 cells</p></caption><text>
   <p><b>Normalization of H3K4me3 ChIP-Seq data in H1 ES cells and K562 cells. (a) </b>Venn diagram representing the overlap of H3K4me3 peaks between H1 ES and K562 cells. The overlap of peaks between the two cell lines was 24-fold greater than that observed for random permutations of the peak sets. <b>(b,c) </b>MA plots of all peaks from both samples before (b) and after MAnorm (c). The red line is the linear model derived from common peaks by robust regression. Purple and green circles represent unique peaks; red and black circles represent common peaks. <b>(d) </b><it>P</it>-values associated with normalized peaks, displayed as an MA plot, with the color range representing -log<sub>10 </sub><it>P</it>-value. Most peaks associated with |<it>M</it>| > 1 have a <it>P</it>-value &lt; 10<sup>-10</sup>.</p>
</text><graphic file="gb-2012-13-3-r16-2"/></fig>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Supplementary figures</b>.</p>
               </text>
               <file name="gb-2012-13-3-r16-S2.PDF">
   <p>Click here for file</p>
</file>
            </suppl>
            <p>Next, we investigated the relationship between the <it>M </it>value (= <it>log<sub>2 </sub></it>(Read density in cell type 1/Read density in cell type 2)) and the change in expression of peak targets between cell types. In general, target genes associated with positive <it>M </it>values - that is, peaks with higher H3K4me3 and H3K27ac read intensity in cell type 1 - were enriched in genes more highly expressed in cell type 1. Conversely, target genes associated with negative <it>M </it>values were enriched in genes more highly expressed in cell type 2 (Figure <figr fid="F3">3</figr>; Supplementary Figure <figr fid="F3">3</figr> in Additional file <supplr sid="S2">2</supplr>). These findings are consistent with the activating role of these two histone modifications <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Notably, the enrichment score of genes more highly expressed in cell type 1 showed strong positive correlation with the <it>M </it>values, while the enrichment score of genes more highly expressed in cell type 2 correlated negatively with <it>M</it>, suggesting that the <it>M </it>statistics determined by MAnorm serve as an indicator of cell type-specificity for the epigenetic marks in peak regions (Figure <figr fid="F3">3</figr>; Supplementary Figure <figr fid="F3">3</figr> in Additional file <supplr sid="S2">2</supplr>). Furthermore, the target genes associated with an absolute <it>M </it>value &gt; 1 were significantly enriched in genes highly expressed in the corresponding cell type among all our comparisons, implying that the absolute <it>M </it>value of 1 is a suitable cutoff for defining cell type-specifically marked genes. 
It should be noted that many common target genes were associated with <it>M </it>values far from 0, and were still highly enriched for cell type-specifically expressed genes (Figure <figr fid="F3">3a</figr>; Supplementary Figure <figr fid="F3">3a</figr> in Additional file <supplr sid="S2">2</supplr>), indicating that the differential epigenetic marks at these genes are also functional. On the other hand, those unique target genes with <it>M </it>values near zero displayed much weaker enrichment of cell type-specifically expressed genes (Figure <figr fid="F3">3b, c</figr>; Supplementary Figure <figr fid="F3">3b, c</figr> in Additional file <supplr sid="S2">2</supplr>), indicating that they are not uniquely marked in one cell type. MAnorm also exhibited good performance when applied to ChIP-Seq data sets composed of broad, diffuse peaks, such as those for the histone modification H3K36me3 (Supplementary Figure <figr fid="F4">4</figr> in Additional file <supplr sid="S2">2</supplr> and Supplementary Text in Additional file <supplr sid="S3">3</supplr>). In conclusion, MAnorm quantitatively describes authentic binding differences of chromatin-associated proteins, and thus represents an improvement over arbitrary definitions of common and unique targets based on peak overlap between samples.</p>
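            <p>The enrichment score used in this analysis is defined in the legend of Figure 3 as the ratio of the observed overlap between a group of target genes and the differentially expressed genes to the overlap expected at random. A minimal Python sketch of that ratio follows. It assumes that the expected overlap is computed against a single background gene list, a choice the text does not specify, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
def genes_in_m_range(gene_to_m, low, high):
    """Genes whose associated peak M value falls in the interval [low, high)."""
    return {gene for gene, m in gene_to_m.items() if low <= m < high}


def enrichment_score(group_genes, de_genes, background_genes):
    """Observed overlap divided by the overlap expected at random.

    The expected overlap assumes genes are drawn uniformly from the
    background list (an assumption of this sketch).
    """
    background = set(background_genes)
    if not background:
        raise ValueError("background gene list is empty")
    group = set(group_genes) & background
    de = set(de_genes) & background
    expected = len(group) * len(de) / len(background)
    if expected == 0:
        raise ValueError("expected overlap is zero; score is undefined")
    return len(group & de) / expected
```
]]></p>
            <p>A score above 1 indicates that the group contains more differentially expressed genes than expected by chance, and a score below 1 indicates depletion.</p>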
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Quantitative differences in H3K4me3 marks between two cell lines are strongly correlated with cell type-specific expression of peak targets</p></caption><text>
   <p><b>Quantitative differences in H3K4me3 marks between two cell lines are strongly correlated with cell type-specific expression of peak targets. (a) </b>Enrichment of the target genes of all common H3K4me3 peaks in H1 ES cells and K562 cells in cell type-specifically expressed genes as identified by SAM (see Materials and methods). The target genes were grouped by the <it>M </it>values of nearby peaks and the enrichment scores were calculated as the ratio of the overlap between target genes grouped by <it>M </it>value and differentially expressed genes to the overlap expected at random. <b>(b,c) </b>Enrichment of the target genes of all unique H3K4me3 peaks in H1 ES cells (b) or K562 cells (c) in cell type-specifically expressed genes.</p>
</text><graphic file="gb-2012-13-3-r16-3"/></fig>
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>Hierarchical clustering of the <it>M </it>value and motif scores in all H3K27ac peaks of H1 ES cells and K562 cells</p></caption><text>
   <p><b>Hierarchical clustering of the <it>M </it>value and motif scores in all H3K27ac peaks of H1 ES cells and K562 cells. (a,b) </b>Hierarchical clustering was applied to the correlation coefficients of <it>M </it>values (= log<sub>2 </sub>(Read density in H1 ES/Read density in K562)) or -<it>M </it>values (= log<sub>2 </sub>(Read density in K562/Read density in H1 ES)) of all H3K27ac peaks identified in H1 ES cells (a) or K562 cells (b), with motif scores determined for 130 JASPAR vertebrate core motifs in the peak regions. Only the motifs significantly enriched in the peaks of either cell type are shown here (enrichment score > 1.2 and Bonferroni corrected <it>P</it>-value &lt; 1e-5 by Fisher exact test). The names of the motifs closely clustered with <it>M </it>value or -<it>M </it>value are colored in red.</p>
</text><graphic file="gb-2012-13-3-r16-4"/></fig>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Supplementary text</b>. Includes text on the use of MAnorm normalized read density to determine whether the peak calling cutoff is comparable between two ChIP-Seq data sets; results of downstream analyses following MAnorm are robust to different peak cutoffs; integrating multiple replicates in ChIP-Seq data set comparison; derivation of the <it>P</it>-value that quantifies the significance of differential binding at peak regions; using MAnorm to compare H3K36me3 ChIP-Seq data; assessing the effect of number of common peaks used in analysis; comparing signal-to-noise ratio before and after normalization; Supplementary Methods.</p>
               </text>
               <file name="gb-2012-13-3-r16-S3.DOC">
   <p>Click here for file</p>
</file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Identification of cell type-specific regulators directly associated with differential binding</p>
            </st>
            <p>A conventional strategy to identify cell type-specific regulators associated with changes in epigenetic marks relies on the identification of transcription factor binding sites that are highly enriched in unique peak regions. This method often yields multiple candidates, and thus complicates the identification of key regulators associated with the differences in epigenetic marks in each cell type. One advantage of the continuous <it>M </it>value determined by MAnorm is that it can be used to identify potential regulators driving cell type-specific epigenetic modifications. To do so, we searched for motifs that show the highest correlation with <it>M </it>values for all peaks. For example, we compared H1 ES and K562 cell lines for differences in H3K27ac, a histone mark that serves as an indicator of both active promoters and cell type-specific enhancers <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. We found that OCT4 (POU5F1) and SOX2 binding motifs were closely clustered with the <it>M </it>value (= log<sub>2 </sub>(H3K27ac read density in H1 ES cells/H3K27ac read density in K562 cells)) of H3K27ac peaks (Figure <figr fid="F4">4a</figr>), suggesting the corresponding factors are closely related to the activation of ES cell-specific genes and <it>cis</it>-elements. In contrast, the -<it>M </it>value (= log<sub>2 </sub>(H3K27ac read density in K562 cells/H3K27ac read density in H1 ES cells)) formed a compact module with the binding motifs for transcription factors GATA1 and SCL (TAL1) (Figure <figr fid="F4">4b</figr>), suggesting their roles as regulators favoring H3K27ac modification in K562 cells. These findings are consistent with the established roles of OCT4-SOX2 in ES cell self-renewal <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> and GATA-SCL in hematopoiesis and leukemogenesis <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. 
On the other hand, several motifs, including MYC and ETS motifs (for example, ELK1, ELK4, GABPA), were highly enriched in both peak sets, but showed no association with the differential binding of H3K27ac (that is, with the <it>M </it>value); this indicates they are involved in H3K27ac modification in a non-cell type-specific manner. This finding in turn supports the working assumption of our model that binding at most common peaks is determined by similar mechanisms. Furthermore, upon comparison of the H3K27ac marks in H1 ES or K562 cells with those in HeLaS3 cells, these same transcription factor motifs were tightly associated with the H1 ES or K562-specific enrichment of H3K27ac marks at the corresponding target regions (Supplementary Figure <figr fid="F5">5</figr> in Additional file <supplr sid="S2">2</supplr>), indicating the clustering results are robust. Thus, MAnorm serves as a powerful tool to uncover transcription factor motifs and factors critical for cell-specific gene regulation.</p>
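            <p>The first step of this motif analysis, correlating per-peak motif scores with <it>M </it>values, can be sketched in Python as follows. The sketch covers only the correlation and ranking; the hierarchical clustering shown in Figure 4 is not reproduced. The use of the Pearson coefficient and all function names are assumptions made for illustration.</p>
            <p><![CDATA[
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_motifs_by_correlation(m_values, motif_scores):
    """Rank motifs from most positively to most negatively correlated with M.

    motif_scores maps a motif name to one score per peak, listed in the
    same peak order as m_values.
    """
    corr = {name: pearson(m_values, scores) for name, scores in motif_scores.items()}
    return sorted(corr.items(), key=lambda item: item[1], reverse=True)
```
]]></p>
            <p>Motifs at the top of the ranking track binding that is stronger in the first sample, those at the bottom track binding that is stronger in the second sample, and motifs with coefficients near zero behave like the MYC and ETS motifs described above.</p>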
            <fig id="F5"><title><p>Figure 5</p></title><caption><p>Comparison of c-Myc ChIP-Seq data between HeLaS3 and K562 cell lines</p></caption><text>
   <p><b>Comparison of c-Myc ChIP-Seq data between HeLaS3 and K562 cell lines. (a) </b>Venn diagram showing the overlap of c-Myc binding site peaks between HeLaS3 and K562 cell lines. The overlap of c-Myc peaks between the two cell lines was 65-fold greater than that observed for random permutations of the peak sets. <b>(b,c) </b>Hierarchical clustering of correlation coefficients of <it>M </it>value or -<it>M </it>value of all c-Myc peaks in HeLaS3 cells (b) and K562 cells (c) with the motif scores in the corresponding peak regions. Only significantly enriched motifs are shown. <b>(d) </b>Scatter plot of the <it>M </it>values determined for c-Myc binding versus the <it>M </it>values for H3K27ac based on ChIP-Seq comparisons between HeLaS3 and K562 cell lines.</p>
</text><graphic file="gb-2012-13-3-r16-5"/></fig>
         </sec>
         <sec>
            <st>
               <p>Differences in c-Myc binding between HeLaS3 and K562 cells</p>
            </st>
            <p>The oncogene Myc (c-Myc) is an important transcriptional regulator in both ES cells and cancer cells <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Mechanisms underlying its cell type-specific binding are largely unknown. We applied MAnorm to quantify differential binding of c-Myc in HeLaS3 and K562 cells and explored its relationship with other factors. Using a <it>P</it>-value cutoff of 1E-6, 18,924 peaks were detected in the c-Myc ChIP-Seq data set of HeLaS3 cells, and 13,140 peaks were detected in K562 cells; approximately 6,000 peaks were common to both cell lines (Figure <figr fid="F5">5a</figr>). MAnorm largely removed the global dependence of <it>M </it>on <it>A </it>(Supplementary Figure <figr fid="F6">6a, b</figr> in Additional file <supplr sid="S2">2</supplr>). A significant fraction of c-Myc peaks were associated with <it>M </it>values far from zero, suggesting that c-Myc has a large number of differential binding loci between HeLaS3 cells and K562 cells. To search for cell line-specific co-factors that might contribute to such differential binding, we performed hierarchical clustering between the <it>M </it>statistics inferred by MAnorm and the motif scores in the c-Myc binding peaks. The c-Myc motif was highly enriched in both sets of c-Myc peaks (data not shown), but did not show significant correlation with <it>M </it>statistics in either clustering map (Figure <figr fid="F5">5b, c</figr>), indicating that the c-Myc motif is not responsible for the cell line-differential binding seen for c-Myc. Of note, the <it>M </it>statistic (= log<sub>2 </sub>(c-Myc read density in K562/c-Myc read density in HeLaS3)) clustered with the motifs of two other factors, GATA1 and SCL (TAL1) (Figure <figr fid="F5">5b</figr>), and the -<it>M </it>statistic (= log<sub>2 </sub>(c-Myc read density in HeLaS3/c-Myc read density in K562)) clustered with the motifs of AP1 and TEAD1 (Figure <figr fid="F5">5c</figr>). 
Strikingly, these clustering patterns were highly similar to those obtained from the comparison of the H3K27ac mark between these two cell types (Supplementary Figure <figr fid="F5">5b</figr> in Additional file <supplr sid="S2">2</supplr>), suggesting an underlying correlation between the cell type-specific binding of c-Myc and the H3K27ac mark. To test whether this was the case, we mapped c-Myc binding sites to gene promoters, and found that for the 9,013 genes targeted by both c-Myc and H3K27ac, the Pearson correlation coefficient between the <it>M </it>statistics of c-Myc and H3K27ac was 0.73 (Figure <figr fid="F5">5d</figr>), lending further support to our clustering result.</p>
            <fig id="F6"><title><p>Figure 6</p></title><caption><p>Comparison of different normalization models</p></caption><text>
   <p><b>Comparison of different normalization models. (a-c) </b>MA plot of H3K27ac peaks in H1 ES cells and K562 cells after normalization by total reads (a), quantile normalization (b) and genome-wide MA plot followed by LOWESS regression (c). The corresponding MA plot based on MAnorm is shown in Supplementary Figure 2a in Additional file <supplr sid="S2">2</supplr>. <b>(d-g) </b>Scatter plot of log<sub>2 </sub>expression ratios of target genes between H1 ES cells and K562 cells versus the <it>M </it>values normalized by total reads (d), quantile normalization (e), genome-wide MA plot followed by LOWESS normalization (f), and MAnorm (g). The color bar represents the density of dots in the scatter plot and purple dots represent the outliers separated from the others. <b>(h) </b>Distribution of <it>M </it>values for each normalization method and distribution of log<sub>2 </sub>expression ratios of non-differentially expressed target genes (fold-change &lt; 1.5). The t-statistics and <it>P</it>-values from a one-sample Student's <it>t</it>-test against 0 for each normalization method were as follows: MAnorm, t-statistic = -0.55 and <it>P </it>= 0.58; total reads normalization, t-statistic = -88 and <it>P </it>&lt; 1E-100; quantile normalization, t-statistic = -140 and <it>P </it>&lt; 1E-100; genome-wide MA, t-statistic = 24 and <it>P </it>&lt; 1E-100. For non-differentially expressed target genes, t-statistic = -0.76 and <it>P </it>= 0.45.</p>
</text><graphic file="gb-2012-13-3-r16-6"/></fig>
         </sec>
         <sec>
            <st>
               <p>Application to the integration of ChIP-Seq replicates</p>
            </st>
            <p>Integrating ChIP-Seq data from multiple biological replicates, which in some cases are generated by different laboratories and/or using different platforms, may be employed to reduce the false positive rate in identified binding sites. A simple approach is to define a stringent set of peaks composed only of the common peaks shared by two or more replicates. However, this method is highly sensitive to the peak-calling cutoff and may exclude peaks that have similar ChIP intensities between replicates. Moreover, some common peaks that show dramatic differences in read density are retained. Therefore, to make full use of the information in biological replicates, a quantitative comparison of peak intensity is particularly useful. We have applied MAnorm to compare two replicates of H1 ES cell H3K27ac ChIP-Seq data. After application of MAnorm (Supplementary Figure 7a, b in Additional file <supplr sid="S2">2</supplr>), many of the unique peaks were associated with <it>M </it>values close to zero, indicating that these peaks exhibit good reproducibility between replicates. On the other hand, there remained a small fraction of common peaks with <it>M </it>values far from zero, representing strong signal differences between replicates. Next, we showed that the <it>M </it>value between replicates is a good indicator of H3K27ac target gene expression. We grouped H3K27ac target genes by the absolute value of <it>M </it>statistics and calculated the expression distribution of each gene group. Given that H3K27ac marks are positively associated with gene expression, we anticipated that more highly expressed genes would have stronger H3K27ac marks, and that the corresponding peaks would therefore be more reliable.
In fact, we observed that genes having higher expression tend to be the targets of H3K27ac peaks with lower absolute <it>M </it>values, that is, peaks showing smaller difference between replicates, for both common peaks and unique peaks (Supplementary Figure 7c-e in Additional file <supplr sid="S2">2</supplr>). Furthermore, by overlapping the above set of ENCODE peaks with H3K27ac peaks for H1 ES cells generated in a different laboratory <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, we found that a much lower proportion of the peaks with |<it>M</it>| &gt; 1 were covered by the new peak set than those with |<it>M</it>| &lt; 1 (Supplementary Figure 7f in Additional file <supplr sid="S2">2</supplr>). This suggests that |<it>M</it>| = 1 can also be used as an empirical cutoff to filter unreliable peaks. Thus, MAnorm can be used both to check whether two replicates are concordant, and also to obtain high confidence peak lists by filtering out inconsistent peaks. Compared with arbitrary removal of unique peaks, MAnorm allows for better use of replicate peak data. The MAnorm package (Additional file <supplr sid="S1">1</supplr>) provides the opportunity to list concordant and non-concordant peaks between two samples based on user-specified cutoffs, with the concordant peak list corresponding to high-confidence peaks.</p>
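            <p>The filtering step described above can be sketched as follows. This is a minimal illustration, not the code of the MAnorm package: the function name <it>split_by_concordance</it>, the (name, <it>M</it>) input format, and the choice to treat peaks with |<it>M</it>| exactly equal to the cutoff as non-concordant are all assumptions made for the example.</p>

```python
def split_by_concordance(peaks, cutoff=1.0):
    """Split (name, M) pairs into concordant and non-concordant peak lists.

    Hypothetical helper: peaks with |M| below the cutoff are treated as
    concordant between replicates; all others as non-concordant.
    """
    concordant, discordant = [], []
    for name, m_value in peaks:
        if abs(m_value) >= cutoff:
            discordant.append(name)
        else:
            concordant.append(name)
    return concordant, discordant


if __name__ == "__main__":
    example = [("peak1", 0.2), ("peak2", -1.5), ("peak3", 1.0)]
    print(split_by_concordance(example))
```

            <p>With the default cutoff of 1, the concordant list corresponds to the high-confidence peaks discussed in the text; a different cutoff can be passed when replicates are noisier.</p>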
         </sec>
         <sec>
            <st>
               <p>Comparison with other methods</p>
            </st>
            <p>We compared the performance of MAnorm with three widely used normalization methods that use genome-wide signals as reference, namely, normalization by total reads, quantile normalization, which assumes the genome-wide distribution of read densities to be the same across samples, and normalization using a genome-wide MA plot followed by LOWESS regression. We used all four methods to compare H3K27ac ChIP-Seq data between H1 ES and K562 cells. The MA plot normalized by MAnorm (Supplementary Figure <figr fid="F2">2a</figr> in Additional file <supplr sid="S2">2</supplr>) was relatively symmetric, while corresponding plots obtained by the other three normalization methods remained highly asymmetric. Of note, the common peaks showed a clear global bias towards stronger binding in K562 cells for total read normalization and quantile normalization (Figure <figr fid="F6">6a, b</figr>) and toward H1 ES cells for genome-wide MA plot normalization (Figure <figr fid="F6">6c</figr>). To examine which method better reflects a true biological signal, we compared <it>M </it>values normalized by all four methods with the expression change of target genes. If a specific type of histone modification is closely related to gene regulation, the direction of histone modification change should be consistent with that of the change in expression of the target genes. By visual inspection, we found this was true for the <it>M </it>values normalized by MAnorm (Figure <figr fid="F6">6g</figr>). In contrast, <it>M </it>values normalized by the other three methods were inconsistent with the log2-expression ratios of target genes (Figure <figr fid="F6">6d-f</figr>). 
Specifically, most of the genes with no change in H3K27ac levels (<it>M </it>= 0) had higher (total read and quantile normalization) or lower (genome-wide MA plot normalization) expression in H1 ES cells compared to K562 cells; while the majority of the genes expressed at similar levels in these two cell types were associated with negative (total read and quantile normalization) or positive (genome-wide MA plot normalization) <it>M </it>values, that is, they had higher (total read and quantile normalization) or lower (genome-wide MA plot normalization) levels of H3K27ac in K562 cells.</p>
            <p>To quantitatively measure the bias of the <it>M </it>values given by the above normalization methods, we first collected non-differentially expressed genes (fold-change &lt; 1.5) between H1 ES cells and K562 cells. As shown in Figure <figr fid="F6">6h</figr>, these genes are indeed not differentially expressed (t-statistic = -0.76 and <it>P</it>-value = 0.45 by Student's <it>t</it>-test in comparison to an expression ratio of 1 (<it>M </it>= 0)), indicating they are suitable for our comparison. Since H3K27ac marks are closely associated with transcriptional activation, it is reasonable to assume that these non-differentially expressed genes should exhibit similar global H3K27ac levels. This is true only for H3K27ac levels determined by MAnorm, where the <it>M </it>values for H3K27ac of the non-differentially expressed target genes were not significantly different from a ratio of 1 (<it>M </it>= 0; t-statistic = -0.55 and <it>P</it>-value = 0.58 by <it>t</it>-test; Figure <figr fid="F6">6h</figr>, red curve). In contrast, <it>M </it>values for H3K27ac obtained by the other three normalization methods exhibited large deviations from <it>M </it>= 0 (absolute t-statistics ranging from 24 to 140 and <it>P</it>-value &lt; 1e-100; Figure <figr fid="F6">6h</figr>). Thus, MAnorm exhibits superior performance in identifying authentic biological changes.</p>
            <p>We also compared the performance of MAnorm in detecting differential binding regions in ChIP-Seq data sets with that of two currently used statistical methods, ChIPdiff <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and MACS <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. For this analysis, one data set was used as the sample and the other was used as the control in order to detect regions with significantly elevated ChIP-Seq signals in the first data set <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. We applied all three methods to compare ChIP-Seq data for H3K27ac marks between H1 ES cells and K562 cells (Supplementary Table 1 in Additional file <supplr sid="S4">4</supplr>). ChIPdiff and MACS identified four to six times more target regions associated with significantly increased ChIP-Seq signals for K562 cells compared with those found for H1 ES cells, whereas MAnorm yielded a similar number of cell type-biased peaks in each cell line. To compare the enrichment of cell type-specifically expressed genes in the sets of target genes of the differential binding regions discovered by the three methods, we selected the same number of target genes associated with top differential binding regions identified by each method. The target genes of top differential binding regions identified by MAnorm contained similar numbers of H1 ES cell highly expressed genes but a greater number of K562 cell highly expressed genes compared to those identified by ChIPdiff and MACS (Supplementary Table 1 in Additional file <supplr sid="S4">4</supplr>), suggesting MAnorm performs better in detecting differential binding regions than the other two methods. Importantly, the fold changes of differential binding given by ChIPdiff and MACS were based on the total number of reads, which may not be appropriate, as discussed above.
Additionally, MAnorm showed even better enrichment of cell type-specifically expressed genes in differential binding region targets than the method developed by Taslim <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp> when applied to ChIP-Seq data presented in their study (Supplementary Table 2 in Additional file <supplr sid="S4">4</supplr>).</p>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>Supplementary Tables - comparison of MAnorm with other methods</b>.</p>
               </text>
               <file name="gb-2012-13-3-r16-S4.XLS">
   <p>Click here for file</p>
</file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Normalization methods are typically based on the assumption that certain properties are invariant across samples. For example, quantile normalization in gene expression microarrays renders the distribution of expression levels of all genes constant between samples <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Alternatively, normalization may be based on housekeeping genes, whose expression is presumed to remain constant across samples. The situation is quite different in ChIP-Seq studies, since the binding of most chromatin-associated proteins is highly dynamic and cell type-dependent. Thus, it is arbitrary to assume that the genome-wide distribution of ChIP-Seq signals remains constant between samples. It is also challenging to identify reliable control genomic regions bound by a chromatin-associated protein in a non-cell type-specific manner that can serve as an internal reference for normalization. Yet another difficulty underlying ChIP-Seq studies is background noise, which is often difficult to distinguish from authentic ChIP signals. Furthermore, the S/N ratio often varies across samples. These same issues apply to DNase-Seq data sets, as discussed elsewhere <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In many peak-calling models, the distribution of background signal is used to normalize sample and control data, which is reasonable when control data are composed mainly of background signal, and the purpose is to identify sequence read-enriched regions within a sample that show significant differences compared to the background. However, this approach is inappropriate for sample-to-sample comparisons, especially when the S/N difference is large across samples. For example, samples relatively free of 'noise' will yield a larger number of statistically significant peaks compared to samples with a higher level of background sequence reads, but these additional peaks may not be true cell line-specific or condition-specific peaks.
In MAnorm, we focused only on regions identified as significant peaks, and thus minimized the impact of S/N differences between samples. Accordingly, the output of MAnorm focuses on peak regions most likely to be of biological relevance.</p>
         <p>MAnorm shows improved performance when compared with other methods currently used to detect differential binding regions between ChIP-Seq data sets. More importantly, MAnorm provides a quantitative measurement of binding differences, which reflects authentic biological differences. This feature is an asset for downstream analysis, including expression assays and transcription co-factor identification studies. Although the definition of ChIP-Seq peaks is highly dependent on the cutoff used in peak calling, MAnorm is robust to cutoff selection (Supplementary Figure 8 in Additional file <supplr sid="S2">2</supplr> and Additional file <supplr sid="S3">3</supplr>). Furthermore, the normalized read densities of each peak in both ChIP-Seq samples can be calculated from the (<it>M, A</it>) values normalized by MAnorm, and then used to evaluate whether the cutoffs used to define peaks are comparable between the ChIP-Seq samples being compared (Supplementary Figure 8 in Additional file <supplr sid="S2">2</supplr> and Additional file <supplr sid="S3">3</supplr>).</p>
         <p>MAnorm relies on two working assumptions. First, MAnorm is designed for quantitative comparison of ChIP-Seq data sets that have a substantial number of peak regions in common. Second, MAnorm postulates that there are no global changes in the true ChIP signals at these common peaks. We believe these underlying hypotheses are widely applicable and do not significantly restrict the use of MAnorm, as exemplified by our application of MAnorm to elucidate hormone-regulated, cell state-specific transcription factor binding in mouse liver <it>in vivo </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. For ChIP-seq samples for which there is not a significant overlap in peak sets, the binding of chromatin-associated proteins could be uncorrelated or even anti-correlated at a genome-wide scale and MAnorm would not be applicable. However, in that case a quantitative comparison would likely not be that useful. In addition, in cases where the binding patterns for a chromatin-bound factor change widely across the genome, such as following knock down of a core subunit of a chromatin-associated protein complex <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, more specific analysis would be required to quantitatively determine the global changes.</p>
         <p>The pairwise approach to comparison of ChIP-Seq samples proposed here can be extended to multiple sample comparison, as was successfully demonstrated in the case of two-channel microarray data analysis <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Furthermore, it is well known that transcription factors and epigenetic modifications act together to modulate gene expression <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Most recently, statistical models have been developed to study such combinatorial patterns in a genome-wide fashion <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. However, how changes in epigenetic marks and transcriptional factors correlate with each other across cell lines is still largely unexplored. In this study, we used MAnorm to successfully detect an underlying correlation between cell-type dependent binding of c-Myc and the H3K27ac mark in two disease-related cell types. Thus, it will be interesting to integrate quantitative changes of other epigenetic marks and transcriptional factors for further elucidation of the complex mechanisms underlying cell type-specific regulation.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>MAnorm exhibited excellent performance in quantitative comparison of ChIP-Seq data sets for both epigenetic modifications and transcription factors; the quantitative binding differences inferred by MAnorm were highly correlated with both the changes in expression of target genes and also the binding of cell type-specific regulators. With the accumulation of ChIP-seq data sets, MAnorm should serve as a powerful tool for obtaining a more comprehensive understanding of cell type-specific and cell state-specific regulation during organism development and disease onset.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p>The workflow of MAnorm is summarized in Figure <figr fid="F1">1</figr>. First, four BED files that describe the coordinates of all predefined peaks and aligned sequence reads of two ChIP-Seq samples are used as input. Second, MAnorm calculates the number of reads in a window of the same length centered at the summit of each peak. Here the window size should be comparable to the median length of ChIP-enriched regions; we recommend a window size of 2,000 bp for histone modifications and 1,000 bp for transcription factor binding sites. The (<it>M, A</it>) value of each peak is then defined as:</p>
         <p>
            <display-formula id="M1">
               <m:math name="gb-2012-13-3-r16-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>M</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">log</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>R</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">/</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>R</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
            </display-formula>
         </p>
         <p>and:</p>
         <p>
            <display-formula id="M2">
               <m:math name="gb-2012-13-3-r16-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>A</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">log</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>R</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">&#215;</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>R</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-bin">/</m:mo>
   <m:mn>2</m:mn>
</m:mrow>
</m:math>
            </display-formula>
         </p>
         <p>Here, <it>R<sub>1 </sub></it>is the read density at this peak region in ChIP-Seq sample 1 and <it>R<sub>2 </sub></it>is the corresponding read density in sample 2. To avoid log<sub>2</sub>0, we added a value of 1 to the real number of reads for all peaks. Thus, the value of <it>M </it>describes the log<sub>2 </sub>fold change of the read density at a peak region between two samples, while <it>A </it>represents the average signal intensity in terms of log<sub>2</sub>-transformed read density. To build the normalization model, each peak of the two samples being compared was further classified as a common or a unique peak, depending on whether or not it overlapped (by at least one nucleotide, as implemented in our analysis in this study) with any peak in the other sample. The downloadable MATLAB MAnorm package (Additional file <supplr sid="S1">1</supplr>) also provides a parameter for users to select common peaks based on a cutoff of peak summit-to-summit distance. By default, this value is set to 500 bp for histone modifications and 250 bp for transcription factors. In addition, when a peak overlaps with multiple peaks in the other sample, MAnorm selects the peak with the smallest summit-to-summit distance to avoid potential bias in building the normalization model. Next, robust regression was applied to the <it>M</it>-<it>A </it>values of common peaks using iterative re-weighted least squares with a bi-square weighting function <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and a linear model was derived to fit the global dependence between the <it>M-A </it>values of these peaks:</p>
         <p>
            <display-formula id="M3">
               <m:math name="gb-2012-13-3-r16-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>M</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>a</m:mi>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mi>b</m:mi>
   <m:mo class="MathClass-bin">&#215;</m:mo>
   <m:mi>A</m:mi>
</m:mrow>
</m:math>
            </display-formula>
         </p>
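         <p>The computation of (<it>M, A</it>) values and the robust linear fit on common peaks can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code of the MAnorm package: the function names are hypothetical, the tuning constant 4.685 and the scale estimate (median absolute residual divided by 0.6745) are common defaults for bi-square weighting that are assumed here rather than taken from the paper, and only the Python standard library is used.</p>

```python
import math
import statistics


def ma_values(reads1, reads2):
    """Return lists of M and A per peak, adding a pseudocount of 1 to each count."""
    m_vals, a_vals = [], []
    for n1, n2 in zip(reads1, reads2):
        r1, r2 = n1 + 1.0, n2 + 1.0
        m_vals.append(math.log2(r1 / r2))
        a_vals.append(math.log2(r1 * r2) / 2.0)
    return m_vals, a_vals


def weighted_line(a_vals, m_vals, weights):
    """Weighted least-squares fit of M = intercept + slope * A."""
    total = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, a_vals)) / total
    mean_m = sum(w * m for w, m in zip(weights, m_vals)) / total
    sxx = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_vals))
    sxy = sum(w * (a - mean_a) * (m - mean_m)
              for w, a, m in zip(weights, a_vals, m_vals))
    slope = sxy / sxx
    return mean_m - slope * mean_a, slope


def robust_line(a_vals, m_vals, n_iter=50, tune=4.685):
    """Iteratively re-weighted least squares with bi-square weights (assumed defaults)."""
    weights = [1.0] * len(a_vals)
    intercept, slope = weighted_line(a_vals, m_vals, weights)
    for _ in range(n_iter):
        resid = [m - (intercept + slope * a) for a, m in zip(a_vals, m_vals)]
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:
            break  # perfect fit on at least half of the points
        # Bi-square weight: (1 - u^2)^2 inside the cutoff, 0 outside it.
        weights = [max(0.0, 1.0 - (r / (tune * scale)) ** 2) ** 2 for r in resid]
        new_intercept, new_slope = weighted_line(a_vals, m_vals, weights)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if 1e-10 >= change:
            break
    return intercept, slope
```

         <p>Because the weights shrink to zero for peaks far from the current fit, a minority of common peaks with strong differential binding has little influence on the fitted line, which is the motivation for using robust rather than ordinary regression here.</p>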
         <p>To normalize the (<it>M, A</it>) values of all peaks, MAnorm performed coordinate transformation to make the <it>A </it>axis overlap with the linear model derived from regression. The corresponding (<it>M, A</it>) value under the new coordinate system was then taken as the normalized (<it>M, A</it>) value of each peak. Finally, a <it>P</it>-value associated with each peak was calculated to quantify the significance of differential binding at this locus using a Bayesian model developed by Audic and Claverie <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>:</p>
         <p>
            <display-formula>
               <m:math name="gb-2012-13-3-r16-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>y</m:mi>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>x</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mi>y</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mstyle mathvariant="italic">
      <m:mo class="MathClass-punc">!</m:mo>
   </m:mstyle>
   <m:mo class="MathClass-bin">/</m:mo>
   <m:mi>x</m:mi>
   <m:mstyle mathvariant="italic">
      <m:mo class="MathClass-punc">!</m:mo>
   </m:mstyle>
   <m:mi>y</m:mi>
   <m:mstyle mathvariant="italic">
      <m:mo class="MathClass-punc">!</m:mo>
   </m:mstyle>
   <m:msup>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>x</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mi>y</m:mi>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msup>
</m:mrow>
</m:math>
            </display-formula>
         </p>
         <p>in which x and y specify the normalized read count at this peak in sample 1 and sample 2, respectively. Additional file <supplr sid="S3">3</supplr> provides further details on <it>P</it>-value calculations. When the read densities at most peak regions are high, most peaks associated with absolute <it>M </it>values &gt; 1 are associated with significant <it>P</it>-values. In that case, the <it>M </it>value can be used to rank peaks and select differential binding regions, as was done in analyzing ENCODE ChIP-Seq data (Supplementary Table 1 in Additional file <supplr sid="S4">4</supplr>). When read densities at most peak regions are relatively low, some of the peaks associated with absolute <it>M </it>values &gt; 1 may still fail to obtain significant <it>P</it>-values. In such a case, we suggest ranking peaks by <it>P</it>-values and defining differential binding regions using combined cutoffs of both <it>M </it>value and <it>P</it>-value, as we did when analyzing the ChIP-Seq data from Taslim <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp> (Supplementary Table 2 in Additional file <supplr sid="S4">4</supplr>).</p>
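         <p>The normalization and <it>P</it>-value steps can be sketched as follows. Both functions are hypothetical illustrations rather than the package code. The first assumes one particular way of mapping the fitted line <it>M</it> = <it>a</it> + <it>b</it> x <it>A</it> onto <it>M</it> = 0, namely rescaling the log<sub>2</sub> read density of sample 1 and leaving sample 2 unchanged; the paper describes the step only as a coordinate transformation. The second evaluates the Audic and Claverie expression for equal-sized samples, p(y|x) = (x + y)! / (x! y! 2^(x + y + 1)), in log space; any further steps described in Additional file 3 are not reproduced.</p>

```python
import math


def normalize_ma(log2_r1, log2_r2, intercept, slope):
    """Rescale sample 1 so that the fitted line M = intercept + slope * A maps to M = 0.

    Assumed transformation: points on the fitted line get the same log2
    density in both samples after rescaling.
    """
    log2_r1_new = ((2.0 - slope) * log2_r1 - 2.0 * intercept) / (2.0 + slope)
    m_norm = log2_r1_new - log2_r2
    a_norm = (log2_r1_new + log2_r2) / 2.0
    return m_norm, a_norm


def audic_claverie_p(x, y):
    """Evaluate (x + y)! / (x! * y! * 2**(x + y + 1)) using log-gamma for stability."""
    log_p = (math.lgamma(x + y + 1.0)
             - math.lgamma(x + 1.0)
             - math.lgamma(y + 1.0)
             - (x + y + 1.0) * math.log(2.0))
    return math.exp(log_p)


if __name__ == "__main__":
    # A peak lying exactly on the fitted line should have normalized M close to 0.
    print(normalize_ma(4.0, 1.6, intercept=1.0, slope=0.5))
    print(audic_claverie_p(1, 1))
```

         <p>Working with log-gamma avoids overflow of the factorials at read counts typical of strong peaks, and it also accepts the non-integer counts that result from normalization.</p>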
         <p>The output of MAnorm includes the normalized (<it>M, A</it>) value and the corresponding <it>P</it>-value of each peak. To illustrate the normalization process, the (<it>M, A</it>) values of all peaks before and after normalization are plotted together with the linear model derived from common peaks. The MAnorm package will also generate three bed files presenting the genome coordinates for the non-differential binding region and two differential binding regions based on user-specified cutoffs, together with two wig files (corresponding to the two peak lists under comparison) that can be uploaded to a genome browser for visualization of the <it>M </it>value for each peak (Supplementary Figure 9). MATLAB and R versions of the MAnorm package are available for downloading in Additional file <supplr sid="S1">1</supplr>.</p>
         <sec>
            <st>
               <p>Application of MAnorm to ENCODE ChIP-Seq data</p>
            </st>
            <p>The performance of MAnorm was tested using ENCODE ChIP-Seq data describing histone modifications (H3K4me3 and H3K27ac) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and transcription factor binding (c-Myc and Pol II) <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> across three human cell lines: H1 ES cells, HeLaS3 cells, and K562 cells <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Since these data were generated and processed by different laboratories associated with the ENCODE project, the data sets were reanalyzed and the ChIP-Seq peaks in each sample were redefined using MACS <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> with a <it>P</it>-value cutoff of 1e-10 for histone modifications and a <it>P</it>-value cutoff of 1e-6 for transcription factor binding. The peaks of histone modifications were further filtered by the false discovery rate (FDR) values modeled by MACS. The target genes of each group of peaks were defined as those RefSeq genes that have a given peak(s) in the promoter region, defined as the region from 8 kb upstream to 2 kb downstream of the transcription start site.</p>
            <p>Gene expression data for all three cell types were collected from the Gene Expression Omnibus (GEO) database using accession numbers [GEO:GSE26312] (for H1 ES cells) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, [GEO:GSE2735] (for HeLaS3 cells) <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> and [GEO:GSE12056] (for K562 cells) <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, and the raw data were reprocessed by dChip <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. The differentially expressed genes were subsequently identified by Significance Analysis of Microarrays (SAM) <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> using a combined cutoff of fold change &gt; 2 and FDR &lt; 0.01. In total, 3,465 genes more highly expressed in H1 ES cells and 2,224 genes more highly expressed in K562 cells were identified from the H1 ES to K562 comparison; 5,815 genes more highly expressed in H1 ES cells and 1,649 genes more highly expressed in HeLaS3 cells were identified from the H1 ES cell to HeLaS3 cell comparison; and 3,555 genes more highly expressed in HeLaS3 cells and 5,916 genes more highly expressed in K562 cells were identified from the HeLaS3 cell to K562 cell comparison. To study the relationship between binding differences in peak regions and the expression change of the corresponding target genes, we used the <it>M </it>values of peaks to divide the targeted genes into different groups separated by integer <it>M </it>values from -4 to 4, and then calculated the enrichment score of the overlap between each gene group and those differentially expressed genes. To avoid extreme enrichment scores, groups composed of &lt; 50 genes were merged with the larger of the adjacent two gene groups.</p>
         </sec>
         <sec>
            <st>
               <p>Motif scan and hierarchical clustering of motif scores with peak <it>M </it>value</p>
            </st>
            <p>To detect the potential binding of transcription factors in defined peak regions, we downloaded the position weight matrices of 130 core vertebrate motifs from the JASPAR database <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and performed a motif scan <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> on a 1,000 bp window centered at the peak summit. For each motif <it>M </it>(here denoting the motif, not the <it>M </it>value defined above), the raw motif matching score at each peak <it>P </it>was calculated as:</p>
            <p>
               <display-formula>
                  <m:math name="gb-2012-13-3-r16-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:munder class="msub">
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">max</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mi>S</m:mi>
         <m:mo class="MathClass-rel">&#8712;</m:mo>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:munder>
   <m:mfenced separators="" open="[" close="]">
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">log</m:mtext>
         </m:mstyle>
         <m:mfrac>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>M</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>B</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>in which <it>S </it>is a sequence fragment of the same length as the motif and <it>B </it>is the background frequency of different nucleotides estimated from 10,000 random 1,000 bp sequences sampled from the genome. The motif score of motif <it>M </it>in peak <it>P </it>was defined as the raw motif matching score divided by the maximum possible score, that is, the raw motif score obtained by the consensus sequence of the motif.</p>
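            <p>The motif score defined above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the motif scanner used in the study. It assumes that each position of the weight matrix is given as a dictionary of strictly positive nucleotide probabilities, that the sequence contains only A, C, G and T, and that only the given strand is scanned; whether the original scan also considered the reverse complement is not stated in the text.</p>

```python
import math


def raw_motif_score(sequence, pwm, background):
    """Maximum log-likelihood ratio log(P(S|M) / P(S|B)) over all windows S."""
    width = len(pwm)
    best = -math.inf
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(math.log(pwm[i][base] / background[base])
                    for i, base in enumerate(window))
        best = max(best, score)
    return best


def motif_score(sequence, pwm, background):
    """Raw score divided by the score of the motif's consensus sequence."""
    consensus = sum(max(math.log(column[base] / background[base]) for base in "ACGT")
                    for column in pwm)
    return raw_motif_score(sequence, pwm, background) / consensus


if __name__ == "__main__":
    toy_pwm = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(motif_score("TTAGTT", toy_pwm, uniform))
```

            <p>A window that matches the consensus exactly receives a score of 1, so scores are comparable across motifs of different length and information content.</p>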
            <p>To identify transcription factors associated with cell type-specific binding of the ChIP'd proteins, we applied hierarchical clustering with Ward's linkage to cluster the <it>M </it>value with the motif matching score of JASPAR motifs in all peaks of cell type 1, and separately the -<it>M </it>value was clustered with the motif scores in all peaks of cell type 2, using '1 - <it>&#961;</it>' as the distance metric, where <it>&#961; </it>is the Pearson correlation coefficient. Only motifs with an enrichment score &gt; 1.2 and Bonferroni-corrected <it>P</it>-value &lt; 1.0E-5 by Fisher exact test are shown in the clustering plots.</p>
         </sec>
         <sec>
            <st>
               <p>Comparing the performance of MAnorm and other methods</p>
            </st>
            <p>For total read normalization, we divided the read intensity of each peak region by the total number of mapped sequence reads. For quantile normalization, we first divided the whole genome into non-overlapping bins of the same size as the window used in MAnorm (2,000 bp for H3K27ac), and then calculated the read count in each bin. Finally, the distribution of bin read counts was normalized to be the same by matching all quantiles between samples. For normalization by genome-wide MA plots, we first divided the whole genome into non-overlapping bins of the same size as the window used in MAnorm (2,000 bp for H3K27ac), and then calculated the <it>M-A </it>value of each bin. The dependence of <it>M </it>on <it>A </it>was then removed by subtracting from each <it>M </it>value the value predicted by a local linear model fitted to the genome-wide <it>M</it>-<it>A </it>values by LOWESS regression.</p>
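            <p>Two of the reference normalizations described above can be sketched as follows. The function names are hypothetical and the code is an illustration, not the scripts used in the study. For quantile normalization, the sketch assumes that both samples are binned identically and that the shared reference distribution is the mean of the two sorted count vectors, which is one common convention; the text states only that all quantiles were matched between samples. Tied counts are ranked in input order.</p>

```python
def total_read_normalize(peak_reads, total_mapped_reads):
    """Divide the read count of each peak by the total number of mapped reads."""
    return [count / total_mapped_reads for count in peak_reads]


def quantile_normalize(counts1, counts2):
    """Give both samples the same distribution of bin counts.

    Assumed reference: the mean of the two sorted count vectors, assigned
    back to each bin according to its rank within its own sample.
    """
    if len(counts1) != len(counts2):
        raise ValueError("both samples must use the same genomic bins")
    order1 = sorted(range(len(counts1)), key=lambda i: counts1[i])
    order2 = sorted(range(len(counts2)), key=lambda i: counts2[i])
    norm1 = [0.0] * len(counts1)
    norm2 = [0.0] * len(counts2)
    for i, j in zip(order1, order2):
        reference = (counts1[i] + counts2[j]) / 2.0
        norm1[i] = reference
        norm2[j] = reference
    return norm1, norm2


if __name__ == "__main__":
    print(quantile_normalize([5, 2, 3], [4, 1, 9]))
```

            <p>Both functions rescale using genome-wide quantities only, which is why they cannot correct for differences in signal-to-noise ratio between samples in the way a fit restricted to common peaks can.</p>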
            <p>To compare the performance of MAnorm with the model developed by Taslim <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, we used MACS to identify peaks from the same Pol II ChIP-Seq datasets used by <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, and then applied MAnorm to compare Pol II binding profiles between estradiol (E2)-stimulated MCF7 cells and unstimulated MCF7 cells. The gene expression data for unstimulated and E2-stimulated MCF7 cells were obtained from the GEO database, accession number [GEO:GSE11352] <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. We identified 59 genes showing higher expression in unstimulated MCF7 cells and 130 genes showing higher expression in E2-stimulated (12 h) MCF7 cells using SAM with fold change &gt; 2 and FDR &lt; 0.1. Finally, the performance of MAnorm was evaluated by comparing the differences in Pol II binding determined by each of the two models with the differential expression of target genes.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ChIP-Seq: chromatin immunoprecipitation followed by massively parallel DNA sequencing; E2: estradiol; ES: embryonic stem; FDR: false discovery rate; GEO: Gene Expression Omnibus; S/N: signal to background noise.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>ZS and YZ conceived the study, developed the algorithms, carried out analyses and drafted the manuscript; DJW and SHO conceived the study, supervised the data analyses and edited the manuscript. All authors discussed the results and revised the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank the laboratories associated with the ENCODE project for generating and maintaining the data sets used in our analyses. We thank Aarathi Sugathan (Boston University) for sharing ideas and scripts during MAnorm BASH/R package development; we also thank Andy Rampersaud (Boston University) and Drs Jian Xu and Han Xu (Dana-Farber Cancer Institute) for many useful discussions and suggestions during the course of this study. This work was supported in part by NIH grants DK033765 (to DJW) and HG005085 (to GCY).</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>ChIP-seq: advantages and challenges of a maturing technology.</p></title><aug><au><snm>Park</snm><fnm>PJ</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2009</pubdate><volume>10</volume><fpage>669</fpage><lpage>680</lpage><xrefbib><pubidlist><pubid idtype="pmcid">3191340</pubid><pubid idtype="pmpid" link="fulltext">19736561</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>An integrated software system for analyzing ChIP-chip and ChIP-seq data.</p></title><aug><au><snm>Ji</snm><fnm>H</fnm></au><au><snm>Jiang</snm><fnm>H</fnm></au><au><snm>Ma</snm><fnm>W</fnm></au><au><snm>Johnson</snm><fnm>DS</fnm></au><au><snm>Myers</snm><fnm>RM</fnm></au><au><snm>Wong</snm><fnm>WH</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2008</pubdate><volume>26</volume><fpage>1293</fpage><lpage>1300</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1505</pubid><pubid idtype="pmcid">2596672</pubid><pubid idtype="pmpid" link="fulltext">18978777</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.</p></title><aug><au><snm>Rozowsky</snm><fnm>J</fnm></au><au><snm>Euskirchen</snm><fnm>G</fnm></au><au><snm>Auerbach</snm><fnm>RK</fnm></au><au><snm>Zhang</snm><fnm>ZD</fnm></au><au><snm>Gibson</snm><fnm>T</fnm></au><au><snm>Bjornson</snm><fnm>R</fnm></au><au><snm>Carriero</snm><fnm>N</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2009</pubdate><volume>27</volume><fpage>66</fpage><lpage>75</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1518</pubid><pubid idtype="pmcid">2924752</pubid><pubid idtype="pmpid" link="fulltext">19122651</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Model-based analysis of ChIP-Seq 
(MACS).</p></title><aug><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Liu</snm><fnm>T</fnm></au><au><snm>Meyer</snm><fnm>CA</fnm></au><au><snm>Eeckhoute</snm><fnm>J</fnm></au><au><snm>Johnson</snm><fnm>DS</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Nusbaum</snm><fnm>C</fnm></au><au><snm>Myers</snm><fnm>RM</fnm></au><au><snm>Brown</snm><fnm>M</fnm></au><au><snm>Li</snm><fnm>W</fnm></au><au><snm>Liu</snm><fnm>XS</fnm></au></aug><source>Genome Biol</source><pubdate>2008</pubdate><volume>9</volume><fpage>R137</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2008-9-9-r137</pubid><pubid idtype="pmcid">2592715</pubid><pubid idtype="pmpid" link="fulltext">18798982</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy.</p></title><aug><au><snm>Fujiwara</snm><fnm>T</fnm></au><au><snm>O&apos;Geen</snm><fnm>H</fnm></au><au><snm>Keles</snm><fnm>S</fnm></au><au><snm>Blahnik</snm><fnm>K</fnm></au><au><snm>Linnemann</snm><fnm>AK</fnm></au><au><snm>Kang</snm><fnm>YA</fnm></au><au><snm>Choi</snm><fnm>K</fnm></au><au><snm>Farnham</snm><fnm>PJ</fnm></au><au><snm>Bresnick</snm><fnm>EH</fnm></au></aug><source>Mol Cell</source><pubdate>2009</pubdate><volume>36</volume><fpage>667</fpage><lpage>681</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molcel.2009.11.001</pubid><pubid idtype="pmcid">2784893</pubid><pubid idtype="pmpid" link="fulltext">19941826</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>PHF8 mediates histone H4 lysine 20 demethylation events involved in cell cycle 
progression.</p></title><aug><au><snm>Liu</snm><fnm>W</fnm></au><au><snm>Tanasa</snm><fnm>B</fnm></au><au><snm>Tyurina</snm><fnm>OV</fnm></au><au><snm>Zhou</snm><fnm>TY</fnm></au><au><snm>Gassmann</snm><fnm>R</fnm></au><au><snm>Liu</snm><fnm>WT</fnm></au><au><snm>Ohgi</snm><fnm>KA</fnm></au><au><snm>Benner</snm><fnm>C</fnm></au><au><snm>Garcia-Bassets</snm><fnm>I</fnm></au><au><snm>Aggarwal</snm><fnm>AK</fnm></au><au><snm>Desai</snm><fnm>A</fnm></au><au><snm>Dorrestein</snm><fnm>PC</fnm></au><au><snm>Glass</snm><fnm>CK</fnm></au><au><snm>Rosenfeld</snm><fnm>MG</fnm></au></aug><source>Nature</source><pubdate>2010</pubdate><volume>466</volume><fpage>508</fpage><lpage>512</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09272</pubid><pubid idtype="pmcid">3059551</pubid><pubid idtype="pmpid" link="fulltext">20622854</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Insights into GATA-1-mediated gene activation versus repression via genome-wide chromatin occupancy analysis.</p></title><aug><au><snm>Yu</snm><fnm>M</fnm></au><au><snm>Riva</snm><fnm>L</fnm></au><au><snm>Xie</snm><fnm>H</fnm></au><au><snm>Schindler</snm><fnm>Y</fnm></au><au><snm>Moran</snm><fnm>TB</fnm></au><au><snm>Cheng</snm><fnm>Y</fnm></au><au><snm>Yu</snm><fnm>D</fnm></au><au><snm>Hardison</snm><fnm>R</fnm></au><au><snm>Weiss</snm><fnm>MJ</fnm></au><au><snm>Orkin</snm><fnm>SH</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Fraenkel</snm><fnm>E</fnm></au><au><snm>Cantor</snm><fnm>AB</fnm></au></aug><source>Mol Cell</source><pubdate>2009</pubdate><volume>36</volume><fpage>682</fpage><lpage>695</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molcel.2009.11.002</pubid><pubid idtype="pmcid">2800995</pubid><pubid idtype="pmpid" link="fulltext">19941827</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor 
binding.</p></title><aug><au><snm>Schmidt</snm><fnm>D</fnm></au><au><snm>Wilson</snm><fnm>MD</fnm></au><au><snm>Ballester</snm><fnm>B</fnm></au><au><snm>Schwalie</snm><fnm>PC</fnm></au><au><snm>Brown</snm><fnm>GD</fnm></au><au><snm>Marshall</snm><fnm>A</fnm></au><au><snm>Kutter</snm><fnm>C</fnm></au><au><snm>Watt</snm><fnm>S</fnm></au><au><snm>Martinez-Jimenez</snm><fnm>CP</fnm></au><au><snm>Mackay</snm><fnm>S</fnm></au><au><snm>Talianidis</snm><fnm>I</fnm></au><au><snm>Flicek</snm><fnm>P</fnm></au><au><snm>Odom</snm><fnm>DT</fnm></au></aug><source>Science</source><pubdate>2010</pubdate><volume>328</volume><fpage>1036</fpage><lpage>1040</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1186176</pubid><pubid idtype="pmcid">3008766</pubid><pubid idtype="pmpid" link="fulltext">20378774</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Genome-wide analysis reveals novel molecular features of mouse recombination hotspots.</p></title><aug><au><snm>Smagulova</snm><fnm>F</fnm></au><au><snm>Gregoretti</snm><fnm>IV</fnm></au><au><snm>Brick</snm><fnm>K</fnm></au><au><snm>Khil</snm><fnm>P</fnm></au><au><snm>Camerini-Otero</snm><fnm>RD</fnm></au><au><snm>Petukhova</snm><fnm>GV</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>472</volume><fpage>375</fpage><lpage>378</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09869</pubid><pubid idtype="pmcid">3117304</pubid><pubid idtype="pmpid" link="fulltext">21460839</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>TET1 and hydroxymethylcytosine in transcription and DNA methylation 
fidelity.</p></title><aug><au><snm>Williams</snm><fnm>K</fnm></au><au><snm>Christensen</snm><fnm>J</fnm></au><au><snm>Pedersen</snm><fnm>MT</fnm></au><au><snm>Johansen</snm><fnm>JV</fnm></au><au><snm>Cloos</snm><fnm>PA</fnm></au><au><snm>Rappsilber</snm><fnm>J</fnm></au><au><snm>Helin</snm><fnm>K</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>473</volume><fpage>343</fpage><lpage>348</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature10066</pubid><pubid idtype="pmcid">3408592</pubid><pubid idtype="pmpid" link="fulltext">21490601</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data.</p></title><aug><au><snm>Xu</snm><fnm>H</fnm></au><au><snm>Wei</snm><fnm>CL</fnm></au><au><snm>Lin</snm><fnm>F</fnm></au><au><snm>Sung</snm><fnm>WK</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>2344</fpage><lpage>2349</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn402</pubid><pubid idtype="pmpid" link="fulltext">18667444</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Comparative study on ChIP-seq data: normalization and binding pattern characterization.</p></title><aug><au><snm>Taslim</snm><fnm>C</fnm></au><au><snm>Wu</snm><fnm>J</fnm></au><au><snm>Yan</snm><fnm>P</fnm></au><au><snm>Singer</snm><fnm>G</fnm></au><au><snm>Parvin</snm><fnm>J</fnm></au><au><snm>Huang</snm><fnm>T</fnm></au><au><snm>Lin</snm><fnm>S</fnm></au><au><snm>Huang</snm><fnm>K</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>2334</fpage><lpage>2340</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp384</pubid><pubid idtype="pmcid">2800347</pubid><pubid idtype="pmpid" link="fulltext">19561022</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Limma: linear models for microarray 
data.</p></title><aug><au><snm>Smyth</snm><fnm>GK</fnm></au></aug><source>Bioinformatics and Computational Biology Solutions Using R and Bioconductor</source><publisher>New York: Springer</publisher><editor>Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S</editor><pubdate>2005</pubdate><fpage>397</fpage><lpage>420</lpage></bibl><bibl id="B14"><title><p>A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.</p></title><aug><au><snm>Bolstad</snm><fnm>BM</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Astrand</snm><fnm>M</fnm></au><au><snm>Speed</snm><fnm>TP</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>185</fpage><lpage>193</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/19.2.185</pubid><pubid idtype="pmpid" link="fulltext">12538238</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Nutrition, epigenetics, and developmental plasticity: implications for understanding human disease.</p></title><aug><au><snm>Burdge</snm><fnm>GC</fnm></au><au><snm>Lillycrop</snm><fnm>KA</fnm></au></aug><source>Annu Rev Nutr</source><pubdate>2010</pubdate><volume>30</volume><fpage>315</fpage><lpage>339</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.nutr.012809.104751</pubid><pubid idtype="pmpid" link="fulltext">20415585</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>The significance of digital gene expression profiles.</p></title><aug><au><snm>Audic</snm><fnm>S</fnm></au><au><snm>Claverie</snm><fnm>JM</fnm></au></aug><source>Genome Res</source><pubdate>1997</pubdate><volume>7</volume><fpage>986</fpage><lpage>995</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9331369</pubid></xrefbib></bibl><bibl id="B17"><title><p>Histone modification patterns and epigenetic codes.</p></title><aug><au><snm>Lennartsson</snm><fnm>A</fnm></au><au><snm>Ekwall</snm><fnm>K</fnm></au></aug><source>Biochim Biophys 
Acta</source><pubdate>2009</pubdate><volume>1790</volume><fpage>863</fpage><lpage>868</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.bbagen.2008.12.006</pubid><pubid idtype="pmpid" link="fulltext">19168116</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Histone H3K27ac separates active from poised enhancers and predicts developmental state.</p></title><aug><au><snm>Creyghton</snm><fnm>MP</fnm></au><au><snm>Cheng</snm><fnm>AW</fnm></au><au><snm>Welstead</snm><fnm>GG</fnm></au><au><snm>Kooistra</snm><fnm>T</fnm></au><au><snm>Carey</snm><fnm>BW</fnm></au><au><snm>Steine</snm><fnm>EJ</fnm></au><au><snm>Hanna</snm><fnm>J</fnm></au><au><snm>Lodato</snm><fnm>MA</fnm></au><au><snm>Frampton</snm><fnm>GM</fnm></au><au><snm>Sharp</snm><fnm>PA</fnm></au><au><snm>Boyer</snm><fnm>LA</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2010</pubdate><volume>107</volume><fpage>21931</fpage><lpage>21936</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1016071107</pubid><pubid idtype="pmcid">3003124</pubid><pubid idtype="pmpid" link="fulltext">21106759</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>A unique chromatin signature uncovers early developmental enhancers in humans.</p></title><aug><au><snm>Rada-Iglesias</snm><fnm>A</fnm></au><au><snm>Bajpai</snm><fnm>R</fnm></au><au><snm>Swigut</snm><fnm>T</fnm></au><au><snm>Brugmann</snm><fnm>SA</fnm></au><au><snm>Flynn</snm><fnm>RA</fnm></au><au><snm>Wysocka</snm><fnm>J</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>470</volume><fpage>279</fpage><lpage>283</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09692</pubid><pubid idtype="pmpid" link="fulltext">21160473</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Core transcriptional regulatory circuitry in human embryonic stem 
cells.</p></title><aug><au><snm>Boyer</snm><fnm>LA</fnm></au><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Cole</snm><fnm>MF</fnm></au><au><snm>Johnstone</snm><fnm>SE</fnm></au><au><snm>Levine</snm><fnm>SS</fnm></au><au><snm>Zucker</snm><fnm>JP</fnm></au><au><snm>Guenther</snm><fnm>MG</fnm></au><au><snm>Kumar</snm><fnm>RM</fnm></au><au><snm>Murray</snm><fnm>HL</fnm></au><au><snm>Jenner</snm><fnm>RG</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Melton</snm><fnm>DA</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Cell</source><pubdate>2005</pubdate><volume>122</volume><fpage>947</fpage><lpage>956</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2005.08.020</pubid><pubid idtype="pmcid">3006442</pubid><pubid idtype="pmpid" link="fulltext">16153702</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Self-renewal of teratocarcinoma and embryonic stem cells.</p></title><aug><au><snm>Chambers</snm><fnm>I</fnm></au><au><snm>Smith</snm><fnm>A</fnm></au></aug><source>Oncogene</source><pubdate>2004</pubdate><volume>23</volume><fpage>7150</fpage><lpage>7160</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.onc.1207930</pubid><pubid idtype="pmpid" link="fulltext">15378075</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>A Myc network accounts for similarities between embryonic stem and cancer cell transcription programs.</p></title><aug><au><snm>Kim</snm><fnm>J</fnm></au><au><snm>Woo</snm><fnm>AJ</fnm></au><au><snm>Chu</snm><fnm>J</fnm></au><au><snm>Snow</snm><fnm>JW</fnm></au><au><snm>Fujiwara</snm><fnm>Y</fnm></au><au><snm>Kim</snm><fnm>CG</fnm></au><au><snm>Cantor</snm><fnm>AB</fnm></au><au><snm>Orkin</snm><fnm>SH</fnm></au></aug><source>Cell</source><pubdate>2010</pubdate><volume>143</volume><fpage>313</fpage><lpage>324</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2010.09.010</pubid><pubid idtype="pmcid">3018841</pubid><pubid idtype="pmpid" 
link="fulltext">20946988</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>c-Myc regulates transcriptional pause release.</p></title><aug><au><snm>Rahl</snm><fnm>PB</fnm></au><au><snm>Lin</snm><fnm>CY</fnm></au><au><snm>Seila</snm><fnm>AC</fnm></au><au><snm>Flynn</snm><fnm>RA</fnm></au><au><snm>McCuine</snm><fnm>S</fnm></au><au><snm>Burge</snm><fnm>CB</fnm></au><au><snm>Sharp</snm><fnm>PA</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Cell</source><pubdate>2010</pubdate><volume>141</volume><fpage>432</fpage><lpage>445</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2010.03.030</pubid><pubid idtype="pmcid">2864022</pubid><pubid idtype="pmpid" link="fulltext">20434984</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Unbiased, genome-wide in vivo mapping of transcriptional regulatory elements reveals sex differences in chromatin structure associated with sex-specific liver gene expression.</p></title><aug><au><snm>Ling</snm><fnm>G</fnm></au><au><snm>Sugathan</snm><fnm>A</fnm></au><au><snm>Mazor</snm><fnm>T</fnm></au><au><snm>Fraenkel</snm><fnm>E</fnm></au><au><snm>Waxman</snm><fnm>DJ</fnm></au></aug><source>Mol Cell Biol</source><pubdate>2010</pubdate><volume>30</volume><fpage>5531</fpage><lpage>5544</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/MCB.00601-10</pubid><pubid idtype="pmcid">2976433</pubid><pubid idtype="pmpid" link="fulltext">20876297</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Dynamic, sex-differential STAT5 and BCL6 binding to sex-biased, growth hormone-regulated genes in adult mouse liver.</p></title><aug><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Laz</snm><fnm>EV</fnm></au><au><snm>Waxman</snm><fnm>DJ</fnm></au></aug><source>Mol Cell Biol</source><pubdate>2012</pubdate><volume>32</volume><fpage>880</fpage><lpage>896</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/MCB.06312-11</pubid><pubid idtype="pmcid">3272977</pubid><pubid idtype="pmpid" 
link="fulltext">22158971</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Role for Dpy-30 in ES cell-fate specification by regulation of H3K4 methylation within bivalent domains.</p></title><aug><au><snm>Jiang</snm><fnm>H</fnm></au><au><snm>Shukla</snm><fnm>A</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Chen</snm><fnm>WY</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Roeder</snm><fnm>RG</fnm></au></aug><source>Cell</source><pubdate>2011</pubdate><volume>144</volume><fpage>513</fpage><lpage>525</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2011.01.020</pubid><pubid idtype="pmcid">3572774</pubid><pubid idtype="pmpid" link="fulltext">21335234</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>The mammalian epigenome.</p></title><aug><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Meissner</snm><fnm>A</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Cell</source><pubdate>2007</pubdate><volume>128</volume><fpage>669</fpage><lpage>681</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2007.01.033</pubid><pubid idtype="pmpid" link="fulltext">17320505</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Discovery and characterization of chromatin states for systematic annotation of the human genome.</p></title><aug><au><snm>Ernst</snm><fnm>J</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2010</pubdate><volume>28</volume><fpage>817</fpage><lpage>825</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1662</pubid><pubid idtype="pmcid">2919626</pubid><pubid idtype="pmpid" link="fulltext">20657582</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Mapping and analysis of chromatin state dynamics in nine human cell 
types.</p></title><aug><au><snm>Ernst</snm><fnm>J</fnm></au><au><snm>Kheradpour</snm><fnm>P</fnm></au><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Shoresh</snm><fnm>N</fnm></au><au><snm>Ward</snm><fnm>LD</fnm></au><au><snm>Epstein</snm><fnm>CB</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Issner</snm><fnm>R</fnm></au><au><snm>Coyne</snm><fnm>M</fnm></au><au><snm>Ku</snm><fnm>M</fnm></au><au><snm>Durham</snm><fnm>T</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>473</volume><fpage>43</fpage><lpage>49</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09906</pubid><pubid idtype="pmcid">3088773</pubid><pubid idtype="pmpid" link="fulltext">21441907</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Epigenetic domains found in mouse embryonic stem cells via a hidden Markov model.</p></title><aug><au><snm>Larson</snm><fnm>JL</fnm></au><au><snm>Yuan</snm><fnm>GC</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><fpage>557</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-557</pubid><pubid idtype="pmcid">2992069</pubid><pubid idtype="pmpid" link="fulltext">21073706</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Comprehensive analysis of the chromatin landscape in Drosophila 
melanogaster.</p></title><aug><au><snm>Kharchenko</snm><fnm>PV</fnm></au><au><snm>Alekseyenko</snm><fnm>AA</fnm></au><au><snm>Schwartz</snm><fnm>YB</fnm></au><au><snm>Minoda</snm><fnm>A</fnm></au><au><snm>Riddle</snm><fnm>NC</fnm></au><au><snm>Ernst</snm><fnm>J</fnm></au><au><snm>Sabo</snm><fnm>PJ</fnm></au><au><snm>Larschan</snm><fnm>E</fnm></au><au><snm>Gorchakov</snm><fnm>AA</fnm></au><au><snm>Gu</snm><fnm>T</fnm></au><au><snm>Linder-Basso</snm><fnm>D</fnm></au><au><snm>Plachetka</snm><fnm>A</fnm></au><au><snm>Shanower</snm><fnm>G</fnm></au><au><snm>Tolstorukov</snm><fnm>MY</fnm></au><au><snm>Luquette</snm><fnm>LJ</fnm></au><au><snm>Xi</snm><fnm>R</fnm></au><au><snm>Jung</snm><fnm>YL</fnm></au><au><snm>Park</snm><fnm>RW</fnm></au><au><snm>Bishop</snm><fnm>EP</fnm></au><au><snm>Canfield</snm><fnm>TK</fnm></au><au><snm>Sandstrom</snm><fnm>R</fnm></au><au><snm>Thurman</snm><fnm>RE</fnm></au><au><snm>MacAlpine</snm><fnm>DM</fnm></au><au><snm>Stamatoyannopoulos</snm><fnm>JA</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Elgin</snm><fnm>SC</fnm></au><au><snm>Kuroda</snm><fnm>MI</fnm></au><au><snm>Pirrotta</snm><fnm>V</fnm></au><au><snm>Karpen</snm><fnm>GH</fnm></au><au><snm>Park</snm><fnm>PJ</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>471</volume><fpage>480</fpage><lpage>485</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09725</pubid><pubid idtype="pmcid">3109908</pubid><pubid idtype="pmpid" link="fulltext">21179089</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>A cis-regulatory map of the Drosophila 
genome.</p></title><aug><au><snm>Negre</snm><fnm>N</fnm></au><au><snm>Brown</snm><fnm>CD</fnm></au><au><snm>Ma</snm><fnm>L</fnm></au><au><snm>Bristow</snm><fnm>CA</fnm></au><au><snm>Miller</snm><fnm>SW</fnm></au><au><snm>Wagner</snm><fnm>U</fnm></au><au><snm>Kheradpour</snm><fnm>P</fnm></au><au><snm>Eaton</snm><fnm>ML</fnm></au><au><snm>Loriaux</snm><fnm>P</fnm></au><au><snm>Sealfon</snm><fnm>R</fnm></au><au><snm>Li</snm><fnm>Z</fnm></au><au><snm>Ishii</snm><fnm>H</fnm></au><au><snm>Spokony</snm><fnm>RF</fnm></au><au><snm>Chen</snm><fnm>J</fnm></au><au><snm>Hwang</snm><fnm>L</fnm></au><au><snm>Cheng</snm><fnm>C</fnm></au><au><snm>Auburn</snm><fnm>RP</fnm></au><au><snm>Davis</snm><fnm>MB</fnm></au><au><snm>Domanus</snm><fnm>M</fnm></au><au><snm>Shah</snm><fnm>PK</fnm></au><au><snm>Morrison</snm><fnm>CA</fnm></au><au><snm>Zieba</snm><fnm>J</fnm></au><au><snm>Suchy</snm><fnm>S</fnm></au><au><snm>Senderowicz</snm><fnm>L</fnm></au><au><snm>Victorsen</snm><fnm>A</fnm></au><au><snm>Bild</snm><fnm>NA</fnm></au><au><snm>Grundstad</snm><fnm>AJ</fnm></au><au><snm>Hanley</snm><fnm>D</fnm></au><au><snm>MacAlpine</snm><fnm>DM</fnm></au><au><snm>Mannervik</snm><fnm>M</fnm></au><etal/></aug><source>Nature</source><pubdate>2011</pubdate><volume>471</volume><fpage>527</fpage><lpage>531</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09990</pubid><pubid idtype="pmcid">3179250</pubid><pubid idtype="pmpid" link="fulltext">21430782</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Robust analysis of linear models.</p></title><aug><au><snm>McKean</snm><fnm>JW</fnm></au></aug><source>Stat Sci</source><pubdate>2004</pubdate><volume>19</volume><fpage>562</fpage><lpage>570</lpage><xrefbib><pubid idtype="doi">10.1214/088342304000000549</pubid></xrefbib></bibl><bibl id="B34"><title><p>ENCODE ChIP-Seq data describing histone modifications.</p></title><url>http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone/</url></bibl><bibl 
id="B35"><title><p>ENCODE ChIP-Seq data describing transcription factor binding.</p></title><url>http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeYaleChIPseq/</url></bibl><bibl id="B36"><title><p>Unlocking the secrets of the genome.</p></title><aug><au><snm>Celniker</snm><fnm>SE</fnm></au><au><snm>Dillon</snm><fnm>LA</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au><au><snm>Gunsalus</snm><fnm>KC</fnm></au><au><snm>Henikoff</snm><fnm>S</fnm></au><au><snm>Karpen</snm><fnm>GH</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Lai</snm><fnm>EC</fnm></au><au><snm>Lieb</snm><fnm>JD</fnm></au><au><snm>MacAlpine</snm><fnm>DM</fnm></au><au><snm>Micklem</snm><fnm>G</fnm></au><au><snm>Piano</snm><fnm>F</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Stein</snm><fnm>L</fnm></au><au><snm>White</snm><fnm>KP</fnm></au><au><snm>Waterston</snm><fnm>RH</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>459</volume><fpage>927</fpage><lpage>930</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/459927a</pubid><pubid idtype="pmcid">2843545</pubid><pubid idtype="pmpid" link="fulltext">19536255</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Genomic mapping of RNA polymerase II reveals sites of co-transcriptional regulation in human cells.</p></title><aug><au><snm>Brodsky</snm><fnm>AS</fnm></au><au><snm>Meyer</snm><fnm>CA</fnm></au><au><snm>Swinburne</snm><fnm>IA</fnm></au><au><snm>Hall</snm><fnm>G</fnm></au><au><snm>Keenan</snm><fnm>BJ</fnm></au><au><snm>Liu</snm><fnm>XS</fnm></au><au><snm>Fox</snm><fnm>EA</fnm></au><au><snm>Silver</snm><fnm>PA</fnm></au></aug><source>Genome Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>R64</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2005-6-8-r64</pubid><pubid idtype="pmcid">1273631</pubid><pubid idtype="pmpid" link="fulltext">16086846</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Expression profile of CREB knockdown in myeloid leukemia 
cells.</p></title><aug><au><snm>Pellegrini</snm><fnm>M</fnm></au><au><snm>Cheng</snm><fnm>JC</fnm></au><au><snm>Voutila</snm><fnm>J</fnm></au><au><snm>Judelson</snm><fnm>D</fnm></au><au><snm>Taylor</snm><fnm>J</fnm></au><au><snm>Nelson</snm><fnm>SF</fnm></au><au><snm>Sakamoto</snm><fnm>KM</fnm></au></aug><source>BMC Cancer</source><pubdate>2008</pubdate><volume>8</volume><fpage>264</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2407-8-264</pubid><pubid idtype="pmcid">2647550</pubid><pubid idtype="pmpid" link="fulltext">18801183</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Automating dChip: toward reproducible sharing of microarray data analysis.</p></title><aug><au><snm>Li</snm><fnm>C</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>231</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-231</pubid><pubid idtype="pmcid">2390544</pubid><pubid idtype="pmpid" link="fulltext">18466620</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Significance analysis of microarrays applied to the ionizing radiation response.</p></title><aug><au><snm>Tusher</snm><fnm>VG</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Chu</snm><fnm>G</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2001</pubdate><volume>98</volume><fpage>5116</fpage><lpage>5121</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.091062498</pubid><pubid idtype="pmcid">33173</pubid><pubid idtype="pmpid" link="fulltext">11309499</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>JASPAR: an open-access database for eukaryotic transcription factor binding profiles.</p></title><aug><au><snm>Sandelin</snm><fnm>A</fnm></au><au><snm>Alkema</snm><fnm>W</fnm></au><au><snm>Engstrom</snm><fnm>P</fnm></au><au><snm>Wasserman</snm><fnm>WW</fnm></au><au><snm>Lenhard</snm><fnm>B</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>2004</pubdate><volume>32</volume><fpage>D91</fpage><lpage>94</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh012</pubid><pubid idtype="pmcid">308747</pubid><pubid idtype="pmpid" link="fulltext">14681366</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Prediction of Polycomb target genes in mouse embryonic stem cells.</p></title><aug><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Shao</snm><fnm>Z</fnm></au><au><snm>Yuan</snm><fnm>GC</fnm></au></aug><source>Genomics</source><pubdate>2010</pubdate><volume>96</volume><fpage>17</fpage><lpage>26</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ygeno.2010.03.012</pubid><pubid idtype="pmpid" link="fulltext">20353814</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Whole-genome cartography of estrogen receptor alpha binding sites.</p></title><aug><au><snm>Lin</snm><fnm>CY</fnm></au><au><snm>Vega</snm><fnm>VB</fnm></au><au><snm>Thomsen</snm><fnm>JS</fnm></au><au><snm>Zhang</snm><fnm>T</fnm></au><au><snm>Kong</snm><fnm>SL</fnm></au><au><snm>Xie</snm><fnm>M</fnm></au><au><snm>Chiu</snm><fnm>KP</fnm></au><au><snm>Lipovich</snm><fnm>L</fnm></au><au><snm>Barnett</snm><fnm>DH</fnm></au><au><snm>Stossi</snm><fnm>F</fnm></au><au><snm>Yeo</snm><fnm>A</fnm></au><au><snm>George</snm><fnm>J</fnm></au><au><snm>Kuznetsov</snm><fnm>VA</fnm></au><au><snm>Lee</snm><fnm>YK</fnm></au><au><snm>Charn</snm><fnm>TH</fnm></au><au><snm>Palanisamy</snm><fnm>N</fnm></au><au><snm>Miller</snm><fnm>LD</fnm></au><au><snm>Cheung</snm><fnm>E</fnm></au><au><snm>Katzenellenbogen</snm><fnm>BS</fnm></au><au><snm>Ruan</snm><fnm>Y</fnm></au><au><snm>Bourque</snm><fnm>G</fnm></au><au><snm>Wei</snm><fnm>CL</fnm></au><au><snm>Liu</snm><fnm>ET</fnm></au></aug><source>PLoS Genet</source><pubdate>2007</pubdate><volume>3</volume><fpage>e87</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.0030087</pubid><pubid idtype="pmcid">1885282</pubid><pubid idtype="pmpid" 
link="fulltext">17542648</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>