<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
<ui>gb-2012-13-10-r87</ui>
<ji>1465-6906</ji>
<fm>
<dochead>Software</dochead>
<bibl>
<title>
<p>methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Akalin</snm><fnm>Altuna</fnm><insr iid="I1"/><insr iid="I2"/><email>ala2027@med.cornell.edu</email></au>
<au id="A2"><snm>Kormaksson</snm><fnm>Matthias</fnm><insr iid="I3"/><email>mk375@cornell.edu</email></au>
<au id="A3"><snm>Li</snm><fnm>Sheng</fnm><insr iid="I1"/><insr iid="I2"/><email>shl2018@med.cornell.edu</email></au>
<au id="A4"><snm>Garrett-Bakelman</snm><mi>E</mi><fnm>Francine</fnm><insr iid="I4"/><email>frg9015@nyp.org</email></au>
<au id="A5"><snm>Figueroa</snm><mi>E</mi><fnm>Maria</fnm><insr iid="I5"/><email>marfigue@med.umich.edu</email></au>
<au id="A6"><snm>Melnick</snm><fnm>Ari</fnm><insr iid="I4"/><insr iid="I6"/><email>amm2014@med.cornell.edu</email></au>
<au ca="yes" id="A7"><snm>Mason</snm><mi>E</mi><fnm>Christopher</fnm><insr iid="I1"/><insr iid="I2"/><email>chm2042@med.cornell.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Physiology and Biophysics, 1305 York Ave., Weill Cornell Medical College, New York,
NY 10065, USA</p></ins>
<ins id="I2"><p>The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine,
1305 York Ave., Weill Cornell Medical College, New York, NY 10065, USA</p></ins>
<ins id="I3"><p>Department of Public Health, Weill Cornell Medical College, 1300 York Ave., New York, NY 10065,
USA</p></ins>
<ins id="I4"><p>Department of Medicine, Division of Hematology/Oncology, 1300 York Ave., Weill Cornell Medical
College, New York, NY 10065, USA</p></ins>
<ins id="I5"><p>Department of Pathology, University of Michigan, 109 Zina Pitcher Place, Ann Arbor, MI 48109,
USA</p></ins>
<ins id="I6"><p>Department of Pharmacology, 1300 York Ave., Weill Cornell Medical College, New York, NY 10065,
USA</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2012</pubdate>
<volume>13</volume>
<issue>10</issue>
<fpage>R87</fpage>
<url>http://genomebiology.com/2012/13/10/R87</url>
<xrefbib><pubidlist><pubid idtype="pmpid">23034086</pubid><pubid idtype="doi">10.1186/gb-2012-13-10-r87</pubid></pubidlist></xrefbib></bibl>
<history><rec><date><day>30</day><month>4</month><year>2012</year></date></rec><revrec><date><day>12</day><month>6</month><year>2012</year></date></revrec><acc><date><day>3</day><month>10</month><year>2012</year></date></acc><pub><date><day>3</day><month>10</month><year>2012</year></date></pub></history>
<cpyrt><year>2012</year><collab>Akalin et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution
License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<p>DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation,
cellular specification and cancer development. Here, we describe an R package, methylKit, that
rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and
hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample
quality visualization, differential methylation analysis and annotation features, thus automating
and simplifying many of the steps for discerning statistically significant bases or regions of DNA
methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically
significant regions of differential methylation and stratify tumor subtypes. methylKit is available
at <url>http://code.google.com/p/methylkit</url>.</p>
</sec>
</abs>
</fm>
<meta>
<classifications>
<classification id="30010002" subtype="man_spc_id" type="BMC">Bioinformatics</classification>
<classification id="30010003" subtype="man_spc_id" type="BMC">Cancer</classification>
<classification id="300100010" subtype="man_spc_id" type="BMC">Genome studies</classification>
<classification id="300100013" subtype="man_spc_id" type="BMC">Methods</classification>
</classifications>
</meta>
<bdy>
<sec>
<st>
<p>Rationale</p>
</st>
<p>DNA methylation is a critical epigenetic modification that guides development, cellular
differentiation and the manifestation of some cancers <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>. Specifically, cytosine methylation is a widespread modification in the genome, and it
most often occurs in CpG dinucleotides, although non-CpG cytosines are also methylated in certain
tissues such as embryonic stem cells <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. DNA methylation is one of the many epigenetic control mechanisms associated with gene
regulation. Cytosine methylation can directly hinder binding of transcription factors,
and methylated bases can also be bound by methyl-binding-domain proteins that recruit
chromatin-remodeling factors <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
</abbrgrp>. In addition, aberrant DNA methylation patterns have been observed in many human
malignancies and can also be used to define the severity of leukemia subtypes <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. In malignant tissues, DNA is either hypo-methylated or hyper-methylated compared to the
normal tissue. The location of hyper- and hypo-methylated sites gives distinct signatures within
many diseases <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. Often, hypomethylation is associated with gene activation and hypermethylation is
associated with gene repression, although there are many exceptions to this trend <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. DNA methylation is also involved in genomic imprinting, where the methylation state of a
gene is inherited from the parents, but <it>de novo </it>methylation also can occur in the early
stages of development <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<p>A common technique for measuring DNA methylation is bisulfite sequencing, which has the advantage
of providing single-base, quantitative cytosine methylation levels. In this technique, DNA is
treated with sodium bisulfite, which deaminates cytosine residues to uracil, but leaves
5-methylcytosine residues unaffected. After sequencing, unmethylated cytosines are therefore read as T
and methylated cytosines as C, so single-base-resolution %methylation levels are calculated as the
ratio C/(C+T) of read counts at each base. Multiple techniques leverage high-throughput bisulfite
sequencing, including reduced representation bisulfite sequencing (RRBS) <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> and its variants <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>, whole-genome shotgun bisulfite sequencing (BS-seq) <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>, methylC-Seq <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp>, and target capture bisulfite sequencing <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. In addition, 5-hydroxymethylcytosine (5hmC) levels can be measured through a
modification of bisulfite sequencing techniques <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>.</p>
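<p>The C/(C+T) ratio described above can be illustrated with a short Python sketch. It is not part of
<it>methylKit</it>; the function name and the read counts are hypothetical and serve only to show the
arithmetic.</p>
<p><![CDATA[
```python
def percent_methylation(num_c, num_t):
    """Return %methylation at one cytosine from bisulfite read counts.

    num_c: reads showing C at the position (protected, i.e. methylated)
    num_t: reads showing T at the position (converted, i.e. unmethylated)
    """
    coverage = num_c + num_t
    if coverage == 0:
        raise ValueError("no C or T reads cover this base")
    return 100.0 * num_c / coverage


# Hypothetical counts: 9 C reads and 3 T reads at one position.
example = percent_methylation(9, 3)
print(example)  # 75.0
```
]]></p>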
<p>Yet, as bisulfite sequencing techniques have expanded, there are few computational tools
available to analyze the data. Moreover, there is a need for an end-to-end analysis package with
comprehensive features and ease of use. To address this, we have created <it>methylKit</it>, a
multi-threaded R package that can rapidly analyze and characterize data from many methylation
experiments at once. <it>methylKit </it>can read DNA methylation information from a text file and
also from alignment files (for example, SAM files) and carry out operations such as differential
methylation analysis, sample clustering and annotation, and visualization of DNA methylation events
(see Figure <figr fid="F1">1</figr> for a diagram of possible operations). <it>methylKit </it>has
open-source code and is available at <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp> and as Additional file <supplr sid="S1">1</supplr> (see also Additional file <supplr sid="S2">2</supplr> for the user guide and Additional file <supplr sid="S3">3</supplr> for the
package documentation). Our data framework is also extensible to emerging methods for the quantification
of other base modifications, such as 5hmC <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>, or sites discovered through single molecule sequencing <abbrgrp>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. For clarity, we describe only examples with DNA methylation data.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Flowchart of possible operations by methylKit</p></caption><text>
   <p><b>Flowchart of possible operations by methylKit</b>. A summary of the most important
<it>methylKit </it>features is shown in a flow chart. It depicts the main features of <it>methylKit
</it>and the sequential relationship between them. The functions that could be used for those
features are also printed in the boxes.</p>
</text><graphic file="gb-2012-13-10-r87-1" hint_layout="double"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p><b>methylKit v0.5.3</b>. This version of methylKit is included for archival purposes only. Please
download the most recent version from <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
</text>
<file name="gb-2012-13-10-r87-S1.gz">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p><b>methylKit User Guide</b>. A vignette file to accompany the methylKit software package; the
most recent software and vignette can be downloaded at <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
</text>
<file name="gb-2012-13-10-r87-S2.pdf">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p><b>methylKit documentation</b>. Documentation for functions and classes in the methylKit software
package; the most recent software and documentation can be downloaded at <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
</text>
<file name="gb-2012-13-10-r87-S3.pdf">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Flexible data integration and regional analysis</p>
</st>
<p>High-throughput bisulfite sequencing experiments typically yield millions of reads with reduced
complexity due to cytosine conversion, and there are several different aligners suited for mapping
these reads to the genome (see Frith <it>et al</it>. <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> and Krueger <it>et al</it>. <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp> for a review and comparison between aligners). Since <it>methylKit </it>only requires a
methylation score per base for all analyses, it is a modular package that can be applied independent
of any aligner. Currently, there are two ways that information can be supplied to
<it>methylKit</it>: 1) <it>methylKit </it>can read per base methylation scores from a text file
(see Table <tblr tid="T1">1</tblr> for an example of such a file); and, 2) <it>methylKit </it>can
read SAM format <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> alignment files obtained from the Bismark aligner <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. If a SAM file is supplied, <it>methylKit </it>first processes the alignment file to get
%methylation scores and then reads that information into memory.</p>
<tbl hint_layout="double" id="T1"><title><p>Table 1</p></title><caption><p>Sample text file that can be read by methylKit.</p></caption><tblbdy cols="7">
      <r>
         <c ca="left">
            <p>
               <b>chrBase</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>chr</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>base</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>strand</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>coverage</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>freqC</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>freqT</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="7">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9764539</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9764539</p>
         </c>
         <c ca="left">
            <p>R</p>
         </c>
         <c ca="left">
            <p>12</p>
         </c>
         <c ca="left">
            <p>25</p>
         </c>
         <c ca="left">
            <p>75</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9764513</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9764513</p>
         </c>
         <c ca="left">
            <p>R</p>
         </c>
         <c ca="left">
            <p>12</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>100</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9820622</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9820622</p>
         </c>
         <c ca="left">
            <p>F</p>
         </c>
         <c ca="left">
            <p>13</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>100</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9837545</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9837545</p>
         </c>
         <c ca="left">
            <p>F</p>
         </c>
         <c ca="left">
            <p>11</p>
         </c>
         <c ca="left">
            <p>0</p>
         </c>
         <c ca="left">
            <p>100</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9849022</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9849022</p>
         </c>
         <c ca="left">
            <p>F</p>
         </c>
         <c ca="left">
            <p>124</p>
         </c>
         <c ca="left">
            <p>72.58</p>
         </c>
         <c ca="left">
            <p>27.42</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>chr21.9853326</p>
         </c>
         <c ca="left">
            <p>chr21</p>
         </c>
         <c ca="left">
            <p>9853326</p>
         </c>
         <c ca="left">
            <p>F</p>
         </c>
         <c ca="left">
            <p>17</p>
         </c>
         <c ca="left">
            <p>70.59</p>
         </c>
         <c ca="left">
            <p>29.41</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p><it>methylKit </it>can read tab-delimited text files with the following format: the text file
should include a unique.id, chromosome name, base position, strand, read coverage, % of C bases and
% of T bases at that location.</p>
   </tblfn></tbl>
<p>Most bisulfite experiments have a set of test and control samples or samples across multiple
conditions, and <it>methylKit </it>can read and store (in memory) methylation data simultaneously
for N experiments, limited only by the memory of the node or computer. The default setting of the
processing algorithm requires that there be at least 10 reads covering a base and that each of the bases
covering the genomic base position have a PHRED quality score of at least 20. Also, since DNA methylation
can occur in CpG, CHG and CHH contexts (H = A, T, or C) <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>, users of <it>methylKit </it>have the option to provide methylation information for all
these contexts: CpG, CHG and CHH from SAM files.</p>
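<p>As a minimal sketch of how a text file in the format of Table 1 might be parsed outside of
<it>methylKit</it>, the following Python example reads two rows copied from Table 1 and keeps bases
with at least 10 reads. The function name, the dictionary keys and the conversion of percentages back
to approximate read counts are our assumptions for illustration, not the package's own
implementation.</p>
<p><![CDATA[
```python
import csv
import io

# Two rows copied from Table 1, tab-delimited, preceded by the header line.
EXAMPLE = (
    "chrBase\tchr\tbase\tstrand\tcoverage\tfreqC\tfreqT\n"
    "chr21.9764539\tchr21\t9764539\tR\t12\t25\t75\n"
    "chr21.9849022\tchr21\t9849022\tF\t124\t72.58\t27.42\n"
)


def read_methylation_calls(handle, min_coverage=10):
    """Parse per-base calls, keeping bases with enough read coverage."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        coverage = int(row["coverage"])
        if coverage < min_coverage:
            continue
        # Convert the percentage of C bases back to an approximate read count.
        num_c = round(coverage * float(row["freqC"]) / 100.0)
        records.append({
            "chr": row["chr"],
            "base": int(row["base"]),
            "strand": row["strand"],
            "coverage": coverage,
            "num_c": num_c,
            "num_t": coverage - num_c,
        })
    return records


calls = read_methylation_calls(io.StringIO(EXAMPLE))
for call in calls:
    print(call["chr"], call["base"], call["num_c"], call["num_t"])
```
]]></p>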
<sec>
<st>
<p>Summarizing DNA methylation information over pre-defined regions or tiling windows</p>
</st>
<p>Although base-pair resolution DNA methylation information is obtained through most bisulfite
sequencing experiments, it might be desirable to summarize methylation information over tiling
windows or over a set of predefined regions (promoters, CpG islands, introns, and so on). For
example, Smith <it>et al</it>. <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp> investigated methylation profiles with RRBS experiments on gametes and zygotes and
summarized methylation information on 100 bp tiles across the genome. Their analysis revealed a
unique set of differentially methylated regions maintained in the early embryo. Using tiling windows or
predefined regions, such as promoters or CpG islands, is desirable when there is not enough
coverage, when bases in close proximity will have similar methylation profiles, or when the methylation
properties of a region as a whole determine its function. In accordance with these potential
analytic foci, <it>methylKit </it>provides functionality to do analysis either on tiling windows
across the genome or on predefined regions of the genome. After reading the base pair methylation
information, users can summarize the methylation information on pre-defined regions they select or
on tiling windows covering the genome (parameters for tiles are user provided). Then, subsequent
analyses, such as clustering or differential methylation analysis, can be carried out with the same
functions that are used for base pair resolution analysis.</p>
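<p>The idea of summarizing per-base counts over tiling windows can be sketched in Python as follows.
This is an illustration under our own assumptions (non-overlapping tiles, 1-based positions,
hypothetical function name and read counts) and does not reproduce the <it>methylKit </it>
implementation.</p>
<p><![CDATA[
```python
from collections import defaultdict


def tile_counts(calls, window=100):
    """Sum methylated/unmethylated read counts over fixed-size tiles.

    calls: iterable of (chrom, position, num_c, num_t), 1-based positions.
    Returns {(chrom, tile_start, tile_end): (num_c, num_t)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, pos, num_c, num_t in calls:
        # First base of the non-overlapping tile that contains this position.
        start = ((pos - 1) // window) * window + 1
        key = (chrom, start, start + window - 1)
        totals[key][0] += num_c
        totals[key][1] += num_t
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical per-base counts; the first two fall into the same 100 bp tile.
base_calls = [
    ("chr21", 9764513, 0, 12),
    ("chr21", 9764539, 3, 9),
    ("chr21", 9820622, 0, 13),
]
tiles = tile_counts(base_calls)
for (chrom, start, end), (num_c, num_t) in sorted(tiles.items()):
    print(chrom, start, end, 100.0 * num_c / (num_c + num_t))
```
]]></p>
<p>Because each tile again carries a pair of counts, the same downstream calculations that apply to
single bases can be applied to tiles.</p>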
</sec>
<sec>
<st>
<p>Example methylation data set: breast cancer cell lines</p>
</st>
<p>We demonstrated the capabilities of <it>methylKit </it>using an example data set from seven
breast cancer cell lines from Sun <it>et al</it>. <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. Four of the cell lines express estrogen receptor-alpha (MCF7, T47D, BT474, ZR75-1), and
from here on are referred to as ER+. The other three cell lines (BT20, MDA-MB-231, MDA-MB-468) do
not express estrogen receptor-alpha, and from here on are referred to as ER-. It has been previously
shown that ER+ and ER- tumor samples have divergent gene expression profiles and that those profiles
are associated with disease outcome <abbrgrp>
<abbr bid="B24">24</abbr>
<abbr bid="B25">25</abbr>
</abbrgrp>. Methylation profiles of these cell lines were measured using RRBS <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. The R objects containing the methylation information for the breast cancer cell lines and
the functions that produce the plots and other results shown in the remainder of this manuscript
are in Additional file <supplr sid="S4">4</supplr>.</p>
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p><b>R script for example analysis</b>. The file contains R commands that are needed to do analysis
and to produce graphs used in this manuscript. The file contains both the commands and detailed
comments on how those commands can be used. An up-to-date version of this script will be
consistently maintained at <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
</text>
<file name="gb-2012-13-10-r87-S4.r">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec>
<st>
<p>Whole methylome characterization: descriptive statistics, sample correlation and clustering</p>
</st>
<sec>
<st>
<p>Descriptive statistics on DNA methylation profiles</p>
</st>
<p>Read coverage per base and % methylation per base are the basic information contained in the
<it>methylKit </it>data structures. <it>methylKit </it>has functions for easy visualization of such
information (Figure <figr fid="F2">2a</figr> and <figr fid="F2">2b</figr> for % methylation and read
coverage distributions, respectively - for code see Additional file <supplr sid="S4">4</supplr>). In
normal cells, % methylation will have a bimodal distribution, which denotes that the majority of
bases have either high or low methylation. The read coverage distribution is also an important
metric that will help reveal if experiments suffer from PCR duplication bias (clonal reads). If such
bias occurs, some reads will be asymmetrically amplified and this will impair accurate determination
of % methylation scores for those regions. If there is a high degree of PCR duplication bias, read
coverage distribution will have a secondary peak on the right side. To correct for this issue,
<it>methylKit </it>has the option to filter bases with very high read coverage.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Descriptive statistics per sample</p></caption><text>
   <p><b>Descriptive statistics per sample</b>. (<b>a</b>) Histogram of %methylation per cytosine for
the ER+ T47D sample. Most of the bases have either high or low methylation. (<b>b</b>) Histogram of read
coverage per cytosine for the ER+ T47D sample. ER+, estrogen receptor-alpha expressing.</p>
</text><graphic file="gb-2012-13-10-r87-2" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Measuring and visualizing similarity between samples</p>
</st>
<p>We have also included methods to assess sample similarity. Users can calculate pairwise
correlation coefficients (Pearson, Kendall or Spearman) between the %methylation profiles across all
samples. However, to ensure comparable statistics, a new data structure is formed before these
calculations, wherein only cytosines covered in all samples are stored. Subsequently, pairwise
correlations are calculated, to produce a correlation matrix. This matrix allows the user to easily
compare correlation coefficients between pairs of samples and can also be used to perform
hierarchical clustering using 1 - correlation distance. <it>methylKit </it>can also further visualize
similarities between all pairs of samples by creating scatterplots of the %methylation scores
(Figure <figr fid="F3">3</figr>). These functions are essential for detecting sample outliers or for
functional clustering of samples based on their molecular signatures.</p>
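<p>The two steps described above, restricting to cytosines covered in all samples and then computing
pairwise correlations, can be sketched in Python as follows. The sample names, positions and
%methylation values are hypothetical, only Pearson's correlation is shown, and the code is an
illustration rather than the <it>methylKit </it>implementation.</p>
<p><![CDATA[
```python
import math


def shared_positions(samples):
    """Return the sorted positions that are covered in every sample."""
    common = set.intersection(*(set(s) for s in samples.values()))
    return sorted(common)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(samples):
    """Pairwise 1 - Pearson correlation over positions shared by all samples."""
    positions = shared_positions(samples)
    names = sorted(samples)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xa = [samples[a][p] for p in positions]
            xb = [samples[b][p] for p in positions]
            dist[(a, b)] = 1.0 - pearson(xa, xb)
    return dist


# Hypothetical %methylation values keyed by cytosine position.
samples = {
    "s1": {1: 0.0, 2: 50.0, 3: 100.0, 4: 80.0},
    "s2": {1: 10.0, 2: 60.0, 3: 90.0},
    "s3": {1: 100.0, 2: 50.0, 3: 0.0, 5: 20.0},
}
distances = correlation_distances(samples)
for pair, value in sorted(distances.items()):
    print(pair, round(value, 4))
```
]]></p>
<p>The resulting distances could then be passed to any hierarchical clustering routine.</p>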
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Scatter plots for sample pairs</p></caption><text>
   <p><b>Scatter plots for sample pairs</b>. Scatter plots of %methylation values for each pair in
seven breast cancer cell lines. Numbers in the upper right corner denote pairwise Pearson's correlation
scores. The histograms on the diagonal are %methylation histograms similar to Figure 2a for each
sample.</p>
</text><graphic file="gb-2012-13-10-r87-3" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Hierarchical clustering of samples</p>
</st>
<p><it>methylKit </it>can also be used to cluster samples hierarchically in a variety of ways. The
user can specify the distance metric between samples ('1 - correlation', 'Euclidean', 'maximum',
'manhattan', 'canberra', 'binary' or 'minkowski') as well as the agglomeration method to be used in
the hierarchical clustering algorithm (for example, 'Ward's method', or 'single/complete linkage',
and so on). Results can either be returned as a dendrogram object or a plot. Dendrogram plots will
be color coded based on user defined groupings of samples. For example, we found that most ER+ and
ER- samples clustered together except MDA-MB-231 (Figure <figr fid="F4">4a</figr>). Moreover, the user
may be interested in applying other, more model-intensive clustering algorithms to their data. Users
can easily obtain the %methylation data from the <it>methylKit </it>object and perform their own
analysis with the multitude of R-packages already available for clustering. An example of such a
procedure (k-means clustering) is shown in Additional file <supplr sid="S4">4</supplr>.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Sample clustering</p></caption><text>
   <p><b>Sample clustering</b>. (<b>a</b>) Hierarchical clustering of seven breast cancer methylation
profiles using 1-Pearson's correlation distance. (<b>b</b>) Principal Component Analysis (PCA) of seven
breast cancer methylation profiles; the plot shows principal component 1 and principal component 2 for
each sample. Samples closer to each other in principal component space are similar in their
methylation profiles.</p>
</text><graphic file="gb-2012-13-10-r87-4" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Principal component analysis of samples</p>
</st>
<p><it>methylKit </it>can be used to perform Principal Component Analysis (PCA) on the samples'
%methylation profiles (see for example <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>). PCA can reduce the high dimensionality of a data set by transforming the large number
of regions to a few principal components. The principal components are ordered so that the first few
retain most of the variation present in the original data and are often used to emphasize grouping
structure in the data. For example, a plot of the first two or three principal components could
potentially reveal a biologically meaningful clustering of the samples. Before the PCA is performed,
a new data matrix is formed, containing the samples and only those cytosines that are covered in all
samples. After PCA, <it>methylKit </it>then returns to the user a 'prcomp' object, which can be used
to extract and plot the principal components. We found that in the breast cancer data set, PCA
reveals a clustering similar to the hierarchical clustering, in which MDA-MB-231 is an outlier.</p>
</sec>
</sec>
<sec>
<st>
<p>Differential methylation calculation</p>
</st>
<sec>
<st>
<p>Parallelized methods for detecting significant methylation changes</p>
</st>
<p>Differential methylation patterns have been previously described in malignancies <abbrgrp>
<abbr bid="B27">27</abbr>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
</abbrgrp> and can be used to differentiate cancer and normal cells <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>. In addition, normal human tissues harbor unique DNA methylation profiles <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. Differential DNA methylation is usually calculated by comparing methylation levels
between multiple conditions, which can reveal important locations of divergent changes between a
test and a control set. We have designed <it>methylKit </it>to implement two main methods for
determining differential methylation across all regions: logistic regression and Fisher's exact
test. However, the data frames in <it>methylKit </it>can easily be used with other statistical tests
and an example is shown in Additional file <supplr sid="S4">4</supplr> (using a moderated t-test,
although we maintain that the most natural tests for this kind of data are Fisher's exact and logistic
regression-based tests). For our example data set we compared ER+ to ER- samples, with our 'control
group' being the ER- set.</p>
</sec>
<sec>
<st>
<p>Method #1: logistic regression</p>
</st>
<p>In logistic regression, information from each sample is specified (the number of methylated Cs
and number of unmethylated Cs at a given region), and a logistic regression test will be applied to
compare fraction of methylated Cs across the test and the control groups. More specifically, at a
given base/region we model the methylation proportion P<sub>i</sub>, for sample i= 1,...,n (where n
is the number of biological samples) through the logistic regression model:</p>
<p><display-formula id="M1"><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2012-13-10-r87-i1"><m:mrow>
   <m:mtext>log</m:mtext>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">P</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">i</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">/(1&#160;-&#160;</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">P</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">i</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">))&#160;=&#160;</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mi>&#946;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#946;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-bin">*</m:mo>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">T</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">i</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mrow>
</m:mrow>
</m:math></display-formula></p>
<p>where T<sub>i </sub>denotes the treatment indicator for sample i, T<sub>i </sub>= 1 if sample i
is in the treatment group and T<sub>i </sub>= 0 if sample i is in the control group. The parameter
&#946;<sub>0 </sub>denotes the log odds of the control group and &#946;<sub>1 </sub>the log
odds ratio between the treatment and control groups. Therefore, independent tests for all the
bases/regions of interest are carried out against the null hypothesis H<sub>0</sub>: &#946;<sub>1</sub> = 0. If
the null hypothesis is rejected, it implies that the log odds (and hence the methylation proportions)
are different between the treatment and the control group, and the base/region would subsequently be
classified as a differentially methylated cytosine (DMC) or region (DMR). However, if the null
hypothesis is not rejected it implies no statistically significant difference in methylation between
the two groups. One important consideration in logistic regression is the sample size and in many
biological experiments the number of biological samples in each group can be quite small. However,
it is important to keep in mind that the relevant sample sizes in logistic regression are not merely
the number of biological samples but rather the total read coverages summed over all samples in each
group separately. For our example dataset, we used bases with a read coverage of at least 10 in each
biological sample, and we advise (at least) the same for other users to improve power to detect
DMCs/DMRs.</p>
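<p>To make the test of H<sub>0</sub>: &#946;<sub>1</sub> = 0 concrete, the following Python sketch fits
the two-group model at a single base and applies a likelihood-ratio test. It relies on the fact that,
with one binary treatment indicator, the maximum-likelihood fit has a closed form (each group's fitted
proportion is its pooled fraction of methylated reads). The choice of a likelihood-ratio test, the
function names and the read counts are our assumptions for illustration; the <it>methylKit </it>
implementation may differ in detail.</p>
<p><![CDATA[
```python
import math


def _binom_loglik(num_c, num_t, p):
    """Binomial log-likelihood of the counts, without the constant term."""
    ll = 0.0
    if num_c:
        ll += num_c * math.log(p)
    if num_t:
        ll += num_t * math.log(1.0 - p)
    return ll


def logistic_lrt(treatment, control):
    """Likelihood-ratio test of H0: beta1 = 0 at one base or region.

    treatment, control: lists of (num_c, num_t) read counts, one per sample.
    Returns (beta0, beta1, p_value). Assumes every pooled count is non-zero.
    """
    c1 = sum(c for c, _ in treatment)
    t1 = sum(t for _, t in treatment)
    c0 = sum(c for c, _ in control)
    t0 = sum(t for _, t in control)

    # Closed-form maximum-likelihood estimates for a single binary covariate.
    p1 = c1 / (c1 + t1)
    p0 = c0 / (c0 + t0)
    p_null = (c1 + c0) / (c1 + t1 + c0 + t0)

    beta0 = math.log(p0 / (1.0 - p0))          # log odds of the control group
    beta1 = math.log(p1 / (1.0 - p1)) - beta0  # log odds ratio

    ll_full = _binom_loglik(c1, t1, p1) + _binom_loglik(c0, t0, p0)
    ll_null = _binom_loglik(c1 + c0, t1 + t0, p_null)
    stat = max(0.0, 2.0 * (ll_full - ll_null))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return beta0, beta1, p_value


# Hypothetical (methylated, unmethylated) read counts for two samples per group.
beta0, beta1, p_value = logistic_lrt(
    treatment=[(30, 10), (25, 15)],
    control=[(10, 30), (12, 28)],
)
print(beta0, beta1, p_value)
```
]]></p>
<p>Note that the counts enter the test pooled over samples within each group, which is why the total
read coverage per group, and not only the number of biological samples, determines the power of the
test.</p>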
<p>In addition, we have designed <it>methylKit </it>such that the logistic regression framework can
be generalized to handle more than two experimental groups or data types. In such a case, the
inclusion of additional treatment indicators is analogous to multiple regression with
categorical variables that have more than two levels. Additional covariates can be incorporated into model (1)
by adding the following terms to the right-hand side of the model:</p>
<p><display-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="gb-2012-13-10-r87-i2"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>&#945;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">*</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">Covariat</m:mtext>
   </m:mstyle>
   <m:msub>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">e</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">1,i</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mi>.</m:mi>
   <m:mi>.</m:mi>
   <m:mi>.</m:mi>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#945;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">K</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">*</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">Covariat</m:mtext>
   </m:mstyle>
   <m:msub>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">e</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">K,i</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
</display-formula></p>
<p>where Covariate<sub>1,i</sub>, ..., Covariate<sub>K,i </sub>denote K measured covariates
(continuous or categorical) for sample i = 1,...,n and &#945;<sub>1</sub>,..., &#945;<sub>K
</sub>denote the corresponding parameters.</p>
</sec>
<sec>
<st>
<p>Method #2: Fisher's exact test</p>
</st>
<p>Fisher's exact test compares the fraction of methylated Cs in test and control samples in the
absence of replicates. The main advantage of logistic regression over Fisher's exact test is that it
allows the inclusion of sample-specific covariates (continuous or categorical) and thus
adjustment for confounding variables. In practice, the number of samples per group determines
which of the two methods is used. If there are
multiple samples per group, <it>methylKit </it>employs the logistic regression test. Otherwise,
when there is one sample per group, Fisher's exact test is used.</p>
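<p>For the single-sample case, a two-sided Fisher's exact test on the 2 x 2 table of methylated and
unmethylated read counts can be sketched as follows. This is a generic textbook formulation written
with the Python standard library, not <it>methylKit</it>'s code; the function and argument names are
hypothetical.</p>

```python
import math


def fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b):
    """Two-sided Fisher's exact test for methylated/unmethylated read counts
    in two samples (a = test, b = control). Returns the P-value."""
    row_a = meth_a + unmeth_a
    row_b = meth_b + unmeth_b
    col_meth = meth_a + meth_b
    denom = math.comb(row_a + row_b, col_meth)

    def prob(k):
        # Hypergeometric probability of k methylated reads in sample a,
        # given fixed row and column totals.
        return math.comb(row_a, k) * math.comb(row_b, col_meth - k) / denom

    lowest = max(0, col_meth - row_b)
    highest = min(row_a, col_meth)
    p_obs = prob(meth_a)

    # Sum the probabilities of all tables that are no more likely than the
    # observed one; the small relative tolerance guards against rounding.
    total = 0.0
    for k in range(lowest, highest + 1):
        p_k = prob(k)
        if p_obs * (1.0 + 1e-7) >= p_k:
            total += p_k
    return min(1.0, total)
```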
<p>Following the differential methylation test and calculation of <it>P</it>-values, <it>methylKit
</it>uses the sliding linear model (SLIM) method <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp> to convert <it>P</it>-values to q-values, which accounts for multiple hypothesis testing <abbrgrp>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
</abbrgrp>. We also implemented the standard false discovery rate (FDR)-based method
(Benjamini-Hochberg) as an option for <it>P</it>-value correction, which is faster but more
conservative. Finally, <it>methylKit </it>can use multi-threading so that differential methylation
calculations are parallelized over multiple cores and complete faster.</p>
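<p>The Benjamini-Hochberg option can be illustrated with a short sketch of the standard step-up
adjustment. SLIM is not reproduced here, and the function name is an assumption for illustration
rather than part of <it>methylKit</it>.</p>

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay
    # monotone: each one is the minimum of p * m / rank over higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```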
</sec>
<sec>
<st>
<p>Extraction and visualization of differential methylation events</p>
</st>
<p>We have designed <it>methylKit </it>to allow a user to specify the parameters that define the
DMCs/DMRs based on: q-value, %methylation difference, and type of differential methylation
(hypo-/hyper-). By default, it will extract bases/regions with a q-value &lt;0.01 and %methylation
difference &gt;25%. These defaults can easily be changed when calling the <it>get.methylDiff()
</it>function. In addition, users can specify whether they want hyper-methylated bases/regions
(bases/regions with higher methylation compared to control samples) or hypo-methylated bases/regions
(bases/regions with lower methylation compared to control samples). In the literature, hyper- or
hypo-methylated DMCs/DMRs are usually defined relative to a control group. In our examples, and in
<it>methylKit </it>in general, a control group is defined when creating the objects through the supplied
treatment vector, and hyper-/hypo-methylation definitions are based on that control group.</p>
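<p>The selection logic described above can be sketched in a few lines of Python. The record layout
(dictionaries with the keys "qvalue" and "meth_diff", the latter being treatment minus control in
percentage points) and the function name are hypothetical and do not mirror the signature of
<it>get.methylDiff()</it>; only the default cutoffs are taken from the text.</p>

```python
def select_dmcs(results, qvalue=0.01, difference=25.0, kind="all"):
    """Keep records whose q-value is below `qvalue` and whose absolute
    methylation difference exceeds `difference` percentage points.

    kind: "all", "hyper" (higher methylation than control) or "hypo".
    """
    if kind not in ("all", "hyper", "hypo"):
        raise ValueError("kind must be 'all', 'hyper' or 'hypo'")
    selected = []
    for row in results:
        diff = row["meth_diff"]
        # Both cutoffs are strict, matching the defaults quoted in the text.
        if not (qvalue > row["qvalue"] and abs(diff) > difference):
            continue
        if kind == "hyper" and not diff > 0:
            continue
        if kind == "hypo" and diff > 0:
            continue
        selected.append(row)
    return selected
```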
<p>Furthermore, DMCs/DMRs can be visualized as horizontal bar plots showing the percentage of hyper- and
hypo-methylated bases/regions out of covered cytosines over all chromosomes (Figure <figr fid="F5">5a</figr>). We observed higher levels of hypomethylation than hypermethylation in the breast cancer
cell lines, which indicates that ER+ cells have lower levels of methylation. Since another common
way to visualize differential methylation events is with a genome browser, <it>methylKit </it>can
output bedgraph tracks (Figure <figr fid="F5">5b</figr>) for use with the UCSC Genome Browser or
the Integrative Genomics Viewer.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Visualizing differential methylation events</p></caption><text>
   <p><b>Visualizing differential methylation events</b>. (<b>a</b>) Horizontal bar plots show the
number of hyper- and hypomethylation events per chromosome, as a percentage of the sites that pass the
minimum coverage and differential methylation cutoffs. By default, these cutoffs are a 25% change in
methylation and 10X coverage in all samples. (<b>b</b>) Example of a bedgraph file uploaded to the UCSC browser. The bedgraph file is
for differentially methylated CpGs with at least a 25% difference and q-value &lt;0.01. Hyper- and
hypo-methylated bases are color coded. The bar heights correspond to the % methylation difference
between ER+ and ER- sets. ER+, estrogen receptor-alpha expressing; ER-, estrogen receptor-alpha
non-expressing; UCSC, University of California Santa Cruz.</p>
</text><graphic file="gb-2012-13-10-r87-5" hint_layout="double"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Annotating differential methylation events</p>
</st>
<sec>
<st>
<p>Annotation with gene models and CpG islands</p>
</st>
<p>To discern the biological impact of differential methylation events, each event must be put into
its genomic context for subsequent analysis. Indeed, Hansen <it>et al</it>. <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp> showed that the most variable regions in terms of methylation in the human genome are CpG
island shores, rather than CpG islands themselves. Thus, it is informative to know the location of
differential methylation events relative to CpG islands and their shores, as well as their proximity to
the nearest transcription start site (TSS) and gene components. Accordingly, <it>methylKit </it>can
annotate differential methylation events with regard to the nearest TSS (Figure <figr fid="F6">6a</figr>), and it can also annotate regions based on their overlap with CpG islands/shores and
regions within genes (Figures <figr fid="F6">6b</figr> and <figr fid="F6">6c</figr> are output from
<it>methylKit</it>).</p>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Annotation of differentially methylated CpGs</p></caption><text>
   <p><b>Annotation of differentially methylated CpGs</b>. (<b>a</b>) Distances to the nearest TSS for
differentially methylated CpGs from the ER+ versus ER- analysis. (<b>b</b>) Pie chart
showing percentages of differentially methylated CpGs on promoters, exons, introns and intergenic
regions. (<b>c</b>) Pie chart showing percentages of differentially methylated CpGs on CpG islands,
CpG island shores (defined as 2 kb flanks of CpG islands) and other regions outside of shores and CpG
islands. (<b>d</b>) Pie chart showing percentages of differentially methylated CpGs on enhancers and
other regions. ER+, estrogen receptor-alpha expressing; ER-, estrogen receptor-alpha non-expressing;
TSS, transcription start site.</p>
</text><graphic file="gb-2012-13-10-r87-6" hint_layout="double"/></fig>
</sec>
<sec>
<st>
<p>Annotation with custom regions</p>
</st>
<p>As with most genome-wide assays, the regions of interest for DNA methylation analysis may be
quite numerous. For example, several reports show that Alu elements are aberrantly methylated in
cancers <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp> and enhancers are also differentially methylated <abbrgrp>
<abbr bid="B37">37</abbr>
<abbr bid="B38">38</abbr>
</abbrgrp>. Since users may need to focus on specific genomic regions and require customized
annotation for capturing differential DNA methylation events, <it>methylKit </it>can annotate
differential methylation events using user-supplied regions. As an example, we identified
differentially methylated bases of ER+ and ER- cells that overlap with ENCODE enhancer regions <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>. We found that a large proportion of differentially methylated CpGs overlap with the
enhancer marks, and we plotted the result with <it>methylKit </it>(Figure <figr fid="F6">6d</figr>).</p>
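<p>The overlap computation behind such a custom annotation can be sketched as follows. This is a
deliberately simple illustration with hypothetical names and tuple layouts (closed intervals,
1-based positions); <it>methylKit </it>itself relies on Bioconductor range objects, which are not
reproduced here.</p>

```python
from collections import defaultdict


def fraction_overlapping(sites, regions):
    """Fraction of sites that fall inside at least one region.

    sites:   iterable of (chromosome, position) tuples
    regions: iterable of (chromosome, start, end) tuples, ends inclusive
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    sites = list(sites)
    if not sites:
        return 0.0

    hits = 0
    for chrom, pos in sites:
        # Linear scan per site; adequate for a sketch, whereas sorted
        # intervals or an interval tree would scale better.
        intervals = by_chrom.get(chrom, ())
        if any(pos >= start and end >= pos for start, end in intervals):
            hits += 1
    return hits / len(sites)
```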
</sec>
</sec>
<sec>
<st>
<p>Analyzing 5-hydroxymethylcytosine data with methylKit</p>
</st>
<p>5-Hydroxymethylcytosine (5hmC) is a base modification associated with pluripotency, hematopoiesis and
certain brain tissues (reviewed in <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>). It is possible to measure 5hmC levels at base-pair resolution using variations of
traditional bisulfite sequencing. Recently, Yu <it>et al</it>. <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp> and Booth <it>et al</it>. <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp> published similar methods for detecting 5hmC levels at base-pair resolution. Both methods
require measuring 5hmC and 5-methylcytosine (5mC) levels simultaneously and use the 5hmC levels to deduce
the real 5mC levels, since traditional bisulfite sequencing cannot distinguish between the two <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. Because both the 5hmC and 5mC data generated by these protocols are bisulfite
sequencing based, the alignments and text files of 5hmC levels can be used directly in
<it>methylKit</it>. Furthermore, <it>methylKit </it>has an <it>adjust.methylC() </it>function to
adjust 5mC levels based on 5hmC levels as described in Booth <it>et al</it>. <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>.</p>
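<p>The idea of adjusting 5mC by 5hmC can be illustrated per base as below. The sketch assumes that
the adjustment amounts to subtracting the 5hmC fraction from the fraction measured by traditional
bisulfite sequencing and clipping negative values at zero; this assumption, the function name and
its arguments are illustrative and are not taken from the <it>adjust.methylC() </it>source.</p>

```python
def adjust_5mc(bs_meth, bs_cov, hmc_meth, hmc_cov):
    """Estimate the 5mC fraction at one base.

    bs_meth / bs_cov:   methylated reads and coverage from traditional
                        bisulfite sequencing (5mC and 5hmC combined)
    hmc_meth / hmc_cov: 5hmC-supporting reads and coverage from the
                        5hmC-specific assay
    """
    if bs_cov == 0 or hmc_cov == 0:
        raise ValueError("coverage must be positive in both experiments")
    combined = bs_meth / bs_cov
    hmc = hmc_meth / hmc_cov
    # Sampling noise can make the difference negative; clip at zero
    # (an assumption of this sketch).
    return max(0.0, combined - hmc)
```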
</sec>
<sec>
<st>
<p>Customizing analysis with convenience functions</p>
</st>
<p><it>methylKit </it>depends on Bioconductor <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp> packages such as <it>GenomicRanges</it>, and its objects are coercible to
<it>GenomicRanges </it>objects and regular R data structures such as data frames via provided
convenience functions. This means that users can integrate <it>methylKit </it>objects with other
Bioconductor and R packages and customize the analysis according to their needs or extend the
analysis further by using other packages available in R.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Methods for detecting methylation across the genome are widely used in research laboratories, and
they are also a substantial component of the National Institutes of Health's (NIH's) EpiGenome
roadmap and upcoming projects such as BLUEPRINT <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp>. Thus, tools and techniques that enable researchers to process and utilize genome-wide
methylation data in an easy and fast manner will be of critical utility.</p>
<p>Here, we show a large set of tools and cross-sample analysis algorithms built into
<it>methylKit</it>, our open-source, multi-threaded R package that can be used for any base-level
dataset of DNA methylation or base modifications, including 5hmC. We demonstrate its utility with
breast cancer RRBS samples, provide test data sets, and also provide extensive documentation with
the release.</p>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>5hmC: 5-hydroxymethylcytosine; 5mC: 5-methylcytosine; bp: base pair; BS-seq: bisulfite
sequencing; DMC: differentially methylated cytosine; DMR: differentially methylated region; ER:
estrogen receptor alpha; FDR: false discovery rate; PCA: principal component analysis; PCR:
polymerase chain reaction; RRBS: reduced representation bisulfite sequencing; SLIM: sliding linear
model; TSS: transcription start site.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>AA designed <it>methylKit</it>, developed the first codebase, and added most features.
MK designed the logistic regression based statistical test for <it>methylKit </it>and worked on
statistical modeling and initial clustering features. SL wrote some of the features in <it>methylKit
</it>and prepared plots for the manuscript. MEF, FGB and AM tested the code and provided initial
data for development of <it>methylKit</it>. CEM supervised the work, tested code, and coordinated
test data for validation. All authors have read and approved the manuscript for publication.</p>
</sec>
</bdy>
<bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We wish to acknowledge the invaluable contribution of the WCMC Epigenomics Core Facility. MEF is
supported by the Leukemia &amp; Lymphoma Society Special Fellow Award and a Doris Duke Clinical
Scientist Development Award. FGB is supported by a Sass Foundation Judah Folkman Fellowship. AM is
supported by an LLS SCOR grant (7132-08) and a Burroughs Wellcome Clinical Translational Scientist
Award. AM and CEM are supported by a Starr Cancer Consortium grant (I4-A442). CEM is supported by
the National Institutes of Health (I4-A411, I4-A442, and 1R01NS076465-01).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>CpG islands and the regulation of transcription.</p></title><aug><au><snm>Deaton</snm><fnm>AM</fnm></au><au><snm>Bird</snm><fnm>A</fnm></au></aug><source>Genes Dev</source><pubdate>2011</pubdate><volume>25</volume><fpage>1010</fpage><lpage>1022</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.2037511</pubid><pubid idtype="pmcid">3093116</pubid><pubid idtype="pmpid" link="fulltext">21576262</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>DNA methylation landscapes: provocative insights from epigenomics.</p></title><aug><au><snm>Suzuki</snm><fnm>MM</fnm></au><au><snm>Bird</snm><fnm>A</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2008</pubdate><volume>9</volume><fpage>465</fpage><lpage>476</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">18463664</pubid></xrefbib></bibl><bibl id="B3"><title><p>Human DNA methylomes at base resolution show widespread epigenomic differences.</p></title><aug><au><snm>Lister</snm><fnm>R</fnm></au><au><snm>Pelizzola</snm><fnm>M</fnm></au><au><snm>Dowen</snm><fnm>RH</fnm></au><au><snm>Hawkins</snm><fnm>RD</fnm></au><au><snm>Hon</snm><fnm>G</fnm></au><au><snm>Tonti-Filippini</snm><fnm>J</fnm></au><au><snm>Nery</snm><fnm>JR</fnm></au><au><snm>Lee</snm><fnm>L</fnm></au><au><snm>Ye</snm><fnm>Z</fnm></au><au><snm>Ngo</snm><fnm>Q-M</fnm></au><au><snm>Edsall</snm><fnm>L</fnm></au><au><snm>Antosiewicz-Bourget</snm><fnm>J</fnm></au><au><snm>Stewart</snm><fnm>R</fnm></au><au><snm>Ruotti</snm><fnm>V</fnm></au><au><snm>Millar</snm><fnm>AH</fnm></au><au><snm>Thomson</snm><fnm>JA</fnm></au><au><snm>Ren</snm><fnm>B</fnm></au><au><snm>Ecker</snm><fnm>JR</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>462</volume><fpage>315</fpage><lpage>322</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08514</pubid><pubid idtype="pmcid">2857523</pubid><pubid idtype="pmpid" link="fulltext">19829295</pubid></pubidlist></xrefbib></bibl><bibl 
id="B4"><title><p>Methylation-induced repression--belts, braces, and chromatin.</p></title><aug><au><snm>Bird</snm><fnm>AP</fnm></au><au><snm>Wolffe</snm><fnm>AP</fnm></au></aug><source>Cell</source><pubdate>1999</pubdate><volume>99</volume><fpage>451</fpage><lpage>454</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(00)81532-9</pubid><pubid idtype="pmpid" link="fulltext">10589672</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Identification and characterization of a family of mammalian methyl-CpG binding proteins.</p></title><aug><au><snm>Hendrich</snm><fnm>B</fnm></au><au><snm>Bird</snm><fnm>A</fnm></au></aug><source>Mol Cell Biol</source><pubdate>1998</pubdate><volume>18</volume><fpage>6538</fpage><lpage>6547</lpage><xrefbib><pubidlist><pubid idtype="pmcid">109239</pubid><pubid idtype="pmpid" link="fulltext">9774669</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function,
and impair hematopoietic differentiation.</p></title><aug><au><snm>Figueroa</snm><fnm>ME</fnm></au><au><snm>Abdel-Wahab</snm><fnm>O</fnm></au><au><snm>Lu</snm><fnm>C</fnm></au><au><snm>Ward</snm><fnm>PS</fnm></au><au><snm>Patel</snm><fnm>J</fnm></au><au><snm>Shih</snm><fnm>A</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Bhagwat</snm><fnm>N</fnm></au><au><snm>Vasanthakumar</snm><fnm>A</fnm></au><au><snm>Fernandez</snm><fnm>HF</fnm></au><au><snm>Tallman</snm><fnm>MS</fnm></au><au><snm>Sun</snm><fnm>Z</fnm></au><au><snm>Wolniak</snm><fnm>K</fnm></au><au><snm>Peeters</snm><fnm>JK</fnm></au><au><snm>Liu</snm><fnm>W</fnm></au><au><snm>Choe</snm><fnm>SE</fnm></au><au><snm>Fantin</snm><fnm>VR</fnm></au><au><snm>Paietta</snm><fnm>E</fnm></au><au><snm>L&#246;wenberg</snm><fnm>B</fnm></au><au><snm>Licht</snm><fnm>JD</fnm></au><au><snm>Godley</snm><fnm>LA</fnm></au><au><snm>Delwel</snm><fnm>R</fnm></au><au><snm>Valk</snm><fnm>PJM</fnm></au><au><snm>Thompson</snm><fnm>CB</fnm></au><au><snm>Levine</snm><fnm>RL</fnm></au><au><snm>Melnick</snm><fnm>A</fnm></au></aug><source>Cancer Cell</source><pubdate>2010</pubdate><volume>18</volume><fpage>553</fpage><lpage>567</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.ccr.2010.11.015</pubid><pubid idtype="pmpid" link="fulltext">21130701</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>A DNA methylation fingerprint of 1628 human 
samples.</p></title><aug><au><snm>Fernandez</snm><fnm>AF</fnm></au><au><snm>Assenov</snm><fnm>Y</fnm></au><au><snm>Martin-Subero</snm><fnm>JI</fnm></au><au><snm>Balint</snm><fnm>B</fnm></au><au><snm>Siebert</snm><fnm>R</fnm></au><au><snm>Taniguchi</snm><fnm>H</fnm></au><au><snm>Yamamoto</snm><fnm>H</fnm></au><au><snm>Hidalgo</snm><fnm>M</fnm></au><au><snm>Tan</snm><fnm>A-C</fnm></au><au><snm>Galm</snm><fnm>O</fnm></au><au><snm>Ferrer</snm><fnm>I</fnm></au><au><snm>Sanchez-Cespedes</snm><fnm>M</fnm></au><au><snm>Villanueva</snm><fnm>A</fnm></au><au><snm>Carmona</snm><fnm>J</fnm></au><au><snm>Sanchez-Mut</snm><fnm>JV</fnm></au><au><snm>Berdasco</snm><fnm>M</fnm></au><au><snm>Moreno</snm><fnm>V</fnm></au><au><snm>Capella</snm><fnm>G</fnm></au><au><snm>Monk</snm><fnm>D</fnm></au><au><snm>Ballestar</snm><fnm>E</fnm></au><au><snm>Ropero</snm><fnm>S</fnm></au><au><snm>Martinez</snm><fnm>R</fnm></au><au><snm>Sanchez-Carbayo</snm><fnm>M</fnm></au><au><snm>Prosper</snm><fnm>F</fnm></au><au><snm>Agirre</snm><fnm>X</fnm></au><au><snm>Fraga</snm><fnm>MF</fnm></au><au><snm>Gra&#241;a</snm><fnm>O</fnm></au><au><snm>Perez-Jurado</snm><fnm>L</fnm></au><au><snm>Mora</snm><fnm>J</fnm></au><au><snm>Puig</snm><fnm>S</fnm></au><etal/></aug><source>Genome Res</source><pubdate>2012</pubdate><volume>22</volume><fpage>407</fpage><lpage>419</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.119867.110</pubid><pubid idtype="pmcid">3266047</pubid><pubid idtype="pmpid" link="fulltext">21613409</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Role for DNA methylation in genomic imprinting</p></title><aug><au><snm>Li</snm><fnm>E</fnm></au><au><snm>Beard</snm><fnm>C</fnm></au></aug><source>Nature</source><pubdate>1993</pubdate><volume>366</volume><fpage>362</fpage><lpage>365</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/366362a0</pubid><pubid idtype="pmpid" link="fulltext">8247133</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>A unique regulatory phase of 
DNA methylation in the early mammalian embryo</p></title><aug><au><snm>Smith</snm><fnm>ZD</fnm></au><au><snm>Chan</snm><fnm>MM</fnm></au><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Gu</snm><fnm>H</fnm></au><au><snm>Gnirke</snm><fnm>A</fnm></au><au><snm>Regev</snm><fnm>A</fnm></au><au><snm>Meissner</snm><fnm>A</fnm></au></aug><source>Nature</source><pubdate>2012</pubdate><volume>484</volume><fpage>339</fpage><lpage>344</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature10960</pubid><pubid idtype="pmpid" link="fulltext">22456710</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Genome-scale DNA methylation maps of pluripotent and differentiated cells</p></title><aug><au><snm>Meissner</snm><fnm>A</fnm></au><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Gu</snm><fnm>H</fnm></au><au><snm>Wernig</snm><fnm>M</fnm></au><au><snm>Hanna</snm><fnm>J</fnm></au><au><snm>Sivachenko</snm><fnm>A</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au><au><snm>Nusbaum</snm><fnm>C</fnm></au><au><snm>Jaffe</snm><fnm>DB</fnm></au><au><snm>Gnirke</snm><fnm>A</fnm></au><au><snm>Jaenisch</snm><fnm>R</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>454</volume><fpage>766</fpage><lpage>770</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2896277</pubid><pubid idtype="pmpid" link="fulltext">18600261</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic
landscapes in acute myeloid leukemia</p></title><aug><au><snm>Akalin</snm><fnm>A</fnm></au><au><snm>Garrett-Bakelman</snm><fnm>FE</fnm></au><au><snm>Kormaksson</snm><fnm>M</fnm></au><au><snm>Busuttil</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Khrebtukova</snm><fnm>I</fnm></au><au><snm>Milne</snm><fnm>TA</fnm></au><au><snm>Huang</snm><fnm>Y</fnm></au><au><snm>Biswas</snm><fnm>D</fnm></au><au><snm>Hess</snm><fnm>JL</fnm></au><au><snm>Allis</snm><fnm>D</fnm></au><au><snm>Roeder</snm><fnm>RG</fnm></au><au><snm>Valk</snm><fnm>PJM</fnm></au><au><snm>Lo</snm><fnm>B</fnm></au><au><snm>Paietta</snm><fnm>E</fnm></au><au><snm>Tallman</snm><fnm>MS</fnm></au><au><snm>Schroth</snm><fnm>GP</fnm></au><au><snm>Mason</snm><fnm>CE</fnm></au><au><snm>Melnick</snm><fnm>A</fnm></au><au><snm>Figueroa</snm><fnm>ME</fnm></au></aug><source>PLoS Genet</source><pubdate>2012</pubdate><volume>8</volume><fpage>e1002781</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1002781</pubid><pubid idtype="pmcid">3380828</pubid><pubid idtype="pmpid" link="fulltext">22737091</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning.</p></title><aug><au><snm>Cokus</snm><fnm>SJ</fnm></au><au><snm>Feng</snm><fnm>S</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Chen</snm><fnm>Z</fnm></au><au><snm>Merriman</snm><fnm>B</fnm></au><au><snm>Haudenschild</snm><fnm>CD</fnm></au><au><snm>Pradhan</snm><fnm>S</fnm></au><au><snm>Nelson</snm><fnm>SF</fnm></au><au><snm>Pellegrini</snm><fnm>M</fnm></au><au><snm>Jacobsen</snm><fnm>SE</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>452</volume><fpage>215</fpage><lpage>219</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06745</pubid><pubid idtype="pmcid">2377394</pubid><pubid idtype="pmpid" link="fulltext">18278030</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Highly integrated 
single-base resolution maps of the epigenome in Arabidopsis.</p></title><aug><au><snm>Lister</snm><fnm>R</fnm></au><au><snm>O'Malley</snm><fnm>RC</fnm></au><au><snm>Tonti-Filippini</snm><fnm>J</fnm></au><au><snm>Gregory</snm><fnm>BD</fnm></au><au><snm>Berry</snm><fnm>CC</fnm></au><au><snm>Millar</snm><fnm>AH</fnm></au><au><snm>Ecker</snm><fnm>JR</fnm></au></aug><source>Cell</source><pubdate>2008</pubdate><volume>133</volume><fpage>523</fpage><lpage>536</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2008.03.029</pubid><pubid idtype="pmcid">2723732</pubid><pubid idtype="pmpid" link="fulltext">18423832</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells.</p></title><aug><au><snm>Ball</snm><fnm>MP</fnm></au><au><snm>Li</snm><fnm>JB</fnm></au><au><snm>Gao</snm><fnm>Y</fnm></au><au><snm>Lee</snm><fnm>J-H</fnm></au><au><snm>LeProust</snm><fnm>EM</fnm></au><au><snm>Park</snm><fnm>I-H</fnm></au><au><snm>Xie</snm><fnm>B</fnm></au><au><snm>Daley</snm><fnm>GQ</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2009</pubdate><volume>27</volume><fpage>361</fpage><lpage>368</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1533</pubid><pubid idtype="pmpid" link="fulltext">19329998</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base
resolution</p></title><aug><au><snm>Booth</snm><fnm>MJ</fnm></au><au><snm>Branco</snm><fnm>MR</fnm></au><au><snm>Ficz</snm><fnm>G</fnm></au><au><snm>Oxley</snm><fnm>D</fnm></au><au><snm>Krueger</snm><fnm>F</fnm></au><au><snm>Reik</snm><fnm>W</fnm></au><au><snm>Balasubramanian</snm><fnm>S</fnm></au></aug><source>Science</source><pubdate>2012</pubdate><volume>336</volume><fpage>934</fpage><lpage>937</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1220671</pubid><pubid idtype="pmpid" link="fulltext">22539555</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>methylKit</p></title><url>http://code.google.com/p/methylkit</url></bibl><bibl id="B17"><title><p>Direct detection of DNA methylation during single-molecule, real-time sequencing.</p></title><aug><au><snm>Flusberg</snm><fnm>BA</fnm></au><au><snm>Webster</snm><fnm>DR</fnm></au><au><snm>Lee</snm><fnm>JH</fnm></au><au><snm>Travers</snm><fnm>KJ</fnm></au><au><snm>Olivares</snm><fnm>EC</fnm></au><au><snm>Clark</snm><fnm>TA</fnm></au><au><snm>Korlach</snm><fnm>J</fnm></au><au><snm>Turner</snm><fnm>SW</fnm></au></aug><source>Nat Methods</source><pubdate>2010</pubdate><volume>7</volume><fpage>461</fpage><lpage>465</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1459</pubid><pubid idtype="pmcid">2879396</pubid><pubid idtype="pmpid" link="fulltext">20453866</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Automated forward and reverse ratcheting of DNA in a nanopore at 5-&#197; precision.</p></title><aug><au><snm>Cherf</snm><fnm>GM</fnm></au><au><snm>Lieberman</snm><fnm>KR</fnm></au><au><snm>Rashid</snm><fnm>H</fnm></au><au><snm>Lam</snm><fnm>CE</fnm></au><au><snm>Karplus</snm><fnm>K</fnm></au><au><snm>Akeson</snm><fnm>M</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2012</pubdate><volume>30</volume><fpage>344</fpage><lpage>348</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.2147</pubid><pubid idtype="pmcid">3408072</pubid><pubid idtype="pmpid" 
link="fulltext">22334048</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>A mostly traditional approach improves alignment of bisulfite-converted DNA.</p></title><aug><au><snm>Frith</snm><fnm>MC</fnm></au><au><snm>Mori</snm><fnm>R</fnm></au><au><snm>Asai</snm><fnm>K</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2012</pubdate><volume>40</volume><fpage>e100</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gks275</pubid><pubid idtype="pmcid">3401460</pubid><pubid idtype="pmpid" link="fulltext">22457070</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>DNA methylome analysis using short bisulfite sequencing data</p></title><aug><au><snm>Krueger</snm><fnm>F</fnm></au><au><snm>Kreck</snm><fnm>B</fnm></au><au><snm>Franke</snm><fnm>A</fnm></au><au><snm>Andrews</snm><fnm>SR</fnm></au></aug><source>Nat Methods</source><pubdate>2012</pubdate><volume>9</volume><fpage>145</fpage><lpage>151</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth.1828</pubid><pubid idtype="pmpid" link="fulltext">22290186</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>The Sequence Alignment/Map format and SAMtools.</p></title><aug><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Handsaker</snm><fnm>B</fnm></au><au><snm>Wysoker</snm><fnm>A</fnm></au><au><snm>Fennell</snm><fnm>T</fnm></au><au><snm>Ruan</snm><fnm>J</fnm></au><au><snm>Homer</snm><fnm>N</fnm></au><au><snm>Marth</snm><fnm>G</fnm></au><au><snm>Abecasis</snm><fnm>G</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>2078</fpage><lpage>2079</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp352</pubid><pubid idtype="pmcid">2723002</pubid><pubid idtype="pmpid" link="fulltext">19505943</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Bismark: a flexible aligner and methylation caller for Bisulfite-Seq 
applications.</p></title><aug><au><snm>Krueger</snm><fnm>F</fnm></au><au><snm>Andrews</snm><fnm>SR</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><fpage>1571</fpage><lpage>1572</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btr167</pubid><pubid idtype="pmcid">3102221</pubid><pubid idtype="pmpid" link="fulltext">21493656</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast
cancer cells by deep sequencing.</p></title><aug><au><snm>Sun</snm><fnm>Z</fnm></au><au><snm>Asmann</snm><fnm>YW</fnm></au><au><snm>Kalari</snm><fnm>KR</fnm></au><au><snm>Bot</snm><fnm>B</fnm></au><au><snm>Eckel-Passow</snm><fnm>JE</fnm></au><au><snm>Baker</snm><fnm>TR</fnm></au><au><snm>Carr</snm><fnm>JM</fnm></au><au><snm>Khrebtukova</snm><fnm>I</fnm></au><au><snm>Luo</snm><fnm>S</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Schroth</snm><fnm>GP</fnm></au><au><snm>Perez</snm><fnm>EA</fnm></au><au><snm>Thompson</snm><fnm>EA</fnm></au></aug><source>PloS One</source><pubdate>2011</pubdate><volume>6</volume><fpage>e17490</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0017490</pubid><pubid idtype="pmcid">3045451</pubid><pubid idtype="pmpid" link="fulltext">21364760</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Gene expression profiling predicts clinical outcome of breast cancer.</p></title><aug><au><snm>van 't Veer</snm><fnm>LJ</fnm></au><au><snm>Dai</snm><fnm>H</fnm></au><au><snm>van de Vijver</snm><fnm>MJ</fnm></au><au><snm>He</snm><fnm>YD</fnm></au><au><snm>Hart</snm><fnm>AAM</fnm></au><au><snm>Mao</snm><fnm>M</fnm></au><au><snm>Peterse</snm><fnm>HL</fnm></au><au><snm>van der Kooy</snm><fnm>K</fnm></au><au><snm>Marton</snm><fnm>MJ</fnm></au><au><snm>Witteveen</snm><fnm>AT</fnm></au><au><snm>Schreiber</snm><fnm>GJ</fnm></au><au><snm>Kerkhoven</snm><fnm>RM</fnm></au><au><snm>Roberts</snm><fnm>C</fnm></au><au><snm>Linsley</snm><fnm>PS</fnm></au><au><snm>Bernards</snm><fnm>R</fnm></au><au><snm>Friend</snm><fnm>SH</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>415</volume><fpage>530</fpage><lpage>536</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/415530a</pubid><pubid idtype="pmpid" link="fulltext">11823860</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Breast cancer classification and prognosis based on gene expression profiles from a
population-based study.</p></title><aug><au><snm>Sotiriou</snm><fnm>C</fnm></au><au><snm>Neo</snm><fnm>S-Y</fnm></au><au><snm>McShane</snm><fnm>LM</fnm></au><au><snm>Korn</snm><fnm>EL</fnm></au><au><snm>Long</snm><fnm>PM</fnm></au><au><snm>Jazaeri</snm><fnm>A</fnm></au><au><snm>Martiat</snm><fnm>P</fnm></au><au><snm>Fox</snm><fnm>SB</fnm></au><au><snm>Harris</snm><fnm>AL</fnm></au><au><snm>Liu</snm><fnm>ET</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2003</pubdate><volume>100</volume><fpage>10393</fpage><lpage>10398</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1732912100</pubid><pubid idtype="pmcid">193572</pubid><pubid idtype="pmpid" link="fulltext">12917485</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><aug><au><snm>Jolliffe</snm><fnm>I</fnm></au></aug><source>Principal Component Analysis</source><publisher>New York, USA, Springer</publisher><edition>2</edition><pubdate>2002</pubdate></bibl><bibl id="B27"><title><p>A gene hypermethylation profile of human cancer.</p></title><aug><au><snm>Esteller</snm><fnm>M</fnm></au><au><snm>Corn</snm><fnm>PG</fnm></au><au><snm>Baylin</snm><fnm>SB</fnm></au><au><snm>Herman</snm><fnm>JG</fnm></au></aug><source>Cancer Res</source><pubdate>2001</pubdate><volume>61</volume><fpage>3225</fpage><lpage>3229</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11309270</pubid></xrefbib></bibl><bibl id="B28"><title><p>DNA hypermethylation in tumorigenesis: epigenetics joins genetics.</p></title><aug><au><snm>Baylin</snm><fnm>SB</fnm></au><au><snm>Herman</snm><fnm>JG</fnm></au></aug><source>Trends Genet</source><pubdate>2000</pubdate><volume>16</volume><fpage>168</fpage><lpage>174</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-9525(99)01971-X</pubid><pubid idtype="pmpid" link="fulltext">10729832</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Aberrant CpG-island methylation has non-random and tumour-type-specific 
patterns.</p></title><aug><au><snm>Costello</snm><fnm>JF</fnm></au><au><snm>Fr&#252;hwald</snm><fnm>MC</fnm></au><au><snm>Smiraglia</snm><fnm>DJ</fnm></au><au><snm>Rush</snm><fnm>LJ</fnm></au><au><snm>Robertson</snm><fnm>GP</fnm></au><au><snm>Gao</snm><fnm>X</fnm></au><au><snm>Wright</snm><fnm>FA</fnm></au><au><snm>Feramisco</snm><fnm>JD</fnm></au><au><snm>Peltom&#228;ki</snm><fnm>P</fnm></au><au><snm>Lang</snm><fnm>JC</fnm></au><au><snm>Schuller</snm><fnm>DE</fnm></au><au><snm>Yu</snm><fnm>L</fnm></au><au><snm>Bloomfield</snm><fnm>CD</fnm></au><au><snm>Caligiuri</snm><fnm>MA</fnm></au><au><snm>Yates</snm><fnm>A</fnm></au><au><snm>Nishikawa</snm><fnm>R</fnm></au><au><snm>Su Huang</snm><fnm>H</fnm></au><au><snm>Petrelli</snm><fnm>NJ</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>O'Dorisio</snm><fnm>MS</fnm></au><au><snm>Held</snm><fnm>WA</fnm></au><au><snm>Cavenee</snm><fnm>WK</fnm></au><au><snm>Plass</snm><fnm>C</fnm></au></aug><source>Nat Genet</source><pubdate>2000</pubdate><volume>24</volume><fpage>132</fpage><lpage>138</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/72785</pubid><pubid idtype="pmpid" link="fulltext">10655057</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human
induced pluripotent stem cells, embryonic stem cells and fibroblasts.</p></title><aug><au><snm>Doi</snm><fnm>A</fnm></au><au><snm>Park</snm><fnm>I-H</fnm></au><au><snm>Wen</snm><fnm>B</fnm></au><au><snm>Murakami</snm><fnm>P</fnm></au><au><snm>Aryee</snm><fnm>MJ</fnm></au><au><snm>Irizarry</snm><fnm>R</fnm></au><au><snm>Herb</snm><fnm>B</fnm></au><au><snm>Ladd-Acosta</snm><fnm>C</fnm></au><au><snm>Rho</snm><fnm>J</fnm></au><au><snm>Loewer</snm><fnm>S</fnm></au><au><snm>Miller</snm><fnm>J</fnm></au><au><snm>Schlaeger</snm><fnm>T</fnm></au><au><snm>Daley</snm><fnm>GQ</fnm></au><au><snm>Feinberg</snm><fnm>AP</fnm></au></aug><source>Nat Genet</source><pubdate>2009</pubdate><volume>41</volume><fpage>1350</fpage><lpage>1353</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.471</pubid><pubid idtype="pmcid">2958040</pubid><pubid idtype="pmpid" link="fulltext">19881528</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets
with dependence structures.</p></title><aug><au><snm>Wang</snm><fnm>H-Q</fnm></au><au><snm>Tuominen</snm><fnm>LK</fnm></au><au><snm>Tsai</snm><fnm>C-J</fnm></au></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><fpage>225</fpage><lpage>231</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq650</pubid><pubid idtype="pmpid" link="fulltext">21098430</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>A direct approach to false discovery rates</p></title><aug><au><snm>Storey</snm><fnm>J</fnm></au></aug><source>J R Stat Soc Series B Stat Methodol</source><pubdate>2002</pubdate><volume>64</volume><fpage>479</fpage><lpage>498</lpage><xrefbib><pubid idtype="doi">10.1111/1467-9868.00346</pubid></xrefbib></bibl><bibl id="B33"><title><p>Statistical significance for genomewide studies.</p></title><aug><au><snm>Storey</snm><fnm>JD</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2003</pubdate><volume>100</volume><fpage>9440</fpage><lpage>9445</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1530509100</pubid><pubid idtype="pmcid">170937</pubid><pubid idtype="pmpid" link="fulltext">12883005</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Increased methylation variation in epigenetic domains across cancer types.</p></title><aug><au><snm>Hansen</snm><fnm>KD</fnm></au><au><snm>Timp</snm><fnm>W</fnm></au><au><snm>Bravo</snm><fnm>HC</fnm></au><au><snm>Sabunciyan</snm><fnm>S</fnm></au><au><snm>Langmead</snm><fnm>B</fnm></au><au><snm>McDonald</snm><fnm>OG</fnm></au><au><snm>Wen</snm><fnm>B</fnm></au><au><snm>Wu</snm><fnm>H</fnm></au><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Diep</snm><fnm>D</fnm></au><au><snm>Briem</snm><fnm>E</fnm></au><au><snm>Zhang</snm><fnm>K</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Feinberg</snm><fnm>AP</fnm></au></aug><source>Nat 
Genet</source><pubdate>2011</pubdate><volume>43</volume><fpage>768</fpage><lpage>775</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.865</pubid><pubid idtype="pmcid">3145050</pubid><pubid idtype="pmpid" link="fulltext">21706001</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>DNA hypomethylation in cancer cells.</p></title><aug><au><snm>Ehrlich</snm><fnm>M</fnm></au></aug><source>Epigenomics</source><pubdate>2009</pubdate><volume>1</volume><fpage>239</fpage><lpage>259</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2217/epi.09.33</pubid><pubid idtype="pmcid">2873040</pubid><pubid idtype="pmpid" link="fulltext">20495664</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Genome-wide tracking of unmethylated DNA Alu repeats in normal and cancer cells.</p></title><aug><au><snm>Rodriguez</snm><fnm>J</fnm></au><au><snm>Vives</snm><fnm>L</fnm></au><au><snm>Jord&#224;</snm><fnm>M</fnm></au><au><snm>Morales</snm><fnm>C</fnm></au><au><snm>Mu&#241;oz</snm><fnm>M</fnm></au><au><snm>Vendrell</snm><fnm>E</fnm></au><au><snm>Peinado</snm><fnm>MA</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>770</fpage><lpage>784</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2241897</pubid><pubid idtype="pmpid" link="fulltext">18084025</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>DNA-binding factors shape the mouse methylome at distal regulatory 
regions.</p></title><aug><au><snm>Stadler</snm><fnm>MB</fnm></au><au><snm>Murr</snm><fnm>R</fnm></au><au><snm>Burger</snm><fnm>L</fnm></au><au><snm>Ivanek</snm><fnm>R</fnm></au><au><snm>Lienert</snm><fnm>F</fnm></au><au><snm>Sch&#246;ler</snm><fnm>A</fnm></au><au><snm>Wirbelauer</snm><fnm>C</fnm></au><au><snm>Oakeley</snm><fnm>EJ</fnm></au><au><snm>Gaidatzis</snm><fnm>D</fnm></au><au><snm>Tiwari</snm><fnm>VK</fnm></au><au><snm>Sch&#252;beler</snm><fnm>D</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>480</volume><fpage>490</fpage><lpage>495</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">22170606</pubid></xrefbib></bibl><bibl id="B38"><title><p>DNA methylation status predicts cell type-specific enhancer activity.</p></title><aug><au><snm>Wiench</snm><fnm>M</fnm></au><au><snm>John</snm><fnm>S</fnm></au><au><snm>Baek</snm><fnm>S</fnm></au><au><snm>Johnson</snm><fnm>TA</fnm></au><au><snm>Sung</snm><fnm>M-H</fnm></au><au><snm>Escobar</snm><fnm>T</fnm></au><au><snm>Simmons</snm><fnm>CA</fnm></au><au><snm>Pearce</snm><fnm>KH</fnm></au><au><snm>Biddie</snm><fnm>SC</fnm></au><au><snm>Sabo</snm><fnm>PJ</fnm></au><au><snm>Thurman</snm><fnm>RE</fnm></au><au><snm>Stamatoyannopoulos</snm><fnm>JA</fnm></au><au><snm>Hager</snm><fnm>GL</fnm></au></aug><source>EMBO J</source><pubdate>2011</pubdate><volume>30</volume><fpage>3028</fpage><lpage>3039</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/emboj.2011.210</pubid><pubid idtype="pmcid">3160184</pubid><pubid idtype="pmpid" link="fulltext">21701563</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Mapping and analysis of chromatin state dynamics in nine human cell 
types.</p></title><aug><au><snm>Ernst</snm><fnm>J</fnm></au><au><snm>Kheradpour</snm><fnm>P</fnm></au><au><snm>Mikkelsen</snm><fnm>TS</fnm></au><au><snm>Shoresh</snm><fnm>N</fnm></au><au><snm>Ward</snm><fnm>LD</fnm></au><au><snm>Epstein</snm><fnm>CB</fnm></au><au><snm>Zhang</snm><fnm>X</fnm></au><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Issner</snm><fnm>R</fnm></au><au><snm>Coyne</snm><fnm>M</fnm></au><au><snm>Ku</snm><fnm>M</fnm></au><au><snm>Durham</snm><fnm>T</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Bernstein</snm><fnm>BE</fnm></au></aug><source>Nature</source><pubdate>2011</pubdate><volume>473</volume><fpage>43</fpage><lpage>49</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature09906</pubid><pubid idtype="pmcid">3088773</pubid><pubid idtype="pmpid" link="fulltext">21441907</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Uncovering the role of 5-hydroxymethylcytosine in the epigenome</p></title><aug><au><snm>Branco</snm><fnm>MR</fnm></au><au><snm>Ficz</snm><fnm>G</fnm></au><au><snm>Reik</snm><fnm>W</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2011</pubdate><volume>13</volume><fpage>7</fpage><lpage>13</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">22083101</pubid></xrefbib></bibl><bibl id="B41"><title><p>Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian 
genome</p></title><aug><au><snm>Yu</snm><fnm>M</fnm></au><au><snm>Hon</snm><fnm>GC</fnm></au><au><snm>Szulwach</snm><fnm>KE</fnm></au><au><snm>Song</snm><fnm>C-X</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Kim</snm><fnm>A</fnm></au><au><snm>Li</snm><fnm>X</fnm></au><au><snm>Dai</snm><fnm>Q</fnm></au><au><snm>Shen</snm><fnm>Y</fnm></au><au><snm>Park</snm><fnm>B</fnm></au><au><snm>Min</snm><fnm>J-H</fnm></au><au><snm>Jin</snm><fnm>P</fnm></au><au><snm>Ren</snm><fnm>B</fnm></au><au><snm>He</snm><fnm>C</fnm></au></aug><source>Cell</source><pubdate>2012</pubdate><volume>149</volume><fpage>1368</fpage><lpage>1380</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2012.04.027</pubid><pubid idtype="pmpid" link="fulltext">22608086</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing.</p></title><aug><au><snm>Huang</snm><fnm>Y</fnm></au><au><snm>Pastor</snm><fnm>WA</fnm></au><au><snm>Shen</snm><fnm>Y</fnm></au><au><snm>Tahiliani</snm><fnm>M</fnm></au><au><snm>Liu</snm><fnm>DR</fnm></au><au><snm>Rao</snm><fnm>A</fnm></au></aug><source>PloS One</source><pubdate>2010</pubdate><volume>5</volume><fpage>e8888</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0008888</pubid><pubid idtype="pmcid">2811190</pubid><pubid idtype="pmpid" link="fulltext">20126651</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Bioconductor: open software development for computational biology and 
bioinformatics.</p></title><aug><au><snm>Gentleman</snm><fnm>RC</fnm></au><au><snm>Carey</snm><fnm>VJ</fnm></au><au><snm>Bates</snm><fnm>DM</fnm></au><au><snm>Bolstad</snm><fnm>B</fnm></au><au><snm>Dettling</snm><fnm>M</fnm></au><au><snm>Dudoit</snm><fnm>S</fnm></au><au><snm>Ellis</snm><fnm>B</fnm></au><au><snm>Gautier</snm><fnm>L</fnm></au><au><snm>Ge</snm><fnm>Y</fnm></au><au><snm>Gentry</snm><fnm>J</fnm></au><au><snm>Hornik</snm><fnm>K</fnm></au><au><snm>Hothorn</snm><fnm>T</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Iacus</snm><fnm>S</fnm></au><au><snm>Irizarry</snm><fnm>R</fnm></au><au><snm>Leisch</snm><fnm>F</fnm></au><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Maechler</snm><fnm>M</fnm></au><au><snm>Rossini</snm><fnm>AJ</fnm></au><au><snm>Sawitzki</snm><fnm>G</fnm></au><au><snm>Smith</snm><fnm>C</fnm></au><au><snm>Smyth</snm><fnm>G</fnm></au><au><snm>Tierney</snm><fnm>L</fnm></au><au><snm>Yang</snm><fnm>JYH</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au></aug><source>Genome Biol</source><pubdate>2004</pubdate><volume>5</volume><fpage>r80</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-10-r80</pubid><pubid idtype="pmcid">545600</pubid><pubid idtype="pmpid" link="fulltext">15461798</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>BLUEPRINT to decode the epigenetic signature written in 
blood</p></title><aug><au><snm>Adams</snm><fnm>D</fnm></au><au><snm>Altucci</snm><fnm>L</fnm></au><au><snm>Antonarakis</snm><fnm>SE</fnm></au><au><snm>Ballesteros</snm><fnm>J</fnm></au><au><snm>Beck</snm><fnm>S</fnm></au><au><snm>Bird</snm><fnm>A</fnm></au><au><snm>Bock</snm><fnm>C</fnm></au><au><snm>Boehm</snm><fnm>B</fnm></au><au><snm>Campo</snm><fnm>E</fnm></au><au><snm>Caricasole</snm><fnm>A</fnm></au><au><snm>Dahl</snm><fnm>F</fnm></au><au><snm>Dermitzakis</snm><fnm>ET</fnm></au><au><snm>Enver</snm><fnm>T</fnm></au><au><snm>Esteller</snm><fnm>M</fnm></au><au><snm>Estivill</snm><fnm>X</fnm></au><au><snm>Ferguson-Smith</snm><fnm>A</fnm></au><au><snm>Fitzgibbon</snm><fnm>J</fnm></au><au><snm>Flicek</snm><fnm>P</fnm></au><au><snm>Giehl</snm><fnm>C</fnm></au><au><snm>Graf</snm><fnm>T</fnm></au><au><snm>Grosveld</snm><fnm>F</fnm></au><au><snm>Guigo</snm><fnm>R</fnm></au><au><snm>Gut</snm><fnm>I</fnm></au><au><snm>Helin</snm><fnm>K</fnm></au><au><snm>Jarvius</snm><fnm>J</fnm></au><au><snm>K&#252;ppers</snm><fnm>R</fnm></au><au><snm>Lehrach</snm><fnm>H</fnm></au><au><snm>Lengauer</snm><fnm>T</fnm></au><au><snm>Lernmark</snm><fnm>&#197;</fnm></au><au><snm>Leslie</snm><fnm>D</fnm></au><etal/></aug><source>Nat Biotechnol</source><pubdate>2012</pubdate><volume>30</volume><fpage>224</fpage><lpage>226</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.2153</pubid><pubid idtype="pmpid" link="fulltext">22398613</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm>
</art>