Open Access Highly Accessed Research

Genome-wide identification and characterization of replication origins by deep sequencing

Jia Xu1,2, Yoshimi Yanagisawa3, Alexander M Tsankov4, Christopher Hart5,8, Keita Aoki6, Naveen Kommajosyula3, Kathleen E Steinmann5,9, James Bochicchio4, Carsten Russ4, Aviv Regev4,7, Oliver J Rando3, Chad Nusbaum4, Hironori Niki6, Patrice Milos5*, Zhiping Weng1* and Nicholas Rhind3*

Author Affiliations

1 Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA

2 Bioinformatics Core Facility, University of Massachusetts Medical School, Worcester, MA 01605, USA

3 Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA

4 Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA

5 Helicos BioSciences Corporation, Cambridge, MA 02139, USA

6 Microbial Genetics Laboratory, Genetic Strains Research Center, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan

7 Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815-6789, USA

8 Division of Natural Sciences, New College of Florida, Sarasota, FL 34243, USA

9 Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA

Genome Biology 2012, 13:R27  doi:10.1186/gb-2012-13-4-r27

Published: 24 April 2012

Additional files

Additional file 1:

Supplemental Figures S1 to S6.

Figure S1: Peak detection by template fitting. Examples of the template-fitting approach to peak detection. The blue curve is the normalized, smoothed data. The blue dot is the peak call before template fitting. The red curve is the template. The red dot is the peak call after template fitting. The numbers are serial numbers assigned to the peaks during the iterative peak-calling process. Peak 106 can be seen at about 4.27 Mb in Figure 1 and in Figure S2 in Additional file 1.

Figure S2: Comparison of independent replication profiles. Replication profiles from (a) Sp1, Sp2 and Sp3 and (b) Sj1 and Sj2 are compared as in Figure 2. (c) Venn diagrams of peak overlap between the indicated datasets. Most cases of non-overlapping peak calls are due to the peak in one of the datasets being below the cutoff, such as at 4.23 Mb in S. pombe and 1.55 Mb in S. japonicus.

Figure S3: The Sap1 binding site. Logos for the Sap1 binding site derived from (a) MEME analysis of S. japonicus origins (Figure 3; Table S6 in Additional file 2) and (b) in vitro selection [54].

Figure S4: Orc2 and Orc4 domain structures. Domain structures of Orc2 and Orc4 as defined by PFAM [55].

Figure S5: Origin motifs are nucleosome depleted at origin and non-origin sites. Nucleosome occupancy over motifs is depicted as in Figure 4, except that motifs are divided into those within 1 kb of a replication peak and those farther away.

Figure S6: Nucleosome alignment on transcriptional start sites. Nucleosome occupancy over all annotated transcriptional start sites is depicted.

Format: PDF Size: 4.5MB

Open Data

Additional file 2:

Supplemental Tables S1 to S8. Table S1: dataset analysis parameters. Table S2: validated S. pombe origins. Table S3: S. japonicus ARSs. Table S4: k-mer auROC scores. Table S5: cross-species SVM performance. Table S6: MEME results. Table S7: dataset overlap. Table S8: accession numbers.

Format: XLS Size: 531KB

Open Data