Research
Characterizing and measuring bias in sequence data
The Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
Genome Biology 2013, 14:R51 doi:10.1186/gb-2013-14-5-r51
Published: 29 May 2013
Additional files
Additional file 1:
The 'bad promoters' list for Human assembly 19 (GRCh37), as described in the main text, computed from HiSeq v2 data set A2. Intervals are annotated with gene names and the coverage ratios used to select them (see Materials and methods for details).
Format: TXT Size: 44KB
Additional file 2:
The 'bad promoters' list for Human assembly 19 (GRCh37), as described in the main text, computed from HiSeq v3 data set A3. Intervals are annotated with gene names and the coverage ratios used to select them (see Materials and methods for details).
Format: TXT Size: 43KB
Additional file 3:
The supplementary tables referred to in the text.
Format: DOCX Size: 89KB
Additional file 4:
Figure S1 - Human error rates as a function of GC composition and reference. Each graph shows mismatch (light blue), deletion (dark blue), and insertion (maroon) rates (y-axis) as a function of GC composition (x-axis). Data are shown for the human NA12878 sample sequenced by Illumina HiSeq (Table 2, data set 14) and Ion Torrent PGM (Table 2, data set 15) aligned both to the standard Human assembly 19 (GRCh37) reference and to the NA12878-specific diploid reference created by the Gerstein lab [37]. Error rates are only plotted for GC percentages for which there are at least 1,000 100-base windows in Human assembly 19.
Format: PDF Size: 144KB
Additional file 5:
Figure S2 - Human error rates as a function of homopolymer length and reference. Each graph shows mismatch (light blue), deletion (dark blue), and insertion (maroon) rates (y-axis) within homopolymers of various lengths (x-axis). Data are plotted from human sample NA12878 as sequenced by Illumina HiSeq (Table 2, data set 14) and Ion Torrent PGM (Table 2, data set 15) and aligned both to the standard Human assembly 19 (GRCh37) reference and to the NA12878-specific diploid reference created by the Gerstein lab [37].
Format: PDF Size: 63KB
Additional file 6:
Intervals of the human reference that had relative coverage below 0.1 in data set 14 and could not be categorized as biological variation or as similar to known bias motifs. The GC content fraction and homopolymer N50 are also given for each interval.
Format: CSV Size: 1.2MB
Additional file 7:
The SRA numbers for all Illumina, Ion Torrent, and Pacific Biosciences data used in the paper.
Format: XLSX Size: 123KB



