This article is part of a special issue on exome sequencing.

Open Access Method

Effective detection of rare variants in pooled DNA samples using Cross-pool tailcurve analysis

Tejasvi S Niranjan1,2, Abby Adamczyk1, Héctor Corrada Bravo3,4, Margaret A Taub5, Sarah J Wheelan5,6, Rafael Irizarry5 and Tao Wang1*

Author Affiliations

1 McKusick-Nathans Institute of Genetic Medicine and Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

2 Predoctoral Training Program in Human Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

3 Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD 20742, USA

4 Current address: Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD 20742, USA

5 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA

6 Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA

Genome Biology 2011, 12:R93  doi:10.1186/gb-2011-12-9-r93

Published: 28 September 2011

Additional files

Additional file 1:

Depth of coverage for each amplicon pool derived from first cohort sequencing data. Blue line depicts absolute coverage for plus-strand aligned reads. Green line depicts coverage of minus-strand aligned reads. Scales of X- and Y-axes are identical for all graphs depicted for each exon. Light red line indicates presumptive mismatch rate determined from plus-strand aligned reads. Light orange line indicates presumptive mismatch rate determined from minus-strand aligned reads. Ratio of mismatch rate between plus and minus strands is later incorporated into the tailcurve factor used in filtering by SERVIC4E.

Format: PDF Size: 8.3MB

Additional file 2:

Description of tailcurve (nucleotide proportion at individual cycles along the sequence read). With perfect random fragmentation, a given position and its associated base calls (consensus and variant) should be represented at multiple sequencing cycles. With high coverage, a particular base call will be present for that position at all or most cycles. Example: for a sequencing module of 25 cycles with several hundred (24 shown) overlapping reads covering the highlighted position, all the cycles are represented by 'G', with variant reads producing the 'T' at a handful of cycles (potential variant).
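As a rough illustration of the idea described above, the sketch below tabulates, for one reference position, the proportion of each base call observed at each sequencing cycle. It is not the SERVIC4E implementation: the function name `tailcurve`, the `(start, sequence)` read representation, and the assumption of ungapped forward-strand alignments are all hypothetical simplifications.

```python
from collections import Counter, defaultdict


def tailcurve(reads, position):
    """Per-cycle base proportions at one reference position.

    reads: iterable of (start, sequence) pairs, where start is the
    reference coordinate of the read's first base. Reads are assumed
    (for this sketch only) to be ungapped and on the forward strand.
    Returns {cycle: {base: proportion}} with 1-based cycle numbers.
    """
    counts = defaultdict(Counter)
    for start, seq in reads:
        offset = position - start  # 0-based cycle at which the position is read
        if 0 <= offset < len(seq):
            counts[offset + 1][seq[offset]] += 1
    return {
        cycle: {base: n / sum(c.values()) for base, n in c.items()}
        for cycle, c in sorted(counts.items())
    }


# Toy data: four 5-cycle reads overlapping reference position 4.
# Three carry the consensus 'G'; one carries a 'T' at cycle 3.
toy_reads = [(0, "ACGTG"), (1, "CGTGC"), (2, "GTGCA"), (2, "GTTCA")]
curve = tailcurve(toy_reads, position=4)
```

With real data, a genuine variant would be expected to appear at many cycles, whereas a call confined to a few cycles would be more suspect.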

Format: PDF Size: 245KB

Additional file 3:

Diagrammatic output of first three filtering steps using SERVIC4E on first cohort data. Left-hand panel uses Illumina base calls. Right-hand panel uses Srfim base calls. Individual filtering steps progress while moving down each panel. Colored dots incorporate validation data for visualization purposes; blue dots are valid variant pools and red dots are invalid variant pools. Within each panel, the graphs on the left are Average quality versus Weighted allele frequency distributions. X-axis is average Phred quality for each variant-pool. Y-axis is log10 of weighted allele frequency. Histograms on the right depict the frequency of evaluated tailcurve ratios across bins of length = 2.

Format: PDF Size: 896KB

Additional file 4:

Variant call results from first cohort analysis. All positions are given in reference to chromosome 3 of hg19. For each program, a '+' value indicates that a variant call was made by that program for that variant position and pool. Column 'P' indicates the position is in exonic sequence (not intronic). Column 'Valid' indicates validation results for each variant-pool tested; '+' indicates a valid call and '-' indicates an invalid call. Column 'Dist' indicates the position of the variant call in each amplicon.

Format: XLS Size: 119KB

Additional file 5:

Genotyping results for individual first cohort samples. For all samples validated by Sanger sequencing, homozygous wild types are indicated by '-', heterozygotes are indicated by '+', and homozygous mutants are indicated by '++'.
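For readers working with this coding, the sketch below shows one way the '-', '+', '++' symbols could be converted to variant allele counts and summarized as the expected variant allele frequency of a pool. The names `ALT_ALLELES` and `pooled_allele_frequency` are hypothetical, and the calculation assumes diploid samples contributing equally to the pool.

```python
# Variant allele count implied by each genotype symbol used in the file.
ALT_ALLELES = {"-": 0, "+": 1, "++": 2}


def pooled_allele_frequency(genotypes):
    """Expected variant allele frequency in a pool of diploid samples.

    genotypes: list of '-', '+' or '++' symbols, one per sample.
    Assumes (for this sketch) that every sample contributes equal DNA.
    """
    if not genotypes:
        raise ValueError("pool must contain at least one sample")
    alt = sum(ALT_ALLELES[g] for g in genotypes)
    return alt / (2 * len(genotypes))


# A pool of 20 samples with a single heterozygote carries
# 1 variant allele out of 40 chromosomes.
freq = pooled_allele_frequency(["-"] * 19 + ["+"])
```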

Format: XLS Size: 215KB

Additional file 6:

Variant call output of SERVIC4E on the first cohort using Illumina base calls.

Format: TXT Size: 28KB

Additional file 7:

Comparisons of annotated SNPs, transition-transversion ratios, and synonymous-non-synonymous ratios. Calculated metrics for annotation rates, transition-transversion rates, and synonymous-non-synonymous rates for first cohort data only.

Format: XLS Size: 17KB

Additional file 8:

Variant call output of SERVIC4E on the first cohort using Srfim base calls.

Format: TXT Size: 28KB

Additional file 9:

Pooling strategy for second cohort samples. Example: Normalized DNA samples from column 12 of plates 1 and 2 as well as samples from plate 3, column 12, rows A, B, C, and D are pooled together to form pool 12. Normalized DNA samples from column 1 of plates 4 and 5 as well as samples from plate 3, column 1, rows E, F, G, and H are pooled together to form pool 13.

Format: PDF Size: 202KB

Additional file 10:

Effect of strict alignment on coverage from concatenated amplicons. Panel 1 indicates targets for amplification (primers denoted by black half-arrows). Color-coding for each unique target region is retained in all panels. Panel 2 depicts ligation (concatenation) of amplicons. Only two amplicons are depicted; in practice many amplicons ligate together in a row. Darker shaded regions are from primer sequence. Panel 3 depicts random fragmentation to generate 150- to 200-bp segments for sequencing. Panel 4 depicts subsequent strict alignment of short (left) and long (right) reads to genomic reference sequence.

Format: PDF Size: 41KB

Additional file 11:

Depth of coverage for each amplicon pool derived from second cohort sequencing data. Blue line depicts absolute coverage for plus-strand aligned reads. Green line depicts coverage of minus-strand aligned reads. Scales of X- and Y-axes are identical for all graphs depicted for each exon. Light red line indicates presumptive mismatch rate determined from plus-strand aligned reads. Light orange line indicates presumptive mismatch rate determined from minus-strand aligned reads. Ratio of mismatch rate between plus and minus strands is later incorporated into the tailcurve factor used in filtering by SERVIC4E.

Format: PDF Size: 17.5MB

Additional file 12:

Variant call results from second cohort analysis. All positions are given in reference to chromosome 3 of hg19. For each program, a '+' value indicates that a variant call was made by that program for that variant position and pool. Column 'P' indicates the position is in exonic sequence (not intronic). Column 'Valid' indicates validation results for each variant-pool tested; '+' indicates a valid call and '-' indicates an invalid call. Column 'Dist' indicates the position of the variant call in each amplicon.

Format: XLS Size: 86KB

Additional file 13:

Variant call output of SERVIC4E on the second cohort using Illumina base calls.

Format: TXT Size: 26KB

Additional file 14:

Genotyping results for individual second cohort samples. For all samples validated by Sanger sequencing, homozygous wild types are indicated by '-', heterozygotes are indicated by '+', and homozygous mutants are indicated by '++'.

Format: XLS Size: 69KB