|
Figure 3.
Genotyping performance on simulated data sets. (a) Bar charts report Sniper genotyping accuracy based on resequencing of four different
synthetic genomic DNA templates. Sample genomes were generated from each known template
by introducing single nucleotide sequence variation randomly to 0.1%. We simulated
36-nucleotide paired-end (PE) reads from each unknown genome to 50-fold coverage and mapped them
to the respective known genomic template according to a unique (red), best no-guess
(yellow), best guess (green), or total (blue) mapping strategy using k = 1, 2, or 3 mismatches, as shown. Accuracy for each bar was determined using only
those genotype loci identified at or above the specified stringency level Q = -10
log10(1 - P), where P is the posterior probability of the MAP genotype at a single nucleotide locus. Error
bars represent ± standard deviation across five replicate simulations of generating
a new sample genome and simulating reads from this genome. (b) Receiver operating characteristic (ROC) styled plot relating genotyping sensitivity,
TPR = (TP + TPFG)/(TP + TPFG + FN) (where TP = true positive loci, FN = false negative
loci, and TPFG = true position, false genotype loci), to number of false positives
(FPs) for simulation results genotyping the moderate difficulty Yeast 2 × RPL + 5%
template at 50-fold coverage; as in (a), each point represents the average over five
replicates. Points representing stringency levels Q ≥ 40 and Q ≥ 90 are labeled for
clarity. (c) Bar chart reporting the estimated false discovery rate FDR = (FP + TPFG)/(TP + FP
+ TPFG) for genotyping the Yeast 2 × RPL + 5% template at 50-fold coverage using Q
≥ 40 confidence. Error bars represent ± standard deviation over five replicates, as
in (a). See Additional file 14 for a complete description of estimates.
Simola and Kim Genome Biology 2011 12:R55 doi:10.1186/gb-2011-12-6-r55 |
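
The three quantities defined in the caption (the stringency level Q, the sensitivity TPR, and the estimated FDR) can be sketched as follows. This is a minimal illustration of the stated formulas only, not the Sniper implementation; the function names and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def stringency_q(p_map: float) -> float:
    """Q = -10 * log10(1 - P), where P is the posterior probability
    of the MAP genotype at a single nucleotide locus."""
    if not 0.0 <= p_map < 1.0:
        raise ValueError("P must satisfy 0 <= P < 1 for Q to be finite")
    return -10.0 * math.log10(1.0 - p_map)


def sensitivity(tp: int, tpfg: int, fn: int) -> float:
    """TPR = (TP + TPFG) / (TP + TPFG + FN)."""
    return (tp + tpfg) / (tp + tpfg + fn)


def false_discovery_rate(tp: int, fp: int, tpfg: int) -> float:
    """FDR = (FP + TPFG) / (TP + FP + TPFG)."""
    return (fp + tpfg) / (tp + fp + tpfg)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    tp, tpfg, fn, fp = 90, 5, 5, 5
    print("Q for P = 0.9999:", stringency_q(0.9999))
    print("TPR:", sensitivity(tp, tpfg, fn))
    print("FDR:", false_discovery_rate(tp, fp, tpfg))
```

Under these definitions, a posterior probability of P = 0.9999 corresponds to the Q ≥ 40 threshold used in panel (c). Note that TPFG loci count toward sensitivity (the variant position was found) but also toward the FDR (the genotype called there was wrong).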