Performance evaluation. (a) The positive predictive value (PPV) and sensitivity of UDT-Seq for each of the five calibration datasets. The error bars represent the standard deviation of the values obtained from different calibration schemes (Table S5 in Additional file 2). (b) Average estimated sensitivity for the calibration SNPs at different prevalences. The error bars represent the standard deviation over all calibration-tested sample combinations. (c) PPV at increasing prevalence intervals. (d) The expected prevalence of the calibration SNPs (x-axis) is highly correlated (r = 0.97) with the observed prevalence (y-axis) across all calibration samples. The width of the boxes is proportional to the root mean square of the number of SNPs in each category. The whiskers extend to the most extreme data point within 1.5 times the interquartile range. On average, the mutations expected at 1%, 5%, 20% and 74% were observed at 1.5% (± 0.9), 5.4% (± 2.8), 20.3% (± 7.9) and 72.1% (± 10.6), respectively. The minor differences are likely due to measurement errors during the preparation of the calibration samples. (e) Average PPV and sensitivity calculated for samples CAL-C and CAL-D, trained with CAL-B, after randomly sampling the reads to lower coverage. The error bars represent the standard deviation of the results obtained from the two samples. (f) The prevalence of the calibration SNPs identified in CAL-B with and without whole genome amplification (WGA) (log-scale x- and y-axes, respectively) is plotted against the prevalence estimated from the WGA sample replicates (red and blue). The minimum specified prevalence of the assay (1%) is indicated by dotted lines.
Harismendy et al. Genome Biology 2011 12:R124 doi:10.1186/gb-2011-12-12-r124
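For readers unfamiliar with the two metrics reported in panels (a), (c) and (e), the following is a minimal sketch of the standard definitions of PPV and sensitivity, applied to a set of called variants and a set of expected calibration variants. The function name, the variant tuple layout and the example data are hypothetical, and the caption does not specify the exact counting procedure used in the study.

```python
import math


def ppv_and_sensitivity(called, expected):
    """Return (PPV, sensitivity) using the standard definitions.

    PPV         = TP / (TP + FP)
    sensitivity = TP / (TP + FN)

    `called` and `expected` are iterables of hashable variant keys.
    The (chromosome, position, alt allele) layout used below is an
    assumption made for illustration only.
    """
    called, expected = set(called), set(expected)
    tp = len(called & expected)   # expected variants that were called
    fp = len(called - expected)   # called variants that were not expected
    fn = len(expected - called)   # expected variants that were missed
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    return ppv, sensitivity


# Hypothetical example data, not taken from the study.
expected = {("chr1", 100, "G"), ("chr1", 200, "T"),
            ("chr2", 50, "A"), ("chr3", 75, "C")}
called = {("chr1", 100, "G"), ("chr1", 200, "T"),
          ("chr2", 50, "A"), ("chr4", 10, "G")}

ppv, sensitivity = ppv_and_sensitivity(called, expected)
print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")
```

With these made-up inputs there are three true positives, one false positive and one false negative, so both metrics come out at 0.75.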