Figure 3.

k-mer coverage. 15-mer coverage model fit to 76× coverage of 36 bp reads from E. coli. Note that the expected coverage of a k-mer in the genome using reads of length L will be $\frac{L-k+1}{L}$ times the expected coverage of a single nucleotide, because the full k-mer must be covered by the read. Above, q-mer counts are binned to integers in the histogram. Outside the displayed region, the error k-mer distribution rises to 0.032 at coverage two and 0.691 at coverage one. The mixture parameter, i.e. the prior probability that a k-mer's coverage comes from the error distribution, is 0.73. The mean and variance for true k-mers are 41 and 77, suggesting that a coverage bias exists: a Poisson distribution with mean 41 would also have variance 41, so the observed variance is almost twice the theoretical value. The likelihood ratio of error to true k-mer is one at a coverage of seven, but we may choose a smaller cutoff for some applications.

Kelley et al. Genome Biology 2010 11:R116   doi:10.1186/gb-2010-11-11-r116
Download authors' original image
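The quantities in the caption can be checked with a short sketch. It is an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The Gaussian `normal_pdf` is only an assumed stand-in for a true-k-mer density, because the caption reports nothing beyond its mean and variance. `likelihood_ratio_cutoff` accepts whatever error and true-k-mer densities you supply and returns the first coverage at which the error-to-true likelihood ratio falls to one or below.

```python
import math


def kmer_coverage_factor(read_len: int, k: int) -> float:
    """Fraction of per-nucleotide coverage expected for a k-mer: (L - k + 1) / L.

    A read of length L contains L nucleotides but only L - k + 1 complete k-mers.
    """
    if not 0 < k <= read_len:
        raise ValueError("k must satisfy 0 < k <= read_len")
    return (read_len - k + 1) / read_len


def dispersion_index(mean: float, variance: float) -> float:
    """Variance-to-mean ratio; it equals 1 for a Poisson distribution."""
    return variance / mean


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Gaussian density (assumed stand-in for a true-k-mer coverage density)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def likelihood_ratio_cutoff(error_pmf, true_pmf, max_cov: int):
    """Smallest integer coverage c >= 1 with error_pmf(c) <= true_pmf(c).

    The ratio error/true is at most one exactly when error_pmf(c) <= true_pmf(c).
    Returns None if no coverage up to max_cov satisfies this.
    """
    for c in range(1, max_cov + 1):
        if error_pmf(c) <= true_pmf(c):
            return c
    return None


if __name__ == "__main__":
    # 15-mers from 36 bp reads at 76x nucleotide coverage (figures from the caption).
    factor = kmer_coverage_factor(36, 15)
    print(f"expected 15-mer coverage at 76x: {76 * factor:.1f}")
    # Observed true-k-mer mean 41 and variance 77: a Poisson fit would give a ratio of 1.
    print(f"variance/mean for true k-mers: {dispersion_index(41, 77):.2f}")
```

Running the script prints an expected 15-mer coverage of 46.4 and a variance-to-mean ratio of 1.88. The ratio makes the caption's overdispersion argument concrete. Comparing raw densities, as the cutoff helper does, matches the caption's "likelihood ratio of error to true k-mer". Weighting each density by its mixture prior (0.73 for errors) would instead compare posterior odds and generally move the cutoff.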