|
Figure 2.
Quality scores are of limited use in predicting the accuracy of unknown sequences. The
quality scores reported by the GS20 software correlate with decreased confidence in
calling the correct homopolymer length rather than the accuracy of the called bases.
(a, b) The average quality score of reads decreases as the number of errors in the read increases.
(c) The average quality score as a function of position in the homopolymer: as the length
of the homopolymer increases, the quality scores decrease, for both correctly and
incorrectly called bases. (d) The average quality scores of perfect reads containing differing numbers of homopolymers.
The average quality scores decrease with the number of homopolymers. Our sequences
contain only short homopolymers, primarily 3-mers. As the length and frequency of
homopolymers increase, the expected quality scores will decrease. Without a priori
knowledge of the number and length of homopolymers in a particular read, it will be
difficult to determine an appropriate quality threshold: a low threshold may not cull
data adequately, and a high threshold may remove homopolymeric regions.
Huse et al. Genome Biology 2007 8:R143 doi:10.1186/gb-2007-8-7-r143 |