Figure 1.

The CRAC algorithm. (a) Illustration of a break in the location profile. Each k-mer of the read is located exactly on the genome. In all figures, located k-mers are shown in blue and unmapped k-mers in light orange. If the read differs from the genome at some position, for example because of an SNV or a sequencing error, then the k-mers containing this position cannot be located exactly on the genome. The interval of positions of the unmapped k-mers is called a break; the end position of the break indicates the position of the error or SNV. (b) The support profile. The support of a k-mer is the number of reads in the collection in which this k-mer appears at least once. Each of the two plots shows the support profile as a black curve on top of the location profile (in blue and orange). The support remains high over the break (left plot) if many reads covering this region carry the same biological difference (for example, a mutation); it drops over the break (right plot) if the analyzed read is affected by a sequencing error, in which case we say the support is dropping. (c) Rules for differentiating a substitution, a deletion, or an insertion. Given the location profile, these cases are distinguished by comparing the gap in the genome with the gap in the read between the located k-mers immediately before and after the break. (d) False locations and mirage breaks. False locations, shown in red, cause mirage breaks when they occur inside or at the edges of a break. The break verification and break merging procedures correct for the effects of false locations to determine the correct break boundaries (for example, the correct splice junction boundaries) and thus avoid detecting a false chimera (Rule 2a) instead of a deletion. SNV: single nucleotide variant.
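The ideas in panels (a) to (c) can be illustrated with a small sketch. It is not CRAC's implementation: a plain Python dictionary stands in for the genome index, k-mers occurring at several genome positions are simply treated as unlocated, and break verification, break merging, chimeras, and splice junctions are ignored. The function names (`build_index`, `location_profile`, `find_breaks`, `classify_break`, `support_profile`) and the toy genome are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[int]]


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Hypothetical stand-in for a genome index: k-mer -> start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for p in range(len(genome) - k + 1):
        index[genome[p:p + k]].append(p)
    return dict(index)


def location_profile(read: str, index: Dict[str, List[int]], k: int) -> Profile:
    """Genome position of each read k-mer, or None if it is not located at a
    single position (simplification: repeated k-mers count as unlocated)."""
    profile: Profile = []
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], [])
        profile.append(hits[0] if len(hits) == 1 else None)
    return profile


def find_breaks(profile: Profile) -> List[Tuple[int, int]]:
    """Maximal runs (first, last) of consecutive unlocated k-mer positions."""
    breaks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, pos in enumerate(profile):
        if pos is None and start is None:
            start = i
        elif pos is not None and start is not None:
            breaks.append((start, i - 1))
            start = None
    if start is not None:
        breaks.append((start, len(profile) - 1))
    return breaks


def classify_break(profile: Profile, brk: Tuple[int, int]) -> str:
    """Compare the genome gap with the read gap between the located k-mers
    flanking the break (simplified: no chimera, strand, or splice handling)."""
    first, last = brk
    if first == 0 or last == len(profile) - 1:
        return "unresolved"  # no located k-mer on one side of the break
    before, after = first - 1, last + 1
    read_gap = after - before
    genome_gap = profile[after] - profile[before]
    if genome_gap == read_gap:
        return "substitution"
    # Extra genome bases between the flanks -> deletion in the read.
    return "deletion" if genome_gap > read_gap else "insertion"


def support_profile(read: str, reads: List[str], k: int) -> List[int]:
    """Number of reads in which each k-mer of `read` occurs at least once."""
    return [sum(read[i:i + k] in r for r in reads)
            for i in range(len(read) - k + 1)]


if __name__ == "__main__":
    k = 5
    genome = "GATTACACGTCCAGTAGGCTTAAC"  # toy genome whose 5-mers are all unique
    index = build_index(genome, k)
    read = "TTACACGTACAGTAGGCT"  # genome[2:20] with read base 8 changed C -> A
    prof = location_profile(read, index, k)
    for brk in find_breaks(prof):
        print(brk, classify_break(prof, brk))  # prints (4, 8) substitution
```

In this toy run the break spans k-mer positions 4 to 8, and its end position (8) is the substituted base in the read, matching panel (a). Deleting one base from the same read gives a genome gap one larger than the read gap (a deletion), and inserting one base gives a genome gap one smaller (an insertion).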

Philippe et al. Genome Biology 2013 14:R30   doi:10.1186/gb-2013-14-3-r30