|
Figure 1.
Assisted assembly principle. (a) In this example, five reads align uniquely to the reference genome, and the two leftmost
of these (purple) also appear as the two rightmost reads in an existing de novo contig. We can then extend the de novo contig by using the three unassembled reads (green), even if there is no supporting
linking evidence (in general, ARACHNE requires a read to be linked to the contig it
overlaps before using it to extend the contig). (b) Two scaffolds (blue and purple) are mapped and oriented on the reference genome by
the trusted green reads. Furthermore, the two scaffolds are joined by a single link
(black dotted line), although this link is not trusted per se. The ARACHNE scaffolding algorithm would not normally join the two scaffolds; however,
in this case the separation of the two scaffolds implied by the link is consistent
with the separation implied by the mapping on the reference genome, and we thus implicitly
validate the black dotted link and join the two scaffolds. (c) Trusted read placements anchor portions of a single scaffold onto two distant parts
of the reference genome, suggesting either a bona fide syntenic break or a misassembly. To test for the latter, the contested region on the
scaffold is subjected to a stringent test for misassembly, and the scaffold is broken there if the region fails the test. The
same level of stringency of misassembly testing could not be applied to the entire
assembly because, at low coverage, there would be too many false positives.
Gnerre et al. Genome Biology 2009 10:R88 doi:10.1186/gb-2009-10-8-r88 |
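
The consistency check described in panel (b) can be illustrated with a minimal sketch. This is not the ARACHNE implementation: the `Placement` type, the function names, and the tolerance rule (agreement within `max_sigmas` standard deviations of the link's estimated gap) are all assumptions made for illustration. The caption states only that the separation implied by the link must be consistent with the separation implied by the mapping on the reference genome.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Reference interval covered by a scaffold (hypothetical type)."""
    start: int  # 0-based start coordinate on the reference
    end: int    # exclusive end coordinate on the reference


def reference_gap(left: Placement, right: Placement) -> int:
    """Separation implied by the reference mapping (negative if they overlap)."""
    return right.start - left.end


def link_is_validated(left: Placement, right: Placement,
                      link_gap: float, link_stddev: float,
                      max_sigmas: float = 3.0) -> bool:
    """Accept a single link if its implied gap agrees with the reference gap.

    The max_sigmas tolerance is an assumed rule, not taken from the paper.
    """
    return abs(reference_gap(left, right) - link_gap) <= max_sigmas * link_stddev


# Two scaffolds placed on the reference by trusted reads (made-up coordinates).
blue = Placement(start=1000, end=5000)
purple = Placement(start=5800, end=9000)

consistent = link_is_validated(blue, purple, link_gap=750, link_stddev=100)
inconsistent = link_is_validated(blue, purple, link_gap=3000, link_stddev=100)
```

With these made-up coordinates the reference implies a gap of 800 bases, so a link that estimates 750 ± 100 would be accepted and the scaffolds joined, whereas a link that estimates 3000 ± 100 would be rejected.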