Table 2

Genome assembly continuity and correctness using hybrid and self-correction approaches
| Organism | Corrected by | Assembly bp | Number of contigs (expected) | Number of contigs (actual) | N50 (expected) | N50 (actual) | LAP | Number of discordant bases | QV |
|---|---|---|---|---|---|---|---|---|---|
| E. coli K12 | Reference | 4,639,675 | | 1 | 4,639,675 | NA | -9.65E+07 | 4 | >60 |
| | MiSeq 100× | 4,647,253 | 1 | 2 | | 2,367,319 | -9.64E+07 | 3 | >60 |
| | 454 50× | 4,649,004 | 1 | 1 | | 4,649,004 | -9.64E+07 | 3 | >60 |
| | CCS 25× | 4,653,267 | 1 | 1 | | 4,653,267 | -9.64E+07 | 3 | >60 |
| | Self | 4,653,486 | 1 | 1 | | 4,653,486 | -9.64E+07 | 3 | >60 |
| E. coli O157:H7 | Near neighbor | 5,594,477 | | 3 | 3,776,951 | NA | -3.82E+07 | 1,282 | 36.40 |
| | MiSeq 100× | 5,624,394 | 10 | 10 | | 3,089,011 | -3.66E+07 | 4 | >60 |
| | 454 40× | 5,613,057 | 10 | 12 | | 927,294 | -3.67E+07 | 13 | 56.35 |
| | Self | 5,611,389 | 10 | 9 | | 4,324,437 | -3.66E+07 | 0 | >60 |
| B. trehalosi | MiSeq 100× | 2,402,545 | | 6 | | 1,603,511 | -3.28E+07 | 1 | >60 |
| | 454 50× | 2,413,761 | | 4 | | 1,051,672 | -3.27E+07 | 2 | >60 |
| | CCS 25× | 2,411,501 | | 1 | | 2,411,501 | -3.27E+07 | 0 | >60 |
| | Self | 2,411,068 | | 1 | | 2,411,068 | -3.27E+07 | 0 | >60 |
| M. haemolytica | MiSeq 100× | 2,712,467 | | 1 | | 2,712,467 | -3.31E+07 | 0 | >60 |
| | CCS 25× | 2,739,949 | | 2 | | 2,686,992 | -3.31E+07 | 0 | >60 |
| | Self | 2,736,037 | | 1 | | 2,736,037 | -3.31E+07 | 0 | >60 |
| F. tularensis | Near neighbor | 1,895,727 | | 1 | 965,253 | NA | -1.33E+07 | 113 | 42.25 |
| | MiSeq 100× | 1,879,071 | 3 | 10 | | 357,518 | -1.33E+07 | 0 | >60 |
| | 454 50× | 1,863,947 | 3 | 15 | | 201,203 | -1.33E+07 | 0 | >60 |
| | Self | 1,828,135 | 3 | 8 | | 401,731 | -1.33E+07 | 0 | >60 |
| | Self (300×) | 1,877,407 | 3 | 3 | | 573,021 | -1.33E+07 | 0 | >60 |
| S. enterica Newport | Near neighbor | 5,007,719 | | 2 | 4,827,641 | NA | -2.26E+07 | 20 | 53.99 |
| | MiSeq 56× | 5,027,784 | 4 | 2 | | 4,918,796 | -2.24E+07 | 2 | >60 |
| | 454 25× | 5,034,500 | 4 | 3 | | 4,095,943 | -2.24E+07 | 2 | >60 |
| | CCS 22× | 5,030,885 | 4 | 2 | | 4,921,886 | -2.24E+07 | 2 | >60 |
| | Self | 5,029,197 | 4 | 2 | | 4,919,684 | -2.24E+07 | 2 | >60 |

Organism: the genome being assembled. Corrected by: the short-read data used for correction. Assembly bp: the total number of base pairs in all contigs (only contigs containing at least 100 reads are included in all results). Number of contigs (expected): predicted number of contigs for a known reference (or near neighbor). Number of contigs (actual): the number of contigs comprising the assembly. N50: N such that 50% of the genome is contained in contigs of length ≥N. LAP: the assembly likelihood score; a score closer to zero indicates a better assembly. Number of discordant bases: the number of SNPs and indels identified by mapping MiSeq sequences back to the assembly and recording discrepancies. Each incorrect base is counted (that is, an indel that deletes two bases from the assembly counts as two in this column). QV: estimated from the number of discordant bases as QV = -10 log10(number of discordant bases / assembly bp). The QV can be converted to an error probability P = 10^(-QV/10). Assemblies were generated by Celera Assembler [31] followed by post-processing with Quiver [32]. NA, not available.
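
The arithmetic behind the QV and N50 columns can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes QV = -10 log10(number of discordant bases / assembly bp), a form that reproduces the tabulated values (for example, 1,282 discordant bases in 5,594,477 bp gives a QV of about 36.40). The function names and the optional `genome_size` parameter are hypothetical, introduced only for this example.

```python
import math


def qv_from_discordant(discordant_bases, assembly_bp):
    """Phred-scaled quality from a discordant-base count (assumed formula)."""
    if assembly_bp <= 0:
        raise ValueError("assembly_bp must be positive")
    if discordant_bases == 0:
        # No observed errors: the estimate is unbounded (the table shows >60).
        return math.inf
    return -10 * math.log10(discordant_bases / assembly_bp)


def error_probability(qv):
    """Convert a QV to a per-base error probability, P = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def n50(contig_lengths, genome_size=None):
    """Largest N such that contigs of length >= N cover half the target size.

    The target is the total contig length unless genome_size is given
    (an assumption; the legend speaks of "50% of the genome").
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # contigs never reach half of the target size


if __name__ == "__main__":
    # E. coli O157:H7 near-neighbor row: 1,282 discordant bases, 5,594,477 bp.
    qv = qv_from_discordant(1282, 5594477)
    print(f"QV = {qv:.2f}, P = {error_probability(qv):.2e}")
```

For a single-contig assembly, `n50` simply returns the length of that contig, which is why the N50 and Assembly bp columns coincide in the one-contig rows.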

Koren et al. Genome Biology 2013 14:R101   doi:10.1186/gb-2013-14-9-r101
