Table 1

Total sequence and estimated coverage for six sequenced genomes
| Organism | Genome Mbp | GC % | Class | PB CLR Gbp | PB % error (200×) | PB % >7 kbp (200×) | PB coverage >7 kbp (200×) | PB CCS Gbp | 454 Gbp | MiSeq Gbp |
|---|---|---|---|---|---|---|---|---|---|---|
| Escherichia coli K12 MG1655 | 4.65 | 50% | I | 1.4 (294×) | 10.44 | 16.07 | 32.11× | 0.2 (44×) | 0.19 (42×) | 1.73 (372×) |
| Escherichia coli O157:H7 F8092B [41] | 5.52 | 50% | III | 2.2 (397×) | 10.49 | 13.37 | 26.75× | NA | 0.22 (40×) | 0.65 (118×) |
| Bibersteinia trehalosi USDA-ARS-USMARC-192 | 2.41 | 41% | I | 1.1 (439×) | 13.17 | 5.56 | 11.12× | 0.09 (35×) | 0.15 (59×) | 0.62 (247×) |
| Mannheimia haemolytica USDA-ARS-USMARC-2286 | 2.55 | 41% | III | 0.5 (212×) | 10.46 | 23.06 | 47.12× | 0.1 (42×) | NA | 0.39 (85×) |
| Francisella tularensis 99A-2628 | 1.90 | 32% | III | 0.7 (331×) | 11.82 | 8.52 | 17.03× | NA | 0.09 (40×) | 11.73 (5870×) |
| Salmonella enterica Newport SN31241 [42] | 5.01 | 52% | I | 1.1 (217×) | 12.33 | 9.30 | 18.61× | 0.1 (22×) | 0.13 (25×) | 0.28 (56×) |

The table lists the available sequencing data (in Gbp, with estimated fold coverage in parentheses) for six bacterial genomes chosen to validate the closure recipe, for each of the four types of data produced (PacBio CLR, PacBio CCS, 454 Titanium, and MiSeq paired-end). Sequencing data were randomly downsampled following the procedure described in Materials and methods. For each genome, the expected genome size, GC content, and complexity class are included. PB % error (200×): the estimated error rate based on BLASR [28] mappings of the analyzed random 200× subset to the assembly. PB % >7 kbp (200×): the percentage of bases in sequences longer than 7 kbp, before correction. PB coverage >7 kbp (200×): the coverage represented by sequences longer than 7 kbp, before correction. NA, not available; PB, PacBio.
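
As a rough illustration of how the coverage figures in the table relate to one another, the sketch below recomputes two of them for the Escherichia coli K12 MG1655 row. It assumes that fold coverage is total sequenced bases divided by genome size, and that coverage above 7 kbp is the 200× subset scaled by the percentage of bases in long sequences; the source does not state these formulas explicitly, and the function names are hypothetical. Because the table values are rounded, recomputed numbers can differ slightly from the published ones.

```python
def fold_coverage(total_gbp: float, genome_mbp: float) -> float:
    """Estimated fold coverage: sequenced bases divided by genome size.

    Assumed formula, not taken from the source.
    """
    return (total_gbp * 1e9) / (genome_mbp * 1e6)


def long_read_coverage(subset_coverage: float, pct_bases_long: float) -> float:
    """Coverage contributed by sequences above a length cutoff.

    Assumed formula: subset coverage times the percentage of bases
    found in sequences longer than the cutoff.
    """
    return subset_coverage * pct_bases_long / 100.0


# E. coli K12 MG1655: 1.73 Gbp of MiSeq data over a 4.65 Mbp genome.
miseq_cov = fold_coverage(1.73, 4.65)
print(f"MiSeq coverage: {miseq_cov:.0f}x")

# Same genome: 16.07% of the bases in the 200x PacBio subset are in
# sequences longer than 7 kbp.
long_cov = long_read_coverage(200, 16.07)
print(f"PacBio coverage >7 kbp: {long_cov:.2f}x")
```

For this row the first calculation gives about 372×, matching the table, and the second gives about 32.1×, close to the listed 32.11×.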

Koren et al. Genome Biology 2013 14:R101   doi:10.1186/gb-2013-14-9-r101