Table 1

Gap closure results obtained on the bacterial datasets

| Method                  | Original  | IMAGE     | SOAPdenovo | GapFiller | GapFiller-LC |
|-------------------------|-----------|-----------|------------|-----------|--------------|
| Escherichia coli        |           |           |            |           |              |
| Genome size (bp)        | 4,478,287 | 4,530,961 | 4,490,973  | 4,490,638 |              |
| Scaffolds               | 179       | 179       | 179        | 179       |              |
| Gap count               | 544       | 291       | 16         | 11        |              |
| Total gap length (bp)   | 12,516    | 2,861     | 16         | 130       |              |
| Errors (SNPs)           | 12        | 40        | 33         | 22        |              |
| Errors (indels)         | 4         | 17        | 25         | 9         |              |
| Errors (misjoins)       | 1         | 1         | 1          | 1         |              |
| N50                     | 50,557    | 50,558    | 50,558     | 50,558    |              |
| Streptomyces coelicolor |           |           |            |           |              |
| Genome size (bp)        | 8,558,275 | 8,576,331 | 8,557,720  | 8,558,333 |              |
| Scaffolds               | 115       | 115       | 115        | 115       |              |
| Gap count               | 158       | 63        | 60         | 23        |              |
| Total gap length (bp)   | 9,221     | 4,009     | 1,288      | 806       |              |
| Errors (SNPs)           | 299       | 423       | 406        | 280       |              |
| Errors (indels)         | 664       | 677       | 769        | 686       |              |
| Errors (misjoins)       | 12        | 17        | 18         | 18        |              |
| N50                     | 173,822   | 173,822   | 173,822    | 173,822   |              |
| Staphylococcus aureus   |           |           |            |           |              |
| Genome size (bp)        | 2,880,676 |           | 2,880,926  | 2,881,756 | 2,883,448    |
| Scaffolds               | 19        |           | 19         | 19        | 19           |
| Gap count               | 48        |           | 27         | 27        | 22           |
| Total gap length (bp)   | 9,900     |           | 1,547      | 5,508     | 1,861        |
| Errors (SNPs)           | 79        |           | 260        | 98        | 173          |
| Errors (indels)         | 16        |           | 53         | 26        | 37           |
| Errors (misjoins)       | 4         |           | 13         | 7         | 5            |
| N50                     | 1,091,731 |           | 1,091,333  | 1,092,281 | 1,092,421    |
| Rhodobacter sphaeroides |           |           |            |           |              |
| Genome size (bp)        | 4,609,785 |           | 4,609,466  | 4,609,596 | 4,610,796    |
| Scaffolds               | 38        |           | 38         | 38        | 38           |
| Gap count               | 170       |           | 163        | 161       | 139          |
| Total gap length (bp)   | 21,409    |           | 14,166     | 20,667    | 17,625       |
| Errors (SNPs)           | 218       |           | 410        | 230       | 300          |
| Errors (indels)         | 187       |           | 294        | 190       | 199          |
| Errors (misjoins)       | 6         |           | 10         | 6         | 7            |
| N50                     | 3,192,334 |           | 3,192,075  | 3,192,215 | 3,192,974    |


Gap closure results obtained on four bacterial datasets show that the GapFiller strategy yields the most accurate finished genomes and, in general, the lowest gap count. The IMAGE method significantly underperforms on all quality measures and would therefore not be the preferred method. Differences between GapFiller and SOAPdenovo are smaller. Interestingly, although the gap count after closure is generally lower for GapFiller, SOAPdenovo yields a shorter total gap length in three cases, which suggests that it is able to close larger gaps. Strikingly, however, the number of errors is significantly higher for SOAPdenovo regardless of the error type (SNPs, indels, and misjoins). Even when less strict settings are applied to GapFiller to shorten the total gap length (GapFiller-LC: minimum coverage o = 1, ratio r = 0.5), our method still yields significantly fewer errors.
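The comparison of gap counts and total gap lengths can be made concrete by expressing each method's result as a percentage reduction relative to the original assembly. The following is a minimal sketch, not part of the published analysis. It assumes that, for Escherichia coli and Streptomyces coelicolor, the four values in each table row belong to Original, IMAGE, SOAPdenovo and GapFiller, in that order. The names `TABLE` and `percent_closed` are hypothetical and introduced only for illustration.

```python
# Gap counts and total gap lengths (bp) copied from Table 1.
# ASSUMPTION: for these two datasets the four values per row map to
# Original, IMAGE, SOAPdenovo, GapFiller, in that order.
TABLE = {
    "Escherichia coli": {
        "gap_count": {"Original": 544, "IMAGE": 291, "SOAPdenovo": 16, "GapFiller": 11},
        "gap_length": {"Original": 12516, "IMAGE": 2861, "SOAPdenovo": 16, "GapFiller": 130},
    },
    "Streptomyces coelicolor": {
        "gap_count": {"Original": 158, "IMAGE": 63, "SOAPdenovo": 60, "GapFiller": 23},
        "gap_length": {"Original": 9221, "IMAGE": 4009, "SOAPdenovo": 1288, "GapFiller": 806},
    },
}


def percent_closed(metric):
    """Return, per method, the percentage reduction relative to 'Original'."""
    original = metric["Original"]
    return {
        method: round(100.0 * (original - value) / original, 1)
        for method, value in metric.items()
        if method != "Original"
    }


if __name__ == "__main__":
    for organism, metrics in TABLE.items():
        print(organism)
        for name, metric in metrics.items():
            for method, pct in percent_closed(metric).items():
                print(f"  {name:<10} {method:<10} {pct:5.1f}% reduction")
```

Read this way, the caption's contrast between gap count and total gap length becomes a contrast between two reduction percentages per method, which is easier to compare across genomes of different sizes than the raw numbers.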

Boetzer and Pirovano, Genome Biology 2012, 13:R56. doi:10.1186/gb-2012-13-6-r56
