Table 1

Comparison of assembly statistics

| Dataset | Assembler | #ctgs/scfs | Good Ctgs/scfs | Total aln (Mbp) | Slt | Hvy | Ch | Size @ 10 Mbp | #@ 10 Mbp | Max ctg size | Err per Mbp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mockE | SOAPdenovo | 63,014 | 99.3% | 51 | 167 | 131 | 1 | 28,208 | 195 | 249,819 | 5.9 |
| mockE | SOAPdenovo_MA | 63,107 | 99.3% | 51 | 166 | 131 | 1 | 28,208 | 195 | 249,819 | 5.8 |
| mockE | Velvet | 12,381 | 96.0% | 41 | 269 | 106 | 2 | 46,122 | 128 | 183,815 | 9.2 |
| mockE | Velvet_MA | 12,830 | 96.2% | 41 | 256 | 100 | 2 | 42,269 | 137 | 179,673 | 8.7 |
| mockE | MetaVelvet | 23,323 | 96.7% | 49 | 474 | 160 | 5 | 62,131 | 93 | 367,458 | 13.0 |
| mockE | MetaVelvet_MA | 22,772 | 96.8% | 49 | 462 | 156 | 4 | 62,138 | 91 | 367,458 | 12.7 |
| mockE | Meta-IDBA | 22,064 | 95.3% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
| mockE | Meta-IDBA_MA | 22,032 | 95.4% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
| mockS | SOAPdenovo | 45,251 | 98.8% | 28 | 135 | 99 | 0 | 5,672 | 626 | 186,064 | 8.4 |
| mockS | SOAPdenovo_MA | 44,928 | 98.8% | 28 | 135 | 98 | 0 | 5,672 | 626 | 186,064 | 8.3 |
| mockS | Velvet | 20,981 | 95.6% | 28 | 498 | 127 | 1 | 6,134 | 770 | 119,120 | 22.4 |
| mockS | Velvet_MA | 21,050 | 95.8% | 28 | 485 | 115 | 1 | 6,060 | 775 | 119,120 | 21.5 |
| mockS | MetaVelvet | 19,649 | 94.5% | 28 | 518 | 158 | 2 | 13,028 | 351 | 217,330 | 24.2 |
| mockS | MetaVelvet_MA | 20,551 | 95.3% | 28 | 517 | 143 | 3 | 6,685 | 622 | 217,330 | 20.1 |
| mockS | Meta-IDBA | 4,573 | 92.3% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
| mockS | Meta-IDBA_MA | 4,559 | 92.5% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
| HMP | SOAPdenovo | 39,028 | 89.9% | 11 | 1,138 | 2,686 | 0 | 9,881 | 514 | 116,204 | 347.6 |
| HMP | SOAPdenovo_MA | 35,230 | 89.1% | 11 | 1,138 | 2,618 | 0 | 11,359 | 426 | 238,051 | 341.5 |
| HMP | Meta-IDBA | 25,861 | 88.9% | 7 | 718 | 2,102 | 0 | 4,215 | 1,144 | 59,188 | 402.8 |
| HMP | Meta-IDBA_MA | 25,698 | 88.7% | 7 | 710 | 2,087 | 0 | 4,215 | 1,144 | 59,188 | 399.6 |
| HMPscf | SOAPdenovo | 31,673 | 99.9% | 11 | - | - | 10 | 9,906 | 510 | 116,181 | 0.9 |
| HMPscf | SOAPdenovo_MA | 27,231 | 99.9% | 11 | - | - | 10 | 11,359 | 426 | 238,051 | 0.9 |
| HMPscf | Meta-IDBA | 20,352 | 99.9% | 7 | - | - | 10 | 4,946 | 939 | 59,188 | 1.4 |
| HMPscf | Meta-IDBA_MA | 22,886 | 99.9% | 7 | - | - | 9 | 22,304 | 238 | 66,401 | 1.3 |

Datasets are mockE (mock Even), mockS (mock Staggered), HMP (Tongue dorsum, contig-level analysis), and HMPscf (Tongue dorsum, scaffold-level analysis). All analyses other than HMPscf were done at the contig level. Where necessary, contigs were extracted from scaffolds by splitting at three consecutive Ns. The suffix _MA indicates results produced by running MetAMOS on the contigs produced by the corresponding assembler.

#ctgs/scfs: total number of contigs/scaffolds in the assembly. Good Ctgs/scfs: fraction of contigs/scaffolds that mapped without errors to reference genomes. For the HMP dataset (Tongue dorsum contigs), alignments were made only to a small set of genomes estimated by the HMP project to match the genomes in this sample. For the HMPscf dataset, good scaffolds are those without chimeric errors. Total aln (Mbp): total amount of sequence that can be aligned to the reference genomes, in Mbp. Slt: slight mis-assemblies, determined by alignments that cover 80% or more of the aligned contig in a single match. Hvy: heavy mis-assemblies, determined by alignments that cover less than 80% of the aligned contig in a single match or that have two or more matches to a single reference. Ch: chimeras, that is, contigs with matches to two distinct reference genomes. Neither heavy mis-assemblies nor chimeras count towards reference coverage.

Size @ 10 Mbp: the size of the largest contig c such that the sum of all contigs larger than c is more than 10 Mbp (similar to the commonly used N50 size). #@ 10 Mbp: smallest number of contigs whose cumulative size adds up to more than 10 Mbp. Max ctg size: size of the largest contig in the assembly. Err per Mbp: average number of errors per Mbp. Numbers in bold represent the best value for the specific dataset.
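The contig extraction and the two 10 Mbp statistics described in the legend can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `split_scaffold` and `stats_at_threshold` are hypothetical, and two readings of the legend are assumed: a scaffold is split at any run of three or more Ns, and, as in an N50-style computation, the contig at which the running total first exceeds the threshold is itself included in that total.

```python
import re
from typing import List, Optional, Tuple


def split_scaffold(seq: str, min_gap: int = 3) -> List[str]:
    """Split a scaffold into contigs at runs of N.

    Assumption: a gap is any run of `min_gap` or more N characters;
    shorter runs of N are kept inside the contig.
    """
    parts = re.split(r"[Nn]{%d,}" % min_gap, seq)
    # Drop empty pieces produced by gaps at the scaffold ends.
    return [p for p in parts if p]


def stats_at_threshold(
    sizes: List[int], threshold: int = 10_000_000
) -> Optional[Tuple[int, int]]:
    """Return (size_at_threshold, count_at_threshold), or None.

    Contigs are visited from largest to smallest. The first contig at
    which the cumulative size exceeds `threshold` gives the size
    statistic, and the number of contigs visited so far gives the count
    statistic. None means the assembly never exceeds the threshold.
    """
    total = 0
    for count, size in enumerate(sorted(sizes, reverse=True), start=1):
        total += size
        if total > threshold:
            return size, count
    return None
```

For example, with contig sizes 5, 4, 3, 2, 1 and a threshold of 8, the running total first exceeds 8 at the second contig, so the sketch reports a size of 4 and a count of 2. A larger size together with a smaller count indicates a more contiguous assembly at the same amount of assembled sequence.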

Treangen et al. Genome Biology 2013 14:R2   doi:10.1186/gb-2013-14-1-r2
