Table 2

Performance comparison of metagenomic annotation of reads versus contigs

| Dataset | Assembler | Run time (speedup) | Class level (pre-propagate): Number unclassified | Class level (pre-propagate): Number correctly classified | Class level (pre-propagate): Number incorrectly classified | Class level (post-propagate): Number unclassified | Class level (post-propagate): Number correctly classified | Class level (post-propagate): Number incorrectly classified |
|---|---|---|---|---|---|---|---|---|
| mockE | None | 84.2 h (-) | 11,116,265 | 3,920,471 | 681,801 | NA | NA | NA |
| mockE | SOAPdenovo_MA | 33.0 h (2.6×) | 634,091 | 14,852,561 | 231,885 | 612,517 | 14,874,157 | 231,863 |
| mockE | Velvet_MA | 29.4 h (2.9×) | 870,073 | 14,611,333 | 237,130 | 854,554 | 14,626,870 | 237,112 |
| mockE | MetaVelvet_MA | 29.9 h (2.8×) | 709,938 | 14,800,318 | 208,281 | 693,142 | 14,811,333 | 214,062 |
| mockE | MetaIDBA_MA | 37.8 h (2.2×) | 1,700,699 | 13,652,114 | 365,724 | 1,676,319 | 13,676,524 | 365,724 |
| mockS | None | 167.1 h (-) | 18,081,508 | 5,200,170 | 849,672 | NA | NA | NA |
| mockS | SOAPdenovo_MA | 72.3 h (2.3×) | 1,971,900 | 21,772,125 | 387,325 | 1,850,541 | 21,884,121 | 386,688 |
| mockS | Velvet_MA | 71.8 h (2.3×) | 2,392,898 | 21,313,998 | 424,454 | 2,250,852 | 21,456,487 | 424,011 |
| mockS | MetaVelvet_MA | 54.4 h (3.1×) | 2,301,985 | 21,449,129 | 380,236 | 2,134,599 | 21,614,171 | 382,580 |
| mockS | MetaIDBA_MA | 53.8 h (3.1×) | 2,576,941 | 21,316,513 | 237,896 | 2,210,972 | 21,681,036 | 239,342 |

Datasets are mockE (mock Even) and mockS (mock Staggered). The ground truth consists of the sequences that could be unambiguously mapped using Bowtie: 15,718,537/22,735,802 (69.14%) for the mockE dataset and 24,131,350/39,918,454 (60.45%) for the mockS dataset.

Assembler: each assembler was run within MetAMOS and the output contigs were classified using FCP. In the None case, the read sequences were classified by FCP prior to assembly. Classifications of reads with no known truth were neither penalized nor rewarded.

Run time: the time, in CPU hours, required to run either FCP on the reads or the Preprocessing, Assembly (for a specific assembler), Annotate and Propagate steps within MetAMOS. The speedup factor is the FCP run time divided by the time required to perform the analysis within MetAMOS. All experiments were performed on a 64-bit Linux server equipped with eight 2.8 GHz dual-core processors and 128 GB RAM.

Number unclassified, Number correctly classified, and Number incorrectly classified (pre-propagate): the total count of sequences that were unclassified, correctly classified, or incorrectly classified at the class taxonomic level. Compared with the unassembled results, classification within MetAMOS yields at least a three-fold increase in correctly classified sequences and roughly a two-fold or greater reduction in incorrectly classified sequences.

Number unclassified, Number correctly classified, and Number incorrectly classified (post-propagate): the MetAMOS Propagate step was used to transfer the annotations using the assembly graph. The total number of correctly classified sequences increases slightly in all cases, without a significant increase in the number of incorrectly classified sequences. The full classification at each taxonomic level is given in Table S1 in Additional file 1.

NA, not applicable.
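The speedup factor and the fold changes described in the legend can be recomputed from the table entries. The following is a minimal sketch using the mockE rows only. The names `speedup`, `summarize`, `MOCKE` and the constants are illustrative and do not come from MetAMOS or FCP; the numbers are copied from Table 2.

```python
# Illustrative recomputation from Table 2 (mockE rows only).
# All names here are hypothetical helpers, not part of MetAMOS or FCP.

FCP_READS_HOURS = 84.2       # "None" row: FCP run directly on the reads
READS_CORRECT = 3_920_471    # reads correctly classified without assembly
READS_INCORRECT = 681_801    # reads incorrectly classified without assembly

# assembler -> (CPU hours, correct pre-propagate, incorrect pre-propagate)
MOCKE = {
    "SOAPdenovo_MA": (33.0, 14_852_561, 231_885),
    "Velvet_MA": (29.4, 14_611_333, 237_130),
    "MetaVelvet_MA": (29.9, 14_800_318, 208_281),
    "MetaIDBA_MA": (37.8, 13_652_114, 365_724),
}


def speedup(fcp_hours, metamos_hours):
    """FCP run time divided by the MetAMOS run time, as defined in the legend."""
    return fcp_hours / metamos_hours


def summarize(rows=MOCKE):
    """Return assembler -> (speedup, fold increase correct, fold reduction incorrect)."""
    summary = {}
    for name, (hours, correct, incorrect) in rows.items():
        summary[name] = (
            round(speedup(FCP_READS_HOURS, hours), 1),
            round(correct / READS_CORRECT, 2),
            round(READS_INCORRECT / incorrect, 2),
        )
    return summary


if __name__ == "__main__":
    for name, (s, up, down) in summarize().items():
        print(f"{name}: speedup {s}x, correct x{up}, incorrect /{down}")
```

The same calculation applies to the mockS rows by substituting the 167.1 h baseline and the corresponding read-level counts.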

Treangen et al. Genome Biology 2013 14:R2   doi:10.1186/gb-2013-14-1-r2
