Table 2. Performance comparison of metagenomic annotation of reads versus contigs

All counts are at the class taxonomic level.

| Dataset | Assembler | Run time (speedup) | Number unclassified (pre-propagate) | Number correctly classified (pre-propagate) | Number incorrectly classified (pre-propagate) | Number unclassified (post-propagate) | Number correctly classified (post-propagate) | Number incorrectly classified (post-propagate) |
|---|---|---|---|---|---|---|---|---|
| mockE | None | 84.2 h (-) | 11,116,265 | 3,920,471 | 681,801 | NA | NA | NA |
| mockE | SOAPdenovo_MA | 33.0 h (2.6×) | 634,091 | 14,852,561 | 231,885 | 612,517 | 14,874,157 | 231,863 |
| mockE | Velvet_MA | 29.4 h (2.9×) | 870,073 | 14,611,333 | 237,130 | 854,554 | 14,626,870 | 237,112 |
| mockE | MetaVelvet_MA | 29.9 h (2.8×) | 709,938 | 14,800,318 | 208,281 | 693,142 | 14,811,333 | 214,062 |
| mockE | MetaIDBA_MA | 37.8 h (2.2×) | 1,700,699 | 13,652,114 | 365,724 | 1,676,319 | 13,676,524 | 365,724 |
| mockS | None | 167.1 h (-) | 18,081,508 | 5,200,170 | 849,672 | NA | NA | NA |
| mockS | SOAPdenovo_MA | 72.3 h (2.3×) | 1,971,900 | 21,772,125 | 387,325 | 1,850,541 | 21,884,121 | 386,688 |
| mockS | Velvet_MA | 71.8 h (2.3×) | 2,392,898 | 21,313,998 | 424,454 | 2,250,852 | 21,456,487 | 424,011 |
| mockS | MetaVelvet_MA | 54.4 h (3.1×) | 2,301,985 | 21,449,129 | 380,236 | 2,134,599 | 21,614,171 | 382,580 |
| mockS | MetaIDBA_MA | 53.8 h (3.1×) | 2,576,941 | 21,316,513 | 237,896 | 2,210,972 | 21,681,036 | 239,342 |

Datasets: mockE (mock Even) and mockS (mock Staggered). As the ground truth, a total of 15,718,537/22,735,802 (69.14%) sequences could be unambiguously mapped using Bowtie for the mockE dataset and 24,131,350/39,918,454 (60.45%) for the mockS dataset. Classifications of reads with no known truth were neither penalized nor rewarded.

Assembler: each assembler was run within MetAMOS and the output contigs were classified using FCP. In the None case, the unassembled read sequences were classified directly by FCP.

Run time: the time, in CPU hours, required to run either FCP on the reads or the Preprocessing, Assembly (for a specific assembler), Annotate and Propagate steps within MetAMOS. The speedup factor is the FCP run time on the reads divided by the time required to perform the analysis within MetAMOS. All experiments were performed on a 64-bit Linux server equipped with eight 2.8 GHz dual-core processors and 128 GB RAM.

Number unclassified, Number correctly classified, and Number incorrectly classified (pre-propagate): total count of sequences that were unclassified, correctly classified, or incorrectly classified at the class taxonomic level. Compared to the unassembled results, classification within MetAMOS yields at least a three-fold increase in correctly classified sequences and an approximately two-fold or greater reduction in incorrectly classified sequences.

Number unclassified, Number correctly classified, and Number incorrectly classified (post-propagate): the same counts after the MetAMOS Propagate step was used to transfer the annotations using the assembly graph. The total number of correctly classified sequences increases slightly in all cases, without a significant increase in the number of incorrectly classified sequences. The full classification at each taxonomic level is given in Table S1 in Additional file 1.

NA, not applicable.
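The speedup and fold-change figures quoted in the legend can be re-derived from the table values. The following is a minimal sketch, not part of MetAMOS or FCP: the names `MOCKE`, `speedup`, and `summarize` are hypothetical, and only the mockE pre-propagate rows are transcribed, to keep the example short.

```python
# Values transcribed from Table 2, mockE rows, pre-propagate columns.
# Structure and names are illustrative assumptions, not from the paper.
# assembler -> (CPU hours, unclassified, correct, incorrect)
MOCKE = {
    "None": (84.2, 11_116_265, 3_920_471, 681_801),
    "SOAPdenovo_MA": (33.0, 634_091, 14_852_561, 231_885),
    "Velvet_MA": (29.4, 870_073, 14_611_333, 237_130),
    "MetaVelvet_MA": (29.9, 709_938, 14_800_318, 208_281),
    "MetaIDBA_MA": (37.8, 1_700_699, 13_652_114, 365_724),
}


def speedup(baseline_hours, pipeline_hours):
    """Speedup as defined in the legend: FCP-on-reads time / MetAMOS time."""
    return baseline_hours / pipeline_hours


def summarize(rows, baseline="None"):
    """Compare each assembler row against the unassembled baseline row."""
    base_hours, _, base_correct, base_incorrect = rows[baseline]
    summary = {}
    for name, (hours, _unclassified, correct, incorrect) in rows.items():
        if name == baseline:
            continue
        summary[name] = {
            "speedup": round(speedup(base_hours, hours), 1),
            "correct_fold_increase": round(correct / base_correct, 2),
            "incorrect_fold_reduction": round(base_incorrect / incorrect, 2),
        }
    return summary


if __name__ == "__main__":
    for assembler, stats in summarize(MOCKE).items():
        print(assembler, stats)
```

Keeping the baseline as an ordinary row keyed by `"None"` mirrors the table layout, so the same function could be applied to the mockS rows by passing a second dictionary built the same way.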
Treangen et al. Genome Biology 2013 14:R2 doi:10.1186/gb-2013-14-1-r2 |