Open Access Highly Accessed Method

Metagenomic biomarker discovery and explanation

Nicola Segata1, Jacques Izard2,3, Levi Waldron1, Dirk Gevers4, Larisa Miropolsky1, Wendy S Garrett5,6,7 and Curtis Huttenhower1*

Author Affiliations

1 Department of Biostatistics, 677 Huntington Avenue, Harvard School of Public Health, Boston, MA 02115, USA

2 Department of Molecular Genetics, 245 First Street, The Forsyth Institute, Cambridge, MA 02142, USA

3 Department of Oral Medicine, Infection and Immunity, 188 Longwood Ave, Harvard School of Dental Medicine, Boston, MA 02115, USA

4 Microbial Sequencing Center, 7 Cambridge Center, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA

5 Department of Immunology and Infectious Diseases, 665 Huntington Avenue, Harvard School of Public Health, Boston, MA 02115, USA

6 Department of Medicine, 75 Francis Street, Harvard Medical School, Boston, MA 02115, USA

7 Department of Medical Oncology, 44 Binney Street, Dana-Farber Cancer Institute, MA 02215, USA

Genome Biology 2011, 12:R60  doi:10.1186/gb-2011-12-6-r60

Published: 24 June 2011

Additional files

Additional file 1:

Supplementary Figure S6. Histogram of within-subject β-diversity (community dissimilarity) between different mucosal (red) and non-mucosal (green) body sites.

Format: PDF Size: 65KB
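The caption does not say which dissimilarity measure underlies the β-diversity values. As an illustration only, the sketch below assumes Bray-Curtis dissimilarity, a common choice for comparing community composition between two samples; the function `bray_curtis` and the abundance profiles are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    Assumed measure for illustration: 0 means identical composition,
    1 means the two samples share no taxa.
    """
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 2.0 * shared / total

# Hypothetical relative abundances of four taxa at two body sites of one subject.
site_a = [0.50, 0.30, 0.20, 0.00]
site_b = [0.10, 0.30, 0.20, 0.40]
print(round(bray_curtis(site_a, site_b), 3))  # 0.4
```

A within-subject histogram like the one in the figure would collect this value for every pair of sites sampled from the same subject.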

Additional file 2:

Supplementary Figure S1. Cladogram representing the differences between viromes and microbiomes on the subsystem framework.

Format: PDF Size: 1.8MB
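A cladogram of this kind is drawn from hierarchically organized features. A minimal sketch, assuming each feature name lists its levels from the top of the hierarchy down, separated by `|` (the convention used for hierarchical features in LEfSe input files); the subsystem names and the helpers `build_tree` and `print_tree` are hypothetical.

```python
def build_tree(feature_names, sep="|"):
    """Nest sep-separated feature names into a dict-of-dicts tree (assumed naming scheme)."""
    root = {}
    for name in feature_names:
        node = root
        for level in name.split(sep):
            node = node.setdefault(level, {})  # create the clade on first sight
    return root

def print_tree(node, depth=0):
    """Print clades as an indented outline, one level per indentation step."""
    for label in sorted(node):
        print("  " * depth + label)
        print_tree(node[label], depth + 1)

# Hypothetical subsystem-style feature names.
features = [
    "Carbohydrates|Central carbohydrate metabolism",
    "Carbohydrates|Fermentation",
    "Phages, Prophages|Phage packaging machinery",
]
print_tree(build_tree(features))
```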

Additional file 3:

Supplementary Figure S2. Histogram of LDA logarithmic scores of biomarkers found by LEfSe comparing microbiomes and viromes within the subsystem framework.

Format: PDF Size: 384KB

Additional file 4:

Supplementary Figure S3. Histogram of LDA logarithmic scores of COG biomarkers found by LEfSe comparing adult and infant microbiomes.

Format: PDF Size: 184KB

Additional file 5:

Supplementary Figure S4. Functional features (COGs) with an LDA score higher than 3 that LEfSe finds discriminative for the comparison between adult and infant microbiomes but that Metastats does not detect. If all discriminant features are considered, without a threshold on the LDA score, LEfSe identifies 366 COGs in total, 185 of which are not discriminant for Metastats.

Format: PDF Size: 230KB
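The counts in this caption are set arithmetic: of the 366 COGs reported by LEfSe, 185 are not reported by Metastats, so 366 − 185 = 181 are reported by both methods. The sketch below reproduces the comparison with an optional LDA-score threshold; the COG identifiers, the scores, and the helper `lefse_only` are hypothetical placeholders, and scores are assumed to be non-negative.

```python
def lefse_only(lefse_scores, metastats_hits, min_lda=None):
    """Features reported by LEfSe but not by Metastats.

    lefse_scores: dict of feature -> LDA score (log scale, assumed non-negative).
    metastats_hits: features that Metastats calls discriminant.
    min_lda: optional threshold; only features scoring above it are kept.
    """
    selected = {
        feature for feature, score in lefse_scores.items()
        if min_lda is None or score > min_lda
    }
    return selected - set(metastats_hits)

# Hypothetical placeholder data, not taken from the study.
lefse = {"COG0001": 3.4, "COG0002": 2.1, "COG0003": 3.9, "COG0004": 2.8}
metastats = {"COG0003", "COG0004", "COG0005"}
print(sorted(lefse_only(lefse, metastats, min_lda=3)))  # ['COG0001']
print(sorted(lefse_only(lefse, metastats)))             # ['COG0001', 'COG0002']
```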

Additional file 6:

Supplementary Figure S5. Functional features (COGs) that are discriminative for the comparison between adult and infant microbiomes according to Metastats but not detected by LEfSe. Although the medians and variances suggest that these differences are discriminative, in every case at least two microbiomes overlap between the classes. LEfSe does not report these features because of the stringent α-value (0.01) set for its KW test and because it uses non-parametric statistics (unlike Metastats). Note, however, that even with this stringent α-value LEfSe detects many more biomarkers than Metastats (366 versus 192).

Format: PDF Size: 289KB
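Why a few overlapping samples can make a feature fail the test: the sketch below is a textbook two-class Kruskal-Wallis test with tie correction at α = 0.01, written in plain Python so the ranking step is visible. It is not LEfSe's code (`scipy.stats.kruskal` provides a maintained implementation), and the data are hypothetical. Moving a single sample across the class boundary pushes the p-value from below 0.01 to above it.

```python
import math

def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = 1)."""
    values = list(a) + list(b)
    n = len(values)
    ranks = average_ranks(values)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3.0 * (n + 1)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value identical: no evidence of a difference
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    # With two groups (one degree of freedom), P(chi2 > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2.0))

ALPHA = 0.01  # the stringent threshold quoted in the caption
separated = kruskal_wallis_two_groups([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
overlapping = kruskal_wallis_two_groups([1, 2, 3, 4, 7], [5, 6, 8, 9, 10])
print(separated[1] < ALPHA, overlapping[1] < ALPHA)  # True False
```

Because the test uses only ranks, it ignores how far apart the class medians are once the samples interleave, which is the behaviour the caption describes.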

Additional file 7:

Supplementary Figure S9. Comparison between LEfSe and Metastats using the synthetic data described in Figure 5 and in the Materials and methods. LEfSe was applied as detailed in the paper; for Metastats we used the default settings (that is, α = 0.05 and 1,000 permutations) and, as for LEfSe and KW, we disabled per-sample normalization because the features are independent. (a,b) Metastats has a higher false positive rate (average 5%) than LEfSe (average below 0.5%) and a lower false negative rate. (c) When the subclass information is meaningful (see Figure 5 for the representation of the dataset), LEfSe performs substantially better than Metastats in terms of both false positives and false negatives. Overall, on these synthetic data, Metastats achieves results very similar to those of KW (Figure 5), and neither method can exploit additional information about the within-class structure; both therefore perform poorly compared with LEfSe when such information is available.

Format: PDF Size: 376KB
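The caption does not spell out how the error rates are computed. The sketch below assumes the standard definitions: false positive rate = false positives / number of features that are not true biomarkers, and false negative rate = false negatives / number of true biomarkers. The feature names, the ground truth, and the helper `error_rates` are hypothetical.

```python
def error_rates(all_features, true_biomarkers, predicted):
    """False positive and false negative rates of a set of biomarker calls."""
    all_features = set(all_features)
    truth = set(true_biomarkers)
    pred = set(predicted)
    negatives = all_features - truth          # features with no real class difference
    false_pos = len(pred & negatives)         # called, but not a true biomarker
    false_neg = len(truth - pred)             # true biomarker that was missed
    fpr = false_pos / len(negatives) if negatives else 0.0
    fnr = false_neg / len(truth) if truth else 0.0
    return fpr, fnr

# Hypothetical synthetic setting: 100 features, the first 10 are real biomarkers.
features = [f"f{i}" for i in range(100)]
truth = features[:10]
calls = features[:8] + ["f50", "f51"]  # 8 correct calls and 2 spurious ones
print(error_rates(features, truth, calls))
```

Averaging these two numbers over many simulated datasets gives summaries of the kind reported in panels (a) to (c).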

Additional file 8:

Supplementary Figure S7. SVM-based effect size estimation for the biomarkers found for the Rag2-/- versus T-bet-/- × Rag2-/- comparison reported in Figure 3 of the manuscript. The LDA-based approach for assessing effect size (Figure 3) agrees more closely with the biological follow-up experiments and is more visually consistent. LDA's advantage over SVM approaches for effect size estimation is theoretically linked to LDA's ability to find the axis that best separates the classes (maximizing between-class relative to within-class variance), whereas SVMs focus on the combined predictive power of the features rather than on the relevance of each single feature. Note that an algorithm's accuracy in estimating effect size is not directly related to its predictive ability (SVM approaches are usually considered more accurate than LDA for prediction).

Format: PDF Size: 207KB
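The property of LDA invoked here can be illustrated with the textbook two-class Fisher discriminant, whose axis is w = S_W^-1 (mu1 - mu0), where S_W is the pooled within-class scatter matrix and mu0, mu1 are the class means. This is a sketch for two features with hypothetical data, not LEfSe's effect-size procedure: feature 0 carries most of the variance but no class difference, and the discriminant axis ignores it in favour of feature 1.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def scatter(vectors, mu):
    """2x2 within-class scatter matrix: sum of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s

def fisher_direction(class0, class1):
    """Two-class Fisher LDA axis w = S_W^-1 (mu1 - mu0) for two features."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[r][c] + s1[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

# Hypothetical data: feature 0 varies widely within both classes (highest variance),
# feature 1 varies little but is shifted by 1.0 between the classes.
class0 = [[-10, 0.0], [10, 0.2], [-10, 0.2], [10, 0.0]]
class1 = [[-10, 1.0], [10, 1.2], [-10, 1.2], [10, 1.0]]
print(fisher_direction(class0, class1))  # weight on feature 0 is (numerically) zero
```

The highest-variance direction here is feature 0, yet the discriminant axis points along feature 1, which is the feature that actually separates the classes.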

Additional file 9:

Supplementary Figure S8. Comparison between the features with the highest SVM-based effect size (Papillibacter, left), the highest LDA-based effect size (Bifidobacterium, center), and the Actinobacteria phylum (right). Visually, Bifidobacterium shows a larger effect size, which is also evident from the ratios between class means, suggesting that LDA is a better option than SVM approaches for effect size estimation. As detailed in the manuscript, the relevance of Bifidobacterium has been experimentally validated. Moreover, the large difference between the SVM score given to Actinobacteria and the scores given to Bifidobacterium and Papillibacter is not consistent with this visual comparison.

Format: PDF Size: 71KB

Additional file 10:

T-bet-/- × Rag2-/- versus Rag2-/- dataset. LEfSe input file for the analysis of the ulcerative colitis phenotype in mice.

Format: TXT Size: 83KB
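A LEfSe input file is a tab-delimited table. The sketch below reads a simplified version of that layout, assuming one row holds the class label of each sample (column) and every other row is a feature name followed by one numeric value per sample; real files may also carry subclass or subject rows, which this sketch does not handle. The function `parse_lefse_table`, the class labels, and the abundances are hypothetical.

```python
def parse_lefse_table(text, class_row=0):
    """Parse a tab-delimited table with features as rows and samples as columns.

    Assumes row `class_row` holds the class label of each sample and that every
    other row is a feature name followed by one numeric value per sample.
    """
    rows = [line.split("\t") for line in text.strip().splitlines()]
    classes = rows[class_row][1:]
    features = {}
    for i, row in enumerate(rows):
        if i == class_row:
            continue
        name, values = row[0], [float(x) for x in row[1:]]
        if len(values) != len(classes):
            raise ValueError(f"row {name!r} has {len(values)} values, expected {len(classes)}")
        features[name] = values
    return classes, features

# Hypothetical miniature table; '|' separates levels of hierarchical feature names.
example = (
    "class\tRag2\tRag2\tTbet_Rag2\tTbet_Rag2\n"
    "Bacteria|Actinobacteria\t0.02\t0.03\t0.10\t0.12\n"
    "Bacteria|Firmicutes\t0.60\t0.55\t0.40\t0.38\n"
)
classes, features = parse_lefse_table(example)
print(classes)
print(features["Bacteria|Actinobacteria"])
```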