|
Figure 6.
Schematic representation of the statistical and computational steps implemented in
LEfSe. Input data consist of a collection of m samples (columns), each made up of n numerical features (rows, typically normalized per sample, with red representing high values
and green low). These samples are labeled with a class (taking two or more possible
values) that represents the main biological comparison under investigation; they may
also have one or more subclass labels reflecting within-class groupings.
(a) Step 1 analyzes all features, testing whether values in different classes are differentially
distributed.
(b) Features violating the null hypothesis are further analyzed in step 2, which tests
whether all pairwise comparisons between subclasses in different classes significantly
agree with the class-level trend.
(c) The resulting subset of vectors is used to build an LDA model from which the relative
difference among classes is used to rank the features. The final output thus consists
of a list of features that are discriminative with respect to the classes, consistent
with the subclass grouping within classes, and ranked according to the effect size
with which they differentiate classes.
Segata et al. Genome Biology 2011 12:R60 doi:10.1186/gb-2011-12-6-r60 |
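
The three steps in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LEfSe implementation. The caption does not name the statistical tests, so the sketch assumes a Kruskal-Wallis test for step 1 and a Wilcoxon rank-sum test (normal approximation, no tie correction) for step 2. For step 3 it swaps the multivariate LDA model for a simpler per-feature standardized mean difference, which only stands in for the LDA-based effect size. The function names (`lefse_sketch`, `kruskal_p`, `ranksum`, `chi2_sf`, `average_ranks`), the `alpha` threshold of 0.05, and the restriction to exactly two classes are all hypothetical choices made for this example.

```python
import math
from itertools import product
from statistics import mean, variance


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square survival function for integer df, via Q(a+1,z) = Q(a,z) + z^a e^-z / Gamma(a+1)."""
    z = x / 2
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_p(groups):
    """Kruskal-Wallis p-value (chi-square approximation, no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks, n = average_ranks(pooled), len(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return chi2_sf(max(h, 0.0), len(groups) - 1)


def ranksum(a, b):
    """Wilcoxon rank-sum test: (z, two-sided p); z > 0 means a tends to be larger."""
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    sigma = math.sqrt(len(a) * len(b) * (len(a) + len(b) + 1) / 12)
    z = (u - len(a) * len(b) / 2) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))


def lefse_sketch(features, classes, subclasses, alpha=0.05):
    """features: {name: [value per sample]}; classes and subclasses: per-sample labels.
    Returns [(name, score)] sorted by decreasing score. Assumes exactly two classes."""
    labels = sorted(set(classes))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    kept = []
    for name, values in features.items():
        rows = list(zip(values, classes, subclasses))
        in_a = [v for v, c, _ in rows if c == labels[0]]
        in_b = [v for v, c, _ in rows if c == labels[1]]
        # Step 1: is the feature differentially distributed between classes?
        if kruskal_p([in_a, in_b]) >= alpha:
            continue
        trend, _ = ranksum(in_a, in_b)  # sign gives the class-level direction

        def groups(label):
            subs = sorted({s for _, c, s in rows if c == label})
            return [[v for v, c, s in rows if c == label and s == sub] for sub in subs]

        # Step 2: every cross-class subclass pair must be significant and
        # point in the same direction as the class-level trend.
        pairs = (ranksum(ga, gb) for ga, gb in product(groups(labels[0]), groups(labels[1])))
        if not all(p < alpha and z * trend > 0 for z, p in pairs):
            continue
        # Step 3 (stand-in for LDA): standardized difference of class means.
        pooled = ((len(in_a) - 1) * variance(in_a) + (len(in_b) - 1) * variance(in_b)) / (
            len(in_a) + len(in_b) - 2)
        score = abs(mean(in_a) - mean(in_b)) / math.sqrt(pooled) if pooled > 0 else math.inf
        kept.append((name, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Calling `lefse_sketch(features, classes, subclasses)` with one list of values per feature and one class and subclass label per sample returns the surviving features in ranked order, mirroring the "discriminative, consistent, ranked" output the caption describes. With the normal approximation used here, subclasses with very few samples can never reach significance, so the sketch is only meaningful when each subclass has at least a handful of samples.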