|
Figure 1.
LEfSe mines a wide range of high-throughput genetic data to find biologically relevant
features characterizing one or more experimental conditions. The inputs to the system are the specifications of the biological hypothesis under
investigation (conditions and inter-condition sample groupings), the high-dimensional
data obtained experimentally, and, optionally, prior knowledge from literature or
databases used to define known relationships between features (used for meaningful
hierarchical organization of the discovered biomarkers) or samples (used for testing
biological consistency of potential biomarkers). LEfSe is a three-step algorithm (detailed
in Figure 6).
(a) LEfSe first provides the list of features that differ among the conditions of
interest with statistical and biological significance, ranking them according to
effect size.
(b) For problems with known hierarchical structure, either phylogenetic or functional,
LEfSe then maps the differences onto taxonomic or functional trees.
(c) Finally, the system produces a histogram visualizing the raw data within the specified
problem structure for each relevant feature.
While LEfSe has been developed primarily
for metagenomic data containing taxon or gene abundances, it can be used for biomarker
discovery in any setting where prior biological knowledge regarding the structure
of a comparison is coupled with statistically significant differences in high-dimensional
genomic features. KEGG, Kyoto Encyclopedia of Genes and Genomes; WGS, whole genome
shotgun.
Segata et al. Genome Biology 2011 12:R60 doi:10.1186/gb-2011-12-6-r60 |
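
Step (a) of the caption, listing features that differ among conditions and ranking them by effect size, can be pictured as a filter-then-rank procedure. The following is a minimal Python sketch of that idea and not LEfSe's implementation. The two-condition restriction, the use of a Kruskal-Wallis test as the significance filter, the log-scaled difference of means used as a stand-in effect size, and all names (`rank_features`, `demo_table`, `alpha`) are assumptions made for illustration; the caption itself does not specify them, and Figure 6 of the paper gives the actual algorithm.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H test for two groups (1 degree of freedom)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b))
    h -= 3 * (n + 1)
    # Standard correction for tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if tie == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / tie, 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


def rank_features(table, labels, alpha=0.05):
    """Keep features that differ between two conditions, ranked by effect.

    `table` maps a feature name to one abundance per sample; `labels` gives
    the condition of each sample. The effect size here is a hypothetical
    stand-in (log10 of 1 + absolute difference of means), chosen only to
    illustrate the ranking step.
    """
    conditions = sorted(set(labels))
    if len(conditions) != 2:
        raise ValueError("this sketch handles exactly two conditions")
    ranked = []
    for feature, abundances in table.items():
        a = [x for x, lab in zip(abundances, labels) if lab == conditions[0]]
        b = [x for x, lab in zip(abundances, labels) if lab == conditions[1]]
        _, p = kruskal_wallis_two_groups(a, b)
        if p < alpha:
            effect = math.log10(1 + abs(mean(a) - mean(b)))
            ranked.append((feature, effect, p))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


# Made-up abundances for illustration only: five samples per condition.
demo_labels = ["case"] * 5 + ["control"] * 5
demo_table = {
    "feature_A": [10, 11, 12, 13, 14, 1, 2, 3, 4, 5],
    "feature_B": [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
    "feature_C": [20, 22, 21, 23, 24, 10, 9, 11, 8, 12],
}
ranked = rank_features(demo_table, demo_labels)
for name, effect, p in ranked:
    print(f"{name}\teffect={effect:.2f}\tp={p:.4f}")
```

In this toy input, `feature_B` is nearly identical across the two conditions and is filtered out, while the remaining features are ordered by the stand-in effect size. Steps (b) and (c) would then consume this ranked list together with a taxonomic or functional hierarchy.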