Open Access Highly Accessed Software

The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists

Da Wei Huang¹, Brad T Sherman¹, Qina Tan¹, Jack R Collins², W Gregory Alvord³, Jean Roayaei³, Robert Stephens², Michael W Baseler⁴, H Clifford Lane⁵ and Richard A Lempicki¹*

Author Affiliations

1 Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA

2 Advanced Biomedical Computing Center, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA

3 Computer and Statistical Services, Data Management Services, National Cancer Institute at Frederick, Frederick, MD 21702, USA

4 Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA

5 Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA

Genome Biology 2007, 8:R183  doi:10.1186/gb-2007-8-9-r183

Published: 4 September 2007

Additional files

Additional data file 1:

Genes used in the paper: 409 Affymetrix IDs of demo list 2; 84 chemokine genes; approximately 17,000 pairs of protein-protein interactions; and 16 Affymetrix IDs.

Format: XLS Size: 830KB

This file can be viewed with: Microsoft Excel Viewer

Additional data file 2:

Complete output in text format for demo list 2 analyzed by the DAVID Gene Functional Classification Tool.

Format: XLS Size: 37KB

This file can be viewed with: Microsoft Excel Viewer

Additional data file 3:

Complete output in text format for demo list 2 analyzed by the DAVID Functional Annotation Clustering Tool.

Format: XLS Size: 77KB

This file can be viewed with: Microsoft Excel Viewer

Additional data file 4:

The genes in demo list 2 were analyzed with the DAVID Gene Functional Classification Tool, and the identified biological groups/modules are displayed as a fuzzy heat map.

Format: PPT Size: 151KB

This file can be viewed with: Microsoft PowerPoint Viewer

Additional data file 5:

A binary gene-term matrix (similar to Figure 2a) was compiled and submitted to different clustering engines, including hierarchical clustering and K-means, and the results were evaluated and compared side by side.

Format: DOC Size: 592KB

This file can be viewed with: Microsoft Word Viewer
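
The matrix compilation and one of the clustering engines (K-means) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the comparison: the toy gene and term names are hypothetical, the helper names `gene_term_matrix` and `kmeans` are illustrative, and the K-means starting rows are passed explicitly (`seed_indices`, an assumption) so the toy run is deterministic.

```python
def gene_term_matrix(annotations):
    """Compile a binary gene-term matrix: cell (i, j) is 1 if gene i carries term j."""
    genes = sorted(annotations)
    terms = sorted({t for ts in annotations.values() for t in ts})
    matrix = [[1 if t in annotations[g] else 0 for t in terms] for g in genes]
    return genes, terms, matrix


def kmeans(rows, seed_indices, iterations=20):
    """Plain Lloyd's K-means; the starting centroids are the rows at seed_indices."""
    centroids = [[float(x) for x in rows[i]] for i in seed_indices]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of the rows assigned to it.
        for c in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy annotations (gene -> set of terms).
annotations = {
    "CXCL1": {"chemokine activity", "inflammatory response"},
    "CXCL2": {"chemokine activity", "inflammatory response"},
    "IL8": {"chemokine activity", "inflammatory response", "chemotaxis"},
    "RPL3": {"ribosome", "translation"},
    "RPL4": {"ribosome", "translation"},
}
genes, terms, matrix = gene_term_matrix(annotations)
labels = kmeans(matrix, seed_indices=[0, 3])
for gene, label in zip(genes, labels):
    print(gene, label)
```

Hierarchical clustering could be run on the same `matrix`; the point of a side-by-side comparison is that every engine receives the identical binary input.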

Additional data file 6:

An example of the group enrichment score calculation used for the Functional Annotation Clustering Tool.

Format: PPT Size: 32KB

This file can be viewed with: Microsoft PowerPoint Viewer
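
The group enrichment score is the geometric mean of the member terms' p-values expressed on a -log10 scale, which is the same as the arithmetic mean of the individual -log10(p) values. A minimal sketch follows; the function name and the p-values are hypothetical.

```python
import math


def group_enrichment_score(p_values):
    """-log10 of the geometric mean of the member terms' p-values.

    Because log(geometric mean) = mean(log), this equals the arithmetic mean
    of the individual -log10(p) values.
    """
    if not p_values:
        raise ValueError("an annotation cluster needs at least one p-value")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


# Hypothetical p-values of the terms in one annotation cluster.
cluster_p_values = [1e-2, 1e-3, 1e-4]
score = group_enrichment_score(cluster_p_values)
print(f"group enrichment score = {score:.2f}")  # group enrichment score = 3.00
```

On this scale a score of about 1.3 corresponds to a geometric-mean p-value of 0.05, since -log10(0.05) ≈ 1.30.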

Additional data file 7:

Fourteen annotation categories used in the DAVID Functional Classification Tool.

Format: DOC Size: 37KB

This file can be viewed with: Microsoft Word Viewer

Additional data file 8:

Illustrated instructions and a tutorial on how to use the DAVID Functional Classification Tool and the DAVID Functional Annotation Clustering Tool.

Format: DOC Size: 1.4MB

This file can be viewed with: Microsoft Word Viewer

Additional data file 9:

(a) Related gene search for 'interleukin 8' in the scope of demo list 2. (b) Related term search for 'inflammatory response' in the scope of all annotations. (c) Related gene search for a group of genes, group 1 for demo list 2, identified by the DAVID Gene Functional Classification Tool.

Format: PPT Size: 566KB

This file can be viewed with: Microsoft PowerPoint Viewer
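
A related gene search like the one in panel (a) can be sketched as ranking every other gene by the kappa agreement between its binary annotation profile and the query gene's profile. The sketch assumes Cohen's kappa over a shared term vocabulary and the 0.35 significance cutoff given in Additional data file 10; the function names and toy annotations are hypothetical, and this is not DAVID's implementation.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length 0/1 vectors."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    a1, b1 = sum(a), sum(b)
    # Agreement expected by chance: both 1 by chance plus both 0 by chance.
    expected = (a1 * b1 + (n - a1) * (n - b1)) / (n * n)
    if expected == 1:
        return 0.0  # both vectors constant and identical: kappa undefined, treat as uninformative
    return (observed - expected) / (1 - expected)


def related_genes(query, annotations, threshold=0.35):
    """Return (gene, kappa) pairs with kappa >= threshold, strongest first."""
    terms = sorted({t for ts in annotations.values() for t in ts})
    profiles = {g: [1 if t in ts else 0 for t in terms] for g, ts in annotations.items()}
    hits = [
        (gene, cohen_kappa(profiles[query], vec))
        for gene, vec in profiles.items()
        if gene != query
    ]
    return sorted([h for h in hits if h[1] >= threshold], key=lambda h: h[1], reverse=True)


# Hypothetical toy annotations (gene -> set of terms).
annotations = {
    "CXCL1": {"chemokine activity", "inflammatory response"},
    "CXCL2": {"chemokine activity", "inflammatory response"},
    "IL8": {"chemokine activity", "inflammatory response", "chemotaxis"},
    "RPL3": {"ribosome", "translation"},
}
for gene, kappa in related_genes("IL8", annotations):
    print(f"{gene}\t{kappa:.2f}")
```

The ribosomal gene shares no terms with the query and receives a negative kappa, so it drops out; a search for a whole group, as in panel (c), would need an additional rule for combining kappa across the group's members.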

Additional data file 10:

(a) Significant kappa scores (≥0.35, based on the randomization study in Figure 3) can be obtained only for gene-gene pairs that share many annotation terms (≥10). There is therefore no need to calculate kappa scores for the large number of gene-gene pairs that share fewer terms, and skipping them saves calculation time in DAVID Functional Classification. The conservative default filter is 4 shared terms. (b) This default filter (blue curve) has a somewhat greater impact on significant kappa scores at the higher end than on those at the lower end. However, it skips a substantial number of gene-gene kappa calculations.

Format: PPT Size: 646KB

This file can be viewed with: Microsoft PowerPoint Viewer
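
The pre-filter in panel (a) amounts to counting the annotation terms two genes share and skipping the kappa calculation when that count falls below the cutoff (4 by default). A minimal sketch with hypothetical profiles; `pairs_worth_scoring` is an illustrative name, not a DAVID function.

```python
from itertools import combinations


def pairs_worth_scoring(profiles, min_overlap=4):
    """Split gene pairs into those sharing >= min_overlap terms and those skipped.

    profiles maps gene -> 0/1 vector over a common, fixed term order.
    """
    keep, skipped = [], []
    for g1, g2 in combinations(sorted(profiles), 2):
        # Count terms annotated to both genes (1-1 positions).
        shared = sum(1 for x, y in zip(profiles[g1], profiles[g2]) if x and y)
        if shared >= min_overlap:
            keep.append((g1, g2))
        else:
            skipped.append((g1, g2))
    return keep, skipped


# Hypothetical 0/1 annotation profiles over six terms.
profiles = {
    "A": [1, 1, 1, 1, 1, 0],
    "B": [1, 1, 1, 1, 0, 0],
    "C": [0, 0, 0, 0, 1, 1],
    "D": [1, 1, 1, 1, 1, 1],
}
keep, skipped = pairs_worth_scoring(profiles)
print("score kappa for:", keep)
print(f"skipped {len(skipped)} of {len(keep) + len(skipped)} pairs")
```

Counting shared 1s is far cheaper than a full kappa calculation, which is why the filter pays off when most gene pairs in a large list share only a handful of terms.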

Additional data file 11:

The annotation data contain many more 0s than 1s. The test shows that the kappa statistic is able to detect 1-1 relationships, which are the key biological co-occurrences we aim to measure.

Format: DOC Size: 346KB

This file can be viewed with: Microsoft Word Viewer
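
The point of this test can be illustrated with two sparse toy profiles: raw percent agreement is dominated by shared 0s, whereas Cohen's kappa subtracts chance agreement and therefore responds mainly to shared 1s. The vectors and the function name below are hypothetical illustrations.

```python
def agreement_and_kappa(a, b):
    """Return (observed agreement, Cohen's kappa) for two equal-length 0/1 vectors."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    a1, b1 = sum(a), sum(b)
    expected = (a1 * b1 + (n - a1) * (n - b1)) / (n * n)  # agreement expected by chance
    kappa = 0.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa


n = 20  # hypothetical number of annotation terms
gene_a = [1] + [0] * (n - 1)
gene_b = [0, 1] + [0] * (n - 2)  # no 1s in common with gene_a
gene_c = [1] + [0] * (n - 1)     # shares gene_a's single 1

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    observed, kappa = agreement_and_kappa(gene_a, other)
    print(f"gene_a vs {name}: agreement={observed:.2f}, kappa={kappa:.2f}")
```

Both pairs agree on at least 90% of positions, yet only the pair that shares a 1 scores highly on kappa; the pair matching only on 0s scores slightly below zero.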

Additional data file 12:

The examples suggest that the 'flat' matrix strategy, together with kappa statistics, allows quantitative measurement of gene-gene and term-term relationships based on global annotation profiles. All levels of annotation contribute to the measurement.

Format: DOC Size: 262KB

This file can be viewed with: Microsoft Word Viewer
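
One way to read the 'flat' matrix strategy is that each gene carries its direct terms plus every ancestor term in the annotation hierarchy, all placed side by side as columns of one matrix, so that genes with different specific terms still share the broader ones. A minimal sketch under that assumption; the term hierarchy and gene names are abstract placeholders, and `ancestors`/`flatten` are illustrative helpers.

```python
def ancestors(term, parents):
    """All terms reachable upward from term via the parents map."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def flatten(direct, parents):
    """Expand each gene's direct terms with all of their ancestor terms."""
    flat = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        flat[gene] = expanded
    return flat


# Hypothetical hierarchy: child_1 and child_2 both sit under parent, which sits under root.
parents = {"child_1": ["parent"], "child_2": ["parent"], "parent": ["root"]}
direct = {"gene_x": {"child_1"}, "gene_y": {"child_2"}}

flat = flatten(direct, parents)
print("shared before flattening:", direct["gene_x"] & direct["gene_y"])
print("shared after flattening:", sorted(flat["gene_x"] & flat["gene_y"]))
```

Without the higher levels the two genes share nothing and any co-occurrence measure would call them unrelated; with them, the shared parent and root terms register a partial relationship.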

Additional data file 13:

The example provides a step-by-step demonstration of the clustering algorithm, showing how members are grouped together, how the total number of groups is determined, and how fuzziness can occur.

Format: DOC Size: 143KB

This file can be viewed with: Microsoft Word Viewer
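
The grouping idea can be approximated by a short heuristic: seed a candidate group around each gene from its kappa neighbours, keep seeds whose members are sufficiently inter-linked, merge seeds that overlap heavily, and drop small groups. Because seeds are built per gene, one gene can end up in more than one group, which is where fuzziness comes from. This is a simplified, hypothetical sketch rather than DAVID's code: the function name, the parameter names, the 50% merge rule, and every default except the 0.35 kappa cutoff are assumptions, and kappa scores are supplied precomputed.

```python
from itertools import combinations


def fuzzy_groups(genes, kappa, threshold=0.35, initial_membership=3,
                 final_membership=3, linkage=0.5):
    """Simplified, hypothetical kappa-based fuzzy grouping.

    kappa maps frozenset({gene1, gene2}) -> kappa score; missing pairs count as 0.
    """
    def k(a, b):
        return kappa.get(frozenset((a, b)), 0.0)

    def qualified(group):
        # Multiple-linkage check: enough member pairs must pass the kappa threshold.
        pairs = list(combinations(sorted(group), 2))
        if not pairs:
            return False
        linked = sum(1 for a, b in pairs if k(a, b) >= threshold)
        return linked / len(pairs) >= linkage

    # Step 1: seed one candidate group per gene from its kappa neighbours.
    seeds = []
    for g in genes:
        group = frozenset([g] + [h for h in genes if h != g and k(g, h) >= threshold])
        if len(group) >= initial_membership and qualified(group) and group not in seeds:
            seeds.append(group)

    # Step 2: merge any two groups sharing at least half of the smaller one; repeat.
    merged = True
    while merged:
        merged = False
        for a, b in combinations(seeds, 2):
            if len(a & b) >= 0.5 * min(len(a), len(b)):
                seeds.remove(a)
                seeds.remove(b)
                if (a | b) not in seeds:
                    seeds.append(a | b)
                merged = True
                break

    # Step 3: keep large enough groups; a gene may belong to several (fuzziness).
    return [sorted(g) for g in seeds if len(g) >= final_membership]


# Hypothetical kappa scores: {A, B, C, D} and {D, E, F, G} are tightly linked; D bridges them.
kappa = {}
for block in ("ABCD", "DEFG"):
    for a, b in combinations(block, 2):
        kappa[frozenset((a, b))] = 0.8

print(fuzzy_groups(list("ABCDEFG"), kappa, linkage=0.6))
# [['A', 'B', 'C', 'D'], ['D', 'E', 'F', 'G']]
```

The toy call raises the linkage fraction to 0.6. At 0.5, the seed around the bridging gene D (12 of its 21 member pairs linked) would qualify and the merge step would absorb both groups into one, which shows how sensitive the number of groups is to this parameter.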

Additional data file 14:

Detailed results, comparisons, and discussion of the new DAVID clustering tools with regard to yeast cell cycle G1 genes [21].

Format: XLS Size: 1.1MB

This file can be viewed with: Microsoft Excel Viewer

Additional data file 15:

All of the annotation results point in the right direction, that is, toward inflammatory responses. However, redundant, similar, and hierarchically related terms are scattered throughout the results, which reduces analytical efficiency. In addition, some of the key terms reported in the original publication do not appear at the top of the results produced by other tools, but are always captured by the DAVID tools.

Format: XLS Size: 1.9MB

This file can be viewed with: Microsoft Excel Viewer