Research

Pathway and gene-set activation measurement from mRNA expression data: the tissue distribution of human pathways

David M Levine1, David R Haynor2, John C Castle1, Sergey B Stepaniants1, Matteo Pellegrini3, Mao Mao1 and Jason M Johnson1*

Author Affiliations

1 Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck and Co., Inc., Terry Avenue North, Seattle, WA 98109, USA

2 Department of Radiology, University of Washington, Seattle, WA 98195, USA

3 Department of MCD Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA

Genome Biology 2006, 7:R93  doi:10.1186/gb-2006-7-10-r93

Published: 17 October 2006



Interpretation of lists of genes or proteins with altered expression is a critical and time-consuming part of microarray and proteomics research, but relatively little attention has been paid to methods for extracting biological meaning from these output lists. One powerful approach is to examine the expression of predefined biological pathways and gene sets, such as metabolic and signaling pathways and macromolecular complexes. Although many methods for measuring pathway expression have been proposed, a systematic analysis of the performance of multiple methods over multiple independent data sets has not previously been reported.


Five different measures of pathway expression were compared in an analysis of nine publicly available mRNA expression data sets. The relative sensitivity of the metrics varied greatly across data sets, and the biological pathways identified for each data set also depended on the choice of pathway activation metric. In addition, we show that removing incoherent pathways prior to analysis improves specificity. Finally, we create and analyze a public map of pathway expression in human tissues by gene-set analysis of a large compendium of human expression data.
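To make the two ideas above concrete, the following minimal Python sketch scores a gene set per sample and discards incoherent gene sets before scoring. It is an illustration under stated assumptions, not one of the five measures compared in the paper, which this abstract does not define. The mean z-score metric, the mean pairwise Pearson correlation used as a coherence score, the 0.3 cutoff, the toy expression matrix and all function names are hypothetical choices made for the example.

```python
import numpy as np


def zscore_rows(expr):
    """Standardize each gene (row) across samples (columns)."""
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, ddof=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes get a z-score of 0 instead of NaN
    return (expr - mean) / std


def mean_z_activation(expr, gene_index):
    """Assumed activation metric: mean z-score of the set's genes, per sample."""
    z = zscore_rows(expr)
    return z[gene_index].mean(axis=0)


def coherence(expr, gene_index):
    """Assumed coherence score: mean pairwise Pearson correlation of member genes."""
    sub = expr[gene_index]
    if sub.shape[0] < 2:
        return 0.0  # a single gene has no pairwise correlation
    corr = np.corrcoef(sub)
    upper = corr[np.triu_indices_from(corr, k=1)]  # each gene pair counted once
    return float(np.nanmean(upper))


def filter_coherent(expr, gene_sets, min_coherence=0.3):
    """Keep only gene sets whose coherence reaches an (arbitrary) cutoff."""
    return {
        name: idx
        for name, idx in gene_sets.items()
        if coherence(expr, idx) >= min_coherence
    }


# Toy data: 4 genes (rows) measured in 4 samples (columns).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],  # gene 0
    [2.0, 4.0, 6.0, 8.0],  # gene 1, perfectly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],  # gene 2, anti-correlated with gene 0
    [1.0, 3.0, 2.0, 4.0],  # gene 3, not used in any set below
])
gene_sets = {"coherent_set": [0, 1], "incoherent_set": [0, 2]}

kept = filter_coherent(expr, gene_sets)
scores = {name: mean_z_activation(expr, idx) for name, idx in kept.items()}
```

In this toy case the set whose genes move in opposite directions is dropped before scoring, because averaging its members would cancel the signal. A different metric or cutoff could keep or rank the sets differently, which is the kind of method dependence the comparison above reports.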


We show that both the detection sensitivity and the identity of the pathways found to be significantly perturbed in a microarray experiment are highly dependent on the analysis methods used and on how incoherent pathways are treated. Analysts should thus consider using multiple approaches to test the robustness of their biological interpretations. We also provide a comprehensive picture of the tissue distribution of human gene pathways and a useful public archive of human pathway expression data.