Research

A comprehensive transcript index of the human genome generated using microarrays and computational approaches

Eric E Schadt1*, Stephen W Edwards1, Debraj GuhaThakurta1, Dan Holder2, Lisa Ying2, Vladimir Svetnik2, Amy Leonardson1, Kyle W Hart3, Archie Russell1, Guoya Li1, Guy Cavet1, John Castle1, Paul McDonagh4, Zhengyan Kan1, Ronghua Chen1, Andrew Kasarskis1, Mihai Margarint1, Ramon M Caceres1, Jason M Johnson1, Christopher D Armour1, Philip W Garrett-Engele1, Nicholas F Tsinoremas5 and Daniel D Shoemaker1*

Author Affiliations

1 Rosetta Inpharmatics LLC, 12040 115th Avenue NE, Kirkland, WA 98034, USA

2 Merck Research Laboratories, W42-213 Sumneytown Pike, POB 4, Westpoint, PA 19846, USA

3 Rally Scientific, 41 Fayette Street, Suite 1, Watertown, MA 02472, USA

4 Amgen Inc, 1201 Amgen Court W, Seattle, WA 98119, USA

5 The Scripps Research Institute, Jupiter, FL 33458, USA

Genome Biology 2004, 5:R73  doi:10.1186/gb-2004-5-10-r73

Published: 23 September 2004

Abstract

Background

Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription in a diverse set of 60 tissues and cell lines using ink-jet microarrays. In addition, expression activity across at least six conditions was assessed more broadly using genomic tiling arrays, with probes tiled through a repeat-masked version of the genomic sequence of chromosomes 20 and 22.

Results

The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome, and they serve as an example of how this activity could be measured on a genome-wide scale.

Conclusions

These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized.