Open Access | Highly Accessed | Research

Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome

Jia Qian Wu*1, Jiang Du*2, Joel Rozowsky3, Zhengdong Zhang3, Alexander E Urban1, Ghia Euskirchen1, Sherman Weissman4, Mark Gerstein2,3 and Michael Snyder1,3

1Molecular, Cellular and Developmental Biology Department, KBT918, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06511, USA

2Computer Science Department, Yale University, 51 Prospect St., New Haven, Connecticut 06511, USA

3Molecular Biophysics and Biochemistry Department, Yale University, 260 Whitney Avenue, New Haven, Connecticut 06511, USA

4Genetics Department, Yale University, 333 Cedar Street, New Haven, Connecticut 06511, USA

* Contributed equally

Genome Biology 2008, 9:R3 doi:10.1186/gb-2008-9-1-r3

Published: 3 January 2008

Subject areas: Bioinformatics, Genome studies, Molecular biology

Abstract

Background

Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced.

Results

We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but focused primarily on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays, and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many, but not all, of the antisense transcripts may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins.

Conclusion

We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

