This article is part of the supplement: EGASP '05: ENCODE Genome Annotation Assessment Project

Open Access Research

A computational approach for identifying pseudogenes in the ENCODE regions

Deyou Zheng (1) and Mark B Gerstein (1,2,3)*

Author Affiliations

1 Department of Molecular Biophysics and Biochemistry, Yale University, Whitney Avenue, New Haven, CT 06520, USA

2 Department of Computer Science, Yale University, Prospect Street, New Haven, CT 06520, USA

3 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA

Genome Biology 2006, 7(Suppl 1):S13  doi:10.1186/gb-2006-7-s1-s13

Published: 7 August 2006

Abstract

Background

Pseudogenes are inheritable genetic elements that show sequence similarity to functional genes but carry deleterious mutations. We describe a computational pipeline for identifying them that, in contrast to previous work, explicitly uses the intron-exon structure of parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions; this requirement distinguishes true duplicated pseudogenes from processed pseudogenes that contain insertions.
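
The classification idea in the Background can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Block` record, the `classify` function, the `min_intron` and `tolerance` thresholds, and the "ambiguous" label are all hypothetical, introduced only for illustration. The sketch assumes a forward-strand alignment given as ungapped blocks, with parent positions in coding-sequence coordinates and a list of parent exon-exon junction positions. A large genomic gap that coincides with a parent junction is treated as a retained intron (duplicated); a large gap elsewhere is treated as an insertion, which does not by itself indicate duplication.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """One ungapped alignment block (hypothetical representation)."""
    parent_start: int  # start in parent coding-sequence coordinates
    parent_end: int    # end (exclusive) in parent coding-sequence coordinates
    genome_start: int  # start of the block on the pseudogene locus
    genome_end: int    # end (exclusive) on the pseudogene locus


def classify(blocks: List[Block], junctions: List[int],
             min_intron: int = 30, tolerance: int = 3) -> str:
    """Label a pseudogene candidate from its alignment to the parent gene.

    Assumed thresholds: a genomic gap of at least `min_intron` bases that
    starts within `tolerance` bases of a parent junction counts as an intron.
    """
    ordered = sorted(blocks, key=lambda b: b.parent_start)
    intron_like = 0
    for left, right in zip(ordered, ordered[1:]):
        genome_gap = right.genome_start - left.genome_end
        if genome_gap < min_intron:
            continue  # too short to be an intron; ignore small indels
        # The gap is intron-like only if it sits at a parent exon-exon junction.
        if any(abs(left.parent_end - j) <= tolerance for j in junctions):
            intron_like += 1
    if intron_like > 0:
        return "duplicated"
    # No retained intron: if a block runs straight across a junction,
    # the intron has been spliced out, as expected after retrotransposition.
    spanned = any(b.parent_start < j < b.parent_end
                  for b in ordered for j in junctions)
    return "processed" if spanned else "ambiguous"


# Parent gene with a single exon-exon junction at coding position 100.
duplicated = [Block(0, 100, 1000, 1100), Block(100, 200, 1500, 1600)]
processed = [Block(0, 200, 1000, 1200)]
with_insertion = [Block(0, 50, 1000, 1050), Block(50, 200, 1350, 1500)]

print(classify(duplicated, [100]))      # gap at the junction
print(classify(processed, [100]))       # contiguous across the junction
print(classify(with_insertion, [100]))  # gap away from the junction
```

Under these assumptions the third example is still labelled processed, because its 300-base gap does not coincide with the parent junction and is therefore read as an insertion rather than an intron.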

Results

Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.

Conclusion

Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computational pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.