This article is part of the supplement: EGASP '05: ENCODE Genome Annotation Assessment Project
Research
A computational approach for identifying pseudogenes in the ENCODE regions
1 Department of Molecular Biophysics and Biochemistry, Yale University, Whitney Avenue, New Haven, CT 06520, USA
2 Department of Computer Science, Yale University, Prospect Street, New Haven, CT 06520, USA
3 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Genome Biology 2006, 7(Suppl 1):S13 doi:10.1186/gb-2006-7-s1-s13
Published: 7 August 2006
Abstract
Background
Pseudogenes are inheritable genetic elements that show sequence similarity to functional genes but carry deleterious mutations. We describe a computational pipeline for identifying them that, in contrast to previous work, explicitly uses the intron-exon structure of parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, which lets us distinguish true duplicated pseudogenes from processed pseudogenes that contain insertions.
Results
Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.
Conclusion
Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computational pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.
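The classification idea stated in the Background can be illustrated with a minimal sketch. This is not the pipeline's implementation. The `Block` type, the function name `classify_pseudogene`, the `min_gap` and `tolerance` parameters and their default values, and the simplification to a single plus-strand locus are all assumptions made for illustration only. The sketch labels a candidate as duplicated when a large gap in its genomic alignment coincides with an intron-exon junction of the parent, and as processed when any such gap falls elsewhere (an insertion) or is absent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One ungapped alignment block (hypothetical type, half-open coordinates)."""
    parent_start: int  # start in parent transcript coordinates
    parent_end: int    # end in parent transcript coordinates
    locus_start: int   # start in genomic coordinates of the pseudogene locus
    locus_end: int     # end in genomic coordinates of the pseudogene locus


def classify_pseudogene(
    blocks: List[Block],
    parent_junctions: List[int],
    min_gap: int = 30,   # assumed minimum genomic gap that could be an intron
    tolerance: int = 3,  # assumed slack when matching a gap to a junction
) -> str:
    """Return 'duplicated' if any large genomic gap sits at a parent
    intron-exon junction, otherwise 'processed'."""
    ordered = sorted(blocks, key=lambda b: b.parent_start)
    retained_introns = 0
    for left, right in zip(ordered, ordered[1:]):
        gap = right.locus_start - left.locus_end
        if gap < min_gap:
            continue  # too short to be an intron-like interruption
        # The gap counts only if it falls at a known parent junction;
        # a gap anywhere else is treated as an insertion.
        if any(abs(left.parent_end - j) <= tolerance for j in parent_junctions):
            retained_introns += 1
    return "duplicated" if retained_introns > 0 else "processed"


# Toy data: a parent transcript with junctions at positions 100 and 250.
junctions = [100, 250]

# Gaps of 500 and 550 bp fall exactly at the parent junctions.
duplicated_like = [
    Block(0, 100, 1000, 1100),
    Block(100, 250, 1600, 1750),
    Block(250, 400, 2300, 2450),
]

# A 300 bp gap at parent position 180, away from any junction (an insertion).
processed_with_insertion = [
    Block(0, 180, 5000, 5180),
    Block(180, 400, 5480, 5700),
]

print(classify_pseudogene(duplicated_like, junctions))
print(classify_pseudogene(processed_with_insertion, junctions))
```

A real pipeline would also need to handle both strands, partial alignments, and the disablements (frameshifts, premature stop codons) that mark a locus as a pseudogene in the first place; none of that is modelled here.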



