Figure 1.

Pseudogene annotation flowchart. A flowchart describing the GENCODE pseudogene annotation procedure and the incorporation of functional genomics data from ENCODE and genomic variation data from the 1000 Genomes (1000G) project. The integrated procedure combines manual annotation by the HAVANA team with two automated prediction pipelines, PseudoPipe and RetroFinder. Loci annotated by both PseudoPipe and RetroFinder are collected in a subset labeled '2-way consensus', which is then intersected with the manually annotated HAVANA pseudogenes. The intersection yields three subsets of pseudogenes. Level 1 pseudogenes are loci identified by all three methods (PseudoPipe, RetroFinder and HAVANA). Level 2 pseudogenes are loci discovered through manual curation that were not found by either automated pipeline. Delta 2-way contains pseudogenes identified only by the computational pipelines and not validated by manual annotation. As a quality control exercise to determine the completeness of pseudogene annotation in chromosomes that have been manually annotated, the HAVANA team analyzes 2-way consensus pseudogenes to establish their validity and includes them in the manually annotated pseudogene set where appropriate. The final set of pseudogenes is compared with functional genomics data from ENCODE and genomic variation data from the 1000 Genomes project.

Pei et al. Genome Biology 2012 13:R51   doi:10.1186/gb-2012-13-9-r51
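
The subset definitions in the caption amount to a few set operations, which the following minimal Python sketch makes explicit. It is illustrative only and is not the GENCODE implementation: the function name `classify_pseudogenes`, the locus identifiers and the toy inputs are all hypothetical, and loci are matched by identifier rather than by genomic overlap, which is a simplifying assumption. It also reads 'not found by either automated pipeline' as 'found by neither pipeline'.

```python
def classify_pseudogenes(pseudopipe, retrofinder, havana):
    """Split loci into the subsets named in the caption.

    Each argument is a set of locus identifiers (hypothetical; a real
    comparison would match loci by genomic coordinates, not by ID).
    """
    # Loci predicted by both automated pipelines.
    two_way = pseudopipe & retrofinder
    # Found by all three methods.
    level_1 = two_way & havana
    # Manually curated loci that neither pipeline predicted (assumed reading).
    level_2 = havana - (pseudopipe | retrofinder)
    # Predicted by both pipelines but absent from manual annotation.
    delta_2way = two_way - havana
    return {
        "2-way consensus": two_way,
        "Level 1": level_1,
        "Level 2": level_2,
        "Delta 2-way": delta_2way,
    }


# Toy input with made-up locus names.
subsets = classify_pseudogenes(
    pseudopipe={"A", "B", "C", "D"},
    retrofinder={"B", "C", "E"},
    havana={"C", "D", "F"},
)
for name, loci in subsets.items():
    print(name, sorted(loci))
```

Under this reading, a locus such as `D` in the toy input, which is manually annotated and predicted by only one pipeline, falls into none of the three subsets; the caption does not say how such loci are handled.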