|
Figure 12.
Summary of pseudogene annotation and case studies. (a) A heatmap showing the annotation for transcribed pseudogenes including active chromatin
segmentation, DNaseI hypersensitivity, active promoter, active Pol2, and conserved
sequences. Raw data were from the K562 cell line. (b) A transcribed duplicated pseudogene (Ensembl gene ID: ENST00000434500.1; genomic location,
chr7: 65216129-65228323) showing consistent active chromatin accessibility, histone
marks, and TFBSs in its upstream sequences. (c) A transcribed processed pseudogene (Ensembl gene ID: ENST00000355920.3; genomic location,
chr7: 72333321-72339656) with no active chromatin features or conserved sequences.
(d) A non-transcribed duplicated pseudogene showing partial activity patterns (Ensembl
gene ID: ENST00000429752.2; genomic location, chr1: 109646053-109647388). (e) Examples of partially active pseudogenes. E1 and E2 are examples of duplicated pseudogenes.
E1 shows UGT1A2P (Ensembl gene ID: ENST00000454886), indicated by the green arrowhead. UGT1A2P is a non-transcribed pseudogene with active chromatin, and it is under negative selection.
Coding exons of protein-coding paralogous loci are represented by dark green boxes
and UTR exons by filled red boxes. E2 shows FAM86EP (Ensembl gene ID: ENST00000510506) as open green boxes, which is a transcribed pseudogene
with active chromatin and upstream TFBSs and Pol2 binding sites. The transcript models
associated with the locus are displayed as filled red boxes. Black arrowheads indicate
features novel to the pseudogene locus. E3 and E4 show two unitary pseudogenes. E3
shows DOC2GP (Ensembl gene ID: ENST00000514950) as open green boxes, and transcript models associated
with the locus are shown as filled red boxes. E4 shows SLC22A20 (Ensembl gene ID: ENST00000530038). Again, the pseudogene model is represented as
open green boxes, transcript models associated with the locus as filled red boxes,
and black arrowheads indicate features novel to the pseudogene locus. E5 and E6 show
two processed pseudogenes. E5 shows pseudogene EGLN1 (Ensembl gene ID: ENST00000531623) inserted into duplicated pseudogene SCAND2 (Ensembl gene ID: ENST00000541103), a transcribed pseudogene that shows active
chromatin but lacks the upstream regulatory regions seen in the parent gene. The pseudogene
models are represented as open green boxes, transcript models associated with the
locus are displayed as filled red boxes, and black arrowheads indicate features novel
to the pseudogene locus. E6 shows a processed pseudogene RP11-409K20 (Ensembl gene ID: ENST00000417984; filled green box), which has been inserted into
a CpG island, indicated by an orange arrowhead. sRNA, small RNA.
Pei et al. Genome Biology 2012 13:R51 doi:10.1186/gb-2012-13-9-r51 |