Identification of exons on the genome

| Category | Database | Total records | Percent placed (%) | Total unique exons | Exons in complete ORFs | Exons in partial ORFs | Exon length (bp) | ORF length (bp) | Putative genes (non-splicing singletons) | Protein homology (Pfam hits) | CpG islands |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Known genes | UTR-DB | 40,258 | 80 | 19,195 | 5,075 | 1,895 | 6,925,762 | 1,990,818 | 10,007 (426) | 5,701 (3,813) | 3,866 |
| Known genes | HTDB | 15,305 | 89 | 48,477 | 12,077 | 7,706 | 11,893,081 | 4,043,544 | 4,816 (148) | 2,938 (1,943) | 1,960 |
| Consensus transcripts | HINT | 87,125 | 77 | 103,817 | 47,055 | 15,061 | 23,381,024 | 10,144,988 | 20,357 (959) | 9,121 (6,453) | 7,557 |
| Consensus transcripts | EG | 62,064 | 80 | 13,085 | 5,389 | 1,904 | 4,562,954 | 1,873,723 | 4,800 (154) | 2,177 (1,679) | 2,462 |
| Consensus transcripts | THC | 84,837 | 81 | 38,806 | 15,463 | 6,671 | 12,406,081 | 5,078,661 | 8,604 (322) | 2,907 (2,026) | 3,983 |
| Transcripts | GenBank CDS | 110,222 | 81 | 41,917 | 31,626 | 1,452 | 5,303,064 | 4,299,272 | 2,634 (227) | 1,858 (1,607) | 1,178 |
| Transcripts | dbEST Human | 2,154,995 | 73 | 273,881 | 147,819 | 17,694 | 32,288,385 | 14,975,758 | 20,073 (7,136) | 5,377 (3,745) | 11,807 |
| Rodent transcripts | MINT | 92,531 | 30 | 8,284 | 5,433 | 120 | 866,046 | 780,566 | 777 | 123 (56) | 486 |
| Rodent transcripts | RINT | 37,367 | 46 | 5,600 | 3,588 | 75 | 592,788 | 546,932 | 458 | 65 (32) | 255 |
| Rodent transcripts | EMBL | 43,488 | 28 | 5,819 | 4,108 | 59 | 724,630 | 655,993 | 202 | 68 (72) | 135 |
| Protein homology | SWISS-PROT | 86,593 | 38 | 27,526 | 12,072 | 1,163 | 9,858,797 | 7,784,205 | 1,648 | 1,648 (1,244) | 158 |
| Protein homology | TrEMBL | 351,834 | 13 | 22,670 | 8,134 | 1,677 | 4,385,497 | 2,886,034 | 1,185 | 1,185 (654) | 92 |
| Protein homology | PIR | 182,106 | 16 | 4,106 | 1,175 | 383 | 1,355,644 | 764,339 | 321 | 321 (132) | 20 |
| Total | | | | 613,183 | 299,014 | 55,860 | 114,543,753 | 55,824,833 | 75,982 (9,372) | 33,489 (23,008) | 33,959 |

Exons were identified after vector screening using transcript, rodent, and protein databases. The definition of a record varies according to the database, while 'exons' refers to high-scoring segment pairs in BlastN comparisons to the genome (E < 10⁻¹⁵ and sequence identity >90%). Unique exons and all subsequent columns refer to placements that were possible after considering the preceding databases. Placement of rodent transcripts required evidence of splicing and sequence identity >80%. ORFs were identified using getorf [84] with a minimum reported size of 30 bp. Protein homology required BlastX E < 10⁻¹⁵. Pfam hits required a score >20 using hmmpfam [92]. Gene prediction programs are described in Table 2. CpG islands were identified using cpgreport [84] with standard criteria [45].
Wright et al. Genome Biology 2001 2:research0025.1 doi:10.1186/gb-2001-2-7-research0025
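The placement thresholds stated in the legend (E < 10⁻¹⁵, with sequence identity >90% for human transcripts and >80% for rodent transcripts) can be illustrated with a small filter over BLAST hits. This is a minimal sketch, not the authors' pipeline: it assumes hits are available in the 12-column tab-separated BLAST tabular layout (query, subject, percent identity, ..., E-value, bit score), and the names `Hsp`, `parse_tabular_line`, `passes_thresholds`, and `filter_hsps` are hypothetical. The additional splicing evidence required for rodent transcripts is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted in the table legend.
MAX_EVALUE = 1e-15           # hits must satisfy E < 1e-15
MIN_IDENTITY_HUMAN = 90.0    # percent identity must exceed 90 (human transcripts)
MIN_IDENTITY_RODENT = 80.0   # percent identity must exceed 80 (rodent transcripts)


@dataclass(frozen=True)
class Hsp:
    """One high-scoring segment pair (hypothetical container for this sketch)."""
    query_id: str
    subject_id: str
    percent_identity: float
    evalue: float


def parse_tabular_line(line: str) -> Hsp:
    """Parse one line of 12-column BLAST tabular output (assumed input format)."""
    fields = line.rstrip("\n").split("\t")
    # Column 3 is percent identity, column 11 is the E-value.
    return Hsp(fields[0], fields[1], float(fields[2]), float(fields[10]))


def passes_thresholds(hsp: Hsp, min_identity: float = MIN_IDENTITY_HUMAN) -> bool:
    """Apply the legend's strict inequalities to a single HSP."""
    return hsp.evalue < MAX_EVALUE and hsp.percent_identity > min_identity


def filter_hsps(lines: Iterable[str],
                min_identity: float = MIN_IDENTITY_HUMAN) -> List[Hsp]:
    """Keep HSPs that meet the thresholds; skip blank and comment lines."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hsp = parse_tabular_line(line)
        if passes_thresholds(hsp, min_identity):
            kept.append(hsp)
    return kept
```

Passing `MIN_IDENTITY_RODENT` as `min_identity` relaxes the identity cutoff in the way the legend describes for rodent transcripts; the E-value cutoff is left unchanged in this sketch.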