Identification of exons on the genome after vector screening using transcript, rodent, and protein databases.

| Category | Database | Total Records | Percent Placed | Unique Exons | Exon Length (bp) | Putative Genes (Non-Splicing Singletons) | Protein Homology (Pfam Hit) | CpG Islands |
|---|---|---|---|---|---|---|---|---|
| Known genes | UTR-DB | 40,258 | 80% | 19,195 | 6,925,762 | 10,007 (426) | 5,701 (3,813) | 3,866 |
| | HTDB | 15,305 | 89% | 48,477 | 11,893,081 | 4,816 (148) | 2,938 (1,943) | 1,960 |
| Consensus Transcripts | HINT | 87,000 | 77% | 103,817 | 23,381,024 | 20,357 (959) | 9,121 (6,453) | 7,557 |
| | EG | 62,064 | 80% | 13,085 | 4,562,954 | 4,800 (154) | 2,177 (1,679) | 2,462 |
| | THC | 84,837 | 81% | 38,806 | 12,406,081 | 8,604 (322) | 2,907 (2,026) | 3,983 |
| Transcripts | GenBank CDS | 110,222 | 81% | 41,917 | 5,303,064 | 2,634 (227) | 1,858 (1,607) | 1,178 |
| | DbEST Human | 2,154,995 | 73% | 273,881 | 32,288,385 | 20,073 (7,136) | 5,377 (3,745) | 11,807 |
| Rodent Transcripts | MINT | 92,531 | 30% | 8,284 | 866,046 | 777 | 123 (56) | 486 |
| | RINT | 37,367 | 46% | 5,600 | 592,788 | 458 | 65 (32) | 255 |
| | EMBL Rodent | 43,488 | 28% | 5,819 | 724,630 | 202 | 68 (72) | 135 |
| Protein Homology | SWISS-PROT | 86,593 | 38% | 27,526 | 9,858,797 | 1,648 | 1,648 (1,244) | 158 |
| | TrEMBL | 351,834 | 13% | 22,670 | 4,385,497 | 1,185 | 1,185 (654) | 92 |
| | PIR | 182,106 | 16% | 4,106 | 1,355,644 | 321 | 321 (132) | 20 |
| Total | | | | 613,183 | 114,543,753 | 75,982 (9,372) | 33,489 (23,008) | 33,959 |

The definition of a record varies according to the database, while 'exons' refers to high-scoring segment pairs in BlastN comparisons to the genome (E < 10⁻¹⁵ and sequence identity > 90%). Unique Exons and all subsequent columns refer to placements that were possible after considering the preceding databases. Placement of rodent transcripts required evidence of splicing and sequence identity > 80%. Protein homology required BlastX E < 10⁻¹⁵. Pfam hits required a score > 20 using hmmpfam (http://hmmer.wustl.edu). CpG islands were identified with cpgreport (http://www.emboss.org) using standard criteria [24].
Wright et al. Genome Biology 2001 2:preprint0001.1 doi:10.1186/gb-2001-2-3-preprint0001
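
The placement rules in the table footnote can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hsp` record, the `passes_thresholds` and `count_unique_exons` helpers, and the sample coordinates are all hypothetical, and treating "unique" as "does not overlap any exon already placed by an earlier database" is an assumption made only for illustration. The thresholds themselves (E < 10⁻¹⁵, identity > 90%, and for rodent transcripts identity > 80% plus evidence of splicing) are taken from the footnote.

```python
from dataclasses import dataclass

# Thresholds quoted in the table footnote.
MAX_EVALUE = 1e-15
MIN_IDENTITY_TRANSCRIPT = 90.0  # percent, non-rodent transcripts
MIN_IDENTITY_RODENT = 80.0      # percent, rodent transcripts


@dataclass(frozen=True)
class Hsp:
    """Hypothetical record for one BlastN high-scoring segment pair."""
    database: str
    chrom: str
    start: int
    end: int
    evalue: float
    identity: float        # percent identity to the genome
    spliced: bool = False  # evidence of splicing
    rodent: bool = False   # comes from a rodent transcript database


def passes_thresholds(hsp: Hsp) -> bool:
    """Apply the E-value and identity cut-offs from the footnote."""
    if hsp.evalue >= MAX_EVALUE:
        return False
    if hsp.rodent:
        return hsp.spliced and hsp.identity > MIN_IDENTITY_RODENT
    return hsp.identity > MIN_IDENTITY_TRANSCRIPT


def overlaps(a: Hsp, b: Hsp) -> bool:
    """True if two HSPs share at least one genomic position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def count_unique_exons(hsps, database_order):
    """Count, per database, exons not already placed by an earlier database.

    Assumption: an exon is 'unique' if it overlaps nothing placed so far.
    """
    placed = []
    counts = {db: 0 for db in database_order}
    for db in database_order:
        for hsp in hsps:
            if hsp.database != db or not passes_thresholds(hsp):
                continue
            if any(overlaps(hsp, p) for p in placed):
                continue
            placed.append(hsp)
            counts[db] += 1
    return counts


# Made-up example data; coordinates and scores are illustrative only.
example_hsps = [
    Hsp("UTR-DB", "chr1", 100, 200, 1e-30, 98.0),
    Hsp("HTDB", "chr1", 150, 250, 1e-40, 99.0),   # overlaps the UTR-DB exon
    Hsp("HTDB", "chr1", 400, 500, 1e-20, 95.0),   # new placement
    Hsp("HTDB", "chr1", 600, 700, 1e-10, 99.0),   # fails the E-value cut-off
    Hsp("MINT", "chr2", 10, 90, 1e-25, 85.0, spliced=True, rodent=True),
    Hsp("MINT", "chr2", 200, 300, 1e-25, 85.0, spliced=False, rodent=True),
]
counts = count_unique_exons(example_hsps, ["UTR-DB", "HTDB", "MINT"])
print(counts)  # {'UTR-DB': 1, 'HTDB': 1, 'MINT': 1}
```

Processing the databases in a fixed order is what makes the per-database counts depend on the databases that came before, which is why the Unique Exons column can be summed into a single total.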