Table 1

Identification of exons on the genome after vector screening using transcript, rodent, and protein databases.

| Category | Database | Total Records | Percent Placed | Unique Exons | Exon Length (bp) | Putative Genes (Non-Splicing Singletons) | Protein Homology (Pfam Hit) | CpG Islands |
|---|---|---|---|---|---|---|---|---|
| Known genes | UTR-DB | 40,258 | 80% | 19,195 | 6,925,762 | 10,007 (426) | 5,701 (3,813) | 3,866 |
| | HTDB | 15,305 | 89% | 48,477 | 11,893,081 | 4,816 (148) | 2,938 (1,943) | 1,960 |
| Consensus Transcripts | HINT | 87,000 | 77% | 103,817 | 23,381,024 | 20,357 (959) | 9,121 (6,453) | 7,557 |
| | EG | 62,064 | 80% | 13,085 | 4,562,954 | 4,800 (154) | 2,177 (1,679) | 2,462 |
| | THC | 84,837 | 81% | 38,806 | 12,406,081 | 8,604 (322) | 2,907 (2,026) | 3,983 |
| Transcripts | GenBank CDS | 110,222 | 81% | 41,917 | 5,303,064 | 2,634 (227) | 1,858 (1,607) | 1,178 |
| | DbEST Human | 2,154,995 | 73% | 273,881 | 32,288,385 | 20,073 (7,136) | 5,377 (3,745) | 11,807 |
| Rodent Transcripts | MINT | 92,531 | 30% | 8,284 | 866,046 | 777 | 123 (56) | 486 |
| | RINT | 37,367 | 46% | 5,600 | 592,788 | 458 | 65 (32) | 255 |
| | EMBL Rodent | 43,488 | 28% | 5,819 | 724,630 | 202 | 68 (72) | 135 |
| Protein Homology | SWISS-PROT | 86,593 | 38% | 27,526 | 9,858,797 | 1,648 | 1,648 (1,244) | 158 |
| | TrEMBL | 351,834 | 13% | 22,670 | 4,385,497 | 1,185 | 1,185 (654) | 92 |
| | PIR | 182,106 | 16% | 4,106 | 1,355,644 | 321 | 321 (132) | 20 |
| Total | | | | 613,183 | 114,543,753 | 75,982 (9,372) | 33,489 (23,008) | 33,959 |
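
Because each database contributes only the exons not already placed by the preceding databases, the per-database counts should add up to the Total row. The following minimal sketch checks that for three columns (Unique Exons, Exon Length, and CpG Islands); the values are copied from Table 1, while the names `rows`, `reported_total`, and `column_sums` are illustrative and not part of the original analysis.

```python
# Values copied from Table 1: (Unique Exons, Exon Length in bp, CpG Islands).
# The dictionary and function names are hypothetical, chosen for illustration.
rows = {
    "UTR-DB": (19_195, 6_925_762, 3_866),
    "HTDB": (48_477, 11_893_081, 1_960),
    "HINT": (103_817, 23_381_024, 7_557),
    "EG": (13_085, 4_562_954, 2_462),
    "THC": (38_806, 12_406_081, 3_983),
    "GenBank CDS": (41_917, 5_303_064, 1_178),
    "DbEST Human": (273_881, 32_288_385, 11_807),
    "MINT": (8_284, 866_046, 486),
    "RINT": (5_600, 592_788, 255),
    "EMBL Rodent": (5_819, 724_630, 135),
    "SWISS-PROT": (27_526, 9_858_797, 158),
    "TrEMBL": (22_670, 4_385_497, 92),
    "PIR": (4_106, 1_355_644, 20),
}

# Total row as reported in Table 1 for the same three columns.
reported_total = (613_183, 114_543_753, 33_959)


def column_sums(table):
    """Sum each column across all databases."""
    return tuple(sum(column) for column in zip(*table.values()))


if __name__ == "__main__":
    computed = column_sums(rows)
    for name, got, expected in zip(
        ("Unique Exons", "Exon Length (bp)", "CpG Islands"), computed, reported_total
    ):
        print(f"{name}: computed {got:,}, reported {expected:,}")
```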

The definition of a record varies according to the database, while 'exons' refers to high-scoring segment pairs in BlastN comparisons to the genome (E < 10^-15 and sequence identity > 90%). Unique Exons and all subsequent columns refer to placements that were possible after considering the preceding databases. Placement of rodent transcripts required evidence of splicing and sequence identity > 80%. Protein homology required BlastX E < 10^-15. Pfam hits required a score > 20 using hmmpfam (http://hmmer.wustl.edu). CpG islands were identified with cpgreport (http://www.emboss.org) under standard criteria [24].
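
The placement thresholds in the legend can be expressed as simple predicates. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the `Hsp` record, its field names, and the `spliced` flag are hypothetical, and the legend does not say whether the E-value cutoff was also applied to rodent transcripts, so the rodent check here uses only the two criteria it names.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """Hypothetical record for one high-scoring segment pair."""
    query_id: str
    percent_identity: float  # sequence identity, in percent
    evalue: float            # BLAST expectation value
    spliced: bool = False    # assumed flag: alignment shows evidence of splicing


def passes_transcript_filter(hsp, max_evalue=1e-15, min_identity=90.0):
    """BlastN criteria from the legend: E < 10^-15 and identity > 90%."""
    return hsp.evalue < max_evalue and hsp.percent_identity > min_identity


def passes_rodent_filter(hsp, min_identity=80.0):
    """Rodent criteria from the legend: evidence of splicing and identity > 80%."""
    return hsp.spliced and hsp.percent_identity > min_identity


def passes_protein_filter(hsp, max_evalue=1e-15):
    """BlastX criterion from the legend: E < 10^-15."""
    return hsp.evalue < max_evalue


if __name__ == "__main__":
    example = Hsp("est_0001", percent_identity=97.5, evalue=1e-40)
    print(passes_transcript_filter(example))
```

Both inequalities are treated as strict, matching the "<" and ">" signs in the legend, so an alignment at exactly 90% identity is rejected by the transcript filter.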

Wright et al. Genome Biology 2001 2:preprint0001.1   doi:10.1186/gb-2001-2-3-preprint0001