Comparison of Release 2 and Release 3 genome statistics

| Description* | Release 2 (% of total) | Release 3 euchromatin† (% of total) |
|---|---|---|
| Total protein-coding genes | 13,474 | 13,379 |
| Total length of euchromatin | 116.2 Mb | 116.8 Mb |
| Exons | 54,793 | 60,897 |
| Protein-coding exons‡ | 50,667 | 54,934 |
| Length of genome in exons | 23.3 Mb (20%) | 27.8 Mb (24%) |
| Introns | 41,381 | 48,257 |
| Genes with 5' UTR | 7,680 (57%) | 10,227 (76%) |
| Transcripts with 5' UTR | 8,499 (59%) | 14,707 (81%) |
| Average 5' UTR length | 204 nucleotides | 265 nucleotides |
| Genes with 3' UTR | 4,824 (36%) | 9,646 (72%) |
| Transcripts with 3' UTR | 5,381 (38%) | 14,012 (77%) |
| Average 3' UTR length | 370 nucleotides | 442 nucleotides |
| Average ratio of CDS length to transcript length§ | 0.86 | 0.75 |
| Total protein-coding transcripts | 14,335 | 18,106 |
| Genes with alternative transcripts | 689 (5%) | 2,729 (20%) |
| Average number of transcripts per alternatively spliced gene | 2.25 | 2.75 |
| Total number of alternative transcripts | 861 | 4,743 |
| Number of introns contained in 5' UTRs | 2,977 | 6,787 |
| Number of introns contained in 3' UTRs | 1,004 | 1,088 |
| Unique peptides¶ | 13,922 | 15,848 |
| Unique peptides unchanged from R2 to R3 | 8,769 (63%) | 8,769 (55%) |
| Genes deleted from R2 to R3 | 345 | NA |
| New protein-coding genes in R3 | NA | 802 |
*Abbreviations: UTR, untranslated region; CDS, (protein)-coding sequence; R2, Release 2; R3, Release 3; NA, not applicable. All statistics are for protein-coding genes only. †Based on the annotation of protein-coding genes in the euchromatin (long chromosome arms); another 297 protein-coding genes are annotated in the heterochromatin (non-redundant WGS3 [25]). In this table and in Tables 2, 3 and 4, the numbers are based on a version of the annotation database frozen on November 25, 2002. ‡Any exon containing CDS, even if most of the exon is UTR. §The length of the coding region divided by the length of the entire protein-coding transcript, averaged over all protein-coding transcripts. ¶Counted separately from transcripts because many alternative transcripts encode an identical CDS and differ only in their UTRs.
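The definitions in footnotes § and ¶, and the "(% of total)" figures, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' annotation pipeline: the `Transcript` record, its fields, every helper name, and the toy transcripts are hypothetical. The rule that a gene "has alternative transcripts" when it carries more than one transcript is also an assumption. Recomputing the percentages from the table's own counts suggests the denominators differ by row: gene rows appear to use total protein-coding genes, transcript rows total protein-coding transcripts, exon length total euchromatin length, and the unchanged-peptide row that release's unique peptides.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one annotated protein-coding transcript."""
    gene_id: str
    transcript_length: int  # whole transcript, UTRs included (nt)
    cds_length: int         # protein-coding sequence only (nt)
    peptide: str            # translated protein sequence


def average_cds_ratio(transcripts):
    """Footnote §: CDS length / transcript length, averaged over all transcripts."""
    return mean(t.cds_length / t.transcript_length for t in transcripts)


def unique_peptide_count(transcripts):
    """Footnote ¶: transcripts encoding an identical peptide are counted once."""
    return len({t.peptide for t in transcripts})


def genes_with_alternative_transcripts(transcripts):
    """Assumed rule: a gene counts if it has more than one annotated transcript."""
    per_gene = Counter(t.gene_id for t in transcripts)
    return sum(1 for n in per_gene.values() if n > 1)


def percent_of_total(part, total):
    """Whole-number percentage, as shown in the '(% of total)' columns."""
    return round(100 * part / total)


# Toy data, invented for illustration only.
toy = [
    Transcript("geneA", 20, 12, "MKV"),   # ratio 0.6
    Transcript("geneA", 24, 12, "MKV"),   # same CDS, longer UTR: ratio 0.5
    Transcript("geneB", 15, 15, "MSTL"),  # no UTR annotated: ratio 1.0
]

print(average_cds_ratio(toy))                   # mean of 0.6, 0.5, 1.0
print(unique_peptide_count(toy))                # geneA's two transcripts share one peptide
print(genes_with_alternative_transcripts(toy))  # only geneA qualifies
print(percent_of_total(7_680, 13_474))          # R2 genes with 5' UTR
print(percent_of_total(14_707, 18_106))         # R3 transcripts with 5' UTR
print(percent_of_total(8_769, 15_848))          # R3 unique peptides unchanged
```

Because UTRs add transcript length without adding CDS length, longer or newly annotated UTRs lower the average ratio. This is consistent with the drop from 0.86 in Release 2 to 0.75 in Release 3 as UTR annotation expanded.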
Misra et al. Genome Biology 2002 3:research0083.1 doi:10.1186/gb-2002-3-12-research0083