Table 1

Comparison of Release 2 and 3 genome statistics

Description*                                                  Release 2 (% of total)  Release 3 euchromatin (% of total)

Total protein-coding genes                                    13,474                  13,379
Total length of euchromatin                                   116.2 Mb                116.8 Mb
Exons                                                         54,793                  60,897
Protein-coding exons                                          50,667                  54,934
Length of genome in exons                                     23.3 Mb (20%)           27.8 Mb (24%)
Introns                                                       41,381                  48,257
Genes with 5' UTR                                             7,680 (57%)             10,227 (76%)
Transcripts with 5' UTR                                       8,499 (59%)             14,707 (81%)
Average 5' UTR length                                         204 nucleotides         265 nucleotides
Genes with 3' UTR                                             4,824 (36%)             9,646 (72%)
Transcripts with 3' UTR                                       5,381 (38%)             14,012 (77%)
Average 3' UTR length                                         370 nucleotides         442 nucleotides
Average ratio of length of CDS/transcript§                    0.86                    0.75
Total protein-coding transcripts                              14,335                  18,106
Genes with alternative transcripts                            689 (5%)                2,729 (20%)
Average number of transcripts per alternatively spliced gene  2.25                    2.75
Total number alternative transcripts                          861                     4,743
Number of introns contained in 5' UTRs                        2,977                   6,787
Number of introns contained in 3' UTRs                        1,004                   1,088
Unique peptides                                               13,922                  15,848
Unique peptides unchanged from R2 to R3                       8,769 (63%)             8,769 (55%)
Genes deleted from R2 to R3                                   345                     NA
New protein-coding genes in R3                                NA                      802


*Abbreviations: UTR, untranslated region; CDS, (protein)-coding sequence; R2, Release 2; R3, Release 3; NA, not applicable. All statistics are for protein-coding genes only. The Release 3 figures are based on the annotation of protein-coding genes in the euchromatin (long chromosome arms); another 297 protein-coding genes are annotated in the heterochromatin (non-redundant WGS3 [25]). In this table and in Tables 2, 3 and 4, the numbers are based on a version of the annotation database frozen on November 25, 2002. A protein-coding exon is any exon containing CDS, even if the majority of the exon is UTR. §The length of the coding region divided by the length of the entire protein-coding transcript, averaged over all protein-coding transcripts. The number of unique peptides was determined because many alternative transcripts encoded the identical CDS and differed only in the UTR.
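The § footnote defines the CDS/transcript statistic as a per-transcript ratio that is then averaged. A minimal Python sketch of that calculation follows; the function name `average_cds_fraction` and the example lengths are hypothetical and are not taken from the annotation database.

```python
from statistics import mean


def average_cds_fraction(transcripts):
    """Average, over transcripts, of CDS length / full transcript length.

    `transcripts` is an iterable of (cds_length, transcript_length) pairs,
    both in nucleotides. This input layout is an assumption for illustration.
    """
    ratios = [cds_len / transcript_len for cds_len, transcript_len in transcripts]
    return mean(ratios)


# Hypothetical transcripts: (CDS length, transcript length) in nucleotides.
example = [(900, 1000), (600, 1000), (1500, 2000)]
print(round(average_cds_fraction(example), 2))
```

Averaging the per-transcript ratios weights every transcript equally, so the result generally differs from dividing total CDS length by total transcript length, which would weight long transcripts more heavily.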

Misra et al. Genome Biology 2002 3:research0083.1-0083.22   doi:10.1186/gb-2002-3-12-research0083
