Table 1

Statistics of the 25 selected tracks, arranged in the order of the UCSC genome browser

| UCSC track | Model with introns | Model with introns and CDS | Single exon model (some clipped) | Unique introns in mRNA | All introns in mRNA | Input or method |
|---|---|---|---|---|---|---|
| HAVANA Gencode (Sanger, UK) known + putative | 1,691 | 649 | 70 | 3,618 | 9,693 | MEP,CA,H |
| EGASP model submissions | | | | | | |
| AceView (NCBI, US) | 1,630 | 1,460 | 24 | 3,530 | 9,597 | ME,(H) |
| UP Dogfish (Sanger, UK) | 204 | 204 | 15 | 1,679 | 1,679 | CA |
| Exogean (ENS, France) | 554 | 538 | 2 | 2,855 | 6,178 | MEP,CA |
| UP ExonHunter (U Waterloo, Canada) | 807 | 807 | 220 | 3,237 | 3,237 | MEP,CA |
| Fgenesh (U London, UK) | 462 | 458 | 97 | 2,610 | 3,241 | P,CA |
| UP GeneId (IMIM, Spain) | 267 | 267 | 51 | 1,905 | 1,905 | A |
| UP GeneMark (Georgia IT, US) | 551 | 551 | 81 | 2,185 | 2,185 | A |
| UP Jigsaw (TIGR, US) | 259 | 259 | 67 | 2,168 | 2,168 | MEP,CA |
| PairagonAny (Wash U, US) | 471 | 437 | 38 | 2,300 | 3,470 | MEP?,CA |
| UP SGP2 (IMIM, Spain) | 552 | 552 | 159 | 2,645 | 2,645 | P,CA |
| P Twinscan-MARS (Wash U, US) | 547 | 547 | 108 | 2,501 | 4,943 | CA |
| UP Augustus Any (U Göttingen, Germany) | 312 | 316 | 87 | 2,291 | 2,291 | MEP,CA |
| UP GeneZilla (TIGR, US) | 477 | 477 | 179 | 2,758 | 2,758 | A |
| UP Saga (UC Berkeley, US) | 331 | 331 | 47 | 1,737 | 1,737 | CA |
| UCSC gene tracks | | | | | | |
| *Known Gene (UCSC) | 501 | 477 | 53 | 2,264 | 4,427 | MP |
| *P CCDS | 201 | 201 | 14 | 1,296 | 1,508 | MP,H |
| *RefSeq (NCBI, US) | 342 | 325 | 41 | 2,082 | 2,922 | M(E)P,H |
| *MGC | 323 | 310 | 19 | 1,400 | 2,101 | M |
| *Ensembl (EBI, UK) | 427 | 418 | 58 | 2,429 | 3,548 | MEP,CA |
| *AceView (Aug 2005 NCBI) | 1,792 | 1,627 | 902 | 3,812 | 9,792 | ME,(H) |
| *ECgene (Korea) | 3,851 | 3,551 | 2,569 | 3,942 | 30,660 | ME,C |
| *U NscanEst (Wash U, US) | 282 | 252 | 27 | 2,292 | 2,292 | ME,CA |
| *UP GenScan (MIT, US) | 395 | 395 | 59 | 3,042 | 3,042 | A |

The number of models, with or without introns (after clipping at region boundaries), the number of spliced coding models, and the number of unique and multiply used introns are given over the 31 ENCODE test regions. Coded information has been added in front of the track name: an asterisk distinguishes a standard gene track, available genome-wide, from an ENCODE-only track; a U track predicts a unique model per gene; a P track predicts protein coding regions only. According to their documentation, the programs use different inputs or methods: M, E, and P stand for human mRNA, EST, and protein sequences or alignments, respectively; C stands for conservation, or use of cDNA or protein evidence from other species; A stands for ab initio prediction; H stands for hand curation; and parenthesized letters stand for minimal use of the particular type. Notice the low proportion of Gencode mRNA models with an annotated CDS (in bold).
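The "Input or method" codes described in the legend can be decoded mechanically. The following is a minimal sketch in Python, not part of the original analysis: the names `METHOD_CODES`, `decode_methods`, and `cds_fraction` are hypothetical, and treating commas, spaces, and the question mark in "MEP?,CA" as separators to be ignored is an assumption. The ratio helper illustrates the legend's closing remark, under the assumption that the relevant pair for Gencode is 649 models with introns and CDS out of 1,691 models with introns.

```python
# Method codes as spelled out in the table legend.
METHOD_CODES = {
    "M": "human mRNA",
    "E": "human EST",
    "P": "human protein",
    "C": "conservation or cross-species cDNA/protein evidence",
    "A": "ab initio prediction",
    "H": "hand curation",
}


def decode_methods(field):
    """Map each code letter in an 'Input or method' entry to 'full' or 'minimal'.

    Letters inside parentheses are marked 'minimal', following the legend.
    Commas, spaces, and '?' are skipped (an assumption, see the lead-in).
    """
    usage = {}
    in_parens = False
    for ch in field:
        if ch == "(":
            in_parens = True
        elif ch == ")":
            in_parens = False
        elif ch in METHOD_CODES:
            usage[ch] = "minimal" if in_parens else "full"
    return usage


def cds_fraction(models_with_introns, models_with_introns_and_cds):
    """Fraction of spliced models that carry an annotated CDS."""
    return models_with_introns_and_cds / models_with_introns


if __name__ == "__main__":
    # RefSeq entry from the table: E is parenthesized, so it is 'minimal'.
    print(decode_methods("M(E)P,H"))
    # Gencode row, values taken from the table (assumed to be the relevant pair).
    print(f"{cds_fraction(1691, 649):.1%}")
```

Note that this decoder applies only to the last column: the U and P letters placed in front of a track name have a different meaning (unique model per gene, protein coding regions only) and would need separate handling.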

Thierry-Mieg and Thierry-Mieg Genome Biology 2006 7(Suppl 1):S12   doi:10.1186/gb-2006-7-s1-s12
