cDNA analysis

| | DGCr1 | DGCr2 | Total |
|---|---|---|---|
| Clones that encode complete ORFs | | | |
| ORFs identical to the Release 3 predicted proteins* | 3,429 | 1,946 | 5,375 |
| ORFs with 1-2% differences to Release 3 proteins† | 235 | 306 | 541 |
| Total | 3,664 | 2,252 | 5,916 |
| Clones known to be compromised‡ | | | |
| Nucleotide discrepancies | 485 | 829 | 1,314 |
| 5' short | 618 | 150 | 768 |
| 3' truncated | 57 | 26 | 83 |
| Co-ligated inserts | 23 | 54 | 77 |
| ORFs with less than 50 amino acids | 49 | 21 | 70 |
| Antisense transcripts | 53 | 58 | 111 |
| Transposable elements | 12 | 9 | 21 |
| Bacterial contaminants | 2 | 4 | 6 |
| Total | 1,299 | 1,151 | 2,450 |
| Clones that may represent alternative transcripts§ | | | |
| 5' short with upstream in-frame stop codon | 32 | 4 | 36 |
| 3' truncated with downstream in-frame stop codon | 55 | 17 | 72 |
| Putative missed micro-exon in Release 3 annotation | 23 | 7 | 30 |
| Total | 110 | 28 | 138 |
| Unclassified clones¶ | 257 | 160 | 417 |

Summary of analysis of the 8,770 clones in GenBank plus 151 clones for which we do not have accession numbers yet.

*The ORF predicted from the cDNA sequence is identical to the corresponding Release 3 predicted protein; 4,620 of these clones are from the LD, GH, HL, LP, RE or RH cDNA libraries, which were made from the same strain that was sequenced. Thus, we required their ORFs to be identical to those of the predicted Release 3 proteins. An additional 755 clones with ORFs identical to Release 3 proteins are from the AT, GM or SD libraries.

†The ORF predicted from the cDNA sequence is the same length as the Release 3 predicted protein with less than 2% amino-acid difference. These clones are derived from the AT, GM or SD cDNA libraries, which were made from strains or cell lines that are not isogenic with the strain that was sequenced.

‡See text for explanation of the individual subclasses of compromised clones.

§These clones have structures that are inconsistent with the corresponding Release 3 predicted gene. The 5'-short and 3'-truncated clones may reflect alternative splice products or promoters or, perhaps more likely, incompletely processed primary transcripts with retained introns. Additional experimental work will be required to distinguish these possibilities. Those clones referred to as putative missed micro-exons in Release 3 annotations are cases in which the cDNA clone contains additional nucleotides, numbering a multiple of 3, relative to the Release 3 predicted mRNA, and maintains the ORF. We expect that most of these discrepancies result from a failure of Sim4 to align micro-exons and that these cases will be resolved by modifying the Release 3 gene model; see [15] for more discussion.

¶The predicted ORF from the cDNA clone does not match a Release 3 predicted protein, but the underlying cause could not be classified into one of the above categories. We expect that very few of these clones accurately reflect actual gene transcripts.
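The arithmetic behind the table can be checked with a short script. The sketch below is illustrative only and is not part of the original analysis: the counts are transcribed from the table, while the dictionary names and the helper `column_sums` are hypothetical names chosen here. It recomputes each category total and confirms that the four categories together account for the 8,770 + 151 clones mentioned in the legend.

```python
# Counts transcribed from the table; each tuple is (DGCr1, DGCr2, Total).
# All variable and function names are illustrative assumptions.
complete_orfs = {
    "identical to Release 3": (3429, 1946, 5375),
    "1-2% differences": (235, 306, 541),
}
compromised = {
    "nucleotide discrepancies": (485, 829, 1314),
    "5' short": (618, 150, 768),
    "3' truncated": (57, 26, 83),
    "co-ligated inserts": (23, 54, 77),
    "ORFs < 50 amino acids": (49, 21, 70),
    "antisense transcripts": (53, 58, 111),
    "transposable elements": (12, 9, 21),
    "bacterial contaminants": (2, 4, 6),
}
alternative = {
    "5' short, upstream in-frame stop": (32, 4, 36),
    "3' truncated, downstream in-frame stop": (55, 17, 72),
    "putative missed micro-exon": (23, 7, 30),
}
unclassified = (257, 160, 417)


def column_sums(rows):
    """Sum each of the three columns over all rows of one category."""
    return tuple(sum(column) for column in zip(*rows.values()))


category_totals = {
    "complete ORFs": column_sums(complete_orfs),
    "compromised": column_sums(compromised),
    "alternative transcripts": column_sums(alternative),
    "unclassified": unclassified,
}

# Every row should satisfy DGCr1 + DGCr2 == Total.
all_rows = [*complete_orfs.values(), *compromised.values(),
            *alternative.values(), unclassified]
rows_consistent = all(a + b == total for a, b, total in all_rows)

# Sum of the "Total" column over the four categories.
grand_total = sum(totals[2] for totals in category_totals.values())

for name, totals in category_totals.items():
    print(name, totals)
print("rows consistent:", rows_consistent)
print("grand total:", grand_total)  # 8921, i.e. 8,770 + 151
```

The same check extends to the footnotes: the 4,620 and 755 clones cited under * add up to the 5,375 clones with ORFs identical to Release 3 proteins.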
Stapleton et al. Genome Biology 2002 3:research0080.1 doi:10.1186/gb-2002-3-12-research0080