Distribution of predicted peptide lengths in Releases 2 and 3. (a) Comparison of proteins shorter than 2,000 amino acids shows that Release 3 proteins (blue) are more numerous than Release 2 proteins (black) across nearly the entire length range. The exception is proteins shorter than 100 amino acids: because of stricter data requirements for Release 3 annotations, some small Release 2 annotations were not preserved (inset). (b) Comparison of Release 2 (black) and Release 3 (light blue) protein lengths with predictions by GENSCAN (purple) and Genie (dark blue). Also shown are the lengths of proteins that were deleted (orange) or added (green) in Release 3. Note that GENSCAN (purple) underpredicts genes encoding small proteins.
Misra et al. Genome Biology 2002 3:research0083.1-0083.22 doi:10.1186/gb-2002-3-12-research0083