Figure 2.

Size distributions of different pools of smORFs. Each distribution is plotted by smORF length in codons. Medians are indicated. (a) 43,197 smORFs with tBLASTn hits with E-value < 1 × 10⁻³, representing putative smORFs with some degree of sequence conservation in D. pseudoobscura. Mean size = 44 codons, standard deviation = 12. (b) 4,561 putative smORFs with conservation of sequence and of start and stop codons in D. pseudoobscura, representing our upper estimate for the number of smORFs in Drosophila. Mean size = 25 codons, standard deviation = 12. (c) 1,075 smORFs with syntenic conservation and conserved start and stop codons in D. pseudoobscura, and with a Ka/Ks score (ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitutions) < 0.1. Mean size = 19 codons, standard deviation = 8. (d) 401 smORFs with conservation of sequence and of start and stop codons in D. pseudoobscura, with a Ka/Ks score < 0.1, and also present in transcribed regions, representing our conservative estimate. Mean size = 21 codons, standard deviation = 12. For a statistical analysis of the differences between these distributions, see Additional file 1.

Ladoukakis et al. Genome Biology 2011 12:R118   doi:10.1186/gb-2011-12-11-r118
