Database-independent cluster statistics. (a) Size and (b) percentage amino acid identity of clusters containing amino acid sequences present only in DNA datasets or in both DNA and RNA datasets, from five representative samples. Cluster sizes are based on counts of only the DNA-derived sequences within each cluster type. Numbers in the legends indicate mean cluster size (a) and mean amino acid identity (b). Amino acid sequences were clustered above a threshold identity of 55%.
Stewart et al. Genome Biology 2011, 12:R26. doi:10.1186/gb-2011-12-3-r26