Figure 5.

The expected number of total gene clusters and core gene clusters identified as each genome is added to the clustering dataset. Model predictions are based on the eight-strain training set (see 'Mathematical development of a finite supragenome model'). The number of genes observed in all strains levels off to an asymptote that corresponds to the core set of genes. The rate of increase in total genes decreases but does not level off, because rare genes continue to be discovered.
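
The shape of the two curves can be illustrated with a minimal sketch of a finite supragenome calculation. It assumes that genes fall into a few population-frequency classes and that each gene is present in a strain independently with its class frequency. The function name `expected_counts` and all parameter values (supragenome size, class frequencies, class probabilities) are hypothetical and chosen only for illustration; they are not the values fitted from the eight-strain training set.

```python
# Illustrative sketch only: every number below is a hypothetical assumption,
# not a parameter estimated in the paper.

def expected_counts(n_genomes, total_genes, class_freqs, class_probs):
    """Return (expected total genes, expected core genes) seen in n_genomes strains.

    class_freqs[k] is the assumed population frequency of genes in class k;
    class_probs[k] is the assumed fraction of supragenome genes in class k.
    """
    pairs = list(zip(class_freqs, class_probs))
    # A gene is core if it is present in every one of the n sampled genomes.
    core = total_genes * sum(p * mu ** n_genomes for mu, p in pairs)
    # A gene is observed if it is present in at least one sampled genome.
    total = total_genes * sum(p * (1 - (1 - mu) ** n_genomes) for mu, p in pairs)
    return total, core


SUPRAGENOME_SIZE = 5000              # hypothetical number of genes in the supragenome
CLASS_FREQS = [0.01, 0.1, 0.5, 1.0]  # hypothetical frequencies, rare to universal
CLASS_PROBS = [0.5, 0.2, 0.1, 0.2]   # hypothetical class weights (sum to 1)

for n in (1, 2, 4, 8, 16, 32):
    total, core = expected_counts(n, SUPRAGENOME_SIZE, CLASS_FREQS, CLASS_PROBS)
    print(f"{n:2d} genomes: total = {total:7.1f}, core = {core:7.1f}")
```

Under these assumptions the core count falls toward the number of genes in the frequency-1.0 class, while the total keeps rising slowly because genes in the rarest class are still being encountered.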

Hogg et al. Genome Biology 2007 8:R103   doi:10.1186/gb-2007-8-6-r103