Table 2

Sequence and gene family discovery rates for various complete and partial genome datasets

                                                        Sequence rate (%)       Family rate (%)
Dataset*                              No. of complete/  OSDR    CSDR            OGDR    CGDR
                                      partial genomes
CG Archaea                            19                37.8    -               38.7    -
CG Bacteria                           161               19.5    11.8 (± 1.5)    22.4    15.4 (± 1.8)
CG Bacteria strains filtered          127               28.4    15.9 (± 1.5)    26.6    20.6 (± 1.7)
CG Bacteria                           127                       13.4 (± 1.7)            17.0 (± 2.0)
CG Bacteria species filtered          86                23.2    20.9 (± 1.6)    31.5    26.1 (± 1.6)
CG Bacteria                           86                        16.3 (± 1.8)            19.9 (± 2.1)
CG Eukarya                            19                39.0    -               30.8    -
PG All                                193               53.7    40.3 (± 2.9)    47.7    42.8 (± 2.8)
PG Arthropods                         16                74.7    -               66.4    -
PG Deuterostomes                      21                71.7    -               60.8    -
PG Fungi                              27                70.2    -               60.2    -
PG Nematodes                          31                62.8    -               47.0    -
PG Protists                           17                88.1    -               71.5    -
PG Viridiplantae                      76                48.3    -               37.8    -
CG Bacteria sequences > 100 residues  161               -       8.6 (± 1.4)     -       -
PG sequences > 300 bp                 193               -       35.6 (± 2.8)    -       -

*CG, complete genome datasets; PG, partial genome datasets; 'strains filtered' indicates that only a single representative of each species was included in the analysis; 'species filtered' indicates that only a single representative of each genus was included in the analysis. OSDR, overall sequence discovery rate (total number of distinct sequences/total number of sequences); CSDR, current sequence discovery rate (obtained from Figure 1d,e); OGDR, overall gene family discovery rate (total number of families/total number of sequences); CGDR, current gene family discovery rate (obtained from Figure 1d,e).
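The two overall rates defined in the footnote are plain ratios reported as percentages. The Python sketch below is illustrative only and rests on stated assumptions: each sequence is assumed to have already been flagged as distinct and assigned to a gene family, and the names SequenceRecord, is_distinct, family_id and overall_rates are hypothetical, not taken from the study. It does not reproduce how distinct sequences or families were determined, nor the current rates (CSDR, CGDR) read from Figure 1d,e.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    # Hypothetical record for one sequence in a genome dataset.
    seq_id: str
    is_distinct: bool  # assumption: precomputed flag marking a distinct sequence
    family_id: str     # assumption: precomputed gene family assignment


def overall_rates(records: list[SequenceRecord]) -> tuple[float, float]:
    """Return (OSDR, OGDR) in percent, following the footnote's ratios.

    OSDR = distinct sequences / total sequences
    OGDR = number of families / total sequences
    """
    total = len(records)
    if total == 0:
        raise ValueError("dataset contains no sequences")
    distinct = sum(1 for r in records if r.is_distinct)
    families = len({r.family_id for r in records})  # count unique family IDs
    osdr = 100.0 * distinct / total
    ogdr = 100.0 * families / total
    return osdr, ogdr


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    toy = [
        SequenceRecord("a", True, "f1"),
        SequenceRecord("b", True, "f1"),
        SequenceRecord("c", True, "f2"),
        SequenceRecord("d", False, "f2"),
    ]
    print(overall_rates(toy))  # prints (75.0, 50.0)
```

With four toy sequences, three flagged as distinct and spread over two families, the sketch gives OSDR = 75.0% and OGDR = 50.0%. Both rates share the same denominator (total sequences), which is why OGDR can only equal or fall below 100% and is directly comparable to OSDR within a row.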

Peregrín-Álvarez and Parkinson Genome Biology 2007 8:R238   doi:10.1186/gb-2007-8-11-r238
