Table 2

Sequence and gene family discovery rates for various complete and partial genome datasets

| Dataset* | No. of complete/partial genomes | Sequence rate (%)†: OSDR | Sequence rate (%)†: CSDR | Family rate (%)†: OGDR | Family rate (%)†: CGDR |
|---|---|---|---|---|---|
| CG Archaea | 19 | 37.8 | - | 38.7 | - |
| CG Bacteria | 161 | 19.5 | 11.8 (± 1.5) | 22.4 | 15.4 (± 1.8) |
| CG Bacteria strains filtered | 127 | 28.4 | 15.9 (± 1.5) | 26.6 | 20.6 (± 1.7) |
| CG Bacteria | 127 | | 13.4 (± 1.7) | | 17.0 (± 2.0) |
| CG Bacteria species filtered | 86 | 23.2 | 20.9 (± 1.6) | 31.5 | 26.1 (± 1.6) |
| CG Bacteria | 86 | | 16.3 (± 1.8) | | 19.9 (± 2.1) |
| CG Eukarya | 19 | 39.0 | - | 30.8 | - |
| PG All | 193 | 53.7 | 40.3 (± 2.9) | 47.7 | 42.8 (± 2.8) |
| PG Arthropods | 16 | 74.7 | - | 66.4 | - |
| PG Deuterostomes | 21 | 71.7 | - | 60.8 | - |
| PG Fungi | 27 | 70.2 | - | 60.2 | - |
| PG Nematodes | 31 | 62.8 | - | 47.0 | - |
| PG Protists | 17 | 88.1 | - | 71.5 | - |
| PG Viridiplantae | 76 | 48.3 | - | 37.8 | - |
| CG Bacteria sequences > 100 residues | 161 | - | 8.6 (± 1.4) | - | - |
| PG Sequences > 300 bp | 193 | - | 35.6 (± 2.8) | - | - |
*CG, complete genome datasets; PG, partial genome datasets; 'strains filtered' indicates that only a single representative per species was included in the analysis; 'species filtered' indicates that only a single representative per genus was included in the analysis.

†OSDR, overall sequence discovery rate (total number of distinct sequences/total number of sequences); CSDR, current sequence discovery rate (obtained from Figure 1d, e); OGDR, overall gene family discovery rate (total number of families/total number of sequences); CGDR, current gene family discovery rate (obtained from Figure 1d, e).
Peregrín-Álvarez and Parkinson, Genome Biology 2007, 8:R238. doi:10.1186/gb-2007-8-11-r238
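The two overall rates defined in the table footnote (OSDR and OGDR) are simple ratios over a dataset. The sketch below is a minimal illustration, assuming each sequence has already been flagged as distinct or not, and assigned to a gene family, by some upstream clustering step that is not shown. The `SequenceRecord` type and both function names are hypothetical and are not taken from the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical per-sequence record (not from the original study).

    is_distinct: assumed to be decided by an upstream step; not computed here.
    family_id:   gene family the sequence was assigned to (assumed given).
    """
    seq_id: str
    is_distinct: bool
    family_id: str


def overall_sequence_discovery_rate(records: list[SequenceRecord]) -> float:
    """OSDR (%) = distinct sequences / total sequences * 100."""
    if not records:
        raise ValueError("dataset contains no sequences")
    distinct = sum(1 for r in records if r.is_distinct)
    return 100.0 * distinct / len(records)


def overall_family_discovery_rate(records: list[SequenceRecord]) -> float:
    """OGDR (%) = number of distinct families / total sequences * 100."""
    if not records:
        raise ValueError("dataset contains no sequences")
    families = {r.family_id for r in records}
    return 100.0 * len(families) / len(records)


if __name__ == "__main__":
    # Toy dataset: 4 sequences, 1 flagged distinct, 2 families.
    toy = [
        SequenceRecord("s1", True, "F1"),
        SequenceRecord("s2", False, "F1"),
        SequenceRecord("s3", False, "F2"),
        SequenceRecord("s4", False, "F2"),
    ]
    print(f"OSDR = {overall_sequence_discovery_rate(toy):.1f}%")  # OSDR = 25.0%
    print(f"OGDR = {overall_family_discovery_rate(toy):.1f}%")  # OGDR = 50.0%
```

In the toy data, one of four sequences is distinct and the four sequences fall into two families, giving an OSDR of 25.0% and an OGDR of 50.0%. The current rates (CSDR, CGDR) are read from the discovery curves in Figure 1d, e and are not reproduced by this sketch.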