Table 6. Counts and mean percentage identity of amino acid sequence clusters for four representative samples

| Sample | Total clusters^a | Singleton clusters^a | DNA+RNA clusters^a,b | DNA-only clusters^a,c | RNA-only clusters^a,c | Mean % identity^d (DNA+RNA) | Mean % identity^d (DNA only) |
|---|---|---|---|---|---|---|---|
| OMZ 50 m | 213,683 | 180,311 | 1804 | 26,505 | 5063 | 77.0 | 85.2 |
| OMZ 200 m | 257,388 | 209,564 | 2712 | 40,401 | 4711 | 79.4 | 83.7 |
| HOT 75 m | 353,573 | 297,850 | 5681 | 44,163 | 5879 | 80.3 | 82.9 |
| HOT 500 m | 500,413 | 425,524 | 4677 | 66,151 | 4061 | 73.7 | 79.7 |
| Soil | 1,277,816 | 1,046,744 | 29,980 | 141,158 | 59,934 | 72.6 | 87.5 |

^a CD-HIT clustering parameters: sequence identity 55% over local aligned region, with a length difference cutoff of 90%, and clustering to the most similar cluster (g = 1).

^b Clusters containing both DNA- and RNA-derived sequences.

^c Clusters containing only DNA- or RNA-derived sequences.

^d Mean amino acid identity (relative to the cluster reference sequence) across all DNA sequences per cluster.

Abbreviations: BATS, Bermuda Atlantic Time Series; HOT, Hawaii Ocean Time Series; OMZ, oxygen minimum zone.

Stewart et al. Genome Biology 2011 12:R26 doi:10.1186/gb-2011-12-3-r26
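The clustering parameters in the first table footnote could plausibly be passed to CD-HIT as sketched below. This is only an illustration, not the authors' actual command: the mapping of "identity over local aligned region" to `-G 0`, of the length difference cutoff to `-s 0.9`, and the word size `-n 3` (the size the CD-HIT guide recommends for thresholds between 0.5 and 0.6) are assumptions, and the function name and file names are hypothetical. The sketch only builds the argument list and does not run the program.

```python
import shlex


def cd_hit_command(input_fasta: str, output_prefix: str) -> list[str]:
    """Build a CD-HIT argument list approximating the footnote's parameters.

    Assumed mapping (not confirmed by the source):
      -c 0.55  sequence identity threshold of 55%
      -G 0     identity computed over the local aligned region
      -s 0.9   length difference cutoff of 90%
      -g 1     assign each sequence to its most similar cluster
      -n 3     word size suited to thresholds of 0.5 to 0.6 (assumption)
    """
    return [
        "cd-hit",
        "-i", input_fasta,
        "-o", output_prefix,
        "-c", "0.55",
        "-G", "0",
        "-s", "0.9",
        "-g", "1",
        "-n", "3",
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(shlex.join(cd_hit_command("proteins.faa", "clusters")))
```

Keeping the command as a list rather than a single string makes it straightforward to hand to `subprocess.run` without shell quoting issues.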
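The counts in Table 6 appear to partition each sample's clusters: singleton, DNA+RNA, DNA-only and RNA-only counts add up to the total, which suggests that the DNA-only and RNA-only columns refer to non-singleton clusters. The following minimal sketch checks that arithmetic and derives the singleton fraction per sample. The values are transcribed from the table; the variable and function names are hypothetical.

```python
# Cluster counts transcribed from Table 6, in the order:
# (total, singleton, DNA+RNA, DNA only, RNA only)
CLUSTER_COUNTS = {
    "OMZ 50 m": (213_683, 180_311, 1_804, 26_505, 5_063),
    "OMZ 200 m": (257_388, 209_564, 2_712, 40_401, 4_711),
    "HOT 75 m": (353_573, 297_850, 5_681, 44_163, 5_879),
    "HOT 500 m": (500_413, 425_524, 4_677, 66_151, 4_061),
    "Soil": (1_277_816, 1_046_744, 29_980, 141_158, 59_934),
}


def categories_sum_to_total(sample: str) -> bool:
    """Return True if the four cluster categories add up to the total."""
    total, *categories = CLUSTER_COUNTS[sample]
    return sum(categories) == total


def singleton_fraction(sample: str) -> float:
    """Return the fraction of all clusters that are singletons."""
    total, singleton, *_ = CLUSTER_COUNTS[sample]
    return singleton / total


for name in CLUSTER_COUNTS:
    print(
        f"{name}: categories sum to total = {categories_sum_to_total(name)}, "
        f"singleton fraction = {singleton_fraction(name):.3f}"
    )
```

In every sample the four categories sum exactly to the total, and singletons make up more than 80% of the clusters.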