Table 6

Counts and mean percentage identity of amino acid sequence clusters for four representative samples

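| Sample | Total clusters^a | Singleton clusters | DNA+RNA clusters^b | DNA-only clusters^c | RNA-only clusters^c | Mean % identity^d (DNA+RNA) | Mean % identity^d (DNA only) |
|---|---|---|---|---|---|---|---|
| OMZ 50 m | 213,683 | 180,311 | 1804 | 26,505 | 5063 | 77.0 | 85.2 |
| OMZ 200 m | 257,388 | 209,564 | 2712 | 40,401 | 4711 | 79.4 | 83.7 |
| HOT 75 m | 353,573 | 297,850 | 5681 | 44,163 | 5879 | 80.3 | 82.9 |
| HOT 500 m | 500,413 | 425,524 | 4677 | 66,151 | 4061 | 73.7 | 79.7 |
| Soil | 1,277,816 | 1,046,744 | 29,980 | 141,158 | 59,934 | 72.6 | 87.5 |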

^a CD-HIT clustering parameters: sequence identity of 55% over the local aligned region, a length difference cutoff of 90%, and clustering to the most similar cluster (g = 1).

^b Clusters containing both DNA- and RNA-derived sequences.

^c Clusters containing only DNA-derived or only RNA-derived sequences.

^d Mean amino acid identity (relative to the cluster reference sequence) across all DNA sequences per cluster.

Abbreviations: BATS, Bermuda Atlantic Time Series; HOT, Hawaii Ocean Time Series; OMZ, oxygen minimum zone.

Stewart et al. Genome Biology 2011 12:R26   doi:10.1186/gb-2011-12-3-r26
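
The cluster counts can be cross-checked with a few lines of arithmetic. The sketch below assumes that the four categories (singleton, DNA+RNA, DNA only, RNA only) partition the total. That reading is inferred from the numbers and is not stated in the table. The names `CLUSTER_COUNTS`, `is_partition`, and `singleton_fraction` are hypothetical helpers introduced only for this illustration.

```python
# Cluster counts transcribed from Table 6, in the order:
# (total, singleton, DNA+RNA, DNA only, RNA only)
CLUSTER_COUNTS = {
    "OMZ 50 m": (213_683, 180_311, 1_804, 26_505, 5_063),
    "OMZ 200 m": (257_388, 209_564, 2_712, 40_401, 4_711),
    "HOT 75 m": (353_573, 297_850, 5_681, 44_163, 5_879),
    "HOT 500 m": (500_413, 425_524, 4_677, 66_151, 4_061),
    "Soil": (1_277_816, 1_046_744, 29_980, 141_158, 59_934),
}


def is_partition(counts):
    """Return True if the four category counts add up to the total.

    Assumption: the categories are mutually exclusive and exhaustive.
    """
    total, *categories = counts
    return sum(categories) == total


def singleton_fraction(counts):
    """Return the share of all clusters that are singletons."""
    total, singleton, *_ = counts
    return singleton / total


if __name__ == "__main__":
    for sample, counts in CLUSTER_COUNTS.items():
        print(
            f"{sample}: categories sum to total = {is_partition(counts)}, "
            f"singleton fraction = {singleton_fraction(counts):.3f}"
        )
```

Under that assumption, the check is a quick way to catch transcription errors when the table is rebuilt from flattened text.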
