Table 1

Manually and automatically derived PP-clusters*

| Procedure of PP-cluster definition | Number of PP-clusters | Total number of COGs in all clusters | COGs shared with manually derived PP-clusters | Average number of COGs in a cluster | Number of clusters absent in manually derived PP-clusters | Number of pure RS clusters | FPs |
|---|---|---|---|---|---|---|---|
| Manual annotation | 223 | 890 | N/A | 4.1 | N/A | - | - |
| Automated tree cutting at average branch length 0.2 | 89 | 1,774 | 315 | 19.9 | 38 | 20 | 0.19 |
| Automated tree cutting at average branch length 0.3 | 89 | 3,960 | 395 | 44.5 | 26 | 12 | 0.16 |


*PP-clusters: clusters of COGs functionally linked on the basis of similar phyletic patterns. RS clusters: clusters containing only COGs annotated as 'poorly characterized' in the COG database, where R stands for 'general function prediction only' and S stands for 'function unknown'. False positives (FPs) are reported as the proportion of clusters that were not present among the manually derived PP-clusters.

Glazko and Mushegian Genome Biology 2004 5:R32  
