Table 2

Assessment, by keyword recovery, of the functional linkages established by the Operon method at various distance thresholds

| Threshold (bp) | Functional links between SwissProt Annotated Proteins | Functional links with no keywords in common | Correct keywords recovered | Total keywords | Maximum false positive fraction* | Keyword recovery |
|---|---|---|---|---|---|---|
| 0 | 308 | 78 | 446 | 883 | 0.25 | 0.51 |
| 25 | 642 | 180 | 856 | 1766 | 0.28 | 0.48 |
| 50 | 818 | 254 | 1044 | 2226 | 0.31 | 0.47 |
| 75 | 912 | 326 | 1080 | 2453 | 0.36 | 0.44 |
| 100 | 1044 | 362 | 1224 | 2726 | 0.35 | 0.45 |
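
The two ratio columns of Table 2 appear to follow directly from the count columns: the maximum false positive fraction is the number of links with no keywords in common divided by the number of links, and the keyword recovery is the number of correct keywords recovered divided by the total keywords. The short Python sketch below re-derives both ratios from the counts in the table. The names `TABLE_2` and `derived_columns` are illustrative and do not come from the paper.

```python
# Counts copied from Table 2, keyed by distance threshold in bp:
# (functional links, links with no keywords in common,
#  correct keywords recovered, total keywords)
TABLE_2 = {
    0: (308, 78, 446, 883),
    25: (642, 180, 856, 1766),
    50: (818, 254, 1044, 2226),
    75: (912, 326, 1080, 2453),
    100: (1044, 362, 1224, 2726),
}


def derived_columns(links, no_common, recovered, total):
    """Return (maximum false positive fraction, keyword recovery)."""
    return no_common / links, recovered / total


for threshold, counts in TABLE_2.items():
    max_fp, recovery = derived_columns(*counts)
    print(f"{threshold:>3} bp  max FP fraction = {max_fp:.2f}  "
          f"keyword recovery = {recovery:.2f}")
```

Rounded to two decimals, the computed ratios agree with the values reported in the last two columns of the table.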

*The maximum false positive fraction was calculated as the fraction of pairwise links whose proteins have no SWISS-PROT keywords in common (ignoring the keywords 'hypothetical protein', 'three-dimensional structure', 'transmembrane' and 'complete proteome'). Keyword recovery was calculated by comparing the SWISS-PROT keyword annotations of each pair of linked M. tuberculosis genes. The keyword recovery over all linkages was calculated as:

\[
\text{Keyword recovery} = \frac{1}{X} \sum_{i=1}^{Y} \sum_{j=1}^{x} n_j
\]

where X is the total number of query protein keywords, Y is the total number of linked gene pairs, x is the number of SWISS-PROT keywords of the query protein in a given pair, and n_j is the number of times the query protein keyword j occurs in the linked protein. At 0 bp the keyword recovery is high, about 50%, while the maximum false positive fraction is about 25%. As the distance threshold increases from 0 bp to 100 bp, the keyword recovery falls from 0.51 to 0.45 and the maximum false positive fraction rises from 0.25 to 0.35. Neither trend is strictly monotonic: both values reverse slightly between 75 bp and 100 bp.
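
The footnote's two measures can be sketched in Python for a list of linked pairs, each given as the keyword lists of the query protein and the linked protein. This is a minimal illustration, not the authors' code. The function names, the example keywords, the lowercase normalization, and the choice to count each pair once in a single direction are all assumptions made here.

```python
# Keywords the footnote says were ignored when comparing annotations.
IGNORED = {
    "hypothetical protein",
    "three-dimensional structure",
    "transmembrane",
    "complete proteome",
}


def clean(keywords):
    """Lowercase keywords (an assumption) and drop the ignored ones."""
    return [k.lower() for k in keywords if k.lower() not in IGNORED]


def keyword_recovery(pairs):
    """Sum n_j over every query keyword j of every pair, divided by X."""
    recovered = 0  # double sum of n_j over pairs and query keywords
    total = 0      # X: total number of query protein keywords
    for query_kw, linked_kw in pairs:
        query, linked = clean(query_kw), clean(linked_kw)
        total += len(query)
        recovered += sum(linked.count(k) for k in query)
    return recovered / total if total else 0.0


def max_false_positive_fraction(pairs):
    """Fraction of pairs whose proteins share no (non-ignored) keyword."""
    pairs = list(pairs)
    no_common = sum(
        1 for q, l in pairs if not set(clean(q)) & set(clean(l))
    )
    return no_common / len(pairs) if pairs else 0.0


# Hypothetical annotations, for illustration only.
example_pairs = [
    (["Transport", "Transmembrane"], ["Transport", "ATP-binding"]),
    (["Kinase"], ["Ligase", "Complete proteome"]),
    (["Lipid synthesis", "Transferase"], ["Transferase", "Lipid synthesis"]),
]

print(keyword_recovery(example_pairs))
print(max_false_positive_fraction(example_pairs))
```

In this toy input, three of the four retained query keywords are found in the linked protein, and one of the three pairs shares no keyword, so the two measures come out to 0.75 and one third.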

Strong et al. Genome Biology 2003 4:R59   doi:10.1186/gb-2003-4-9-r59
