Table 2

Benchmarking the performance of ngLOC (7-gram) against its confidence score

Confidence score                    0     10     20     30     40     50     60     70     80     90
% of dataset                      0.0    2.4   11.8    6.1    4.4    4.5    5.8    9.3   18.1   37.5
% overall accuracy                0.0   56.2   41.4   70.1   88.3   93.0   97.0   98.1   99.2   99.8
Cumulative % of dataset         100.0  100.0   97.6   85.7   79.6   75.2   70.7   64.9   55.6   37.5
Cumulative % overall accuracy    88.8   88.8   89.6   96.2   98.3   98.8   99.2   99.4   99.6   99.8

This table shows how the confidence score associated with each prediction relates to prediction accuracy. Broadly, the higher the score, the more likely the prediction is to be correct. For example, sequences scoring 90 or higher (37.5% of the dataset) had an accuracy of 99.8%, and about 80% of the dataset (79.6%) scored 40 or higher, with a cumulative accuracy of 98.3%.
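The cumulative rows can be recomputed from the per-bin rows. The cumulative % of data at a given score is the sum of the bin shares at or above that score, and the cumulative accuracy is each bin's accuracy weighted by its share of the data. The sketch below assumes that each confidence-score column is the lower bound of a 10-point bin; the names `cumulative_from` and `min_threshold_for` are hypothetical, not part of ngLOC. Because the published per-bin values are rounded to one decimal place, recomputed figures can differ from the table by about 0.1 (for example, roughly 98.2% rather than 98.3% at a score of 40).

```python
# Per-bin values transcribed from Table 2 (ngLOC, 7-gram).
# Assumption: each column is the lower bound of a 10-point score bin, so a
# "cumulative" value covers every sequence scoring at or above that bound.
BINS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
PCT_DATA = [0.0, 2.4, 11.8, 6.1, 4.4, 4.5, 5.8, 9.3, 18.1, 37.5]
PCT_ACC = [0.0, 56.2, 41.4, 70.1, 88.3, 93.0, 97.0, 98.1, 99.2, 99.8]


def cumulative_from(threshold):
    """Return (% of data, % accuracy) for sequences scoring >= threshold.

    The accuracy is the per-bin accuracy weighted by each bin's share of
    the dataset (hypothetical helper for illustration).
    """
    rows = [(d, a) for b, d, a in zip(BINS, PCT_DATA, PCT_ACC) if b >= threshold]
    total = sum(d for d, _ in rows)
    if total == 0:
        return 0.0, 0.0
    correct = sum(d * a for d, a in rows)
    return total, correct / total


def min_threshold_for(target_accuracy):
    """Lowest bin bound whose cumulative accuracy reaches the target, else None."""
    for t in BINS:
        data, acc = cumulative_from(t)
        if data > 0 and acc >= target_accuracy:
            return t
    return None


if __name__ == "__main__":
    for t in BINS:
        data, acc = cumulative_from(t)
        print(f"score >= {t:2d}: {data:5.1f}% of data, {acc:4.1f}% accurate")
```

Under these assumptions, `min_threshold_for(98.0)` returns 40, which matches the trade-off described in the caption: keeping only predictions scoring 40 or higher retains about 80% of the data at roughly 98% accuracy.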

King and Guda, Genome Biology 2007, 8:R68. doi:10.1186/gb-2007-8-5-r68