Table 1

Predictor performance

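| | GB | -IS | +IS | Manual |
|---|---|---|---|---|
| A. dehalogenans 2CPC (NC_007760) | | | | |
| Total IS ORF | 1 | 4 | 4 | 2 |
| Complete ORF | - | 0 | 0 | 0 |
| Partial ORF | - | 1 | 1 | 1 |
| Pseudogene | 1 | 2 | 2 | 1 |
| Unknown ORF | - | 1 | 1 | 0 |
| Total IS | - | 4 | 4 | 2 |
| Different IS | - | 4 | 4 | 2 |
| Anaeromyxobacter sp. Fw109-5 (NC_009675) | | | | |
| Total IS ORF | 15 | 22 | 24 | 19 |
| Complete ORF | - | 4 | 12 | 12 |
| Partial ORF | - | 1 | 2 | 6 |
| Pseudogene | 1 | 4 | 4 | 1 |
| Unknown ORF | - | 13 | 6 | 0 |
| Total IS | - | 20 | 21 | 16 |
| Different IS | - | 16 | 17 | 12 |
| Anaeromyxobacter sp. K (NC_011145) | | | | |
| Total IS ORF | 14 | 25 | 28 | 27 |
| Complete ORF | - | 12 | 26 | 26 |
| Partial ORF | - | 2 | 0 | 0 |
| Pseudogene | - | 1 | 1 | 1 |
| Unknown ORF | - | 10 | 1 | 0 |
| Total IS | - | 19 | 19 | 18 |
| Different IS | - | 10 | 10 | 9 |
| A. dehalogenans 2CP1 (NC_011891) | | | | |
| Total IS ORF | 15 | 33 | 35 | 35 |
| Complete ORF | - | 18 | 24 | 27 |
| Partial ORF | - | 4 | 2 | 3 |
| Pseudogene | - | 8 | 8 | 5 |
| Unknown ORF | - | 3 | 1 | 0 |
| Total IS | - | 25 | 25 | 23 |
| Different IS | - | 12 | 12 | 14 |
| A. aeolicus VF5 (NC_000918) | | | | |
| Total IS ORF | - | 7 | 7 | 3 |
| Complete ORF | - | 0 | 2 | 2 |
| Partial ORF | - | 1 | 1 | 1 |
| Pseudogene | - | 0 | 0 | 0 |
| Unknown ORF | - | 6 | 4 | 0 |
| Total IS | - | 7 | 7 | 3 |
| Different IS | - | 6 | 6 | 2 |
| C. thermocellum 27405 (NC_009012) | | | | |
| Total IS ORF | 75 | 143 | 144 | 160 |
| Complete ORF | - | 81 | 123 | 125 |
| Partial ORF | - | 43 | 11 | 27 |
| Pseudogene | - | 7 | 7 | 8 |
| Unknown ORF | - | 12 | 3 | 0 |
| Total IS | - | 115 | 115 | 119 |
| Different IS | - | 27 | 27 | 26 |
| S. maltophilia R5513 (NC_011071) | | | | |
| Total IS ORF | 11 | 21 | 22 | 20 |
| Complete ORF | - | 13 | 19 | 19 |
| Partial ORF | - | 7 | 1 | 1 |
| Pseudogene | - | 1 | 1 | 0 |
| Unknown ORF | - | 0 | 1 | 0 |
| Total IS | - | 18 | 19 | 16 |
| Different IS | - | 6 | 7 | 4 |
| S. maltophilia K279a (NC_010943) | | | | |
| Total IS ORF | 49 | 53 | 54 | 57 |
| Complete ORF | - | 18 | 45 | 47 |
| Partial ORF | - | 27 | 5 | 9 |
| Pseudogene | - | 3 | 3 | 1 |
| Unknown ORF | 3 | 5 | 1 | 0 |
| Total IS | - | 38 | 39 | 36 |
| Different IS | - | 18 | 19 | 18 |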


The table compares the IS annotations of eight bacterial genomes contained in the corresponding GenBank files (GB) with those obtained by manual annotation (Manual) and by the ISsaga predictor run with two different IS reference databases. In one database (-IS), the reference ISs present in the genome under test were removed; in the other (+IS), these ISs were included. The total number of IS-associated ORFs (Total IS ORF) is divided into four categories: Complete ORF, Partial ORF, Pseudogene and Unknown ORF. The 'Unknown ORF' category includes all examples that the predictor cannot classify as complete or partial because the reference database contains too few closely related examples. The categories 'Total IS' and 'Different IS' are based on nucleotide predictions, which take into account the number of ORFs carried by each IS. For example, an IS that includes two ORFs is counted as two examples in 'Complete ORF' but as a single IS in 'Total IS'.
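
The counting rule in the legend can be illustrated with a minimal Python sketch. It is not ISsaga code: the `ISCopy` record, the `summarize` function and the IS names in the example are hypothetical, and the sketch assumes that 'Different IS' means the number of distinct IS names among the copies found, which the legend does not state explicitly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ISCopy:
    """One IS copy in a genome (hypothetical record, not an ISsaga type)."""
    name: str                        # IS name shared by all copies of the same IS
    orf_categories: Tuple[str, ...]  # one label per ORF carried by this copy


def summarize(copies: List[ISCopy]) -> Dict[str, int]:
    """Tally the rows of the table for one genome and one annotation method."""
    # ORF rows: every ORF is counted, so a two-ORF IS contributes two entries.
    orf_counts = Counter(cat for copy in copies for cat in copy.orf_categories)
    return {
        "Total IS ORF": sum(orf_counts.values()),
        "Complete ORF": orf_counts["complete"],
        "Partial ORF": orf_counts["partial"],
        "Pseudogene": orf_counts["pseudogene"],
        "Unknown ORF": orf_counts["unknown"],
        # IS rows: each copy is counted once, however many ORFs it carries.
        "Total IS": len(copies),
        # Assumption: 'Different IS' is the number of distinct IS names.
        "Different IS": len({copy.name for copy in copies}),
    }


if __name__ == "__main__":
    # Made-up example: two copies of a two-ORF IS plus one partial single-ORF IS.
    example = [
        ISCopy("ISexampleA", ("complete", "complete")),
        ISCopy("ISexampleA", ("complete", "complete")),
        ISCopy("ISexampleB", ("partial",)),
    ]
    for row, value in summarize(example).items():
        print(f"{row}: {value}")
```

In this made-up example the two copies of the two-ORF IS contribute four entries to 'Complete ORF' but only two to 'Total IS', which is why the ORF rows and the IS rows of the table need not agree.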

Varani et al. Genome Biology 2011 12:R30   doi:10.1186/gb-2011-12-3-r30
