Table 2

Presence of genes in gene clusters of all available finished and unfinished genome sequences





Presence and names of genes in each species

| Gene family | Description | Protein size (in M. tb) | ESAT-6 cluster region | M. tuberculosis H37Rv | M. tuberculosis CDC1551 (CSU#93) | M. tuberculosis* 210 | M. bovis* AF2122/97 (spoligotype 9) | M. bovis* BCG Pasteur 1173P2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | Rv3866 | MT3980 | ND | MB851A | No sequence data |
| | | 276 | 2 | Rv3889c | MT4004 | MTB12A | MB727.3A (partly deleted #) | No sequence data |
| | | 295 | 3 | Rv0289 | MT0302 | MTB203A | MB548A | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 300 | 5 | Rv1794 | MT1843 | MTB196A | MB557A | No sequence data |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | Rv3868 | MT3981 | MTB44B | MB851B | No sequence data |
| | | 619 | 2 | Rv3884c | MT3999 | MTB12B | MB727.1B | No sequence data |
| | | 631 | 3 | Rv0282 | MT0295 | MTB23B | MB672B | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 610 | 5 | Rv1798 | MT1847 | MTB196B | MB542B | No sequence data |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | Rv3869 | MT3982 | MTB44C | MB851C | No sequence data |
| | | 495 | 2 | Rv3895c | MT4011 | MTB136C | MB780.1C | No sequence data |
| | | 538 | 3 | Rv0283 | MT0296 | MTB23C | MB672C | No sequence data |
| | | 470 | 4 | Rv3450c | MT3556 | MTB45C | MB493.1C | No sequence data |
| | | 506 | 5 | Rv1782 | MT1832 | MTB46C | MB771.1C | No sequence data |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | Rv3870+71 | MT3983+85 | MTB44Da+Db | MB851D | MB851D (partly deleted) |
| | | 1396 | 2 | Rv3894c | MT4010 | MTB3D | MB780.1D | No sequence data |
| | | 1330 | 3 | Rv0284 | MT0297 | MTB23D | MB672D | No sequence data |
| | | 1236 | 4 | Rv3447c | MT3553 | MTB45D | MB585.1D | No sequence data |
| | | 435 + 932 | 5 | Rv1783+84 | MT1833 | MTB46Da+Db | MB771.1D | No sequence data |
| E | PE, 18-90% homology | 99 | 1 | Rv3872 | MT3986 | MTB44E | MB851E | Deleted |
| | | 77 | 2 | Rv3893c | MT4008 | MTB3E | MB780.1E | No sequence data |
| | | 102 | 3 | Rv0285 | MT0298 | MTB23E | MB389E | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 99 & 99 | 5 | Rv1788 & 91 | MT1837 & 40 | MTB196Ea & Eb | MB771.0E & MB557E | No sequence data |
| F | PPE, 19-88% homology | 368 | 1 | Rv3873 | MT3987 | MTB44F | MB851F | Deleted |
| | | 399 | 2 | Rv3892c | MT4007 | MTB3F | MB780.1F | No sequence data |
| | | 513 | 3 | Rv0286 | MT0299 | MTB472F | MB528F | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 365, 393 & 350 | 5 | Rv1787 & 89 & 90 | MT1836 & 38 & 39 | MTB196Fa & Fb & Fc | MB771.0Fa & Fb & MB557F | No sequence data |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | Rv3874 | MT3988 | MTB44G | MB851G | Deleted |
| | | 107 | 2 | Rv3891c | MT4006 | MTB12G | MB727.3G | No sequence data |
| | | 97 | 3 | Rv0287 | MT0300 | MTB472G | MB548G | No sequence data |
| | | 125 | 4 | Rv3445c | MT3550 | MTB45G | MB585.0G | No sequence data |
| | | 98 | 5 | Rv1792 (Stop) | MT1841 (Stop) | MTB196G (Stop) | MB557G | No sequence data |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | Rv3875 | MT3989 | MTB44H | MB851H | Deleted |
| | | 95 | 2 | Rv3890c | MT4005 | MTB12H | MB727.3H | No sequence data |
| | | 96 | 3 | Rv0288 | MT0301 | MTB203H | MB548H | No sequence data |
| | | 100 | 4 | Rv3444c | MT3549 | MTB45H | MB585.0H | No sequence data |
| | | 94 | 5 | Rv1793 | MT1842 | MTB196H | MB557H | No sequence data |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | Rv3876 | MT3990 | MTB60I | MB477I | Deleted |
| | | 341 | 2 | Rv3888c | MT4003 | MTB12I | Deleted # | No sequence data |
| | | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | Rv3877 | MT3991 | MTB369J | MB477J | Deleted |
| | | 509 | 2 | Rv3887c | MT4002 | MTB12J | MB727.3J (partly deleted #) | No sequence data |
| | | 472 | 3 | Rv0290 | MT0303 | MTB203J | MB548J | No sequence data |
| | | 467 | 4 | Rv3448 | MT3554 | MTB45J | MB585.1J | No sequence data |
| | | 503 | 5 | Rv1795 | MT1844 | MTB196J | MB506J | No sequence data |
| K | Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology | 446 | 1 | Rv3883c | MT3998 | MTB12Ka | MB727.0K | No sequence data |
| | | 550 | 2 | Rv3886c | MT4001 (Frame) | MTB12Kb | MB727.2K | No sequence data |
| | | 461 | 3 | Rv0291 | MT0304 | MTB203K | MB548K | No sequence data |
| | | 455 | 4 | Rv3449 | MT3555 | MTB45K | MB585.1K | No sequence data |
| | | 585 | 5 | Rv1796 | MT1845 | MTB196K | MB506K | No sequence data |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | Rv3882c | MT3997 | MTB12La | MB727.0L | No sequence data |
| | | 537 | 2 | Rv3885c | MT4000 (Frame) | MTB12Lb | MB727.2L | No sequence data |
| | | 331 | 3 | Rv0292 | MT0305 | MTB203L | MB694.0L | No sequence data |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 406 | 5 | Rv1797 | MT1846 | MTB196L | MB542L | No sequence data |




Presence and names of genes in each species

| Gene family | Description | Protein size (in M. tb) | ESAT-6 cluster region | M. leprae TN | M. avium* 104 | M. paratuberculosis K 10 | M. smegmatis* MC2 155 | C. diphtheriae* NCTC13129 | S. coelicolor A3(2) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | ABC transporter family signature, 19-27% homology | 283 | 1 | ML0057 (pseudo) | ND | ND | MS29A | ND | ND |
| | | 276 | 2 | MLabc (pseudo) | MA138A | MP3889c | ND | ND | ND |
| | | 295 | 3 | ML2530 | MA141A | MP0289 | MS32A | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 300 | 5 | ML1540 | MA310A | MP1794 | ND | ND | ND |
| B | AAA+ class ATPases, CBXX/CFQX family, SpoVK, 1× ATP/GTP-binding site, 29-39% homology | 573 | 1 | ML0055 | ND | ND | MS29B | ND | ND |
| | | 619 | 2 | ML0039 (pseudo) | MA177B | MP3884c | ND | ND | ND |
| | | 631 | 3 | ML2537 | MA78B | MP0282 | MS32B | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 610 | 5 | ML1536 | MA310B | MP1798 | ND | ND | ND |
| C | Amino-terminal transmembrane protein, possible ATP/GTP-binding motif, 31-41% homology | 480 | 1 | ML0054 | ND | ND | MS29C | ND | ND |
| | | 495 | 2 | Deleted | MA144C | MP3895c | ND | ND | ND |
| | | 538 | 3 | ML2536 | MA78C | MP0283 | MS32C | ND | ND |
| | | 470 | 4 | Deleted | MA94C | MP3450c | MS8C | CORDmem | SC3C3.07 |
| | | 506 | 5 | ML1544 | MA221C | MP1782 | ND | ND | ND |
| D | DNA segregation ATPase, ftsK chromosome partitioning protein, SpoIIIE, yukA, 3× ATP/GTP-binding sites, 2× amino-terminal transmembrane protein, 28-39% homology | 747 + 591 | 1 | ML0053+52 | ND | ND | MS29D (Stop$) | ND | ND |
| | | 1396 | 2 | Deleted | MA144D | MP3894c | ND | ND | ND |
| | | 1330 | 3 | ML2535 | MA78D | MP0284 | MS32D | ND | ND |
| | | 1236 | 4 | Deleted | MA504D | MP3447c | MS8D | CORDyuk | SC3C3.20c |
| | | 435 + 932 | 5 | ML1543 | MA221D | MP1783 | ND | ND | ND |
| E | PE, 18-90% homology | 99 | 1 | Deleted | ND | ND | MS29E | ND | ND |
| | | 77 | 2 | Deleted | MA138E | MP3893c | ND | ND | ND |
| | | 102 | 3 | ML2534 | MA78E | MP0285 | MS32E | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 99 & 99 | 5 | Deleted | MA310Ea & Eb | MP1788 & 91 | ND | ND | ND |
| F | PPE, 19-88% homology | 368 | 1 | ML0051 | ND | ND | MS29F | ND | ND |
| | | 399 | 2 | Deleted | MA138F | MP3892c | ND | ND | ND |
| | | 513 | 3 | ML2533 (pseudo) | MA78F | MP0286 | MS32F | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 365, 393 & 350 | 5 | Deleted | MA310Fa & Fb & Fc | MP1787 & 89 & 90 | ND | ND | ND |
| G | lhp or CFP-10, also MTSA-10, grouped into ESAT-6 family, potent secreted T-cell antigens, 9-32% homology | 100 | 1 | ML0050 | ND | ND | MS29G | ND | SC3C3.10 and SC3C3.11(c) |
| | | 107 | 2 | Deleted | MA138G | MP3891c § | ND | ND | ND |
| | | 97 | 3 | ML2532 | MA141G | MP0287 | MS32G | ND | ND |
| | | 125 | 4 | Deleted | MA319G | MP3445c | MS8G | CORDcfp10 | ND |
| | | 98 | 5 | MLcfp (pseudo) | MA310G | MP1792 | ND | ND | ND |
| H | ESAT-6 family, cfp7, L45 or l-esat, also Mtb9.9 family, potent secreted T-cell antigens, 15-27% homology | 95 | 1 | ML0049 | ND | ND | MS29H | ND | SC3C3.10 and SC3C3.11 |
| | | 95 | 2 | ML0034 (pseudo) | MA138H | MP3890c § | ND | ND | ND |
| | | 96 | 3 | ML2531 | MA141H | MP0288 | MS32H | ND | ND |
| | | 100 | 4 | ML0363 | MA319H | MP3444c | MS8H | CORDesat6 | ND |
| | | 94 | 5 | MLesat (pseudo) | MA310H | MP1793 | ND | ND | ND |
| I | ATPases involved in chromosome partitioning, 1× ATP/GTP-binding motif, 33% homology | 666 | 1 | ML0048 | ND | ND | MS29I | ND | SC3C3.03c |
| | | 341 | 2 | ML0035 (pseudo) | MA138I | MP3888c | ND | ND | ND |
| | | - | 3 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | - | 5 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| J | Integral inner membrane protein, binding-protein-dependent transport systems inner membrane component signature, putative transporter protein, 19-27% homology | 511 | 1 | ML0047 | ND | ND | MS29J | ND | ND |
| | | 509 | 2 | ML0036 (pseudo) | MA138J | MP3887c | ND | ND | ND |
| | | 472 | 3 | ML2529 | MA141J | MP0290 | MS32J | ND | ND |
| | | 467 | 4 | Deleted | MA504J | MP3448 | MS8J | CORDtransporter | SC3C3.21 |
| | | 503 | 5 | ML1539 | MA310J | MP1795 | ND | ND | ND |
| K | Mycosins, subtilisin-like cell-wall associated serine proteases, 43-49% homology | 446 | 1 | ML0041 | ND | ND | MS65K | ND | ND |
| | | 550 | 2 | ML0037 (pseudo) | MA177K | MP3886c | ND | ND | ND |
| | | 461 | 3 | ML2528 | MA141K | MP0291 | MS32K | ND | ND |
| | | 455 | 4 | Deleted | MA439K | MP3449 | MS8K | CORDsub | SC3C3.17c and SC3C3.08 |
| | | 585 | 5 | ML1538 | MA310K | MP1796 | ND | ND | ND |
| L | 2× amino-terminal transmembrane protein, 16-27% homology | 462 | 1 | ML0042 | ND | ND | MS65L | ND | ND |
| | | 537 | 2 | ML0038 (pseudo) | MA177L | MP3885c | ND | ND | ND |
| | | 331 | 3 | ML2527 | MA81L | MP0292 | MS32L | ND | ND |
| | | - | 4 | No duplication | No duplication | No duplication | No duplication | No duplication | No duplication |
| | | 406 | 5 | ML1537 | MA310L | MP1797 | ND | ND | ND |
Other region-specific genes of known functions (not assigned to a family)

Region 5 (not present in M. smegmatis, C. diphtheriae and S. coelicolor)

| Gene | Description |
| --- | --- |
| Rv1785c | Probable member of the cytochrome P450 family (pseudogene in M. leprae) |
| Rv1786 | Probable ferredoxin (pseudogene in M. leprae) |

Other region-specific genes of unknown functions (not assigned to a family)

Region 1 (deleted in M. avium and M. paratuberculosis, not present in C. diphtheriae and S. coelicolor)

| Gene | Description |
| --- | --- |
| Rv3867 | Unknown, annotated as part of MT3980 (Rv3866) in M. tuberculosis CDC1551 sequence with a frameshift (functional in M. leprae) |
| Rv3878 | Unknown, some similarity to PPE family, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
| Rv3879c | Unknown, repetitive, highly proline-rich N-terminus, deleted with RD1 deletion region in M. bovis BCG (pseudogene in M. leprae) |
| Rv3880c | Unknown (functional in M. leprae) |
| Rv3881c | Unknown (pseudogene in M. leprae) |

Region 4 (not present in S. coelicolor)

| Gene | Description |
| --- | --- |
| Rv3446c | Unknown, may contain a possible ABC transporter signature (deleted in M. leprae) |

*Names of genes of these organisms were given arbitrarily by the authors of this paper.
Gene not identified by BLAST, data obtained from [1], GenBank accession no. U34848 and AAC44033.
The gene is present in the sequence, but not annotated (name given arbitrarily by authors of this paper).
§Genes identified by BLAST as well as data obtained from GenBank, accession no. AJ250015.
Orthologs in S. coelicolor are equally similar to family G and H.
ND, Not detected - not necessarily absent from genome but possibly not detected because of unfinished sequencing process.
No duplication, no duplication of this gene is present in this region.
No sequence data, no sequence data is available for this organism, published deletion information is included ([1] and others).
Deleted, deleted from the genome of this particular species or strain (# = deleted in only some strains of this species).
Frame, frameshift.
Stop, in-frame stop codon.
Stop$, stop codon corresponds to stop codon in M. tuberculosis H37Rv, which splits gene into Rv3870 and Rv3871.
Pseudo, confirmed pseudogene due to multiple frameshifts and stop codons.

Gey van Pittius et al. Genome Biology 2001 2:research0044.1   doi:10.1186/gb-2001-2-10-research0044
