Table 1

Comparison of the number of hmmsearch domain hits against the pufferfish database before and after the THoR process

| Domain name (SMART name) | N(SMART) | N(THoR) | N(THoR) - N(SMART) |
|---|---:|---:|---:|
| 14-3-3 homologs (14_3_3) | 9 | 9 | 0 |
| Domains in Ataxins and HMG-containing proteins (AXH) | 6 | 6 | 0 |
| Breast cancer carboxy-terminal domain (BRCT) | 31 | 39 | 8 |
| Bromo domain (BROMO) | 89 | 89 | 0 |
| Bulb-type mannose-specific lectins (B_lectin) | 1 | 2 | 1 |
| Chromatin organization modifier domain (CHROMO) | 62 | 69 | 7 |
| Calpain-like thiol protease family (CysPc) | 31 | 32 | 1 |
| Tandem repeat (DM15) | 6 | 6 | 0 |
| Endothelin (END) | 5 | 5 | 0 |
| Exonuclease (EXOIII) | 10 | 12 | 2 |
| Receptor for Ubiquitination Targets (FBOX) | 34 | 45 | 11 |
| Formin homology 2 domain (FH2) | 20 | 35 | 15 |
| Fibronectin type 1 domain (FN1) | 49 | 49 | 0 |
| High mobility group (HMG) | 82 | 84 | 2 |
| Homeodomain (HOX) | 319 | 323 | 4 |
| Protein kinase C-related kinase homology region 1 homologs (HR1) | 19 | 19 | 0 |
| Short calmodulin-binding motif containing conserved Ile and Gln residues (IQ) | 228 | 226 | -2 |
| Kyrpides, Ouzounis, Woese motif (KOW) | 12 | 12 | 0 |
| Kringle (KR) | 33 | 34 | 1 |
| Zinc-binding domain present in Lin-11, Isl-1, Mec-3 (LIM) | 204 | 214 | 10 |
| Pleckstrin homology (PH) | 373 | 436 | 63 |
| Zinc finger (PHD) | 216 | 303 | 87 |
| Phosphoinositide 3-kinase, region postulated to contain C2 domain (PI3K_C2) | 10 | 12 | 2 |
| Motif in proteasome subunits, Int-6, Nip-1 and TRIP-15 (PINT) | 16 | 17 | 1 |
| Phosphatidylinositol phosphate kinases (PIPKc) | 14 | 15 | 1 |
| Domain found in a protein subunit of human RNase MRP and RNase P ribonucleoprotein complexes and archaeal proteins (POP4) | 1 | 1 | 0 |
| Domain found in Plexins, Semaphorins and Integrins (PSI) | 116 | 119 | 3 |
| Domain with conserved PWWP motif (PWWP) | 27 | 29 | 2 |
| Guanine nucleotide exchange factor for Rho/Rac/Cdc42-like GTPases (RhoGEF) | 99 | 111 | 12 |
| Src homology 2 domains (SH2) | 142 | 153 | 11 |
| Src homology 3 domains (SH3) | 358 | 373 | 15 |
| Staphylococcal nuclease homologs (SNc) | 3 | 6 | 3 |
| Domain in short gastrulation protein and chordin (SOG) | 3 | 3 | 0 |
| snRNP Sm proteins (Sm) | 18 | 18 | 0 |
| Topoisomerase II (TOP2c) | 3 | 3 | 0 |
| Tetratricopeptide repeats (TPR) | 573 | 552 | -21 |
| Tudor domain (TUDOR) | 25 | 44 | 19 |
| Domain present in VPS-27, Hrs and STAM (VHS) | 15 | 14 | -1 |

N(SMART) is the number of domains found in the predicted set of pufferfish proteins by hmmsearch with SMART thresholds. N(THoR) is the number of domains found in the same protein set by hmmsearch with SMART thresholds, using the alignment created by THoR. N(THoR) - N(SMART) is the difference between the THoR and SMART results; a negative value means that THoR detected fewer domains than SMART. The SMART domain families COLIPASE, ChW, CheW, Galanin, IL10, IL2, LIGANc, POLIIIc, POX and REC were used as negative controls for benchmarking. None of these domains was expected to hit the pufferfish database, because they are prokaryote-specific or mammal-specific; indeed, THoR detected no pufferfish homologs for any of them. The AAA and WD40 domains were each searched by THoR with only one round of PSI-BLAST rather than the full five rounds, because these families were known to contain many members and a full search would have taken an impractically long time. They are not shown because hmmbuild encountered memory-allocation errors for them and their search iterations did not complete.
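The last column is plain subtraction, but its sign carries the interpretation: positive values are extra domains recovered by the THoR-derived alignment, negative values are domains THoR missed relative to SMART. The following is a minimal Python sketch, assuming a hypothetical `ROWS` list transcribed from a subset of Table 1 and hypothetical helper names `thor_gain` and `classify`; it is an illustration of reading the table, not the authors' code.

```python
# Hypothetical subset of Table 1 rows: (SMART name, N(SMART), N(THoR)).
# Values are copied from the table; the helpers below are illustrative only.
ROWS = [
    ("BRCT", 31, 39),
    ("PH", 373, 436),
    ("PHD", 216, 303),
    ("IQ", 228, 226),
    ("TPR", 573, 552),
    ("VHS", 15, 14),
    ("BROMO", 89, 89),
]


def thor_gain(n_smart: int, n_thor: int) -> int:
    """Return N(THoR) - N(SMART); positive means THoR found extra domains."""
    return n_thor - n_smart


def classify(rows):
    """Split families into gained, lost and unchanged by the sign of the difference."""
    gained, lost, unchanged = [], [], []
    for name, n_smart, n_thor in rows:
        diff = thor_gain(n_smart, n_thor)
        if diff > 0:
            gained.append((name, diff))
        elif diff < 0:
            lost.append((name, diff))
        else:
            unchanged.append(name)
    return gained, lost, unchanged


if __name__ == "__main__":
    gained, lost, unchanged = classify(ROWS)
    print("gained:", gained)
    print("lost:", lost)
    print("unchanged:", unchanged)
```

For this subset, BRCT, PH and PHD fall in the gained group, IQ, TPR and VHS in the lost group, and BROMO is unchanged, matching the signs in the table.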

Dickens and Ponting Genome Biology 2003 4:R52   doi:10.1186/gb-2003-4-8-r52
