Summary of key features of MotifCluster and a selection of other programs that perform clustering of motifs or remote homology detection
| Strategy | Program | Overview of program | Publication |
|---|---|---|---|
| Clustering proteins by motifs they contain | MotifCluster | Takes aligned or unaligned protein and nucleotide sequences and a MEME file showing motifs; allows clustering of the sequences according to the motifs they contain, and visualization of the motifs on the aligned and unaligned sequences and three-dimensional structures | This article |
| Clustering of transcription factor binding sites (in DNA) | MCAST | Takes list of transcription factor binding sites as input: uses hidden Markov models to find cis-regulatory modules in DNA | [21] |
| | Cluster-Buster | Takes list of transcription factor binding sites as input: uses Forward algorithm and expected uniform distribution to find motif co-occurrence in DNA | [22] |
| | ClusterDraw | Takes list of transcription factor binding sites as input: uses r-scan algorithm and sweep over parameter values to visualize significant clusters as peaks on the DNA sequence | [23] |
| | COMET | Calculates significance of collection of position-specific score matrices that appear in order: can apply to DNA or protein, in principle | [24] |
| | PEAKS | Calculates significance of collection of transcription factor binding sites that appear at specified distance from transcription start site or other feature in the DNA | [25] |
| | CompMoby | Aligns all pairs of motifs that appear significant in different promoters, then groups these into clusters using the CAST algorithm: DNA-specific | [26] |
| | CREME | Identifies groups of DNA motifs that co-occur significantly within a defined distance using both order-dependent and order-independent models | [27] |
| | PHYLOCLUS | Uses Bayesian method to find clusters of evolutionarily conserved DNA motifs that appear in different promoters | [28] |
| | INCLUSive | Clusters genes based on microarray analysis: feeds promoters to Gibbs sampler to find DNA motifs overrepresented in each cluster | [29] |
| Identifying kernels for SVMs* | SVM kernels | Introduces kernels based on k-word occurrences and best BLAST hit for SVM clustering: does not focus on conserved motifs | [30] |
| | WCM (word correlation matrices) | Introduces k-word kernel for SVM clustering based on correlations in appearance of pairs of k-words: does not focus on conserved motifs | [31] |
| | ODH (oligomer distance histograms) | Introduces new kernel for SVM clustering based on histograms of distances between all words in protein: does not focus on conserved motifs | [32] |
| Iterative BLAST | Shotgun | BLAST-based approach for identifying remote homologs by iterative searches: not motif-based | [3] |
| | DivergentSet | Among other features, can perform BLAST and PSI-BLAST versions of Shotgun and choose representative sequences of each group: not motif-based | [20] |
| | Cascade PSI-BLAST | Performs iterative steps of PSI-BLAST, otherwise like Shotgun: not motif-based | [33] |
| | ProClust | Performs graph-based connection of proteins based on pairwise sequence similarity: not motif-based | [34] |
| k-word clustering | CD-Hit | Clusters proteins based on shared segments of overall sequence, not by motifs already known to be significant | [35] |
| Profile-profile alignment | COMPASS | Performs profile-profile alignments for remote homology detection: assesses statistical significance of matches in the profiles overall, rather than specifically using shared motifs | [1] |
| Clustering of motifs | STAMP | Aligns motifs with one another so that relationships among motifs can be detected; performs many other tasks for promoter characterization, but specific to promoters | [36] |
| | TAMO | Performs many functions for cis-regulatory analysis: is able to cluster DNA motifs with one another | [37] |
| | SOMBRERO | Aligns and clusters DNA motifs with one another to improve transcription factor binding site searches | [38] |
| Identification of functions in labeled structures | FunClust | Takes set of three-dimensional structures with annotated functions; identifies three-dimensional motif fragments that are common to the structures with each function | [39] |
*SVMs are support vector machines, a common machine learning approach to pattern classification. A kernel is a function that calculates the inner product of each pair of input vectors in an abstract space; computing it is an important step in the process, and the choice of kernel affects the clustering.
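
To make the footnote concrete, here is a minimal sketch of a kernel built from k-word occurrences. It is a generic illustration under stated assumptions, not the specific kernels introduced in [30], [31] or [32]: the function names (`kword_counts`, `kword_kernel`, `kernel_matrix`), the default word length `k=3`, and the toy sequences are all hypothetical. Each sequence is mapped to a vector of k-word counts, and the kernel value for a pair of sequences is the inner product of their two count vectors.

```python
from collections import Counter


def kword_counts(seq, k=3):
    """Count every overlapping k-word (k-mer) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kword_kernel(seq_a, seq_b, k=3):
    """Inner product of the two sequences' k-word count vectors."""
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Only k-words present in both sequences contribute a nonzero term.
    return sum(n * counts_b[word] for word, n in counts_a.items() if word in counts_b)


def kernel_matrix(seqs, k=3):
    """Symmetric matrix of kernel values for all pairs of sequences."""
    size = len(seqs)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            matrix[i][j] = matrix[j][i] = kword_kernel(seqs[i], seqs[j], k)
    return matrix


# Toy sequences (hypothetical, for illustration only).
example = kernel_matrix(["ABCABC", "ABCD"], k=3)
```

A matrix like `example` is the kind of all-pairs input that an SVM with a precomputed kernel consumes; a different word length or a different feature map yields a different matrix, which is why the kernel choice affects the resulting grouping.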
Hamady et al. Genome Biology 2008 9:R128 doi:10.1186/gb-2008-9-8-r128