Figure 2.
Multiple alignment of the occurrences of the NEAT domain, generated with the ClustalW
program [28]. SMART [9,10], which identifies repeats using prospero [29,30], was used
to search for domains in some sequences. The internal repeats detected in this manual
analysis were extracted as subsequences and used to build the first
alignment. We then proceeded iteratively, building a hidden Markov model
(HMM) of the alignment, searching the NCBI nonredundant protein database with this
HMM [31,32], and adding the significant hits to the alignment. With the
final HMM (derived from the alignment presented in this figure), no further similar sequences
were detected below a standard E-value threshold (E = 0.001). The consensus found in 70%
of the sequences is reported below the alignment.
Residue ranges are listed next to the protein code name. The letters h, l, and p indicate
hydrophobic, aliphatic, and polar residues, respectively. Hydrophobic residues are
highlighted in dark blue, polar residues in green, and a fairly conserved arginine
(R in the consensus sequence) in red. Codes are the same as in Figure 1 and Table 1. The predicted secondary structure [8], mostly beta sheet, is displayed at the bottom
of the figure. Although the B_anth sequence corresponds to a fragment of the domain,
examination of the corresponding DNA sequence indicates that the actual translation
product might extend further in both the amino- and carboxy-terminal directions.
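
The iterative procedure described in this caption (build a model, search the database, add significant hits, repeat until nothing new appears) can be sketched as a simple convergence loop. This is only an illustration under stated assumptions: `build_model` and `search` are hypothetical callables standing in for the HMM construction and database search tools, and `max_rounds` is an added safety limit that the caption does not mention.

```python
def iterate_to_convergence(seed, build_model, search, e_threshold=1e-3, max_rounds=50):
    """Grow a set of sequence names until a search finds no new significant hit.

    seed        -- names of the sequences in the first alignment
    build_model -- hypothetical callable: list of names -> model object
    search      -- hypothetical callable: model -> iterable of (name, e_value)
    e_threshold -- hits with an E-value below this are treated as significant
    max_rounds  -- safety limit on the number of rounds (an assumption)
    """
    members = set(seed)
    for _ in range(max_rounds):
        model = build_model(sorted(members))
        hits = {name for name, e_value in search(model) if e_value < e_threshold}
        new = hits - members
        if not new:
            # Converged: the current model detects no further sequences.
            return members
        members |= new
    return members
```

The loop stops as soon as a round adds nothing, which corresponds to the state reported for the final HMM, where no further sequences fall below E = 0.001.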
Andrade et al. Genome Biology 2002 3:research0047.1 doi:10.1186/gb-2002-3-9-research0047
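
As a rough illustration of how a 70% consensus line with the class letters h, l, and p could be derived from an alignment, the sketch below reports a residue when it alone reaches the threshold and otherwise falls back to the most specific residue class that does. The class memberships in `CLASSES`, the use of `.` for columns without consensus, and the choice to count gaps against the threshold are all assumptions made for this example; the caption does not define them.

```python
from collections import Counter

# Illustrative residue classes, most specific first (assumed, not from the caption).
CLASSES = [
    ("l", set("ILV")),         # aliphatic
    ("h", set("ACFILMVWY")),   # hydrophobic
    ("p", set("DEHKNQRST")),   # polar
]

def consensus(alignment, threshold=0.7):
    """Return one consensus symbol per column of equal-length aligned sequences."""
    n = len(alignment)
    symbols = []
    for column in zip(*alignment):
        counts = Counter(column)
        counts.pop("-", None)  # gaps never support a consensus symbol
        residue, top = counts.most_common(1)[0] if counts else ("", 0)
        symbol = "."
        if top / n >= threshold:
            symbol = residue  # a single residue reaches the threshold
        else:
            for code, members in CLASSES:
                in_class = sum(c for r, c in counts.items() if r in members)
                if in_class / n >= threshold:
                    symbol = code
                    break
        symbols.append(symbol)
    return "".join(symbols)
```

Checking the aliphatic class before the hydrophobic one matters because, in these assumed definitions, the former is a subset of the latter, so the more informative letter is reported when both qualify.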