The BLAST algorithm. (a) Given a query sequence of length L, BLAST derives a list of words of length w, where w = 3 for amino-acid sequences (shown) and 11 for nucleotide sequences. There are at most L − w + 1 such words. This word list is then expanded to include similar high-scoring words: each candidate word is scored against a query word using a scoring matrix such as PAM250 or BLOSUM62, and only those that score more than the neighborhood word score threshold T are kept. For typical parameter values, this results in about 50 words per residue of the query sequence. (b) The database sequences are scanned for exact matches to words in the high-scoring word list. (c) For each word match, the alignment is extended in both directions, and alignments that score higher than the score threshold S are reported.
Pertsemlidis and Fondon Genome Biology 2001 2:reviews2002.1 doi:10.1186/gb-2001-2-10-reviews2002
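The three steps in the caption can be sketched in Python as a minimal, ungapped illustration. It is not the NCBI BLAST implementation, and several pieces are assumptions: the hypothetical `toy_score` function (+5 for identical residues, −1 otherwise) stands in for a real matrix such as BLOSUM62, the default values of `T`, `S` and `X` are illustrative only, and the rule that stops each extension once its running score falls more than `X` below the best score seen so far (an X-drop rule) is not described in the caption.

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for PAM250/BLOSUM62: +5 for identity, -1 otherwise."""
    return 5 if a == b else -1


def query_words(query: str, w: int) -> list[tuple[int, str]]:
    """Step (a): the L - w + 1 overlapping words of length w in the query."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def neighborhood(word: str, T: int, score=toy_score) -> list[str]:
    """Step (a): all words of the same length that score more than T against `word`."""
    return [
        "".join(cand)
        for cand in product(ALPHABET, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, cand)) > T
    ]


def extend(query, subject, qi, si, w, X, score=toy_score):
    """Step (c): ungapped extension of a word hit in both directions.

    Assumption: each direction stops once its running score drops more
    than X below the best score seen so far (an X-drop rule).
    Returns (score, query_start, subject_start, length).
    """
    seed = sum(score(query[qi + k], subject[si + k]) for k in range(w))

    # Extend to the right of the seed word.
    best_right = run = right_len = 0
    i, j = qi + w, si + w
    while i < len(query) and j < len(subject):
        run += score(query[i], subject[j])
        if run > best_right:
            best_right, right_len = run, i - (qi + w) + 1
        elif best_right - run > X:
            break
        i, j = i + 1, j + 1

    # Extend to the left of the seed word.
    best_left = run = left_len = 0
    i, j = qi - 1, si - 1
    while i >= 0 and j >= 0:
        run += score(query[i], subject[j])
        if run > best_left:
            best_left, left_len = run, qi - i
        elif best_left - run > X:
            break
        i, j = i - 1, j - 1

    total = seed + best_left + best_right
    return total, qi - left_len, si - left_len, w + left_len + right_len


def blast_hits(query, database, w=3, T=8, S=20, X=10, score=toy_score):
    """Run steps (a)-(c); return alignments scoring higher than S."""
    # (a) Map every high-scoring neighborhood word to the query positions it came from.
    index: dict[str, list[int]] = {}
    for qpos, qword in query_words(query, w):
        for nword in neighborhood(qword, T, score):
            index.setdefault(nword, []).append(qpos)

    hits = set()
    for seq_id, subject in database.items():
        # (b) Scan each database sequence for exact matches to indexed words.
        for spos in range(len(subject) - w + 1):
            for qpos in index.get(subject[spos:spos + w], []):
                # (c) Extend the word hit; keep the alignment if it scores above S.
                total, qstart, sstart, length = extend(query, subject, qpos, spos, w, X, score)
                if total > S:
                    hits.add((seq_id, qstart, sstart, length, total))
    return sorted(hits)
```

For example, `blast_hits("MKTAYIAKQR", {"s1": "GGMKTAYIAKQRGG"})` includes the tuple `("s1", 0, 2, 10, 50)`: every word hit on that diagonal extends to the same full-length alignment, and the set removes the duplicates. With the toy scores, the neighborhood of a 3-letter word at T = 8 is the word itself plus its 57 single-substitution variants; a real substitution matrix gives neighborhoods whose size depends on which residues the word contains.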