Figure 1.
Representation of transcription-factor binding sites. (a) An example of six sequences and the consensus sequence that can be derived from them.
The consensus simply gives the nucleotide that is found most often at each position;
the alternate (or degenerate) consensus sequence gives the possible nucleotides at
each position, where R represents A or G and N represents any nucleotide. (b) A position weight matrix for the -10 region of E. coli promoters, as an example of a well-studied regulatory element. The boxed elements
correspond to the consensus sequence (TATAAT). The score for each nucleotide at each
position is derived from the observed frequency of that nucleotide at the corresponding
position in the input set of promoters. The score for any particular site is the sum
of the individual matrix values for that site's sequence; for example, the score for
TATAAT is 85. Note that the matrix values in (b) do not come from the example shown
in (a) but rather are derived from a much larger collection of -10 promoter regions.
Adapted, with permission, from [3].
Bulyk Genome Biology 2003 5:201 doi:10.1186/gb-2003-5-1-201
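
The two representations described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used to build the figure: the six sequences in `SITES` and the values in `MATRIX` are invented for this example (they are not the sequences of panel (a) or the matrix of panel (b), so the score computed here for TATAAT is not the 85 quoted in the caption), and the function names are hypothetical. Ties for the most frequent nucleotide are broken alphabetically, which is also an assumption.

```python
from collections import Counter

# Standard IUPAC codes for sets of nucleotides (R = A or G, N = any, etc.).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Hypothetical aligned binding sites (NOT the sequences in panel (a)).
SITES = ["TATAAT", "TATGAT", "TATAAT", "TATAGT", "TATACT", "TATGTT"]

# Hypothetical weight matrix (NOT the values in panel (b)):
# one row per nucleotide, one column per position.
MATRIX = {
    "A": [-3, 9, -2, 7, 8, -4],
    "C": [-2, -5, -1, -3, -4, -6],
    "G": [-1, -6, -2, -1, -3, -5],
    "T": [8, -3, 6, -2, -1, 10],
}


def consensus(sites):
    """Return the most frequent nucleotide at each position."""
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        # Sorting first makes ties resolve alphabetically (an assumption).
        result.append(max(sorted(counts), key=counts.get))
    return "".join(result)


def degenerate_consensus(sites):
    """Return the IUPAC code covering every nucleotide seen at each position."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sites))


def score_site(site, matrix):
    """Sum the matrix value of the observed nucleotide at each position."""
    return sum(matrix[base][i] for i, base in enumerate(site))
```

With these invented inputs, `consensus(SITES)` gives `TATAAT`, `degenerate_consensus(SITES)` gives `TATRNT`, and `score_site("TATAAT", MATRIX)` gives 48. The degenerate form records that position 4 varies between A and G and that position 5 is unconstrained, information the plain consensus discards; the matrix goes further by weighting each alternative rather than treating all permitted nucleotides as equivalent.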