Table 2

The ten most significant motifs in the core promoter sequences from -60 to +40, as identified by the MEME algorithm

Motif    Pictogram  Bits  Consensus        Number  E value
1        (image)    15.2  YGGTCACACTR      311     5.1e-415
2 DRE    (image)    13.3  WATCGATW         277     1.7e-183
3 TATA   (image)    13.2  STATAWAAR        251     2.1e-138
4 INR    (image)    11.6  TCAGTYKNNNTYNR   369     3.4e-117
5        (image)    15.2  AWCAGCTGWT       125     2.9e-93
6        (image)    15.1  KTYRGTATWTTT     107     1.9e-62
7        (image)    12.7  KNNCAKCNCTRNY    197     1.9e-63
8        (image)    14.7  MKSYGGCARCGSYSS  82      5.1e-29
9 DPE    (image)    15.4  CRWMGCGWKCGGTTS  56      1.9e-12
10       (image)    15.3  CSARCSSAACGS     40      8.3e-9

The identified motifs are shown in pictogram representation, where the height of each letter corresponds to its frequency relative to the single-nucleotide background used when running MEME. The information content in bits is also calculated with respect to this background. The consensus sequence represents only the highly conserved part of each motif, using the IUPAC code for ambiguous nucleotides. The number of occurrences refers to the sequences that MEME used to build each motif model. The E value is the expected number of motifs of the same width that would be found with equal or higher likelihood in the same number of random sequences having the same single-nucleotide frequencies as our promoter set.
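
To make the consensus and bits columns concrete, the following Python sketch expands an IUPAC consensus string into a regular expression and computes information content as relative entropy against a background. It is an illustration only, not the procedure MEME uses internally: the function names are hypothetical, the uniform background is an assumed placeholder rather than the single-nucleotide frequencies of the promoter set, and relative entropy is one common definition of information content that may differ in detail from the values reported in the table.

```python
import math
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression.

    Hypothetical helper: each IUPAC letter becomes a character class.
    """
    return re.compile("".join("[" + IUPAC[c] + "]" for c in consensus.upper()))


def column_bits(freqs, background):
    """Relative entropy (in bits) of one motif column against a background.

    freqs and background map each base to a probability; zero-frequency
    bases contribute nothing to the sum.
    """
    return sum(
        p * math.log2(p / background[base])
        for base, p in freqs.items()
        if p > 0
    )


def motif_bits(columns, background):
    """Total information content of a motif: the sum over its columns."""
    return sum(column_bits(col, background) for col in columns)


# Assumed uniform background; the real promoter background is not given here.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Consensus of motif 3 (TATA) from the table above.
tata = consensus_to_regex("STATAWAAR")
print(bool(tata.fullmatch("GTATAAAAG")))  # True: S=G, W=A, R=G

# A fully conserved column carries 2 bits against a uniform background.
print(column_bits({"A": 1.0}, uniform))  # 2.0
```

Against a non-uniform background, a conserved base that is rare in the background contributes more than 2 bits and a common one contributes less, which is why the legend ties both the pictogram letter heights and the bits column to the background frequencies.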

Ohler et al. Genome Biology 2002 3:research0087.1   doi:10.1186/gb-2002-3-12-research0087
