Table 1

Consensus sequences for the most significant groups of word pairs


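Group 1
  Hexamer list for word 1:
    GAGATG
    GCGATG
      AGATGA
      CGATGA
        GATGAG
          ATGAGA
          ATGAGC
            TGAGAT
            TGAGCT
              GAGATG
                AGATGA
                AGCTCA
  Compiled sequence 1: GMGATGAGMTSA
  TF for consensus 1: Unknown (PAC motif [38])
  Hexamer list for word 2:
    TGAAAA
      GAAAAA
        AAAAAT
          AAAATT
            AAATTT
  Compiled sequence 2: TGAAAATTT
  TF for consensus 2: Unknown (RRPE motif [38])
  Number of word pairs: 75

Group 2
  Hexamer list for word 1:
    AAGTGA
      AATGAA
      AGTGAA
        ATGAAA
        CTGAAA
          TGAAAA
  Compiled sequence 1: ANTGAAAAA
  TF for consensus 1: Unknown (RRPE motif [38])
  Hexamer list for word 2:
    GAAAAA
    GAAAAT
      AAAATT
        AAATTT
  Compiled sequence 2: GAAAAWTT
  TF for consensus 2: Unknown (RRPE motif [38])
  Number of word pairs: 40

Group 3
  Hexamer list for word 1:
    GTTCCC
      CTCCCC
        ACCCCT
        TCCCCT
  Compiled sequence 1: GYWCCCCT
  TF for consensus 1: (motif 38 [26])
  Hexamer list for word 2:
    CCCTTT
      CCTTTT
      CCTTAT
  Compiled sequence 2: CCCTTWT
  TF for consensus 2: (motif 38 [26])
  Number of word pairs: 5

Group 4*
  Hexamer list for word 1:
    GGCGGC
      GCGGCT
  Compiled sequence 1: GGCGGCT
  TF for consensus 1: Ume6p
  Hexamer list for word 2:
    GTGGCA
      GGCAAA
  Compiled sequence 2: GTGGCAAA
  TF for consensus 2: Rpn4p
  Number of word pairs: 2

Group 5
  Hexamer list for word 1:
    CCCTTT
      CCTTTT
  Compiled sequence 1: CCCTTTT
  TF for consensus 1: Msn2/4p-like
  Hexamer list for word 2:
    GGAGAA
      GGGAAA
  Compiled sequence 2: GGRGAAA
  TF for consensus 2: Hsf1p
  Number of word pairs: 2

Group 6
  Hexamer list for word 1:
    CGGCGG
  Compiled sequence 1: CGGCGG
  TF for consensus 1: Ume6p
  Hexamer list for word 2:
    TACCCC
    ACCCCA
      CCCCAA
  Compiled sequence 2: TACCCCAA
  TF for consensus 2: Mig1p
  Number of word pairs: 3

Group 7*
  Hexamer list for word 1:
    CCGCGG
  Compiled sequence 1: CCGCGG
  TF for consensus 1: Pdr1/3p
  Hexamer list for word 2:
    CGGAAA
  Compiled sequence 2: CGGAAA
  TF for consensus 2: Unknown
  Number of word pairs: 1

Group 8
  Hexamer list for word 1:
    AAACGC
      GACGCG
      AACGCG
        ACGCGT
        ACGCGA
        TCGCGT
          CGCGTC
  Compiled sequence 1: ARWCGCGW
  TF for consensus 1: Mbp1p
  Hexamer list for word 2:
    CGCGAA
      ACGAAA
      GCGAAA
        CGAAAC
        CGAAAA
  Compiled sequence 2: CRCGAAAM
  TF for consensus 2: Swi4/6p
  Number of word pairs: 9

Group 9
  Hexamer list for word 1:
    TCACGT
      CACGTG
        ACGTGC
  Compiled sequence 1: TCACGTGC
  TF for consensus 1: Cbf1p
  Hexamer list for word 2:
    ACTGTG
      CTGTGG
        TGTGGC
          GTGGCT
  Compiled sequence 2: ACTGTGGCT
  TF for consensus 2: Met31/32p
  Number of word pairs: 6

Group 10
  Hexamer list for word 1:
    TATTTT
      TTTTGT
        TTTGTT
        ATTGTT
  Compiled sequence 1: TWTTGTT
  TF for consensus 1: Fkh1/2p
  Hexamer list for word 2:
    TGTTTA
      GTTTAC
  Compiled sequence 2: TGTTTAC
  TF for consensus 2: Fkh1/2p
  Number of word pairs: 4

Group 11
  Hexamer list for word 1:
    TTTGTT
      TTGTTT
  Compiled sequence 1: TTTGTTT
  TF for consensus 1: Fkh1/2p
  Hexamer list for word 2:
    TTTTTC
    TTTTTT
  Compiled sequence 2: TTTTTY
  TF for consensus 2: TnC
  Number of word pairs: 4

Group 12*
  Hexamer list for word 1:
    TCGTTT
      CGTTTA
  Compiled sequence 1: TCGTTTA
  TF for consensus 1: Ecm22p | Upc2p
  Hexamer list for word 2:
    CCGATA
      CGATAA
  Compiled sequence 2: CCGATAA
  TF for consensus 2: Hap1p
  Number of word pairs: 4

Group 13
  Hexamer list for word 1:
    TCGTTT
      CGTTTA
  Compiled sequence 1: TCGTTTA
  TF for consensus 1: Ecm22p | Upc2p
  Hexamer list for word 2:
    TATTGT
      ATTGTT
  Compiled sequence 2: TATTGTT
  TF for consensus 2: Rox1p
  Number of word pairs: 2

Group 14
  Hexamer list for word 1:
    CGTTTC
      GTTTCT
  Compiled sequence 1: CGTTTCT
  TF for consensus 1: Ecm22p | Upc2p
  Hexamer list for word 2:
    TTCTTT
      TCTTTT
        CTTTTT
  Compiled sequence 2: TTCTTTTT
  TF for consensus 2: TnC
  Number of word pairs: 5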

The output P × C matrix of word pairs (P) that were significantly associated (p < 0.001) with five or more environmental conditions (C) was ordered using hierarchical clustering. Numbers correspond to the groups of overlapping word pairs indicated in Figure 4. Asterisks denote sequence pairs whose involvement in multifactorial regulation has not been previously reported. Compiled sequences were assembled from groups of word pairs that were found in adjacent rows in the ordering of Kolmogorov-Smirnov (K-S) p-values. Because individual words had to pass all three statistical tests to be included in the output matrix, these consensus sequences may not reflect the actual biological specificities of conserved transcription factor binding sites (refer to [26,36] for a more complete list). Residues are shown in bold if they are invariant in at least two hexamers. TF, transcription factor; multiple transcription factors that may bind the same sequence motif are separated by |. IUPAC codes used: K (G or T); M (A or C); N (any base); R (A or G); S (C or G); W (A or T); Y (C or T).
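To make the relation between the hexamer lists and the compiled sequences concrete, the minimal Python sketch below stacks hexamers at given offsets and collapses each column into an IUPAC code. It is an illustration, not the authors' procedure: the function name `compile_consensus` is hypothetical, and offsets are supplied explicitly because the staggered layout of the hexamer lists only approximates the alignment (in group 4, for example, GGCAAA starts two bases after GTGGCA). The sketch reproduces simple rows such as group 8, word 2, but not every published row; in group 2, word 1, the hexamer AAGTGA extends beyond the compiled sequence ANTGAAAAA.

```python
from collections import defaultdict

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def compile_consensus(aligned_hexamers):
    """Collapse (offset, hexamer) pairs into one IUPAC consensus string.

    Hypothetical helper: every base observed at a position is kept, so
    variable positions become ambiguity codes. Positions covered by no
    hexamer (or holding unrecognized symbols) are reported as N.
    """
    columns = defaultdict(set)
    for offset, word in aligned_hexamers:
        for i, base in enumerate(word):
            columns[offset + i].add(base)
    if not columns:
        return ""
    start, end = min(columns), max(columns)
    return "".join(
        IUPAC.get(frozenset(columns[pos]), "N") for pos in range(start, end + 1)
    )


# Group 8, word 2: offsets are counted in bases from the start of CGCGAA.
group8_word2 = [(0, "CGCGAA"), (1, "ACGAAA"), (1, "GCGAAA"),
                (2, "CGAAAC"), (2, "CGAAAA")]
# Group 4, word 2: GGCAAA starts two bases after GTGGCA.
group4_word2 = [(0, "GTGGCA"), (2, "GGCAAA")]

print(compile_consensus(group8_word2))  # CRCGAAAM
print(compile_consensus(group4_word2))  # GTGGCAAA
```

Keeping every observed base is the most permissive way to merge a column; a stricter rule (for example, requiring a base to appear in at least two hexamers) would give shorter or less degenerate consensus strings.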
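The selection and ordering of word pairs described in the caption can be sketched as follows. This is an assumption-laden illustration rather than the published pipeline: it takes a hypothetical mapping from each word pair to its per-condition p-values, keeps pairs with p < 0.001 in five or more conditions, and orders the survivors with naive average-linkage agglomerative clustering on the Euclidean distance between -log10(p) profiles. The distance measure, the linkage rule, and the names `select_pairs` and `cluster_order` are our choices, not details taken from the paper.

```python
import math
from itertools import combinations

P_CUTOFF = 0.001     # significance threshold stated in the caption
MIN_CONDITIONS = 5   # "five or more environmental conditions"


def select_pairs(pvalues):
    """Keep word pairs with p < P_CUTOFF in at least MIN_CONDITIONS conditions."""
    return {
        pair: row
        for pair, row in pvalues.items()
        if sum(p < P_CUTOFF for p in row) >= MIN_CONDITIONS
    }


def _profile(row):
    # -log10 transform so stronger associations get larger values;
    # p = 0 is clamped (an assumption) to avoid a math domain error.
    return [-math.log10(max(p, 1e-300)) for p in row]


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(rows):
    """Return row keys in the leaf order of an average-linkage dendrogram."""
    profiles = {key: _profile(row) for key, row in rows.items()}
    clusters = [[key] for key in rows]  # each cluster lists its leaves in order
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            dists = [_distance(profiles[a], profiles[b])
                     for a in clusters[i] for b in clusters[j]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]  # concatenating keeps leaves contiguous
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0] if clusters else []


# Hypothetical p-values for four word pairs across six conditions.
pvalues = {
    "pairA": [1e-5, 1e-6, 1e-5, 1e-4, 1e-5, 0.2],
    "pairB": [2e-5, 1e-6, 1e-5, 2e-4, 1e-5, 0.3],
    "pairC": [0.5, 1e-5, 1e-6, 1e-5, 1e-4, 1e-6],
    "pairD": [0.01, 0.02, 1e-4, 0.5, 0.3, 0.9],  # significant in one condition only
}
kept = select_pairs(pvalues)
print(sorted(kept))          # ['pairA', 'pairB', 'pairC']
print(cluster_order(kept))   # ['pairC', 'pairA', 'pairB']
```

Rows that end up adjacent in this order, such as pairA and pairB here, are the kind of neighbours that would be grouped before compiling sequences. The pairwise search above scales poorly; for a matrix of realistic size, a library routine such as SciPy's hierarchical clustering would be the practical choice.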

Chiang et al. Genome Biology 2003, 4:R43. doi:10.1186/gb-2003-4-7-r43