Table 1

Consensus sequences for the most significant groups of word pairs

| Group | Hexamer list for word 1 | Compiled sequence 1 | TF for consensus 1 | Hexamer list for word 2 | Compiled sequence 2 | TF for consensus 2 | Number of word pairs |
|---|---|---|---|---|---|---|---|
| 1 | GAGATG GCGATG AGATGA CGATGA GATGAG ATGAGA ATGAGC TGAGAT TGAGCT GAGATG AGATGA AGCTCA | GMGATGAGMTSA | Unknown (PAC motif [38]) | TGAAAA GAAAAA AAAAAT AAAATT AAATTT | TGAAAATTT | Unknown (RRPE motif [38]) | 75 |
| 2 | AAGTGA AATGAA AGTGAA ATGAAA CTGAAA TGAAAA | ANTGAAAAA | Unknown (RRPE motif [38]) | GAAAAA GAAAAT AAAATT AAATTT | GAAAAWTT | Unknown (RRPE motif [38]) | 40 |
| 3 | GTTCCC CTCCCC ACCCCT TCCCCT | GYWCCCCT | (motif 38 [26]) | CCCTTT CCTTTT CCTTAT | CCCTTWT | (motif 38 [26]) | 5 |
| 4* | GGCGGC GCGGCT | GGCGGCT | Ume6p | GTGGCA GGCAAA | GTGGCAAA | Rpn4p | 2 |
| 5 | CCCTTT CCTTTT | CCCTTTT | Msn2/4p-like | GGAGAA GGGAAA | GGRGAAA | Hsf1p | 2 |
| 6 | CGGCGG | CGGCGG | Ume6p | TACCCC ACCCCA CCCCAA | TACCCCAA | Mig1p | 3 |
| 7* | CCGCGG | CCGCGG | Pdr1/3p | CGGAAA | CGGAAA | Unknown | 1 |
| 8 | AAACGC GACGCG AACGCG ACGCGT ACGCGA TCGCGT CGCGTC | ARWCGCGW | Mbp1p | CGCGAA ACGAAA GCGAAA CGAAAC CGAAAA | CRCGAAAM | Swi4/6p | 9 |
| 9 | TCACGT CACGTG ACGTGC | TCACGTGC | Cbf1p | ACTGTG CTGTGG TGTGGC GTGGCT | ACTGTGGCT | Met31/32p | 6 |
| 10 | TATTTT TTTTGT TTTGTT ATTGTT | TWTTGTT | Fkh1/2p | TGTTTA GTTTAC | TGTTTAC | Fkh1/2p | 4 |
| 11 | TTTGTT TTGTTT | TTTGTTT | Fkh1/2p | TTTTTC TTTTTT | TTTTTY | TnC | 4 |
| 12* | TCGTTT CGTTTA | TCGTTTA | Ecm22p \| Upc2p | CCGATA CGATAA | CCGATAA | Hap1p | 4 |
| 13 | TCGTTT CGTTTA | TCGTTTA | Ecm22p \| Upc2p | TATTGT ATTGTT | TATTGTT | Rox1p | 2 |
| 14 | CGTTTC GTTTCT | CGTTTCT | Ecm22p \| Upc2p | TTCTTT TCTTTT CTTTTT | TTCTTTTT | TnC | 5 |

The output P × C matrix of word pairs (P) that were significantly associated (p < 0.001) with at least five environmental conditions (C) was ordered using hierarchical clustering. Numbers correspond to the groups of overlapping word pairs indicated in Figure 4. Asterisks denote sequence pairs whose involvement in multifactorial regulation has not been previously reported. Compiled sequences were assembled from groups of word pairs that were found in adjacent rows in the ordering of K-S p-values. Because individual words had to pass all three statistical tests to be included in the output matrix, these consensus sequences may not reflect the actual biological specificities of conserved transcription factor binding sites (refer to [26,36] for a more complete list). Residues are shown in bold if they are invariant in at least two hexamers. Multiple transcription factors that may bind the same sequence motif are separated by |. IUPAC codes used: K (G or T); M (A or C); N (any base); R (A or G); S (C or G); W (A or T); Y (C or T).
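
As a reading aid for the compiled sequences, the following minimal Python sketch checks at which offsets a hexamer is compatible with an IUPAC consensus. It is an illustration only and not the procedure the authors used to assemble the consensus sequences; the function name `match_offsets` and the `IUPAC` lookup table are hypothetical names introduced here. The example hexamers and the consensus are taken from word 1 of group 1.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
# K, M, R, S and W are listed in the table legend; Y and N are standard
# IUPAC codes that also occur in the compiled sequences of the table.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT",
    "Y": "CT", "N": "ACGT",
}


def match_offsets(hexamer, consensus):
    """Return every offset at which `hexamer` is compatible with `consensus`.

    A hexamer is compatible at an offset if each of its bases is one of
    the bases allowed by the consensus code at the aligned position.
    """
    offsets = []
    for start in range(len(consensus) - len(hexamer) + 1):
        window = consensus[start:start + len(hexamer)]
        if all(base in IUPAC[code] for base, code in zip(hexamer, window)):
            offsets.append(start)
    return offsets


# Distinct hexamers listed for word 1 of group 1.
group1_word1 = [
    "GAGATG", "GCGATG", "AGATGA", "CGATGA", "GATGAG",
    "ATGAGA", "ATGAGC", "TGAGAT", "TGAGCT", "AGCTCA",
]

for word in group1_word1:
    print(word, match_offsets(word, "GMGATGAGMTSA"))
```

A hexamer can be compatible with the consensus at more than one offset, which is one possible reading of why some hexamers are listed twice for group 1.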

Chiang et al. Genome Biology 2003 4:R43 doi:10.1186/gb-2003-4-7-r43