Table 5

Known and novel predicted regulatory elements, obtained when applying FastCompare to H. sapiens and M. musculus

| Sequence | Rank | D_ATG | W_ATG | Orientation | U/C | Experiment | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|---|
| (a) Known regulatory sequences | | | | | | | | |
| CCCGCCC | 1 | 256 | - | - | 2.26 | 8(7/1) | Sp1, GC box | Known Sp1 site, transcription from pol II promoter (p < 10⁻⁵) |
| GCCCCGCCC | 2 | 165 | - | - | 4.64 | 9(9/0) | Sp1, GC box | Known Sp1 site, variant from above |
| CCGGAAG | 4 | 160.5 | [0;700] | - | 2.37 | - | Ets1, Elk1 | Known Ets site, RNA metabolism (p < 10⁻⁶) |
| CACGTGAC | 18 | 122.5 | [0;600] | - | 4.90 | - | USF, GBP, SREBP-1 | Known Myc/Max site |
| TGACGTCA | 19 | 107 | [0;1000] | - | 4.24 | - | CREB | Known CREB site |
| CGCATGCG | 24 | 132 | [0;1600] | - | 4.26 | - | - | Known palindromic octamer sequence (POS) |
| CCAATCAG | 37 | 239 | [0;700] | - | 2.85 | 4(0/4) | NF-Y, CCAAT | Known CAAT box and CCAAT enhancer binding protein site |
| CGGAAGTGA | 51 | 94 | [0;1000] | - | 3.96 | - | STAT3 | Known GA-binding protein (GAB) site |
| CCGCCTC | 78 | 632 | [0;500] | - | 4.26 | 9(8/1) | - | Known insulin response element |
| CACGTGG | 82 | 429.5 | [0;300] | - | 2.09 | - | USF, Myc-Max | Known Myc/Max site, different from above |
| TAATCCCAG | 119 | 1258 | [100;2000] | ← (p < 10⁻¹⁴) | 7.06 | 3(1/2) | - | Similar to Bicoid (Drosophila), RNA processing (p < 10⁻⁵) |
| CACCTGC | 227 | 925 | [0;600] | - | 1.64 | 1(1/0) | E47, Lmo2 | Known ZEB site in vertebrates, Zfh-1 in Drosophila |
| ATTTGCAT | 234 | 729 | [0;300] | - | 1.95 | - | Oct-1 | Known Oct-1 site, chromatin assembly/disassembly (p < 10⁻⁸) |
| CCAAGGTCA | 242 | 801 | [0;1800] | - | 1.59 | - | - | Known HRE site |
| GGAAGTCCC | 253 | 124.5 | [0;300] | - | 2.60 | - | NFκB | Known NFκB site |
| CAGCTGC | 256 | 850 | [0;1600] | - | 1.03 | - | AP-4, HEN1 | Known AP-4, MyoD site |
| TTTCGCGC | 275 | 245 | | - | 2.42 | - | E2F | Known E2F site |
| (b) Novel predicted regulatory sequences | | | | | | | | |
| CGCAGGCGC | 6 | 127 | - | - | 2.76 | - | - | Unknown site |
| GCGCCGC | 13 | 311 | [0;1900] | ← (p < 10⁻⁵) | 1.41 | - | - | Unknown site |
| TCTCGCGA | 17 | 116 | [0;1700] | - | 4.45 | - | StuAp | Unknown site, similar to E2F |
| TTAAAAA | 52 | 1142 | [100;2000] | - | 2.19 | 21(0/21) | - | Unknown site |
| CTCCGCCC | 60 | 242.5 | [0;1300] | - | 3.85 | - | - | Unknown site, similar to Sp1 |
| CCCCTCCC | 67 | 563 | [0;500] | → (p < 10⁻⁴) | 5.12 | 1(0/1) | - | Unknown site, regulation of transcription, DNA-dependent (p < 10⁻⁵) |
| AAGATGGCG | 76 | 334 | [0;1300] | - | 1.14 | - | - | Unknown site |
| CTGCGCA | 89 | 199 | [0;300] | - | 3.63 | - | - | Unknown site |
| CCAGCCTGG | 123 | 1245 | [200;2000] | - | 4.42 | - | - | Unknown site |
| CCTGCCC | 162 | 788 | [0;1800] | - | 1.55 | 21(20/1) | E47/Sp1 | Unknown site |
| CCCTTTAAG | 166 | 230 | [0;800] | → (p < 10⁻¹⁰) | 3.45 | - | - | Unknown site |
| CCCCAGC | 207 | 785 | - | - | 1.42 | 22(22/0) | - | Unknown site |
| TACAACTCC | 225 | 154 | [0;700] | - | 2.51 | - | - | Unknown site |
| GTGAGCCAC | 248 | 1208 | - | → (p < 10⁻⁶) | 6.28 | - | - | Unknown site |
(a) For each known regulatory element, we show the best k-mer, its rank within the set of 284 highest-scoring k-mers, the median distance to ATG (for occurrences upstream of genes within the conserved set), the optimal window, the orientation bias, the corrected ratio of upstream/coding bias, the total (upregulated/downregulated) number of microarray conditions in which the k-mer was found (see Materials and methods), TRANSFAC matches, and the best GO enrichment.

(b) Novel predicted regulatory elements. The k-mers shown here were selected from the list of 284 highest-scoring k-mers based on their short median distance to ATG, short optimal window, significant orientation bias, strong over-representation ratio (U/C), presence in upstream regions of over/underexpressed genes in several microarray conditions, palindromicity, or resemblance to known sites in other species.
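One of the selection criteria named in the note, palindromicity, can be checked mechanically: a DNA k-mer is palindromic when it equals its own reverse complement. The following is a minimal Python sketch of that check applied to a few sequences from the table. The helper names `reverse_complement` and `is_palindromic` are hypothetical and are not taken from FastCompare.

```python
# Illustrative only: hypothetical helpers, not part of FastCompare.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase/lowercase DNA k-mer."""
    return kmer.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(kmer: str) -> bool:
    """A k-mer is palindromic if it reads the same on both strands."""
    return kmer.upper() == reverse_complement(kmer)


# A few k-mers taken from the table.
kmers = ["TGACGTCA", "CGCATGCG", "TCTCGCGA", "CCCGCCC"]
palindromes = [k for k in kmers if is_palindromic(k)]
print(palindromes)
```

Under this definition, the CREB site TGACGTCA and the palindromic octamer CGCATGCG pass the check, while TCTCGCGA and CCCGCCC do not.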
Elemento and Tavazoie Genome Biology 2005 6:R18 doi:10.1186/gb-2005-6-2-r18