Table 4

Known and novel predicted regulatory elements, obtained when applying FastCompare to D. melanogaster and D. pseudoobscura

(a) Known regulatory elements

| Sequence | Rank | D_ATG | W_ATG | Orientation | U/C | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|
| AACAGCTG | 1 | 373 | [0;1800] | - | 1.64 | - | Known AP-4/MyoD site |
| ATTTGCATA | 3 | 882 | [100;2000] | - | 3.20 | Oct-1 | Known (mammalian) Oct-1 site |
| CACGTGC | 5 | 825.5 | - | - | 1.02 | Myc/Max, PHO4, USF | Known Myc/Max site |
| ATTTATGC | 6 | 866 | - | - | 3.52 | CdxA | Known CdxA site |
| TGACGTCA | 9 | 825 | - | - | 2.36 | CREB | Known CREB site |
| TGATAAG | 11 | 760.5 | [0;1100] | - | 2.53 | GATA | Known GATA site, carbohydrate metabolism (p < 10^-5) |
| TATCGATA | 12 | 168 | [0;1900] | - | 5.39 | - | Known DRE site |
| TTTATGGC | 14 | 978.5 | - | - | 2.82 | Abd-B | Known Abd-B site |
| TAATTGA | 24 | 907 | [0;1900] | - | 2.58 | Ubx, Athb-1 | Known Antp site |
| GAGAGAG | 26 | 705.5 | - | ← (p < 10^-4) | 1.87 | - | Known GAGA site, morphogenesis (p < 10^-23) |
| CAGGTGC | 33 | 1020.5 | - | - | 0.83 | Sn | Known Snail site |
| TGACTCA | 46 | 911 | [100;2000] | - | 1.89 | AP-1, GCN4 | Known AP-1 site |
| ATCAATCA | 51 | 967 | [0;1900] | - | 1.72 | Pbx-1 | Known Pbx-1 site |
| AAGGTCA | 93 | 1015.5 | [400;1900] | - | 1.16 | HNF-4, ER | Known HRE |
| AACATGTG | 105 | 994 | [100;2000] | - | 1.62 | - | Known Twist site |
| GTAAACA | 147 | 813 | [0;1200] | - | 2.54 | Freac, SRY | Known DAF-16 site in C. elegans |

(b) Novel predicted regulatory elements

| Sequence | Rank | D_ATG | W_ATG | Orientation | U/C | TRANSFAC | Comments |
|---|---|---|---|---|---|---|---|
| ACACACAC | 2 | 922.5 | - | → (p < 10^-12) | 1.97 | - | Unknown site, embryonic development (p < 10^-9) |
| CAAGGAG | 13 | 1091 | [200;2000] | ← (p < 10^-8) | 0.84 | - | Unknown site |
| GCACACAC | 29 | 886 | - | - | 1.80 | - | Unknown site, histogenesis (p < 10^-5) |
| CAAGTTCA | 30 | 920 | [0;1900] | - | 1.23 | - | Unknown site |
| TAATTAA | 31 | 871 | [500;2000] | - | 3.07 | Ftz | Unknown palindromic homeodomain-like site |
| CAACAACA | 42 | 968.5 | [200;2000] | - | 1.22 | - | Unknown site, regulation of transcription (p < 10^-5) |
| TGGCGCC | 48 | 951 | - | - | 0.84 | - | Unknown palindromic site |
| CCTGTTGC | 111 | 653 | [0;1800] | - | 0.90 | - | Unknown site |
| GTGTGACC | 112 | 296 | [0;1900] | → (p < 10^-5) | 2.22 | - | Unknown site |
| CAGGTAG | 143 | 924.5 | [0;1700] | - | 0.94 | - | Unknown site, cell fate commitment (p < 10^-8) |
| CACACGCA | 145 | 968.5 | - | - | 1.49 | - | Unknown site, cellular morphogenesis (p < 10^-5) |
| GTCAACAA | 169 | 904 | - | - | 1.48 | - | Unknown site, similar to DAF-16 |
| AAATGGCG | 205 | 592 | - | - | 1.54 | - | Unknown site |
| TTGACCCA | 239 | 860 | [0;1700] | - | 1.60 | - | Unknown site |
| TGACACAC | 273 | 860 | - | - | 1.83 | - | Unknown site |
| TGTCAAC | 281 | 999 | [100;1900] | | 1.55 | - | Unknown site |


(a) For each known regulatory element, we show the best k-mer, its rank within the set of 469 highest scoring k-mers, the median distance to ATG (for occurrences upstream of genes within the conserved set), the optimal window, the orientation bias, the corrected ratio of upstream/coding bias, the total (up-regulated/down-regulated) number of microarray conditions in which the k-mer was found (see Method), TRANSFAC matches, and the best GO enrichment. (b) Novel predicted regulatory elements. The k-mers shown here were selected from the list of 469 highest scoring k-mers based on their short median distance to ATG, short optimal window, significant orientation bias, strong over-representation ratio (U/C), presence in upstream regions of over/underexpressed genes in several microarray conditions, palindromicity, or resemblance to known sites in other species.
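
Palindromicity, one of the selection criteria named in the legend, can be checked mechanically. The Python sketch below assumes that "palindromic" means a k-mer, or a core at least six bases long within it, is identical to its own reverse complement. That definition, the six-base threshold, and the helper names `reverse_complement`, `is_palindromic`, and `palindromic_core` are illustrative assumptions and are not taken from the FastCompare implementation.

```python
# Illustrative sketch only: the helper names and the "core" rule are
# assumptions made for this example, not part of FastCompare.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA k-mer (A/C/G/T only)."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(kmer: str) -> bool:
    """True if the k-mer equals its own reverse complement."""
    return kmer.upper() == reverse_complement(kmer)


def palindromic_core(kmer: str, min_len: int = 6):
    """Return the longest substring of length >= min_len that is its own
    reverse complement, or None if there is no such substring."""
    kmer = kmer.upper()
    # Try the longest candidate substrings first.
    for length in range(len(kmer), min_len - 1, -1):
        for start in range(len(kmer) - length + 1):
            sub = kmer[start:start + length]
            if is_palindromic(sub):
                return sub
    return None


if __name__ == "__main__":
    # A few k-mers taken from the table above.
    for kmer in ("TGACGTCA", "TATCGATA", "TAATTAA", "TGGCGCC", "ACACACAC"):
        print(kmer, is_palindromic(kmer), palindromic_core(kmer))
```

Under this definition, TGACGTCA and TATCGATA are exact reverse-complement palindromes. The 7-mers TAATTAA and TGGCGCC cannot be, because their length is odd, but they contain the palindromic 6-base cores TAATTA and GGCGCC. ACACACAC has no such core.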

Elemento and Tavazoie Genome Biology 2005 6:R18   doi:10.1186/gb-2005-6-2-r18
