Table 5

Comparison of ENCODE and Bhangale et al. (ten ENCODE regions) indel data

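Datasets: ENCODE (44 ENCODE regions/Baylor); Bhangale et al. (ten ENCODE regions/Baylor). For each dataset, indels and rates per 100 kb are given both as counts (n) and as base pairs (bp).

| | ENCODE indels (n) | ENCODE indels (bp) | ENCODE rate per 100 kb (n) | ENCODE rate per 100 kb (bp) | Bhangale et al. indels (n) | Bhangale et al. indels (bp) | Bhangale et al. rate per 100 kb (n) | Bhangale et al. rate per 100 kb (bp) |
|---|---|---|---|---|---|---|---|---|
| Manual | 2,186 | 6,504 | 14.6 | 43.4 | 362 | 1,122 | 13.0 | 40.4 |
| Random | 2,300 | 6,506 | 15.3 | 43.4 | 502 | 1,350 | 14.3 | 38.6 |
| Overall | 4,486 | 13,010 | 15.0 | 43.4 | 864 | 2,472 | 13.8 | 39.4 |
| RNA transcription | | | | | | | | |
| CDS | 5 | 5 | 0.7 | 0.7 | 1 | 1 | 1.2 | 1.2 |
| TSS | 2 | 2 | 3.3 | 3.3 | 0 | 0 | 0.0 | 0.0 |
| RACEfrags | 9 | 28 | 2.1 | 6.6 | 0 | 0 | 0.0 | 0.0 |
| TARs/transfrags | 37 | 78 | 5.8 | 12.3 | 6 | 11 | 7.5 | 13.7 |
| Pseudo-exons | 9 | 26 | 6.6 | 19.1 | 2 | 10 | 9.7 | 48.7 |
| 3' UTR | 48 | 103 | 11.0 | 23.6 | 11 | 29 | 18.7 | 49.2 |
| 5' UTR | 7 | 32 | 6.0 | 27.4 | 4 | 8 | 37.3 | 74.6 |
| TUF | 53 | 160 | 12.2 | 36.9 | 4 | 18 | 8.1 | 36.4 |
| Open chromatin | | | | | | | | |
| FAIRE sites | 106 | 327 | 7.7 | 23.8 | 17 | 72 | 5.6 | 23.6 |
| DHS (NHGRI) | 19 | 61 | 6.1 | 19.7 | 1 | 1 | 2.8 | 2.8 |
| DHS (Regulome) | 43 | 135 | 8.6 | 27.0 | 15 | 40 | 8.5 | 22.6 |
| DNA-protein interaction/transcript regulation | | | | | | | | |
| HisPolTAF | 141 | 348 | 13.1 | 32.4 | 32 | 114 | 12.8 | 45.5 |
| Seq_specific (all motifs) | 131 | 420 | 11.2 | 35.8 | 28 | 122 | 33.4 | 145.3 |
| SeqSp (sequence specific factors) | 54 | 225 | 10.2 | 42.5 | 9 | 45 | 5.1 | 25.6 |
| Ancestral repeats | 532 | 1,592 | 7.9 | 26.5 | 110 | 280 | 8.7 | 22.1 |
| Evolutionary constraint | | | | | | | | |
| MCS strict | 19 | 31 | 2.5 | 4.1 | 5 | 9 | 3.3 | 5.9 |
| MCS moderate | 78 | 170 | 5.1 | 11.2 | 17 | 36 | 5.4 | 11.4 |
| MCS loose | 356 | 960 | 9.8 | 26.4 | 63 | 136 | 8.4 | 18.1 |
| Cell cycle | | | | | | | | |
| EarlyRepSeg | 1,124 | 2,989 | 16.4 | 43.5 | 161 | 495 | 16.4 | 50.4 |
| MidRepSeg | 1,190 | 3,352 | 15.4 | 43.2 | 270 | 797 | 16.4 | 48.3 |
| LateRepSeg | 1,110 | 3,345 | 13.9 | 41.9 | 300 | 819 | 11.3 | 31.0 |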


Both datasets (Encyclopedia of DNA Elements [ENCODE] and that reported by Bhangale and coworkers [19]) are based on a subset of eight African Americans (the Baylor samples). bp, base pairs; CDS, coding sequence; CI, confidence interval; DHS, DNase hypersensitive sites; ENCODE, Encyclopedia of DNA Elements; FAIRE, formaldehyde-assisted isolation of regulatory elements; kb, kilobases; MCS, multi-species conserved sequence; NHGRI, National Human Genome Research Institute; transfrag, transcribed fragment; RACEfrag, rapid amplification of cDNA ends fragment; SNP, single nucleotide polymorphism; TAR, transcriptionally active region; TSS, transcription start site; TUF, transcripts of unknown function; UTR, untranslated region.
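The "Rate (per 100 kb)" columns can be read with a small sketch like the one below. It assumes, without confirmation from the table itself, that each rate is simply the indel count (or indel base pairs) divided by the surveyed sequence length and scaled to 100 kb. The function names `rate_per_100kb` and `implied_length_bp` are hypothetical, and because the published rates are rounded to one decimal place, any length back-calculated from them is only approximate.

```python
def rate_per_100kb(count: float, length_bp: float) -> float:
    """Indels (or indel base pairs) per 100 kb of surveyed sequence.

    Assumed definition: count / length * 100,000.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return count / length_bp * 100_000


def implied_length_bp(count: float, rate: float) -> float:
    """Surveyed length implied by a count and its rate per 100 kb.

    This inverts the assumed definition above; rounded rates give
    only an approximate length.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return count / rate * 100_000


# "Overall" row, values copied from the table.
overall = {
    "ENCODE": {"n": 4486, "bp": 13010, "rate_n": 15.0, "rate_bp": 43.4},
    "Bhangale et al.": {"n": 864, "bp": 2472, "rate_n": 13.8, "rate_bp": 39.4},
}

for name, row in overall.items():
    # Two independent estimates of the same surveyed length.
    from_n = implied_length_bp(row["n"], row["rate_n"])
    from_bp = implied_length_bp(row["bp"], row["rate_bp"])
    print(f"{name}: ~{from_n / 1e6:.2f} Mb (from n), ~{from_bp / 1e6:.2f} Mb (from bp)")
```

If the assumed definition holds, the two estimates for each dataset should agree closely, which gives a quick consistency check on a row's counts and rates.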

Clark et al. Genome Biology 2007 8:R180   doi:10.1186/gb-2007-8-9-r180
