Table 4

Experimental feature definitions




RNA transcription (coding and noncoding)


Coding sequence (CDS): well-characterized transcribed regions with an annotated protein-coding open reading frame (ORF)


5' and 3' rapid amplification of cDNA ends (RACE), using polyA or total RNA to construct full-length cDNA. This technique has revealed previously unrecognized UTRs


Transcriptionally active regions/transcribed fragments (TARs/transfrags), as determined by hybridization of cellular RNA (polyA or total) to multiple microarray platforms. For the analyses reported here, portions of TARs/transfrags overlapping any CDS, 5' UTR, or 3' UTR annotation were removed from the dataset


Pseudoexon: a pre-mRNA sequence that resembles an exon but is not recognized as such by the splicing machinery


Transcription start site

5' UTR

Untranslated region: portions of CDS-containing transcripts before the start codon. For the analyses reported here, 5' UTRs overlapping alternatively transcribed CDS annotations were removed from the dataset


Transcripts of unknown function (TUFs): noncoding transcripts

3' UTR

Untranslated region: portions of CDS-containing transcripts after the stop codon

Transcript regulation: open chromatin/DNA-protein interaction


DNase I hypersensitive sites (DHS): short regions of DNA that are relatively easily cleaved by deoxyribonuclease I. These regions of open chromatin were detected by quantitative chromatin profiling and novel microarray-based methods. For the analyses reported here, regions that overlap repetitive sequence were removed. Measures of DHS are reported from two sources: the ENCODE Regulome group and the NHGRI


Formaldehyde-assisted isolation of regulatory elements (FAIRE): a procedure used to isolate chromatin that is resistant to the formation of protein-DNA crosslinks. Data suggest that depletion of nucleosomes (the most basic organizational unit of chromatin) at active regulatory regions, such as promoters, is the primary underlying basis for FAIRE [38]


Histone modifications, RNA polymerase II (PolII), and transcription regulator TAF250

Sequence specific factors

Regions of DNA determined to be bound by sequence-specific transcription factors through chromatin immunoprecipitation followed by microarray hybridization ('ChIP-chip') analyses

Sequence specific (all motifs)

Computationally identified short sequence motifs found to be over-represented in the sequence specific factors dataset

Ancestral repeats

Mobile elements with well-defined consensus sequences that inserted into the ancestral genome prior to the mammalian radiation. These sequences are considered to be predominantly non-functional and are often used as models of neutrally evolving DNA

Cell cycle


Early replicating segments


Mid replicating segments


Late replicating segments

Evolutionary constraint

MCS strict

Multi-species conserved sequences: strict criteria

MCS moderate

Multi-species conserved sequences: moderate criteria

MCS loose

Multi-species conserved sequences: loose criteria

Clark et al. Genome Biology 2007 8:R180   doi:10.1186/gb-2007-8-9-r180
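
Several definitions in the table mention an overlap filter: portions of TARs/transfrags overlapping CDS or UTR annotations were removed, and DHS regions overlapping repetitive sequence were removed. The following is a minimal sketch of those two operations on simple coordinate intervals, not the authors' pipeline. The function names, the half-open coordinate convention, the single-chromosome simplification, and the choice to drop whole DHS regions (rather than trim them) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(features: List[Interval], mask: List[Interval]) -> List[Interval]:
    """Trim away the portions of each feature that overlap the mask."""
    mask = merge(mask)
    kept: List[Interval] = []
    for start, end in features:
        cursor = start  # left edge of the part not yet masked
        for m_start, m_end in mask:
            if m_end <= cursor:
                continue  # mask interval lies entirely to the left
            if m_start >= end:
                break  # mask interval lies entirely to the right
            if m_start > cursor:
                kept.append((cursor, m_start))
            cursor = max(cursor, m_end)
        if cursor < end:
            kept.append((cursor, end))
    return kept


def drop_overlapping(features: List[Interval], mask: List[Interval]) -> List[Interval]:
    """Discard every feature that overlaps the mask at all."""
    mask = merge(mask)
    return [
        (s, e)
        for s, e in features
        if not any(s < m_end and m_start < e for m_start, m_end in mask)
    ]


# Hypothetical coordinates, for illustration only.
transfrags = [(100, 200), (300, 400)]
cds_and_utrs = [(150, 180), (350, 500)]
trimmed = subtract(transfrags, cds_and_utrs)

dhs = [(100, 200), (300, 400), (600, 700)]
repeats = [(350, 500)]
filtered = drop_overlapping(dhs, repeats)
```

With these made-up coordinates, `trimmed` keeps only the transfrag portions outside the annotations, and `filtered` keeps only the DHS regions that do not touch a repeat. For genome-scale data, an established interval tool would be more appropriate than this quadratic sketch.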
