Table 1

The 44 selected sequences within the ENCODE region

                               Random picks, by mouse homology
Sequence set   Manual picks    Low       Medium    High        Gene density

Training       ENm006          ENr132    ENr231    ENr333      High
                                         ENr232    ENr334
               ENm004          -         ENr222    ENr323      Medium
                                         ENr223    ENr324
               -               ENr111    -         -           Low
                               ENr114

Test           ENm002          ENr131    ENr233    ENr331      High
               ENm005          ENr133              ENr332
               ENm007
               ENm008
               ENm009
               ENm010
               ENm011
               ENm001          ENr121    ENr221    ENr321      Medium
               ENm003          ENr122              ENr322
               ENm012          ENr123
               ENm013
               ENm014
               -               ENr112    ENr211    ENr311      Low
                               ENr113    ENr212    ENr312
                                         ENr213    ENr313

ENCODE sequences were assigned to either the training set or the test set according to the availability of annotation data (see the section 'The EGASP experiment'). Only the test set sequences were used for the performance evaluation. The three digits in the name of each randomly picked sequence correspond, in order, to the non-exonic conservation with the mouse genome, the density of previously identified genes, and the sequence number; the first two range from 1 (low) to 3 (high). Manually selected sequences range in size from 500 kbp to 2 Mbp, while random regions are 500 kbp. The selection and stratification criteria for all the sequences are described at the ENCODE project web site [34].
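
The naming scheme for the randomly picked sequences can be decoded mechanically. The following is a minimal Python sketch, not part of the original study: the function name `decode_random_pick`, the `RandomPick` type, and the assumption that every random pick is named `ENr` followed by exactly three digits are illustrative choices inferred from the names listed in the table.

```python
import re
from typing import NamedTuple, Optional

# Stratification levels used for the first two digits of a random-pick name.
LEVELS = {1: "low", 2: "medium", 3: "high"}


class RandomPick(NamedTuple):
    mouse_homology: str  # non-exonic conservation with the mouse genome
    gene_density: str    # density of previously identified genes
    number: int          # sequence number within the stratum


def decode_random_pick(name: str) -> Optional[RandomPick]:
    """Decode a name such as 'ENr132'; return None for any other name.

    Assumption: random picks are named 'ENr' plus three digits, where the
    first two digits are in the range 1-3. Manual picks ('ENm...') carry no
    such code and therefore yield None.
    """
    match = re.fullmatch(r"ENr([1-3])([1-3])(\d)", name)
    if match is None:
        return None
    homology, density, number = (int(digit) for digit in match.groups())
    return RandomPick(LEVELS[homology], LEVELS[density], number)
```

Under these assumptions, `decode_random_pick("ENr132")` describes a sequence with low mouse homology and high gene density (sequence number 2), which matches its position in the table, while `decode_random_pick("ENm006")` returns `None`.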

Guigó et al. Genome Biology 2006 7(Suppl 1):S2 doi:10.1186/gb-2006-7-s1-s2