The 44 selected sequences within the ENCODE region

| Sequence set | Manual picks | Random picks, low mouse homology | Random picks, medium mouse homology | Random picks, high mouse homology | Gene density |
|---|---|---|---|---|---|
| Training | ENm006 | ENr132 | ENr231, ENr232 | ENr333, ENr334 | High |
| Training | ENm004 | - | ENr222, ENr223 | ENr323, ENr324 | Medium |
| Training | - | ENr111, ENr114 | - | - | Low |
| Test | ENm002, ENm005, ENm007, ENm008, ENm009, ENm010, ENm011 | ENr131, ENr133 | ENr233 | ENr331, ENr332 | High |
| Test | ENm001, ENm003, ENm012, ENm013, ENm014 | ENr121, ENr122, ENr123 | ENr221 | ENr321, ENr322 | Medium |
| Test | - | ENr112, ENr113 | ENr211, ENr212, ENr213 | ENr311, ENr312, ENr313 | Low |

ENCODE sequences were assigned to either the training or the test set based on annotation data availability (see the section 'The EGASP experiment'). Only the test set sequences were used for the performance evaluation. The three digits in the name of each randomly picked sequence correspond, in order, to the non-exonic conservation with the mouse genome, the density of previously identified genes, and the sequence number; the first two values range from 1 (low) to 3 (high). Manually selected sequences range in size from 500 kbp to 2 Mbp, while random regions are 500 kbp. The selection and stratification criteria for all the sequences are described at the ENCODE project web site [34].
Guigó et al. Genome Biology 2006 7(Suppl 1):S2 doi:10.1186/gb-2006-7-s1-s2
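
The naming scheme for the randomly picked regions can be decoded mechanically. The following is a minimal Python sketch, assuming names of the form `ENr` followed by three digits as they appear in the table. The names `RandomPick`, `LEVELS`, and `decode_random_pick` are hypothetical helpers introduced here for illustration and are not part of the EGASP or ENCODE tooling.

```python
import re
from typing import NamedTuple

# Stratum labels for the first two digits, as described in the table notes.
LEVELS = {1: "low", 2: "medium", 3: "high"}


class RandomPick(NamedTuple):
    """Decoded fields of a randomly picked ENCODE region name (illustrative)."""
    name: str
    mouse_homology: str   # non-exonic conservation with the mouse genome
    gene_density: str     # density of previously identified genes
    sequence_number: int  # running number within the stratum


# Assumed pattern: 'ENr', two stratum digits in 1..3, then one sequence digit.
_PATTERN = re.compile(r"ENr([1-3])([1-3])(\d)")


def decode_random_pick(name: str) -> RandomPick:
    """Split a name such as 'ENr132' into its three coded fields."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        # Manual picks (ENm...) carry no stratification code.
        raise ValueError(f"not a randomly picked region name: {name!r}")
    homology, density, number = (int(d) for d in match.groups())
    return RandomPick(name, LEVELS[homology], LEVELS[density], number)
```

For example, `decode_random_pick("ENr132")` yields low mouse homology, high gene density, and sequence number 2, which matches the position of ENr132 in the table. Manually selected regions such as `ENm006` are rejected because their names do not encode a stratum.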