Table 1

Coverage of validation sets (excluding PMIDs in the training set) within the top10k, top50k, and top100k ranked abstracts for the vector space model relevancy ranking


| | TRANSFAC | FlyReg | ORegAnno Queue | ORegAnno prior to RegCreative | RegCreative success | RegCreative failure |
|---|---|---|---|---|---|---|
| Number of PMIDs | 5,719 | 200 | 4,145 | 376 | 260 | 218 |
| Number of PMIDs (no training data) | 5,183 | 186 | 3,687 | 340 | 228 | 212 |
| Number in top10k | 1,390 | 38 | 1,035 | 89 | 59 | 18 |
| Percent in top10k | 26.8% | 20.4% | 28.1% | 26.2% | 25.9% | 8.5% |
| Number in top50k | 3,908 | 146 | 2,753 | 260 | 165 | 79 |
| Percent in top50k | 75.4% | 78.5% | 74.7% | 76.5% | 72.4% | 37.3% |
| Number in top100k | 4,572 | 166 | 3,208 | 301 | 199 | 110 |
| Percent in top100k | 88.2% | 89.2% | 87.0% | 88.5% | 87.3% | 51.9% |
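
The percentage rows appear to be each "Number in topN" count divided by the corresponding "Number of PMIDs (no training data)" count, rounded to one decimal place. The following minimal Python sketch recomputes them under that assumption; the variable and function names (`NO_TRAINING`, `IN_TOP`, `coverage_percent`) are illustrative and do not come from the paper.

```python
# Column order follows Table 1.
VALIDATION_SETS = [
    "TRANSFAC", "FlyReg", "ORegAnno Queue",
    "ORegAnno prior to RegCreative",
    "RegCreative success", "RegCreative failure",
]

# Row "Number of PMIDs (no training data)", used here as the denominator.
NO_TRAINING = [5183, 186, 3687, 340, 228, 212]

# Rows "Number in top10k / top50k / top100k", copied from Table 1.
IN_TOP = {
    "top10k": [1390, 38, 1035, 89, 59, 18],
    "top50k": [3908, 146, 2753, 260, 165, 79],
    "top100k": [4572, 166, 3208, 301, 199, 110],
}


def coverage_percent(hits: int, total: int) -> float:
    """Share of a validation set found in the top-ranked abstracts, in percent."""
    return round(100.0 * hits / total, 1)


def coverage_table() -> dict:
    """Recompute every percentage row: cutoff -> {validation set: percent}."""
    return {
        cutoff: {
            name: coverage_percent(hits, total)
            for name, hits, total in zip(VALIDATION_SETS, counts, NO_TRAINING)
        }
        for cutoff, counts in IN_TOP.items()
    }


if __name__ == "__main__":
    for cutoff, row in coverage_table().items():
        print(cutoff, row)
```

Using the training-free PMID count as the denominator is what reproduces the published percentages; dividing by the full "Number of PMIDs" row would give lower values.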

Aerts et al. Genome Biology 2008 9:R31 doi:10.1186/gb-2008-9-2-r31