Table 6

GN results: performance of nine heuristics used to filter false-positive gene mentions or modify gene mentions to improve dictionary matching performance.

Step | Presence of ...                    | Example               | P     | R     | F     | Modified
-----+------------------------------------+-----------------------+-------+-------+-------+---------
0    |                                    |                       | 0.770 | 0.673 | 0.718 | 0
1    | Gene chromosome location           | 3p11-3p12.1           | 0.772 | 0.673 | 0.719 | 34
2    | Single, short lowercase word       | heme                  | 0.778 | 0.672 | 0.721 | 112
3    | Strings of only numbers &/or punct | 9+/-76                | 0.779 | 0.672 | 0.722 | 206
4    | Extra preceding words              | protein SNF to SNF    | 0.790 | 0.681 | 0.731 | 225^a
5    | Extra trailing words               | SNF protein to SNF    | 0.812 | 0.723 | 0.765 | 419^a
6    | Amino acids                        | Ser-119               | 0.815 | 0.723 | 0.766 | 460
7    | Protein families                   | Bcl-2 family proteins | 0.816 | 0.722 | 0.766 | 701
8    | Protein domains, motifs, fusion    | SNH domain            | 0.828 | 0.722 | 0.771 | 883
9    | Nonhuman keywords                  | rat IFN gamma         | 0.829 | 0.725 | 0.774 | 1,086^a

Results shown here are from the development dataset. Step 0 indicates performance before any rules are applied. At each step, the rules of all preceding steps are also applied. Modified is the cumulative number of gene mentions removed or altered. P, precision; R, recall; F, F-measure. ^a Rules 4 and 5 only modify gene mentions. Rule 9 can either modify or remove gene mentions. All other rules remove gene mentions. GN, gene normalization.

Baumgartner et al. Genome Biology 2008, 9(Suppl 2):S9. doi:10.1186/gb-2008-9-s2-s9
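The F column is consistent with the balanced F-measure, F = 2PR/(P + R); for step 0, 2(0.770)(0.673)/(0.770 + 0.673) is approximately 0.718. The Python sketch below reproduces that check and illustrates how rules 1 to 5 might be written as simple string filters applied in step order. The regular expressions, the five-character cutoff for a "short" word, and the GENERIC_WORDS list are hypothetical choices made for illustration only; they are not the patterns used by Baumgartner et al. Because P and R are reported to three decimals, F recomputed from them may occasionally differ from the table in the last digit.

```python
import re
from typing import Optional


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical patterns for illustration only; NOT the published rules.
# Rule 1: cytogenetic band such as "3p11-3p12.1"
# (chromosome, arm p/q, band, optional sub-band, optional range).
CHROMOSOME_LOCATION = re.compile(
    r"(?:\d{1,2}|X|Y)[pq]\d+(?:\.\d+)?(?:-(?:\d{1,2}|X|Y)?[pq]?\d+(?:\.\d+)?)?"
)
# Rule 2: a single lowercase word; the 5-character cutoff for "short" is an assumption.
SHORT_LOWERCASE_WORD = re.compile(r"[a-z]{1,5}")
# Rule 3: only digits and/or punctuation (non-word characters or underscore).
NUMBERS_OR_PUNCT = re.compile(r"[\d\W_]+")
# Rules 4 and 5: hypothetical list of generic words stripped from either end.
GENERIC_WORDS = {"protein", "gene"}


def filter_or_modify(mention: str) -> Optional[str]:
    """Apply the sketched rules 1-5 in step order.

    Returns None if the mention is filtered out, otherwise the
    (possibly trimmed) mention.
    """
    if CHROMOSOME_LOCATION.fullmatch(mention):   # rule 1: remove
        return None
    if SHORT_LOWERCASE_WORD.fullmatch(mention):  # rule 2: remove
        return None
    if NUMBERS_OR_PUNCT.fullmatch(mention):      # rule 3: remove
        return None
    tokens = mention.split()
    # Rule 4: drop generic preceding words, e.g. "protein SNF" -> "SNF".
    while len(tokens) > 1 and tokens[0].lower() in GENERIC_WORDS:
        tokens.pop(0)
    # Rule 5: drop generic trailing words, e.g. "SNF protein" -> "SNF".
    while len(tokens) > 1 and tokens[-1].lower() in GENERIC_WORDS:
        tokens.pop()
    return " ".join(tokens)


if __name__ == "__main__":
    # Step 0 row of Table 6: P = 0.770, R = 0.673.
    print(round(f_measure(0.770, 0.673), 3))  # 0.718
    for m in ["3p11-3p12.1", "heme", "9+/-76", "protein SNF", "SNF protein", "Bcl-2"]:
        print(f"{m!r} -> {filter_or_modify(m)!r}")
```

Rules 6 to 9 depend on keyword lists (amino acid, protein family, domain, and species terms) that the table only exemplifies, so they are not sketched here.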
