Table 7

Filtering rules for species, direct references, and chromosomal locations

Species
-   non-human-species <candidate name>
+   human and non-human-species <candidate name>
-   <candidate name> {(, ','} {a, an, the} non-human-species
+   <candidate name> {(, ','} {a, an, the} human
+   human <candidate name> {(, ','} {a, an, the}

Direct mentions, cell lines, chromosomal loci
+   <candidate name> {gene, protein}
-   <candidate name> {cell(s), culture(s)}
+   {locus, loci, location, chromosome, chromosomal, gene * associated}


Examples of heuristic rules that filter out candidate names when they appear to refer to some other concept (a gene from another species, a cell line, or a disease locus). '<candidate name>' denotes the occurrence of the potential gene name under consideration. A candidate name is kept (+) or removed (-) when its sentence contains the pattern; '+' rules take precedence over '-' rules. 'human' also covers references to mammals.

Hakenberg et al. Genome Biology 2008 9(Suppl 2):S14   doi:10.1186/gb-2008-9-s2-s14
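The table can be read as a small decision procedure: collect the signs of all patterns that match a sentence, and let any '+' override any '-'. The Python sketch below encodes the table's patterns as regular expressions. It is an illustration under stated assumptions, not the authors' implementation: the species word lists, the regex reading of each pattern (including '*' as arbitrary intervening text), the three-way keep/remove/undecided outcome, and the function name `classify` are all hypothetical.

```python
import re

# Hypothetical vocabularies: the table only distinguishes "human" (which also
# covers mammals) from "non-human species"; these word lists are placeholders.
HUMAN = r"\b(?:human|mammalian|mouse|murine|rat)\b"
NON_HUMAN = r"\b(?:yeast|drosophila|fly|zebrafish|bacterial|plant)\b"
ART = r"\b(?:a|an|the)\b"
C = "<C>"  # placeholder, later replaced by the escaped candidate name

# (sign, regex template) pairs transcribing Table 7. The regex reading of each
# pattern, e.g. "gene * associated" as "gene ... associated", is an assumption.
RULES = [
    ("-", rf"{NON_HUMAN}\s+{C}"),
    ("+", rf"{HUMAN}\s+and\s+{NON_HUMAN}\s+{C}"),
    ("-", rf"{C}\s*[(,]\s*{ART}\s+{NON_HUMAN}"),
    ("+", rf"{C}\s*[(,]\s*{ART}\s+{HUMAN}"),
    ("+", rf"{HUMAN}\s+{C}\s*[(,]\s*{ART}"),
    ("+", rf"{C}\s+(?:gene|protein)\b"),
    ("-", rf"{C}\s+(?:cells?|cultures?)\b"),
    ("+", r"\b(?:locus|loci|location|chromosome|chromosomal|gene\b.*\bassociated)\b"),
]


def classify(sentence: str, candidate: str) -> str:
    """Return 'keep', 'remove' or 'undecided' for one candidate in one sentence."""
    # Match the candidate as a whole token, even if it ends in a symbol like "+".
    cand = rf"(?<!\w){re.escape(candidate)}(?!\w)"
    # Collect the signs of every rule whose pattern occurs in the sentence.
    signs = {
        sign
        for sign, template in RULES
        if re.search(template.replace(C, cand), sentence, re.IGNORECASE)
    }
    if "+" in signs:  # '+' rules take preference over '-' rules
        return "keep"
    if "-" in signs:
        return "remove"
    return "undecided"
```

For example, in "We compared human and yeast Rad51 activity." the first species rule ('-') and the second ('+') both fire, and the '+' preference keeps the candidate, whereas "Expression of yeast Rad51 was measured." fires only the '-' rule and is removed. Sentences that match no rule are left to whatever other evidence the surrounding pipeline uses.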
