Table 12 |
|
|
The (Java) regular expressions used for the character feature in the GM task |
|
|
Description |
Regexp |
|
|
|
|
Capitals, lower case, hyphen then digit |
[A-Z]+[a-z]*-[0-9] |
|
Capitals followed by digit |
[A-Z]{2,}[0-9]+ |
|
Single capital |
[A-Z] |
|
Single Greek character |
\p{InGreek} |
|
Letters followed by digits |
[A-Za-z]+[0-9]+ |
|
Lower case, hyphen then capitals |
[a-z]+-[A-Z]+ |
|
Single digit |
[0-9] |
|
Two digits |
[0-9][0-9] |
|
Four digits |
[0-9][0-9][0-9][0-9] |
|
Two capitals |
[A-Z][A-Z] |
|
Three capitals |
[A-Z][A-Z][A-Z] |
|
Four capitals |
[A-Z]{4} |
|
Five or more capitals |
[A-Z]{5,} |
|
Digit then hyphen |
[0-9]+- |
|
All lower case |
[a-z]+ |
|
All digits |
[0-9]+ |
|
Nucleotide |
[AGCT]{3,} |
|
Capital, lower case then digit |
[A-Z][a-z]{2,}[0-9] |
|
Lower case, capitals then any |
[a-z][A-Z][A-Z].* |
|
Greek letter name |
Match any Greek letter name |
|
Roman digit |
[IVXLC]+ |
|
Capital, lower, capital and any |
[A-Z][a-z][A-Z].* |
|
Contains digit |
.*[0-9].* |
|
Contains capital |
.*[A-Z].* |
|
Contains hyphen |
.*-.* |
|
Contains period |
.*\..* |
|
Contains punctuation |
.*\p{Punct}.* |
|
All digits |
[0-9]+ |
|
All capitals |
[A-Z]+ |
|
Is a personal title |
(Mr|Mrs|Miss|Dr|Ms) |
|
Looks like an acronym |
([A-Za-z]\.)+ |
|
|
|
|
GM, gene mention. |
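
As an illustration of how patterns such as those in Table 12 might be turned into token-level features, the following minimal Java sketch compiles a subset of them and reports which ones match a given token. It is not the authors' implementation. The class name `CharacterFeatures`, the method `featuresFor`, and the feature names are hypothetical, and the sketch assumes that each expression must match the whole token (`Matcher.matches()`), which the table itself does not state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CharacterFeatures {
    // Hypothetical feature names mapped to a subset of the Table 12 patterns.
    // Backslashes are doubled because these are Java string literals.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("CapsLowerHyphenDigit", Pattern.compile("[A-Z]+[a-z]*-[0-9]"));
        PATTERNS.put("SingleGreek", Pattern.compile("\\p{InGreek}"));
        PATTERNS.put("LettersThenDigits", Pattern.compile("[A-Za-z]+[0-9]+"));
        PATTERNS.put("Nucleotide", Pattern.compile("[AGCT]{3,}"));
        PATTERNS.put("ContainsDigit", Pattern.compile(".*[0-9].*"));
        PATTERNS.put("ContainsPeriod", Pattern.compile(".*\\..*"));
        PATTERNS.put("ContainsPunctuation", Pattern.compile(".*\\p{Punct}.*"));
        PATTERNS.put("Acronym", Pattern.compile("([A-Za-z]\\.)+"));
    }

    // Returns the names of all patterns that match the entire token
    // (assumption: whole-token matching, not substring search).
    static List<String> featuresFor(String token) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(token).matches()) {
                fired.add(entry.getKey());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // "\u03b1" is the Greek small letter alpha.
        String[] tokens = {"IL-2", "p53", "\u03b1", "U.S.", "GATTACA"};
        for (String token : tokens) {
            System.out.println(token + " -> " + featuresFor(token));
        }
    }
}
```

Whole-token matching is consistent with the "Contains ..." rows, which wrap their target in `.*` on both sides. If substring matching (`Matcher.find()`) were used instead, short patterns such as `[A-Z]` would fire on nearly every capitalised token.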
|
|
Alex et al. Genome Biology 2008 9(Suppl 2):S10 doi:10.1186/gb-2008-9-s2-s10 |
|