Table 5 |
|
|
An example for constructing string features |
|
|
Input and processed documents and condidate string features |
Details |
|
|
|
|
Input document |
The Three Human Syntrophin Genes Are Expressed in Diverse Tissues, Have Distinct Chromosomal Locations, and Each Bind to Dystrophin and Its Relatives |
|
Processed document |
the three human syntrophin genes are expressed in diverse tissues have distinct chromosomal locations and each bind to dystrophin and its relatives |
|
Candidate string features |
the thr |
|
he thre |
|
|
e three |
|
|
three h |
|
|
hree hu |
|
|
ree hum |
|
|
ee huma |
|
|
e human |
|
|
... |
|
|
|
|
|
The length of substring is fixed to 7. The example document only has one sentence (the title of the document of PMID:8576247). A seven-character window moves along the sequential text. All characters are converted to lower case. Only alphabetical letters and the space character are processed. Punctuation is converted to the space character. |
|
|
Huang et al. Genome Biology 2008 9(Suppl 2):S12 doi:10.1186/gb-2008-9-s2-s12 |
|