Table 2

Article filtering performance with different features and classifiers

Model                                                 Precision  Recall     F1 score   AUC

Mean                                                  0.6642     0.7636     0.6868     0.7351
Standard deviation                                    0.0810     0.1926     0.1035     0.0741
Best reported in terms of AUC [8]                     0.7080     0.8609     0.7770     0.8554
Our results in BioCreative 2006                       0.7507     0.8107     0.7795     0.8471

Term (baseline)                                       0.7016     0.8213     0.7568     0.8037
String                                                0.7044     0.8960     0.7887     0.8416
Named entity (NE)                                     0.5815     0.9600     0.7243     0.7570
Template                                              0.7841     0.7653     0.7746     0.8239
String + NE                                           0.7360     0.8773     0.8005     0.8479
String + template                                     0.7416     0.8880     0.8082     0.8372
String + NE + template                                0.7585     0.8373     0.7959     0.8507
String + term + NE + template                         0.7432     0.8720     0.8025     0.8608

Naïve Bayes classifier                                0.6321     0.8613     0.7291     0.7884
Multinomial classifier                                0.6264     0.8720     0.7290     0.7770
Linear kernel SVM                                     0.7016     0.8213     0.7568     0.8037
p-spectrum kernel SVM (p = 7)                         0.7352     0.8293     0.7794     0.8376

Integration of the above four classifiers (AdaBoost)  0.7995     0.8933     0.8438     0.8746

This table shows the experimental results from article filtering. AUC, area under the receiver operating characteristic curve; SVM, support vector machine.
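
The F1 scores reported for the individual feature sets and classifiers are consistent with the standard balanced F-measure, the harmonic mean of precision and recall. The table does not state the formula, so the following Python sketch treats that definition as an assumption; the names `f1_score`, `ROWS`, and `max_abs_deviation` are illustrative and not from the paper. The check does not apply to the "Mean" and "Standard deviation" rows, which are aggregates rather than a single system's scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: Table 2 uses this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, precision, recall, reported F1), copied from Table 2.
ROWS = [
    ("Term (baseline)", 0.7016, 0.8213, 0.7568),
    ("String", 0.7044, 0.8960, 0.7887),
    ("Named entity (NE)", 0.5815, 0.9600, 0.7243),
    ("Integration of the above four classifiers (AdaBoost)", 0.7995, 0.8933, 0.8438),
]


def max_abs_deviation(rows) -> float:
    """Largest gap between the recomputed and the reported F1 score."""
    return max(abs(f1_score(p, r) - reported) for _, p, r, reported in rows)


if __name__ == "__main__":
    for name, p, r, reported in ROWS:
        print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Because precision and recall are themselves rounded to four decimals, small discrepancies in the last digit are expected when recomputing F1 this way.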

Huang et al. Genome Biology 2008 9(Suppl 2):S12   doi:10.1186/gb-2008-9-s2-s12
