Table 2

Top features in CoreBoost

| Classifier type | Comparison | Features |
|---|---|---|
| CpG | P versus U | Log-likelihood ratios from third order Markov chain, log-likelihood ratios from TSS weight matrix |
| | | GC-box score, weighted score of transcription factor NFY, weighted energy score at position +1 |
| | | Weighted score of transcription factor YY1, TATA score, weighted score of transcription factor ELK1 |
| | | MTE score, weighted score of transcription factor CREB |
| | P versus D | Log-likelihood ratios from third order Markov chain, GC-box score |
| | | Weighted score of transcription factor NFY |
| | | Log-likelihood ratios from TSS weight matrix |
| | | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | | Log-likelihood ratios from transcription factor ELK1, frequency of G+C |
| | | Log-likelihood ratios from transcription factor YY1, TATA score, frequency of G |
| Non-CpG | P versus U | Correlation between vector of energy scores and empirical average energy profile |
| | | Log-likelihood ratios from third order Markov chain, TATA score |
| | | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | | Weighted energy at position +1 |
| | | Proportion of Inr and GC-box pair within 10 bp of observed distance, Inr score |
| | P versus D | Correlation between vector of energy scores and empirical average energy profile, TATA score |
| | | Log-likelihood ratios from third order Markov chain |
| | | Weighted energy at position +1 |
| | | Correlation between vector of flexibility scores and empirical average flexibility profile, Inr score |
| | | Difference between the flexibility score around position +1 and the average from surroundings, GC-box score |

bp, base pairs; D, immediate downstream sequence; P, promoter; TSS, transcription start site; U, immediate upstream sequence.
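
Two of the features listed above, the log-likelihood ratio from a third order Markov chain and the frequency of G+C, can be illustrated with a minimal sketch. The table does not state how the Markov models were trained or smoothed, so the add-one pseudocount, the toy training sequences, and the function names `train_markov`, `log_likelihood_ratio`, and `gc_frequency` are assumptions made for illustration only and may differ from what CoreBoost does.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=3, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from training sequences.

    Assumption: additive (pseudocount) smoothing; the source does not
    specify a smoothing scheme.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            # Skip windows containing ambiguous bases such as N.
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][nxt] += 1.0

    def prob(ctx, nxt):
        row = counts.get(ctx, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(nxt, 0.0) + pseudocount) / total

    return prob


def log_likelihood_ratio(seq, fg, bg, order=3):
    """Sum of log(P_fg / P_bg) over every position that has a full context."""
    seq = seq.upper()
    llr = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
            llr += math.log(fg(ctx, nxt) / bg(ctx, nxt))
    return llr


def gc_frequency(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return sum(c in "GC" for c in seq) / len(seq) if seq else 0.0


# Toy stand-ins for promoter (P) and flanking (U or D) training sets.
promoter_model = train_markov(["GCGCGCGCGC"])
flank_model = train_markov(["ATATATATAT"])
```

With these toy models, `log_likelihood_ratio("GCGCGC", promoter_model, flank_model)` is positive and the same call on `"ATATAT"` is negative, which is the sign behaviour a "P versus U" or "P versus D" classifier would exploit.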

Zhao et al. Genome Biology 2007 8:R17   doi:10.1186/gb-2007-8-2-r17
