Table 1 

Performance of different modeling and bin selection strategies 

Model            Allbins        TSSbin         bins.0.2       best5bins      bestbin
Simple model     0.772 (2.77)   0.836 (2.40)   0.770 (2.78)   0.867 (2.16)   0.871 (2.14)
Two-step model   0.839 (2.37)   0.877 (2.10)   0.841 (2.36)   0.889 (1.99)   0.895 (1.94)


Simple models perform regression only, whereas our two-step model performs classification before regression. The columns are different bin-selection strategies: 'Allbins' uses the mean density of all bins; 'TSSbin' uses the two bins flanking the TSS; 'bins.0.2' uses the bins whose individual correlation coefficient (r) exceeds a threshold (0.2 in this case); 'best5bins' uses the five bins with the greatest r; and 'bestbin' uses the bin(s) with the greatest r. Each value is the Pearson correlation coefficient (PCC, r) between predicted and measured expression levels of PolyA+ cytosolic RNA from K562 cells measured by CAGE; the value in parentheses is the root-mean-square error (RMSE) of the predictions.
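As an illustration of the bin-selection strategies and the two reported metrics, the following is a minimal sketch in Python with NumPy, not the implementation used to produce the table. It assumes a hypothetical matrix `X` of signal densities (genes in rows, bins in columns) and a vector `y` of measured expression levels. The function names, the `tss_index` convention (index of the first bin downstream of the TSS), and the tie-breaking in 'bestbin' (only the first maximal bin is kept) are assumptions made for this example.

```python
import numpy as np

def per_bin_r(X, y):
    """Pearson r between each bin (column of X) and expression y.
    Assumes no bin is constant across genes (otherwise r is undefined)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc * yc[:, None]).sum(axis=0) / denom

def select_bins(X, y, strategy, tss_index=None, threshold=0.2):
    """Return the feature matrix for one bin-selection strategy.
    tss_index is a hypothetical convention: the first bin downstream
    of the TSS, so the flanking bins are tss_index - 1 and tss_index."""
    r = per_bin_r(X, y)
    if strategy == "allbins":
        return X.mean(axis=1, keepdims=True)   # mean density over all bins
    if strategy == "TSSbin":
        return X[:, [tss_index - 1, tss_index]]
    if strategy == "bins.0.2":
        return X[:, r > threshold]             # bins with r above the threshold
    if strategy == "best5bins":
        return X[:, np.argsort(r)[-5:]]        # five bins with the greatest r
    if strategy == "bestbin":
        return X[:, [int(np.argmax(r))]]       # first bin with the greatest r
    raise ValueError(f"unknown strategy: {strategy}")

def pcc(predicted, measured):
    """Pearson correlation coefficient between predictions and measurements."""
    return float(np.corrcoef(predicted, measured)[0, 1])

def rmse(predicted, measured):
    """Root-mean-square error of the predictions."""
    diff = np.asarray(predicted) - np.asarray(measured)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Each table cell would then correspond to `pcc(pred, y)` with `rmse(pred, y)` in parentheses, where `pred` comes from a model fitted on the selected features. In practice the per-bin correlations and the model should be estimated on data held out from the evaluation set, a detail this sketch omits.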

Dong et al. Genome Biology 2012, 13:R53. doi:10.1186/gb-2012-13-9-r53