Figure 9.

Average number of motifs and matches at 15% FDR recovered by the continuous and discrete hypergeometric-distribution scoring functions. The same optimization and filtering procedure was applied to both versions (seed FDR < 0.001, α = 0.75, γ = 0.75). (Left) Average results for the 24 yeast datasets. The leftmost column shows the results obtained with the continuous version of the scoring function implemented in RED2. The 'k = i' columns show the results obtained with the discrete version using i clusters. The 'best k' column shows the results obtained when the number of clusters that yields the highest number of motifs is selected a posteriori for each dataset. (Right) Comparison of the number of predicted motifs that match a known motif in the ScerTF database for the 24 yeast datasets at 15% FDR. The y-axis shows the number achieved by RED2 and the x-axis the number achieved by the discrete version with the best k. Superimposed points are indicated by shading. RED2 found more motifs than the best clustering for 17 datasets and fewer for three datasets, which gives a sign test P value of 0.003.
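
The sign test P value quoted in the caption can be checked with a short calculation. The sketch below assumes a two-sided exact sign test in which datasets with equal counts are dropped, leaving 17 + 3 = 20 informative comparisons. The caption states neither assumption, and the function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test (assumption: ties already excluded)."""
    n = wins + losses
    k = min(wins, losses)
    # Probability, under a fair coin (p = 0.5), of seeing k or fewer
    # outcomes on the less frequent side.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Counts from the caption: RED2 found more motifs for 17 datasets
# and fewer for 3.
p = sign_test_p_value(17, 3)
print(round(p, 3))  # prints 0.003
```

Under these assumptions the result rounds to the reported 0.003. A one-sided test on the same counts would give roughly half that value.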

Lajoie et al. Genome Biology 2012 13:R109   doi:10.1186/gb-2012-13-11-r109