Figure 2.

The effect of the number of sequence reads on the comparison of nucleosome maps using different measures. (a, b) We randomly sampled in vivo nucleosome data from yeast at different levels of genomic coverage [10]. At each level, five pairs of nucleosome maps were generated, and for each map we estimated five quantities: nucleosome occupancy, absolute nucleosome positioning (without smoothing), conditional positioning (without smoothing), smoothed absolute positioning and smoothed conditional positioning (both smoothed with a Gaussian filter with a 20-bp standard deviation). For each pair of maps and each measure, we computed the Pearson correlation between the two maps. This simulates the comparison of two replicates with the same coverage, and thus shows the difference between two random samples, with the same number of reads, drawn from the same experiment. The black arrow indicates an average read number beyond the scale of the y-axis. (b) An expansion of (a) for low numbers of reads. The standard deviation at every plotted point is smaller than 0.001. The coverage of the full in vivo map was about 2.2 nucleosome read starts per base pair. This simulation addresses only the error introduced by sampling; it does not simulate other sources of experimental error, such as variability in the extracted lengths of nucleosome-protected sequences, to which measures such as positioning are especially sensitive. Vertical dashed lines indicate the approximate number of uniquely mapped reads in various studies [5,10-12,18,24], suggesting that the sequencing coverage of several of these studies might lead to underestimation of the similarity between maps, depending on the estimated quantity.

Kaplan et al. Genome Biology 2010 11:140   doi:10.1186/gb-2010-11-11-140
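The replicate-sampling simulation described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses a synthetic read map instead of the yeast data [10], approximates absolute positioning by per-base-pair read-start counts, approximates occupancy by the coverage of reads extended to an assumed 147-bp nucleosome length, and omits conditional positioning. All function names are hypothetical. Each call draws one pair of subsamples; repeating it with different seeds would mirror the five pairs per coverage level.

```python
import numpy as np

def subsample_read_starts(starts, n_reads, rng):
    # Draw n_reads read starts without replacement from the full map (one simulated replicate).
    idx = rng.choice(len(starts), size=n_reads, replace=False)
    return starts[idx]

def start_counts(starts, genome_len):
    # Assumed positioning proxy: number of read starts at each base pair.
    return np.bincount(starts, minlength=genome_len).astype(float)

def occupancy(starts, genome_len, nuc_len=147):
    # Assumed occupancy proxy: per-bp coverage of reads extended to nuc_len bp
    # (difference array plus cumulative sum; reads are truncated at the genome end).
    diff = np.zeros(genome_len + 1)
    np.add.at(diff, starts, 1.0)
    np.add.at(diff, np.minimum(starts + nuc_len, genome_len), -1.0)
    return np.cumsum(diff)[:genome_len]

def gaussian_smooth(signal, sd=20.0):
    # Convolve with a normalized Gaussian kernel whose standard deviation is sd bp.
    half = int(4 * sd)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sd) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

def replicate_correlations(starts, genome_len, n_reads, seed=0):
    # Pearson correlation between two random subsamples of equal size, per measure.
    rng = np.random.default_rng(seed)
    a = subsample_read_starts(starts, n_reads, rng)
    b = subsample_read_starts(starts, n_reads, rng)
    measures = {
        "occupancy": occupancy,
        "positioning": start_counts,
        "smoothed positioning": lambda s, g: gaussian_smooth(start_counts(s, g)),
    }
    return {
        name: float(np.corrcoef(f(a, genome_len), f(b, genome_len))[0, 1])
        for name, f in measures.items()
    }

# Synthetic "full map" (hypothetical): nucleosomes every 165 bp with 10-bp jitter,
# at roughly 2.2 read starts per base pair, as in the caption.
rng = np.random.default_rng(1)
genome_len = 20_000
centers = np.arange(100, genome_len - 300, 165)
n_full = int(2.2 * genome_len)
starts = rng.choice(centers, size=n_full) + np.round(rng.normal(0, 10, n_full)).astype(int)
starts = np.clip(starts, 0, genome_len - 1)

for n in (500, 2_000, 20_000):
    print(n, replicate_correlations(starts, genome_len, n_reads=n))
```

With few reads, the unsmoothed per-bp positioning signal is sparse, so two subsamples of the same map correlate poorly even though they come from the same experiment; Gaussian smoothing pools neighbouring base pairs and raises the correlation. This is the kind of coverage dependence the figure quantifies.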