Figure 4.

Evidence for the 'buffering' of deleterious TFBS variation by neighboring homotypic motifs in Drosophila. (a) Distributions of average motif load per 100 kb window along Drosophila chromosome 2R and chromosome X (yellow; see Figure S5 in Additional file 1 for other chromosomes). Recombination rate distributions along the chromosomes (dashed lines) are from [22] and closely match an earlier analysis [43]. Note that motif load shows no apparent correlation with recombination rate. Regions of high average motif load, marked with asterisks, are examined further in (b). Average motif load is computed excluding a single maximum value to reduce the impact of outliers. (b) Examples of motif arrangement in regions that fall within 100 kb windows of high average motif load (L > 5e-3). Motifs with no detected deleterious variation (L = 0) are colored grey, and those with non-zero load range from pink (low load) to red (high load). Asterisks refer to the correspondingly labeled peaks in (a). Note that most high-load motifs in these regions have additional motifs for the same TF nearby. (c) Distributions of average load across ranges of phylogenetic conservation for motifs with a single match within a bound region ('singletons', blue) versus those found in pairs ('duplets', red). To make the comparison equivalent, one motif was chosen at random from the duplet in each bound region, and this process was repeated 100 times. Results are shown for the four TFs for which appreciable differences between 'singletons' and 'duplets' were detected. Phylogenetic conservation is expressed as ranges of branch length score (BLS), as in Figure 2b. The P-value is from a permutation test on the sum, across BLS ranges, of the differences in average load between 'singleton' and 'duplet' motifs. Average load was computed excluding a single maximum value. (d) Relationship between the average load per TF and the average number of motifs per bound region.
Average load was computed excluding a single maximum value; r is Pearson's correlation coefficient, and the P-value is from the corresponding correlation test. (e) Differences in motif score between pairs of motifs mapping to the same bound region: the motif with the highest load versus a motif with zero load ('constant'; left), or random pairs (right). These results suggest that the major alleles of high-load motifs are generally not 'weaker' than those of their non-varying neighbors (P-value from the Wilcoxon test).
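The averaging and testing steps described for panels (c) and (d) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: the function names, the dictionary layout keyed by BLS range, the sign convention (duplet minus singleton), the within-range label shuffling and the two-sided comparison are all assumptions, since the caption does not specify them.

```python
import math
import random

# Hypothetical sketch of the statistics named in panels (c) and (d);
# names and data layouts are illustrative assumptions, not the authors' code.


def avg_load_excluding_max(loads):
    """Mean load after dropping a single occurrence of the maximum value."""
    if len(loads) < 2:
        raise ValueError("need at least two values to drop the maximum")
    trimmed = sorted(loads)[:-1]  # remove exactly one largest value
    return sum(trimmed) / len(trimmed)


def sample_duplet_loads(duplet_pairs, rng):
    """Pick one motif load at random from each duplet (a pair of loads)."""
    return [rng.choice(pair) for pair in duplet_pairs]


def sum_of_differences(singletons, duplets):
    """Sum over BLS ranges of (duplet average load - singleton average load).

    Both arguments map a BLS-range label to a list of per-motif loads.
    """
    return sum(
        avg_load_excluding_max(duplets[r]) - avg_load_excluding_max(singletons[r])
        for r in singletons
    )


def permutation_test(singletons, duplets, n_perm=1000, seed=0):
    """Two-sided permutation test: shuffle singleton/duplet labels within
    each BLS range (group sizes preserved) and recompute the statistic."""
    rng = random.Random(seed)
    observed = sum_of_differences(singletons, duplets)
    hits = 0
    for _ in range(n_perm):
        perm_s, perm_d = {}, {}
        for r in singletons:
            pooled = singletons[r] + duplets[r]  # new list; inputs untouched
            rng.shuffle(pooled)
            k = len(singletons[r])
            perm_s[r], perm_d[r] = pooled[:k], pooled[k:]
        if abs(sum_of_differences(perm_s, perm_d)) >= abs(observed):
            hits += 1
    # add-one correction keeps the estimated P-value above zero
    return observed, (hits + 1) / (n_perm + 1)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, e.g. per-TF average load (xs)
    versus average number of motifs per bound region (ys)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of length >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For example, `avg_load_excluding_max([1, 2, 3, 10])` returns 2.0, because only the outlier 10 is dropped. The duplet sampling would be repeated (100 times in the caption) before averaging. A P-value for Pearson's r would normally come from a t distribution with n − 2 degrees of freedom (for instance via `scipy.stats.pearsonr`); that step is omitted here to keep the sketch dependency-free.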

Spivakov et al. Genome Biology 2012 13:R49   doi:10.1186/gb-2012-13-9-r49