Figure 6.
One-parameter model reveals strong relationship between indel rate and repeat-sequence
abundance. (A) Indel rate as a function of repeat length is plotted, with coloring indicating the
inserted or deleted nucleotides as shown in the legend. Repeat length is the average
of the ‘reference’ and ‘indel’ read lengths; thus, for single-base indels, repeat
length is ‘x.5’ for integer values of x. (B-E) Gray dotted lines show repeat-sequence abundance as a function of length for A:T
homopolymers (B, E), G:C homopolymers (C), and AT:TA dyad-repeats (D). The colored lines show the lowest-error model fit based on the indel rates in (A), with error and α values specified. To prevent overfitting at low repeat-length values, error is calculated
as the average squared deviation in log space, not linear space. (F) Abundance of A:T homopolymers as a function of length in various indicated organisms.
A histogram was generated for each species independently; to facilitate comparisons
among species, the data were then normalized such that the abundance at length 3 is
1.0 and then scaled (to adjust for differences in genomic A:T content) such that
the abundance at length 6 is 0.75. The dashed line indicates where α = 0.
Muzzey et al. Genome Biology 2013, 14:R97. doi:10.1186/gb-2013-14-9-r97
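The caption states that fit error is the average squared deviation in log space rather than linear space. The sketch below illustrates that error measure only; it is not the authors' code. The function names, the toy abundance values, and the choice of base-10 logarithms are assumptions made for illustration. The caption does not specify the log base or the fitting procedure.

```python
import math


def log_space_mse(observed, predicted):
    """Average squared deviation between log10(observed) and log10(predicted).

    Assumption: base-10 logs; the caption does not state the base.
    All values must be strictly positive.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        (math.log10(o) - math.log10(p)) ** 2
        for o, p in zip(observed, predicted)
    )
    return total / len(observed)


def linear_mse(observed, predicted):
    """Average squared deviation in linear space, shown for comparison."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return total / len(observed)


# Hypothetical abundances at three increasing repeat lengths (toy numbers,
# not taken from the paper). The prediction is off by a factor of 2 everywhere.
observed = [1000.0, 100.0, 10.0]
predicted = [2000.0, 200.0, 20.0]

log_err = log_space_mse(observed, predicted)
lin_err = linear_mse(observed, predicted)
```

In this toy case every point contributes equally to the log-space error because each is off by the same factor, whereas the linear-space error is dominated by the most abundant (shortest) repeat. That is consistent with the caption's stated motivation of preventing overfitting at low repeat-length values.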