Figure 4.
Indels are enriched in repeat sequences upstream of genes.
(A) Close-up of a 10 kb region of chromosome 1 containing several positions where hundreds of reads deviate from the reference in support of an indel.
(B) Expected values for max-to-sum ratios of ‘reference’ and ‘indel’ reads in heterozygous and homozygous regions.
(C) Scatterplot of max-to-sum ratios in heterozygous and homozygous regions for every putative indel in the genome. Histograms at top and right show the distribution of data on each perpendicular axis as indicated. The color of each point is based on the legend, where W and C indicate reads from the Watson and Crick strands, respectively.
(D) The cutoff for indel designation, indicated in red, has a 5% false discovery rate (FDR), based on fitting the sum of gamma and Gaussian distributions, which reflect the true and false indels, respectively. The histogram in green considers only points with homozygous max-to-sum ratios <1.0 and rectilinear distances of 0.6 or less from the point [1.0,0.5].
(E) Indel density as a function of indel size and distance from the start codon. Density values were normalized to account for the fact that not all coding or intergenic regions span 1,000 nucleotides.
(F) Indels are strongly enriched in repeat sequences.
(G) Indels are not a sequencing artifact. The average size reported by all reads supporting an indel was calculated and then compiled into a histogram representing all indels. Random sequencing errors would have yielded density at non-integer values and, more importantly, around zero.
Muzzey et al. Genome Biology 2013 14:R97 doi:10.1186/gb-2013-14-9-r97
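
The point filter described for panel D can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes that "max-to-sum ratio" means max(reference, indel) / (reference + indel) read counts, and that the point [1.0,0.5] is ordered as [homozygous ratio, heterozygous ratio]. All function names are hypothetical.

```python
def max_to_sum(ref_reads, indel_reads):
    """Assumed definition: the larger read count divided by the total."""
    total = ref_reads + indel_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return max(ref_reads, indel_reads) / total


def rectilinear_distance(p, q):
    """Rectilinear (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def in_green_histogram(hom_ratio, het_ratio, center=(1.0, 0.5), max_dist=0.6):
    """Panel D filter: homozygous ratio < 1.0 and within 0.6 of [1.0, 0.5].

    Assumption: coordinates are ordered (homozygous, heterozygous).
    """
    point = (hom_ratio, het_ratio)
    return hom_ratio < 1.0 and rectilinear_distance(point, center) <= max_dist


# Hypothetical read counts for one putative indel.
hom = max_to_sum(ref_reads=90, indel_reads=10)   # ratio in homozygous regions
het = max_to_sum(ref_reads=50, indel_reads=50)   # ratio in heterozygous regions
print(hom, het, in_green_histogram(hom, het))
```

Under these assumptions, an equal split of reference and indel reads gives a ratio of 0.5, and reads that all agree give a ratio of 1.0.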