Figure 2.
Nucleotide distribution surrounding fragment ends and calculation of bias weights. (a) Sequence logos showing the distribution of nucleotides in a 23 bp window surrounding
the ends of fragments from an experiment primed with 'not not so random' (NNSR) hexamers
[11]. The 3' end sequences are complemented (but not reversed) to show the sequence of
the primer during first-strand synthesis (see Figure 1). The offset is calculated
so that zero is the 'first' base of the end sequence and only non-negative values
are internal to the fragment. Counts were taken only from transcripts mapping to single-isoform
genes. (b) Sequence logo showing normalized nucleotide frequencies after reweighting by initial
(not bias corrected) FPKM in order to account for differences in abundance. (c) The background distribution for the yeast transcriptome, assuming uniform expression
of all single-isoform genes. The difference between the 5' and 3' distributions is due to
the ends being primed from opposite strands. Comparing (c) to (a) and (b) shows that
while the bias is confounded with expression in (a), the abundance normalization reveals
the true bias to extend from 5 bp upstream to 5 bp downstream of the fragment end.
Taking the ratio of the normalized nucleotide frequencies (b) to the background (c)
for the NNSR dataset gives bias weights (d), which further reveal that the bias is partially due to selection for upstream sequences
similar to the strand tags, namely TCCGATCTCT in first-strand synthesis (which selects
the 5' end) and TCCGATCTGA in second-strand synthesis (which selects the 3' end).
Although the weights here are based on independent frequencies, we found correlations
among sites in the window and take these into account in our full model to produce
more informative weights (see Supplementary methods in Additional file 3). A figure similar to this one for the standard Illumina Random Hexamer protocol,
and plots similar to (d) for all datasets in the paper, can be found in Figures S1 and
S2 of Additional file 1, respectively.
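
The calculation described in panels (b) to (d) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, and the choice to weight each fragment-end window by the inverse of its transcript's initial FPKM are assumptions made for illustration. It uses independent per-position frequencies only, and so ignores the correlations among sites that the full model accounts for.

```python
NUCLEOTIDES = "ACGT"

def position_frequencies(windows, weights=None):
    """Normalized nucleotide frequencies at each position of equal-length windows.

    `weights` gives one weight per window (assumption: 1 / FPKM of the
    transcript the fragment came from); unweighted counts if omitted.
    """
    if weights is None:
        weights = [1.0] * len(windows)
    width = len(windows[0])
    totals = [{n: 0.0 for n in NUCLEOTIDES} for _ in range(width)]
    for seq, w in zip(windows, weights):
        for i, base in enumerate(seq):
            totals[i][base] += w
    freqs = []
    for column in totals:
        column_sum = sum(column.values())
        freqs.append({n: column[n] / column_sum for n in NUCLEOTIDES})
    return freqs

def bias_weights(observed, background):
    """Ratio of observed to background frequency, per position and nucleotide.

    Assumes every background frequency is non-zero.
    """
    return [
        {n: obs[n] / bg[n] for n in NUCLEOTIDES}
        for obs, bg in zip(observed, background)
    ]

# Toy data (hypothetical): 2 bp windows around fragment ends, with the
# initial FPKM of the transcript each fragment was assigned to.
end_windows = ["AC", "AC", "GT"]
initial_fpkm = [10.0, 10.0, 1.0]
inverse_fpkm = [1.0 / f for f in initial_fpkm]

# Analogue of (b): abundance-normalized frequencies.
normalized = position_frequencies(end_windows, inverse_fpkm)

# Analogue of (c): background from windows sampled uniformly across
# transcripts (here a toy set in which every base is equally frequent).
background = position_frequencies(["AC", "CA", "GT", "TG"])

# Analogue of (d): bias weights as the ratio of (b) to (c).
weights_d = bias_weights(normalized, background)
```

In this toy case the two windows from the highly expressed transcript would dominate raw counts, but after reweighting the single window from the low-abundance transcript carries most of the mass, which mirrors how the abundance normalization in (b) separates bias from expression.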
Roberts et al. Genome Biology 2011 12:R22 doi:10.1186/gb-2011-12-3-r22