Figure 2.

MMSEQ data structures to represent read mappings to alternative isoforms and alternative haplotypes. (a) Schematic of a gene with an alternatively spliced cassette exon. Each read is labeled according to the transcripts it maps to and placed along its alignment position. Reads that map to both transcripts, t1 and t2, are shown in red, reads that map only to t1 are shown in blue and the read that maps only to t2 is shown in green. Reads that align with their start positions in the regions labeled d1 and d3 (in red) may have come from either transcript, reads with their start positions in d2 (in blue) can only have come from t1, and reads with their start positions in d4 (in green) must have come from t2. Each row i of the indicator matrix M characterizes a unique set of transcripts that is mapped to by ki reads. There are three transcript sets: {t1, t2} (red), {t1} (blue) and {t2} (green). Exon lengths are e1, e2, e3. Hence s1 = d1 + d3, s2 = d2 and s3 = d4. The effective length of transcript t is equal to the sum over the elements of s that have a corresponding 1 in column t of M, that is ∑i siMit. It can be seen from the figure that these lengths are the sums of the exon lengths minus the read length (ϵ) plus one, as expected. (b) Schematic of a single-exon gene with a heterozygote near the center. Reads with starting positions in region d2 contain either the 'C' allele or the 'G' allele and thus map to either the haplo-isoform t1A, which has a 'C', or t1B, which has a 'G'. It is evident that the heterozygote acts like an alternative middle exon, and that the same model and data structures as in the alternative isoform schematic apply.
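The effective-length calculation in panel (a) can be checked with a small numerical sketch. This is not MMSEQ code: the exon lengths, the read length and the helper name `effective_lengths` are made up for illustration. The region sizes d1 to d4 are derived under the assumptions that all reads have the same length ϵ, that every exon is at least ϵ long, and that t1 contains all three exons while t2 skips the middle one.

```python
# Illustrative values only; none of these numbers come from the paper.
e1, e2, e3 = 100, 60, 80   # exon lengths
eps = 30                   # read length (epsilon in the caption)

# Sizes of the start-position regions, assuming each exon is >= eps long.
d1 = e1 - eps + 1   # reads lying entirely within exon 1 (t1 and t2)
d3 = e3 - eps + 1   # reads lying entirely within exon 3 (t1 and t2)
d2 = e2 + eps - 1   # reads overlapping the cassette exon (t1 only)
d4 = eps - 1        # reads spanning the exon 1 / exon 3 junction (t2 only)

# One entry of s and one row of M per transcript set:
# {t1, t2}, {t1}, {t2}. Columns of M are t1, t2.
s = [d1 + d3, d2, d4]
M = [
    [1, 1],
    [1, 0],
    [0, 1],
]

def effective_lengths(M, s):
    """Return sum_i s_i * M_it for each transcript (column) t."""
    n_transcripts = len(M[0])
    return [
        sum(s_i * row[t] for s_i, row in zip(s, M))
        for t in range(n_transcripts)
    ]

lengths = effective_lengths(M, s)
print(lengths)
```

With these values the result for t1 equals e1 + e2 + e3 − ϵ + 1 and the result for t2 equals e1 + e3 − ϵ + 1, matching the statement in the caption. Panel (b) would use the same three-row structure, with the columns read as the haplo-isoforms t1A and t1B.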

Turro et al. Genome Biology 2011 12:R13   doi:10.1186/gb-2011-12-2-r13