|
Figure 2.
Summarizing mapped reads into a gene-level count. (a) Mapped reads from a small region of the RNA-binding protein 39 (RBM39) gene are shown for LNCaP prostate cancer cells [90], human liver and human testis from the UCSC track. The three rows of RNA-seq data
(blue and black graphs) are shown as a 'pileup track', where the y-axis at each location
measures the number of mapped reads that overlap that location. Also shown are the
genomic coordinates, gene model (labeled RBM39; blue boxes indicate exons) and conservation score across vertebrates. It is clear
that many reads originate from regions with no known exons. (b) A schematic of a genomic region and reads that might arise from it. Reads are color-coded
by the genomic feature from which they originate. Different summarization strategies
will result in the inclusion or exclusion of different sets of reads in the table
of counts. For example, including only reads coming from known exons will exclude
the intronic reads (green) from contributing to the results. Splice junctions are
listed as a separate class to emphasize both the potential ambiguity in their assignment
(such as which exon a junction read should be assigned to) and the possibility that
many of these reads may not be mapped because they are harder to map than continuous
reads. CDS, coding sequence.
Oshlack et al. Genome Biology 2010 11:220 doi:10.1186/gb-2010-11-12-220 |
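
To make the effect of the summarization strategy in panel (b) concrete, the following minimal sketch counts reads for a toy gene under two strategies: exon-only and whole gene body. The coordinates, the read set, and the names `overlaps` and `count_reads` are hypothetical and chosen only for illustration. Reads are treated as simple half-open intervals, so spliced (junction) reads and multi-mapping reads are not modeled.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_reads(reads: List[Interval], features: List[Interval]) -> int:
    """Count reads that overlap at least one of the given features."""
    return sum(any(overlaps(r, f) for f in features) for r in reads)


# Hypothetical toy gene: two exons separated by one intron.
exons = [(100, 200), (300, 400)]
gene_body = [(100, 400)]

# Hypothetical mapped reads (36 bp each).
reads = [
    (110, 146),  # exon 1
    (150, 186),  # exon 1
    (220, 256),  # intron
    (260, 296),  # intron
    (320, 356),  # exon 2
    (450, 486),  # intergenic, outside the gene
]

exon_only = count_reads(reads, exons)       # intronic reads are excluded
whole_gene = count_reads(reads, gene_body)  # intronic reads are included
print(f"exon-only count: {exon_only}, whole-gene count: {whole_gene}")
```

With these toy inputs the exon-only strategy yields a smaller count than the whole-gene strategy, because the two intronic reads contribute only to the latter.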