Figure 1.
G-Mo.R-Se method for building gene models from short reads. The five black boxes show the five steps of the approach. Step 1 (covtig construction)
builds covtigs (coverage contigs) from the positions
where short reads are mapped above a given depth threshold. Step 2 (candidate exons)
defines a list of stranded candidate exons derived from each covtig.
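Step 1 can be illustrated with a minimal Python sketch. It assumes that mapped reads are given as half-open (start, end) intervals on a single reference sequence and that "above the threshold" means a depth strictly greater than `min_depth`. The names `depth_profile` and `build_covtigs` are hypothetical and not taken from G-Mo.R-Se. Strand is ignored here, since the caption assigns strand only later, in step 2.

```python
from typing import List, Optional, Tuple


def depth_profile(read_spans: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base read depth from half-open (start, end) alignments.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth at each position.
    """
    diff = [0] * (length + 1)
    for start, end in read_spans:
        diff[start] += 1
        diff[end] -= 1
    depth: List[int] = []
    running = 0
    for pos in range(length):
        running += diff[pos]
        depth.append(running)
    return depth


def build_covtigs(depth: List[int], min_depth: int) -> List[Tuple[int, int]]:
    """Maximal runs of positions whose depth is strictly above min_depth.

    Returns half-open (start, end) intervals, one per covtig.
    """
    covtigs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, d in enumerate(depth):
        if d > min_depth:
            if start is None:
                start = pos  # open a new covtig
        elif start is not None:
            covtigs.append((start, pos))  # depth dropped: close the covtig
            start = None
    if start is not None:
        covtigs.append((start, len(depth)))  # covtig reaches the sequence end
    return covtigs


if __name__ == "__main__":
    reads = [(0, 5), (2, 7), (3, 6), (12, 18), (13, 18)]
    print(build_covtigs(depth_profile(reads, 20), min_depth=1))
    # [(2, 6), (13, 18)]
```

The difference array makes the depth computation linear in the number of reads plus the sequence length, rather than touching every base of every read.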
Splice sites are searched for within 100 nucleotides of each covtig boundary, which allows
the candidate exons to be oriented on the forward or the reverse strand, as shown
in the second box. Step 3 (junction validation) validates the junctions
between candidate exons using a word dictionary built from the unmapped reads. During
step 4 (graph of candidate exons linked by validated junctions), a graph is created
in which nodes are candidate exons (black boxes) and directed edges (purple arrows) between
two nodes represent validated junctions. The last two connected components show an
example of a split gene that can be corrected using open reading frame detection between
the last exon of the first model and the first exon of the second model. In the final
step (step 5, model construction and coding sequence detection), we traverse the
graph and extract all possible paths from each source to each sink. Each
path represents a predicted transcript, and a coding sequence (CDS) is identified for
each transcript. Models M1, M2, M5 and M7 (untranslated regions are in grey, introns in black and coding exons in red) correctly
model real transcripts T1, T2, T3 and T5 (untranslated regions are in grey, and introns and exons are indicated by black lines
and boxes, respectively). As all possible paths are extracted from the graph, some
of them may not correspond to real transcripts (for example, models M3, M4 and M6).
Denoeud et al. Genome Biology 2008 9:R175 doi:10.1186/gb-2008-9-12-r175
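Steps 4 and 5 amount to building a directed graph of candidate exons and enumerating every source-to-sink path. The following is a minimal Python sketch, not the G-Mo.R-Se implementation. It assumes candidate exons are identified by hypothetical string IDs, that validated junctions are supplied as an adjacency list, and that the graph is acyclic (each edge links an upstream exon to a downstream one on the same strand). CDS detection and split-gene correction are not shown.

```python
from typing import Dict, List


def all_source_to_sink_paths(edges: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every path from a source (no incoming edge) to a sink
    (no outgoing edge). Assumes the junction graph is acyclic."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    has_incoming = {t for targets in edges.values() for t in targets}
    sources = sorted(n for n in nodes if n not in has_incoming)
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        successors = edges.get(path[-1], [])
        if not successors:
            paths.append(list(path))  # sink reached: one candidate transcript
            return
        for nxt in successors:
            path.append(nxt)
            extend(path)
            path.pop()  # backtrack to try the next validated junction

    for src in sources:
        extend([src])
    return paths


if __name__ == "__main__":
    # Hypothetical exon IDs; each edge stands for a validated junction.
    junctions = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E4"], "E5": []}
    for path in all_source_to_sink_paths(junctions):
        print(" -> ".join(path))
    # E1 -> E2 -> E4
    # E1 -> E3 -> E4
    # E5
```

Because every combination of compatible junctions is emitted, both E1 -> E2 -> E4 and E1 -> E3 -> E4 are reported even if only one exists in the sample, which mirrors the caption's point that some extracted paths (such as M3, M4 and M6) need not correspond to real transcripts. An isolated exon with no junctions is both a source and a sink, so it yields a single-exon path.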