|
Figure 1.
Overview of sequencing and annotation for a whole-genome shotgun project, such as
sequencing a bacterial genome. First (a), genomic DNA is purified, broken into short fragments and cloned into E. coli. The cloned fragments are then sequenced from both ends on an automated sequencing
machine. The resulting sequences (shown in (b) as they appear on the sequencing machine display) are then assembled, by a complex
software program that identifies overlaps between them, into (c) large, contiguous sequences representing the chromosomes from the original DNA. Gaps
are filled until the genome is complete. (d) Annotation begins with the execution of several gene-finding programs, such as Glimmer,
which identifies protein-coding genes, tRNAscan-SE, which identifies tRNAs, and other
programs for other genome features. (e) These initial predictions are used as the basis for BLAST searches against large protein
databases, which identify related proteins based on sequence similarity. Translated
(BLASTX) searches are then used to scan the databases to detect any proteins that
match the DNA regions in between predicted genes. Customized annotation programs are
used to decide what name and function to assign to each protein, leading to (f) the final annotated genome.
Salzberg Genome Biology 2007 8:102 doi:10.1186/gb-2007-8-1-102 |
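
The assembly step in (b) and (c) can be illustrated with a toy example. The sketch below is not the algorithm used by any real assembler; it is a minimal greedy merge that assumes error-free reads from a single strand, with no repeats and no read fully contained inside another. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical and chosen only for this illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such overlap is at least `min_len` bases long."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no overlap of at least `min_len` remains (toy assumption:
    exact matches only, one strand, no sequencing errors)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge; gaps remain between contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

Reads that share no sufficiently long overlap stay as separate contigs, which corresponds to the gaps that the caption says must be filled before the genome is complete.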