Table 1

Hierarchy of assembly data types

Data type                              Description

Scaffold (100 kb to 10 Mb)             Layout of potentially nonoverlapping contigs based on mate-pair information, ideally spanning entire chromosomes or replicons
Contig (5 kb to 500 kb)                Layout of overlapping reads with a consensus sequence
Mate-pair (2 kb to 100 kb)             Pair of end-sequenced reads with a known orientation and separation
Read (0.5 kb to 1.0 kb)                Base-calls and quality scores assigned to a chromatogram
Chromatogram (4 × 10,000 time points)  Signal data from a sequencing reaction of a physical piece of DNA

Each type is composed of elements of the next lower-level type. Typical sizes are listed in parentheses. kb, kilobases; Mb, megabases.
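
The containment relationships in Table 1 can be sketched as nested record types. The following Python sketch is illustrative only and is not taken from the paper or from any assembler's data model: all class and field names (`Chromatogram`, `Read`, `MatePair`, `Contig`, `Scaffold`, `separation_bp`, `links`, `read_count`) and the helper `make_read` are hypothetical, and orientation and layout coordinates are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chromatogram:
    # One signal trace per base channel (four in total), sampled over time.
    traces: List[List[float]]


@dataclass
class Read:
    bases: str                  # base-calls
    qualities: List[int]        # one quality score per base-call
    chromatogram: Chromatogram  # signal data the calls were derived from


@dataclass
class MatePair:
    forward: Read
    reverse: Read
    separation_bp: int          # known (estimated) separation of the two ends


@dataclass
class Contig:
    consensus: str
    reads: List[Read] = field(default_factory=list)


@dataclass
class Scaffold:
    contigs: List[Contig] = field(default_factory=list)
    links: List[MatePair] = field(default_factory=list)  # mate-pairs ordering the contigs

    def read_count(self) -> int:
        # Walk one level down the hierarchy: scaffold -> contigs -> reads.
        return sum(len(contig.reads) for contig in self.contigs)


def make_read(bases: str) -> Read:
    # Toy constructor (assumption): flat traces and a constant quality of 30.
    chromatogram = Chromatogram(traces=[[0.0] * len(bases) for _ in range(4)])
    return Read(bases=bases, qualities=[30] * len(bases), chromatogram=chromatogram)


left, right = make_read("ACGT"), make_read("TTGA")
scaffold = Scaffold(
    contigs=[Contig("ACGT", [left]), Contig("TTGA", [right])],
    links=[MatePair(left, right, separation_bp=2000)],
)
print(scaffold.read_count())
```

In this sketch a mate-pair holds references to the same `Read` objects that the contigs contain, so a scaffold's links and its contigs stay consistent without duplicating read data.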

Schatz et al. Genome Biology 2007 8:R34   doi:10.1186/gb-2007-8-3-r34
