Table 2

Comparison between genome assemblies

Sequence identity level    February 2002 assembly*    February 2003 assembly

Duplication content (bp)
90-92%                                  4,966,470                 3,543,429
92-94%                                 15,685,840                13,981,642
94-96%                                 17,533,730                17,970,287
96-98%                                 11,539,392                11,731,958
98-99.5%                                5,865,024                 5,487,899

Potential sequence misassignment error detected (bp)
99.5-100%                               4,832,594                18,456,096

Comparison of duplication content, by sequence identity level, and of potential sequence misassignment errors between the February 2002 (MGSCv3) and February 2003 (a hybrid assembly of MGSCv3 with 705 Mb of finished BAC sequence) genome assemblies. *Analysis of the duplication content for the February 2002 assembly can be found at [14]. Duplications detected at extremely high percent identity (99.5-100%) are likely to be genome assembly artifacts and were not included in the duplication content shown in Table 1.

Cheung et al. Genome Biology 2003 4:R47   doi:10.1186/gb-2003-4-8-r47
