Genome Biology

Open Access Research

Genomic DNA k-mer spectra: models and modalities

Benny Chor1*, David Horn2, Nick Goldman3, Yaron Levy1 and Tim Massingham3

Author Affiliations

1 School of Computer Science, Tel Aviv University, Klausner St, Ramat-Aviv, Tel-Aviv 39040, Israel

2 School of Physics and Astronomy, Tel Aviv University, Klausner St, Ramat-Aviv, Tel-Aviv 39040, Israel

3 European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK

Genome Biology 2009, 10:R108 doi:10.1186/gb-2009-10-10-r108

Published: 8 October 2009

Additional files

Additional data file 1:

Top row: 9-mer spectra of human chromosomes 1, 6 and 20 (left to right). Bottom row: 11-mer spectra of the same chromosomes, in the same order. All six spectra are multimodal.
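For readers who want to reproduce plots of this kind, the following is a minimal sketch of one common way to compute a k-mer spectrum: count how often each k-mer occurs, then tally how many distinct k-mers share each occurrence count. The function names `kmer_counts` and `kmer_spectrum` are hypothetical, and the choices to scan a single strand and to skip windows containing characters other than A, C, G, T are assumptions, not necessarily the authors' procedure.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count occurrences of every k-mer in seq (single strand, assumed)."""
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def kmer_spectrum(seq, k):
    """Map each occurrence count n to the number of distinct k-mers seen n times."""
    counts = kmer_counts(seq, k)
    spectrum = Counter(counts.values())
    # k-mers that never occur are recorded under n = 0.
    spectrum[0] = 4 ** k - len(counts)
    return dict(sorted(spectrum.items()))


if __name__ == "__main__":
    print(kmer_spectrum("ACGTACGT", 2))
```

A spectrum is called multimodal when a plot of this mapping (occurrence count on the horizontal axis, number of k-mers on the vertical axis) shows more than one peak. For whole chromosomes a dictionary-based counter of this kind may need to be replaced by an array indexed by an integer encoding of the k-mer.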

Format: PDF Size: 57KB

Additional data file 2:

Whole genome spectra of human (top) and opossum (bottom); 9-mers (left) and 11-mers (right). All four spectra are multimodal.

Format: PDF Size: 61KB

Additional data file 3:

From top-left: Escherichia coli, Aeropyrum pernix, zebrafish (Danio rerio), pufferfish (Tetraodon nigroviridis), Arabidopsis thaliana, bee (Apis mellifera), nematode (Caenorhabditis elegans), yeast (Saccharomyces cerevisiae), and sea squirt (Ciona savignyi). All spectra are unimodal.

Format: JPEG Size: 374KB

Additional data file 4:

Chicken (k = 10), platypus (Ornithorhynchus anatinus, an egg-laying mammal; k = 10), frog (k = 11), and lizard (k = 11). All four k-mer spectra are multimodal.

Format: JPEG Size: 104KB

Additional data file 5:

Each plot is partitioned into 8-mers that contain the CpG dinucleotide (colored green) and those that do not (colored blue). In the multimodal spectra of (a) human and (b) chicken, the CpG-containing 8-mers make up the left-most part. In the two non-tetrapod species, (c) C. elegans (nematode) and (d) pufferfish, there is no such effect.
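The partition described in this caption can be sketched as follows: k-mers are split by whether they contain the substring CG, and a separate spectrum is tallied for each class. This is an illustrative sketch only; the function name `partitioned_spectrum`, the single-strand scan, and the skipping of windows with non-ACGT characters are assumptions rather than the authors' stated procedure.

```python
from collections import Counter


def partitioned_spectrum(seq, k):
    """Return (with_cpg, without_cpg) spectra.

    Each spectrum maps an occurrence count n to the number of distinct
    k-mers seen n times, restricted to k-mers that do or do not contain
    the dinucleotide CG.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases are ignored.
        if set(kmer) <= valid:
            counts[kmer] += 1

    with_cpg = Counter()
    without_cpg = Counter()
    for kmer, n in counts.items():
        if "CG" in kmer:
            with_cpg[n] += 1
        else:
            without_cpg[n] += 1
    return dict(with_cpg), dict(without_cpg)


if __name__ == "__main__":
    green, blue = partitioned_spectrum("ACGTACGT", 2)
    print("contains CpG:", green)
    print("no CpG:", blue)
```

If CpG-containing k-mers are depleted in a genome, the first spectrum is expected to concentrate at low occurrence counts, which corresponds to the left-most part of the combined plot.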

Format: JPEG Size: 934KB

Additional data file 6:

Whole genome (k = 10), all introns (k = 10), all 3' UTRs (k = 10), all exons (k = 9), all 5' UTRs (k = 8), all 600 base long promoters (k = 6), all 1,000 base long promoters (k = 7), all 5,000 base long promoters (k = 7).

Format: JPEG Size: 260KB

Additional data file 7:

Additional information includes taxonomical classification, usable length, percentage C+G content, CpG suppression (measured by ρCG) and whether the k-mer spectrum is observed to have unimodal or multimodal behavior. It also specifies web sites from which sequences were downloaded [22-27], and where different functional genomic regions were downloaded from [28-30].
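As a rough illustration of the CpG suppression measure, the sketch below computes the widely used observed-to-expected ratio ρCG = f(CG) / (f(C) · f(G)), where f denotes a relative frequency in the sequence. Values well below 1 indicate CpG depletion. Whether the table uses exactly this normalization (for example, single strand versus both strands) is an assumption here, and the function name `rho_cg` is hypothetical.

```python
def rho_cg(seq):
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Assumption: frequencies are taken over a single strand, with all
    positions in seq counted in the denominators.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        raise ValueError("sequence must contain both C and G")
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    f_cg = seq.count("CG") / (n - 1)
    return f_cg / (f_c * f_g)


if __name__ == "__main__":
    print(rho_cg("ACGTTTGACCGA"))
```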

Format: PDF Size: 59KB