Paper report

Identifying structured regions in E. coli DNA

Rachel Brem

Genome Biology 2000, 1:reports0056  doi:10.1186/gb-2000-1-2-reports0056

The electronic version of this article is the complete one and can be found online at: http://genomebiology.com/2000/1/2/reports/0056


Received: 20 May 2000
Published: 19 July 2000

© 2000 BioMed Central Ltd

Significance and context

It is now becoming possible to predict some features of DNA structure computationally. Most methods for this purpose, however, are restricted to a single structural feature, such as bending, stacking stability or flexibility. Pedersen et al. have designed a new approach that applies five of these single-feature programs simultaneously. The authors hypothesize that a region of DNA given a high score by all five programs is biologically significant. They report and analyze these putatively significant regions in the genes, promoters and non-coding regions of 18 prokaryotic genomes. The new methodology is important in that its signal-to-noise ratio may be much greater than that of any individual program: it may pick out biologically relevant sequences that other methods miss.

Key results

Pedersen et al. list 20 putatively significant regions of 'extreme structure' - that is, regions predicted to be more significantly structured than controls - in the genome of Escherichia coli. Only one of these - an operon containing the uncharacterized rhsE gene - has been previously identified. The authors also cluster all E. coli genes by the scores that the programs assign to them (bending, stacking stability, and so on). At least 8 of the resulting 11 clusters are enriched for genes involved in specific functions, such as respiration. (There is no control for significance level in this calculation.) Lastly, Pedersen et al. study the differences in bending, stacking stability, and other parameters between coding and non-coding DNA across all genomes, relative to shuffled controls. Although the trends in these data are not strongly significant, the authors determine that intergenic DNA containing promoters is more curved, less flexible and less stable than coding DNA.

Methodological innovations

The authors use previously documented programs that score di- or tri-nucleotides via empirical parameters trained on the following types of data: DNase I cutting frequencies, which report flexibility; nucleosome binding, which reports flexibility; disparity of positions in X-ray crystal structures of DNA bound to proteins, which reports deformability; quantum-mechanical energy calculations, which report stability; and mobility in polyacrylamide gel electrophoresis, which reports curvature. Pedersen et al. apply each program to each di- or tri-nucleotide in a genome of interest, then identify significant 1000 bp regions as those containing many di- or tri-nucleotides given high scores by all five programs. Similar calculations on shuffled genomes provide a control, which establishes the probability of finding high-scoring regions by chance.
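
The general shape of such a calculation can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two scales in TOY_SCALES hold invented values rather than published parameters, all function names are hypothetical, and two choices are assumptions made only for illustration: summarizing each window by its mean score, and taking each cutoff as a quantile of window means over shuffled copies of the sequence. The sketch also assumes dinucleotide scales and a sequence longer than the window.

```python
import random

# Hypothetical toy scales: 1.0 marks a "high-scoring" dinucleotide.
# These values are invented for illustration; they are NOT published parameters.
TOY_SCALES = {
    "toy_curvature": {"AA": 1.0, "TT": 1.0},
    "toy_instability": {"AA": 1.0, "AT": 1.0, "TA": 1.0, "TT": 1.0},
}


def score_track(seq, scale, k=2):
    """Score every overlapping k-mer; k-mers missing from the scale get 0.0."""
    return [scale.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)]


def window_means(track, width):
    """Mean score of each run of `width` consecutive k-mers, via prefix sums."""
    prefix = [0.0]
    for value in track:
        prefix.append(prefix[-1] + value)
    return [(prefix[i + width] - prefix[i]) / width
            for i in range(len(track) - width + 1)]


def shuffled_cutoffs(seq, scales, width, n_shuffles=20, quantile=0.99, seed=0):
    """Per-scale cutoff: a quantile of window means over shuffled copies of seq."""
    rng = random.Random(seed)
    pooled = {name: [] for name in scales}
    letters = list(seq)
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps base composition, destroys base order
        shuffled = "".join(letters)
        for name, scale in scales.items():
            pooled[name].extend(window_means(score_track(shuffled, scale), width))
    cutoffs = {}
    for name, values in pooled.items():
        values.sort()
        cutoffs[name] = values[min(len(values) - 1, int(quantile * len(values)))]
    return cutoffs


def extreme_windows(seq, scales, width, cutoffs):
    """Start positions of windows whose mean exceeds the cutoff on every scale."""
    per_scale = [
        [m > cutoffs[name] for m in window_means(score_track(seq, scale), width)]
        for name, scale in scales.items()
    ]
    return [i for i, flags in enumerate(zip(*per_scale)) if all(flags)]


# Toy usage: a GC-repeat background with an A-tract inserted in the middle.
seq = "GC" * 50 + "A" * 40 + "GC" * 50
cutoffs = shuffled_cutoffs(seq, TOY_SCALES, width=20)
hits = extreme_windows(seq, TOY_SCALES, width=20, cutoffs=cutoffs)
```

Requiring a window to pass on every scale at once is what distinguishes the combined approach from running any single program: a window that is extreme on only one scale is discarded. A real analysis would use the five published parameter sets and 1000 bp windows in place of the toy values here.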

Conclusions

The authors speculate that several of their 20 predicted regions of 'extreme structure' in the E. coli genome may be positions of kinks in supercoiled DNA. They also speculate, on the basis of results from their 11 clusters of E. coli genes, that functionally related genes might have similar DNA structure. And their finding that promoter DNA is less stable and more curved is consistent with biochemical hypotheses: during transcriptional initiation, the double helix needs to unwind easily, and it is also believed to wrap around the RNA polymerase molecule.

Reporter's comments

The methodology in this paper is sound and potentially important, but it is hard to evaluate the results fully because they contain few positive controls. The next step should be experimental verification of the authors' 20 putatively significant DNA regions. Only then can Pedersen et al. make a convincing case that their new tool makes truly useful predictions.

Assumptions that are made about each paper that is the subject of a report, unless otherwise specified:
The full text and figures are available only to subscribers of the journal, but are available over the internet from the journal's website. The paper itself is abstracted by PubMed. There is no supplementary material.