Genome Biology

official impact factor 6.89

Open Access Research

PRESTA: associating promoter sequences with information on gene expression

Václav Mach

Author Affiliations

Institute of Entomology, Czech Academy of Sciences, Branisovská 31, 370 05 Ceské Budejovice, Czech Republic

Genome Biology 2002, 3:research0050-research0050.7 doi:10.1186/gb-2002-3-9-research0050

Published: 21 August 2002

Abstract

Background

Large sets of well-characterized promoter sequences are required to facilitate the understanding of promoter architecture. The major sequence databases are a prospective source of upstream regulatory regions, but suffer from inaccurate annotation. The software tool PRESTA (PRomoter EST Association) presented in this study is designed for efficient recovery of characterized and partially verified promoters from GenBank and EMBL libraries.

Results

The PRESTA algorithm examines the putative GenBank/EMBL promoters and automatically removes most of the poorly annotated entries. The remaining records are connected to expressed sequence tags (ESTs) through a high-stringency BLAST search. The frequency and source of recovered ESTs provide an estimate of the activity and expression pattern of the promoter, and the ESTs' 5' ends assist in transcription start-site verification. The PRESTA database provides easy access to non-redundant upstream regulatory regions recently extracted by the PRESTA algorithm. The current size of this resource is 552 human and 241 mouse promoters. Surprisingly, no overlap between the PRESTA database and the Eukaryotic Promoter Database (EPD) was detected by sequence comparison.

Conclusions

The PRESTA algorithm demonstrates the principle of promoter verification by mapping EST 5' ends. The publicly available PRESTA database collects hundreds of characterized and partially verified promoter sequences and is complementary to other promoter databases.