Open Access Research

Improved analytical methods for microarray-based genome-composition analysis

Charles C Kim*, Elizabeth A Joyce, Kaman Chan and Stanley Falkow

Author Affiliations

Department of Microbiology and Immunology, 299 Campus Drive, Stanford University Medical Center, Stanford, CA 94305, USA

Genome Biology 2002, 3:research0065-research0065.17  doi:10.1186/gb-2002-3-11-research0065

Published: 29 October 2002

Abstract

Background

Whereas genome sequencing has given us high-resolution pictures of many different species of bacteria, microarrays provide a means of obtaining information on genome composition for many strains of a given species. Genome-composition analysis using microarrays, or 'genomotyping', can be used to classify genes as 'present' or 'divergent' on the basis of the level of hybridization signal. This typically involves selecting a signal value that is used as a cutoff to discriminate between present (high signal) and divergent (low signal) genes. Current methodology determines these cutoffs empirically, but this approach is subject to several problems that can result in the misclassification of many genes.
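As a minimal sketch of the static approach described above, the following Python applies one fixed cutoff to log2 signal ratios. The function name `classify_static`, the cutoff of -1.0, and the example gene values are illustrative assumptions, not values taken from the studies discussed.

```python
def classify_static(log_ratios, cutoff=-1.0):
    """Call a gene 'present' if its log2 signal ratio is at or above a fixed cutoff.

    The cutoff of -1.0 is a hypothetical value chosen for illustration only.
    """
    return {
        gene: ("present" if value >= cutoff else "divergent")
        for gene, value in log_ratios.items()
    }


# Hypothetical log2(test/reference) ratios for four genes on one array.
example = {"geneA": 0.1, "geneB": -0.3, "geneC": -2.4, "geneD": -0.9}
calls = classify_static(example)
```

Because the same cutoff is applied to every array, a gene such as the hypothetical `geneD`, which sits just above the threshold, is called present regardless of how the rest of that array's signal is distributed.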

Results

We describe a method that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Moreover, the cutoff is determined separately for each array, which accounts for variation in strain composition and hybridization quality. The algorithm also estimates the probability that any given gene is present, giving a measure of confidence in the categorical assignments.
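The abstract does not specify the algorithm, so the following Python is only a stand-in that illustrates the general idea of deriving a per-array cutoff and a per-gene presence probability from the shape of the distribution. It swaps in a two-component Gaussian mixture fitted by expectation-maximization, which is an assumption and not necessarily the method described here. The function name `fit_presence_model` and the simulated ratios are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_presence_model(values, iterations=50):
    """Fit a two-component Gaussian mixture to one array's log ratios by EM.

    Returns (prob_present, cutoff): a function giving the estimated probability
    that a gene with a given log ratio is present, and the ratio at which that
    probability crosses 0.5. Assumes the divergent genes form the lower mode.
    """
    xs = sorted(values)
    n = len(xs)
    # Component 0 models 'divergent' genes, component 1 models 'present' genes.
    mu = [xs[0], statistics.median(xs)]
    sigma = [statistics.pstdev(xs)] * 2
    weight = [0.5, 0.5]

    def posterior_present(x):
        d = weight[0] * normal_pdf(x, mu[0], sigma[0])
        p = weight[1] * normal_pdf(x, mu[1], sigma[1])
        return p / (p + d) if (p + d) > 0.0 else 0.5

    for _ in range(iterations):
        # E-step: responsibility of the 'present' component for each value.
        resp = [posterior_present(x) for x in xs]
        # M-step: update mean, spread, and weight of each component.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            total = sum(r)
            if total == 0.0:
                continue
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / total
            sigma[k] = max(math.sqrt(var), 1e-3)
            weight[k] = total / n

    def prob_present(x):
        # Values at or above the 'present' peak are treated as present.
        return 1.0 if x >= mu[1] else posterior_present(x)

    # Bisection between the two component means for the 0.5 crossing.
    lo, hi = mu[0], mu[1]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if prob_present(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return prob_present, 0.5 * (lo + hi)


# Simulated array (hypothetical): 900 present genes, 100 divergent genes.
random.seed(1)
ratios = [random.gauss(0.0, 0.3) for _ in range(900)]
ratios += [random.gauss(-3.0, 0.5) for _ in range(100)]
prob_present, cutoff = fit_presence_model(ratios)
```

Refitting the model for each array yields a cutoff that follows that array's own distribution, and `prob_present` gives a graded confidence for genes near the boundary instead of a bare categorical call.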

Conclusions

Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; our algorithm corrects these assignments. We have reassigned hundreds of genes from previous genomotyping studies of Helicobacter pylori and Campylobacter jejuni strains, and we expect the algorithm to be widely applicable to genomotyping data.