Genome Biology

Open Access Research

A new non-linear normalization method for reducing variability in DNA microarray experiments

Christopher Workman1,2*, Lars J Jensen2, Hanne Jarmer2, Randy Berka4, Laurent Gautier2, Henrik B Nielsen2, Hans-Henrik Saxild3, Claus Nielsen5, Søren Brunak2 and Steen Knudsen2

Author Affiliations

1 GeneData AG, Maulbeerstrasse 46, CH-4058 Basel, Switzerland

2 Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark

3 Center for Microbiology, Technical University of Denmark, DK-2800 Lyngby, Denmark

4 Novozymes Biotechnology, 1445 Drew Avenue, Davis, CA 95616, USA

5 Statens Serum Institut, DK-2300 Copenhagen, Denmark

Genome Biology 2002, 3:research0048-research0048.16 doi:10.1186/gb-2002-3-9-research0048

Published: 30 August 2002

Abstract

Background

Microarray data are subject to multiple sources of variation, of which biological sources are of interest whereas most others are only confounding. Recent work has identified systematic sources of variation that are intensity-dependent and non-linear in nature. Systematic sources of variation are not limited to the differing properties of the cyanine dyes Cy5 and Cy3 as observed in cDNA arrays, but are the general case for both oligonucleotide microarray (Affymetrix GeneChips) and cDNA microarray data. Current normalization techniques are most often linear and therefore not capable of fully correcting for these effects.
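The limitation of linear normalization noted above can be illustrated with a small synthetic example. This is only a sketch under stated assumptions: the signal values, the power-law distortion (exponent 0.9) and the helper name `global_scale` are hypothetical and are not taken from the paper. It shows that a single global scaling factor matches the overall signal level but leaves a residual bias that changes sign with intensity.

```python
import math

def global_scale(target, reference):
    """Global (linear) normalization: rescale target so its total signal matches reference."""
    factor = sum(reference) / sum(target)
    return [factor * x for x in target]

# Synthetic "true" signals spanning several orders of magnitude (assumed values).
truth = [10.0 * 1.5 ** i for i in range(20)]

# Hypothetical array with a non-linear, intensity-dependent distortion.
distorted = [x ** 0.9 for x in truth]

scaled = global_scale(distorted, truth)

# Log ratios against the truth after global scaling, ordered from low to high signal.
log_ratios = [math.log2(s / t) for s, t in zip(scaled, truth)]

print("lowest-intensity log2 ratio: ", round(log_ratios[0], 3))
print("highest-intensity log2 ratio:", round(log_ratios[-1], 3))
```

In this toy case the low-intensity signals end up over-estimated and the high-intensity signals under-estimated, which no single multiplicative factor can correct.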

Results

We present here a simple and robust non-linear method for normalization using array signal distribution analysis and cubic splines. These methods compared favorably to normalization using robust local-linear regression (lowess). The application of these methods to oligonucleotide arrays reduced the relative error between replicates by 5-10% compared with a standard global normalization method. Application to cDNA arrays showed improvements over the standard method and over Cy3-Cy5 normalization based on dye-swap replication. In addition, a set of known differentially regulated genes was ranked higher by the t-test. In either cDNA or Affymetrix technology, signal-dependent bias was more than ten times greater than the observed print-tip or spatial effects.
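The idea of normalizing through the signal distribution with cubic splines can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it interpolates a natural cubic spline through matched rank quantiles of a target and a reference array, whereas the published method may smooth or combine several fits. The function names, the number of knots, the log2 scale and the synthetic data are all assumptions made for the example.

```python
import bisect
import random

def rank_quantiles(values, n_knots):
    """Return n_knots evenly spaced quantiles of values, taken by rank."""
    s = sorted(values)
    return [s[int((k + 0.5) / n_knots * len(s))] for k in range(n_knots)]

def natural_cubic_spline(xs, ys):
    """Build f(x): a natural cubic spline through (xs, ys), linear beyond the end knots."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Second derivatives m at the knots; the natural condition fixes m[0] = m[-1] = 0.
    # The interior values solve a tridiagonal system (Thomas algorithm).
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    for j in range(1, n - 2):
        w = h[j] / diag[j - 1]
        diag[j] -= w * h[j]
        rhs[j] -= w * rhs[j - 1]
    m = [0.0] * n
    for j in range(n - 3, -1, -1):
        m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def f(x):
        if x <= xs[0]:  # linear extrapolation below the first knot
            slope = (ys[1] - ys[0]) / h[0] - m[1] * h[0] / 6.0
            return ys[0] + slope * (x - xs[0])
        if x >= xs[-1]:  # linear extrapolation above the last knot
            slope = (ys[-1] - ys[-2]) / h[-1] + m[-2] * h[-1] / 6.0
            return ys[-1] + slope * (x - xs[-1])
        i = bisect.bisect_right(xs, x) - 1
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return f

def spline_normalize(target, reference, n_knots=20):
    """Map target onto the reference distribution via a spline through matched quantiles."""
    qx = rank_quantiles(target, n_knots)
    qy = rank_quantiles(reference, n_knots)
    # Keep only knots with strictly increasing x so the spline is well defined.
    knots = [(qx[0], qy[0])]
    for x, y in zip(qx[1:], qy[1:]):
        if x > knots[-1][0]:
            knots.append((x, y))
    f = natural_cubic_spline([x for x, _ in knots], [y for _, y in knots])
    return [f(v) for v in target]

def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Synthetic replicate pair on a log2 scale (assumed values, not data from the paper).
random.seed(0)
truth = [random.gauss(8.0, 2.0) for _ in range(2000)]
reference = [t + random.gauss(0.0, 0.05) for t in truth]
# Hypothetical replicate with a smooth intensity-dependent distortion.
target = [t + random.gauss(0.0, 0.05) + 0.04 * (t - 8.0) ** 2 for t in truth]

# Global normalization on the log scale is a constant shift.
shift = (sum(reference) - sum(target)) / len(target)
global_norm = [v + shift for v in target]
spline_norm = spline_normalize(target, reference)

err_global = mean_abs_diff(global_norm, reference)
err_spline = mean_abs_diff(spline_norm, reference)
print("mean |difference| after global shift:", round(err_global, 3))
print("mean |difference| after spline fit:  ", round(err_spline, 3))
```

On this synthetic pair the spline-based mapping leaves a smaller replicate difference than the constant shift, which mirrors the kind of replicate-variability comparison the abstract reports. The size of the improvement here depends entirely on the assumed distortion and says nothing about the 5-10% figure measured on real arrays.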

Conclusions

Intensity-dependent normalization is important for both high-density oligonucleotide array and cDNA array data. Both the regression and spline-based methods described here performed better than existing linear methods when assessed on the variability of replicate arrays. Dye-swap normalization was less effective at Cy3-Cy5 normalization than either regression or spline-based methods alone.