Genome Biology

Open Access Method

Improving identification of differentially expressed genes in microarray studies using information from public databases

Richard D Kim1 and Peter J Park1,2*

Author Affiliations

1 Harvard-Partners Center for Genetics and Genomics, 77 Avenue Louis Pasteur, Boston, MA 02115, USA

2 Children's Hospital Informatics Program, 300 Longwood Ave, Boston, MA 02115, USA

Genome Biology 2004, 5:R70 doi:10.1186/gb-2004-5-9-r70

Published: 26 August 2004

Abstract

We demonstrate that the process of identifying differentially expressed genes in microarray studies with small sample sizes can be substantially improved by extracting information from a large number of datasets accumulated in public databases. The improvement comes from more reliable estimates of gene-specific variances based on other datasets. For a two-group comparison with two arrays in each group, for example, the result of our method was comparable to that of a t-test analysis with five samples in each group or to that of a regularized t-test analysis with three samples in each group. Our results are further improved by weighting the results of our approach with the regularized t-test results in a hybrid method.
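The idea summarized in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that each external dataset is a genes-by-arrays matrix of replicate measurements on the same genes, that gene-specific variances are pooled across datasets by degrees of freedom, and that the hybrid score is a simple weighted average. The function names, the pooling rule, and the `weight` parameter are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def pooled_external_variance(external_datasets):
    """Pool per-gene sample variances across external datasets.

    Assumption: each dataset is a (genes x arrays) matrix of replicates,
    and variances are combined weighted by their degrees of freedom.
    """
    weighted_sum = 0.0
    total_df = 0
    for data in external_datasets:
        df = data.shape[1] - 1                      # degrees of freedom per gene
        weighted_sum = weighted_sum + df * data.var(axis=1, ddof=1)
        total_df += df
    return weighted_sum / total_df

def external_variance_statistic(group_a, group_b, external_var):
    """t-like statistic whose variance comes from external data, not the small study."""
    n_a, n_b = group_a.shape[1], group_b.shape[1]
    mean_diff = group_a.mean(axis=1) - group_b.mean(axis=1)
    std_err = np.sqrt(external_var * (1.0 / n_a + 1.0 / n_b))
    return mean_diff / std_err

def hybrid_score(stat_external, stat_regularized, weight=0.5):
    """Hypothetical hybrid: weighted average of the two per-gene statistics."""
    return weight * stat_external + (1.0 - weight) * stat_regularized

# Toy example: two genes, two external datasets, two arrays per group.
external = [np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]),
            np.array([[0.0, 2.0], [1.0, 3.0]])]
group_a = np.array([[5.0, 7.0], [1.0, 1.0]])
group_b = np.array([[1.0, 3.0], [1.0, 1.0]])

ext_var = pooled_external_variance(external)
stat = external_variance_statistic(group_a, group_b, ext_var)
```

With only two arrays per group, a within-study variance estimate has very few degrees of freedom; borrowing the variance from many other datasets is what stabilizes the denominator in this sketch.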