Genome Biology


Open Access, Highly Accessed, Method

Clustering analysis of SAGE data using a Poisson approach

Li Cai [1], Haiyan Huang [2,5], Seth Blackshaw [3,6], Jun S Liu [4], Connie Cepko [3] and Wing H Wong [4,2]*

Author Affiliations

1 Department of Research Computing, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA

2 Department of Biostatistics, Harvard School of Public Health, 66 Huntington Avenue, Boston, MA 02115, USA

3 Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA

4 Department of Statistics, Harvard University, Science Center, 1 Oxford Street, Cambridge, MA 02138, USA

5 Current address: Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94720, USA

6 Current address: Department of Neuroscience, Johns Hopkins University School of Medicine, 773 N Broadway Ave, Baltimore, MD 21287, USA


Genome Biology 2004, 5:R51 doi:10.1186/gb-2004-5-7-r51

Published: 29 June 2004

Abstract

Serial analysis of gene expression (SAGE) data have been poorly exploited by clustering analysis owing to the lack of appropriate statistical methods that consider their specific properties. We modeled SAGE data by Poisson statistics and developed two Poisson-based distances. Their application to simulated and experimental mouse retina data shows that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distances or similarity measures such as Pearson correlation or Euclidean distance.
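
As a rough illustration of why a count-aware measure can behave differently from Euclidean distance on tag counts, the sketch below compares the two on toy data. It is not the paper's definition of either Poisson-based distance, which this abstract does not spell out. The function names (`euclidean`, `poisson_chi2`), the toy counts, and the chi-square form (observed counts against counts expected when a tag's total is spread across libraries in proportion to a reference profile) are all assumptions made for illustration.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def poisson_chi2(counts, profile):
    """Hypothetical chi-square-style deviation (illustrative only).

    The tag's total count is spread over libraries in proportion to
    `profile`, giving the expected (Poisson mean) count per library.
    Each squared deviation is divided by that expectation, since a
    Poisson variable has variance equal to its mean.
    """
    total = sum(counts)
    profile_sum = sum(profile)
    expected = [total * p / profile_sum for p in profile]
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

# Toy data (invented): counts of three tags across five libraries.
profile = [1, 2, 4, 2, 1]       # reference expression shape
tag_a = [10, 20, 40, 20, 10]    # same shape, high abundance
tag_b = [1, 2, 4, 2, 1]         # same shape, low abundance
tag_c = [4, 2, 1, 2, 4]         # opposite shape, low abundance

print("Euclidean a-b:", euclidean(tag_a, tag_b))
print("Euclidean b-c:", euclidean(tag_b, tag_c))
print("chi2 a vs profile:", poisson_chi2(tag_a, profile))
print("chi2 b vs profile:", poisson_chi2(tag_b, profile))
print("chi2 c vs profile:", poisson_chi2(tag_c, profile))
```

In this toy case Euclidean distance places the two low-abundance tags closer together even though their shapes are opposite, whereas the mean-scaled deviation groups the two tags that share a shape regardless of abundance.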