Clustering analysis of SAGE data using a Poisson approach
* Corresponding author: Wing H Wong wwong@stat.harvard.edu
- Equal contributors
1 Department of Research Computing, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
2 Department of Biostatistics, Harvard School of Public Health, 66 Huntington Avenue, Boston, MA 02115, USA
3 Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
4 Department of Statistics, Harvard University, Science Center, 1 Oxford Street, Cambridge, MA 02138, USA
5 Current address: Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94720, USA
6 Current address: Department of Neuroscience, Johns Hopkins University School of Medicine, 773 N Broadway Ave, Baltimore, MD 21287, USA
Genome Biology 2004, 5:R51 doi:10.1186/gb-2004-5-7-r51
Published: 29 June 2004
Abstract
Serial analysis of gene expression (SAGE) data have been poorly exploited by clustering analysis owing to the lack of appropriate statistical methods that consider their specific properties. We modeled SAGE data by Poisson statistics and developed two Poisson-based distances. Their application to simulated and experimental mouse retina data shows that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distances or similarity measures such as Pearson correlation or Euclidean distance.
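To make the idea of a Poisson-based distance concrete, the following Python sketch compares a tag's observed counts with the counts expected under a cluster's expression profile, using a chi-square-style statistic. It is an illustration only: the abstract does not define the two distances, so the function names (`cluster_profile`, `poisson_chi2_distance`), the way the profile is estimated, and the toy counts are all assumptions and should not be read as the definitions used in this work.

```python
import math


def cluster_profile(members):
    """Estimate a cluster's expression profile (hypothetical helper).

    members: list of count vectors, one per tag, each with one count per library.
    Returns the fraction of the cluster's total counts falling in each library.
    """
    n_libs = len(members[0])
    per_library = [sum(m[t] for m in members) for t in range(n_libs)]
    grand_total = sum(per_library)
    return [c / grand_total for c in per_library]


def poisson_chi2_distance(counts, profile):
    """Chi-square-style dissimilarity between one tag and a cluster profile.

    Assumption: the count in library t is Poisson with mean total * profile[t],
    where total is the tag's count summed over all libraries.
    """
    total = sum(counts)
    dist = 0.0
    for observed, fraction in zip(counts, profile):
        expected = total * fraction
        if expected > 0:
            dist += (observed - expected) ** 2 / expected
        elif observed > 0:
            # A nonzero count where the profile expects none cannot be matched.
            return math.inf
    return dist


# Toy data: two tags with the same shape but a tenfold difference in abundance.
profile = cluster_profile([[10, 20, 30], [1, 2, 3]])
same_shape = poisson_chi2_distance([2, 4, 6], profile)   # follows the profile
reversed_shape = poisson_chi2_distance([6, 4, 2], profile)  # opposite trend
```

In this toy case the tag that follows the cluster's profile gets a distance near zero regardless of its abundance, whereas Euclidean distance on the raw counts would separate the two cluster members purely because one is ten times more abundant.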