Open Access Research

Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

Jan Gorodkin1, Susanna Cirera1, Jakob Hedegaard2, Michael J Gilchrist3, Frank Panitz2, Claus Jørgensen1, Karsten Scheibye-Knudsen1, Troels Arvin1, Steen Lumholdt1, Milena Sawera1, Trine Green1, Bente J Nielsen1, Jakob H Havgaard1, Carina Rosenkilde1, Jun Wang4,5,6, Heng Li4,5, Ruiqiang Li4,6, Bin Liu4, Songnian Hu4, Wei Dong4, Wei Li4, Jun Yu4, Jian Wang4, Hans-Henrik Stærfeldt7, Rasmus Wernersson7, Lone B Madsen2, Bo Thomsen2, Henrik Hornshøj2, Zhan Bujie2, Xuegang Wang2, Xuefei Wang2, Lars Bolund4,5, Søren Brunak7, Huanming Yang4, Christian Bendixen2 and Merete Fredholm1

1Division of Genetics and Bioinformatics, IBHV, Grønnegårdsvej 3, The Royal Veterinary and Agricultural University, DK-1870 Frederiksberg C, Denmark

2Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Blichers Alle, DK-8830 Tjele, Denmark

3The Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge, CB2 1QN, UK

4Beijing Genomics Institute, The Airport Industrial Road, Beijing 101300, PR China

5Institute of Human Genetics, University of Aarhus, Nordre Ringgade 1, DK-8000 Aarhus C, Denmark

6Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campus Vej 55, DK-5230 Odense M, Denmark

7Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, DK-2800 Lyngby, Denmark

Genome Biology 2007, 8:R45 doi:10.1186/gb-2007-8-4-r45

Published: 2 April 2007

Subject areas: Bioinformatics, Genome studies

Abstract

Background

Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third came from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.

Results

Using the Distiller package, the ESTs were assembled into roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high-confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries yielded the following findings. The distribution of cluster sizes is scale invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high-confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.

Conclusion

This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.

