Genome Biology

Research (Open Access, Highly Accessed)

Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations

Yutaka Satou1*, Katsuhiko Mineta2, Michio Ogasawara3, Yasunori Sasakura4, Eiichi Shoguchi1, Keisuke Ueno2, Lixy Yamada5, Jun Matsumoto6, Jessica Wasserscheid7, Ken Dewar7, Graham B Wiley8, Simone L Macmil8, Bruce A Roe8, Robert W Zeller9, Kenneth EM Hastings6, Patrick Lemaire10, Erika Lindquist11, Toshinori Endo2, Kohji Hotta12 and Kazuo Inaba4

Author Affiliations

1 Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto, 606-8502, Japan

2 Graduate School of Information Science and Technology, Hokkaido University, N14W9, Sapporo, 060-0814, Japan

3 Graduate School of Science and Technology, Chiba University, Inage, Chiba, 263-8522, Japan

4 Shimoda Marine Research Center, University of Tsukuba, Shimoda, Shizuoka, 415-0025, Japan

5 Division of Disease Proteomics, Institute for Enzyme Research, The University of Tokushima, 3-15-18 Kuramoto-cho, Tokushima, 770-8503, Japan

6 Montreal Neurological Institute and Departments of Neurology and Neurosurgery and Biology, McGill University, 3801 University St, Montreal, Quebec, H3A 2B4, Canada

7 McGill University and Genome Quebec Innovation Centre, and Department of Human Genetics, McGill University, Montreal, Quebec, H3A 2B4, Canada

8 Advanced Center for Genome Technology, and Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma, 73019-0370, USA

9 Department of Biology, San Diego State University, San Diego, California, 92182-4614, USA

10 Institut de Biologie du Développement de Marseille Luminy (IBDML), CNRS-UMR6216/Université de la Méditerranée Aix-Marseille, Marseille, 13288, France

11 DOE Joint Genome Institute, Genomic Technologies Department, 2800 Mitchell Drive, Walnut Creek, California, 94598, USA

12 Faculty of Science and Technology, Keio University, Kouhoku, Yokohama, 223-8522, Japan

Genome Biology 2008, 9:R152 doi:10.1186/gb-2008-9-10-r152

Published: 14 October 2008

Abstract

Background

The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models, due in part to intrinsic limitations of gene prediction programs and in part to the fragmented nature of the assembly.

Results

We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive EST data currently available. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half of the models, the 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.

Conclusion

Using these new resources, we identify a population of non-canonical (non-GT-AG) introns. We also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than was previously possible.
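
As an illustration of what "non-canonical (non-GT-AG)" means in practice, the following minimal sketch classifies an intron by the dinucleotides at its two ends. It is not the procedure used in this study. The function names, the 0-based half-open coordinate convention, and the restriction to forward-strand introns are all assumptions made for the example.

```python
# Illustrative sketch only: names and coordinate convention are hypothetical,
# not taken from the paper.

CANONICAL = ("GT", "AG")  # donor and acceptor dinucleotides of a canonical intron


def splice_site_pair(genome_seq, intron_start, intron_end):
    """Return the (donor, acceptor) dinucleotides of a forward-strand intron.

    intron_start is 0-based inclusive; intron_end is 0-based exclusive.
    """
    intron = genome_seq[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to have distinct donor and acceptor sites")
    return intron[:2], intron[-2:]


def is_canonical(genome_seq, intron_start, intron_end):
    """True if the intron begins with GT and ends with AG."""
    return splice_site_pair(genome_seq, intron_start, intron_end) == CANONICAL


if __name__ == "__main__":
    toy = "AAAGTCCCCAGTTT"  # toy sequence; the intron spans positions 3..11
    print(splice_site_pair(toy, 3, 11), is_canonical(toy, 3, 11))
```

An intron whose ends differ from GT and AG (for example GC and AG) would be reported as non-canonical by this check. Introns on the reverse strand would first need to be reverse-complemented, which the sketch omits.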