Open Access Research

Assembly of the Candida albicans genome into sixteen supercontigs aligned on the eight chromosomes

Marco van het Hoog1, Timothy J Rast2, Mikhail Martchenko1, Suzanne Grindle2, Daniel Dignard1, Hervé Hogues1, Christine Cuomo3, Matthew Berriman4, Stewart Scherer5, BB Magee2, Malcolm Whiteway1, Hiroji Chibana6, André Nantel1 and PT Magee2*

  • * Corresponding author: PT Magee magee@umn.edu

  • † Equal contributors

Author affiliations

1 Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec, H4P 2R2, Canada

2 University of Minnesota, Minneapolis, MN, 55455, USA

3 Broad Institute of MIT and Harvard, Cambridge, MA, USA

4 Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK

5 Paseo Grande, Moraga, CA 94556, USA

6 Research Center for Pathogenic Fungi and Microbial Toxicoses, Chiba University, Chiba, 260-8673, Japan


Citation and License

Genome Biology 2007, 8:R52  doi:10.1186/gb-2007-8-4-r52

Published: 9 April 2007

Abstract

Background

The 10.9× genomic sequence of Candida albicans, the most important human fungal pathogen, was published in 2004. Because this fungus is diploid, extensively heterozygous, and lacks a complete sexual cycle, Assembly 19 consisted of 412 supercontigs, of which 266 made up a haploid set. However, the sequences of the individual chromosomes were not determined.

Results

A total of 183 supercontigs from Assembly 19, representing 98.4% of the sequence, were assigned to individual chromosomes, which were purified by pulsed-field gel electrophoresis and hybridized to DNA microarrays. Nine Assembly 19 supercontigs were found to contain markers from two different chromosomes. Assembly 21 contains the sequence of each of the eight chromosomes and was determined using a synteny analysis with preliminary versions of the Candida dubliniensis genome assembly, bioinformatics, a sequence-tagged site (STS) map of overlapping fosmid clones, and an optical map. The STS map was used to determine the order and orientation of the contigs on each chromosome, telomere placement, and repeat regions too large to be covered by a sequence run, such as the ribosomal DNA cluster and the major repeat sequence. Sequence gaps were closed by PCR and sequencing of the products. The overall assembly was then compared to an optical map, which identified some misassembled contigs and gave a size estimate for each chromosome.
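The chromosome assignment step can be illustrated with a minimal sketch. It assumes, hypothetically, that the hybridization results have already been reduced to (supercontig, chromosome) pairs, one per marker; the function name, input format, and identifiers below are illustrative only and are not taken from the authors' pipeline. Supercontigs whose markers all agree are assigned to one chromosome, and those with markers from more than one chromosome are flagged as candidate misassemblies.

```python
from collections import defaultdict


def assign_supercontigs(marker_hits):
    """Split supercontigs into single-chromosome and multi-chromosome groups.

    marker_hits: iterable of (supercontig_id, chromosome) pairs, one per
    marker (a hypothetical input format chosen for this sketch).
    Returns (assigned, chimeric):
      assigned maps supercontig -> chromosome when all markers agree;
      chimeric maps supercontig -> sorted list of chromosomes otherwise.
    """
    chromosomes_by_contig = defaultdict(set)
    for contig, chromosome in marker_hits:
        chromosomes_by_contig[contig].add(chromosome)

    assigned = {}
    chimeric = {}
    for contig, chromosomes in chromosomes_by_contig.items():
        if len(chromosomes) == 1:
            assigned[contig] = next(iter(chromosomes))
        else:
            chimeric[contig] = sorted(chromosomes)
    return assigned, chimeric


# Made-up example data; the identifiers are not real Assembly 19 names.
hits = [
    ("sc_A", "chr1"),
    ("sc_A", "chr1"),
    ("sc_B", "chr3"),
    ("sc_B", "chrR"),
    ("sc_C", "chr5"),
]
assigned, chimeric = assign_supercontigs(hits)
print(assigned)
print(chimeric)
```

In this toy input, sc_B carries markers from two chromosomes and is therefore flagged rather than assigned, which mirrors the situation reported for the nine supercontigs above.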

Conclusion

Assembly 21 reveals an ancient chromosome fusion, a number of small internal duplications followed by inversions, and a subtelomeric arrangement, including a new gene family, the TLO genes. Correlations of position with relatedness of gene families imply a novel method of dispersion. The sequence of the individual chromosomes of C. albicans raises interesting biological questions about gene family creation and dispersion, subtelomere organization, and chromosome evolution.