Open Access Research

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

Juan C Motamayor1*, Keithanne Mockaitis2, Jeremy Schmutz13, Niina Haiminen4, Donald Livingstone III15, Omar Cornejo6, Seth D Findley1, Ping Zheng7, Filippo Utro4, Stefan Royaert5, Christopher Saski8, Jerry Jenkins13, Ram Podicheti9, Meixia Zhao10, Brian E Scheffler11, Joseph C Stack1, Frank A Feltus8, Guiliana M Mustiga1, Freddy Amores12, Wilbert Phillips13, Jean Philippe Marelli14, Gregory D May15, Howard Shapiro1, Jianxin Ma10, Carlos D Bustamante6, Raymond J Schnell15, Dorrie Main7, Don Gilbert2, Laxmi Parida4 and David N Kuhn5

Author Affiliations

1 Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA

2 Department of Biology, and Center for Genomics and Bioinformatics, Indiana University, 915 E. Third St, Bloomington, IN, 47405, USA

3 HudsonAlpha Institute for Biotechnology, 601 Genome Way NW, Huntsville, AL, 35806, USA

4 IBM T J Watson Research, Yorktown Heights, NY, 10598, USA

5 United States Department of Agriculture-Agriculture Research Service, Subtropical Horticulture Research Station, 13601 Old Cutler Rd, Miami, FL, 33158, USA

6 Department of Genetics, Stanford University, 300 Pasteur Dr, Stanford, CA, 94305, USA

7 Department of Horticulture, Washington State University, Johnson Hall, Pullman, WA, 99164, USA

8 Clemson University Genomics Institute, 105 Collings Street, Clemson, SC, 29634, USA

9 Center for Genomics and Bioinformatics and School of Informatics and Computing, Indiana University, 919 E 10th St, Bloomington, IN, 47408, USA

10 Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA

11 United States Department of Agriculture-Agriculture Research Service, Genomics and Bioinformatics Research Unit, 141 Experiment Station Road, Stoneville, MS, 38776, USA

12 Estación Experimental Tropical Pichilingue, Instituto Nacional Autónomo de Investigaciones Agropecuarias (INIAP), Código Postal 24, Km 5 vía Quevedo - El Empalme, Quevedo, Ecuador

13 Programa de Mejoramiento de Cacao, CATIE 7170, Turrialba, Costa Rica

14 Mars Center for Cocoa Science (MCCS), CP 55, Itajuipe, Bahia, 45630, Brazil

15 National Center for Genome Resources, 2935 Rodeo Park Drive E, Santa Fe, NM, 87505, USA


Genome Biology 2013, 14:r53  doi:10.1186/gb-2013-14-6-r53

Published: 3 June 2013

Abstract

Background

Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders.

Results

We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than that of a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of the Criollo assembly, and includes 111 Mbp more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while the Criollo assembly has 10.9%. Through a combination of haplotype, association mapping, and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation.
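For readers unfamiliar with the contig and scaffold N50 figures quoted above, the statistic can be sketched as follows. This is a minimal illustration of the standard N50 definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length), not the authors' pipeline. The function name `n50` and the toy lengths are assumptions made for illustration and are not taken from the Matina 1-6 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases
    # until at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), NOT real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Applied to the contig lengths of an assembly this gives the contig N50; applied to scaffold lengths it gives the scaffold N50, which is why the two figures in the abstract differ so widely.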

Conclusions

We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.

Keywords:
Theobroma cacao L.; genome; Matina 1-6; haplotype phasing; genetic mapping; pod color; MYB113