Open Access Research

Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights

Raluca Gordân (1), Kevin F Murphy (1), Rachel P McCord (1, 2), Cong Zhu (1), Anastasia Vedenko (1) and Martha L Bulyk (1, 2, 3, 4)*

Author Affiliations

1 Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA

2 Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA

3 Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA

4 Harvard-MIT Division of Health Sciences and Technology (HST), Harvard Medical School, Boston, MA 02115, USA

Genome Biology 2011, 12:R125  doi:10.1186/gb-2011-12-12-r125

Published: 21 December 2011

Additional files

Additional file 1:

Detailed methods, additional figures, and additional tables. Figure S1: ClustalW protein sequence alignment of Vhr1 and its homologs in sensu stricto Saccharomyces species. The alignment shows that the second putative basic region of Vhr1 is more conserved than the first basic region. Figure S2: unlike AP-1 bZIPs, Vhr1 and Vhr2 bind only to overlapping half-sites. (a) AP-1 bZIP transcription factors (Gcn4, Yap1, Jundm2, and the Fos-Jun heterodimer) and Vhr1-family transcription factors (Vhr1 and Vhr2) bind to overlapping TGAC or TTAC half-sites. For each TF we sorted the 8-mers in decreasing order of their E-score, from 0.5 (highest affinity) to -0.5 (lowest affinity). The black lines show the 8-mers that contain TGACT (or TTACT for Yap1). (b) AP-1 factors (Gcn4, Yap1, Jundm2, and Fos-Jun) also bind to non-overlapping half-sites, whereas Vhr1-family factors (Vhr1 and Vhr2) do not. The black lines show the 8-mers that contain TGACGT (or TTACGT for Yap1). The PBM data were reported in Zhu et al. [11] (Gcn4, Yap1), Badis et al. [16] (Jundm2), Alibés et al. [76] (Fos-Jun), or this study (Vhr1 and Vhr2). Figure S3: comparison of the DNA binding specificities of Hac1 (from both this study and Badis et al. [10]) with those of bHLH and bZIP TFs. (a) PBM-derived motifs for the bZIP TF Hac1 match motifs of bHLH TFs better than motifs of bZIP TFs. (b, c) In-depth comparison of the DNA binding specificities of Hac1 and the bHLH TF Cbf1. (d) In-depth comparison of the DNA binding specificities of Hac1 (this study) and two bZIP proteins that bind overlapping or adjacent TGAC half-sites: Gcn4 and Sko1, respectively. The scatter plots show the 8-mer E-scores. Figure S4: primary and secondary DNA binding site motifs derived from high-resolution in vitro PBM data. Figure S5: comparison of motif enrichment in ChIP-chip data for the 27 TF motifs reported in this study versus previously reported PBM-derived (Badis et al. [10]), ChIP-derived (MacIsaac et al. [20]), or MITOMI-derived (Fordyce et al. [12]) motifs for these 27 TFs (where available). Figure S6: S. cerevisiae orphan DNA binding site motifs. Figure S7: schematic of the PBM experimental pipeline and results. A total of 228 ORFs/DBDs were considered in this study. "Lacking in vitro PBM data" refers to the status at the initiation of this study in late 2008, after completion of our prior PBM survey (Zhu et al. [11]) and before publication of two more recent in vitro surveys (Badis et al. [10]; Fordyce et al. [12]). Table S1: TF DNA binding site motifs from the in vitro PBM data of Badis et al. [10]. Table S2: TF DNA binding site motifs from the in vitro MITOMI data of Fordyce et al. [12]. Table S3: TFs with curated high-resolution DNA binding site motifs derived from in vitro PBM data. The source of the selected motif (PWM) is indicated. Table S5: TFs with DNA binding site motifs reported by MacIsaac et al. [20] according to in vivo ChIP-chip data. TFs for which high-resolution in vitro motifs are also available are marked in boldface font. Table S8: TFs with secondary DNA binding site motifs identified from the curated set of high-resolution PBM data.

Format: DOC Size: 2MB
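The 8-mer ranking described for Figure S2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the E-scores are held in a dictionary keyed by 8-mer, and it counts an 8-mer as containing the core (for example TGACT) if the core occurs on either strand. The function names and toy scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Complement table for uppercase DNA (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rank_and_flag(escores: Dict[str, float], core: str) -> List[Tuple[str, float, bool]]:
    """Sort 8-mers by decreasing E-score and flag those containing `core`.

    Hypothetical helper: a match on either strand counts, because an 8-mer
    and its reverse complement describe the same double-stranded site.
    """
    core_rc = reverse_complement(core)
    ranked = sorted(escores.items(), key=lambda item: item[1], reverse=True)
    return [(kmer, score, core in kmer or core_rc in kmer) for kmer, score in ranked]


# Toy E-scores (illustrative values, not PBM data).
toy = {"ATGACTCA": 0.49, "CCCCGGGG": -0.12, "AGAGTCAT": 0.45, "TTTTAAAA": 0.01}
for kmer, score, has_core in rank_and_flag(toy, "TGACT"):
    print(kmer, score, has_core)
```

In a plot like Figure S2, the 8-mers flagged `True` would be the ones drawn as black lines along the E-score ranking.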

Additional file 2:

Data file S1. Curated set of high-resolution DNA binding site motifs (PWMs) for 150 S. cerevisiae TFs. The file contains 150 primary motifs and 39 secondary motifs derived from PBM data.

Format: TXT Size: 52KB
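A PWM like those in Data file S1 is typically used to score candidate binding sites. The sketch below assumes each motif has already been loaded as a list of per-position base probabilities (parsing the TXT file is not shown, since its exact layout is not described here). It scores every window on both strands by log-odds against a uniform background with a small pseudocount. The helper names, the 0.25 background, the pseudocount, and the toy consensus motif are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # one {base: probability} dict per motif position

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(pwm: PWM, kmer: str, background: float = 0.25, pseudo: float = 1e-3) -> float:
    """Sum of per-position log2(p(base) / background); the pseudocount avoids log(0)."""
    return sum(math.log2((col[base] + pseudo) / background) for col, base in zip(pwm, kmer))


def best_site(pwm: PWM, seq: str) -> Tuple[float, int, str]:
    """Return (score, start, strand) of the highest-scoring window.

    For strand "-", `start` indexes into the reverse-complemented sequence.
    """
    width = len(pwm)
    best = (-math.inf, -1, "+")
    rc = seq.translate(_COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", rc)):
        for i in range(len(s) - width + 1):
            score = log_odds(pwm, s[i:i + width])
            if score > best[0]:
                best = (score, i, strand)
    return best


def consensus_pwm(consensus: str, p: float = 0.97) -> PWM:
    """Hypothetical toy PWM: probability p for the consensus base, the rest split evenly."""
    other = (1.0 - p) / 3
    return [{b: (p if b == c else other) for b in "ACGT"} for c in consensus]


pwm = consensus_pwm("TGAC")
print(best_site(pwm, "AAAATGACAAAA"))  # best window on the + strand, start 4
print(best_site(pwm, "AAAAGTCAAAAA"))  # best window on the - strand, start 4 of the reverse complement
```

Scanning both strands matters because a TF binds double-stranded DNA, so a site written as GTCA on one strand is TGAC on the other.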

Additional file 3:

Data file S2. Curated high-resolution PBM data for 150 S. cerevisiae TFs, represented as E-scores for all ungapped 8-mers. These data correspond to the motifs provided in Additional file 2 (that is, the E-scores in this data file and the PWMs in Additional file 2 were generated from the same PBM experiments).

Format: ZIP Size: 49.9MB
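"All ungapped 8-mers" is a concrete, finite set. If each 8-mer is collapsed with its reverse complement (an assumption about how the E-score tables are keyed, consistent with counting double-stranded sites), there are (4^8 + 4^4)/2 = 32,896 distinct 8-mers: the 4^4 = 256 reverse-complement palindromes pair only with themselves, and the remaining 8-mers pair off two by two. The short enumeration below checks this arithmetic; `canonical_kmers` is a hypothetical helper name.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(k: int) -> set:
    """Enumerate all k-mers, keeping one representative per reverse-complement pair."""
    kmers = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        rc = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))  # lexicographically smaller strand is the representative
    return kmers


print(len(canonical_kmers(8)))  # 32896 == (4**8 + 4**4) // 2
```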

Additional file 4:

Table S4. Comparison of high-resolution in vitro DNA binding site motifs for S. cerevisiae TFs.

Format: PDF Size: 2.2MB

Additional file 5:

Table S6. Comparison of in vivo motifs (MacIsaac et al. [20]) and in vitro motifs (selected from this study, Zhu et al. [11], or Badis et al. [10]) for 150 S. cerevisiae TFs. TFs for which the in vivo and in vitro motifs are different are marked in red font.

Format: PDF Size: 2.5MB

Additional file 6:

Table S7. Discrepancies between in vivo and in vitro motifs for S. cerevisiae TFs.

Format: XLS Size: 315KB

Additional file 7:

Table S9. All over-represented functional categories of target genes for each TF examined in this study.

Format: XLS Size: 77KB

Additional file 8:

Data file S3. Gapped and ungapped 8-mers with a PBM enrichment score of at least 0.35.

Format: ZIP Size: 13.1MB
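Data file S3 uses an enrichment score (E-score) cutoff of 0.35. A minimal sketch of applying such a cutoff follows. It assumes the E-scores are available as a dictionary keyed by 8-mer and that gap positions in gapped 8-mers are written as '.', so each 8-mer can be used directly as a regular expression. Both the gap notation and the helper names are assumptions for illustration, not the documented format of the file.

```python
import re
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bound_kmers(escores: Dict[str, float], cutoff: float = 0.35) -> List[str]:
    """Return the 8-mers whose E-score is at least `cutoff`, sorted alphabetically."""
    return sorted(k for k, e in escores.items() if e >= cutoff)


def has_bound_site(seq: str, kmers: List[str]) -> bool:
    """True if any (possibly gapped) 8-mer occurs on either strand of `seq`.

    Assumes k-mers contain only A, C, G, T and '.', so '.' acts as a
    single-position wildcard when the k-mer is used as a regex pattern.
    """
    rc = seq.translate(_COMPLEMENT)[::-1]
    return any(re.search(k, s) for k in kmers for s in (seq, rc))


# Toy E-scores (illustrative values only).
toy = {"TGACTCAT": 0.48, "TGAC.GTCA": 0.41, "CCCCGGGG": 0.10, "AAAATTTT": 0.35}
bound = bound_kmers(toy)
print(bound)                                   # ['AAAATTTT', 'TGAC.GTCA', 'TGACTCAT']
print(has_bound_site("GGTGACAGTCAGG", bound))  # True, via the gapped 8-mer
```

Because the cutoff test is `>=`, an 8-mer scoring exactly 0.35 is kept, matching "at least 0.35".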

Additional file 9:

Table S10. All significant specific conditions and condition categories from CRACR analysis for each TF examined in this study.

Format: XLS Size: 919KB

Additional file 10:

Table S11. Predicted direct and indirect TF-DNA interactions.

Format: XLS Size: 115KB

Additional file 11:

Table S12. DNA binding site motifs available for known or putative S. cerevisiae TFs.

Format: XLS Size: 7.8MB

Additional file 12:

Table S13. Categorization of the remaining S. cerevisiae potential sequence-specific DNA binding proteins. For each of the 222 yeast proteins, the table lists: the systematic name (column A); standard name (column B); structural domain found within the protein (column C); designation of sequence-specific DNA binding ability as Likely, Maybe, or Unlikely (column D); and a description of the protein from the Saccharomyces Genome Database, including additional literature references to experimental evidence for DNA binding consensus sequences, ChIP motifs, or other relevant information (column E). Proteins were placed in the Likely category if they have a well-characterized sequence-specific DNA binding domain and/or experimental evidence for sequence-specific DNA binding through direct contact with DNA (as opposed to indirect binding mediated through another protein factor). The Maybe category includes proteins containing structural domains for which sequence-specific DNA binding has been demonstrated in other proteins with that domain; literature evidence for DNA binding ability was also considered, even where it was not determined whether the binding is sequence specific or involves direct contact with DNA. Finally, the Unlikely category contains proteins that have structural domains that have failed to show sequence-specific DNA binding in vitro, have ChIP motifs that likely reflect indirect interactions with DNA, or completely lack literature evidence for sequence-specific DNA binding by direct contact with DNA.

Format: XLS Size: 83KB

Additional file 13:

Data file S4. Collection of 4,160 previously published PWMs derived from S. cerevisiae TF-DNA binding and gene expression data.

Format: TXT Size: 3MB

Additional file 14:

Table S14. List of the 27 S. cerevisiae TFs that successfully yielded PBM data in this study. For each TF the table shows: (A) SGD ID; (B) common gene symbol; (C) Pfam DBD class (if known); (D) clone type (full-length ORF or DBD alone); (E) the Gateway entry clone used; (F) nucleotide sequence of the cloned insert; (G) amino acid sequence of the cloned insert; (H) the expected molecular weight (kDa) of the expressed GST fusion protein; (I) estimated concentration of protein used in the PBM experiment, based on visual examination of a Western blot. All proteins were expressed by in vitro transcription and translation.

Format: XLS Size: 48KB