Open Access Highly Accessed Research

Modeling precision treatment of breast cancer

Anneleen Daemen1132*, Obi L Griffith136*, Laura M Heiser14, Nicholas J Wang14, Oana M Enache1, Zachary Sanborn5, Francois Pepin114, Steffen Durinck1, James E Korkola14, Malachi Griffith6, Joe S Hur7, Nam Huh8, Jongsuk Chung8, Leslie Cope9, Mary Jo Fackler9, Christopher Umbricht9, Saraswati Sukumar9, Pankaj Seth10, Vikas P Sukhatme10, Lakshmi R Jakkula1, Yiling Lu11, Gordon B Mills11, Raymond J Cho12, Eric A Collisson12, Laura J van’t Veer2, Paul T Spellman13 and Joe W Gray14*

Author Affiliations

1 Department of Cancer & DNA Damage Responses, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

2 Laboratory Medicine, University of California San Francisco, San Francisco, CA 94115, USA

3 Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR 97239, USA

4 Department of Biomedical Engineering, Center for Spatial Systems Biomedicine, Knight Cancer Institute, Oregon Health and Science University, Portland, OR 97239, USA

5 Five3 genomics, 101 Cooper St, Santa Cruz, CA 95060, USA

6 The Genome Institute, Washington University School of Medicine, St Louis, MO 63105, USA

7 Samsung Electronics Headquarters, Seocho-gu, Seoul 137-857, Korea

8 Emerging Technology Research Center, Samsung Advanced Institute of Technology, Kyunggi-do 446-712, Korea

9 Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA

10 Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA

11 Department of Systems Biology, MD Anderson Cancer Center, Houston, TX 77030, USA

12 Department of Dermatology, University of California, San Francisco, CA 94115, USA

13 Present address: Department of Bioinformatics & Computational Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA 94080, USA

14 Present address: Sequenta Inc, South San Francisco, CA 94080, USA

Genome Biology 2013, 14:R110  doi:10.1186/gb-2013-14-10-r110

Published: 31 October 2013

Additional files

Additional file 1:

Supplementary Methods, Supplementary Results, Figures S1 to S10, and Tables S4, S6, S8, S9, S10, S12, and S13.

Supplementary Methods: detailed description of the therapeutic compound response data, molecular data for the breast cancer cell lines, molecular data for the external breast cancer tumor samples used for validation, classification methods, data integration approach, statistical methods, and pathway overrepresentation analysis.

Supplementary Results: assessment of cell line signal in tumor samples, inter-data relationships, prediction comparison of datasets, validation against other cell line datasets, and the patient response prediction toolbox for the R project for statistical computing.

Table S4: overview of genes with good correlation (FDR P-value <0.05) between SNP6 and gene expression; 22 to 39% of genes in copy number aberration regions show a significant concordance between their genomic and transcriptomic profile after multiple testing correction.

Table S6: data type ranking of the importance of the molecular datasets by comparison of prediction performance of LS-SVM and RF classifiers built on individual data sets and their combination, and by comparison of the average appearance of data types in the top 100 of ranked features, with and without inclusion of RPPA data. Examples are also provided of compounds for which (most) datasets give similar results or for which one dataset performs better (shown in bold).

Table S8: performance for 'splice-specific' response predictors (RF) with an AUC increase >0.05 when comparing all transcript features to gene-level values alone.

Table S9: statistical association between clinical variables and predicted response for 306 TCGA patients with expression, methylation and copy number data available. For each compound, the best performing model was utilized (LS-SVM or RF with any combination of expression, copy number and methylation data).

Table S10: resistant/intermediate/sensitive cutoffs for 22 compounds with model AUC >0.7 and at least one patient with probability of response >0.65. Cutoff value 1 separates patients considered resistant from intermediate. Cutoff value 2 separates patients considered intermediate from sensitive. The percentage value for each group indicates the percentage of total patients (n = 306) in each group.

Table S12: presence and variance of filtered features from U133A and exon array cell line data in tumor samples. Features from U133A and the exon array that passed the variance and presence filter in the cell lines were present in the majority of breast cancer tumor samples.

Table S13: summary of 167 predictors in random forests classifier for lapatinib (all data types, optimal predictor number).

Figure S1: data summary in terms of number of features before and after data-type-specific reduction and unsupervised filtering based on variance and signal detection above background.

Figure S2: overview of the mutation prevalence in the cell line panel and TCGA data set for the list of seven common coding variants detected by TCGA, with a distinction between luminal, basal and ERBB2-enriched. Cell lines with unknown subtype are displayed in orange. To make the subtypes comparable, luminal A and B were grouped into luminal for the TCGA data set, whilst basal and claudin-low cell lines were grouped into basal. The mutation rate in TCGA and the cell line panel shows a similar distribution across the subtypes.

Figure S3: comparison of the best LS-SVM and RF models for the 90 compounds, sorted according to highest AUC obtained with either model.

Figure S4: validation of the cell line signature for vorinostat in tumor samples grown in three dimensions: heatmap of the 150-gene signature for vorinostat in the cell line panel and 13 tumor samples treated with valproic acid. Seven out of eight sensitive samples (87.5%) and four out of five resistant samples (80%) are classified correctly with a probability threshold of 0.5 for response dichotomization.

Figure S5: predicted probability of response of TCGA tumor samples to compounds lapatinib, sigma AKT1-2 inhibitor, GSK2126458 and docetaxel. The TCGA tumor samples are ordered according to increasing probability of response.

Figure S6: correlation-based coherence heatmap for two cell line-derived gene signatures: coherence among 67 genes of the U133A signature for the sigma AKT1-2 inhibitor in the cell lines (left) and TCGA tumor samples (right) (Jaccard coefficient = 0.85; P-value <0.0001); coherence among 109 genes of the RNAseq signature for everolimus in the cell lines (left) and TCGA tumor samples (right) (Jaccard coefficient = 0.79; P-value <0.0001).

Figure S7: comparison of the best model per dataset for the 90 compounds, sorted according to highest AUC obtained with either model (LS-SVM or RF). For RNAseq and exon array, the highest AUC is shown among models built on gene-level data only or all features (exons, junctions, and so on).

Figure S8: distributions of response probabilities for 5-FU determined by mixed model clustering and used for cutoff selection. With a cutoff of 0.74, 23.9% of TCGA tumor samples were predicted to respond to 5-FU (Table S10 in Additional file 3).

Figure S9: association between response to lapatinib and ERBB2 status, response to BIBW2992 and ERBB2 status, and response to tamoxifen and ER status for 306 TCGA patients with expression, methylation and copy number data available.

Figure S10: heatmap of the 167 highest ranked features for lapatinib, obtained with RF applied to the full set of molecular data.

Format: XLSX Size: 104KB

Open Data
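
The two-cutoff scheme described for Table S10 (cutoff value 1 separating resistant from intermediate, cutoff value 2 separating intermediate from sensitive) can be illustrated with a minimal Python sketch. The function names, the example probabilities and the example cutoffs are hypothetical, and the handling of a probability exactly equal to a cutoff is an assumption; the source does not specify it.

```python
from collections import Counter

GROUPS = ("resistant", "intermediate", "sensitive")


def assign_group(prob, cutoff1, cutoff2):
    """Map a predicted probability of response to one of three groups.

    Assumption: a probability equal to a cutoff falls into the higher group.
    """
    if cutoff1 > cutoff2:
        raise ValueError("cutoff1 must not exceed cutoff2")
    if prob < cutoff1:
        return "resistant"
    if prob < cutoff2:
        return "intermediate"
    return "sensitive"


def group_percentages(probs, cutoff1, cutoff2):
    """Return the percentage of patients assigned to each group."""
    if not probs:
        raise ValueError("at least one probability is required")
    counts = Counter(assign_group(p, cutoff1, cutoff2) for p in probs)
    return {g: 100.0 * counts.get(g, 0) / len(probs) for g in GROUPS}


# Hypothetical probabilities and cutoffs, for illustration only.
example = group_percentages([0.1, 0.5, 0.8, 0.9], cutoff1=0.4, cutoff2=0.7)
```

With the real data, the percentages would be computed over the 306 TCGA patients using the per-compound cutoffs listed in Table S10.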

Additional file 2: Table S1:

Overview of 84 cell lines with subtype information and available data. GI50 values for 90 therapeutic compounds are provided for 70/84 cell lines included in all analyses.

Format: XLSX Size: 95KB

Additional file 3: Table S2:

Processed Reverse Phase Protein Array (RPPA) intensity data for 70 (phospho)proteins with fully validated antibodies in 49 cell lines. See Supplementary Methods in Additional file 3 for data processing details.

Format: DOCX Size: 2.6MB

Additional file 4: Table S3:

GI50 dichotomization threshold for each compound, defined as the mean GI50 for the 48 core cell lines.

Format: XLSX Size: 46KB
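
The dichotomization rule stated for Table S3 (threshold defined as the mean GI50 over the core cell lines) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the cell line names and the values are hypothetical. It assumes by default that a lower GI50 means a more sensitive line; if the values are stored on a transformed scale where larger means more sensitive (for example -log10), set `higher_is_sensitive=True`.

```python
from statistics import mean


def dichotomize_gi50(gi50_by_line, core_lines, higher_is_sensitive=False):
    """Label cell lines as sensitive or resistant for one compound.

    The threshold is the mean GI50 over the core cell lines that have a value.
    Assumption: a value exactly at the threshold is labeled resistant.
    """
    core_values = [gi50_by_line[name] for name in core_lines if name in gi50_by_line]
    if not core_values:
        raise ValueError("no GI50 values available for the core cell lines")
    threshold = mean(core_values)
    labels = {}
    for name, value in gi50_by_line.items():
        if higher_is_sensitive:
            sensitive = value > threshold
        else:
            sensitive = value < threshold
        labels[name] = "sensitive" if sensitive else "resistant"
    return threshold, labels


# Hypothetical cell lines and GI50 values, for illustration only.
gi50 = {"lineA": 1.0, "lineB": 3.0, "lineC": 10.0}
threshold, labels = dichotomize_gi50(gi50, core_lines=["lineA", "lineB"])
```

Here the threshold comes only from the core lines, while every line with a GI50 value, core or not, receives a label.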

Additional file 5: Table S5:

Overview of the best LS-SVM/RF model for all 90 therapeutic compounds with comparison to the LS-SVM AUC based on subtype and ERBB2 status. For the subset of 51 therapeutic compounds with test AUC exceeding 0.7, additional information is provided on clinical trial status, comparison of GI50 with TGI, validation results of the cell line signal in the TCGA tumor samples, and most significant non-subtype related KEGG/BioCarta pathways from Additional file 6.

Format: XLSX Size: 49KB

Additional file 6: Table S7:

List of significant non-subtype specific GO categories and KEGG/BioCarta pathways with FDR P-value <0.05. Per category/pathway information includes FDR P-value and the number of signature genes, percentage of signature genes and list of signature genes that are part of this category/pathway. Significant pathways associated with both drug response and transcriptional subtype were excluded, to capture biology underlying each compound’s mechanism of action.

Format: XLSX Size: 368KB

Additional file 7: Table S11:

Compound response signatures for the 22 compounds featured in Figure 5. Each of these compounds has a model AUC >0.7 and at least one patient with probability of response >0.65 among the 306 TCGA tumor samples with expression, copy number and methylation data available.

Format: XLSX Size: 2MB

Additional file 8: Table S14:

Validation results for six drugs (BIBW2992, lapatinib, rapamycin, GSK2126458, gefitinib and GSK2141795) in 11 HER2+ lines.

Format: XLSX Size: 60KB

Additional file 9:

Raw drug response data. Raw drug response data used to compute the GI50 values used in this study. The columns represent the following: cellline = cell line lineage; compound = compound tested; drug_plate_id = unique identifier for the plate of 3 compounds; T0_plate_id = unique identifier for the time 0 h control plate associated with the drug plate; background_od1, background_od2 = background od values (for correction of background luminescence); od0.1, od0.2, od0.3 = triplicate measures for untreated cells; od1.1, od1.2, od1.3… od9.1, od9.2, od9.3 = triplicate measures of the number of cells alive after treatment, from lowest to highest drug concentration; T0_background_od1, T0_background_od2 = background od values for the time 0 h plate (for correction of background luminescence); T0_median_od = median od at T0; c1 to c9 = drug concentrations tested; units = units of drug concentration tested.

Format: TXT Size: 2.5MB
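
To show how the columns of the raw drug response file fit together, the following Python sketch summarizes one row into a growth value per concentration. It is a hedged illustration only: the function name is hypothetical, the column names are taken from the description above, and the processing itself is assumed rather than taken from the source. Specifically, it assumes that the mean of the two background wells is subtracted, that T0_median_od is not yet background-corrected, and that growth is expressed as (treated - T0) / (untreated - T0), a common convention for growth inhibition assays.

```python
from statistics import mean, median


def summarize_row(row):
    """Return (concentration, relative growth) pairs for one raw data row.

    `row` is a dict keyed by the column names of the raw data file.
    The background correction and growth formula are assumptions.
    """
    background = mean([float(row["background_od1"]), float(row["background_od2"])])
    t0_background = mean(
        [float(row["T0_background_od1"]), float(row["T0_background_od2"])]
    )
    t0 = float(row["T0_median_od"]) - t0_background

    def corrected_median(level):
        # Median of the triplicate wells at one dose level, minus background.
        replicates = [float(row[f"od{level}.{rep}"]) for rep in (1, 2, 3)]
        return median(replicates) - background

    control = corrected_median(0)  # level 0 holds the untreated wells
    if control == t0:
        raise ValueError("untreated signal equals T0 signal; growth is undefined")

    results = []
    for level in range(1, 10):  # c1 (lowest) to c9 (highest)
        treated = corrected_median(level)
        growth = (treated - t0) / (control - t0)
        results.append((float(row[f"c{level}"]), growth))
    return results
```

A row read with `csv.DictReader` (with the delimiter matching the file) could be passed directly to `summarize_row`; the GI50 would then be the concentration at which the growth value crosses 0.5, under the same assumed convention.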