Open Access Highly Accessed Research

MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype

Cherie Blenkiron1,2,3,4, Leonard D Goldstein1,2,5, Natalie P Thorne1,2,5, Inmaculada Spiteri1,2, Suet-Feung Chin1,2, Mark J Dunning1,2, Nuno L Barbosa-Morais1,2, Andrew E Teschendorff1,2, Andrew R Green6, Ian O Ellis6, Simon Tavaré1,2,5, Carlos Caldas1,2* and Eric A Miska3,4*

Author Affiliations

1 Cancer Research UK, Cambridge Research Institute, Li Ka-Shing Centre, Robinson Way, Cambridge CB2 0RE, UK

2 Department of Oncology, University of Cambridge, Hills Road, Cambridge CB2 2XZ, UK

3 Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, The Henry Wellcome Building of Cancer and Developmental Biology, Tennis Court Rd, Cambridge CB2 1QN, UK

4 Department of Biochemistry, University of Cambridge, Tennis Court Rd, Cambridge CB2 1GA, UK

5 Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK

6 Department of Histopathology, School of Molecular Medical Sciences, University of Nottingham, Nottingham NG5 1PB, UK


Genome Biology 2007, 8:R214  doi:10.1186/gb-2007-8-10-r214

Published: 8 October 2007

Additional files

Additional data file 1:

This file contains a detailed description of the computational analysis.

Format: PDF Size: 222KB

This file can be viewed with: Adobe Acrobat Reader

Open Data

Additional data file 2:

A. Data matrix of miRNA expression values (schematic). The 333 rows and 168 columns correspond to probes and samples respectively. Expression values for each sample were obtained from hybridizations to four distinct bead sets (with approximately 90 probes each), carried out in separate wells of 96-well plates. Hybridizations were performed on eight plates, using two plates for each bead set. The allocation of samples between the two plates for a given bead set was kept consistent for all four bead sets. Thus both probes and samples could be ordered according to the plate of origin, partitioning the data matrix into eight blocks corresponding to measurements from distinct plates. Expression values for a representative well on plate 1 for beadset 1 are indicated in grey. B. Heatmap of unnormalized log2 MFI values for all miRNA probes and all samples. Probes were median centred prior to plotting. C. Heatmap of differences between the probe median for the randomized samples on a given plate and the probe median for all samples on both plates.
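The plate comparison described for panel C can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, probe names and log2 MFI values are hypothetical, and every sample on a plate is treated as one of the randomized samples.

```python
from statistics import median

def plate_probe_offsets(plate_values, all_values):
    """Per probe: median over samples on one plate minus median over both plates."""
    return {
        probe: median(plate_values[probe]) - median(all_values[probe])
        for probe in plate_values
    }

# Hypothetical log2 MFI values, keyed by probe, one value per sample.
plate1 = {"probe_a": [8.0, 8.2, 8.4], "probe_b": [6.0, 6.1, 6.2]}
plate2 = {"probe_a": [7.0, 7.2, 7.4], "probe_b": [6.0, 6.1, 6.2]}

# Pool the samples from both plates for each probe.
both = {probe: plate1[probe] + plate2[probe] for probe in plate1}

offsets_plate1 = plate_probe_offsets(plate1, both)
offsets_plate2 = plate_probe_offsets(plate2, both)
```

In this toy input, probe_a has a positive offset on plate 1 and a negative offset on plate 2, which is the kind of plate-specific shift such a heatmap would make visible, while probe_b shows no shift.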

Format: PDF Size: 4.2MB

Additional data file 3:

A. Histograms of log2 MFI values obtained from wells containing sample material (white) and blank control wells (blue). B. The number of detected probes after filtering was plotted against a range of cutoff values. Probes were removed (filtered) if they did not exceed the chosen cutoff (red) in at least one sample. C, D. Sample quality control. Pearson correlation coefficients for technical replicate samples were plotted against the smaller of the two sample means for (C) cell line technical replicate samples and (D) normal and tumor technical replicate samples. The cutoff used for quality control is indicated by a vertical line. Colours corresponding to sample status are explained in the colour key. E. Technical sample effects. Pairwise differences between the medians of technical replicate samples were plotted for unnormalized data (black), data normalized based on spike-in controls (blue) and data normalized by sample median centering (red). Dashed lines indicate the maximum difference between the medians of any two samples for unnormalized data (black) and for data normalized using spike-in controls (blue).
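The filtering rule in panel B (a probe is kept only if it exceeds the cutoff in at least one sample) can be sketched as below. The function name, probe names, expression values and cutoff values are hypothetical and chosen only to show how the number of detected probes falls as the cutoff rises.

```python
def detected_probes(expr, cutoff):
    """Return probes whose value exceeds `cutoff` in at least one sample."""
    return [
        probe
        for probe, values in expr.items()
        if any(value > cutoff for value in values)
    ]

# Hypothetical log2 MFI values, keyed by probe, one value per sample.
expr = {
    "probe_a": [5.0, 9.1, 6.2],
    "probe_b": [5.5, 5.9, 6.0],
    "probe_c": [7.2, 7.0, 6.8],
}

# Number of detected probes for a range of candidate cutoffs.
counts = {cutoff: len(detected_probes(expr, cutoff)) for cutoff in (6.0, 6.5, 8.0)}
```

The comparison is strict here, so a probe whose maximum equals the cutoff is removed; whether the original analysis used a strict or inclusive comparison is an assumption of this sketch.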

Format: PDF Size: 362KB

Additional data file 4:

Between-sample normalization. A. Shown are data normalized based on spike-in controls for the same miRNAs and factors as in Figure 3 in the main text. B. miRNAs and factors with at least one association at adjusted p < 0.01 based on data normalized using spike-in controls. All miRNAs thus identified were also identified after sample median centering with the exception of miR-152, which was found to be associated with all three factors at p < 0.05 (Additional data file 18). Heatmap colours reflect relative miRNA expression. The expression values for a given sample group of interest were summarized by their mean. Brackets in the left margin indicate members of the same miRNA family. Significance levels are indicated in the right margins: * adjusted p < 0.05, ** adjusted p < 0.01, *** adjusted p < 0.001. Abbreviations for subtype: B = Basal-like, H = HER2+, LA = Luminal A, LB = Luminal B, N = Normal-like.

Format: PDF Size: 38KB

Additional data file 5:

Pairwise scatter plots of replicate probes after sample quality control, probe filtering and within-plate probe correction (none of the replicated probes were removed due to probe filtering). Scatter plots for one failed probe (miR-224-4) are marked in red.

Format: PDF Size: 508KB

Additional data file 6:

Pairwise scatter plots of technical replicate samples after sample quality control, probe filtering, within-plate probe correction and summarizing replicate probes.

Format: PDF Size: 271KB

Additional data file 7:

Normalized log2 MFI values were plotted against log2-transformed and median-corrected measurements obtained by qRT-PCR.

Format: PDF Size: 31KB

Additional data file 8:

Expression values are based on Illumina data when available, and Agilent data otherwise. The two data sets were normalized as described. Missing values in the Agilent data are indicated in white. Samples were ordered according to molecular subtype (see colour key). The heatmap does not present a hierarchical clustering but merely illustrates differences in gene expression. A. Luminal/ER+ gene cluster. B. ERBB2 and GRB7-containing cluster. C. Interferon-regulated cluster including STAT1. D. Basal epithelial cluster. E. Proliferation cluster.

Format: PDF Size: 291KB

Additional data file 9:

Pairwise comparison of Kaplan-Meier survival curves for 74 classified samples with available follow up data (21 Basal-like, 7 HER2+, 25 Luminal A, 10 Luminal B, 11 Normal-like). A non-parametric log rank test was used to assess differences in clinical outcome.

Format: PDF Size: 37KB

Additional data file 10:

Shown are boxplots of log2 expression for DGCR8, DICER1, DROSHA (RNASEN), AGO1 (EIF2C1), AGO2 (EIF2C2), AGO3 (EIF2C3) and AGO4 (EIF2C4). The data were obtained for 58 samples classified according to subtype (17 Basal-like, 5 HER2+, 18 Luminal A, 8 Luminal B, 10 Normal-like) and 99 samples with known ER status (31 ER-, 68 ER+). We included only Illumina probes that did not map to introns and that could be detected at log2 expression 6 in at least one sample. Differential expression was assessed using a non-parametric Kruskal-Wallis test (subtype) and Wilcoxon rank sum test (ER status).

Format: PDF Size: 194KB

Additional data file 11:

Shown are boxplots of normalized gene expression units for each candidate gene that showed differential expression (Student's t-test p < 0.001). Data were obtained from the cancer microarray database ONCOMINE [98], and differential expression was assessed using Student's t-test. Each row of plots corresponds to a unique gene; data obtained from different studies are separated by a solid vertical black line. For each data set the number of ER negative (blue) and ER positive (yellow) samples is included in the lower figure margin. The first authors of the relevant publications are included in the plot title.

Format: PDF Size: 32KB

Additional data file 12:

Pearson correlation coefficients for mature miRNAs mapping to the same chromosome and strand were plotted against decreasing ranks of pairwise distances. Diamonds represent a moving average over five correlation coefficients. The absolute distance is plotted in blue and indicated on the right y-axis. A distance of 50 kb is indicated by a vertical red line.
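The moving average over five correlation coefficients can be illustrated with a short sketch. The function name and the input values are hypothetical, and the sketch assumes a simple unweighted window that only produces a value where five coefficients are available; the caption does not state how the window was aligned or how the ends were handled.

```python
def moving_average(values, window=5):
    """Unweighted moving average; one output per full window of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical correlation coefficients, ordered by decreasing pairwise distance rank.
correlations = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]

smoothed = moving_average(correlations, window=5)
```

With seven input coefficients and a window of five, the sketch yields three smoothed values.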

Format: PDF Size: 58KB

Additional data file 13:

Heatmap of Pearson correlation coefficients (accounting for DNA copy number changes as described) between miRNA probes and selected Illumina probes on the same chromosome and strand. Blank entries are due to missing DNA copy number information. Probes are arranged in genomic order. Black boxes indicate clusters of adjacent probes less than 50 kb apart. Green boxes indicate clusters of probes mapping to the same host gene. Mature miRNAs included in multiple stem-loops are indicated in blue. Relative genomic probe positions are marked as white bars on the chromosomal plot below each heatmap.

Format: PDF Size: 757KB

Additional data file 14:

A. SSP molecular subtype classification based on the Affymetrix gene expression data for normal breast and breast tumor samples in [56,85]. Spearman correlations with the five subtype centroids are shown for all 14 samples. The solid horizontal black line indicates the minimum correlation required for subtype assignment. If the minimal correlation with a subtype centroid was achieved, the classification was made using the centroid with highest Spearman correlation. B. Shown are class posterior probabilities for 16 Basal-like and 15 Luminal A tumors in the training set (using all 138 detected miRNAs); and three Basal-like and two Luminal A tumors in the test set (using the 77 detected miRNAs in common with the training set). Red and blue indicate the posterior probability of belonging to the Basal-like and Luminal A subtype respectively. Plotting characters indicate the gene expression based subtype classification with squares and triangles representing Basal-like and Luminal A samples respectively. Samples were assigned to the class with posterior probability greater than 0.5 (solid horizontal black line).
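The centroid-based assignment rule described for the SSP classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two toy centroids (the caption uses five), the expression values and the minimum correlation of 0.1 are all hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def classify(sample, centroids, min_corr):
    """Assign the best-correlated centroid, or None if below `min_corr`."""
    corrs = {name: spearman(sample, c) for name, c in centroids.items()}
    best = max(corrs, key=corrs.get)
    return (best if corrs[best] >= min_corr else None), corrs

# Hypothetical centroids over five intrinsic genes (two subtypes for brevity).
centroids = {
    "Basal-like": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Luminal A": [5.0, 4.0, 3.0, 2.0, 1.0],
}
MIN_CORR = 0.1  # hypothetical threshold, not taken from the source

label, corrs = classify([0.2, 0.1, 0.5, 0.9, 1.4], centroids, MIN_CORR)
unassigned, _ = classify([2.0, 5.0, 3.0, 1.0, 4.0], centroids, MIN_CORR)
```

The second toy sample correlates with neither centroid above the threshold, so it is left unclassified, mirroring samples that fall below the horizontal line in the plot.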

Format: PDF Size: 13KB

Additional data file 15:

Intrinsic genes, probe sets (single sample predictor)

Format: TXT Size: 33KB

Additional data file 16:

Spatial miRNA clusters

Format: TXT Size: 11KB

Additional data file 17:

Host gene coordinates of intragenic miRNAs

Format: TXT Size: 8KB

Additional data file 18:

Associations between individual miRNAs, molecular tumor subtypes and clinicopathological factors

Format: TXT Size: 93KB

Additional data file 19:

Intrinsic genes, probe sets (model-based discriminant analysis)

Format: TXT Size: 72KB